
Additive Covariogram Models and

Estimation through Projections

Miro Powojowski Christian Léger∗†

CRM-2870

October 2002

∗Supported by a grant of NSERC, Canada and by the Centre de recherches mathématiques. †Département de mathématiques et de statistique, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal (Québec) H3C 3J7, Canada; [email protected]


Abstract

The paper considers the problem of estimating the covariogram of a stochastic process. Additive covariance models are used, but unlike in previous work by other authors, their estimation is based on projections in the inner product space of sufficiently regular functions. The method can easily accommodate the estimation of the mean of the process from the data through a linear regression model or the inclusion of a nugget effect in the additive covariance model. Asymptotic properties of the resulting estimators are worked out, without explicit assumptions about the functional form of model components or that of the true covariogram. Expressions for the bias of the estimator in misspecified models (it is unbiased if the models for the mean and the covariance are correctly specified), expressions for the estimator's variance in the normal case and bounds for the variance of the estimator under relaxed assumptions are derived. Both in-fill asymptotics and expanding-domain asymptotics are considered, the latter under the more restrictive assumption that the process is isotropic. A new definition of in-fill asymptotics based on the notion of discrepancy is introduced. Non-stationary covariance structures can be accommodated by the projection estimator, and the in-fill asymptotic results hold. The techniques are applied to a data set of Davis (1973), illustrating some of the advantages over the traditional methods.

Mathematics Subject Classification. Primary 62M30; Secondary 62G05, 62P12

Keywords and Phrases. Geostatistics, covariance estimation, stationary processes, isotropic processes, in-fill asymptotics, expanding-domain asymptotics, low-discrepancy sets, star discrepancy, Hardy-Krause variation


1 Introduction.

For a random process Y (x), x ∈ D, where D is a subset of a d-dimensional Euclidean space, the covariogram is defined as C(x1, x2) = cov(Y (x1), Y (x2)), the semivariogram is defined as γ(x1, x2) = (1/2) var(Y (x1) − Y (x2)), and the variogram is defined as 2γ. These definitions do not require the process to be stationary. For a second-order stationary process, the two are related through γ(x1, x2) = C(0, 0) − C(x1, x2) (Cressie, 1993). A common problem in geostatistics is one of estimating the functions C and γ based on one realisation of the process Y observed at a finite number of locations x1, x2, . . . , xn in D. It is important to note that the knowledge of function values C(x1, x2) for arbitrary (x1, x2) ∈ D2 is required, and not simply the covariances of Y at lags observed in the sample. The fact of observing only one realisation forces one to make certain assumptions about the process Y, which translate into restrictions on the form of C and γ. There also exist theoretical reasons for restricting the function families considered. The covariogram has to be a positive definite function, whereas the variogram has to be conditionally negative definite. Further restrictions may be desirable. The process Y may be assumed second-order stationary, or even isotropic, requiring C(x1, x2) and γ(x1, x2) to depend only on x1 − x2 or its length, respectively.

In a typical covariogram estimation problem it is supposed that the observed process Y follows the model

Y = Xβ + η.

The known regressor X usually contains terms corresponding to the mean of the process and any trend that is modelled, while the parameter β is unknown and the random term η is assumed to have zero mean and a covariogram CY . In practice, the covariogram CY is modelled by a covariance function Cθ, known up to the value of a finite-dimensional vector θ, to be estimated. Different methods have been proposed.

A rather exhaustive discussion of the traditional methods of covariogram and variogram estimation is contained in Cressie (1993). Two broad classes of methods can be distinguished: methods requiring parametric distributional assumptions concerning the underlying process, such as ML or REML methods, and methods which avoid making such precise parametric hypotheses. Most of the methods of the latter category are based on parametric curves from some valid family of (co)variogram functions (without making distributional assumptions). These families are usually chosen by convenience, not necessarily because it is believed that the true (co)variogram is a member of this class. The estimation of the parameters of the (co)variogram is done by fitting the curve to the so-called empirical variogram or covariogram.

Covariance function estimation based on empirical (co)variogram estimation suffers from a number of drawbacks. The empirical (co)variogram is based on averages of observations which are about the same distance apart. This usually requires some arbitrary binning of the observations and is sometimes difficult to perform if the number of observations is low or the process is not isotropic. In fact, the empirical covariogram is meaningless if the observed process is non-stationary, while the empirical variogram, which requires intrinsic stationarity, is sensitive to departures from it, see e.g., Cressie (1993). Finally, the fitting procedure is usually difficult to assess from a statistical point of view. Most known theoretical results (a comprehensive summary of which may be found in Cressie, 1993) are concerned only with the properties of the empirical (co)variogram and not those of the fitted (co)variogram function. It appears that the problem of obtaining the properties of the fitted (co)variogram function from the empirical (co)variogram has not been extensively studied. The difficulty lies with the fitting of the curve, which is often done through ordinary or weighted least squares and usually requires optimisation of non-linear and non-quadratic functions.

An alternative approach is based on the assumption that the covariance function is additive:

(1) Cθ = ∑_{i=1}^{q} θ(i)Ci,

where the components Ci are fully specified valid covariance functions and the only parameters to be estimated are the θ(i). Such a model has been considered for instance by Kitanidis (1985), Stein (1987, 1989), and in the case of an isotropic covariance function by Shapiro & Botha (1991). The first two authors have considered the MINQUE (minimum norm quadratic unbiased estimation) method of estimation for the parameters θ(i), from practical and theoretical points of view, respectively. The MINQUE method was originally designed for the purpose of estimation of variance components and has been studied quite extensively in the context of analysis of variance, see Rao and Kleffe (1988). This method leads to a family of estimators as it is a function of a first guess of the covariance function CY , say C0. The practical usefulness of the estimator depends on the closeness of C0 to CY . Moreover, existing theoretical results establishing properties of MINQUE estimators impose certain assumptions on the relationship between the true covariance function CY and the initial guess C0 (in some sense, the two have to be close; for details the reader is referred to Rao & Kleffe, 1988 and Stein, 1989).

In this paper, we introduce an alternative method of estimation of the parameters in the model (1). Let KY be the n × n matrix made up of the elements CY (xi, xj) and let Kl be the corresponding matrix based on the covariance function Cl. For now, assume that the mean of Y is 0. Note that E(Y Y ′) = KY . We study the estimator defined as the orthogonal projection of Y Y ′ onto the linear space of symmetric matrices of size n × n over the field of real numbers, based on an inner product on that space. Because of this algebraic structure, it is possible to derive many interesting properties of this estimator. For instance, if CY is of the form (1), it is automatically unbiased; otherwise it is unbiased for the closest member to CY in that class. Interestingly, the estimator is a special case of the MINQUE estimator, see Section 2.3. Unfortunately, most of the theoretical results on MINQUE estimators do not apply in this special case. Moreover, in the practice of geostatistics, the hypotheses necessary for these results are difficult to establish. The MINQUE estimator also requires the inversion of an n × n matrix whereas our estimator only requires the inversion of a q × q matrix. Unlike the estimator based on fitting a parametric curve to the empirical (co)variogram, we obtain theoretical results directly for the covariogram estimator. For instance, the projection-based estimation yields the mean and variance expressions for the parameters of the estimated covariance curve. It also makes sense even for non-stationary processes.

One may question the practical applicability of the class of covariance functions (1), which may seem just as arbitrary as fitting a parametric curve since the q functions Ci must be known. But in the isotropic case, Powojowski (2002b) shows that it is possible to find functions Ci and parameters θ(i), i = 1, . . . , q, for q large enough, such that the difference between the true covariance function CY and ∑_{i=1}^{q} θ(i)Ci is as small as needed, and Powojowski (2002a) introduces a practical method of choosing the functions Ci and the order q while using the projection estimator introduced in this paper to estimate the parameters θ(i).

From a technical point of view, we introduce what we believe to be a useful, new definition of in-fill asymptotics. Borrowing from research in low-discrepancy sequences, the definition characterises in-fill sequences indirectly, as sequences whose discrepancy converges to zero. It turns out that only the rate of this convergence, and not the sequence itself, enters the proofs of the results, via an application of the Koksma-Hlawka inequality. We believe that the characterisation of in-fill asymptotics by discrepancy measures can be a valuable theoretical tool. It would appear that discrepancy measures may also be useful criteria for the evaluation of sampling configurations in practice.

The paper is organised as follows. In Section 2, we define the projection estimator after introducing the notation and summarising some notions of projections in inner product spaces. Section 3 contains asymptotic analyses of the estimator. First, an asymptotic in-fill setting is considered, where more and more observations of the process Y are collected on a finite domain. It is shown that in general it is impossible to estimate the covariogram consistently based on observations from a finite domain, but an upper bound for the asymptotic variance is obtained. The treatment of the so-called nugget effect is also studied. Note that none of the results to that point require that the process be second-order stationary. Subsequently, it will be assumed that the process is isotropic and we will show that as the size of the domain increases indefinitely, the upper bound for the variance of the estimator derived earlier converges to zero. Theorem 3.6 combines the two concepts of asymptotics and shows how we can obtain a convergent estimator by sampling an expanding region with increasing density. This last result is particularly important from a practical point of view. Finally, an application of the projection estimator is illustrated with a data set of Davis (1973). Technical details are deferred to the appendix.

2 Covariogram estimation through projections

This section describes the notation, reviews standard notions of inner product spaces and introduces the model and the estimator considered in the remainder of the paper, as well as some extensions of the estimator.

2.1 Notation

To avoid confusion which might arise due to the frequent occurrence of multiple subscripts, the following notation will be used throughout the paper. Given a set of scalars A(i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ m, the notation [A(i, j)] or A will denote the n × m matrix whose (i, j)-th entry is A(i, j). This notation will be used only in situations where the scalars A(i, j) and the ranges for i and j are clearly defined. Similarly, given a set of scalars B(i), 1 ≤ i ≤ n, [B(i)] or B will denote a (column) vector whose i-th entry is B(i). On the other hand, Ai,j and Bi may denote a matrix and a vector from some (doubly) indexed set of matrices or vectors, respectively.

In the most general setting, one considers a random process Y on the domain D, a subset of a d-dimensional Euclidean space. The process Y is observed at n locations {xi}_{i=1}^{n}, xi ∈ D. Let Yn = (Y (x1), . . . , Y (xn))′ and Yn(i) = Y (xi), 1 ≤ i ≤ n. It will be further assumed that

(2) Yn = Xnβ + ηn

with E[ηn] = 0. It will be assumed that Xn has p columns corresponding to different regression terms. Thus Xn(l, k) = uk(xl), 1 ≤ k ≤ p, 1 ≤ l ≤ n, where xl is the l-th location in the sample and uk is a continuous function defined on D which is the k-th regression term in the mean of Y . If present, the term u1 ≡ 1 corresponds to the (non-zero) constant term in the mean of Y . The term uk(xl) = xl(1) would correspond to a linear trend in the mean of Y (x) in the direction of the first component of x. The matrix Xn will always be known, while the p × 1 vector β may have to be estimated. The function CY (x1, x2) = cov(Y (x1), Y (x2)) = cov(η(x1), η(x2)) is called the covariance function of the process Y (and of the zero-mean process η). Let KY,n = var(Yn). Thus KY,n is a symmetric matrix whose entries are KY,n(i, j) = CY (xi, xj). If Cθ is a given covariance function model, one defines the symmetric matrix Kθ,n in a similar way, by putting Kθ,n(i, j) = Cθ(xi, xj). Thus Kθ,n is a fixed matrix depending only on the model Cθ and on the set of locations {xi}_{i=1}^{n}, xi ∈ D.

The model Cθ will always be assumed to be additive, that is, of the form (1). Throughout the paper the components Ci as well as CY will be assumed continuous. For convenience it will be assumed that Ci(0, 0) ≤ 1.

In Section 3.2.2 a discontinuous component W will be introduced, which will result in (possibly discontinuous) models of the form C(γ,θ) = γW + ∑_{i=1}^{q} θ(i)Ci. The difference in notation is meant to emphasise the different nature of the functions involved. In all sections preceding 3.3 no stationarity or isotropy assumptions are made about the process η. In Section 3.3 and the remainder of the paper the process η will be assumed isotropic (hence in particular second-order stationary). Thus E[η(x)] = 0 for all x ∈ D and the covariance function CY of η (and Y ) depends only on ρ = ‖x1 − x2‖. It will then be convenient to introduce explicitly isotropic versions of CY , Ci and Cθ defined by CY (ρ) = CY (x1, x2), Ci(ρ) = Ci(x1, x2) and Cθ(ρ) = Cθ(x1, x2).

Given a model Cθ of the form (1), it will be said that the true covariance function CY is in the span of Cθ (or, equivalently, in the span of the components Ci, 1 ≤ i ≤ q) if and only if there exists a vector θY such that

(3) CY = ∑_{i=1}^{q} θY (i)Ci.

If A is a symmetric matrix, the shorthands A > 0 and A ≥ 0 will mean that A is positive definite and positive semidefinite, respectively. Similarly, if B is another symmetric matrix of the same size as A, A > B and A ≥ B will mean A − B > 0 and A − B ≥ 0, respectively.

2.2 Orthogonal projections and estimation with additive models

The goal of this section is to summarise well-known relationships between orthogonal projections in inner product spaces and linear estimation. Let Vn be the linear space of symmetric matrices of size n × n over the field of real numbers. Let K1, . . . , Kq be fixed, linearly independent elements of Vn with q ≤ dim(Vn) = n(n + 1)/2. Furthermore, one considers the vector subspace span(K1, . . . , Kq) of Vn, the space of linear combinations of the elements K1, . . . , Kq. Let

(4) 〈K,J〉V , K ∈ Vn, J ∈ Vn

be any inner product defined on the vector space Vn, thus making it into an inner product space. The inner product (4) gives rise to a norm on Vn defined by

(5) ‖K − J‖V = 〈K − J, K − J〉V^{1/2}.

Let P (J) denote the orthogonal projection of J onto the subspace span(K1, . . . , Kq). Thus P is a linear transformation satisfying

(6) 〈Ki, J − P (J)〉V = 0, i = 1, . . . , q.

Since P (J) ∈ span(K1, . . . , Kq), one can write P (J) = ∑_{i=1}^{q} θJ(i)Ki. Together with (6) one obtains

(7) ∑_{j=1}^{q} 〈Ki, Kj〉V θJ(j) = 〈Ki, J〉V , i = 1, . . . , q

or, in matrix form

(8) [〈Ki,Kj〉V ] θJ = [〈Ki, J〉V ].

It is easy to see that the matrix [〈Ki, Kj〉V ] is invertible under the assumption of linear independence of K1, . . . , Kq. This implies that the equation (8) has exactly one solution, given by

(9) θJ = [〈Ki,Kj〉V ]−1[〈Ki, J〉V ].
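Computationally, (9) is just a small q × q linear solve: build the Gram matrix of the Ki and the vector of their inner products with J. A minimal numpy sketch (ours, purely illustrative; it instantiates 〈·, ·〉V as the Frobenius inner product tr(AB′) used later in Section 2.3):

```python
import numpy as np

def projection_coefficients(J, Ks):
    """Solve the Gram system (8): [<Ki, Kj>] theta_J = [<Ki, J>],
    with <A, B> = tr(A B') as the inner product on symmetric matrices."""
    gram = np.array([[np.trace(Ki @ Kj.T) for Kj in Ks] for Ki in Ks])
    rhs = np.array([np.trace(Ki @ J.T) for Ki in Ks])
    return np.linalg.solve(gram, rhs)

# If J is already a combination of the Ki, the projection recovers it.
K1, K2 = np.eye(4), np.ones((4, 4))
J = 2.0 * K1 + 0.5 * K2
theta_J = projection_coefficients(J, [K1, K2])   # -> [2.0, 0.5]
```

The invertibility of the Gram matrix is exactly the linear independence assumption on K1, . . . , Kq.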


If Y is an n-dimensional random variable with E[Y ] = 0 and var(Y ) = E[Y Y ′] = KY , let the subset SY of Vn be defined by SY = {Y Y ′, Y ∈ Rn}. The random process Y gives rise to a probability measure on SY . Therefore,

(10) θ = [〈Ki,Kj〉V ]−1[〈Ki, Y Y′〉V ]

is a random variable. One has

(11) E[θ] = [〈Ki,Kj〉V ]−1[〈Ki,E[Y Y ′]〉V ] = [〈Ki,Kj〉V ]−1[〈Ki,KY 〉V ].

Therefore by the uniqueness of the solution of (8), one concludes that

(12) P (KY ) = ∑_{i=1}^{q} E[θ(i)]Ki.

On the other hand, if KY ∈ span(K1, . . . ,Kq), then for some vector θY one has

(13) KY = ∑_{i=1}^{q} θY (i)Ki.

By elementary properties of projections, one immediately obtains P (KY ) = KY and, again by the uniqueness of the solution of (8), it follows that E[θ(i)] = θY (i).

Summing up, it follows that for any choice of inner product 〈·, ·〉V , the random variable θ of (10) is the vector minimising ‖Y Y ′ − ∑_{i=1}^{q} α(i)Ki‖V . The mean vector E[θ] is the vector minimising ‖KY − ∑_{i=1}^{q} α(i)Ki‖V . Furthermore, if KY is of the form (13), then E[θ] = θY . If on the other hand KY is not of the form (13), the vector θ̄ = E[θ] given by (11) is still a meaningful parameter, since it defines the orthogonal projection ∑_{i=1}^{q} θ̄(i)Ki of KY onto span(K1, . . . , Kq).

2.3 The estimator

The goal is to estimate the unknown covariance function of the process Y from the observations Yn. If β in the equation (2) is unknown, it may also be necessary to estimate it; otherwise one can work directly with ηn. In this sense, knowing β is equivalent to putting X = 0. To motivate the discussion, it is initially assumed that X = 0 and hence Yn = ηn. One observes that

(14) E[YnY ′n] = KY,n.

Furthermore, valid covariance functions Ci, 1 ≤ i ≤ q, are assumed to be fully specified. The functions Ci give rise to the symmetric matrices Ki,n. One considers the class of covariance function models in (1), which results in the covariance matrix models

(15) Kθ,n = ∑_{i=1}^{q} θ(i)Ki,n,

where it will be assumed that the θ(i) are such that Cθ is a valid covariance function. A member of the class (1) is sought which will be in some way closest to the unknown true covariance function CY .

We use the general approach outlined in Section 2.2 with the inner product

(16) 〈A,B〉 = tr(AB′).

The resulting norm ‖A − B‖ = 〈A − B, A − B〉^{1/2}, known as the Frobenius norm, is the square root of the sum of squares of the elements of A − B which, for a symmetric matrix, equals the sum of squares of its eigenvalues. In various proofs we will also use the L∞ norm ‖A‖∞ = max_{i,j=1,...,n} |A(i, j)| and the spectral radius norm ‖A‖o = sup_{‖v‖=1} ‖Av‖, where the vector norm is the L2 norm. We can easily show that the three norms are equivalent, i.e.

(17) ‖A‖∞ ≤ ‖A‖ ≤ n‖A‖o,
(18) ‖A‖∞ ≥ (1/n²)‖A‖ ≥ (1/n²)‖A‖o.

But only the norm (16) will be used to estimate θ.


The estimator of θ that we propose is the projection of the matrix YnY ′n (whose expectation is KY,n) onto the linear space spanned by the matrices Ki,n. Equivalently, the θn(i) are selected so as to minimise ‖YnY ′n − ∑_{i=1}^{q} θn(i)Ki,n‖.

The resulting estimator is

(19) θn = [tr(Ki,nKj,n)]−1[tr(Ki,nYnY′n)] = [tr(Ki,nKj,n)]−1[Y ′nKi,nYn]

where θn = (θn(1), . . . , θn(q))′. The expression (19) should be compared to the general form (10). In particular, it follows that if the true covariogram CY is of the form (3) for some θY , then the estimator is unbiased for θY . Otherwise θn is still a meaningful parameter in the sense that ∑_{i=1}^{q} θn(i)Ki,n is the closest matrix of the form ∑_{i=1}^{q} α(i)Ki,n to the matrix YnY ′n, where closest is defined in terms of the norm induced by the inner product (16). Similarly, θ̄n = E[θn] defines ∑_{i=1}^{q} θ̄n(i)Ki,n, which is the closest matrix of the form ∑_{i=1}^{q} α(i)Ki,n to the matrix KY,n.
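The unbiasedness claim can be checked numerically through (11): replacing YnY ′n by KY,n in (19) gives E[θn], which recovers θY exactly whenever CY is in the span of the components. A small numpy sketch (the two exponential components and all parameter values below are our illustrative choices, not from the paper):

```python
import numpy as np

# Locations on [0, 1] and two fully specified covariance components C_i.
x = np.linspace(0.0, 1.0, 30)
D = np.abs(x[:, None] - x[None, :])
K1 = np.exp(-D / 0.3)
K2 = np.exp(-(D / 0.5) ** 2)
theta_true = np.array([1.5, 0.7])
KY = theta_true[0] * K1 + theta_true[1] * K2   # C_Y lies in the span

# E[theta_n] via (11): solve [tr(Ki Kj)] theta = [tr(Ki K_Y)].
Ks = (K1, K2)
gram = np.array([[np.trace(Ki @ Kj) for Kj in Ks] for Ki in Ks])
mean_theta = np.linalg.solve(gram, [np.trace(Ki @ KY) for Ki in Ks])
```

Only a 2 × 2 system is inverted here, never a 30 × 30 one.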

In the more general case of unknown β, Yn no longer has mean 0, so one must instead work with the residuals en = Yn − Xnβ̂ = PnYn of the regression model (2), where β̂ is the least-squares estimator of β and Pn = In − Xn(X ′nXn)−1X ′n is the orthogonal projection onto the orthogonal complement of the column space of Xn. Note that E[en] = 0 and that

(20) E[ene′n] = PnKY,nPn = UY,n,

which is a generalisation of (14). Let

(21) Ui,n = PnKi,nPn.

If KY,n satisfies (13), then the norm ‖UY,n − ∑_{i=1}^{q} θY (i)Ui,n‖ = 0. Hence to estimate the θ(i), the matrix ene′n is projected onto the linear space spanned by the matrices Ui,n rather than the matrices Ki,n. Equivalently, the θn(i) are selected so as to minimise ‖ene′n − ∑_{i=1}^{q} θn(i)Ui,n‖. The resulting estimator is

(22) θn = [tr(Ui,nUj,n)]−1[tr(Ui,nene′n)] = [tr(Ui,nUj,n)]−1[e′nUi,nen].

Again, if the true covariogram CY is of the form (3) for some θY , then the estimator (22) is unbiased for θY , by an argument similar to that of Section 2.2. The resulting estimate of the covariance function CY based on the observations Yn is Cθn = ∑_{i=1}^{q} θn(i)Ci.
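In code, (22) differs from (19) only in that the residual projector Pn and the projected components Ui,n = PnKi,nPn replace Yn and Ki,n. A hedged numpy sketch (the regressors, components and parameter values are our illustrative choices):

```python
import numpy as np

def theta_hat(Y, X, Ks):
    """Projection estimator (22): project the residual matrix e e' onto
    span(U_1, ..., U_q), where e = P_n Y and U_i = P_n K_i P_n."""
    n = len(Y)
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I - X (X'X)^{-1} X'
    e = P @ Y
    Us = [P @ K @ P for K in Ks]
    gram = np.array([[np.trace(Ui @ Uj) for Uj in Us] for Ui in Us])
    return np.linalg.solve(gram, [e @ Ui @ e for Ui in Us])

# Mean model (intercept + linear trend) and two covariance components.
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([np.ones(30), x])
D = np.abs(x[:, None] - x[None, :])
K1, K2 = np.exp(-D / 0.3), np.exp(-(D / 0.5) ** 2)
KY = 1.5 * K1 + 0.7 * K2

# One simulated sample with unknown beta:
L = np.linalg.cholesky(KY + 1e-9 * np.eye(30))
Y = X @ np.array([2.0, -1.0]) + L @ np.random.default_rng(0).standard_normal(30)
theta_sample = theta_hat(Y, X, [K1, K2])

# Unbiasedness check via (23): replace e e' by U_{Y,n} = P K_Y P.
P = np.eye(30) - X @ np.linalg.solve(X.T @ X, X.T)
Us = [P @ K @ P for K in (K1, K2)]
gram = np.array([[np.trace(Ui @ Uj) for Uj in Us] for Ui in Us])
mean_theta = np.linalg.solve(gram, [np.trace(U @ P @ KY @ P) for U in Us])
```

Since KY here is in the span of the components, mean_theta recovers (1.5, 0.7) exactly, while any single theta_sample fluctuates around it.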

By linearity in the quadratic forms e′nUi,nen, it is easy to see that the mean and variance of θn are given by

E[θn] = [tr(Ui,nUj,n)]−1[tr(Ui,nUY,n)] and(23)

var(θn) = [tr(Ui,nUj,n)]−1 var([e′nUi,nen])[tr(Ui,nUj,n)]−1.(24)

Later it will often be useful to make the assumption

(25) sup_{n, An ≠ 0} var[Y ′nAnYn] / tr(AnKY,nAnKY,n) = c < ∞,

where the matrices An are symmetric. To ensure that the denominator does not vanish, it will be assumed that KY,n is nonsingular for all n. In particular, the condition in (25) yields the bound var(Y ′nAnYn) ≤ c tr(AnKY,nAnKY,n). If Y is a Gaussian process, Yn is multinormal and c = 2, in which case (25) holds by the relation var[Y ′nAnYn] = 2 tr(AnKY,nAnKY,n).
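The Gaussian identity var[Y ′nAnYn] = 2 tr(AnKY,nAnKY,n) behind c = 2 is easy to verify by simulation (a sketch; the matrices below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2            # symmetric A_n
B = rng.standard_normal((n, n)); K = B @ B.T + n * np.eye(n)  # K_{Y,n} > 0

L = np.linalg.cholesky(K)
Z = rng.standard_normal((200_000, n)) @ L.T   # rows ~ N(0, K)
q = np.einsum('ij,jk,ik->i', Z, A, Z)         # quadratic forms Y' A Y

mc_var = q.var()
exact = 2.0 * np.trace(A @ K @ A @ K)         # the normal-case value, c = 2
```

The Monte Carlo variance agrees with the exact value to within sampling error; assumption (25) then reads var(Y ′nAnYn) ≤ c tr(AnKY,nAnKY,n) with c = 2.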

The estimator θn of (22) is based on the inner product (16) and this suggests the possibility of extending the estimation method to other inner products. In particular, for a fixed n × n symmetric matrix V with positive entries the following defines an inner product

(26) 〈A,B〉V = 〈A ∗ V,B〉 = 〈A,B ∗ V 〉 = ∑_{k=1}^{n} ∑_{l=1}^{n} A(k, l)B(k, l)V (k, l).

The resulting estimator may now be expressed as

(27) θV,n = [tr((Ui,n ∗ Vn)Uj,n)]−1[tr((Ui,n ∗ Vn)ene′n)] = [tr((Ui,n ∗ Vn)Uj,n)]−1[e′n(Ui,n ∗ Vn)en].

There may be very good reasons for considering such a modified inner product. For example, in geostatistics the covariogram estimation is only an intermediate step in some spatial prediction procedure such as kriging. In such cases it is often more important to estimate the covariogram accurately at short distances, while inaccuracies at greater distances may not be so important. Thus the matrix V may be defined by V (k, l) = v(‖xk − xl‖) where v is some non-increasing, positive function of distance. Such extensions were considered in Powojowski (2000).
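A sketch of the weighted estimator (27) (taking X = 0 so that Ui,n = Ki,n; the weight function v and the components are our illustrative choices). The unbiasedness argument goes through unchanged: replacing ene′n by KY,n in (27) recovers θY exactly when (3) holds.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 30)
D = np.abs(x[:, None] - x[None, :])
K1, K2 = np.exp(-D / 0.3), np.exp(-(D / 0.5) ** 2)
KY = 1.5 * K1 + 0.7 * K2                  # C_Y in the span of C_1, C_2

# V(k, l) = v(||x_k - x_l||) with v non-increasing: short lags weigh more.
V = np.exp(-3.0 * D)

# Weighted normal equations from (27); '*' is the Hadamard product, and
# tr((A * V) B) equals the weighted inner product <A, B>_V of (26).
Ks = (K1, K2)
gram = np.array([[np.trace((Ki * V) @ Kj) for Kj in Ks] for Ki in Ks])
mean_theta = np.linalg.solve(gram, [np.trace((Ki * V) @ KY) for Ki in Ks])
```

Whatever the choice of V, the solved coefficients here reproduce (1.5, 0.7); the weighting changes only how misfit is distributed when CY is outside the span.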

We mentioned earlier that our estimator is a special case of the MINQUE family of estimators. Letting K0,n be an initial guess of the matrix KY,n (often of the form ∑_{i=1}^{q} α0(i)Ki,n where α0 is an initial guess of θ), the MINQUE estimator has the form

θMINQUE = [tr(R′nK0,n⁻¹Ki,nK0,n⁻¹RnKj,n)]⁻¹[Y ′nR′nK0,n⁻¹Ki,nK0,n⁻¹RnYn],

where

Rn = In − Xn(X ′nK0,n⁻¹Xn)⁻¹X ′nK0,n⁻¹.

It is easy to see that the estimator (22) is a special case of MINQUE, where K0,n = In, the identity matrix. Note that the general MINQUE estimator requires inverting the n × n matrix K0,n whereas the projection estimator only requires inverting a q × q matrix, where q is usually much smaller than n. Kitanidis (1985) considers the MINQUE estimator with K0,n = In in a simulation study comparing it to the MINQUE estimator with K0,n = ∑_{i=1}^{q} α(i)Ki,n for some fixed values α(i), where the fitted model and the true model were of the form ∑_{i=1}^{q} θ(i)Ki,n. Not surprisingly, the latter estimator performed better under those circumstances. However, this estimator can become quite unstable if the true model is not of the form ∑_{i=1}^{q} θ(i)Ki,n (Powojowski, unpublished) and establishing its theoretical properties in such a case is quite difficult.

Finally, as has been pointed out by many authors (for a review see, for example, Rao & Kleffe, 1988), the problem at hand imposes certain constraints on the values of θ(i) if the resulting estimate is to be a valid covariance function, namely ∑ θ(i)Ki,n > 0 is required. In many situations even more severe constraints may be necessary. It may be required that θ(i) > 0 for all i, and in covariogram estimation it is necessary that ∑ θ(i)Ci(xj, xk) be a positive definite function. When these constraints are imposed, the optimisation may have to be carried out in a convex cone and not in the entire vector space. Possible ways of addressing these difficulties include truncated estimators or quadratic optimisation with linear constraints. In general, such methods tend to introduce a bias, but they often reduce the estimator's MSE. These concerns are not relevant to the asymptotic results derived in the remainder of this paper for the case where (3) holds. Otherwise, a judicious choice of the model components Ci can still guarantee that the limit is a valid covariance function (see e.g. Powojowski, 2002b). However, to apply the estimator (22) in practice for a finite sample, one will have to address these issues.
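One concrete way to impose θ(i) ≥ 0 is to rewrite the quadratic objective behind (22) as a non-negative least-squares problem (this reformulation and the use of scipy's NNLS solver are our sketch, not a method prescribed in the paper):

```python
import numpy as np
from scipy.optimize import nnls

def theta_hat_nonneg(e, Us):
    """Minimise ||e e' - sum_i theta(i) U_i|| subject to theta(i) >= 0.
    Expanding the norm gives theta' M theta - 2 b' theta with
    M = [tr(U_i U_j)], b = [e' U_i e]; writing M = L L' this equals
    ||L' theta - L^{-1} b||^2 up to a constant, an NNLS problem."""
    M = np.array([[np.trace(Ui @ Uj) for Uj in Us] for Ui in Us])
    b = np.array([e @ Ui @ e for Ui in Us])
    L = np.linalg.cholesky(M)
    theta, _ = nnls(L.T, np.linalg.solve(L, b))
    return theta

# Toy check: here the unconstrained minimiser is already feasible.
e = np.ones(4)
theta = theta_hat_nonneg(e, [np.eye(4), np.ones((4, 4))])  # -> [0.0, 1.0]
```

Truncating negative components after an unconstrained fit is the simpler alternative mentioned above; both generally trade a little bias for a smaller MSE.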

3 Asymptotic results

This section contains the main results of the paper. First, the in-fill asymptotic setting considered throughout the paper is defined. Assuming in-fill sampling on a finite domain, one obtains an expression for the asymptotic mean and a bound for the asymptotic variance of the projection estimator. Subsequently, it is investigated how varying the size of the domain on which the in-fill sequence is defined affects the asymptotic variance bound. Various extensions and special cases are considered, such as the presence of the so-called nugget effect, or the case of a process with known mean.

3.1 Asymptotic settings

Various asymptotic settings are possible in geostatistics. In all cases it will be assumed that (2) holds. The number of observations n will be allowed to tend to infinity. However, additional considerations arise in defining a setting for an asymptotic theory. These have to do with the relative locations of the observations xi in the domain of the process, and the size and shape of the domain itself. Many authors (e.g., Cressie, 1993) have distinguished between two basic asymptotic settings: the so-called in-fill asymptotics, in which the observations are placed within a compact domain D, and the expanding-domain asymptotics, where the observations are spread over an increasing family of domains {Di}_{i=1,2,...}, Di ⊆ Di+1. Clearly, even this does not fully define the problem. In both settings the observations may be placed on some regular grid or in any geometric arrangement whatsoever. Each such arrangement generally gives rise to a different model as in (2), even though the underlying process Y is the same.

Various precise definitions of asymptotic settings have been used by many authors. In-fill configurations have been considered, among others, by Stein (1987, 1989); Stein & Handcock (1989); Lahiri (1996). Expanding-domain schemes in which the minimal distance between observations remains bounded from below by a positive value have been considered by Cressie & Grondona (1992); Cressie (1993) and others. Finally, schemes combining both in-fill and expanding-domain properties in a sampling configuration have been considered by Hall & Patil (1994); Lahiri et al. (1999).


3.1.1 Sampling point sets and low-discrepancy point sequences

This section reviews some basic notions of low-discrepancy point sequences which will be required in the remainder of the paper. A general background on this subject may be found in Niederreiter (1992) and a comprehensive overview of recent work may be found in Niederreiter & Spanier (2000). Given a set D ⊂ Rd, a sample point set on D is any finite set S = {x1, . . . , xn} of points in D. The cardinality of S will be denoted by |S|. Throughout this paper D will be a compact, simply connected set.

Definition 3.1 Let ψ ∈ Cd(D) (ψ has d-th order continuous partials on D) and let x0 ∈ D be a fixed point. The variation of ψ in the sense of Hardy and Krause is defined as

(28) V ∗D,x0(ψ) = ∑_{∅≠u⊆{1,...,d}} ∫_{Du} |∂^{|u|}ψ(xu, x0)/∂xu| dxu,

where the notation ψ(xu, x0) means that all components xu(j) with j ∉ u are set to x0(j).

Let V ∗D(ψ) = inf_{x0∈D} V ∗D,x0(ψ).

Definition 3.2 The star discrepancy of the sampling point set S is defined as

(29) D∗(S) = ‖Funif (x) − FS(x)‖∞ = sup_{x∈D} | Funif ({y ∈ D : y(j) ≤ x(j), j = 1, . . . , d}) − (1/|S|) ∑_{s∈S} 1{s(j) ≤ x(j), j = 1, . . . , d} |.

Funif denotes the uniform probability measure on D. This measure is clearly absolutely continuous with respect to the Lebesgue measure and f ≡ 1/µ(D) will denote its Radon-Nikodym derivative, the uniform probability density function on D. The following result will be essential.
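For d = 1 the supremum in (29) has a classical closed form over the sorted sample, which makes small experiments easy (the function below is our illustration for D = [0, 1]):

```python
import numpy as np

def star_discrepancy_1d(points):
    """Exact star discrepancy D*(S) of a point set in [0, 1]:
    max over i of max(i/n - x_(i), x_(i) - (i-1)/n) for the sorted x_(i)."""
    x = np.sort(np.asarray(points, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

n = 10
centred = (2.0 * np.arange(1, n + 1) - 1.0) / (2 * n)   # cell midpoints
d_star = star_discrepancy_1d(centred)                    # = 1/(2n), the minimum
d_rand = star_discrepancy_1d(np.random.default_rng(0).random(n))
```

The midpoint set attains the smallest possible value 1/(2n), so any other n-point set, such as the random one above, has star discrepancy at least as large.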

Theorem 3.1 (Koksma-Hlawka) For any sampling point set S, any ψ ∈ Cd(D) and any fixed x0 ∈ D the following inequality holds:

(30) e = ∫_D ψ(x)f(x) dx − (1/|S|) ∑_{s∈S} ψ(s) = ∫_D ψ(x) d(Funif (x) − FS(x)) ≤ D∗(S)V ∗D,x0(ψ),

and hence e ≤ D∗(S)V ∗D(ψ).

Sequences {S_n}_{n=1}^∞ of sampling point sets on D with |S_n| = n will be considered, and for the sake of brevity the notation

(31)  δ_n = D*(S_n)

will be used. The (star) discrepancy δ_n plays an important role in the rate of convergence of the expected value of the projection estimator, as we shall soon see. The following result is attributed to Korobov (1959).

Theorem 3.2 For the unit cube D = [0, 1]^d there exist sampling point set sequences with discrepancies satisfying

(32)  δ_n < c n^{−1} (log n)^d

for some constant c. A sequence {S_n}_{n=1}^∞ satisfying (32) is called a low-discrepancy sequence.

The original statement of the theorem contains a method of constructing low-discrepancy sequences. The problem of constructing low-discrepancy sequences on various domains has been studied by many authors. In particular, Fang & Wang (1994) discuss such a construction on an arbitrary compact domain.
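Low-discrepancy constructions are easy to experiment with numerically. The sketch below is an illustration only (not part of the paper's method): it generates the base-2 van der Corput sequence, the classical one-dimensional low-discrepancy construction, and computes the exact star discrepancy of a point set on D = [0, 1] using the standard order-statistic formula; the function names are ours.

```python
def van_der_corput(n, base=2):
    """First n terms of the van der Corput sequence: the radical inverse of
    k = 1, ..., n, i.e. the base-b digits of k mirrored about the radix point."""
    seq = []
    for k in range(1, n + 1):
        x, denom = 0.0, base
        while k > 0:
            x += (k % base) / denom   # append next digit after the radix point
            k //= base
            denom *= base
        seq.append(x)
    return seq

def star_discrepancy_1d(points):
    """Exact star discrepancy D*(S) of a finite point set S in [0, 1],
    computed from the order statistics x_(1) <= ... <= x_(n)."""
    xs = sorted(points)
    n = len(xs)
    return max(max(x - i / n, (i + 1) / n - x) for i, x in enumerate(xs))
```

For n = 64 the computed discrepancy is 1/64, comfortably inside the O(n^{−1} log n) bound of Theorem 3.2 with d = 1. An equally spaced grid achieves the same order, but the van der Corput sequence is extensible: every prefix is well distributed, which matches the in-fill setting where points are added one at a time.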

3.1.2 Asymptotic setting definitions

Precise meaning will now be given to the notion of in-fill asymptotics used in subsequent discussion. Our in-fill asymptotic context differs from that considered by other authors in that it does not require the observations to be equally spaced (as opposed to, for example, Stein, 1987) but nevertheless specifies a precise limiting coverage (as opposed to Stein, 1989, where the limiting coverage is not explicitly characterised).


Definition 3.3 An in-fill sampling domain (D, {S_n}_{n=1}^∞) consists of a compact, simply connected domain D ⊂ R^d and a sequence {S_n}_{n=1}^∞ of sampling point sets on D.

For each n, S_n = {x^n_1, ..., x^n_n}, with |S_n| = n, and let δ_n be as in (31). The vector Y_n = (Y(x^n_1), ..., Y(x^n_n))′ will be referred to as the sample of size n.

Next, in-fill sampling and expanding domains will be combined. One considers a sequence of in-fill sampling domains. Let an in-fill sampling domain (D, {S_n}_{n=1}^∞) be given. Let {r_m}_{m=1}^∞ be an increasing and unbounded sequence of real numbers with r_1 = 1. Let T_m(x) = r_m x, x ∈ R^d, be the dilation operator. It will be assumed that the origin 0 lies in the interior of D, so that the images T_m(D) increase to cover all of R^d. Furthermore, let

(33)  D_m = T_m(D)
(34)  x^n_{m,j} = T_m(x^n_j),  x^n_j ∈ S_n,  j = 1, ..., n
(35)  S_{m,n} = {x^n_{m,1}, ..., x^n_{m,n}}.

Thus each pair (D_m, {S_{m,n}}_{n=1}^∞) is an in-fill sampling domain. The following definition will be useful in discussing situations where the sampled domain is allowed to expand.

Definition 3.4 The sequence {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞, where D_m and S_{m,n} are as in (33) and (35) with D a compact, simply connected set containing 0 as an interior point, will be called a sequence of expanding in-fill domains. To simplify notation it will be assumed that µ(D) = 1, where µ is the Lebesgue measure.

The requirement that µ(D) = 1 does not cause any loss of generality, since all asymptotic arguments will deal with the case where D_m is allowed to increase indefinitely as m increases.

Remark 3.1. With the definitions above, one has

(36) D∗(Sm,n) = D∗(Sn) and hence δm,n = δn for m ≥ 1.

3.2 Asymptotics on a bounded domain

In this section it will be assumed that observations are collected on a fixed bounded domain, while the number of observations increases.

3.2.1 Process with unknown mean or a trend (X ≠ 0)

This section explores the properties of the projection estimator as an increasing number of observations from an in-fill sampling sequence on a finite domain become available. We are interested in finding the limiting mean of the estimator and in bounding its variance. From (23),

(37)  E[θ_n] = A_{X,n}^{−1} M_{X,n},

where the estimator θ_n is of the form (22) and A_{X,n} and M_{X,n} are defined as follows:

(38)  A_{X,n} = (1/n²) [tr(U_{i,n} U_{j,n})]
(39)  M_{X,n} = (1/n²) [tr(U_{i,n} U_{Y,n})].

Here, U_{Y,n} and U_{i,n} are defined as in (20) and (21). To bound the variance, we introduce

(40)  B_{X,n} = (1/n⁴) [tr(U_{i,n} U_{Y,n} U_{j,n} U_{Y,n})]
(41)  E_{X,n} = A_{X,n}^{−1} diag(B_{X,n}) A_{X,n}^{−1},

where diag(B_{X,n}) is a matrix whose diagonal elements are the same as those of B_{X,n}, while the off-diagonal elements are zero. To describe the limits of the quantities (38) through (41), we will need to introduce functions φ_i which depend on the functions u_i used in defining the matrix of regressors X_n as in Section 2.1. Consider the matrix R_n = (1/n) X′_n X_n and let the p × p matrix R be defined by

(42)  R(i, j) = ∫_D u_i(ξ) u_j(ξ) f(ξ) dξ,


where f ≡ 1/µ(D). It is easily seen that

(43)  ‖R_n − R‖ ≤ p ‖R_n − R‖_∞ ≤ p max_{k1,k2=1,...,p} V*_D(u_{k1}(·) u_{k2}(·)) δ_n = k_R δ_n,

and if R is invertible then for large enough n the matrix X′_n X_n is invertible and by Lemma A.5

(44)  ‖n(X′_n X_n)^{−1} − R^{−1}‖_∞ ≤ ‖n(X′_n X_n)^{−1} − R^{−1}‖ ≤ ‖R^{−1}‖² k_R δ_n.

Therefore, if the observations come from an in-fill sampling domain (D, {S_n}_{n=1}^∞), R_n converges to R.

Note that the estimator θ_n of (22) depends on X_n only through P_n, the projection matrix on the space orthogonal to the columns of X_n. This projection remains unchanged if an invertible linear transformation is applied to the rows of X_n. In particular, we can transform it using the matrix P such that P′RP = I. Hence, without loss of generality, we can assume that R = I, so that ‖R‖ = ‖R^{−1}‖ = p. This means that we can assume that

(45)  ∫_D u_i(ξ) u_j(ξ) f(ξ) dξ = δ_{i,j},

where δ_{i,j} = 1 if i = j and zero otherwise. In the remainder of this paper, this will indeed be assumed. Let us now introduce the functions φ_i:

(46)  φ_i(x_{l1}, x_{l2}) = C_i(x_{l1}, x_{l2}) − ∑_{k=1}^p u_k(x_{l2}) ∫_D u_k(ξ) C_i(ξ, x_{l1}) f(ξ) dξ
      − ∑_{k=1}^p u_k(x_{l1}) ∫_D u_k(ξ) C_i(ξ, x_{l2}) f(ξ) dξ
      + ∑_{k1=1}^p ∑_{k2=1}^p u_{k1}(x_{l1}) u_{k2}(x_{l2}) ∫_D ∫_D u_{k1}(ξ) u_{k2}(η) C_i(ξ, η) f(ξ) f(η) dξ dη.

Similarly, one defines φ_Y by replacing C_i by C_Y in the equation above. These definitions are rather technical in nature. Their precise role can be seen in the proofs of the results of this section, but intuitively they arise from considering cov(Y(x_1) − Ŷ(x_1), Y(x_2) − Ŷ(x_2)), where Ŷ = Xβ̂ is the fitted mean. The terms involving integrals can be associated with the covariances of the predictors Ŷ with Y and with themselves. As will be shown in Lemma 3.1, the following matrices will be the limits of (38) through (41):

(47)  A_X(i, j) = ∫_D ∫_D φ_i(ξ_1, ξ_2) φ_j(ξ_2, ξ_1) f(ξ_1) f(ξ_2) dξ_1 dξ_2
(48)  M_X(i) = ∫_D ∫_D φ_i(ξ_1, ξ_2) φ_Y(ξ_2, ξ_1) f(ξ_1) f(ξ_2) dξ_1 dξ_2
(49)  B_X(i, j) = ∫_D ∫_D h_{X,i}(ξ, η) h_{X,j}(η, ξ) f(ξ) f(η) dξ dη
(50)  h_{X,i}(ξ, η) = ∫_D φ_i(ξ, λ) φ_Y(λ, η) f(λ) dλ
(51)  E_X = A_X^{−1} diag(B_X) A_X^{−1}.

Lemma 3.1 For observations coming from an in-fill sampling domain (D, {S_n}_{n=1}^∞), with the u_k, k = 1, ..., p, continuous and such that R = I, and assuming A_X is invertible, there exist constants such that

(52)  ‖A_{X,n} − A_X‖_∞ ≤ c_{A,X} δ_n
(53)  ‖M_{X,n} − M_X‖_∞ ≤ c_{M,X} δ_n
(54)  ‖B_{X,n} − B_X‖_∞ ≤ c_{B,X} δ_n
(55)  ‖E_{X,n} − E_X‖_∞ ≤ c_{E,X} δ_n.

The following result is the main result of this section.


Theorem 3.3 Under the model (2), with X_n containing continuous regressor functions and such that the matrix R of (42) equals the identity, if the observations come from an in-fill sampling domain (D, {S_n}_{n=1}^∞), and the matrix A_X of (47) is invertible, there is a constant such that the mean of the projection estimator θ_n defined by (22) satisfies

(56)  ‖E[θ_n] − A_X^{−1} M_X‖_∞ ≤ c_{θ,X} δ_n.

If, moreover, (25) holds, the variance of the estimator satisfies

(57)  var(θ_n(i)) ≤ q c (E_X(i, i) + c_{E,X} δ_n).

Remark 3.2. Note that if the covariance function of the process is C_Y = ∑_{i=1}^q θ_Y(i) C_i as in (1), then φ_Y(x_1, x_2) = ∑_{i=1}^q θ_Y(i) φ_i(x_1, x_2), and so it is easy to see that M_X(i) = ∑_{j=1}^q θ_Y(j) A_X(i, j), i.e., M_X = A_X θ_Y. Hence the asymptotic limit of the mean of the estimator, A_X^{−1} M_X, is simply θ_Y, i.e., the estimator is unbiased. This is of course not surprising since we have already argued that it is unbiased for all n when C_Y satisfies (1).

If on the other hand C_Y does not satisfy (1), the parameter θ_∞ ≡ A_X^{−1} M_X can still be defined as the limit of E[θ_n] and it has some interesting properties. From earlier discussion and from the discussion in Section 2.2 it follows that

(58)  lim_{n→∞} E[C_{θ_n}(x, y)] = ∑_{i=1}^q (lim_{n→∞} E[θ_n(i)]) C_i(x, y) = ∑_{i=1}^q θ_∞(i) C_i(x, y),

which, when viewed as a function of (x, y), is the orthogonal projection of the function φ_Y(x, y) onto the space spanned by the functions φ_i(x, y), where the inner product between two functions φ_1, φ_2 on D² is defined by

(59)  ⟨φ_1, φ_2⟩ = ∫_D ∫_D φ_1(ξ, η) φ_2(η, ξ) f(ξ) f(η) dξ dη.

This is somewhat comforting, since it means that if the functions C_i are selected so that the space they span is sufficiently rich to contain elements close to C_Y, the obtained estimator's bias should be small.

Remark 3.3. The theorem shows that the variance of the components of θ_n remains bounded above by the diagonal entries of the matrix E even as the number of observations increases in this fixed domain. In the case of a Gaussian process, we can even be more specific:

(60)  var(θ_n) = 2 A_n^{−1} B_n A_n^{−1},

which converges to 2 A^{−1} B A^{−1}. This matrix will rarely be zero. Since A is invertible, A^{−1} B A^{−1} has the same rank as B and it is semipositive definite, being a variance matrix. Hence the limiting matrix could only be zero if B = 0. This last condition will generally not hold (see the next remark, for the case where the mean is known, where it is easier to see why). So even though (57) is an upper bound, this discussion shows that in general there will not be convergence of the variance to zero and therefore θ_n is not consistent. This should be compared to the results of Matheron (1965), who shows the impossibility of consistent estimation of the empirical variogram based on complete information about the process over a finite domain.

Remark 3.4. Consider the case when the mean of the process is assumed known. This is equivalent to assuming that the matrix X_n = 0 in (2). Then we can define A_n, M_n, B_n, and E_n as in equations (38) through (41) by replacing U_{i,n} by K_{i,n} and U_{Y,n} by K_{Y,n}. Their limits A, M, B, E are as in equations (47) through (51), replacing φ_i(x_1, x_2) by C_i(x_1, x_2) and φ_Y(x_1, x_2) by C_Y(x_1, x_2). With these definitions, the conclusions of Lemma 3.1 and Theorem 3.3 remain valid. In this case, it is easier to see that the matrix B is unlikely to be 0 (see the previous remark). For instance, if the covariance functions C_i and C_Y are everywhere nonnegative, then the function h_i(ξ, η) will be positive (unless the support of f is where the covariance functions are zero).

In this case, the limiting covariance function (58) can now be viewed as the orthogonal projection of the function C_Y(x, y) onto the space spanned by the functions C_i(x, y) (instead of the projection of φ_Y onto the space spanned by the φ_i), with the same inner product (59).

3.2.2 The nugget effect

In the practice of geostatistics it is common to consider processes of the form

(61) Yε(x) = Y (x) + ε(x)


where Y(x) is as in previous sections and where ε(x) is a zero-mean random variable with a finite variance γ, and where for x_1 ≠ x_2 the random variables ε(x_1) and ε(x_2) are uncorrelated. The processes Y(x) and ε(x) are assumed uncorrelated as well. The variance of the term ε(x) is traditionally called the nugget effect in geostatistics. This section reviews the effect of the presence of a nugget effect in the model in the setting of the previous section.

To estimate the covariance of the process with a nugget effect, a discontinuous covariance component is added to the model (1). This discontinuous component will be called the nugget effect covariance component and it will be denoted by W in order to differentiate it from the continuous components C_i. The component W is defined as

(62)  W(ξ, η) = 1 if ξ = η, and W(ξ, η) = 0 otherwise.

It will be assumed that the model (61) holds and hence the true covariance function of Y_ε is

(63)  C_{Y,ε} = C_Y + γW,

where W is discontinuous as defined above, while C_Y, the covariance function of the process Y, is a continuous function as in the previous sections. The model to be fitted will be of the form

(64)  C_{γ,θ} = γW + ∑_{i=1}^q θ(i) C_i = γW + C_θ.

Because of the discontinuity of the component W, the theory of the previous section does not apply directly. In fact, as we will soon see, the rates of convergence for γ and for θ will differ.

Following the argument leading to the definition of θ_n in (22), the projection estimator is easily obtained:

(65)  (γ_n, θ_n)′ = ∆_n^{−1} Ψ_n,

where (blocks of partitioned matrices are written row by row, separated by semicolons)

(66)  ∆_n = [ n − p   [tr(U_{i,n})] ; [tr(U_{i,n})]   [tr(U_{i,n} U_{j,n})] ]
(67)  Ψ_n = [ e′_n e_n ; [e′_n U_{i,n} e_n] ].

Therefore, given an observed vector Y_n one has

(68)  C_{γ_n,θ_n} = γ_n U_{W,n} + ∑_{i=1}^q θ_n(i) U_{i,n},

where U_{W,n} = P_n I_n P_n = P_n, since P_n is a projection matrix.

To obtain the asymptotic behavior of the projection estimator (65), we define the following matrices:

(69)  Γ_n = diag(n^{−1}, n^{−2} I_q)

(70)  Υ_n = E[Ψ_n] = [ tr(U_{W,n} U_{(Y,ε),n}) ; [tr(U_{i,n} U_{(Y,ε),n})] ]

(71)  Ξ_n = [ tr(U_{W,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n})   [tr(U_{i,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n})] ; [tr(U_{i,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n})]   [tr(U_{i,n} U_{(Y,ε),n} U_{j,n} U_{(Y,ε),n})] ],

where U_{(Y,ε),n} = P_n (K_{Y,n} + γ I_n) P_n = U_{Y,n} + γ P_n. We now define

(72)  A_{ε,n} = Γ_n ∆_n = [ n^{−1}(n − p)   n^{−1}[tr(U_{i,n})] ; n^{−2}[tr(U_{i,n})]   A_{X,n} ]

(73)  M_{ε,n} = Γ_n Υ_n = [ n^{−1} tr(U_{Y,n}) + γ n^{−1}(n − p) ; M_{X,n} + γ n^{−2}[tr(U_{i,n})] ]

(74)  B_{ε,n} = Γ_n Ξ_n Γ_n

(75)  E_{ε,n} = A_{ε,n}^{−1} diag(B_{ε,n}) A_{ε,n}^{−1},


where p is the number of columns in the regression matrix X, [tr(U_{i,n})] is a q × 1 matrix, [tr(U_{i,n} U_{j,n})] is a q × q matrix, and where the matrices U_{i,n} are defined as in (22), while A_{X,n} and M_{X,n} were defined in the previous section. It is important to note that the premultiplication by the matrix Γ_n implies that the terms involving γ alone converge at the rate n^{−1}, whereas those that involve θ alone converge at the rate n^{−2}, as in the previous section. Note also that the matrices are no longer symmetric, one cross-product carrying an n^{−1} term whereas the other has an n^{−2} term. This will have an effect on the limiting matrices, as we will now see. To extend the results of the previous section to the case with the nugget effect, we need the following definitions:

(76)  a(i) = ∫_D φ_i(ξ, ξ) f(ξ) dξ, with φ_i as in (46)

(77)  m_0 = γ + ∫_D φ_Y(ξ, ξ) f(ξ) dξ

(78)  b_0 = ∫_D ∫_D φ_Y(ξ, η) φ_Y(η, ξ) f(ξ) f(η) dξ dη

(79)  b(i) = ∫_D ∫_D h_{X,i}(ξ, η) φ_Y(η, ξ) f(ξ) f(η) dξ dη, with h_{X,i} as in (50)

(80)  A_ε = [ 1   a′ ; 0   A_X ], where A_X is the q × q matrix given by (47)

(81)  M_ε = [ m_0 ; M_X ], where M_X is the q × 1 vector given by (48)

(82)  B_ε = [ b_0   b′ ; b   B_X ], where B_X is the q × q matrix given by (49)

(83)  E_ε = A_ε^{−1} diag(B_ε) (A_ε^{−1})′.

The following lemma further generalises Lemma 3.1:

Lemma 3.2 For observations coming from the process in (61) sampled on an in-fill sampling domain (D, {S_n}_{n=1}^∞), with the u_k, k = 1, ..., p, continuous and such that R = I, and assuming A_X is invertible, there exist constants such that

(84)  ‖A_{ε,n} − A_ε‖_∞ ≤ c_{A,ε} δ_n
(85)  ‖M_{ε,n} − M_ε‖_∞ ≤ c_{M,ε} δ_n
(86)  ‖B_{ε,n} − B_ε‖_∞ ≤ c_{B,ε} δ_n
(87)  ‖E_{ε,n} − E_ε‖_∞ ≤ c_{E,ε} δ_n.

The following result holds:

Theorem 3.4 Under the model (61) with Y as in (2), where X contains continuous regressor functions and such that the matrix R of (42) equals the identity, if the observations come from an in-fill sampling domain (D, {S_n}_{n=1}^∞), and the matrix A_ε of (80) is invertible, the projection estimator defined by (65) has a mean satisfying

(88)  ‖ E[(γ_n, θ_n)′] − A_ε^{−1} M_ε ‖_∞ = ‖ E[(γ_n, θ_n)′] − [ m_0 − a′ A_X^{−1} M_X ; A_X^{−1} M_X ] ‖_∞ ≤ c_{θ,ε} δ_n,

where θ is given by (56). If, moreover, (25) holds, the variance of the estimator satisfies

(89)  var(γ_n) ≤ (q + 1) c (E_ε(1, 1) + c_{E,ε} δ_n)
(90)  var(θ_n(i)) ≤ (q + 1) c (E_ε(i + 1, i + 1) + c_{E,ε} δ_n),  i = 1, ..., q.

Remark 3.5. If the covariance of the process Y_ε satisfies the linear model equation (64), then the projection estimator for γ and θ is asymptotically unbiased, since A_X^{−1} M_X = θ and it is easy to see that m_0 − a′θ = γ in this case. In fact, for any finite n, the estimator is unbiased. But if the covariance model for the Y part of Y_ε is not correct, then both the estimators of θ and γ will be biased. Also, as in the case without the nugget effect, the limiting variance of the estimator is finite and non-zero in general.
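To make (65)-(67) concrete, the following sketch assembles ∆_n and Ψ_n numerically and solves for (γ_n, θ_n). It assumes numpy, a data vector Y, an explicit regression matrix X, and user-supplied component covariance matrices K_i evaluated at the sampling points; the function and variable names are ours, not the paper's.

```python
import numpy as np

def projection_estimator(Y, X, K_list):
    """Projection estimator (gamma_n, theta_n)' = Delta_n^{-1} Psi_n of (65)-(67).
    Y: (n,) data vector; X: (n, p) regression matrix; K_list: the q component
    covariance matrices K_{i,n} evaluated at the sampling points."""
    n, p = X.shape
    # P_n: projection onto the orthogonal complement of the columns of X
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    e = P @ Y                              # residual vector e_n
    U = [P @ K @ P for K in K_list]        # U_{i,n} = P_n K_{i,n} P_n
    q = len(U)
    Delta = np.empty((q + 1, q + 1))
    Delta[0, 0] = n - p                    # tr(U_{W,n}) = tr(P_n) = n - p
    for i in range(q):
        Delta[0, i + 1] = Delta[i + 1, 0] = np.trace(U[i])
        for j in range(q):
            Delta[i + 1, j + 1] = np.trace(U[i] @ U[j])
    Psi = np.array([e @ e] + [e @ Ui @ e for Ui in U])
    return np.linalg.solve(Delta, Psi)     # (gamma_n, theta_n(1), ..., theta_n(q))
```

Replacing the quadratic forms in Ψ_n by their expectations under Σ = γI_n + ∑_i θ(i)K_i returns (γ, θ) exactly, which is one way to see the finite-sample unbiasedness noted in Remark 3.5.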


Remark 3.6. It is of course of interest to find out what happens if one does not include the term γW in the covariance model (64) even though there is a nugget effect, so that the covariance function is given by (63). Interestingly, there is no effect on the asymptotic mean and variance of θ_n. To see this, one examines the formulae (47), (48) and (49). Clearly, (47) is unaffected, while (48) yields

(91)  lim_{n→∞} (1/n²) tr(U_{i,n}(U_{Y,n} + γ I_n)) = lim_{n→∞} M_{X,n}(i) + lim_{n→∞} (γ/n²) tr(U_{i,n}) = M_X(i),

so that the limit coincides with that of the previous section. A similar argument shows that the limit of B_{X,n} is also unchanged. Hence if the covariance model for Y is correct, then the covariance estimator C_{θ_n}(x, y) is asymptotically unbiased for x ≠ y. Without the nugget component, of course, the estimator is biased for x = y.

It is important to note that, in general, omitting a component in the covariance which is present in the true covariance, say θ_1 C_1, will lead to biased estimates of the other components and change the asymptotic means and variances of these components. It is the very special nature of the nugget component W (a discontinuous delta function), which changes the rates of convergence, that leads to this surprising conclusion.

3.3 Asymptotics on expanding domains

In the previous sections it was shown that under some rather general conditions, if the observations come from an in-fill sampling sequence on a compact domain, the variances of the components θ(i) of the estimator (22) are bounded by a multiple of the diagonal elements of the matrix E (where the precise definition of E may be given by (51) or (83), depending on context). It was also seen that those bounds are generally strictly positive and that even in the simple case of a Gaussian process, var(θ_n(i)) remains bounded away from zero.

In this section attention will be focused on isotropic processes and the effect of the size of the finite domain on the limiting variance will be considered. The limit matrix E will be found to depend on the size of the domain D. The main result derived in the subsequent sections states that under fairly mild regularity conditions the entries of the matrix E converge to zero as the sampled domain D is allowed to grow indefinitely. It will therefore follow that var(θ_n(i)) can be made arbitrarily small by sampling a sufficiently large domain at a sufficient number of locations.

It will be assumed that a sequence {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞ of expanding in-fill domains is given, as in Definition 3.4. The subscript m runs over domains in the expanding sequence {D_m}.

The results obtained so far do not make the assumption of stationarity of η in (2). Throughout the remaining sections it will be assumed that the random process η, as well as the component covariance models C_i, are isotropic and therefore (second-order) stationary. In the case of an isotropic process the covariance functions C_i(x_1, x_2), C_θ(x_1, x_2) and C_Y(x_1, x_2) depend only on ρ = ‖x_1 − x_2‖, and the following notation will be used along with the current notation: C_i(ρ) = C_i(x_1, x_2), C_θ(ρ) = C_θ(x_1, x_2) and C_Y(ρ) = C_Y(x_1, x_2). Thus the model (2) will remain unchanged, but (1) will now have an equivalent isotropic version

(92)  C_θ(ρ) = ∑_{i=1}^q θ(i) C_i(ρ).

3.3.1 Isotropic fields

In the case of an isotropic field, the equations (52) and (53) take a simplified form. The double integrals of (52) and (53) can be replaced by single integrals if the general covariance forms C_Y, C_θ and C_i are replaced by their isotropic counterparts. These new expressions will be essential in subsequent considerations deriving the limit of the matrix E as the size of the domain D grows indefinitely. This section establishes some fairly technical details necessary to obtain the new expressions.

One considers the measure

(93)  F_2(B) = ∫_B f(ξ) f(η) dξ dη,

where B is any Lebesgue-measurable subset of D². The following defines a measure on [0, diam(D)]:

(94)  G(A) = F_2({(ξ, η) : ‖ξ − η‖ ∈ A}),

where A is any Lebesgue-measurable subset of [0, diam(D)]. (The fact that the distance function is continuous guarantees that the set {(ξ, η) : ‖ξ − η‖ ∈ A} is measurable.) It is easily seen that if φ_i(ξ, η) = Φ_i(ρ), i = 1, 2, are measurable and isotropic, then

(95)  ∫_D ∫_D φ_1(ξ, η) φ_2(η, ξ) dF_2(ξ, η) = ∫_0^{diam(D)} Φ_1(ρ) Φ_2(ρ) dG(ρ).


The measure G will generally depend on the size and shape of D. The non-negative function G(ρ), ρ > 0, will be defined by

(96)  G(ρ) = F_2({(ξ, η) : ‖ξ − η‖ ≤ ρ}) = G([0, ρ]).

Examples of the function G for some regular domains for d = 2 can be found in Bartlett (1964) or Diggle (1983).

Next one considers a sequence of expanding in-fill domains (D_m, {S_{m,n}}_{n=1}^∞) as in Definition 3.4, with f_m ≡ 1/µ(D_m), which gives rise to the sequences of measures {F_{2,m}}_{m=1}^∞ and functions {G_m}_{m=1}^∞, where F_{2,m} is defined by (93) with f replaced by f_m, and G_m is given by (96) with F_2 = F_{2,m}. The sequence {G_m}_{m=1}^∞ will play an important role in the next lemma. To describe the edge effect present in a bounded domain, the set

(97)  ∂_ρ(D) = {x ∈ D : ∃ y ∈ R^d \ D : ‖x − y‖ < ρ}

will be called the edge strip of D of depth ρ: the subset of D containing all points which are less than ρ away from some point outside of D.

Loosely speaking, the following lemma decomposes the function G_m into two components: one which depends only on the dimensionality d of the embedding space, and another which depends on the geometry of the domain and which may be associated with the edge effect.

Lemma 3.3 Let (D_m, {S_{m,n}}_{n=1}^∞) be a sequence of expanding in-fill domains. Let µ(D) = 1 and let D satisfy the following condition:

(98)  sup_{ρ>0} µ(∂_ρ(D))/ρ ≤ c < ∞,

where µ is the Lebesgue measure on R^d and c is a constant. Then

(99)  G_m(ρ) = α_G r_m^{−d} ρ^d − R_m(ρ), with α_G = π^{d/2} / Γ(d/2 + 1), and
(100)  0 ≤ R_m(ρ) ≤ c α_G r_m^{−d−1} ρ^{d+1}.

For example, if D is a circle in the plane, the constant c in (98) is its circumference, while for a three-dimensional ball, c is the surface area of the sphere. The last lemma gives bounds for the function G. In addition to these bounds, two other properties are easy consequences of the definition of G and will be of interest. Firstly, G is non-decreasing. Secondly, if F_2 is absolutely continuous, G is continuous. As a result, the function G may be used to define a Stieltjes integral.
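The decomposition in Lemma 3.3 is easy to check numerically. The sketch below is an illustration only (the function name and parameter choices are ours): it estimates G(ρ) for the unit square D = [0, 1]² by Monte Carlo. For this domain the distance distribution is classical (it appears in the d = 2 references cited above): G(ρ) = πρ² − (8/3)ρ³ + ρ⁴/2 for 0 ≤ ρ ≤ 1, whose leading term is α_G ρ² = πρ² with a nonnegative edge correction, exactly as in (99)-(100) with r_m = 1.

```python
import numpy as np

def G_unit_cube(rho, d=2, n_pairs=200_000, seed=1):
    """Monte Carlo estimate of G(rho) = F2({(xi, eta): ||xi - eta|| <= rho})
    for the unit cube [0, 1]^d with the uniform density f = 1."""
    rng = np.random.default_rng(seed)
    xi = rng.uniform(size=(n_pairs, d))
    eta = rng.uniform(size=(n_pairs, d))
    # fraction of independent uniform pairs at distance <= rho
    return float(np.mean(np.linalg.norm(xi - eta, axis=1) <= rho))
```

Evaluating at, say, ρ = 0.1 shows the estimate falling slightly below the leading term πρ², the deficit being the edge-strip correction R_m(ρ) ≥ 0 of (100).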

3.3.2 Regularity conditions

This section summarises the most important assumptions which will be made about the observed process Y, its covariogram, the model, its components and the sequence of expanding in-fill domains which will be considered. These restrictions are mostly technical and are invoked to guarantee the existence of various limits appearing in the subsequent results. While this is not strictly necessary, it will help simplify the arguments to assume throughout the remainder of this paper that D_1 = [0, 1]^d, that is, that D_1 is the unit hypercube and, consequently, the following condition holds.

Condition 3.1 The sequence of expanding in-fill domains {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞ is such that D_m = [0, r_m]^d, m = 1, 2, .... Thus the conclusion of Lemma 3.3 holds.

Moreover, the following conditions hold.

Condition 3.2 The covariance function of the process Y satisfies the following condition:

(101)  ∫_0^∞ |C_Y(ρ)| ρ^{d−1} dρ < ∞,

and consequently, for some constants c_Y and c_{YY}:

(102)  ∫_{D_m} |C_Y(ξ, λ)| f_m(λ) dλ ≤ c_Y r_m^{−d} for all ξ ∈ D_m
(103)  ∫_{D_m} ∫_{D_m} |C_Y(ξ, λ)| f_m(λ) f_m(ξ) dλ dξ ≤ c_{YY} r_m^{−d}.


Condition 3.3 The model covariance functions satisfy the following condition for some α > 0 and τ > 0:

(104)  |C_i(ρ)| ≤ α ρ^{−d/2−τ},

and consequently, for some constants c_i and c_{ii},

(105)  ∫_{D_m} |C_i(ξ, λ)| f_m(λ) dλ ≤ c_i r_m^{−d/2−τ} for all ξ ∈ D_m
(106)  ∫_{D_m} ∫_{D_m} |C_i(ξ, λ)| f_m(λ) f_m(ξ) dλ dξ ≤ c_{ii} r_m^{−d/2−τ}

for i = 1, ..., q.

Working with different domains D_m implies that the regressor functions u_i are defined on different domains. For this reason, and to ensure orthonormality, it is necessary to consider a separate linear transformation of the regressor functions for each domain D_m.

Condition 3.4 It is assumed that on each domain D_m, the regressor functions {u_i}_{i=1}^p can be linearly transformed into regressor functions {u_{m,i}}_{i=1}^p satisfying

(107)  ∫_{D_m} u_{m,i}(ξ) u_{m,j}(ξ) f_m(ξ) dξ = δ_{i,j}.

Condition 3.5 There exists a finite constant c such that for all m, all multi-index vectors u satisfying |u| ≤ d, and for i = 1, ..., p,

(108)  | ∂^{|u|} u_{m,i}(ξ) / ∂x^u | ≤ c,  ξ ∈ D_m.

Condition 3.4 can always be satisfied for linearly independent, summable functions on a compact domain. Condition 3.5 can be shown to be satisfied if, for example, the regressors {u_i}_{i=1}^p are polynomials (see Lemma A.8 in the Appendix).

3.3.3 Expanding-domain convergence

For each m = 1, 2, ..., one defines a_m, m_{0,m}, b_{0,m}, b_m, A_{ε,m}, M_{ε,m}, B_{ε,m} and E_{ε,m} by putting D = D_m in the definitions (76), (77), (78), (79), (80), (81), (82) and (83), respectively. One defines:

(109)  a(i) = C_i(0)
(110)  m_0 = γ + C_Y(0)
(111)  A(i, j) = d α_G ∫_0^∞ C_i(ρ) C_j(ρ) ρ^{d−1} dρ,  i, j = 1, ..., q, with α_G as in Lemma 3.3
(112)  M(i) = d α_G ∫_0^∞ C_Y(ρ) C_i(ρ) ρ^{d−1} dρ,  i = 1, ..., q
(113)  A_ε = [ 1   a′ ; 0   A ]
(114)  M_ε = [ m_0 ; M ]
(115)  θ = A^{−1} M.
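For isotropic components with closed-form integrals, the entries (111) can be checked directly. The sketch below is an illustration only: the exponential components C_i(ρ) = e^{−λ_i ρ} are hypothetical stand-ins (not the paper's models), chosen because in d = 2, where α_G = π, one has A(i, j) = 2π ∫_0^∞ e^{−(λ_i+λ_j)ρ} ρ dρ = 2π/(λ_i + λ_j)².

```python
import numpy as np
from math import gamma, pi

def A_entry(Ci, Cj, d=2, upper=60.0, m=200_001):
    """A(i, j) = d * alpha_G * int_0^inf Ci(rho) Cj(rho) rho^(d-1) drho  (111),
    with alpha_G = pi^(d/2) / Gamma(d/2 + 1), by trapezoidal quadrature on
    [0, upper]; the integrand must be negligible beyond `upper`."""
    alpha_G = pi ** (d / 2) / gamma(d / 2 + 1)
    rho = np.linspace(0.0, upper, m)
    g = Ci(rho) * Cj(rho) * rho ** (d - 1)
    h = rho[1] - rho[0]
    return d * alpha_G * h * (g.sum() - 0.5 * (g[0] + g[-1]))
```

The rapid decay required of the integrand is exactly what Conditions 3.2 and 3.3 guarantee for the true integrals in (111) and (112).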

Theorem 3.5 Let the sequence {r_m} be increasing and unbounded. Let Conditions 3.1, 3.2, 3.3, 3.4 and 3.5 hold. Let a_m, m_{0,m}, θ_m, E_{ε,m}, a, m_0, θ, γ be as defined earlier. Then the following limit holds:

(116)  lim_{r_m→∞} [ m_{0,m} − a′_m θ_m ; θ_m ] = [ m_0 − a′θ ; θ ] = [ γ + C_Y(0) − ∑_{i=1}^q θ(i) C_i(0) ; θ ].

Moreover, there exists a fixed vector of positive numbers E(i), i = 1, ..., q + 1, such that for m = 1, 2, ...

(117)  r_m^d E_{ε,m}(i) ≤ E(i).


The last result provides an important insight into the way in which the variance of the projection estimator decreases as the size of the domain increases. For each domain D_m the bounds (84) to (87) and (88) hold with A_ε replaced by A_{ε,m}, M_ε replaced by M_{ε,m}, etc., and with the constants c_{A,ε}, c_{M,ε}, c_{B,ε}, c_{E,ε} and c_{θ,ε} replaced by constants dependent on the domain D_m: c_{A,ε,m}, c_{M,ε,m}, c_{B,ε,m}, c_{E,ε,m} and c_{θ,ε,m}. For a sufficiently large n, the dominating term in the variance of γ_n or θ_n(i) is of the form (q + 1) c E_{ε,m}(j) ≤ (q + 1) c E(j) r_m^{−d}, with j = 1 for γ_n and j = i + 1 for θ_n(i). This bound vanishes as the domain increases, although n may have to increase correspondingly. The next section explores the relationship between the sample size and the size of the domain required to ensure that the variance vanishes.

3.4 Mixed in-fill and expanding domain asymptotics

The results of previous sections can be combined to construct a consistent estimator, provided that an increasing sequence of domains is sampled with increasing intensity. The increasing domain size r_m reduces the bound (117) on the limiting variance of an estimator based on sampling that domain. On the other hand, the constants c_{θ,ε} and c_{E,ε} in (88) and (89) will in general depend on the domain D_m, and for this reason they will be denoted by c_{θ,ε,m} and c_{E,ε,m}. The way in which they depend on r_m is described by the following lemma.

Lemma 3.4 If the functions C_i, i = 1, ..., q, and C_Y have uniformly bounded partials of orders not exceeding 2d, and Conditions 3.1 to 3.5 hold, the following bounds hold:

(118)  c_{θ,ε,m} = O(r_m^{4d})
(119)  c_{E,ε,m} = O(r_m^{8d}).

Combining the last lemma with Theorem 3.5 leads to the following theorem.

Theorem 3.6 Let {(D_m, {S_{m,n}}_{n=1}^∞)}_{m=1}^∞ be a sequence of expanding in-fill domains and let the assumptions of Theorem 3.5 hold. Let θ_{m,n} be the estimator defined in (65) based on the observations on the sampling set S_{m,n}. Let {n(m)}_{m=1}^∞ be any sequence of integers satisfying

(120)  lim_{m→∞} c_{θ,ε,m} δ_{n(m)} = 0
(121)  lim_{m→∞} c_{E,ε,m} δ_{n(m)} = 0.

It follows that

(122)  (γ_{m,n(m)}, θ_{m,n(m)}) →_p (m_0 − a′θ, θ) as m → ∞.

Moreover, a sufficient requirement for conditions (120) and (121) to hold is

(123)  lim_{m→∞} r_m^{8d} δ_{n(m)} = 0.
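Combining (123) with the low-discrepancy rate of Theorem 3.2 gives a concrete, sufficient sampling schedule. The following is a sketch, under the assumption that each S_{m,n} is a low-discrepancy set on D_m:

```latex
% Assume \delta_n \le c\, n^{-1} (\log n)^d as in (32), and take
% n(m) \ge r_m^{8d+\epsilon} for some fixed \epsilon > 0. Then
r_m^{8d}\,\delta_{n(m)}
  \;\le\; c\,\frac{r_m^{8d}\,(\log n(m))^d}{n(m)}
  \;\le\; c\,\frac{(\log n(m))^d}{n(m)^{\epsilon/(8d+\epsilon)}}
  \;\longrightarrow\; 0,
% since n(m) \to \infty with m; thus (123) holds and the convergence
% in probability of (122) follows.
```

In words, the sampling intensity must grow polynomially with the domain radius, with an exponent slightly above 8d, for the consistency argument sketched here to apply.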

4 Surface elevation data

In this section the projection estimator is applied to the data of Davis (1973). The data consist of n = 52 measurements of surface elevation collected on a square; thus the domain is two-dimensional. Firstly, one must choose a class of additive models. One such class was suggested by Shapiro & Botha (1991) and it takes the form

(124)  C^A_θ(ρ) = ∑_{i=1}^q θ(i) J_0(λ_i ρ),

where J_0 is the Bessel function of order zero of the first kind and the λ_i are fixed positive numbers, while the θ(i) are positive numbers to be estimated. It should be emphasised that the estimation procedure applied by Shapiro and Botha is very different from the projection estimator considered here. Nevertheless, the model is additive, so that the projection estimator (22) can be applied to this model. However, the covariance component functions in the model decay very slowly (on the order of ρ^{−1/2}). This is insufficient to satisfy Condition 3.3, hence Theorem 3.6 does not apply and the convergence in probability may not be attainable. The model (124) will be named model A. Another model, named B, will also be considered. It may be parametrised as C^B_θ(ρ) = ∑_{i=1}^q θ(i) C_i(ρ), where

(125)  C_i(ρ) = (2 / (ρ (b(i)² − a(i)²))) (b(i) J_1(b(i)ρ) − a(i) J_1(a(i)ρ)) if ρ > 0, and C_i(ρ) = 1 if ρ = 0,

where J_1 is the Bessel function of order one of the first kind.


model                     θ1        θ2       θ3       θ4
linear trend, model A     1123.54   359.73   106.796  49.7274
linear trend, model B1    1365.13   203.806  74.5592  133.592
linear trend, model B2    —         738.566  52.6978  263.894

Table 1: Estimated coefficients for the three covariance models under the linear trend mean model.

A covariance model with components given by (125) is described in detail in Powojowski (2000, 2002b). It is a valid model as long as a(i) < b(i) and θ(i) ≥ 0 for i = 1, ..., q. In addition, it can easily be shown that C_i(ρ) = O(ρ^{−3/2}), which is enough to satisfy Condition 3.3. The particular model A considered is defined by q = 4, with λ_1 = 1, λ_2 = 2, λ_3 = 3 and λ_4 = 4. Two models of type B are considered. Model B1 is defined by q = 4, (a(1), b(1)) = (0.5, 1.5), (a(2), b(2)) = (1.5, 2.5), (a(3), b(3)) = (2.5, 3.5) and (a(4), b(4)) = (3.5, 4.5), while model B2 has q = 3 with the last three pairs of parameters. So model B2 is like B1 but with the coefficient θ_1 forced to take the value 0. Criteria for choosing the model parameters will not be discussed here; they are considered in greater detail by Powojowski (2000, 2002a).
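The two properties just quoted, C_i(0) = 1 and C_i(ρ) = O(ρ^{−3/2}), can be checked numerically. The sketch below is an illustration only: it assumes the component arises as the normalised spectral-band integral (2/(b² − a²)) ∫_a^b λ J_0(λρ) dλ, which evaluates to (2/(ρ(b² − a²)))(b J_1(bρ) − a J_1(aρ)), and it computes J_1 from the standard integral representation J_ν(x) = (1/π) ∫_0^π cos(νt − x sin t) dt rather than from a library routine. The parameter values are those of model B1's first component; the function names are ours.

```python
import numpy as np

def bessel_j(nu, x, m=2001):
    """J_nu(x) for integer nu via the integral representation
    J_nu(x) = (1/pi) int_0^pi cos(nu*t - x*sin(t)) dt (trapezoidal rule)."""
    t = np.linspace(0.0, np.pi, m)
    g = np.cos(nu * t - x * np.sin(t))
    h = t[1] - t[0]
    return h * (g.sum() - 0.5 * (g[0] + g[-1])) / np.pi

def C_component(rho, a, b):
    """Model-B covariance component of (125), with C(0) = 1 by continuity."""
    if rho == 0.0:
        return 1.0
    return 2.0 * (b * bessel_j(1, b * rho) - a * bessel_j(1, a * rho)) / (rho * (b * b - a * a))
```

Evaluating C_component near ρ = 0 recovers 1 (the small-argument expansion J_1(x) ≈ x/2 makes the limit explicit), while at large ρ the J_1 envelope of order ρ^{−1/2} divided by ρ gives the O(ρ^{−3/2}) decay that Condition 3.3 requires.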

Since the mean of the sampled process Y is not known, it needs to be modelled. In this paper, we consider only one such model, allowing for an arbitrary linear trend over the sampled square, with corresponding regressor matrix X. The resulting estimates are given in Table 1. The top graph in Figure 1 shows the three estimated covariograms. We see that Models A and B1 have very similar covariograms for distances up to 3, whereas Model B2 is quite different. Between distances of 3 and 5 the three models are somewhat different, while beyond distances of 5, Models B1 and B2 are similar and different from Model A. The bottom graph contains the 2704 products of residuals marked as individual points with coordinates (‖x_k − x_l‖, e(k)e(l)), where 1 ≤ k ≤ 52, 1 ≤ l ≤ 52 and e(k) = Y(k) − Ŷ(k) is the residual computed as in (20), along with the loess smooth of these points as computed in S-Plus 6. To get an idea of which of these three models fits better, we compare this smooth curve to the covariograms, although the comparison is indirect. Note that the mean of the product of residuals e(k)e(l) is (P K_Y P)(k, l) rather than K_Y(k, l), which may be estimated by (P K^i_θ̂ P)(k, l) rather than by K^i_θ̂(k, l), where P denotes the (regression) projection in (20). To judge the quality of a given covariance model, we therefore plot the loess smooth estimate of the points (‖x_k − x_l‖, e(k)e(l)) along with the expected value of these points under the combination of regression and covariance models, (‖x_k − x_l‖, (P K^i_θ̂ P)(k, l)), where i is A, B1 or B2. If the covariance model is adequate, the smooth curve based on the products of residuals should pass through the cloud of expected products of residuals. Figure 2 shows this cloud superposed over the loess smooth of the products of residuals, along with 95% confidence intervals of the curve, for each of the three covariance models. The curve, and especially the confidence intervals, are to be taken with caution since the 2704 products of residuals are clearly not independent. We see that the fit of covariance Model B1 is slightly better than that of Model A, while Model B2 does not fit well at all. This shows that forcing the coefficient θ1 of Model B1 to be 0 is not compatible with the data.
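The mechanics of this diagnostic can be sketched as follows. The site coordinates, trend coefficients and covariance matrix below are all made up (only the shapes mirror the Davis example, with n = 52 sites and a linear trend); the point is the computation of the residual products e(k)e(l) and of their model-implied means (PKP)(k, l).

```python
import numpy as np

rng = np.random.default_rng(0)

n = 52
coords = rng.uniform(0, 6, size=(n, 2))                 # hypothetical site locations
X = np.column_stack([np.ones(n), coords])               # linear-trend regressors
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
K = np.exp(-dist)                                       # stand-in covariance, not K_theta

# Residual projection P = I - X (X'X)^{-1} X', as in (20)
P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
Y = rng.multivariate_normal(X @ np.array([10.0, 1.0, -0.5]), K)
e = P @ Y                                               # residuals e(k) = Y(k) - Yhat(k)

d = dist.ravel()                                        # distances ||x_k - x_l||
prod = np.outer(e, e).ravel()                           # the n^2 products e(k)e(l)
expected = (P @ K @ P).ravel()                          # mean of e(k)e(l) under the model
```

The diagnostic of the text plots a loess smooth of (d, prod) against the cloud (d, expected); the model fits if the smooth passes through the cloud.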

Comparing the results of this section with models previously fitted to the same data (e.g., Ripley, 1988, andWackernagel, 1995), one observes an important difference. The models for the mean of the process used by thoseauthors are linear trends or quadratic surfaces whereas the covariance models included the exponential and the Gaus-sian parametric models. These parametric covariance models do not allow negative covariances whereas the additivemodels that we consider do. Visual examination of Figure 2 suggests that models without negative covariances arenot plausible.


Figure 1: Estimated covariance functions for Models A, B1 and B2 (top panel) and loess smooth of the products of residuals (bottom panel); both panels plot Covariance against rho.


Figure 2: Expected product of residuals versus loess smooth of products of residuals (Covariance against rho), for Models A, B1 and B2.


5 Conclusion

The paper proposes a new approach to the problem of estimating the covariance function of a stochastic process. The approach combines two main ideas: additive models and a new estimation method based on orthogonal projections, and it may be viewed as a particular case of MINQUE estimation. We argue that it offers many distinct theoretical and practical advantages over traditional approaches involving the empirical (co)variogram, as well as over general MINQUE estimation.

In comparison with traditional methods, the need to estimate the empirical (co)variogram is eliminated, and so is the arbitrariness of the bin selection. Since the empirical (co)variograms are meaningless for non-stationary processes and hard to compute for non-isotropic processes, the approach presented is more generally applicable than the traditional procedures. The mean and polynomial trends in the mean can be estimated simultaneously without much complication and without compromising the estimator's properties. For instance, the estimator by projection is unbiased even when the mean of the process needs to be estimated from the data (assuming that the model for the mean is correct and the true covariance function is in the class of models considered). The estimation procedure involves only linear algebra, and thus all problems, both theoretical and practical, associated with non-linear, non-quadratic optimisation are avoided. The properties of the estimator can be more easily understood than is the case in the traditional approach. In particular, the properties of the covariance estimator are as easy to obtain as those of the estimators of the parameters, unlike for the traditional covariance estimators.

In contrast with general MINQUE estimation (which is also unbiased and requires only linear algebra to compute), the stability and asymptotic properties of the projection estimator do not depend on the relationship between the initial guess K0 required by MINQUE and the true covariance matrix KY. The linear algebra computations required for the projection estimator involve inverses of much smaller matrices than those required for any other MINQUE estimator.

It is shown that in the in-fill asymptotic context (sampling of a finite domain) it is generally impossible to estimate the covariogram consistently. An upper bound may, however, be obtained for the asymptotic variance of the estimator in this context. Furthermore, it is shown that for isotropic processes this upper bound can be made arbitrarily small if the sampled domain is sufficiently large. Finally, a constructive method is presented for sampling an increasing domain more densely in such a way as to guarantee the convergence of the covariance function projection estimator.

The estimator is illustrated using a data set of Davis (1973). In order to apply the projection estimator inpractice, an adequate class of additive models is required. One such class is the class of Shapiro & Botha (1991), asseen in Section 4, although it suffers from rates of decay too slow for the assumptions in some of the results of thispaper. Other flexible classes of models, satisfying the hypotheses needed for the convergence results, are explored inPowojowski (2000, 2002b) and a model of this type is also briefly considered in modelling the Davis data.

APPENDIX

This section contains the proofs of the results. For convenience and without loss of generality, for all covariance model components throughout this appendix it will be assumed that C_i(0, 0) = 1 or C_i(0) = 1, and that the discrepancy sequences {δ_n}_{n=1}^∞ are nonincreasing. Before proving Lemma 3.1, some required lemmas are established.

Lemma A.1 Let (D, {S_n}_{n=1}^∞) be an in-fill sampling domain and ψ ∈ C_d(D) be the uniform limit of the sequence of functions {ψ_n} in C_d(D) with sup_{x∈D} |ψ_n(x) − ψ(x)| ≤ α_n. It follows that

| ∫_D ψ(x) dx − (1/|S_n|) Σ_{s∈S_n} ψ_n(s) | ≤ α_n + δ_n V*_D(ψ).

Lemma A.2 Let S be a sampling point set on D with |S| = n and D*(S) = δ. Then S × S is a sampling point set on D × D with |S × S| = n^2 and D*(S × S) ≤ 2δ.

Lemma A.3 For two q × q matrices A and B and any norm, there exists a finite k such that

(126) ‖AB‖ ≤ k ‖A‖ ‖B‖,

where k = 1 for the Frobenius norm and the spectral radius, and k = q for the L∞ norm.

Lemma A.4 If matrices A and B are positive-definite and A is symmetric, then tr(AB) ≤ ‖A‖_o tr(B) ≤ tr(A) tr(B).

The proofs are omitted, due to their simplicity. The following simple result will be useful.
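Lemmas A.3 and A.4 are easy to check numerically. In the sketch below, the L∞ norm is taken to be the entrywise maximum (an assumption about the paper's notation), for which k = q in Lemma A.3; the operator norm ‖·‖_o is the spectral norm.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 4
A = rng.standard_normal((q, q))
B = rng.standard_normal((q, q))

fro = np.linalg.norm                       # Frobenius norm: k = 1
linf = lambda M: np.abs(M).max()           # entrywise max (assumed L_inf): k = q

# Lemma A.3: ||AB|| <= k ||A|| ||B||
assert fro(A @ B) <= fro(A) * fro(B)
assert linf(A @ B) <= q * linf(A) * linf(B)

# Lemma A.4: for S symmetric positive-definite and T positive-definite,
# tr(ST) <= ||S||_o tr(T) <= tr(S) tr(T)
S = A @ A.T + q * np.eye(q)                # symmetric positive-definite
T = B @ B.T + q * np.eye(q)
op = np.linalg.norm(S, 2)                  # operator (spectral) norm of S
assert np.trace(S @ T) <= op * np.trace(T) <= np.trace(S) * np.trace(T)
```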


Lemma A.5 Let the sequence of symmetric q × q matrices {A_n} be such that ‖A_n − A‖_∞ → 0, where A is invertible. Then, for n large enough to guarantee ‖A_n − A‖_∞ ≤ ‖A^{−1}‖_∞ / (2q^3), one has

‖A_n^{−1} − A^{−1}‖_∞ ≤ 2q^6 ‖A^{−1}‖_∞^2 ‖A_n − A‖_∞,
‖A_n^{−1}‖_∞ ≤ 2q ‖A^{−1}‖_∞.

Proof of Lemma A.5. The proof follows easily from Theorem 2.3.5 of Atkinson & Han (2001), page 50, and relations (17) and (18).

Lemma A.6 Let the estimator be defined as in (22). If (25) holds, then with c given by (25) and the notation of (38)–(41),

(127) var(θ̂_n(k)) ≤ qc E_n(k, k), k = 1, . . . , q.

Proof of Lemma A.6. Firstly, one observes that if P is a q × q symmetric non-negative definite matrix, then q diag(P) − P is non-negative definite. By (25), var(Y′_n U_{i,n} Y_n) ≤ c tr(U_{i,n} U_{Y,n} U_{i,n} U_{Y,n}), and it follows that

[cov(Y′_n U_{i,n} Y_n, Y′_n U_{j,n} Y_n)] ≤ q diag([cov(Y′_n U_{i,n} Y_n, Y′_n U_{j,n} Y_n)]) ≤ cq diag([tr(U_{i,n} U_{Y,n} U_{j,n} U_{Y,n})]).

Denoting the k-th column of A_X^{−1} by a, one obtains

var(θ̂_n(k)) = (A_X^{−1} (1/n^4) [cov(Y′_n U_{i,n} Y_n, Y′_n U_{j,n} Y_n)] A_X^{−1})(k, k)
= a′ (1/n^4) [cov(Y′_n U_{i,n} Y_n, Y′_n U_{j,n} Y_n)] a
≤ cq a′ (1/n^4) diag([tr(U_{i,n} U_{Y,n} U_{i,n} U_{Y,n})]) a
= cq (A_X^{−1} (1/n^4) diag([tr(U_{i,n} U_{Y,n} U_{i,n} U_{Y,n})]) (A_X^{−1})′)(k, k) = cq E_X(k, k),

which concludes the proof.

Proof of Lemma 3.1. We begin with the matrix A_{X,n}. First consider the covariance matrix of e_n in the estimator (22):

(128) U_{i,n}(l1, l2) = cov[e_n(l1), e_n(l2)]
= C_i(x_{l1}, x_{l2}) − X_n(l2, ·)(X′_n X_n)^{−1} X′_n K_{i,n}(·, l1) − X_n(l1, ·)(X′_n X_n)^{−1} X′_n K_{i,n}(·, l2) + X_n(l1, ·)(X′_n X_n)^{−1} X′_n K_{i,n} X_n (X′_n X_n)^{−1} X_n(l2, ·)′
= C_i(x_{l1}, x_{l2}) − (u_1(x_{l2}), . . . , u_p(x_{l2}))(X′_n X_n)^{−1} X′_n (C_i(x_1, x_{l1}), . . . , C_i(x_n, x_{l1}))′
− (u_1(x_{l1}), . . . , u_p(x_{l1}))(X′_n X_n)^{−1} X′_n (C_i(x_1, x_{l2}), . . . , C_i(x_n, x_{l2}))′
+ (u_1(x_{l1}), . . . , u_p(x_{l1}))(X′_n X_n)^{−1} X′_n K_{i,n} X_n (X′_n X_n)^{−1} (u_1(x_{l2}), . . . , u_p(x_{l2}))′
≡ φ_{i,n}(x_{l1}, x_{l2}),

where the sign ≡ is used to define the quantity to its right. By (44) and (45) one has ‖n(X′_n X_n)^{−1} − I‖_∞ ≤ V_X δ_n with

(129) V_X = p^2 max_{k1,k2=1,...,q} V*_D(u_{k1} u_{k2}) and
(130) ‖n(X′_n X_n)^{−1}‖_∞ ≤ J_X < ∞.

Similarly, using Theorem 3.1,

| (1/n)(X_n(·, k))′ (C_i(x_1, x_{l1}), . . . , C_i(x_n, x_{l1}))′ − ∫_D u_k(ξ) C_i(ξ, x_{l1}) f(ξ) dξ | ≤ δ_n V*_D(u_k(·) C_i(·, x_{l1})) ≤ R_{u_k,C_i} δ_n


where

(131) R_{u_k,C_i} = sup_{ξ∈D} V*_D(u_k(·) C_i(·, ξ)) < ∞.

Since |C_i(ξ, η)| ≤ 1,

(132) | (1/n)(X_n(·, k))′ (C_i(x_1, x_{l1}), . . . , C_i(x_n, x_{l1}))′ | ≤ sup_{ξ∈D} |u_k(ξ)| ≡ R_{u_k},

and using ‖R^{−1}‖ = p and Theorem 3.1, one obtains

| X_n(l2, ·)(X′_n X_n)^{−1} X′_n (C_i(x_1, x_{l1}), . . . , C_i(x_n, x_{l1}))′ − Σ_{k1=1}^p Σ_{k2=1}^p u_{k1}(x_{l2}) ∫_D u_{k2}(ξ) C_i(ξ, x_{l1}) f(ξ) dξ |
≤ Σ_{k1=1}^p |u_{k1}(x_{l2})| ( V_X Σ_{k2=1}^p R_{u_{k2}} + Σ_{k2=1}^p R_{u_{k2},C_i} ) δ_n.

Furthermore, putting W(k1, k2) = ∫_D ∫_D u_{k1}(ξ) u_{k2}(η) C_i(ξ, η) f(ξ) f(η) dξ dη, and using Lemma A.2,

(133) | ((1/n^2) X′_n K_{i,n} X_n)(k1, k2) − W(k1, k2) | ≤ 2δ_n V*_{D×D}(ψ_{k1,k2,i}) ≡ V_{u_{k1},u_{k2},C_i} δ_n,
(134) where ψ_{k1,k2,i}(ξ, η) = u_{k1}(ξ) u_{k2}(η) C_i(ξ, η),

and

(135) | (X_n(l1, ·) n(X′_n X_n)^{−1})(k) − u_k(x_{l1}) | ≤ V_X δ_n Σ_{j=1}^p R_{u_j} = V_X R_u δ_n,

where R_u = Σ_{j=1}^p R_{u_j}.

It is easy to see that |W(k1, k2)| ≤ R_{u_{k1}} R_{u_{k2}}, and it follows that

| X_n(l1, ·)(X′_n X_n)^{−1} X′_n K_{i,n} X_n (X′_n X_n)^{−1} X_n(l2, ·)′ − Σ_{k1=1}^p Σ_{k2=1}^p u_{k1}(x_{l1}) u_{k2}(x_{l2}) W(k1, k2) |
≤ Σ_{k1=1}^p Σ_{k2=1}^p | (X_n(l1, ·) n(X′_n X_n)^{−1})(k1) ( ((1/n^2) X′_n K_{i,n} X_n)(k1, k2) − W(k1, k2) ) (X_n(l2, ·) n(X′_n X_n)^{−1})(k2)
+ (X_n(l1, ·) n(X′_n X_n)^{−1})(k1) W(k1, k2) ( (X_n(l2, ·) n(X′_n X_n)^{−1})(k2) − u_{k2}(x_{l2}) )
+ ( (X_n(l1, ·) n(X′_n X_n)^{−1})(k1) − u_{k1}(x_{l1}) ) W(k1, k2) u_{k2}(x_{l2}) |
≤ Σ_{k1=1}^p Σ_{k2=1}^p ( V_{u_{k1},u_{k2},C_i} R_u^2 J_X^2 + J_X V_X R_{u_{k1}} R_{u_{k2}} R_u^2 + V_X R_u R_{u_{k1}} R_{u_{k2}}^2 ) δ_n.

Hence it follows that

(136) |φ_{i,n}(x_{l1}, x_{l2}) − φ_i(x_{l1}, x_{l2})| ≤ Q_{φ_i} δ_n,

where Q_{φ_i} < ∞ is independent of (x_{l1}, x_{l2}). A similar result holds for φ_Y. Let

(137) Q_φ = max{Q_{φ_1}, . . . , Q_{φ_q}, Q_{φ_Y}},
(138) R_φ = sup_{D×D} max{|φ_1|, . . . , |φ_q|, |φ_Y|} < ∞.


Using again Theorem 3.1 and the triangle inequality in the form |a_n b_n − ab| ≤ |a||b_n − b| + |b||a_n − a| + |a_n − a||b_n − b|, one obtains

|A_{X,n}(i, j) − A_X(i, j)|
= | Σ_{k=1}^n Σ_{l=1}^n (1/n^2) φ_{i,n}(x_k, x_l) φ_{j,n}(x_l, x_k) − ∫_D ∫_D φ_i(ξ_1, ξ_2) φ_j(ξ_2, ξ_1) f(ξ_1) f(ξ_2) dξ_1 dξ_2 |
≤ D*(S_n × S_n) V*_{D×D}(φ_i φ_j) + 2Q_φ R_φ δ_n + Q_φ^2 δ_n^2 ≤ 2V*_{D×D}(φ_i φ_j) δ_n + 2Q_φ R_φ δ_n + Q_φ^2 δ_n^2.

The same argument applies to (53), and it follows that the following constants are sufficient:

(139) c_{A,X} = max_{i,j=1,...,q} {2V*_{D×D}(φ_i φ_j)} + 2Q_φ R_φ + Q_φ^2 δ_1,
(140) c_{M,X} = max_{i=1,...,q} {2V*_{D×D}(φ_i φ_Y)} + 2Q_φ R_φ + Q_φ^2 δ_1.

Let us now consider B_{X,n}. Putting h_{X,i}(ξ, η) = ∫_D φ_i(ξ, λ) φ_Y(λ, η) f(λ) dλ, one obtains

(141) | h_{X,i}(ξ, η) − Σ_{k=1}^n (1/n) φ_{i,n}(ξ, x_k) φ_{Y,n}(x_k, η) | ≤ (V*_D(φ_i(ξ, ·) φ_Y(·, η)) + 2Q_φ R_φ) δ_n + Q_φ^2 R_φ δ_n^2.

By the continuity of the functions involved and the compactness of D, one has

(142) R_{φ_i,φ_Y} = sup_{(ξ,η)∈D×D} V*_D(φ_i(ξ, ·) φ_Y(·, η)) < ∞,
(143) R_{h_{X,i}} = sup_{(ξ,η)∈D×D} |h_{X,i}(ξ, η)| < ∞,
(144) | Σ_{k=1}^n (1/n) φ_i(ξ, x_k) φ_Y(x_k, η) | ≤ R_φ^2.

Furthermore,

| h_{X,i}(x_{k1}, x_{k3}) h_{X,j}(x_{k1}, x_{k3}) − ( Σ_{k2=1}^n (1/n) φ_{i,n}(x_{k1}, x_{k2}) φ_{Y,n}(x_{k2}, x_{k3}) ) ( Σ_{k2=1}^n (1/n) φ_{j,n}(x_{k1}, x_{k2}) φ_{Y,n}(x_{k2}, x_{k3}) ) |
≤ ( (R_{φ_i,φ_Y} + 2Q_φ R_φ) R_φ^2 + (R_{φ_j,φ_Y} + 2Q_φ R_φ) R_{h_{X,i}} ) δ_n + (R_{h_{X,i}} + R_φ^2) Q_φ^2 R_φ δ_n^2.

Using Lemma A.1, this leads to

(145) c_{B,X} = max_{i,j=1,...,q} { (R_{φ_i,φ_Y} + 2Q_φ R_φ) R_φ^2 + (R_{φ_j,φ_Y} + 2Q_φ R_φ) R_{h_{X,i}} + (R_{h_{X,i}} + R_φ^2) Q_φ^2 R_φ δ_1 + 2V*_{D×D}(h_{X,i} h_{X,j}) }.

Assuming A_{X,n} is invertible and using (126) and Lemma A.5,

‖E_{X,n} − E_X‖_∞ ≤ ‖A_{X,n}^{−1} diag(B_{X,n}) A_{X,n}^{−1} − A_X^{−1} diag(B_X) A_X^{−1}‖_∞
= ‖A_{X,n}^{−1}(diag(B_{X,n}) − diag(B_X)) A_{X,n}^{−1} + A_{X,n}^{−1} diag(B_X)(A_{X,n}^{−1} − A_X^{−1}) + (A_{X,n}^{−1} − A_X^{−1}) diag(B_X) A_X^{−1}‖_∞
≤ ( q^2 J_{A_X^{−1}}^2 c_{B,X} + 2q^8 J_{A_X^{−1}} ‖diag(B_X)‖_∞ ‖A_X^{−1}‖_∞^2 c_{A,X} + 2q^8 c_{A,X} ‖A_X^{−1}‖_∞^3 ‖diag(B_X)‖_∞ ) δ_n,

where J_{A_X^{−1}} = sup_{n=1,2,...} ‖A_{X,n}^{−1}‖_∞ < ∞. It follows that

(146) c_{E,X} = q^2 J_{A_X^{−1}}^2 c_{B,X} + 2q^8 J_{A_X^{−1}} ‖diag(B_X)‖_∞ ‖A_X^{−1}‖_∞^2 c_{A,X} + 2q^8 ‖A_X^{−1}‖_∞^3 ‖diag(B_X)‖_∞ c_{A,X}.


Proof of Theorem 3.3. The proof of (56) is an immediate application of Lemma A.5 together with (126), (52) and (53). Putting J_{M,X} = sup_{n=1,2,...} ‖M_{X,n}‖ and θ_X = A_X^{−1} M_X, one obtains

(147) ‖E[θ̂_{X,n}] − θ_X‖_∞ = ‖[tr(U_{i,n} U_{j,n})]^{−1} E[e′_n U_{i,n} e_n] − θ_X‖_∞ = ‖A_{X,n}^{−1} M_{X,n} − A_X^{−1} M_X‖_∞ ≤ (2q^7 ‖A_X^{−1}‖_∞^2 c_{A,X} J_{M,X} + q ‖A_X^{−1}‖_∞ c_{M,X}) δ_n = c_{θ,X} δ_n.

Finally, (57) follows from Lemma A.6 and (55).

Proof of Lemma 3.2. Clearly one has K_{W,n} = I_n and hence U_{W,n} = P_n K_{W,n} P_n = P_n. Recall that U_{i,n} = P_n K_{i,n} P_n. For 1 ≤ j ≤ q, using Theorem 3.1,

(148) (1/n) tr(U_{W,n} U_{j,n}) = (1/n) tr(U_{j,n}) = Σ_{l=1}^n (1/n) φ_j(x_l, x_l),
(149) | (1/n) tr(U_{W,n} U_{j,n}) − a(j) | ≤ V*_D(ψ_j) δ_n, where ψ_j(ξ) = φ_j(ξ, ξ).

Moreover,

(150) tr(U_{W,n} U_{W,n}) = tr(P_n) = n − p,
(151) U_{(Y,ε),n} = P_n K_{(Y,ε),n} P_n = P_n (K_{Y,n} + γ I_n) P_n = U_{Y,n} + γ P_n,

and so, again using Theorem 3.1,

(152) (1/n) tr(U_{W,n} U_{(Y,ε),n}) = (1/n) tr(U_{Y,n}) + (1/n) γ tr(P_n) = Σ_{l=1}^n (1/n) φ_Y(x_l, x_l) + ((n − p)/n) γ,
(153) | (1/n) tr(U_{W,n} U_{(Y,ε),n}) − m_0 | ≤ V*_D(ψ_Y) δ_n + γ n^{−1},

where ψ_Y(ξ) = φ_Y(ξ, ξ). Furthermore, using Lemma 3.1 and Lemma A.4,

(154) | (1/n^2) tr(U_{i,n} U_{(Y,ε),n}) − M_X(i) | ≤ | (1/n^2) tr(U_{i,n} U_{Y,n}) − M_X(i) | + γ (1/n^2) tr(U_{i,n}) ≤ c_{M,X} δ_n + γ n^{−1},

as well as

(1/n^2) tr(U_{W,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n}) = (1/n^2) tr(U_{Y,n} U_{Y,n}) + (2γ/n^2) tr(U_{Y,n}) + (γ^2/n^2) tr(P_n)
≤ Σ_{k=1}^n Σ_{l=1}^n (1/n^2) φ_Y(x_k, x_l) φ_Y(x_l, x_k) + (2γ c_ε + γ^2) n^{−1},

where

(155) c_ε = sup_D var(Y)

is a constant. To see the last result, one recalls Lemma A.4 to obtain

n^{−1}(2γ tr(P_n K_{Y,n} P_n) + γ^2 tr(P_n)) ≤ n^{−1} 2γ ‖P_n‖_o tr(K_{Y,n}) + γ^2 ≤ n^{−1} 2γ tr(K_{Y,n}) + γ^2 ≤ 2γ sup_D var(Y) + γ^2.

Hence, by Theorem 3.1 and Lemma A.2,

(156) | (1/n^2) tr(U_{W,n} U_{(Y,ε),n} U_{W,n} U_{(Y,ε),n}) − b_0 | ≤ 2V*_{D×D}(φ_Y^2) δ_n + (2γ c_ε + γ^2) n^{−1}.


Since lim_{n→∞} n^{−1}/δ_n = 0, the rate of convergence above is O(δ_n). Similarly, invoking Lemma A.4 again,

(157) (1/n^3) tr(U_{W,n} U_{(Y,ε),n} U_{j,n} U_{(Y,ε),n}) ≤ (1/n^3) tr(U_{Y,n} U_{j,n} U_{Y,n}) + 2γ c_ε n^{−1} + γ^2 n^{−2},
(158) (1/n^4) tr(U_{i,n} U_{(Y,ε),n} U_{j,n} U_{(Y,ε),n}) ≤ (1/n^4) tr(U_{i,n} U_{Y,n} U_{j,n} U_{Y,n}) + 2γ c_ε n^{−1} + γ^2 n^{−2}.

By arguments similar to those in the proof of Theorem 3.3 it follows that

(159) | (1/n^3) tr(U_{Y,n} U_{j,n} U_{Y,n}) − b(j) | ≤ ( R_φ (R_{φ_i,φ_Y} + 2Q_φ R_φ) + 2V*_{D×D}(φ_Y h_{X,j}) + R_{h_{X,j}} Q_φ ) δ_n = c_{b_j} δ_n.

Hence, and by (54),

(160) | (1/n^3) tr(U_{W,n} U_{(Y,ε),n} U_{j,n} U_{(Y,ε),n}) − b(j) | ≤ c_{b_j} δ_n + 2γ c_ε n^{−1} + γ^2 n^{−2},
(161) | (1/n^4) tr(U_{i,n} U_{(Y,ε),n} U_{j,n} U_{(Y,ε),n}) − B_X(i, j) | ≤ c_{B,X} δ_n + 2γ c_ε n^{−1} + γ^2 n^{−2}.

From the remarks above it follows that there exist bounds c_{A,ε}, c_{M,ε}, c_{B,ε} and c_{E,ε} such that

(162) ‖A_{ε,n} − A_ε‖_∞ ≤ max{c_{A,X}, V*_D(ψ_1), . . . , V*_D(ψ_q)} δ_n + p n^{−1} ≤ c_{A,ε} δ_n,
(163) ‖M_{ε,n} − M_ε‖_∞ ≤ max{c_{M,X}, V*_D(ψ_Y)} δ_n + γ n^{−1} ≤ c_{M,ε} δ_n,
(164) ‖B_{ε,n} − B_ε‖_∞ ≤ max{c_{B,X}, c_{b_1}, . . . , c_{b_q}} δ_n + 2γ c_ε n^{−1} + γ^2 n^{−2} ≤ c_{B,ε} δ_n,
(165) ‖E_{ε,n} − E_ε‖_∞ ≤ ( q^2 J_{A_ε^{−1}}^2 c_{B,ε} + 2q^8 J_{A_ε^{−1}} ‖diag(B_ε)‖_∞ ‖A_ε^{−1}‖_∞^2 c_{A,ε} + 2q^8 ‖A_ε^{−1}‖_∞^3 ‖diag(B_ε)‖_∞ c_{A,ε} ) δ_n ≤ c_{E,ε} δ_n,

where

(166) J_{A_ε^{−1}} = sup_{n=1,2,...} ‖A_{ε,n}^{−1}‖_∞ < ∞.

Proof of Theorem 3.4.

(167) ‖E[(γ̂_n, θ̂_n)′] − A_ε^{−1} M_ε‖_∞ = ‖A_{ε,n}^{−1} M_{ε,n} − A_ε^{−1} M_ε‖_∞ ≤ ( ‖A_ε^{−1}‖_∞^2 c_{A,ε} J_{M_ε} + ‖A_ε^{−1}‖_∞ c_{M,ε} ) δ_n,
(168) where J_{M_ε} = sup_{n=1,2,...} ‖M_{ε,n}‖_∞ < ∞.

Proof of Lemma 3.3. For each m, let D_m = T_m(D) and let B_ρ(D) = D \ ∂_ρ(D). It is easily seen that T_m(B_{r_m^{−1}ρ}(D)) = B_ρ(T_m(D)) and hence μ(D_m \ B_ρ(D_m)) ≤ cρ r_m^{d−1}. Let F_{1,m}(A) = ∫_A f_m(ξ) dξ for any measurable subset A of D_m. Let α_G = π^{d/2}/Γ(d/2 + 1); therefore μ({x ∈ R^d : ‖x‖ ≤ ρ}) = α_G ρ^d. By the hypotheses of Lemma 3.3, and recalling that μ(D) = 1 and F_{1,m}(A) = μ(A)/μ(D_m), one observes that

F_{1,m}(D_m) F_{1,m}({x ∈ R^d : ‖x‖ ≤ ρ}) ≥ G_m(ρ) = F_{2,m}({(ξ, η) ∈ D_m^2 : ‖ξ − η‖ ≤ ρ})
≥ F_{1,m}(B_ρ(D_m)) F_{1,m}({x ∈ R^d : ‖x‖ ≤ ρ})
= μ(D_m)^{−2} μ(B_ρ(D_m)) μ({x ∈ R^d : ‖x‖ ≤ ρ})
≥ μ(D_m)^{−2} (μ(D_m) − cρ r_m^{d−1}) α_G ρ^d = r_m^{−2d}(r_m^d − cρ r_m^{d−1}) α_G ρ^d
= α_G r_m^{−d} ρ^d − α_G c r_m^{−d−1} ρ^{d+1},

while F_{1,m}(D_m) F_{1,m}({x ∈ R^d : ‖x‖ ≤ ρ}) = α_G r_m^{−d} ρ^d. Hence

α_G r_m^{−d} ρ^d ≥ G_m(ρ) ≥ α_G r_m^{−d} ρ^d − α_G c r_m^{−d−1} ρ^{d+1}

and one obtains the required result.
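The sandwich for G_m(ρ) can be checked by Monte Carlo on the cube [0, r]^d (a stand-in for the scaled domain): for small ρ, the probability that two independent uniform points lie within distance ρ is close to the upper bound α_G r^{−d} ρ^d, with a boundary deficit of order r^{−d−1} ρ^{d+1}.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def ball_volume(d):
    """alpha_G = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def G_hat(r, d, rho, n=200_000):
    """Monte Carlo estimate of G_m(rho) = P(||xi - eta|| <= rho) for xi, eta
    independent and uniform on the cube [0, r]^d (a stand-in for D_m)."""
    xi = rng.uniform(0.0, r, size=(n, d))
    eta = rng.uniform(0.0, r, size=(n, d))
    return float(np.mean(np.linalg.norm(xi - eta, axis=1) <= rho))

d, r, rho = 2, 10.0, 0.5
upper = ball_volume(d) * r ** (-d) * rho ** d   # the alpha_G r^-d rho^d upper bound
```

The estimate should fall just below `upper`, the deficit shrinking as the domain grows relative to ρ.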


The following lemmas will be necessary for the proof of Theorem 3.5.

Lemma A.7 For any real nonnegative function φ ∈ L1(R+),

lim_{r_m→∞} (1/r_m) ∫_0^{r_m} φ(ρ) ρ dρ = 0.
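Lemma A.7 says that the ρ-weighted average of an integrable function vanishes as the averaging window grows. A numerical illustration with φ(ρ) = e^{−ρ}, for which (1/r)∫_0^r e^{−ρ} ρ dρ = (1 − e^{−r}(1 + r))/r:

```python
import numpy as np

def weighted_mean(phi, r, n=200_000):
    """Midpoint-rule approximation of (1/r) * integral_0^r phi(rho) * rho d(rho)."""
    dr = r / n
    rho = (np.arange(n) + 0.5) * dr
    return float(np.sum(phi(rho) * rho) * dr / r)

# As r grows, the weighted mean of the integrable function shrinks towards 0.
values = [weighted_mean(lambda rho: np.exp(-rho), r) for r in (10.0, 100.0, 1000.0)]
```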

Lemma A.8 For the orthonormal polynomial regressors u_{k,m}, k = 1, . . . , p, m = 1, 2, . . ., on D_m = [−r_m/2, r_m/2]^d, there exists a constant c such that for m = 1, 2, . . .:

(169) sup_{x∈D_m} | (∂^{|u|}/∂u) u_{k,m}(x) | ≤ c.

Proof of Lemma A.8. Initially u = (0, . . . , 0) is assumed. Let m_i, i = 1, . . . , l, be all the monomials present in the polynomials u_k, k = 1, . . . , p. Thus

(170) u_{k,m} = Σ_{i=1}^l c_{k,i,m} m_i, with
(171) ∫_{D_m} u_{k,m}^2(x) dx = r_m^d.

Hence m_i(x) = Π_{j=1}^d x(j)^{γ_{i,j}} for i = 1, . . . , l and some non-negative integers γ_{i,j}, and so

κ_{i1,i2,m} = ∫_{D_m} m_{i1}(x) m_{i2}(x) dx = (r_m/2)^{d + Σ_{j=1}^d (γ_{i1,j} + γ_{i2,j})} Π_{j=1}^d (1 − (−1)^{1+γ_{i1,j}+γ_{i2,j}}) / ((γ_{i1,j} + γ_{i2,j}) + 1).

Hence it follows that for all i1, i2 = 1, . . . , l the matrix K defined as

(172) K(i1, i2) = κ_{i1,i2,m} / ( (κ_{i1,i1,m})^{1/2} (κ_{i2,i2,m})^{1/2} )

is positive definite and does not depend on m. Let m_{i,m}(x) = κ_{i,i,m}^{−1/2} r_m^{d/2} m_i(x). It follows that

(173) ∫_{D_m} m_{i,m}(x) m_{j,m}(x) dx = r_m^d K(i, j),
(174) sup_{x∈D_m} |m_{i,m}(x)| = κ_{i,i,m}^{−1/2} r_m^{d/2} sup_{x∈D_m} |m_i(x)| ≤ r_m^{−d/2−Σ_{j=1}^d γ_{i,j}} r_m^{d/2} (r_m/2)^{Σ_{j=1}^d γ_{i,j}} ≤ 1.

Let the orthogonal matrix P and the diagonal matrix Λ be such that P′KP = Λ. Defining v_{i,m} = Σ_{j=1}^l P(j, i) m_{j,m}, one has, for some finite constant c_v,

(175) ∫_{D_m} v_{i,m}(x) v_{j,m}(x) dx = r_m^d Λ(i, j),
(176) sup_{x∈D_m} |v_{i,m}(x)| ≤ c_v.

Thus (170) can be rewritten as

(177) u_{k,m} = Σ_{i=1}^l C_m(k, i) v_{i,m},

where C_m is a p × l matrix with l ≥ p. By (171), it follows that for all m

(178) I = C_m Λ C′_m,

whence the entries of the matrices C_m are uniformly bounded, for all m, by some finite constant c_λ. It follows that

(179) sup_{x∈D_m} |u_{k,m}(x)| ≤ Σ_{i=1}^l |C_m(k, i)| sup_{x∈D_m} |v_{i,m}(x)| ≤ l c_v c_λ ≤ c


for some finite c independent of m, which concludes the proof in the case u = (0, . . . , 0). Extending the result to the case of a generic u is straightforward, since the coefficients |P(j, i)| and |C_m(k, i)| have been shown to be bounded uniformly for all m, while in a manner similar to (174) it is easily shown that, given that the maximal order of the monomials is L, the following bound holds:

sup_{x∈D_m} | (∂^{|u|}/∂u) m_{i,m}(x) | = κ_{i,i,m}^{−1/2} r_m^{d/2} sup_{x∈D_m} | (∂^{|u|}/∂u) m_i(x) | ≤ r_m^{−d/2−Σ_{j=1}^d γ_{i,j}} r_m^{d/2} L^{|u|} (r_m/2)^{Σ_{j=1}^d max{γ_{i,j}−u(j), 0}} ≤ 1,

whence the result follows.
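The content of Lemma A.8 in the case d = 1, u = 0 can be seen numerically: orthonormalising the monomials on [−r/2, r/2] under the normalisation (1/r)∫ u_k^2 dx = 1 yields (up to discretisation) scaled Legendre polynomials, whose suprema (1, √3, √5, . . .) do not depend on r. The Gram–Schmidt sketch below is an illustration, not the paper's construction.

```python
import numpy as np

def orthonormal_polys(r, degree, n=20_001):
    """Gram-Schmidt on the monomials 1, x, ..., x^degree over [-r/2, r/2],
    normalised so that (1/r) * integral u_k(x)^2 dx = 1 (the d = 1 analogue of (171))."""
    x = np.linspace(-r / 2.0, r / 2.0, n)
    w = r / (n - 1)                                  # quadrature weight ~ dx
    inner = lambda v, u: np.sum(v * u) * w / r       # (1/r) integral v u dx
    basis = []
    for k in range(degree + 1):
        v = x ** k
        for u in basis:
            v = v - inner(v, u) * u                  # remove components along earlier polys
        basis.append(v / np.sqrt(inner(v, v)))       # normalise to unit mean square
    return x, basis
```

Comparing the suprema of the degree-1 polynomial for r = 1 and r = 100 shows the bound is uniform in the domain size.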

Lemma A.9 For some constants α_i and α_Y, and the constant τ of Condition 3.3, the following bounds hold uniformly on D_m × D_m for m = 1, 2, . . .:

(180) |φ_{i,m} − C_i| ≤ α_i r_m^{−d/2−τ},
(181) |φ_{Y,m} − C_Y| ≤ α_Y r_m^{−d},

where φ_{i,m} is defined in (46) for the domain D_m and φ_{Y,m} is similarly defined.

Proof of Lemma A.9. Applying Lemma A.8 and Conditions 3.2 and 3.3, one has

|φ_{i,m}(ξ, η) − C_i(ξ, η)| ≤ | Σ_{k=1}^p u_{k,m}(ξ) ∫_D u_{k,m}(λ) C_i(ξ, λ) f_m(λ) dλ | + | Σ_{k=1}^p u_{k,m}(η) ∫_D u_{k,m}(λ) C_i(η, λ) f_m(λ) dλ |
+ | Σ_{k1=1}^p Σ_{k2=1}^p u_{k1,m}(ξ) u_{k2,m}(η) ∫_D ∫_D u_{k1,m}(λ_1) u_{k2,m}(λ_2) C_i(λ_1, λ_2) f_m(λ_1) f_m(λ_2) dλ_1 dλ_2 |
≤ 2p c^2 c_i r_m^{−d/2−τ} + p^2 c^4 c_{ii} r_m^{−d/2−τ} = α_i r_m^{−d/2−τ},

and (180) is proved. The proof of (181) is similar.

Proof of Theorem 3.5. The proof consists of showing that the matrices A_{ε,m} and M_{ε,m}, defined in (80) and (81) with the domain D = D_m, suitably multiplied by a diagonal matrix of constants, converge to the matrices A_ε and M_ε of (113) and (114), and of showing that the diagonal elements of the matrix B_{ε,m}, defined in (82) with D = D_m, are bounded. To do so, we first establish the result when X = 0, and then show that the limit is identical when X ≠ 0.

We begin with the submatrix A of A_ε (113) and the corresponding A_{X,m} of A_{ε,m}. Putting X = 0 in the definition of A_{X,m} and applying (95), one defines

(182) A_m(i, j) = A_{X=0,m}(i, j) = ∫_0^{diam(D_m)} C_i(ρ) C_j(ρ) dG_m(ρ)
= d α_G r_m^{−d} ∫_0^{diam(D_m)} C_i(ρ) C_j(ρ) ρ^{d−1} dρ − ∫_0^{diam(D_m)} C_i(ρ) C_j(ρ) dR_m(ρ),

where, from Lemma 3.3,

| ∫_0^{diam(D_m)} C_i(ρ) C_j(ρ) dR_m(ρ) | ≤ c α_G (d + 1) r_m^{−d−1} ∫_0^{diam(D_m)} |C_i(ρ) C_j(ρ)| ρ^d dρ.

Condition 3.3 implies that ∫_0^∞ |C_i(ρ) C_j(ρ)| ρ^{d−1} dρ < ∞, and so Lemma A.7 implies

(183) r_m^{−1} ∫_0^{r_m diam(D)} |C_i(ρ) C_j(ρ)| ρ^d dρ → 0


as r_m → ∞, and the second term in (182) vanishes. Hence

(184) lim_{r_m→∞} r_m^d A_m(i, j) = d α_G ∫_0^∞ C_i(ρ) C_j(ρ) ρ^{d−1} dρ,
(185) lim_{r_m→∞} r_m^d M_m(i) = lim_{r_m→∞} r_m^d M_{X=0,m}(i) = d α_G ∫_0^∞ C_i(ρ) C_Y(ρ) ρ^{d−1} dρ,

where the last limit is shown through a similar argument. To extend the results (184) and (185) to the case of X ≠ 0, the following limits will be useful:

(186) lim_{r_m→∞} r_m^d |A_{X,m}(i, j) − A_m(i, j)| = 0,
(187) lim_{r_m→∞} r_m^d |M_{X,m}(i) − M_m(i)| = 0.

Let φ_{i,m} denote the function φ_i of (46) for the domain D_m. By Lemma A.9 one has

(188) |φ_{i,m}(ξ_1, ξ_2) φ_{j,m}(ξ_2, ξ_1) − C_i(ξ_1, ξ_2) C_j(ξ_2, ξ_1)| ≤ α_i r_m^{−d/2−τ} |C_j(ξ_1, ξ_2)| + α_j r_m^{−d/2−τ} |C_i(ξ_1, ξ_2)| + α_i α_j r_m^{−d−2τ}

uniformly on D_m × D_m, and hence (186) is proved as follows:

lim_{r_m→∞} r_m^d |A_{X,m}(i, j) − A_m(i, j)|
≤ lim_{r_m→∞} r_m^d ∫_{D_m} ∫_{D_m} |φ_{i,m}(ξ_1, ξ_2) φ_{j,m}(ξ_2, ξ_1) − C_i(ξ_1, ξ_2) C_j(ξ_2, ξ_1)| f_m(ξ_1) f_m(ξ_2) dξ_1 dξ_2
≤ lim_{r_m→∞} r_m^d ( α_j r_m^{−d/2−τ} c_{ii} r_m^{−d/2−τ} + α_i r_m^{−d/2−τ} c_{jj} r_m^{−d/2−τ} + α_i α_j r_m^{−d−2τ} ) = 0.

A similar argument proves (187). To bound |B_{X,m}(i, j)|, three inequalities will be needed. Let ψ_i(x) = C_i(x, 0) and ψ_Y(x) = C_Y(x, 0). Then, by the stationarity of the process and a change of variables,

(189) ∫_{R^d} |C_i(ξ, λ) C_Y(λ, η)| dλ = ∫_{R^d} |C_i(ξ−η−λ, 0) C_Y(λ, 0)| dλ = ∫_{R^d} |ψ_i(ξ−η−λ) ψ_Y(λ)| dλ = (|ψ_i| ∗ |ψ_Y|)(ξ−η) = ϑ_i(ξ−η),

where ϑ_i = |ψ_i| ∗ |ψ_Y| denotes the convolution of the two functions. If ψ_i ∈ L2 and ψ_Y ∈ L1 (and, by its boundedness, also ψ_Y ∈ L2), then the same applies to |ψ_i| and |ψ_Y|. Firstly, the convolution integral converges since ψ_i and ψ_Y are in L1. Let F(ψ) denote the Fourier transform of ψ. It follows that F(|ψ_i|) and F(|ψ_Y|) exist and are in L2. Moreover, since |ψ_Y| ∈ L1, it follows, for some constant c_{Fψ_Y}, that |F(|ψ_Y|)| ≤ c_{Fψ_Y} < ∞. Furthermore, by the convolution theorem, |F(|ψ_i| ∗ |ψ_Y|)| = |F(|ψ_i|) F(|ψ_Y|)| ≤ c_{Fψ_Y} |F(|ψ_i|)| ∈ L2. Since for L2 functions the Fourier transform is fully reciprocal, it follows that (|ψ_i| ∗ |ψ_Y|)(ξ) = ϑ_i(ξ) ∈ L2. It is also clear that the ϑ_i are isotropic since ψ_i and ψ_Y are. Thus, for some constant c_{ϑ_{i,j}}, ∫_{R^d} |ϑ_i(ξ) ϑ_j(ξ)| dξ ≤ c_{ϑ_{i,j}} < ∞, and so

(190) ∫_{D_m} ∫_{D_m} |ϑ_i(ξ−η) ϑ_j(ξ−η)| dξ dη ≤ (1/2) ∫_{D_m} ( ∫_{D_m} ϑ_i^2(ξ−η) dξ + ∫_{D_m} ϑ_j^2(ξ−η) dξ ) dη ≤ (1/2)(c_{ϑ_{i,i}} + c_{ϑ_{j,j}}) r_m^d,

which is the first of the three inequalities. Secondly, by Lemma A.9, (102) of Condition 3.2 and (105) of Condition 3.3,

$$\begin{aligned}
\int_{D_m} |\phi_{i,m}(\xi,\lambda)\phi_{Y,m}(\lambda,\eta)|\,d\lambda
&\le \int_{D_m} \bigl(|C_i(\xi,\lambda)| + \alpha_i r_m^{-d/2-\tau}\bigr)\bigl(|C_Y(\lambda,\eta)| + \alpha_Y r_m^{-d}\bigr)\,d\lambda \\
&\le \int_{D_m} |C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + \alpha_i r_m^{-d/2-\tau}\int_{D_m} |C_Y(\lambda,\eta)|\,d\lambda \\
&\quad + \alpha_Y r_m^{-d}\int_{D_m} |C_i(\xi,\lambda)|\,d\lambda + \alpha_i\alpha_Y r_m^{-d/2-\tau-d}\int_{D_m} d\lambda \\
&\le \int_{D_m} |C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + c_{\phi,i}\, r_m^{-d/2-\tau} \tag{191}
\end{aligned}$$


for some constant $c_{\phi,i}$. Lastly,

$$\int_{D_m}\int_{D_m} |\vartheta_i(\xi-\eta)|\,d\xi\,d\eta \le r_m^{3d/2}\Bigl(\int_{D_m}\int_{D_m} \vartheta_i^2(\xi-\eta)\,d\xi\,d\eta\Bigr)^{1/2}\Bigl(\int_{D_m}\int_{D_m} r_m^{-3d}\,d\xi\,d\eta\Bigr)^{1/2} \le r_m^{3d/2}\bigl(c_{\vartheta_{i,i}}\, r_m^{d}\bigr)^{1/2}\bigl(r_m^{-d}\bigr)^{1/2} = c_{\vartheta_{i,i}}^{1/2}\, r_m^{3d/2}. \tag{192}$$
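The convolution-theorem step used above, $\mathcal{F}(|\psi_i| * |\psi_Y|) = \mathcal{F}(|\psi_i|)\,\mathcal{F}(|\psi_Y|)$, can be illustrated numerically. The following sketch is not part of the proof; the Gaussian and Laplace profiles are arbitrary stand-ins for $|\psi_i|$ (square-integrable) and $|\psi_Y|$ (integrable and bounded), and the check is that on a discrete grid the circular convolution computed through the FFT agrees with the direct sum:

```python
import numpy as np

# Illustration only: on a discrete grid the convolution theorem gives
# FFT(f (*) g) = FFT(f) . FFT(g) for circular convolution (*),
# mirroring the Fourier step used to show theta_i is in L2.
n = 256
x = np.linspace(-8.0, 8.0, n, endpoint=False)
psi_i = np.exp(-x**2)          # arbitrary stand-in for |psi_i|
psi_Y = np.exp(-np.abs(x))     # arbitrary stand-in for |psi_Y|

# Circular convolution computed two ways.
conv_fft = np.real(np.fft.ifft(np.fft.fft(psi_i) * np.fft.fft(psi_Y)))
conv_direct = np.array(
    [np.sum(psi_i * np.roll(psi_Y[::-1], k + 1)) for k in range(n)]
)  # direct sum: sum_m f[m] g[(k - m) mod n]

assert np.allclose(conv_fft, conv_direct)
```

The same identity underlies the boundedness argument for $\vartheta_i$: multiplying transforms in the frequency domain is exactly convolution in the original domain.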

By (49), the preceding inequalities and (190), $|B_{X,m}(i,j)|$ is bounded by

$$\begin{aligned}
r_m^{-4d} &\int_{D_m}\int_{D_m} \Bigl(\int_{D_m} |\phi_{i,m}(\xi,\lambda)\phi_{Y,m}(\lambda,\eta)|\,d\lambda\Bigr)\Bigl(\int_{D_m} |\phi_{j,m}(\xi,\lambda)\phi_{Y,m}(\lambda,\eta)|\,d\lambda\Bigr)d\xi\,d\eta \\
&\le r_m^{-4d} \int_{D_m}\int_{D_m} \Bigl(\int_{D_m} |C_i(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + c_{\phi,i}\, r_m^{-d/2-\tau}\Bigr)\Bigl(\int_{D_m} |C_j(\xi,\lambda)C_Y(\lambda,\eta)|\,d\lambda + c_{\phi,j}\, r_m^{-d/2-\tau}\Bigr)d\xi\,d\eta \\
&\le r_m^{-4d} \int_{D_m}\int_{D_m} \vartheta_i(\xi-\eta)\vartheta_j(\xi-\eta)\,d\xi\,d\eta + r_m^{-4d}\, c_{\phi,j}\, r_m^{-d/2-\tau} \int_{D_m}\int_{D_m} \vartheta_i(\xi-\eta)\,d\xi\,d\eta \\
&\quad + r_m^{-4d}\, c_{\phi,i}\, r_m^{-d/2-\tau} \int_{D_m}\int_{D_m} \vartheta_j(\xi-\eta)\,d\xi\,d\eta + c_{\phi,i} c_{\phi,j}\, r_m^{-4d}\, r_m^{-d-2\tau} \int_{D_m}\int_{D_m} d\xi\,d\eta \\
&\le \frac{1}{2}\bigl(c_{\vartheta_{i,i}} + c_{\vartheta_{j,j}}\bigr)\, r_m^{-3d} + \bigl(c_{\vartheta_{i,i}}^{1/2} c_{\phi,j} + c_{\vartheta_{j,j}}^{1/2} c_{\phi,i}\bigr)\, r_m^{-3d-\tau} + c_{\phi,i} c_{\phi,j}\, r_m^{-3d-2\tau}
\end{aligned}$$

and hence

$$r_m^{3d}\,|B_{X,m}(i,j)| \le c_B < \infty \quad \text{for all } m. \tag{193}$$

Next, we consider the elements related to the presence of a nugget effect. Since $C_i(\xi,\xi) = C_i(0)$ and by Lemma A.9,

$$\lim_{r_m\to\infty} |a_m(i) - a(i)| = \lim_{r_m\to\infty} \Bigl|\int_{D_m} \bigl(\phi_{i,m}(\xi,\xi) - C_i(\xi,\xi)\bigr) f_m(\xi)\,d\xi\Bigr| \le \lim_{r_m\to\infty} \int_{D_m} |\phi_{i,m}(\xi,\xi) - C_i(\xi,\xi)|\, f_m(\xi)\,d\xi \le \lim_{r_m\to\infty} \alpha_i \int_{D_m} r_m^{-d/2-\tau}\, r_m^{-d}\,d\xi = 0 \tag{194}$$

and similarly

$$\lim_{r_m\to\infty} |m_{0,m}(i) - m_0(i)| = 0. \tag{195}$$

Moreover,

$$\begin{aligned}
\lim_{r_m\to\infty} r_m^{d}\,|b_{0,m}| &= \lim_{r_m\to\infty} r_m^{d} \Bigl|\int_{D_m}\int_{D_m} \phi_{Y,m}(\xi,\eta)\phi_{Y,m}(\eta,\xi)\, f_m(\xi) f_m(\eta)\,d\xi\,d\eta\Bigr| \\
&\le \lim_{r_m\to\infty} r_m^{d} \int_{D_m}\int_{D_m} \bigl(|C_Y(\xi,\eta)| + \alpha_Y r_m^{-d}\bigr)\bigl(|C_Y(\eta,\xi)| + \alpha_Y r_m^{-d}\bigr) f_m(\xi) f_m(\eta)\,d\xi\,d\eta \\
&= \lim_{r_m\to\infty} r_m^{d} \Bigl( r_m^{-2d} \int_{D_m}\int_{D_m} C_Y^2(\xi,\eta)\,d\xi\,d\eta + 2\alpha_Y r_m^{-3d} \int_{D_m}\int_{D_m} |C_Y(\xi,\eta)|\,d\xi\,d\eta + \alpha_Y^2\, r_m^{-4d} \int_{D_m}\int_{D_m} d\xi\,d\eta \Bigr) < \infty. \tag{196}
\end{aligned}$$

The following holds:

$$\lim_{r_m\to\infty} r_m^{d} \int_{D_m}\int_{D_m} h_{X,j,m}^2(\xi,\eta)\,d\xi\,d\eta \le \lim_{r_m\to\infty} r_m^{-d} \int_{D_m}\int_{D_m} \vartheta_j^2(\xi-\eta)\,d\xi\,d\eta + \lim_{r_m\to\infty} O\bigl(r_m^{-\tau} + r_m^{-2\tau}\bigr) < \infty \tag{197}$$


and thus, using (196),

$$\begin{aligned}
\lim_{r_m\to\infty} r_m^{2d}\,|b_m(i)| &= \lim_{r_m\to\infty} r_m^{2d} \Bigl|\int_{D_m}\int_{D_m} h_{X,i,m}(\xi,\eta)\phi_{Y,m}(\eta,\xi)\, f_m(\xi) f_m(\eta)\,d\xi\,d\eta\Bigr| \\
&\le \lim_{r_m\to\infty} \Bigl( r_m^{d} \int_{D_m}\int_{D_m} h_{X,i,m}^2(\xi,\eta)\,d\xi\,d\eta \Bigr)^{1/2} \Bigl( r_m^{-d} \int_{D_m}\int_{D_m} \phi_{Y,m}^2(\eta,\xi)\,d\xi\,d\eta \Bigr)^{1/2} < \infty. \tag{198}
\end{aligned}$$

To complete the proof, we define

$$\Gamma_m = \begin{bmatrix} 1 & 0 \\ 0 & r_m^{d} I_q \end{bmatrix} \tag{199}$$

and obtain

$$\lim_{r_m\to\infty} \Gamma_m A_{\varepsilon,m} = A_\varepsilon \quad \text{(by (194), (186) and (184))} \tag{200}$$

$$\lim_{r_m\to\infty} \Gamma_m M_{\varepsilon,m} = M_\varepsilon \quad \text{(by (195), (187) and (185))} \tag{201}$$

$$r_m^{d}\, \bigl\|\bigl(\Gamma_m \operatorname{diag}(B_{\varepsilon,m})\, \Gamma_m\bigr)(i,j)\bigr\|_\infty \le B_\varepsilon(i,j) < \infty \quad \text{(by (196), (198) and (193))} \tag{202}$$

for some fixed matrix $B_\varepsilon$ and $i,j = 1,\dots,q+1$. Hence

$$\lim_{r_m\to\infty} A_{\varepsilon,m}^{-1} M_{\varepsilon,m} = \lim_{r_m\to\infty} (\Gamma_m A_{\varepsilon,m})^{-1}(\Gamma_m M_{\varepsilon,m}) = A_\varepsilon^{-1} M_\varepsilon \tag{203}$$

which proves (116), while putting $E_\varepsilon(i) = \bigl(A_\varepsilon^{-1} \operatorname{diag}(B_\varepsilon)(A_\varepsilon^{-1})'\bigr)(i,i)$ yields

$$\lim_{r_m\to\infty} r_m^{d}\,\bigl(A_{\varepsilon,m}^{-1} \operatorname{diag}(B_{\varepsilon,m})(A_{\varepsilon,m}^{-1})'\bigr)(i,i) = \lim_{r_m\to\infty} \Bigl((\Gamma_m A_{\varepsilon,m})^{-1}\bigl(r_m^{d}\, \Gamma_m \operatorname{diag}(B_{\varepsilon,m})\, \Gamma_m\bigr)\bigl((\Gamma_m A_{\varepsilon,m})'\bigr)^{-1}\Bigr)(i,i) \le E_\varepsilon(i) \tag{204}$$

which proves (117) and completes the proof of Theorem 3.5.
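The algebraic device behind (203) is that the normalizing matrix cancels exactly: $(\Gamma A)^{-1}(\Gamma M) = A^{-1} M$ for any invertible $\Gamma$ and $A$, so the limit can be evaluated through the rescaled, convergent quantities $\Gamma_m A_{\varepsilon,m}$ and $\Gamma_m M_{\varepsilon,m}$. A small numerical illustration (not part of the proof; the matrices and the value standing in for $r_m^d$ are arbitrary):

```python
import numpy as np

# Illustration: the scaling matrix Gamma cancels exactly in
# (Gamma A)^{-1} (Gamma M) = A^{-1} M, which is the identity behind (203).
rng = np.random.default_rng(0)
q = 4
A = rng.standard_normal((q + 1, q + 1)) + (q + 1) * np.eye(q + 1)  # well-conditioned
M = rng.standard_normal((q + 1, 1))
r_d = 1e3                                    # arbitrary stand-in for r_m^d
Gamma = np.diag([1.0] + [r_d] * q)           # Gamma_m = diag(1, r_m^d I_q)

lhs = np.linalg.solve(Gamma @ A, Gamma @ M)  # (Gamma A)^{-1} (Gamma M)
rhs = np.linalg.solve(A, M)                  # A^{-1} M
assert np.allclose(lhs, rhs)
```

This is why only the limits of the rescaled quantities, not of $A_{\varepsilon,m}$ and $M_{\varepsilon,m}$ themselves, are needed.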

Proof of Lemma 3.4. The lemma is a technical point which is established through a rather tedious check of various bounds involved in the proofs of Lemmas 3.1 and 3.2 and Theorems 3.3 and 3.4. In particular,

(205) $V_{X,m} = O(r_m^{d})$, where $V_{X,m}$ is the quantity in (129) for $D = D_m$;
(206) $J_{X,m} = O(1)$, where $J_{X,m}$ is the quantity in (130) for $D = D_m$;
(207) $R_{u_k,C_i,m} = O(r_m^{d})$, where $R_{u_k,C_i,m}$ is the quantity in (131) for $D = D_m$;
(208) $R_{u_k,m} = O(1)$, where $R_{u_k,m}$ is the quantity in (132) for $D = D_m$;
(209) $V_{u_{k_1},u_{k_2},C_i,m} = O(r_m^{d})$, where $V_{u_{k_1},u_{k_2},C_i,m}$ is the quantity in (133) for $D = D_m$;
(210) $Q_{\phi,m} = O(r_m^{d})$, where $Q_{\phi,m}$ is the quantity in (137) for $D = D_m$;
(211) $R_{\phi,m} = O(1)$, where $R_{\phi,m}$ is the quantity in (138) for $D = D_m$;
(212) $c_{A,X,m} = O(r_m^{2d})$, where $c_{A,X,m}$ is the quantity in (139) for $D = D_m$;
(213) $c_{M,X,m} = O(r_m^{2d})$, where $c_{M,X,m}$ is the quantity in (140) for $D = D_m$;
(214) $R_{\phi_i,\phi_Y,m} = O(r_m^{d})$, where $R_{\phi_i,\phi_Y,m}$ is the quantity in (142) for $D = D_m$;
(215) $R_{h_{X,i},m} = O(1)$, where $R_{h_{X,i},m}$ is the quantity in (143) for $D = D_m$;
(216) $V^*_{D_m\times D_m}(h_{X,i}h_{X,j}) = O(r_m^{2d})$, where $V^*_{D_m\times D_m}(h_{X,i}h_{X,j})$ is the quantity in (145) for $D = D_m$;
(217) $c_{B,X,m} = O(r_m^{2d})$, where $c_{B,X,m}$ is the quantity in (145) for $D = D_m$;
(218) $J_{A_X^{-1},m} = O(r_m^{d})$, where $J_{A_X^{-1},m}$ is the quantity in (146) for $D = D_m$;
(219) $\|\operatorname{diag}(B_{X,m})\| = O(r_m^{3d})$;
(220) $\|A_{X,m}^{-1}\| = O(r_m^{d})$;
(221) $c_{E,X,m} = O(r_m^{8d})$, where $c_{E,X,m}$ is the quantity in (146) for $D = D_m$;
(222) $c_{\theta,X,m} = O(r_m^{4d})$, where $c_{\theta,X,m}$ is the quantity in (147) for $D = D_m$;
(223) $V^*_{D_m}(\psi_Y) = O(r_m^{d})$, where $V^*_{D_m}(\psi_Y)$ is the quantity in (153) for $D = D_m$;
(224) $V^*_{D_m\times D_m}(\phi_Y^2) = O(r_m^{2d})$, where $V^*_{D_m\times D_m}(\phi_Y^2)$ is the quantity in (156) for $D = D_m$;
(225) $V^*_{D_m\times D_m}(\phi_Y h_{X,j}) = O(r_m^{2d})$, where $V^*_{D_m\times D_m}(\phi_Y h_{X,j})$ is the quantity in (159) for $D = D_m$;
(226) $c_{b_j,m} = O(r_m^{2d})$, where $c_{b_j,m}$ is the quantity in (159) for $D = D_m$;
(227) $J_{A_\varepsilon^{-1},m} = O(r_m^{d})$, where $J_{A_\varepsilon^{-1},m}$ is the quantity in (166) for $D = D_m$;
(228) $J_{M_\varepsilon,m} = O(1)$, where $J_{M_\varepsilon,m}$ is the quantity in (168) for $D = D_m$;
(229) $c_{\varepsilon,m} = O(1)$, where $c_{\varepsilon,m}$ is the quantity in (155) for $D = D_m$.

The proof of (205) follows from Condition 3.5; (206) follows from Lemma A.5; (207) follows from the uniform boundedness of the partials of $C_i$ of orders not exceeding $d$ and Condition 3.5; (208) is given by Condition 3.5; (209) follows from the uniform boundedness of all partials of $C_i$ of orders not exceeding $2d$ and (205); (210) is a consequence of the preceding bounds; (211) is guaranteed by Lemma A.9; (212), (213) and (214) follow from (211) and the uniform boundedness of the partials of $\phi_Y$ and $\phi_i$, $i = 1,\dots,q$, of orders not exceeding $d$; (215) is a result of (211) and the definition of $h_{X,i}$; (216) is established by showing the uniform boundedness of the partials of $h_{X,i}$ of orders not exceeding $2d$; (217) follows from earlier bounds and (145); (218) follows from Lemma A.5, (184) and (186); (219) follows from (200); (220) follows from (193); (221) follows from (146) and earlier bounds; (222) follows from (147) and earlier bounds; (223) and (224) follow from the uniform boundedness of all partials of $\phi_Y$ of orders not exceeding $2d$; (225) follows from the uniform boundedness of all partials of $\phi_Y$ and $h_{X,j}$ of orders not exceeding $2d$; (226) is obtained from (159) and earlier bounds; (227) is obtained from Lemma A.5 and (200); (228) follows from (201); and (229) follows from the stationarity of $\eta$.

Finally, the lemma follows from (167), (165) and the earlier bounds.

Proof of Theorem 3.6. Let $\alpha > 0$ and $\varepsilon > 0$. For $i = 1,\dots,q$,

$$p_\alpha = P\bigl(|\theta_{m,n(m)}(i) - \theta(i)| > \alpha\bigr) \le P\bigl(|\theta_{m,n(m)}(i) - \theta_m(i)| + |\theta_m(i) - \theta(i)| > \alpha\bigr).$$

For $m$ large enough, one has $|\theta_m(i) - \theta(i)| \le \alpha/2$ and hence

$$p_\alpha \le P\Bigl(|\theta_{m,n(m)}(i) - \theta_m(i)| > \frac{\alpha}{2}\Bigr) \le P\Bigl(|\theta_{m,n(m)}(i) - E[\theta_{m,n(m)}(i)]| + |E[\theta_{m,n(m)}(i)] - \theta_m(i)| > \frac{\alpha}{2}\Bigr).$$

By (88) and condition (120), one has $|E[\theta_{m,n(m)}(i)] - \theta_m(i)| \le c_{\theta,\varepsilon,m}\,\delta_{n(m)} \le \alpha/4$ for $m$ sufficiently large, and hence by Chebyshev's inequality

$$\begin{aligned}
p_\alpha \le P\Bigl(|\theta_{m,n(m)}(i) - E[\theta_{m,n(m)}(i)]| > \frac{\alpha}{4}\Bigr) &\le \frac{16}{\alpha^2}\,\operatorname{var}\bigl(\theta_{m,n(m)}(i)\bigr) \\
&\le \frac{16}{\alpha^2}\,(q+1)\,c\,\bigl(E_{\varepsilon,m}(i+1,i+1) + c_{E,\varepsilon,m}\,\delta_{n(m)}\bigr) \\
&\le \frac{16}{\alpha^2}\,(q+1)\,c\,\bigl(r_m^{-d}\, E_\varepsilon(i+1,i+1) + c_{E,\varepsilon,m}\,\delta_{n(m)}\bigr)
\end{aligned}$$

where (90) from Theorem 3.4 and (117) from Theorem 3.5 are used. The result then follows from condition (121). The proof for $\gamma_{m,n(m)}$ follows the same argument.
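The Chebyshev step above, $P(|X - E[X]| > \alpha/4) \le 16\operatorname{var}(X)/\alpha^2$, holds for any square-integrable random variable. A quick Monte Carlo illustration (not part of the proof; the unit-variance exponential distribution and the value of $\alpha$ are arbitrary choices):

```python
import numpy as np

# Illustration of Chebyshev's inequality as used in the proof:
# P(|X - E X| > alpha/4) <= 16 var(X) / alpha^2.
rng = np.random.default_rng(1)
alpha = 10.0
X = rng.exponential(scale=1.0, size=200_000)   # mean 1, variance 1
freq = np.mean(np.abs(X - 1.0) > alpha / 4.0)  # empirical tail frequency
bound = 16.0 * 1.0 / alpha**2                  # 16 var(X) / alpha^2 = 0.16
assert freq <= bound
```

The empirical tail frequency sits well below the Chebyshev bound, as expected: the inequality is crude but distribution-free, which is all the proof requires.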

Acknowledgments.

This paper is based on the thesis of the first author, written under the co-supervision of the second author and of Marc Moore and Denis Marcotte of the École Polytechnique de Montréal. Both authors wish to thank them for their help with the thesis.


References

Atkinson, K. & Han, W. (2001). Theoretical Numerical Analysis: A Functional Analysis Framework, vol. 39 of Texts in Applied Mathematics. New York: Springer.

Bartlett, M. S. (1964). The spectral analysis of two-dimensional point processes. Biometrika 51, 299–311.

Cressie, N. A. C. (1993). Statistics for Spatial Data. New York: John Wiley & Sons, revised ed.

Cressie, N. A. C. & Grondona, M. O. (1992). A comparison of variogram estimation with covariogram estimation. In The Art of Statistical Science, K. V. Mardia, ed. Chichester: Wiley.

Davis, J. C. (1973). Statistics and Data Analysis in Geology. New York: John Wiley & Sons.

Diggle, P. J. (1983). Statistical Analysis of Spatial Point Patterns. New York: Academic Press.

Fang, K. T. & Wang, Y. (1994). Number-theoretic Methods in Statistics. London: Chapman & Hall.

Hall, P. & Patil, P. (1994). Properties of nonparametric estimators of autocovariance for stationary random fields. Probability Theory and Related Fields 99, 399–424.

Kitanidis, P. K. (1985). Minimum variance unbiased quadratic estimation of covariances of regionalized variables. Journal of the International Association for Mathematical Geology 17, 195–208.

Korobov, N. M. (1959). The approximate computation of multiple integrals. Dokl. Akad. Nauk SSSR 124, 1207–1210.

Lahiri, S. N. (1996). On inconsistency of estimators under infill asymptotics for spatial data. Sankhyā 58, 403–417.

Lahiri, S. N., Kaiser, M. S., Cressie, N. & Hsu, N. (1999). Prediction of spatial cumulative distribution functions using subsampling. Journal of the American Statistical Association 94, 86–97.

Matheron, G. (1965). Les variables régionalisées et leur estimation. Paris: Masson et Cie.

Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. Philadelphia: SIAM.

Niederreiter, H. & Spanier, J., eds. (2000). Monte Carlo and Quasi-Monte Carlo Methods 1998. New York: Springer.

Powojowski, M. (2000). Sur la modélisation et l'estimation de la fonction de covariance d'un processus aléatoire. Ph.D. thesis, Université de Montréal.

Powojowski, M. (2002a). Model selection in covariogram estimation. Tech. rep., in preparation.

Powojowski, M. (2002b). Spectral component additive models of the covariogram. Tech. rep., in preparation.

Rao, C. R. & Kleffe, J. (1988). Estimation of Variance Components and Applications. Amsterdam: North-Holland.

Ripley, B. D. (1988). Statistical Inference for Spatial Processes. Cambridge: Cambridge University Press.

Shapiro, A. & Botha, J. D. (1991). Variogram fitting with a general class of conditionally nonnegative definite functions. Computational Statistics and Data Analysis 11, 87–96.

Stein, M. L. (1987). Minimum norm quadratic estimation of spatial variograms. Journal of the American Statistical Association 82, 765–772.

Stein, M. L. (1989). Asymptotic distribution of minimum norm quadratic estimators of the covariance function of a Gaussian random field. The Annals of Statistics 17, 980–1000.

Stein, M. L. & Handcock, M. S. (1989). Some asymptotic properties of kriging when the covariance function is misspecified. Journal of the International Association for Mathematical Geology 21, 839–861.

Wackernagel, H. (1995). Multivariate Geostatistics. New York: Springer-Verlag.