
www.csgb.dk

RESEARCH REPORT 2017

CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING

Abdollah Jalilian, Yongtao Guan and Rasmus Waagepetersen

Orthogonal series estimation of the pair correlation function of a spatial point process

No. 01, March 2017


Orthogonal series estimation of the pair correlation function of a spatial point process

Abdollah Jalilian1, Yongtao Guan2 and Rasmus Waagepetersen3

1Department of Statistics, Razi University, Iran, [email protected]
2Department of Management Science, University of Miami, [email protected]

3Department of Mathematical Sciences, Aalborg University, Denmark, [email protected]

Abstract

The pair correlation function is a fundamental spatial point process characteristic that, given the intensity function, determines the second order moments of the point process. Non-parametric estimation of the pair correlation function is a typical initial step of a statistical analysis of a spatial point pattern. Kernel estimators are popular, but especially for clustered point patterns they suffer from bias at small spatial lags. In this paper we introduce a new orthogonal series estimator. The new estimator is consistent and asymptotically normal according to our theoretical and simulation results. Our simulations further show that the new estimator can outperform the kernel estimators, in particular for Poisson and clustered point processes.

Keywords: Asymptotic normality; Consistency; Kernel estimator; Orthogonal series estimator; Pair correlation function; Spatial point process.

1 Introduction

The pair correlation function is commonly considered the most informative second-order summary statistic of a spatial point process (Stoyan and Stoyan, 1994; Møller and Waagepetersen, 2003; Illian et al., 2008). Non-parametric estimates of the pair correlation function are useful for assessing regularity or clustering of a spatial point pattern and can moreover be used for inferring parametric models for spatial point processes via minimum contrast estimation (Stoyan and Stoyan, 1996; Illian et al., 2008). Although alternatives exist (Yue and Loh, 2013), kernel estimation is by far the most popular approach (Stoyan and Stoyan, 1994; Møller and Waagepetersen, 2003; Illian et al., 2008); it is closely related to kernel estimation of probability densities.

Kernel estimation is computationally fast and works well except at small spatial lags. For spatial lags close to zero, kernel estimators suffer from strong bias; see e.g. the discussion on page 186 in Stoyan and Stoyan (1994), Example 4.7 in Møller and Waagepetersen (2003) and Section 7.6.2 in Baddeley et al. (2015). The bias


is a major drawback if one attempts to infer a parametric model from the non-parametric estimate, since the behavior near zero is important for determining the right parametric model (Jalilian et al., 2013).

In this paper we adapt orthogonal series density estimators (see e.g. the reviews in Hall, 1987; Efromovich, 2010) to the estimation of the pair correlation function. We derive unbiased estimators of the coefficients in an orthogonal series expansion of the pair correlation function and propose a criterion for choosing an optimal smoothing scheme. In the literature on orthogonal series estimation of probability densities, the data are usually assumed to consist of independent observations from the unknown target density. In our case the situation is more complicated, as the data used for estimation consist of spatial lags between observed pairs of points. These lags are neither independent nor identically distributed, and the sample of lags is biased due to edge effects. We establish consistency and asymptotic normality of our new orthogonal series estimator and study its performance in a simulation study and in an application to a tropical rain forest data set.

2 Background

2.1 Spatial point processes

We denote by X a point process on Rd, d ≥ 1; that is, X is a locally finite random subset of Rd. For B ⊆ Rd, we let N(B) denote the random number of points in X ∩ B. That X is locally finite means that N(B) is finite almost surely whenever B is bounded. We assume that X has an intensity function ρ and a second-order joint intensity ρ(2) so that for bounded A, B ⊂ Rd,

E{N(B)} = ∫_B ρ(u) du,   E{N(A)N(B)} = ∫_{A∩B} ρ(u) du + ∫_A ∫_B ρ(2)(u, v) du dv.  (2.1)

The pair correlation function g is defined by g(u, v) = ρ(2)(u, v)/{ρ(u)ρ(v)} whenever ρ(u)ρ(v) > 0 (otherwise we define g(u, v) = 0). By (2.1),

Cov{N(A), N(B)} = ∫_{A∩B} ρ(u) du + ∫_A ∫_B ρ(u)ρ(v){g(u, v) − 1} du dv

for bounded A, B ⊂ Rd. Hence, given the intensity function, g determines the covariances of the count variables N(A) and N(B). Further, for locations u, v ∈ Rd, g(u, v) > 1 (< 1) implies that the presence of a point at v yields an elevated (decreased) probability of observing yet another point in a small neighbourhood of u (e.g. Coeurjolly et al., 2016). In this paper we assume that g is isotropic, i.e., with an abuse of notation, g(u, v) = g(‖v − u‖). Examples of pair correlation functions are shown in Figure 6.1.
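As a quick sanity check of (2.1) and the covariance identity: for a homogeneous Poisson process g ≡ 1, so Var{N(B)} = ρ|B|. The following minimal simulation sketch (our own illustration, not from the paper; the window W = [0, 1]² and all variable names are our choices) verifies this numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, reps = 100.0, 2000
counts = []
for _ in range(reps):
    n = rng.poisson(rho)                  # homogeneous Poisson process on W = [0, 1]^2
    pts = rng.random((n, 2))
    # N(B) for B = [0, 0.5]^2, so |B| = 0.25
    counts.append(np.count_nonzero((pts[:, 0] < 0.5) & (pts[:, 1] < 0.5)))
counts = np.asarray(counts)
print(counts.mean(), counts.var())        # both close to rho * |B| = 25
```

With g ≡ 1 the double-integral term contributes nothing, so the empirical mean and variance of N(B) should both settle near ρ|B| = 25.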


2.2 Kernel estimation of the pair correlation function

Suppose X is observed within a bounded observation window W ⊂ Rd and let XW = X ∩ W. Let kb(·) be a kernel of the form kb(r) = k(r/b)/b, where k is a probability density and b > 0 is the bandwidth. Then a kernel density estimator (Stoyan and Stoyan, 1994; Baddeley et al., 2000) of g is

gk(r; b) = {1/(σd r^{d−1})} Σ≠_{u,v∈XW} kb(r − ‖v − u‖) / {ρ(u)ρ(v) |W ∩ W_{v−u}|},  r ≥ 0,

where σd is the surface area of the unit sphere in Rd, Σ≠ denotes the sum over all pairs of distinct points, 1/|W ∩ W_h|, h ∈ Rd, is the translation edge correction factor with W_h = {u − h : u ∈ W}, and |A| is the volume (Lebesgue measure) of A ⊂ Rd. Variations of this include (Guan, 2007a)

gd(r; b) = (1/σd) Σ≠_{u,v∈XW} kb(r − ‖v − u‖) / {‖v − u‖^{d−1} ρ(u)ρ(v) |W ∩ W_{v−u}|},  r ≥ 0,

and the bias corrected estimator (Guan, 2007a)

gc(r; b) = gd(r; b)/c(r; b),   c(r; b) = ∫_{−b}^{min{r,b}} kb(t) dt,

assuming k has bounded support [−1, 1]. Regarding the choice of kernel, Illian et al. (2008), p. 230, recommend the uniform kernel k(r) = 1(|r| ≤ 1)/2, where 1(·) denotes the indicator function, but the Epanechnikov kernel k(r) = (3/4)(1 − r²)1(|r| ≤ 1) is another common choice. The choice of the bandwidth b strongly affects the bias and variance of the kernel estimator. In the planar (d = 2) stationary case, Illian et al. (2008), p. 236, recommend b = 0.10/√ρ̂ based on practical experience, where ρ̂ is an estimate of the constant intensity. The default in spatstat (Baddeley et al., 2015), following Stoyan and Stoyan (1994), is to use the Epanechnikov kernel with b = 0.15/√ρ̂.
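To make the estimator gk(r; b) concrete, here is a hedged sketch for d = 2 on the unit square, using the Epanechnikov kernel, the spatstat-style bandwidth b = 0.15/√ρ̂ and the translation edge correction |W ∩ W_h| = (1 − |h1|)(1 − |h2|). The simulated Poisson pattern, the crude plug-in ρ̂ = n and all names are our own assumptions, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = rng.poisson(200)
pts = rng.random((n, 2))              # simulated Poisson pattern on W = [0, 1]^2
rho = float(n)                        # crude plug-in estimate of the constant intensity
b = 0.15 / np.sqrt(rho)               # spatstat-style default bandwidth

def epanechnikov(t):
    return 0.75 * (1.0 - t**2) * (np.abs(t) <= 1)

def g_kernel(r):
    # translation edge correction on the unit square: |W ∩ W_h| = (1-|h1|)(1-|h2|)
    diff = pts[:, None, :] - pts[None, :, :]
    iu = np.triu_indices(n, 1)        # unordered pairs; the ordered sum is twice this
    h = diff[iu]
    dist = np.hypot(h[:, 0], h[:, 1])
    edge = (1 - np.abs(h[:, 0])) * (1 - np.abs(h[:, 1]))
    kb = epanechnikov((r - dist) / b) / b
    return 2 * np.sum(kb / (rho**2 * edge)) / (2 * np.pi * r)  # sigma_2 = 2*pi

g_hat = g_kernel(0.1)
print(round(g_hat, 2))                # close to 1 for a Poisson pattern
```

The O(n²) pairwise distance matrix is fine at this scale; a production implementation would restrict to pairs with lag below r + b.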

Guan (2007b) and Guan (2007a) suggest choosing b by composite likelihood cross-validation or by minimizing an estimate of the mean integrated squared error, defined over some interval I as

mise(gm, w) = σd ∫_I E[{gm(r; b) − g(r)}²] w(r − rmin) dr,  (2.2)

where gm, m = k, d, c, is one of the aforementioned kernel estimators, w ≥ 0 is a weight function and rmin ≥ 0. With I = (0, R), w(r) = r^{d−1} and rmin = 0, Guan (2007a) suggests estimating the mean integrated squared error by

M(b) = σd ∫_0^R {gm(r; b)}² r^{d−1} dr − 2 Σ≠_{u,v∈XW, ‖v−u‖≤R} gm^{−{u,v}}(‖v − u‖; b) / {ρ(u)ρ(v) |W ∩ W_{v−u}|},  (2.3)

where gm^{−{u,v}}, m = k, d, c, is defined as gm but based on (X \ {u, v}) ∩ W. Loh and Jang (2010) instead use a spatial bootstrap for estimating (2.2). We return to (2.3) in Section 5.


3 Orthogonal series estimation

3.1 The new estimator

For an R > 0, the new orthogonal series estimator of g(r), 0 ≤ rmin < r < rmin + R, is based on an orthogonal series expansion of g(r) on (rmin, rmin + R):

g(r) = Σ_{k=1}^∞ θk φk(r − rmin),  (3.1)

where {φk}_{k≥1} is an orthonormal basis of functions on (0, R) with respect to some weight function w(r) ≥ 0, r ∈ (0, R). That is, ∫_0^R φk(r) φl(r) w(r) dr = 1(k = l), and the coefficients in the expansion are given by θk = ∫_0^R g(r + rmin) φk(r) w(r) dr.

For the cosine basis, w(r) = 1 and

φ1(r) = 1/√R,   φk(r) = (2/R)^{1/2} cos{(k − 1)πr/R},  k ≥ 2.

Another example is the Fourier-Bessel basis with w(r) = r^{d−1} and

φk(r) = 2^{1/2} Jν(r αν,k/R) r^{−ν} / {R Jν+1(αν,k)},  k ≥ 1,

where ν = (d − 2)/2, Jν is the Bessel function of the first kind of order ν, and {αν,k}_{k≥1} is the sequence of successive positive roots of Jν(r).
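Both bases are easy to evaluate numerically. The sketch below (our own illustration; the planar case d = 2, so ν = 0, with R = 1) implements the two basis families and checks orthonormality with respect to the corresponding weight functions:

```python
import numpy as np
from scipy.special import jv, jn_zeros
from scipy.integrate import quad

R = 1.0                      # expansion interval (0, R); planar case d = 2, nu = 0
alphas = jn_zeros(0, 10)     # successive positive roots of J_0

def phi_bessel(r, k):        # Fourier-Bessel basis, weight w(r) = r^{d-1} = r
    a = alphas[k - 1]
    return np.sqrt(2.0) * jv(0, r * a / R) / (R * jv(1, a))

def phi_cosine(r, k):        # cosine basis, weight w(r) = 1
    if k == 1:
        return 1.0 / np.sqrt(R)
    return np.sqrt(2.0 / R) * np.cos((k - 1) * np.pi * r / R)

# numerical orthonormality checks
bb, _ = quad(lambda r: phi_bessel(r, 2) ** 2 * r, 0, R)
bc, _ = quad(lambda r: phi_bessel(r, 2) * phi_bessel(r, 3) * r, 0, R)
cc, _ = quad(lambda r: phi_cosine(r, 2) * phi_cosine(r, 3), 0, R)
print(round(bb, 6), round(bc, 6), round(cc, 6))  # approximately 1, 0, 0
```

For general ν one would use `scipy.special.jn_zeros` only for integer ν (i.e. even d) and a root finder otherwise.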

An estimator of g is obtained by replacing the θk in (3.1) by unbiased estimators and truncating or smoothing the infinite sum. A similar approach has a long history in the context of non-parametric estimation of probability densities; see e.g. the review in Efromovich (2010). For θk we propose the estimator

θ̂k = (1/σd) Σ≠_{u,v∈XW, rmin<‖v−u‖<rmin+R} φk(‖v − u‖ − rmin) w(‖v − u‖ − rmin) / {ρ(u)ρ(v) ‖v − u‖^{d−1} |W ∩ W_{v−u}|},  (3.2)

which is unbiased by the second order Campbell formula; see Section B of the supplementary material. This type of estimator has some similarity to the coefficient estimators used for probability density estimation, but it is based on spatial lags v − u which are neither independent nor identically distributed. Moreover, the estimator is adjusted for the possibly inhomogeneous intensity ρ and corrected for edge effects.

The orthogonal series estimator is finally of the form

go(r; b) = Σ_{k=1}^∞ bk θ̂k φk(r − rmin),  (3.3)

where b = {bk}_{k≥1} is a smoothing/truncation scheme. The simplest smoothing scheme is bk = 1(k ≤ K) for some cut-off K ≥ 1. Section 3.3 considers several other smoothing schemes.
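Combining (3.2) and (3.3) with the cosine basis (w(r) = 1) and the simple truncation bk = 1(k ≤ K) gives a compact estimator. The sketch below is our own assumption-laden prototype, not the authors' implementation: it uses a simulated Poisson pattern (for which g ≡ 1), a small positive rmin as recommended in Section 3.2, and a plug-in intensity estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = rng.poisson(300)
pts = rng.random((n, 2))                  # Poisson pattern on W = [0, 1]^2, so g = 1
rho, r_min, R, K = float(n), 1e-3, 0.06, 5

def phi(r, k):                            # cosine basis on (0, R), weight w = 1
    return 1/np.sqrt(R) if k == 1 else np.sqrt(2/R) * np.cos((k - 1)*np.pi*r/R)

diff = pts[:, None, :] - pts[None, :, :]
iu = np.triu_indices(n, 1)                # unordered pairs; ordered sum = 2x
h = diff[iu]
dist = np.hypot(h[:, 0], h[:, 1])
keep = (dist > r_min) & (dist < r_min + R)
h, dist = h[keep], dist[keep]
edge = (1 - np.abs(h[:, 0])) * (1 - np.abs(h[:, 1]))  # |W ∩ W_{v-u}| on the unit square
# per-pair weight 1/(sigma_2 rho^2 ||v-u||^{d-1} |W ∩ W_{v-u}|), sigma_2 = 2*pi, d = 2
base = 2.0 / (2*np.pi * rho**2 * dist * edge)

theta_hat = np.array([np.sum(phi(dist - r_min, k) * base) for k in range(1, K + 1)])
g_o = lambda r: sum(theta_hat[k - 1] * phi(r - r_min, k) for k in range(1, K + 1))
g_val = g_o(0.03)
print(round(g_val, 2))                    # near 1 for a Poisson process
```

For g ≡ 1 only θ1 = √R is nonzero, so the truncated series should reproduce a flat estimate up to sampling noise.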


3.2 Variance of θ̂k

The factor ‖v − u‖^{d−1} in (3.2) may cause problems when d > 1, where the presence of two very close points in XW could imply division by a quantity close to zero. The expression for the variance of θ̂k given in Section B of the supplementary material indeed shows that the variance is not finite unless g(r)w(r − rmin)/r^{d−1} is bounded for rmin < r < rmin + R. If rmin > 0 this is always satisfied for bounded g. If rmin = 0 the condition is still satisfied in case of the Fourier-Bessel basis and bounded g.

For the cosine basis w(r) = 1, so if rmin = 0 we need boundedness of g(r)/r^{d−1}. If X satisfies a hard core condition (i.e. two points in X cannot be closer than some δ > 0), this is trivially satisfied. Another example is a determinantal point process (Lavancier et al., 2015), for which g(r) = 1 − c(r)² for a correlation function c. Boundedness is then satisfied if, e.g., c(·) is the Gaussian (d ≤ 3) or exponential (d ≤ 2) correlation function. In practice, when using the cosine basis, we take rmin to be a small positive number to avoid issues with infinite variances.

3.3 Mean integrated squared error and smoothing schemes

The orthogonal series estimator (3.3) has mean integrated squared error

mise(go, w) = σd ∫_{rmin}^{rmin+R} E[{go(r; b) − g(r)}²] w(r − rmin) dr
            = σd Σ_{k=1}^∞ E(bk θ̂k − θk)²
            = σd Σ_{k=1}^∞ [bk² E{(θ̂k)²} − 2bk θk² + θk²].  (3.4)

Each term in (3.4) is minimized by choosing bk equal to (cf. Hall, 1987)

b*k = θk² / E{(θ̂k)²} = θk² / {θk² + Var(θ̂k)},  k ≥ 1,  (3.5)

leading to the minimal value σd Σ_{k=1}^∞ b*k Var(θ̂k) of the mean integrated squared error.

Unfortunately, the b*k are unknown. In practice we consider a parametric class of smoothing schemes b(ψ). For practical reasons we need a finite sum in (3.3), so one component of ψ will be a cut-off index K such that bk(ψ) = 0 when k > K. The simplest smoothing scheme is bk(ψ) = 1(k ≤ K). A more refined scheme is bk(ψ) = 1(k ≤ K) b̂*k, where b̂*k = θ̂²k/(θ̂k)² is an estimate of the optimal smoothing coefficient b*k given in (3.5). Here θ̂²k is an asymptotically unbiased estimator of θk² derived in Section 5. For these two smoothing schemes, ψ = K. Adapting the scheme suggested by Wahba (1981), we also consider ψ = (K, c1, c2), c1 > 0, c2 > 1, and bk(ψ) = 1(k ≤ K)/(1 + c1 k^{c2}). In practice we choose the smoothing parameter ψ by minimizing an estimate of the mean integrated squared error; see Section 5.
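The three smoothing schemes are easy to state side by side. The helper functions below are our own naming; clipping the estimated optimal coefficients to [0, 1] in the refined scheme is our addition (the ratio estimate θ̂²k/(θ̂k)² can fall outside that range):

```python
import numpy as np

def b_simple(k, K):
    # simple truncation: b_k = 1(k <= K)
    return (k <= K).astype(float)

def b_refined(k, K, b_star_hat):
    # refined scheme: b_k = 1(k <= K) * estimated optimal coefficient,
    # clipped to [0, 1] (our addition, for numerical safety)
    return (k <= K) * np.clip(b_star_hat, 0.0, 1.0)

def b_wahba(k, K, c1, c2):
    # Wahba-type scheme: b_k = 1(k <= K) / (1 + c1 * k^{c2})
    return (k <= K) / (1.0 + c1 * k ** c2)

k = np.arange(1, 11)
print(b_wahba(k, K=8, c1=0.1, c2=2.0).round(3))  # decaying weights, zero beyond K
```

All three return a weight vector that multiplies the coefficient estimates θ̂k in (3.3).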

3.4 Expansion of g(·) − 1

For large R, g(rmin + R) is typically close to one. However, for the Fourier-Bessel basis, φk(R) = 0 for all k ≥ 1, which implies go(rmin + R; b) = 0. Hence the estimator cannot be consistent for r = rmin + R, and the convergence of the estimator for r ∈ (rmin, rmin + R) can be quite slow as the number of terms K in the estimator increases. In practice we obtain quicker convergence by applying the Fourier-Bessel expansion to g(r) − 1 = Σ_{k≥1} ϑk φk(r − rmin), so that the estimator becomes g̃o(r; b) = 1 + Σ_{k=1}^∞ bk ϑ̂k φk(r − rmin), where ϑ̂k = θ̂k − ∫_0^R φk(r) w(r) dr is an estimator of ϑk = ∫_0^R {g(r + rmin) − 1} φk(r) w(r) dr. Note that Var(ϑ̂k) = Var(θ̂k) and g̃o(r; b) − E{g̃o(r; b)} = go(r; b) − E{go(r; b)}. These identities imply that the results regarding consistency and asymptotic normality established for go(r; b) in Section 4 are also valid for g̃o(r; b).

4 Consistency and asymptotic normality

4.1 Setting

To obtain asymptotic results we assume that X is observed through an increasing sequence of observation windows Wn. For ease of presentation we assume rectangular observation windows Wn = ×_{i=1}^d [−n ai, n ai] for some ai > 0, i = 1, . . . , d. More general sequences of windows can be used at the expense of more notation and assumptions. We also consider an associated sequence ψn, n ≥ 1, of smoothing parameters satisfying conditions to be detailed in the following. We let θ̂k,n and go,n denote the estimators of θk and g obtained from X observed on Wn. Thus

θ̂k,n = (1/(σd|Wn|)) Σ≠_{u,v∈XWn, v−u∈B^R_rmin} φk(‖v − u‖ − rmin) w(‖v − u‖ − rmin) / {ρ(u)ρ(v) ‖v − u‖^{d−1} en(v − u)},

where

B^R_rmin = {h ∈ Rd : rmin < ‖h‖ < rmin + R}  and  en(h) = |Wn ∩ (Wn)h|/|Wn|.  (4.1)

Further,

go,n(r; b) = Σ_{k=1}^{Kn} bk(ψn) θ̂k,n φk(r − rmin)
           = (1/(σd|Wn|)) Σ≠_{u,v∈XWn, v−u∈B^R_rmin} w(‖v − u‖ − rmin) ϕn(v − u, r) / {ρ(u)ρ(v) ‖v − u‖^{d−1} en(v − u)},

where

ϕn(h, r) = Σ_{k=1}^{Kn} bk(ψn) φk(‖h‖ − rmin) φk(r − rmin).  (4.2)

In the results below we refer to higher order normalized joint intensities g(k) of X. Define the k'th order joint intensity of X by the identity

E{Σ≠_{u1,...,uk∈X} 1(u1 ∈ A1, . . . , uk ∈ Ak)} = ∫_{A1×···×Ak} ρ(k)(v1, . . . , vk) dv1 · · · dvk

for bounded subsets Ai ⊂ Rd, i = 1, . . . , k, where the sum is over distinct u1, . . . , uk. We then let g(k)(v1, . . . , vk) = ρ(k)(v1, . . . , vk)/{ρ(v1) · · · ρ(vk)} and assume, with an abuse of notation, that the g(k) are translation invariant for k = 3, 4, i.e.

g(k)(v1, . . . , vk) = g(k)(v2 − v1, . . . , vk − v1).

4.2 Consistency of orthogonal series estimator

Consistency of the orthogonal series estimator can be established under fairly mild conditions, following the approach in Hall (1987). We first state some conditions that ensure (see Section B of the supplementary material) that Var(θ̂k,n) ≤ C1/|Wn| for some 0 < C1 < ∞:

V1: There exist 0 < ρmin < ρmax < ∞ such that ρmin ≤ ρ(u) ≤ ρmax for all u ∈ Rd.

V2: For all h, h1, h2 ∈ B^R_rmin, g(h) w(‖h‖ − rmin) ≤ C2 ‖h‖^{d−1} and g(3)(h1, h2) ≤ C3 for constants C2, C3 < ∞.

V3: A constant C4 < ∞ can be found such that sup_{h1,h2∈B^R_rmin} ∫_{Rd} |g(4)(h1, h3, h2 + h3) − g(h1)g(h2)| dh3 ≤ C4.

The first part of V2 is needed to ensure finite variances of the θ̂k,n and is discussed in detail in Section 3.2. The second part simply requires that g(3) is bounded. The condition V3 is a weak dependence condition which is also used for asymptotic normality in Section 4.3 and for estimation of θk² in Section 5.

Regarding the smoothing scheme, we assume

S1: B = sup_{k,ψ} |bk(ψ)| < ∞ and, for all ψ, Σ_{k=1}^∞ |bk(ψ)| < ∞.

S2: ψn → ψ* for some ψ*, and lim_{ψ→ψ*} max_{1≤k≤m} |bk(ψ) − 1| = 0 for all m ≥ 1.

S3: |Wn|^{−1} Σ_{k=1}^∞ |bk(ψn)| → 0.

E.g. for the simplest smoothing scheme, ψn = Kn, ψ* = ∞, and we assume that Kn/|Wn| tends to zero.

Assuming the above conditions, we now verify that the mean integrated squared error of go,n tends to zero as n → ∞. By (3.4),

mise(go,n, w)/σd = Σ_{k=1}^∞ [bk(ψn)² Var(θ̂k,n) + θk² {bk(ψn) − 1}²].

By V1–V3 and S1, the right hand side is bounded by

B C1 |Wn|^{−1} Σ_{k=1}^∞ |bk(ψn)| + max_{1≤k≤m} θk² Σ_{k=1}^m {bk(ψn) − 1}² + (B² + 1) Σ_{k=m+1}^∞ θk².

By Parseval's identity, Σ_{k=1}^∞ θk² < ∞. The last term can thus be made arbitrarily small by choosing m large enough. It also follows that θk² tends to zero as k → ∞. Hence, by S2, the middle term can be made arbitrarily small for any choice of m by choosing n large enough. Finally, the first term can be made arbitrarily small by S3 and choosing n large enough.


4.3 Asymptotic normality

The estimators θ̂k,n as well as the estimator go,n(r; b) are of the form

Sn = (1/(σd|Wn|)) Σ≠_{u,v∈XWn, v−u∈B^R_rmin} fn(v − u) / {ρ(u)ρ(v) en(v − u)}  (4.3)

for a sequence of even functions fn : Rd → R. We let τn² = |Wn| Var(Sn).

To establish asymptotic normality of estimators of the form (4.3), we need certain mixing properties for X, as in Waagepetersen and Guan (2009). The strong mixing coefficient for the point process X on Rd is given by (Ivanoff, 1982; Politis et al., 1998)

αX(m; a1, a2) = sup{|pr(E1 ∩ E2) − pr(E1) pr(E2)| : E1 ∈ FX(B1), E2 ∈ FX(B2), |B1| ≤ a1, |B2| ≤ a2, D(B1, B2) ≥ m, B1, B2 ∈ B(Rd)},

where B(Rd) denotes the Borel σ-field on Rd, FX(Bi) is the σ-field generated by X ∩ Bi, and

D(B1, B2) = inf{max_{1≤i≤d} |ui − vi| : u = (u1, . . . , ud) ∈ B1, v = (v1, . . . , vd) ∈ B2}.

To verify asymptotic normality we need the following assumptions as well as V1 (the conditions V2 and V3 are not needed due to conditions N2 and N4 below):

N1: The mixing coefficient satisfies αX(m; (s + 2R)^d, ∞) = O(m^{−d−ε}) for some s, ε > 0.

N2: There exist η > 0 and L1 < ∞ such that g(k)(h1, . . . , hk−1) ≤ L1 for k = 2, . . . , 2(2 + ⌈η⌉) and all h1, . . . , hk−1 ∈ Rd.

N3: lim inf_{n→∞} τn² > 0.

N4: There exists L2 < ∞ such that |fn(h)| ≤ L2 for all n ≥ 1 and h ∈ B^R_rmin.

The conditions N1–N3 are standard in the point process literature; see e.g. the discussions in Waagepetersen and Guan (2009) and Coeurjolly and Møller (2014). The condition N3 is difficult to verify and is usually left as an assumption, see Waagepetersen and Guan (2009), Coeurjolly and Møller (2014) and Dvořák and Prokešová (2016). However, at least in the stationary case, and in case of estimation of θk, the expression for Var(θ̂k,n) in Section B of the supplementary material shows that τn² = |Wn| Var(θ̂k,n) converges to a constant, which supports the plausibility of condition N3. We discuss N4 in further detail below when applying the general framework to θ̂k,n and go,n. The following theorem is proved in Section C of the supplementary material.

Theorem 4.1. Under conditions V1 and N1–N4, τn^{−1} |Wn|^{1/2} {Sn − E(Sn)} → N(0, 1) in distribution.


4.4 Application to θ̂k,n and go,n

In case of estimation of θk, θ̂k,n = Sn with fn(h) = φk(‖h‖ − rmin) w(‖h‖ − rmin)/‖h‖^{d−1}. The assumption N4 is then straightforwardly seen to hold in case of the Fourier-Bessel basis, where |φk(r)| ≤ |φk(0)| and w(r) = r^{d−1}. For the cosine basis, N4 does not hold in general and further assumptions are needed, cf. the discussion in Section 3.2. For simplicity we here just assume rmin > 0. Thus we state the following

Corollary 1. Assume V1, N1–N4 and, in case of the cosine basis, rmin > 0. Then

{Var(θ̂k,n)}^{−1/2} (θ̂k,n − θk) → N(0, 1) in distribution.

For go,n(r; b) = Sn,

fn(h) = ϕn(h, r) w(‖h‖ − rmin)/‖h‖^{d−1} = {w(‖h‖ − rmin)/‖h‖^{d−1}} Σ_{k=1}^{Kn} bk(ψn) φk(‖h‖ − rmin) φk(r − rmin),

where ϕn is defined in (4.2). In this case, fn is typically not uniformly bounded, since the number of not necessarily decreasing terms in the sum defining ϕn in (4.2) grows with n. We therefore introduce one more condition:

N5: There exist ω > 0 and Mω < ∞ such that

Kn^{−ω} Σ_{k=1}^{Kn} bk(ψn) |φk(r − rmin) φk(‖h‖ − rmin)| ≤ Mω

for all h ∈ B^R_rmin.

Given N5, we can simply rescale: S̃n = Kn^{−ω} Sn and τ̃n² = Kn^{−2ω} τn². Then, assuming lim inf_{n→∞} τ̃n² > 0, Theorem 4.1 gives the asymptotic normality of τ̃n^{−1} |Wn|^{1/2} {S̃n − E(S̃n)}, which is equal to τn^{−1} |Wn|^{1/2} {Sn − E(Sn)}. Hence we obtain

Corollary 2. Assume V1, N1–N2, N5 and lim inf_{n→∞} Kn^{−2ω} τn² > 0. In case of the cosine basis, assume further that rmin > 0. Then, for r ∈ (rmin, rmin + R),

τn^{−1} |Wn|^{1/2} [go,n(r; b) − E{go,n(r; b)}] → N(0, 1) in distribution.

In case of the simple smoothing scheme bk(ψn) = 1(k ≤ Kn), we take ω = 1 for the cosine basis. For the Fourier-Bessel basis we take ω = 4/3 when d = 1 and ω = d/2 + 2/3 when d > 1 (see the derivations in Section F of the supplementary material).


5 Tuning the smoothing scheme

In practice we choose K, and other parameters in the smoothing scheme b(ψ), by minimizing an estimate of the mean integrated squared error. This is equivalent to minimizing

I(ψ) = mise(go, w)/σd − ∫_{rmin}^{rmin+R} {g(r) − 1}² w(r − rmin) dr = Σ_{k=1}^K [bk(ψ)² E{(θ̂k)²} − 2bk(ψ) θk²].  (5.1)

In practice we must replace (5.1) by an estimate. Define the estimator θ̂²k of θk² as

θ̂²k = (1/σd²) Σ≠_{u,v,u′,v′∈XW; v−u, v′−u′∈B^R_rmin} φk(‖v − u‖ − rmin) φk(‖v′ − u′‖ − rmin) w(‖v − u‖ − rmin) w(‖v′ − u′‖ − rmin) / {ρ(u)ρ(v)ρ(u′)ρ(v′) ‖v − u‖^{d−1} ‖v′ − u′‖^{d−1} |W ∩ W_{v−u}| |W ∩ W_{v′−u′}|}.

Then, referring to the set-up in Section 4 and assuming V3,

lim_{n→∞} E(θ̂²k,n) = {∫_0^R g(r + rmin) φk(r) w(r) dr}² = θk²

(see Section D of the supplementary material), and hence θ̂²k,n is an asymptotically unbiased estimator of θk². The estimator θ̂²k is obtained from (θ̂k)² by retaining only the terms in which all four points u, v, u′, v′ involved are distinct. In simulation studies, θ̂²k had a smaller root mean squared error than (θ̂k)² for estimation of θk². Thus

Î(ψ) = Σ_{k=1}^K {bk(ψ)² (θ̂k)² − 2bk(ψ) θ̂²k}  (5.2)

is an asymptotically unbiased estimator of (5.1). Moreover, (5.2) is equivalent to the following slight modification of Guan (2007a)'s criterion (2.3):

∫_{rmin}^{rmin+R} {go(r; b)}² w(r − rmin) dr − (2/σd) Σ≠_{u,v∈XW, v−u∈B^R_rmin} go^{−{u,v}}(‖v − u‖; b) w(‖v − u‖ − rmin) / {ρ(u)ρ(v) ‖v − u‖^{d−1} |W ∩ W_{v−u}|}.

For the simple smoothing scheme bk(K) = 1(k ≤ K), (5.2) reduces to

Î(K) = Σ_{k=1}^K {(θ̂k)² − 2θ̂²k} = Σ_{k=1}^K (θ̂k)² (1 − 2b̂*k),  (5.3)

where b̂*k = θ̂²k/(θ̂k)² is an estimator of b*k in (3.5).


In practice, uncertainties in θ̂k and θ̂²k lead to numerical instabilities in the minimization of (5.2) with respect to ψ. To obtain a numerically stable procedure, we first determine K as

K = inf{2 ≤ k ≤ Kmax : (θ̂k+1)² − 2θ̂²k+1 > 0} = inf{2 ≤ k ≤ Kmax : b̂*k+1 < 1/2}.  (5.4)

That is, K is the first local minimum of (5.3) larger than 1 and smaller than an upper limit Kmax, which we chose to be 49 in the applications. This choice of K is also used for the refined and the Wahba smoothing schemes. For the refined smoothing scheme we thus let bk = 1(k ≤ K) b̂*k. For the Wahba smoothing scheme, bk = 1(k ≤ K)/(1 + c1 k^{c2}), where c1 and c2 minimize Σ_{k=1}^K {(θ̂k)²/(1 + c1 k^{c2})² − 2θ̂²k/(1 + c1 k^{c2})} over c1 > 0 and c2 > 1.

6 Simulation study

Figure 6.1: Pair correlation functions g(r) plotted against r for the point processes considered in the simulation study (panels: Poisson, Thomas, VarGamma, DPP).

We compare the performance of the orthogonal series estimators and the kernel estimators for data simulated on W = [0, 1]² or W = [0, 2]² from four point processes with constant intensity ρ = 100. More specifically, we consider nsim = 1000 realizations from a Poisson process, a Thomas process (parent intensity κ = 25, dispersion standard deviation ω = 0.0198), a Variance Gamma cluster process (parent intensity κ = 25, shape parameter ν = −1/4, dispersion parameter ω = 0.01845, Jalilian et al., 2013), and a determinantal point process with pair correlation function g(r) = 1 − exp{−2(r/α)²} and α = 0.056. The pair correlation functions of these point processes are shown in Figure 6.1.

For each realization, g(r) is estimated for r in (rmin, rmin + R), with rmin = 10−3 and R = 0.06, 0.085, 0.125, using the kernel estimators gk(r; b), gd(r; b) and gc(r; b) or the orthogonal series estimator go(r; b). The Epanechnikov kernel with bandwidth b = 0.15/√ρ̂ is used for gk(r; b) and gd(r; b), while the bandwidth of gc(r; b) is chosen by minimizing Guan (2007a)'s estimate (2.3) of the mean integrated squared error. For the orthogonal series estimator, we consider both the cosine and the Fourier-Bessel bases with the simple, refined or Wahba smoothing schemes. For the Fourier-Bessel basis we use the modified orthogonal series estimator described in Section 3.4. The parameters for the smoothing schemes are chosen according to Section 5.


From the simulations we estimate the mean integrated squared error (2.2), with w(r) = 1, of each estimator gm, m = k, d, c, o, over the intervals [rmin, 0.025] (small spatial lags) and [rmin, rmin + R] (all lags). We consider the kernel estimator gk as the baseline estimator and compare each of the other estimators g with gk using the log relative efficiency eI(g) = log{miseI(gk)/miseI(g)}, where miseI(g) denotes the estimated mean integrated squared error over the interval I for the estimator g. Thus eI(g) > 0 indicates that g outperforms gk on the interval I. Results for W = [0, 1]² are summarized in Figure 6.2.

Figure 6.2: Plots of log relative efficiencies for small lags (rmin, 0.025] and all lags (rmin, R], R = 0.06, 0.085, 0.125, and W = [0, 1]². Black: kernel estimators. Blue and red: orthogonal series estimators with Bessel respectively cosine basis. Lines serve to ease visual interpretation.

For all types of point processes, the orthogonal series estimators outperform or do as well as the kernel estimators, both at small lags and over all lags. The detailed conclusions depend on whether the non-repulsive Poisson, Thomas and Var Gamma processes or the repulsive determinantal process is considered. Orthogonal-Bessel with refined or Wahba smoothing is superior for Poisson, Thomas and Var Gamma but only better than gc for the determinantal point process. The performance of the orthogonal-cosine estimator is between or better than the performance of the kernel estimators for Poisson, Thomas and Var Gamma, and is as good as the best kernel estimator for determinantal. Regarding the kernel estimators, gc is better than gd for Poisson, Thomas and Var Gamma, and worse than gd for determinantal. The above conclusions are stable over the three R values considered. For W = [0, 2]² (see Figure G.1 in the supplementary material) the conclusions are similar, but with clearer superiority of the orthogonal series estimators for Poisson and Thomas. For Var Gamma the performance of gc is similar to the orthogonal series estimators. For determinantal and W = [0, 2]², gc is better than orthogonal-Bessel-refined/Wahba but still inferior to orthogonal-Bessel-simple and orthogonal-cosine.

Figures G.2 and G.3 in the supplementary material give a more detailed insight into the bias and variance properties of gk, gc and the orthogonal series estimators with the simple smoothing scheme. Table G.1 in the supplementary material shows that the selected K in general increases when the observation window is enlarged, as required for the asymptotic results. The general conclusion, taking into account the simulation results for all four types of point processes, is that the best overall performance is obtained with orthogonal-Bessel-simple, orthogonal-cosine-refined or orthogonal-cosine-Wahba.

To supplement our theoretical results in Section 4, we consider the distribution of the simulated go(r; b) for r = 0.025 and r = 0.1 in case of the Thomas process, using the Fourier-Bessel basis with the simple smoothing scheme. In addition to W = [0, 1]² and W = [0, 2]², also W = [0, 3]² is considered. The mean, standard error, skewness and kurtosis of go(r; b) are given in Table 6.1, while histograms of the estimates are shown in Figure G.4. The standard error of go(r; b) scales as |W|^{−1/2}, in accordance with our theoretical results. Also, the bias decreases and the distributions of the estimates become increasingly normal as |W| increases.

Table 6.1: Monte Carlo mean, standard error, skewness (S) and kurtosis (K) of go(r) using the Bessel basis with the simple smoothing scheme in case of the Thomas process on observation windows W1 = [0, 1]², W2 = [0, 2]² and W3 = [0, 3]².

         r       g(r)     E{go(r)}   [Var{go(r)}]^{1/2}   S{go(r)}   K{go(r)}
  W1   0.025    3.972      3.961          0.923             1.145      5.240
  W1   0.1      1.219      1.152          0.306             0.526      3.516
  W2   0.025    3.972      3.959          0.467             0.719      4.220
  W2   0.1      1.219      1.187          0.150             0.691      4.582
  W3   0.025    3.972      3.949          0.306             0.432      3.225
  W3   0.1      1.2187     1.2017         0.0951            0.2913     2.9573

7 Application

We consider point patterns of the locations of the Acalypha diversifolia (528 trees), Lonchocarpus heptaphyllus (836 trees) and Capparis frondosa (3299 trees) species in the 1995 census for the 1000 m × 500 m Barro Colorado Island plot (Hubbell and Foster, 1983; Condit, 1998). To estimate the intensity function of each species, we use a log-linear regression model depending on soil condition variables (contents of copper, mineralized nitrogen, potassium and phosphorus, and soil acidity) and topographical variables (elevation, slope gradient, multiresolution index of valley bottom flatness, incoming mean solar radiation and the topographic wetness index). The regression parameters are estimated using the quasi-likelihood approach in Guan et al. (2015). The point patterns and fitted intensity functions are shown in Figure H.1 in the supplementary material.

The pair correlation function of each species is then estimated using the bias corrected kernel estimator gc(r; b), with b determined by minimizing (2.3), and the orthogonal series estimator go(r; b) with both the Fourier-Bessel and cosine bases, the refined smoothing scheme and the optimal cut-offs K obtained from (5.4); see Figure 7.1.


For Lonchocarpus the three estimates are quite similar, while for Acalypha and Capparis the estimates deviate markedly for small lags and then become similar for lags greater than respectively 2 and 8 meters. For Capparis and the cosine basis, the number of selected coefficients coincides with the chosen upper limit of 49 coefficients. The cosine estimate displays oscillations which appear to be artefacts of using high frequency components of the cosine basis. The function (5.3) decreases very slowly after K = 7, so we also tried the cosine estimate with K = 7, which gives a more reasonable estimate.

Figure 7.1: Estimated pair correlation functions (g(r) − 1 against r, distances in meters) for the tropical rain forest trees. Panels: Acalypha diversifolia (kernel gc(r; b = 1.35); Bessel, K = 7, refined; cosine, K = 11, refined), Lonchocarpus heptaphyllus (kernel gc(r; b = 9.64); Bessel, K = 7, refined; cosine, K = 6, refined) and Capparis frondosa (kernel gc(r; b = 5.13); Bessel, K = 7, refined; cosine, K = 49 and K = 7, refined).

Acknowledgement

Rasmus Waagepetersen is supported by the Danish Council for Independent Research | Natural Sciences, grant "Mathematical and Statistical Analysis of Spatial Data", and by the "Centre for Stochastic Geometry and Advanced Bioimaging", funded by the Villum Foundation.

References

Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R. Chapman & Hall/CRC Interdisciplinary Statistics. Chapman & Hall/CRC, Boca Raton.

Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000). Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54:329–350.

Biscio, C. A. N. and Waagepetersen, R. (2016). A central limit theorem for α-mixing spatial point processes. In preparation.

Coeurjolly, J.-F. and Møller, J. (2014). Variational approach for spatial point process intensity estimation. Bernoulli, 20(3):1097–1125.

Coeurjolly, J.-F., Møller, J., and Waagepetersen, R. (2016). A tutorial on Palm distributions for spatial point processes. International Statistical Review. To appear.


Condit, R. (1998). Tropical forest census plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany and Georgetown, Texas.

Dvořák, J. and Prokešová, M. (2016). Asymptotic properties of the minimum contrast estimators for projections of inhomogeneous space-time shot-noise Cox processes. Applications of Mathematics, 61(4):387–411.

Efromovich, S. (2010). Orthogonal series density estimation. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4):467–476.

Guan, Y. (2007a). A least-squares cross-validation bandwidth selection approach in pair correlation function estimations. Statistics and Probability Letters, 77(18):1722–1729.

Guan, Y. (2007b). A composite likelihood cross-validation approach in selecting bandwidth for the estimation of the pair correlation function. Scandinavian Journal of Statistics, 34(2):336–346.

Guan, Y., Jalilian, A., and Waagepetersen, R. (2015). Quasi-likelihood for spatial point processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3):677–697.

Hall, P. (1987). Cross-validation and the smoothing of orthogonal series density estimators. Journal of Multivariate Analysis, 21(2):189–206.

Hubbell, S. P. and Foster, R. B. (1983). Diversity of canopy trees in a Neotropical forest and implications for the conservation of tropical trees. In Sutton, S. L., Whitmore, T. C., and Chadwick, A. C., editors, Tropical Rain Forest: Ecology and Management, pages 25–41. Blackwell Scientific Publications, Oxford.

Illian, J., Penttinen, A., Stoyan, H., and Stoyan, D. (2008). Statistical Analysis and Modelling of Spatial Point Patterns, volume 76. Wiley, London.

Ivanoff, G. (1982). Central limit theorems for point processes. Stochastic Processes and their Applications, 12(2):171–186.

Jalilian, A., Guan, Y., and Waagepetersen, R. (2013). Decomposition of variance for spatial Cox processes. Scandinavian Journal of Statistics, 40(1):119–137.

Landau, L. (2000). Monotonicity and bounds on Bessel functions. In Proceedings of the Symposium on Mathematical Physics and Quantum Field Theory, volume 4, pages 147–154. Southwest Texas State University, San Marcos, TX.

Lavancier, F., Møller, J., and Rubak, E. (2015). Determinantal point process models and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77:853–877.

Loh, J. M. and Jang, W. (2010). Estimating a cosmological mass bias parameter with bootstrap bandwidth selection. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(5):761–779.

McGown, K. J. and Parks, H. R. (2007). The generalization of Faulhaber's formula to sums of non-integral powers. Journal of Mathematical Analysis and Applications, 330(1):571–575.


Møller, J. and Waagepetersen, R. P. (2003). Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton.

Politis, D. N., Paparoditis, E., and Romano, J. P. (1998). Large sample inference for irregularly spaced dependent observations based on subsampling. Sankhya: The Indian Journal of Statistics, 60(2):274–292.

Stoyan, D. and Stoyan, H. (1994). Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics. Wiley.

Stoyan, D. and Stoyan, H. (1996). Estimating pair correlation functions of planar cluster processes. Biometrical Journal, 38(3):259–271.

Waagepetersen, R. and Guan, Y. (2009). Two-step estimation for inhomogeneous spatial point processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3):685–702.

Wahba, G. (1981). Data-based optimal smoothing of orthogonal series density estimates. Annals of Statistics, 9:146–156.

Watson, G. N. (1995). A Treatise on the Theory of Bessel Functions. Cambridge University Press.

Yue, Y. R. and Loh, J. M. (2013). Bayesian nonparametric estimation of pair correlation function for inhomogeneous spatial point processes. Journal of Nonparametric Statistics, 25(2):463–474.

Supplementary material

Supplementary material includes proofs of consistency and asymptotic normality results and details of the simulation study and data analysis.

Supplementary material for 'Orthogonal series estimation of the pair correlation function of a spatial point process'

A Expanding observation window

This section states a few results on the asymptotic behavior of the edge correction \(e_n(h)\) (defined in (4.1) in the main document) and related ratios. For each \(n \geq 1\) and \(h, h_1, h_2 \in \mathbb{R}^d\),

\[
|W_n \cap (W_n)_h| = \begin{cases} \prod_{i=1}^d (2na_i - |h_i|) & |h_i| < 2na_i,\ i = 1, \dots, d\\ 0 & \text{otherwise} \end{cases}
\]
and
\[
\bigl|W_n \cap (W_n)_{h_1} \cap (W_n)_{h_2}\bigr| = \begin{cases} V_n(h_1, h_2) & \max\{|h_{1i}|, |h_{2i}|, |h_{1i} - h_{2i}|\} < 2na_i,\ i = 1, \dots, d\\ 0 & \text{otherwise} \end{cases}
\]
where \(V_n(h_1, h_2) = \prod_{i=1}^d \bigl(2na_i - \max\{0, h_{1i}, h_{2i}\} + \min\{0, h_{1i}, h_{2i}\}\bigr)\). Therefore, for any fixed \(h, h_1, h_2 \in \mathbb{R}^d\), as \(n \to \infty\),
\[
e_n(h) = \frac{|W_n \cap (W_n)_h|}{|W_n|} = \prod_{i=1}^d \Bigl(1 - \frac{|h_i|}{2na_i}\Bigr) \to 1. \tag{A.1}
\]
If \(n \geq R/\min_{1 \leq i \leq d} a_i\) or, equivalently, \(B^R_{r_{\min}} \subset W_n\), then \(1/2^d \leq e_n(h) \leq 1\) for any \(h \in B^R_{r_{\min}}\). Further,
\[
\frac{\bigl|W_n \cap (W_n)_{h_1} \cap (W_n)_{h_2}\bigr|}{|W_n|} = \prod_{i=1}^d \Bigl(1 - \frac{\max\{0, h_{1i}, h_{2i}\} - \min\{0, h_{1i}, h_{2i}\}}{2na_i}\Bigr) \to 1. \tag{A.2}
\]
Similarly, it can be shown that for any fixed \(h_1, h_2, h_3 \in \mathbb{R}^d\),
\[
\bigl|W_n \cap (W_n)_{h_1} \cap (W_n)_{h_3} \cap (W_n)_{h_2+h_3}\bigr| / |W_n| \to 1.
\]

B Mean and variance of \(\hat\theta_{k,n}\)

For \(n\) large enough, \(1(h \in B^R_{r_{\min}}) > 0\) implies \(|W_n \cap (W_n)_h| > 0\). Then, by the second order Campbell formula (see Møller and Waagepetersen, 2003, Section C.2.1),
\begin{align*}
E(\hat\theta_{k,n}) &= \frac{1}{\mathrm{sa}_d |W_n|} \int_{W_n^2} 1(v - u \in B^R_{r_{\min}}) \frac{\varphi_k(\|v-u\| - r_{\min}) w(\|v-u\| - r_{\min})}{\|v-u\|^{d-1} e_n(v-u)} g(\|v-u\|)\, \mathrm{d}u \mathrm{d}v\\
&= \int_{\mathbb{R}^d} 1(h \in B^R_{r_{\min}}) g(\|h\|) \frac{\varphi_k(\|h\| - r_{\min}) w(\|h\| - r_{\min})}{\mathrm{sa}_d \|h\|^{d-1}} \int_{\mathbb{R}^d} \frac{1\{v \in W_n \cap (W_n)_h\}}{|W_n \cap (W_n)_h|}\, \mathrm{d}v\, \mathrm{d}h\\
&= \int_{r_{\min}}^{r_{\min}+R} g(r) \varphi_k(r - r_{\min}) w(r - r_{\min})\, \mathrm{d}r = \theta_k.
\end{align*}

Let \(f_k(h) = \varphi_k(\|h\| - r_{\min}) w(\|h\| - r_{\min}) / \|h\|^{d-1}\). Then, by the second to fourth order Campbell formulae (omitting the details),
\[
\mathrm{Var}(\hat\theta_{k,n}) = \frac{1}{(\mathrm{sa}_d)^2 |W_n|} \biggl[ 2 \int_{B^R_{r_{\min}}} g(h) f_k^2(h) G^{(2)}_n(h)\, \mathrm{d}h + 4 \int_{B^R_{r_{\min}}} \int_{B^R_{r_{\min}}} g^{(3)}(h_1, h_2) f_k(h_1) f_k(h_2) G^{(3)}_n(h_1, h_2)\, \mathrm{d}h_1 \mathrm{d}h_2 + \int_{B^R_{r_{\min}}} \int_{B^R_{r_{\min}}} f_k(h_1) f_k(h_2) G^{(4)}_n(h_1, h_2)\, \mathrm{d}h_1 \mathrm{d}h_2 \biggr], \tag{B.1}
\]

17

Page 19: 2017 - data.math.au.dk

where
\begin{align*}
G^{(2)}_n(h) &= \frac{1}{e_n^2(h)} \biggl[\frac{1}{|W_n|} \int_{W_n} \frac{1[u + h \in W_n]}{\rho(u)\rho(u+h)}\, \mathrm{d}u\biggr],\\
G^{(3)}_n(h_1, h_2) &= \frac{1}{e_n(h_1) e_n(h_2)} \biggl[\frac{1}{|W_n|} \int_{W_n} \frac{1[u \in (W_n)_{h_1} \cap (W_n)_{h_2}]}{\rho(u)}\, \mathrm{d}u\biggr],\\
G^{(4)}_n(h_1, h_2) &= \frac{1}{e_n(h_1) e_n(h_2)} \int_{\mathbb{R}^d} \bigl[g^{(4)}(h_1, u, h_2 + u) - g(h_1) g(h_2)\bigr] \frac{\bigl|W_n \cap (W_n)_u \cap (W_n)_{h_1} \cap (W_n)_{h_2}\bigr|}{|W_n|}\, \mathrm{d}u.
\end{align*}

For \(n \geq R/\min_{1 \leq i \leq d} a_i\) and any \(h, h_1, h_2 \in B^R_{r_{\min}}\), by V1,
\begin{align*}
G^{(2)}_n(h) &= \frac{1}{e_n^2(h)} \biggl[\frac{1}{|W_n|} \int_{W_n} \frac{1[u + h \in W_n]}{\rho(u)\rho(u+h)}\, \mathrm{d}u\biggr] \leq \frac{1}{\rho_{\min}^2 e_n(h)} \leq \frac{2^d}{\rho_{\min}^2},\\
G^{(3)}_n(h_1, h_2) &= \frac{1}{e_n(h_1) e_n(h_2)} \biggl[\frac{1}{|W_n|} \int_{W_n} \frac{1[u \in (W_n)_{h_1} \cap (W_n)_{h_2}]}{\rho(u)}\, \mathrm{d}u\biggr] \leq \frac{1}{\rho_{\min} e_n(h_2)} \leq \frac{2^d}{\rho_{\min}},
\end{align*}
and by V3,
\begin{align*}
\bigl|G^{(4)}_n(h_1, h_2)\bigr| &\leq \frac{1}{e_n(h_1) e_n(h_2)} \int_{\mathbb{R}^d} \bigl|g^{(4)}(h_1, u, h_2 + u) - g(h_1) g(h_2)\bigr| \frac{\bigl|W_n \cap (W_n)_u \cap (W_n)_{h_1} \cap (W_n)_{h_2}\bigr|}{|W_n|}\, \mathrm{d}u\\
&\leq \frac{1}{e_n(h_1)} \int_{\mathbb{R}^d} \bigl|g^{(4)}(h_1, u, h_2 + u) - g(h_1) g(h_2)\bigr|\, \mathrm{d}u\\
&\leq 2^d \int_{\mathbb{R}^d} \bigl|g^{(4)}(h_1, u, h_2 + u) - g(h_1) g(h_2)\bigr|\, \mathrm{d}u \leq 2^d C_4.
\end{align*}

Thus, using V2, for all \(k \geq 1\) and \(n \geq R/\min_{1 \leq i \leq d} a_i\),
\[
\mathrm{Var}(\hat\theta_{k,n}) \leq \frac{1}{(\mathrm{sa}_d)^2 |W_n|} \biggl[\frac{2^{d+1}}{\rho_{\min}^2} \int_{B^R_{r_{\min}}} f_k^2(h) g(\|h\|)\, \mathrm{d}h + \frac{2^{d+2}}{\rho_{\min}} C_3 \biggl(\int_{B^R_{r_{\min}}} \bigl|f_k(h)\bigr|\, \mathrm{d}h\biggr)^2 + 2^d C_4 \biggl(\int_{B^R_{r_{\min}}} \bigl|f_k(h)\bigr|\, \mathrm{d}h\biggr)^2\biggr].
\]
But for all \(k \geq 1\),
\begin{align*}
\int_{B^R_{r_{\min}}} f_k^2(h) g(\|h\|)\, \mathrm{d}h &= \int_{r_{\min}}^{r_{\min}+R} \varphi_k^2(r - r_{\min}) g(r) \frac{w^2(r - r_{\min})}{r^{d-1}}\, \mathrm{d}r\\
&= \int_0^R \varphi_k^2(r) g(r + r_{\min}) \frac{w^2(r)}{(r + r_{\min})^{d-1}}\, \mathrm{d}r \leq C_2 \int_0^R \varphi_k^2(r) w(r)\, \mathrm{d}r = C_2,
\end{align*}


and by Hölder's inequality,
\begin{align*}
\int_{B^R_{r_{\min}}} \bigl|f_k(h)\bigr|\, \mathrm{d}h &= \int_{B^R_{r_{\min}}} \bigl|\varphi_k(\|h\| - r_{\min})\bigr| \frac{w(\|h\| - r_{\min})}{\|h\|^{d-1}}\, \mathrm{d}h = \int_{r_{\min}}^{r_{\min}+R} \bigl|\varphi_k(r - r_{\min})\bigr| w(r - r_{\min})\, \mathrm{d}r\\
&\leq \biggl(\int_0^R \varphi_k^2(r) w(r)\, \mathrm{d}r\biggr)^{1/2} \biggl(\int_0^R w(r)\, \mathrm{d}r\biggr)^{1/2} = \biggl(\int_0^R w(r)\, \mathrm{d}r\biggr)^{1/2} < \infty.
\end{align*}

Therefore
\[
\mathrm{Var}(\hat\theta_{k,n}) < \frac{1}{(\mathrm{sa}_d)^2 |W_n|} \biggl[\frac{2^{d+1}}{\rho_{\min}^2} C_2 + \biggl(\frac{2^{d+2}}{\rho_{\min}} C_3 + 2^d C_4\biggr) \biggl(\int_0^R w(r)\, \mathrm{d}r\biggr)\biggr] = \frac{C_1}{|W_n|},
\]
where
\[
C_1 = \frac{1}{(\mathrm{sa}_d)^2} \biggl[\frac{2^{d+1}}{\rho_{\min}^2} C_2 + \biggl(\frac{2^{d+2}}{\rho_{\min}} C_3 + 2^d C_4\biggr) \biggl(\int_0^R w(r)\, \mathrm{d}r\biggr)\biggr] > 0.
\]

C Proof of asymptotic normality

In this section we give a proof of Theorem 4.1. For \(t = (t_1, \dots, t_d) \in \mathbb{Z}^d\), let
\[
\Delta(t) = \times_{i=1}^d \bigl(s(t_i - 1/2),\, s(t_i + 1/2)\bigr]
\]
be the hyper-square with side length \(s\) centered at \(st\). Then \(\{\Delta(t) : t \in \mathbb{Z}^d\}\) is a partition of \(\mathbb{R}^d\); i.e., \(\Delta(t_1) \cap \Delta(t_2) = \emptyset\) for \(t_1 \neq t_2\) and \(\cup_{t \in \mathbb{Z}^d} \Delta(t) = \mathbb{R}^d\), and \(|\Delta(t) \oplus R| = (s + 2R)^d\), where
\[
\Delta(t) \oplus R = \times_{i=1}^d \bigl(s(t_i - 1/2) - R,\, s(t_i + 1/2) + R\bigr].
\]
Let \(T_n = \{t \in \mathbb{Z}^d : \Delta(t) \cap W_n \neq \emptyset\}\) and define
\[
Y_n(t) = \sum_{u \in X \cap \Delta(t)} \sum_{\substack{v \in X \setminus \{u\}\\ v - u \in B^R_{r_{\min}}}} \frac{f_n(v - u)\, 1(u \in W_n, v \in W_n)}{\rho(u)\rho(v) e_n(v - u)}.
\]


Then, since \(X = \bigcup_{t \in \mathbb{Z}^d} \bigl(X \cap \Delta(t)\bigr)\),
\begin{align*}
S_n &= \frac{1}{\mathrm{sa}_d |W_n|} \sum_{\substack{u, v \in X \cap W_n\\ v - u \in B^R_{r_{\min}}}}^{\neq} \frac{f_n(v - u)}{\rho(u)\rho(v) e_n(v - u)}\\
&= \frac{1}{\mathrm{sa}_d |W_n|} \sum_{t \in \mathbb{Z}^d} \sum_{u \in X \cap \Delta(t)} \sum_{\substack{v \in X \setminus \{u\}\\ v - u \in B^R_{r_{\min}}}} \frac{f_n(v - u)\, 1(u \in W_n, v \in W_n)}{\rho(u)\rho(v) e_n(v - u)}\\
&= \frac{1}{\mathrm{sa}_d |W_n|} \sum_{\substack{t \in \mathbb{Z}^d\\ \Delta(t) \cap W_n \neq \emptyset}} Y_n(t) = \frac{1}{\mathrm{sa}_d |W_n|} \sum_{t \in T_n} Y_n(t).
\end{align*}

Due to V1, N4 and since \(e_n(h) > 1/2^d\) for \(n\) large enough and \(h \in B^R_{r_{\min}}\),
\[
E\bigl(|Y_n(t)|^{2+\lceil\eta\rceil}\bigr) \leq E\biggl[\biggl(\sum_{u \in X \cap \Delta(t)} \sum_{\substack{v \in X \setminus \{u\}\\ v - u \in B^R_{r_{\min}}}} \frac{L_2 2^d}{\rho_{\min}^2}\biggr)^{2+\lceil\eta\rceil}\biggr].
\]
The moments \(E(|Y_n(t)|^{2+\lceil\eta\rceil})\) are thus bounded by sums of integrals involving the normalized joint intensities \(g^{(k)}(u_1, \dots, u_{k-1})\) times \((L_2 2^d/\rho_{\min}^2)^{2+\lceil\eta\rceil}\) for \(k = 2, \dots, 2(2 + \lceil\eta\rceil)\). These integrals are bounded uniformly in \(t\) and \(n\) due to assumption N2. Thus,
\[
\sup_{n \geq 1} \sup_{t \in T_n} E\bigl(|Y_n(t)|^{2+\eta}\bigr) \leq \sup_{n \geq 1} \sup_{t \in T_n} E\bigl(|Y_n(t)|^{2+\lceil\eta\rceil}\bigr) < \infty
\]
and hence \(\{|Y_n(t)|^{2+\eta} : t \in T_n,\ n \geq 1\}\) is a uniformly integrable family (triangular array) of random variables. Invoking finally N1 and N3 and letting \(\sigma_n^2 = \mathrm{Var}\{\sum_{t \in T_n} Y_n(t)\}\), it follows directly from Theorem 3.1 in Biscio and Waagepetersen (2016) that
\[
\sigma_n^{-1} \sum_{t \in T_n} \bigl[Y_n(t) - E\{Y_n(t)\}\bigr] \xrightarrow{\mathcal{D}} N(0, 1),
\]
which is equivalent to Theorem 4.1.


D Asymptotic mean of \(\hat\theta_k^2\)

Consider a real function \(f\) on \(\mathbb{R}^d \times \mathbb{R}^d\) where \(f(h_1, h_2) \neq 0\) implies \(|W_n \cap (W_n)_{h_1}| \cdot |W_n \cap (W_n)_{h_2}| > 0\). Then by the fourth order Campbell formula,
\begin{align*}
&E\biggl\{\sum_{u, v, u', v' \in X \cap W_n}^{\neq} \frac{f(v - u, v' - u')}{\rho(u)\rho(v)\rho(u')\rho(v') |W_n \cap (W_n)_{v-u}| |W_n \cap (W_n)_{v'-u'}|}\biggr\}\\
&\quad= \int_{W_n^4} \frac{f(v - u, v' - u')}{|W_n \cap (W_n)_{v-u}| |W_n \cap (W_n)_{v'-u'}|} g^{(4)}(v - u, u' - u, v' - u)\, \mathrm{d}u \mathrm{d}v \mathrm{d}u' \mathrm{d}v'\\
&\quad= \int_{(\mathbb{R}^d)^4} f(h_1, h_2) g^{(4)}(h_1, u' - u, h_2 + u' - u) \frac{1\{u \in W_n \cap (W_n)_{h_1},\ u' \in W_n \cap (W_n)_{h_2}\}}{|W_n \cap (W_n)_{h_1}| |W_n \cap (W_n)_{h_2}|}\, \mathrm{d}u \mathrm{d}h_1 \mathrm{d}u' \mathrm{d}h_2\\
&\quad= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(h_1, h_2) \frac{\int_{W_n \cap (W_n)_{h_1}} \int_{W_n \cap (W_n)_{h_2}} g^{(4)}(h_1, u' - u, h_2 + u' - u)\, \mathrm{d}u \mathrm{d}u'}{|W_n \cap (W_n)_{h_1}| |W_n \cap (W_n)_{h_2}|}\, \mathrm{d}h_1 \mathrm{d}h_2.
\end{align*}

This expectation is the sum of
\[
A = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(h_1, h_2) \biggl\{\frac{\int_{W_n \cap (W_n)_{h_1}} \int_{W_n \cap (W_n)_{h_2}} g(h_1) g(h_2)\, \mathrm{d}u \mathrm{d}u'}{|W_n \cap (W_n)_{h_1}| |W_n \cap (W_n)_{h_2}|}\biggr\}\, \mathrm{d}h_1 \mathrm{d}h_2 = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(h_1, h_2) g(h_1) g(h_2)\, \mathrm{d}h_1 \mathrm{d}h_2
\]
and
\begin{align*}
B_n &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(h_1, h_2) \frac{\int_{W_n \cap (W_n)_{h_1}} \int_{W_n \cap (W_n)_{h_2}} \bigl\{g^{(4)}(h_1, u' - u, h_2 + u' - u) - g(h_1) g(h_2)\bigr\}\, \mathrm{d}u \mathrm{d}u'}{|W_n \cap (W_n)_{h_1}| |W_n \cap (W_n)_{h_2}|}\, \mathrm{d}h_1 \mathrm{d}h_2\\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(h_1, h_2) \biggl[\int_{\mathbb{R}^d} \frac{|W_n \cap (W_n)_{h_1} \cap (W_n)_{h_3} \cap (W_n)_{h_2+h_3}|}{|W_n \cap (W_n)_{h_1}| |W_n \cap (W_n)_{h_2}|} \bigl\{g^{(4)}(h_1, h_3, h_2 + h_3) - g(h_1) g(h_2)\bigr\}\, \mathrm{d}h_3\biggr]\, \mathrm{d}h_1 \mathrm{d}h_2.
\end{align*}

We now specialize to \(f(h_1, h_2) = f_k(h_1) f_k(h_2)\), where
\[
f_k(h) = \varphi_k(\|h\| - r_{\min})\, w(\|h\| - r_{\min})\, 1(h \in B^R_{r_{\min}}) / (\mathrm{sa}_d \|h\|^{d-1}).
\]


Then, by V3,
\begin{align*}
|B_n| &\leq \int_{B^R_{r_{\min}}} \int_{B^R_{r_{\min}}} \bigl|f_k(h_1) f_k(h_2)\bigr| \biggl[\int_{\mathbb{R}^d} \frac{|W_n \cap (W_n)_{h_1} \cap (W_n)_{h_3} \cap (W_n)_{h_2+h_3}|}{|W_n \cap (W_n)_{h_1}| |W_n \cap (W_n)_{h_2}|} \bigl|g^{(4)}(h_1, h_3, h_2 + h_3) - g(h_1) g(h_2)\bigr|\, \mathrm{d}h_3\biggr]\, \mathrm{d}h_1 \mathrm{d}h_2\\
&\leq C_4 \int_{B^R_{r_{\min}}} \int_{B^R_{r_{\min}}} \frac{\bigl|f_k(h_1) f_k(h_2)\bigr|}{|W_n \cap (W_n)_{h_1}|}\, \mathrm{d}h_1 \mathrm{d}h_2.
\end{align*}
Thus \(B_n\) tends to zero as \(n \to \infty\). Regarding \(A\), we have
\begin{align*}
A &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f_k(h_1) f_k(h_2) g(h_1) g(h_2)\, \mathrm{d}h_1 \mathrm{d}h_2 = \biggl\{\int_{\mathbb{R}^d} f_k(h) g(h)\, \mathrm{d}h\biggr\}^2\\
&= \biggl\{\int_{r_{\min}}^{r_{\min}+R} g(r) \varphi_k(r - r_{\min}) w(r - r_{\min})\, \mathrm{d}r\biggr\}^2 = \theta_k^2.
\end{align*}

E Asymptotic behavior of Bessel function roots

It is known (see Watson, 1995, p. 199) that as \(r \to \infty\),
\[
J_\nu(r) \sim \Bigl(\frac{2}{\pi r}\Bigr)^{1/2} \cos\Bigl(r - \frac{\nu\pi}{2} - \frac{\pi}{4}\Bigr),
\]
which implies that
\[
\alpha_{\nu,k} = \Bigl(k + \frac{\nu}{2} - \frac{1}{4}\Bigr)\pi + O(k^{-1}), \quad \text{as } k \to \infty, \tag{E.1}
\]
and \(\alpha_{\nu,k} \to \infty\) as \(k \to \infty\). We can argue that for large \(k\),
\begin{align*}
J_{\nu+1}(\alpha_{\nu,k}) &\approx \Bigl(\frac{2}{\pi\alpha_{\nu,k}}\Bigr)^{1/2} \cos\Bigl(\alpha_{\nu,k} - \frac{\pi}{4} - \frac{(\nu+1)\pi}{2}\Bigr)\\
&= \Bigl(\frac{2}{\pi\alpha_{\nu,k}}\Bigr)^{1/2} \cos\bigl((k-1)\pi + O(k^{-1})\bigr),
\end{align*}
and consequently
\[
\bigl|J_{\nu+1}(\alpha_{\nu,k})\bigr| \approx \Bigl(\frac{2}{\pi\alpha_{\nu,k}}\Bigr)^{1/2}. \tag{E.2}
\]
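Both approximations can be checked against tabulated Bessel zeros; a quick numerical sanity check for ν = 0 (the planar case d = 2), using SciPy's Bessel routines:

```python
import numpy as np
from scipy.special import jn_zeros, jv

# Check (E.1) and (E.2) numerically for nu = 0.
nu = 0
k = np.arange(1, 51)
alpha = jn_zeros(nu, 50)                  # first 50 positive zeros of J_0
approx = (k + nu / 2.0 - 0.25) * np.pi    # leading term of (E.1)
err_e1 = np.abs(alpha - approx)           # should decay like O(1/k)

lhs = np.abs(jv(nu + 1, alpha))           # |J_{nu+1}(alpha_{nu,k})|
rhs = np.sqrt(2.0 / (np.pi * alpha))      # right-hand side of (E.2)
```

The error of (E.1) shrinks from about 0.05 at k = 1 to below 1e-3 at k = 50, and (E.2) holds to a few parts in ten thousand for large k.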

F Order of sum of products of Bessel basis functions

In this section we consider the Bessel basis in the case \(r_{\min} = 0\). For \(\nu \geq 0\) and \(r > 0\) (Landau, 2000),
\[
\bigl|J_\nu(r)\bigr| \leq \frac{c}{r^{1/3}},
\]
where \(c = 0.7857468704\ldots\), and hence
\[
\bigl|\varphi_k(r)\bigr| \leq \frac{c r^{-\nu-1/3} \sqrt{2}}{(R^2 \alpha_{\nu,k})^{1/3} \bigl|J_{\nu+1}(\alpha_{\nu,k})\bigr|}, \quad r > 0,\ k \geq 1.
\]
Using (E.1) and (E.2), for large \(k\),
\[
\frac{c r^{-\nu-1/3} \sqrt{2}}{(R^2 \alpha_{\nu,k})^{1/3} \bigl|J_{\nu+1}(\alpha_{\nu,k})\bigr|} \approx \frac{c \sqrt{\pi} r^{-\nu-1/3}}{R^{2/3}} \alpha_{\nu,k}^{1/6} \approx \frac{c \sqrt{\pi} r^{-\nu-1/3}}{R^{2/3}} \Bigl\{\Bigl(k + \frac{\nu}{2} - \frac{1}{4}\Bigr)\pi\Bigr\}^{1/6}.
\]
Since \(\lim_{r \to 0} J_\nu(r) r^{-\nu} = \{\Gamma(\nu+1) 2^\nu\}^{-1}\), we also obtain for large \(k\) and \(0 < \|h\| < R\),
\[
|\varphi_k(\|h\|)| \leq \mathrm{const} \Bigl(\frac{R}{\alpha_{\nu,k}}\Bigr)^{-\nu} \frac{\sqrt{2}}{R J_{\nu+1}(\alpha_{\nu,k})} \approx \mathrm{const}\, \frac{\sqrt{\pi}\, \alpha_{\nu,k}^{\nu+1/2}}{R^{\nu+1}} = O(k^{\nu+1/2}).
\]
Thus, for fixed \(r\) and \(0 < \|h\| < R\),
\[
|\varphi_k(r) \varphi_k(\|h\|)| = O\bigl(k^{1/6 + \max(1/6,\, \nu+1/2)}\bigr) = O\bigl(k^{1/6 + \max(1/6,\, d/2 - 1/2)}\bigr).
\]
By the generalization of Faulhaber's formula (McGown and Parks, 2007),
\[
\sum_{k=1}^{K_n} k^p = O(K_n^{p+1}), \quad p > -1.
\]
Therefore,
\[
\frac{1}{K_n^\omega} \sum_{k=1}^{K_n} \bigl|\varphi_k(r) \varphi_k(\|h\|)\bigr| = \begin{cases} O(K_n^{4/3 - \omega}) & d = 1\\ O(K_n^{d/2 + 2/3 - \omega}) & d > 1 \end{cases}
\]
for \(\omega > 0\) and \(0 < r, \|h\| < R\).
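The Faulhaber-type bound is easy to confirm numerically: the normalized partial sum of k^p converges to 1/(p + 1), so the sum indeed grows like K^{p+1}. The exponents below are illustrative choices in the range used above.

```python
import numpy as np

def normalized_power_sum(p, K):
    # sum_{k=1}^K k^p divided by K^{p+1}; tends to 1/(p+1) for p > -1.
    k = np.arange(1, K + 1, dtype=float)
    return float(np.sum(k ** p) / K ** (p + 1))

# Illustrative non-integral exponents of the kind appearing in the bound above.
ratios = {p: normalized_power_sum(p, 100000) for p in (1.0 / 3.0, 1.0 / 2.0, 4.0 / 3.0)}
```

By comparison with the integral of x^p over [0, K], the error of the limit 1/(p + 1) is of order 1/K, which is negligible for K = 100000.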

G Simulation study

The results in Figure G.1 are obtained as for the simulation study in the main document but with W = [0, 2]². The estimated cut-offs K are summarized in Table G.1. Simulation means and 95% envelopes for gk, gc and go with both the Fourier-Bessel and the cosine basis and the simple smoothing scheme are shown in Figure G.2 and Figure G.3. Figure G.4 shows histograms of orthogonal series estimates. Comments on these figures and the table are given in the main document.


[Figure G.1 about here: log relative efficiency with respect to gk, plotted against R = 0.06, 0.085, 0.125, for the Thomas, Poisson, DPP and Variance Gamma processes over the lag ranges (rmin, .025] and (rmin, R], with simple, refined and Wahba smoothing types.]

Figure G.1: Plots of log relative efficiencies for small lags (rmin, 0.025] and all lags (rmin, R], R = 0.06, 0.085, 0.125, and W = [0, 2]². Black: kernel estimators. Blue and red: orthogonal series estimators with Bessel respectively cosine basis. Lines serve to ease visual interpretation.

Table G.1: Monte Carlo mean and quantiles of the estimated cut-off K obtained from (5.4) for the orthogonal series estimator with Bessel (go1) and cosine (go4) basis in the case of Poisson (P), Thomas (T), Variance Gamma (V) and determinantal (D) point processes on observation windows W1 = [0, 1]² and W2 = [0, 2]² with rmin = 0.001.

                     R = 0.06                  R = 0.085                 R = 0.125
             E(K)  q0.05(K) q0.95(K)   E(K)  q0.05(K) q0.95(K)   E(K)  q0.05(K) q0.95(K)
W1  P  go1   2.17   2.00    3.00       2.15   2.00    3.00       2.11   2.00    3.00
W1  P  go4   2.17   2.00    4.00       2.15   2.00    4.00       2.11   2.00    4.00
W1  T  go1   2.24   2.00    3.00       2.24   2.00    3.00       3.23   2.00    4.05
W1  T  go4   2.24   2.00    4.00       2.24   2.00    4.00       3.23   3.00    5.00
W1  V  go1   2.77   2.00    4.00       3.50   2.00    6.00       4.85   3.00    8.00
W1  V  go4   2.77   2.00    5.00       3.50   2.00    7.00       4.85   3.00   10.00
W1  D  go1   2.21   2.00    3.00       2.18   2.00    3.00       2.38   2.00    3.00
W1  D  go4   2.21   2.00    3.00       2.18   2.00    4.00       2.38   2.00    5.00
W2  P  go1   2.17   2.00    3.00       2.12   2.00    3.00       2.09   2.00    3.00
W2  P  go4   2.17   2.00    4.00       2.12   2.00    4.00       2.09   2.00    4.00
W2  T  go1   2.39   2.00    4.00       2.46   2.00    4.00       3.78   3.00    5.00
W2  T  go4   2.39   2.00    5.00       2.46   2.00    5.00       3.78   3.00    6.00
W2  V  go1   3.58   3.00    5.00       5.14   3.00    8.00       7.19   5.00   11.00
W2  V  go4   3.58   3.00    9.00       5.14   4.00   12.00       7.19   5.00   17.00
W2  D  go1   2.22   2.00    4.00       2.17   2.00    3.00       2.79   2.00    4.00
W2  D  go4   2.22   2.00    3.00       2.17   2.00    4.00       2.79   3.00    5.00


[Figure G.2 about here: four rows (Poisson, Thomas, Variance Gamma, determinantal) by four columns (kernel, bias-corrected kernel, Bessel with simple scheme, cosine with simple scheme) of estimated pair correlation functions plotted against r.]

Figure G.2: True pair correlation function (solid line), Monte Carlo mean (dashed lines) and 95% pointwise probability interval (grey area) of estimates based on nsim = 1000 simulations from the Poisson (first row), Thomas (second row), Variance Gamma (third row) and determinantal (fourth row) point processes on W = [0, 1]².


[Figure G.3 about here: four rows (Poisson, Thomas, Variance Gamma, determinantal) by four columns (kernel, bias-corrected kernel, Bessel with simple scheme, cosine with simple scheme) of estimated pair correlation functions plotted against r.]

Figure G.3: True pair correlation function (solid line), Monte Carlo mean (dashed lines) and 95% pointwise probability interval (grey area) of estimates based on nsim = 1000 simulations from the Poisson (first row), Thomas (second row), Variance Gamma (third row) and determinantal (fourth row) point processes on W = [0, 2]².


[Figure G.4 about here: three rows of two histogram panels each, showing the distributions of g(0.025) and g(0.1).]

Figure G.4: Histograms of go(r) at r = 0.025 and r = 0.1 using the Bessel basis with the simple smoothing scheme in case of the Thomas process on W = [0, 1]² (upper panels), W = [0, 2]² (middle panels) and W = [0, 3]² (lower panels).


H Data example

Figure H.1 shows the data sets and fitted intensity functions considered in Section 7 of the main document.

[Figure H.1 about here: point patterns of the three species (upper panels) and fitted intensity surfaces (lower panels).]

Figure H.1: Locations of Acalypha diversifolia, Lonchocarpus heptaphyllus and Capparis frondosa trees in the Barro Colorado Island plot (upper panels) and their fitted parametric intensity functions (lower panels).

For the Capparis frondosa species and the orthogonal series estimator with cosine basis, the function I(K) given in (5.3) of the main document is shown in Figure H.2. Although I(K) is decreasing over 1 ≤ K ≤ 49, the rate of decrease slows down after K = 7.

[Figure H.2 about here: I(K) plotted against K = 1, . . . , 49.]

Figure H.2: Estimate I(K) of the mean integrated squared error for Capparis frondosa in case of the orthogonal series estimator with cosine basis.

I Behavior of the Fourier-Bessel and cosine basis

Figure I.1 shows the Fourier-Bessel and cosine basis functions φk(r) in the planar case (d = 2) for R = 0.125, k = 1, . . . , 8 and r ∈ [0, 0.125].

[Figure I.1 about here: the eight Fourier-Bessel basis functions (upper panels) and the eight cosine basis functions (lower panels) plotted against r ∈ [0, 0.125].]

Figure I.1: Fourier-Bessel and cosine basis functions φk(r) in the planar case (d = 2) for R = 0.125 and k = 1, . . . , 8.

Obviously, the cosine basis functions are uniformly bounded and integrable. However, the Fourier-Bessel basis functions exhibit damped oscillation behavior with φk(R) = 0 and

\[
\varphi_k(0) = \frac{\alpha_{\nu,k}^\nu}{R^{\nu+1}\, 2^{\nu - 1/2}\, \Gamma(\nu + 1)\, J_{\nu+1}(\alpha_{\nu,k})},
\]
for all \(k \geq 1\), because \(\lim_{r \to 0} J_\nu(r) r^{-\nu} = 1/(\Gamma(\nu+1) 2^\nu)\) for \(\nu \geq 0\). Thus, \(\varphi_k(0) \to \infty\) as \(k \to \infty\).

For \(0 \leq \nu \leq 1/2\) (or equivalently \(d = 2, 3\)), \(|J_\nu(r)| \leq (2/(\pi r))^{1/2}\) and hence

\[
\int_0^{\alpha_{\nu,k}} \bigl|J_\nu(r)\bigr| r^{\nu+1}\, \mathrm{d}r \leq \Bigl(\frac{2}{\pi}\Bigr)^{1/2} \int_0^{\alpha_{\nu,k}} r^{\nu + \frac{1}{2}}\, \mathrm{d}r = \Bigl(\frac{2}{\pi}\Bigr)^{1/2} \frac{(\alpha_{\nu,k})^{\nu + \frac{3}{2}}}{\nu + \frac{3}{2}}.
\]


Therefore, as \(k \to \infty\),
\begin{align*}
\int_0^R \bigl|\varphi_k(r)\bigr| w(r)\, \mathrm{d}r &= \frac{\sqrt{2}}{R \bigl|J_{\nu+1}(\alpha_{\nu,k})\bigr|} \Bigl(\frac{R}{\alpha_{\nu,k}}\Bigr)^{\nu+2} \int_0^{\alpha_{\nu,k}} \bigl|J_\nu(r)\bigr| r^{\nu+1}\, \mathrm{d}r\\
&\leq \frac{\sqrt{2}}{R \bigl|J_{\nu+1}(\alpha_{\nu,k})\bigr|} \Bigl(\frac{R}{\alpha_{\nu,k}}\Bigr)^{\nu+2} \Bigl(\frac{2}{\pi}\Bigr)^{1/2} \frac{(\alpha_{\nu,k})^{\nu + \frac{3}{2}}}{\nu + \frac{3}{2}}\\
&= \frac{2 R^{\nu+1}}{\bigl|J_{\nu+1}(\alpha_{\nu,k})\bigr| (\pi \alpha_{\nu,k})^{1/2} (\nu + \frac{3}{2})} \approx \frac{2 R^{\nu+1}}{\bigl(\frac{2}{\pi \alpha_{\nu,k}}\bigr)^{1/2} (\pi \alpha_{\nu,k})^{1/2} (\nu + \frac{3}{2})} = \frac{\sqrt{2}\, R^{\nu+1}}{\nu + \frac{3}{2}} < \infty,
\end{align*}
which implies uniform integrability of \(\varphi_k(r)\).
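The bounds in this section are consistent with the Fourier-Bessel basis normalized as φk(r) = √2 r^{−ν} Jν(αν,k r/R) / (R |Jν+1(αν,k)|), which is orthonormal on [0, R] with respect to the weight w(r) = r^{2ν+1} = r^{d−1}. Assuming this normalization (an assumption inferred from the bounds above, not stated here explicitly), orthonormality is easy to confirm numerically for ν = 0 and R = 0.125:

```python
import numpy as np
from scipy.special import jv, jn_zeros

# Fourier-Bessel basis for nu = 0 (d = 2) and R = 0.125 (assumed normalization):
# phi_k(r) = sqrt(2) r^{-nu} J_nu(alpha_{nu,k} r / R) / (R |J_{nu+1}(alpha_{nu,k})|),
# orthonormal w.r.t. the weight w(r) = r^{2 nu + 1}.
nu, R = 0, 0.125
alpha = jn_zeros(nu, 4)

def phi(r, k):
    a = alpha[k - 1]
    return np.sqrt(2.0) * r ** (-nu) * jv(nu, a * r / R) / (R * np.abs(jv(nu + 1, a)))

r = np.linspace(0.0, R, 20001)
w = r ** (2 * nu + 1)
gram = np.empty((4, 4))
for i in range(1, 5):
    for j in range(1, 5):
        f = phi(r, i) * phi(r, j) * w
        gram[i - 1, j - 1] = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))  # trapezoid rule
```

The Gram matrix of the first four basis functions is numerically the identity, reflecting the classical orthogonality of Jν(αν,k r/R) with weight r on [0, R].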
