Test of Complete Spatial Randomness on Networks

A PROJECT

SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

OF THE UNIVERSITY OF MINNESOTA

BY

Xinyue Chang

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

MASTER OF APPLIED AND COMPUTATIONAL MATHEMATICS

YANG LI

May, 2016


© Xinyue Chang 2016

ALL RIGHTS RESERVED


Acknowledgements

Firstly, I would like to thank my advisor Professor Yang Li for his incredible support,

guidance, and encouragement on my project and graduate study. I would also like to

thank Professor Barry James and Professor Haiyang Wang for serving as my committee

members and their time reading my project report. Last but not least, I am very grateful

to Professor Kang James for her valuable suggestions and comments in the statistical

seminar class.


Abstract

Testing for complete spatial randomness (CSR) is an essential part of spatial analysis and is regarded as a minimal prerequisite to any serious attempt to model an observed point pattern. It has been investigated, discussed, and verified for planar regions by researchers for more than 40 years. Recently, more and more data from spatial point processes on networks have been collected. This project aims to extend CSR test methods to spatial point patterns on a network. The study starts with the derivation of the cumulative distribution function (CDF) of the inter-event distance between two locations randomly distributed on a grid network. We then carry out a test procedure based on Monte Carlo simulation, using both inter-event distances and nearest-neighbor distances. The method is found to work well when the process is constrained to a network. Finally, the car accident pattern on the Minnesota major roads network is tested by both the inter-event distance method and the nearest-neighbor distance method.


Contents

Acknowledgements i

Abstract ii

List of Tables vi

List of Figures vii

1 Introduction 1

1.1 Background and Organization . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Complete Spatial Randomness . . . . . . . . . . . . . . . . . . . 3

1.2.2 Spatial Point Processes on Networks . . . . . . . . . . . . . . . . 3

1.2.3 Inter-event Distances . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.4 CSR Test Based on Inter-event Distances . . . . . . . . . . . . . 4

1.2.5 Nearest-neighbor Distances . . . . . . . . . . . . . . . . . . . . . 5

1.2.6 CSR Test Based on Nearest-neighbor Distances . . . . . . . . . . 5

2 CDF of Inter-event Distances on a Grid Network under CSR 6

2.1 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 CDF of Inter-event Distances if t < 1 . . . . . . . . . . . . . . . . . . . . 9

2.2.1 Cumulative Distribution Function . . . . . . . . . . . . . . . . . 9

2.2.2 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 CDF of Inter-event Distances if t > 1 . . . . . . . . . . . . . . . . . . . . 13

2.3.1 Cumulative Distribution Function . . . . . . . . . . . . . . . . . 13


2.3.2 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 CSR Test Based on Inter-event Distances 20

3.1 CSR Test Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2.1 Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.2 Cluster Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.3 Regular Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 CSR Test Based on Nearest-neighbor Distances 28

4.1 CSR Test Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2.1 Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2.2 Cluster Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.2.3 Regular Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Car Crash Point Pattern on the Minnesota Major Roads 34

5.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.2.1 CSR Test Based on Inter-event Distances . . . . . . . . . . . . . 37

5.2.2 CSR Test Based on Nearest-neighbor Distances . . . . . . . . . . 37

5.3 Result and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6 Conclusion 39

References 40

Appendix A. Glossary and Acronyms 41

A.1 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

A.2 Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


Appendix B. Code 43

B.1 R Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

B.1.1 Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

B.1.2 Cluster Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

B.1.3 Regular Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

B.1.4 Random Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

B.1.5 Car Crash Pattern on the MN Roads . . . . . . . . . . . . . . . . 50

B.2 Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52


List of Tables

A.1 Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


List of Figures

2.1 An example of a regular grid network with m = n = 11. . . . . . . . . . 7

2.2 Four possible locations of two arbitrary points on the 5× 5 grid . . . . . 8

2.3 The 11× 11 grid network . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Simulation result and plot for CDF when t < 1 (blue is the theoretical function) . . . . . . . . . . 13

2.5 The 11× 11 grid network . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.6 Simulation result and plot for CDF when t > 1 (blue is the theoretical function) . . . . . . . . . . 19

3.1 Random point pattern on the grid network . . . . . . . . . . . . . . . . 23

3.2 Envelope plot for random process on the grid network . . . . . . . . . . 23

3.3 Random point pattern on a random network . . . . . . . . . . . . . . . 23

3.4 Envelope plot for random process on a random network . . . . . . . . . 23

3.5 Cluster point pattern on the grid network . . . . . . . . . . . . . . . . . 24

3.6 Envelope plot for cluster process on the grid network . . . . . . . . . . . 24

3.7 Cluster point pattern on a random network . . . . . . . . . . . . . . . . 25

3.8 Envelope plot for cluster process on a random network . . . . . . . . . . 25

3.9 Regular point pattern on the grid network . . . . . . . . . . . . . . . . . 26

3.10 Envelope plot for regular process on the grid network . . . . . . . . . . . 26

3.11 Regular point pattern on a random network . . . . . . . . . . . . . . . . 27

3.12 Envelope plot for regular process on a random network . . . . . . . . . . 27

4.1 Grid network and random point pattern . . . . . . . . . . . . . . . . . . 30

4.2 Envelope plot for random process on grid network . . . . . . . . . . . . 30

4.3 Random network and random point pattern . . . . . . . . . . . . . . . . 30

4.4 Envelope plot for random process on random network . . . . . . . . . . 30


4.5 Grid network and cluster point pattern . . . . . . . . . . . . . . . . . . . 31

4.6 Envelope plot for cluster process on grid network . . . . . . . . . . . . . 31

4.7 Random network and cluster point pattern . . . . . . . . . . . . . . . . 31

4.8 Envelope plot for cluster process on random network . . . . . . . . . . . 31

4.9 Grid network and regular point pattern . . . . . . . . . . . . . . . . . . 32

4.10 Envelope plot for regular process on grid network . . . . . . . . . . . . . 32

4.11 Random network and regular point pattern . . . . . . . . . . . . . . . . 32

4.12 Envelope plot for regular process on random network . . . . . . . . . . . 32

5.1 Location of fatal crashes in Minnesota in 2013 . . . . . . . . . . . . . . . 35

5.2 R Plot of Minnesota Major Roads Network. . . . . . . . . . . . . . . . . 35

5.3 R Plot of the car crash pattern on the Minnesota major roads. . . . . . 35

5.4 Display of the car crash pattern on the Minnesota major roads in ArcGIS 35

5.5 R plot of a CSR point pattern on the Minnesota major roads . . . . . . 36

5.6 Display of a CSR point pattern on the Minnesota major roads in ArcGIS 36

5.7 Envelope plot for CSR test for car crash pattern by inter-event method. 38

5.8 Envelope plot for CSR test of car crash pattern by nearest-neighbor method. 38


Chapter 1

Introduction

1.1 Background and Organization

Practical investigations in ecology, epidemiology, and transportation often involve the observation and study of the spatial distribution of events. Researchers are interested in classifying a spatial point pattern and, as a first step, need to know whether it exhibits complete spatial randomness (CSR). Methods for testing CSR of a spatial point process have therefore drawn increasing attention.

The techniques proposed for detecting non-randomness may be divided broadly into two groups, described respectively as quadrat methods and distance methods [1]. The power of randomness tests, particularly tests based on nearest-neighbor distances, inter-point distances, and estimators of moment measures, was investigated in [2]. Other work has developed methods beyond distances and quadrats. In [3], the author introduced a test of spatial randomness based on the angles between the vectors joining each sample point to its nearest neighbors. A method of qualifying spatial pattern in which sample points move toward a regular arrangement resembling a hexagonal lattice was discussed in [4]. To explore the performance of the CSR test further, [5] presents results confirmed by ecological data and illustrates that the tests without edge-effect correction proposed by Diggle have higher power for small sample sizes.

All of these works assume that spatial point events can be located anywhere in a planar region. In some practical scenarios, however, spatial points can only be located on the edges of a specific network. For example, car crash locations lie on roads, which form a road network. The CSR test then becomes different and more complicated, in the sense that inter-event distances are no longer Euclidean distances and have to be adjusted to the geometry of the network. Motivated by this concern, CSR tests based on inter-event distances and nearest-neighbor distances [6] are discussed in the network setting in this thesis and verified to be applicable to network point patterns. The result is confirmed by three point processes simulated on a grid network and on a random network.

For the test method based on inter-event distances, implementation would be precise and simple if the theoretical cumulative distribution function (CDF) under CSR were known. Closed-form results already exist for the most common cases of square and circular regions. For a square of unit side, the distribution function of inter-event distances is

\[
H(t) =
\begin{cases}
\pi t^2 - 8t^3/3 + t^4/2, & 0 \le t \le 1, \\
1/3 - 2t^2 - t^4/2 + 4(t^2 - 1)^{1/2}(2t^2 + 1)/3 + 2t^2 \arcsin(2t^{-2} - 1), & 1 < t \le \sqrt{2}.
\end{cases}
\]

For a circle of unit radius the corresponding expression is

\[
H(t) = 1 + \pi^{-1}\left\{ 2(t^2 - 1)\arccos(t/2) - t(1 + t^2/2)\sqrt{1 - t^2/4} \right\}
\]

for all $0 \le t \le 2$ [6].
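The two planar formulas can be evaluated directly. The following is a minimal Python sketch, not code from the thesis; the function names `h_unit_square` and `h_unit_disc` are assumptions chosen for illustration. It is mainly useful for checking that both expressions start at 0, end at 1, and that the two branches for the square agree at $t = 1$.

```python
import math


def h_unit_square(t):
    """CDF of the distance between two uniform points in a unit square."""
    if not 0.0 <= t <= math.sqrt(2.0):
        raise ValueError("t must lie in [0, sqrt(2)]")
    t2 = t * t
    if t <= 1.0:
        return math.pi * t2 - 8.0 * t ** 3 / 3.0 + t2 * t2 / 2.0
    # Second branch, valid for 1 < t <= sqrt(2).
    return (1.0 / 3.0 - 2.0 * t2 - t2 * t2 / 2.0
            + 4.0 * math.sqrt(t2 - 1.0) * (2.0 * t2 + 1.0) / 3.0
            + 2.0 * t2 * math.asin(2.0 / t2 - 1.0))


def h_unit_disc(t):
    """CDF of the distance between two uniform points in a disc of radius 1."""
    if not 0.0 <= t <= 2.0:
        raise ValueError("t must lie in [0, 2]")
    t2 = t * t
    inner = (2.0 * (t2 - 1.0) * math.acos(t / 2.0)
             - t * (1.0 + t2 / 2.0) * math.sqrt(1.0 - t2 / 4.0))
    return 1.0 + inner / math.pi
```

Evaluating either function on a fine grid of `t` values gives the theoretical curve that an empirical distribution function would be compared against in the planar case.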

For a CSR point pattern on a network, distances depend on the geometry of the network, so their distribution differs from the planar case. The CDF of inter-event distances for a CSR point pattern on the network is therefore central to our network-oriented research. Accordingly, this thesis starts by deriving the CDF of inter-event distances for the CSR point pattern on a grid network. The chapters are organized as follows.

• Chapter 2 describes how to derive the cumulative distribution function of inter-event distances for a completely random spatial point pattern on a grid network.

• In Chapter 3, the complete spatial randomness test based on inter-event distances is discussed and summarized for implementation.

• In Chapter 4, the complete spatial randomness test based on nearest-neighbor distances is discussed and summarized for implementation.

• Chapter 5 applies the proposed complete spatial randomness tests to the 2013 car crash pattern on the Minnesota major roads network.

• Chapter 6 presents a final discussion and conclusion of the work presented in the project.

1.2 Definitions

1.2.1 Complete Spatial Randomness

We consider a network denoted as a graph $G = \{V, E\}$, where the set of vertices is $V = \{v_1, v_2, \cdots, v_l\}$ and the set of edges is $E = \{e_1, e_2, \cdots, e_m\}$. For the sake of simplicity, $G$ is assumed to be connected, which means there is a path between any pair of vertices. Furthermore, $|e_i|$ is the length of edge $e_i$. Let $S = \{s_1, s_2, \cdots, s_n\}$ denote the events of a spatial point pattern on $G$, constrained to lie on the edges. Note: in graph theory, if the vertices $v_0, v_1, \cdots, v_k$ of the walk $W = v_0 e_1 v_1 e_2 v_2 \cdots e_k v_k$ are distinct, then $W$ is called a path [7]. The edges of the graph considered here are weighted, with each weight representing the distance between the two end vertices.

The definition of complete spatial randomness (CSR) can be extended to the network case as follows [6, 8]:

• For a density $\lambda > 0$ and a finite network $G$, the number of events of $S$, say $|S|$, must follow a Poisson distribution with mean $\lambda |E|$, where $|E| = \sum_{i=1}^{m} |e_i|$.

• Given the number of events, i.e., $|S| = n$, events are distributed uniformly on the network $G$. That is to say, the $n$ events of $S$ form an independent random sample from the continuous uniform distribution on $E$.
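The two conditions translate into a two-stage simulation: draw the count from a Poisson distribution with mean $\lambda |E|$, then place each event on an edge chosen with probability proportional to its length, at a uniform position along that edge. The sketch below is only an illustration under stated assumptions: edges are taken to be straight segments given by their endpoint coordinates, and the function names are hypothetical rather than taken from the thesis code.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals in [0, mean]."""
    count = 0
    clock = rng.expovariate(1.0)
    while clock <= mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def edge_lengths(edges):
    """Euclidean length of each straight edge ((x1, y1), (x2, y2))."""
    return [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in edges]


def uniform_on_network(edges, n, rng):
    """Place n independent uniform points on the union of the edges."""
    picks = rng.choices(range(len(edges)), weights=edge_lengths(edges), k=n)
    points = []
    for e in picks:
        (x1, y1), (x2, y2) = edges[e]
        u = rng.random()  # relative position along the chosen edge
        points.append((e, (x1 + u * (x2 - x1), y1 + u * (y2 - y1))))
    return points


def simulate_csr(edges, lam, rng):
    """One CSR realization: Poisson count, then uniform locations."""
    n = poisson_count(lam * sum(edge_lengths(edges)), rng)
    return uniform_on_network(edges, n, rng)
```

For a Monte Carlo test that conditions on the observed number of events, only `uniform_on_network` is needed, with `n` set to the observed count.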

1.2.2 Spatial Point Processes on Networks

A spatial network point pattern is a set of locations distributed within an observed network; it can be classified as random, regular, or clustered. The different point process models are characterized by how they deviate from complete spatial randomness.

• A random process is a realization of complete spatial randomness;

• A cluster process has smaller average inter-event distances than CSR and an intensity that varies within the network;

• A regular process has greater average inter-event distances than CSR, and points are distributed regularly within the network (every inter-event distance is no less than a specified value $\delta$).

1.2.3 Inter-event Distances

The following notation is used in later chapters. $n$ is the number of events in a spatial point pattern on the network $G$. $t_{ij}$ is the shortest-path distance along the edges between points $i$ and $j$, $i < j$, in the spatial point pattern $S$. $T = \{t_{ij} \mid i, j = 1, \cdots, n, \text{ and } i < j\}$ is the collection of all inter-event distances from $S$. Clearly, the number of elements in $T$ is $|T| = n(n-1)/2$.

1.2.4 CSR Test Based on Inter-event Distances

By definition, a test of complete spatial randomness addresses whether or not the observed point pattern could plausibly be a realization of a homogeneous Poisson process. $\hat{H}(t)$ is the empirical distribution function (EDF) of all inter-event distances in $T$ from an observed spatial point pattern $S$ lying on network $G$,

\[
\hat{H}(t) = \left\{ \tfrac{1}{2} n(n-1) \right\}^{-1} \#(t_{ij} \le t).
\]

$\hat{H}_i(t)$, $i = 1, 2, \cdots, s$, is the EDF of all inter-event distances in the $i$th independent simulated CSR point pattern on the same network $G$. The average function $\bar{H}(t)$, upper envelope $U_1(t)$, and lower envelope $L_1(t)$ are calculated as follows:

\[
\bar{H}(t) = \frac{1}{s} \sum_{i=1}^{s} \hat{H}_i(t), \qquad
U_1(t) = \max_{i} \{\hat{H}_i(t)\}, \qquad
L_1(t) = \min_{i} \{\hat{H}_i(t)\},
\]

where the maximum and minimum are taken over all $i = 1, \cdots, s$.
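The EDF, the average function, and the two envelopes are simple to compute once the distances are available. The sketch below assumes each pattern has already been reduced to a list of inter-event distances and that all EDFs are evaluated on a common grid of `t` values; the function names are illustrative and not the thesis's own.

```python
from bisect import bisect_right


def edf(distances, grid):
    """Fraction of distances that are <= t, for each t in grid."""
    ordered = sorted(distances)
    return [bisect_right(ordered, t) / len(ordered) for t in grid]


def envelopes(simulated_edfs):
    """Pointwise mean, upper and lower envelopes over s simulated EDFs.

    simulated_edfs is a list of s lists, all evaluated on the same grid.
    """
    s = len(simulated_edfs)
    columns = list(zip(*simulated_edfs))
    mean = [sum(col) / s for col in columns]
    upper = [max(col) for col in columns]
    lower = [min(col) for col in columns]
    return mean, upper, lower
```

An observed EDF that leaves the band between the lower and upper envelopes at some `t` is the graphical signal that the envelope plots in later chapters are designed to show.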


1.2.5 Nearest-neighbor Distances

$n$ is the number of events in a spatial point pattern $S$ on the network $G$.

$t_{ij}$ is the inter-event network distance between points $i$ and $j$, $i < j$, in pattern $S$ on the network $G$; for $i > j$, $t_{ij} = t_{ji}$.

The nearest-neighbor distance of event $i$ is $r_i = \min\{t_{ij} \mid 1 \le j \le n,\ j \ne i\}$, $i = 1, 2, \cdots, n$, and $R = \{r_i,\ i = 1, 2, \cdots, n\}$. Clearly, the number of elements in $R$ is $|R| = n$.
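As a small illustration of these definitions, the sketch below extracts the nearest-neighbor distances from a symmetric matrix of network distances and evaluates the proportion that are at most `t`. The matrix layout and the function names are assumptions made for this example.

```python
def nearest_neighbor_distances(dist):
    """r_i = min over j != i of dist[i][j], for a symmetric n x n matrix."""
    n = len(dist)
    return [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]


def nn_edf(nn_distances, t):
    """Proportion of nearest-neighbor distances that are <= t."""
    return sum(1 for r in nn_distances if r <= t) / len(nn_distances)
```

For three events with pairwise distances 1, 4, and 2, the matrix `[[0, 1, 4], [1, 0, 2], [4, 2, 0]]` yields the nearest-neighbor distances `[1, 1, 2]`.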

1.2.6 CSR Test Based on Nearest-neighbor Distances

$\hat{K}(t)$ is the empirical distribution function (EDF) of the nearest-neighbor distances in set $R$ from an observed spatial point pattern $S$ lying on $G$,

\[
\hat{K}(t) = n^{-1} \#(r_i \le t).
\]

$\hat{K}_i(t)$, $i = 1, 2, \cdots, s$, is the EDF of the nearest-neighbor distances in the $i$th independent simulated CSR point pattern on the same network $G$. The average function $\bar{K}(t)$, upper envelope $U_2(t)$, and lower envelope $L_2(t)$ are calculated as follows:

\[
\bar{K}(t) = \frac{1}{s} \sum_{i=1}^{s} \hat{K}_i(t), \qquad
U_2(t) = \max_{i} \{\hat{K}_i(t)\}, \qquad
L_2(t) = \min_{i} \{\hat{K}_i(t)\},
\]

where the maximum and minimum are taken over all $i = 1, 2, \cdots, s$.


Chapter 2

CDF of Inter-event Distances on a Grid Network under CSR

Testing of complete spatial randomness (CSR) on a network is the primary task in

this project. For a spatial point process on a planar network, we are interested in

the distribution of the locations of events if the underlying mechanism is completely

random. If we are able to have the theoretical cumulative distribution function (CDF)

of inter-event distances for a point pattern under CSR, say H(t), then the CDF of the

observed pattern should be close to H(t) if the pattern is completely random. If there

is a significant difference, the observed pattern does not have the property of CSR and

some further investigation should be carried out. In this chapter, I will discuss how

to derive the CDF of inter-event distances for the CSR point pattern in a regular grid

network.

2.1 Preliminary

As stated in Chapter 1, theoretical distributions of inter-event distances are available for some simple cases in planar spatial point processes. For regions with complex boundaries, it is in general impossible to derive the distribution function in a straightforward way. The same situation occurs for a spatial point process on a network. The inter-event distance depends on the geometric structure of the network, and there may be multiple paths connecting two locations. If the network is convoluted, it is extremely challenging to work out an exact theoretical distribution of inter-event distances. In this chapter, we work on a grid network, which has a regular geometric structure.

Figure 2.1: An example of a regular grid network with m = n = 11.

A regular grid network consists of $n$ horizontal lines and $m$ vertical lines with the same spacing in both directions, so that each horizontal line is made up of $m - 1$ edges and each vertical line of $n - 1$ edges. Figure 2.1 shows an example with $m = n = 11$. Without loss of generality, the spacing is assumed to be 1 in both the horizontal and vertical directions.

The inter-event distance between two locations is defined to be their shortest-path distance allowed by the geometry of the space in which the spatial point process is embedded. For a spatial point pattern on a two-dimensional plane, it is simply the Euclidean distance $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ between two locations $(x_1, y_1)$ and $(x_2, y_2)$. For spatial point processes on a network, it can be challenging to obtain the shortest-path distance.

As shown in Figure 2.2, where two points are located on a 5 × 5 regular grid, there are four different cases: both points on horizontal lines; both on vertical lines; the first on a horizontal line and the second on a vertical line; or the first on a vertical line and the second on a horizontal line. We denote by $s_1, s_2$ the locations of two arbitrary CSR points on an $m \times n$ regular grid network. Furthermore, we define a location function for an arbitrary point,

\[
D(s_i) =
\begin{cases}
h & \text{if } s_i \text{ is on a horizontal line}, \\
v & \text{if } s_i \text{ is on a vertical line}.
\end{cases}
\]

For simpler notation, we write $s_i = h$ for a horizontal point and $s_i = v$ for a vertical point in the probability expressions. This is shorthand only and is not meant as a function value.

Figure 2.2: Four possible locations of two arbitrary points on the 5× 5 grid

To simplify the notation, we also define $r_i = (m_i, n_i)$ to be (i) the nearest vertex to the left of $s_i$ if $s_i = h$; (ii) the nearest vertex below $s_i$ if $s_i = v$. In addition, we define $x$ to be the distance between $s_1$ and $r_1$, and $y$ to be the distance between $s_2$ and $r_2$. Both $x$ and $y$ lie between 0 and 1.

Based on the definition of CSR (Section 1.2.1, Complete Spatial Randomness), the two points $s_1$ and $s_2$ are distributed independently, and therefore their related vertices $r_1$ and $r_2$ are also independent. For an $m \times n$ grid, the distributions of the related variables and the distance functions for the four cases are easy to obtain. In this grid network, the inter-event distance is essentially the taxicab distance. According to Wikipedia, the taxicab distance, $d_1$, between two vectors $\mathbf{p}, \mathbf{q}$ in an $n$-dimensional real vector space with a fixed Cartesian coordinate system is the sum of the lengths of the projections of the line segment between the points onto the coordinate axes. Therefore, in the plane, $d_1(\mathbf{p}, \mathbf{q}) = |p_1 - q_1| + |p_2 - q_2|$, where $\mathbf{p} = (p_1, p_2)$ and $\mathbf{q} = (q_1, q_2)$.

(1) $D(s_1) = h$, $D(s_2) = h$. $m_1, m_2 \sim DU(1, m-1)$; $n_1, n_2 \sim DU(1, n)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) If $m_1 \ne m_2$, $d(s_1, s_2) = |n_1 - n_2| + |m_1 - m_2| + x - y$

(b) If $m_1 = m_2$, $n_1 \ne n_2$, $d(s_1, s_2) = |n_1 - n_2| + \min(x + y, 2 - x - y)$

(c) If $m_1 = m_2$, $n_1 = n_2$, $d(s_1, s_2) = |x - y|$

(2) $D(s_1) = v$, $D(s_2) = v$. $m_1, m_2 \sim DU(1, m)$; $n_1, n_2 \sim DU(1, n-1)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) If $n_1 \ne n_2$, $d(s_1, s_2) = |m_1 - m_2| + |n_1 - n_2| + x - y$

(b) If $n_1 = n_2$, $m_1 \ne m_2$, $d(s_1, s_2) = |m_1 - m_2| + \min(x + y, 2 - x - y)$

(c) If $n_1 = n_2$, $m_1 = m_2$, $d(s_1, s_2) = |x - y|$

(3) $D(s_1) = h$, $D(s_2) = v$. $m_1 \sim DU(1, m-1)$; $m_2 \sim DU(1, m)$; $n_1 \sim DU(1, n)$; $n_2 \sim DU(1, n-1)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) If $m_2 > m_1$, $n_2 \ge n_1$, $d(s_1, s_2) = m_2 - m_1 - x + n_2 + y - n_1$

(b) If $m_2 > m_1$, $n_2 < n_1$, $d(s_1, s_2) = m_2 - m_1 - x + n_1 - n_2 - y$

(c) If $m_2 \le m_1$, $n_2 \ge n_1$, $d(s_1, s_2) = m_1 + x - m_2 + n_2 + y - n_1$

(d) If $m_2 \le m_1$, $n_2 < n_1$, $d(s_1, s_2) = m_1 + x - m_2 + n_1 - n_2 - y$

(4) $D(s_1) = v$, $D(s_2) = h$. $m_1 \sim DU(1, m)$; $m_2 \sim DU(1, m-1)$; $n_1 \sim DU(1, n-1)$; $n_2 \sim DU(1, n)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) If $n_2 > n_1$, $m_2 \ge m_1$, $d(s_1, s_2) = m_2 - m_1 + y + n_2 - n_1 - x$

(b) If $n_2 > n_1$, $m_2 < m_1$, $d(s_1, s_2) = m_1 - m_2 - y + n_2 - n_1 - x$

(c) If $n_2 \le n_1$, $m_2 \ge m_1$, $d(s_1, s_2) = n_1 + x - n_2 + m_2 + y - m_1$

(d) If $n_2 \le n_1$, $m_2 < m_1$, $d(s_1, s_2) = m_1 - m_2 - y + n_1 + x - n_2$

Here $DU(a, b)$ denotes the discrete uniform distribution on $\{a, a+1, \ldots, b-1, b\}$; $\mathrm{UNIF}(a, b)$ denotes the continuous uniform distribution on the interval $(a, b)$; and $d(s_1, s_2)$ is the shortest-path distance between locations $s_1$ and $s_2$.
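The case analysis above can be collapsed into one short function. The sketch below is illustrative only and is not the code of Appendix B: it assumes each point is encoded as a hypothetical tuple `(kind, col, row, offset)`, where `kind` is `"h"` or `"v"`, `col` and `row` play the roles of $m_i$ and $n_i$, and `offset` is $x$ or $y$. Two points on parallel edges between the same pair of lines must route through a vertex, which gives the $\min(x + y, 2 - x - y)$ term; every other configuration reduces to the taxicab distance between the coordinates.

```python
def grid_distance(p, q):
    """Shortest-path distance between two points on a unit-spaced grid network."""
    kind1, m1, n1, x = p
    kind2, m2, n2, y = q

    # Both horizontal, same column interval, different rows: cases (1)(b).
    if kind1 == kind2 == "h" and m1 == m2 and n1 != n2:
        return abs(n1 - n2) + min(x + y, 2 - x - y)
    # Both vertical, same row interval, different columns: cases (2)(b).
    if kind1 == kind2 == "v" and n1 == n2 and m1 != m2:
        return abs(m1 - m2) + min(x + y, 2 - x - y)

    def coords(kind, m, n, offset):
        # A horizontal point sits to the right of its vertex,
        # a vertical point sits above its vertex.
        return (m + offset, n) if kind == "h" else (m, n + offset)

    a1, b1 = coords(kind1, m1, n1, x)
    a2, b2 = coords(kind2, m2, n2, y)
    return abs(a1 - a2) + abs(b1 - b2)
```

For example, two horizontal points with offsets 0.25 on the same column interval but in adjacent rows are 1.5 apart, not 1.0, because the path has to reach a vertical line first.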

2.2 CDF of Inter-event Distances if t < 1

2.2.1 Cumulative Distribution Function

We start by calculating the CDF of inter-event distances less than one. We would like

to obtain the CDF, P (d ≤ t) where t < 1, by conditioning on the four cases stated

above.


(1) $s_1 = h$, $s_2 = h$. $m_1, m_2 \sim DU(1, m-1)$; $n_1, n_2 \sim DU(1, n)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) $P(d \le t \mid s_1, s_2 = h, m_1 \ne m_2) = P(|n_1 - n_2| + |m_1 - m_2| + x - y < t \mid s_1, s_2 = h, m_1 \ne m_2)$. Since $t < 1$ and $x - y \in (-1, 1)$, $n_1 = n_2$ and $|m_1 - m_2| = 1$ must be satisfied. We have

\[
\begin{aligned}
P(d \le t \mid s_1, s_2 = h, m_1 \ne m_2)
&= P(n_1 = n_2 \mid s_1, s_2 = h)\, P(|m_1 - m_2| = 1 \mid s_1, s_2 = h, m_1 \ne m_2)\, P(x - y \le t - 1) \\
&= \left[ \frac{1}{n} \right] \left[ \frac{2(m-2)}{(m-1)^2 - (m-1)} \right] \left[ \frac{1}{2}(1 + t - 1)^2 \right]
= \frac{t^2}{n(m-1)}.
\end{aligned}
\]

To obtain $P(d \le t \mid s_1, s_2 = h)$ by the Total Probability Theorem, $P(m_1 \ne m_2 \mid s_1, s_2 = h) = (m-2)/(m-1)$ is needed in the later calculation.

(b) $P(d \le t \mid s_1, s_2 = h, m_1 = m_2, n_1 \ne n_2) = P(|n_1 - n_2| + \min(x + y, 2 - x - y) \le t \mid s_1, s_2 = h, m_1 = m_2, n_1 \ne n_2)$. Since $\min(x + y, 2 - x - y) \in (0, 1)$ and $t < 1$, the inequality can hold only if $n_1 = n_2$, which contradicts the assumption $n_1 \ne n_2$. Thus, $P(d \le t \mid s_1, s_2 = h, m_1 = m_2, n_1 \ne n_2) = 0$.

(c) When $n_1 = n_2$, $m_1 = m_2$,

\[
\begin{aligned}
P(d \le t \mid s_1, s_2 = h, n_1 = n_2, m_1 = m_2)
&= P(|x - y| \le t \mid s_1, s_2 = h, n_1 = n_2, m_1 = m_2) \\
&= P(x - y \le t) - P(x - y \le -t) \\
&= -\frac{1}{2}t^2 + t + \frac{1}{2} - \frac{1}{2}(1 - t)^2 \\
&= -t^2 + 2t.
\end{aligned}
\]

Also, $P(m_1 = m_2, n_1 = n_2 \mid s_1, s_2 = h) = 1/((m-1)n)$ will be used in obtaining $P(d \le t \mid s_1, s_2 = h)$.

By the Total Probability Theorem,

\[
P(d < t \mid s_1, s_2 = h)
= \frac{t^2}{n(m-1)} \cdot \frac{m-2}{m-1} + 0 + (-t^2 + 2t) \cdot \frac{1}{m-1} \cdot \frac{1}{n}
= \frac{-t^2 + (2m-2)t}{n(m-1)^2}.
\]


(2) $s_1 = v$, $s_2 = v$. $m_1, m_2 \sim DU(1, m)$; $n_1, n_2 \sim DU(1, n-1)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

By symmetry, we obtain $P(d \le t \mid s_1, s_2 = v)$ by exchanging $m$ and $n$ in $P(d \le t \mid s_1, s_2 = h)$:

\[
P(d \le t \mid s_1, s_2 = v) = \frac{-t^2 + (2n-2)t}{m(n-1)^2}.
\]

(3) $s_1 = h$, $s_2 = v$. $m_1 \sim DU(1, m-1)$; $m_2 \sim DU(1, m)$; $n_1 \sim DU(1, n)$; $n_2 \sim DU(1, n-1)$; $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) $P(d \le t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1) = P(m_2 - m_1 + n_2 - n_1 - x + y < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1)$. Because $t < 1$, $m_2 - m_1 \ge 1$, and $-x + y \in (-1, 1)$, the conditions $m_2 - m_1 = 1$, $n_2 = n_1$, and $1 - x + y < t$ must be satisfied. We have

\[
\begin{aligned}
P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1)
&= P(m_2 - m_1 = 1 \mid s_1 = h, s_2 = v, m_2 > m_1)\, P(n_2 = n_1 \mid s_1 = h, s_2 = v, n_2 \ge n_1)\, P(x - y > 1 - t) \\
&= \left[ \frac{m-1}{m(m-1)/2} \right] \left[ \frac{n-1}{n(n-1)/2} \right] \left[ 1 - \left( -\frac{1}{2}(1 - t)^2 + 1 - t + \frac{1}{2} \right) \right]
= \frac{2t^2}{mn}.
\end{aligned}
\]

(b) $P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 < n_1) = \dfrac{2t^2}{mn}$, and $P(m_2 > m_1, n_2 < n_1 \mid s_1 = h, s_2 = v) = 1/4$.

(c) $P(d < t \mid s_1 = h, s_2 = v, m_2 \le m_1, n_2 \ge n_1) = \dfrac{2t^2}{mn}$, and $P(m_2 \le m_1, n_2 \ge n_1 \mid s_1 = h, s_2 = v) = 1/4$.

(d) $P(d < t \mid s_1 = h, s_2 = v, m_2 \le m_1, n_2 < n_1) = \dfrac{2t^2}{mn}$, and $P(m_2 \le m_1, n_2 < n_1 \mid s_1 = h, s_2 = v) = 1/4$.

The conclusions of (b), (c), and (d) follow from the symmetry among the four cases. Each of the four cases accounts for 1/4 of the whole condition when $s_1$ is horizontal and $s_2$ is vertical. Thus, by the Total Probability Theorem,

\[
P(d < t \mid s_1 = h, s_2 = v)
= \frac{1}{4} \cdot \frac{2t^2}{mn} + \frac{1}{4} \cdot \frac{2t^2}{mn} + \frac{1}{4} \cdot \frac{2t^2}{mn} + \frac{1}{4} \cdot \frac{2t^2}{mn}
= \frac{2t^2}{mn}.
\]


(4) $s_1$ is vertical, $s_2$ is horizontal. By symmetry, we obtain $P(d < t \mid s_1 = v, s_2 = h)$ by exchanging $m$ and $n$ in $P(d < t \mid s_1 = h, s_2 = v)$:

\[
P(d < t \mid s_1 = v, s_2 = h) = \frac{2t^2}{mn}.
\]

For a point generated by CSR, the probability that it is located on a horizontal edge is

\[
P(h) = \frac{\text{length of horizontal edges}}{\text{length of horizontal edges} + \text{length of vertical edges}} = \frac{(m-1)n}{(m-1)n + (n-1)m}.
\]

Similarly, the probability that the point lies on a vertical edge is $P(v) = \dfrac{(n-1)m}{(m-1)n + (n-1)m}$.

Combining all four cases for two arbitrary points, we obtain the cumulative distribution function of inter-event distances on an $m \times n$ grid network. For $t < 1$,

\[
\begin{aligned}
P(d < t)
&= P(h)^2 P(d < t \mid s_1, s_2 = h) + P(v)^2 P(d < t \mid s_1, s_2 = v) \\
&\quad + P(h) P(v) \left[ P(d < t \mid s_1 = h, s_2 = v) + P(d < t \mid s_1 = v, s_2 = h) \right] \\
&= \left[ \frac{-t^2 + (2m-2)t}{n(m-1)^2} \right] \left[ \frac{(m-1)n}{(m-1)n + (n-1)m} \right]^2
 + \left[ \frac{-t^2 + (2n-2)t}{m(n-1)^2} \right] \left[ \frac{(n-1)m}{(m-1)n + (n-1)m} \right]^2 \\
&\quad + 2 \left[ \frac{2t^2}{mn} \right] \left[ \frac{(m-1)n}{(m-1)n + (n-1)m} \right] \left[ \frac{(n-1)m}{(m-1)n + (n-1)m} \right] \\
&= \frac{(4mn - 5m - 5n + 4)t^2 + (4mn - 2m - 2n)t}{(2mn - m - n)^2}.
\end{aligned}
\]
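As a check on the algebra, the closed form can be compared with the weighted sum of the four conditional CDFs it was assembled from. The sketch below is a minimal illustration with hypothetical function names; passing `t` as a `fractions.Fraction` makes the comparison exact rather than approximate.

```python
from fractions import Fraction


def cdf_small_t(t, m, n):
    """Closed-form P(d < t) for 0 <= t <= 1 on an m x n unit grid network."""
    if not 0 <= t <= 1:
        raise ValueError("this expression is only valid for 0 <= t <= 1")
    numerator = (4*m*n - 5*m - 5*n + 4) * t * t + (4*m*n - 2*m - 2*n) * t
    return numerator / (2*m*n - m - n) ** 2


def cdf_small_t_from_cases(t, m, n):
    """Same quantity, rebuilt from the four conditional CDFs."""
    total = (m - 1) * n + (n - 1) * m
    p_h = Fraction((m - 1) * n, total)   # point lies on a horizontal edge
    p_v = Fraction((n - 1) * m, total)   # point lies on a vertical edge
    both_h = (-t * t + (2*m - 2) * t) / (n * (m - 1) ** 2)
    both_v = (-t * t + (2*n - 2) * t) / (m * (n - 1) ** 2)
    mixed = 2 * t * t / (m * n)          # same for (h, v) and (v, h)
    return p_h**2 * both_h + p_v**2 * both_v + 2 * p_h * p_v * mixed
```

On the 11 × 11 grid the closed form gives 314.5/48400, about 0.0065, at `t = 0.5`.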

2.2.2 Simulation

As an illustration, I simulated 100 CSR point patterns of 500 points each on an 11 × 11 grid network. The simulation procedure for completely random point patterns is discussed in detail in Section 3.2 (Simulations). After obtaining the EDF of inter-event distances for each simulation, I calculated the mean function and the upper and lower envelopes defined in Section 1.2.4 (CSR Test Based on Inter-event Distances). These three functions are plotted along with the theoretical result just derived. Figure 2.4 shows that the simulated mean function and the theoretical CDF are almost identical.

Figure 2.3: The 11 × 11 grid network

Figure 2.4: Simulation result and plot for CDF when t < 1 (blue is the theoretical function). Horizontal axis: t; vertical axis: CDF.

2.3 CDF of Inter-event Distances if t > 1

2.3.1 Cumulative Distribution Function

The analysis for t > 1 is very similar to that for t < 1 in the previous section, but much more complicated. Steps that repeat the t < 1 derivation are omitted here.

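(1) Both horizontal. $m_1, m_2 \sim \mathrm{DU}(1, m-1)$, $n_1, n_2 \sim \mathrm{DU}(1, n)$, $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) When $m_1 \neq m_2$,

$$
\begin{aligned}
P(d < t \mid s_1, s_2 = h, m_1 \neq m_2)
&= P(|n_1 - n_2| + |m_1 - m_2| + x - y < t \mid s_1, s_2 = h, m_1 \neq m_2) \\
&= P(|n_1 - n_2| + |m_1 - m_2| = \lceil t \rceil \mid s_1, s_2 = h, m_1 \neq m_2)\, P(x - y < t - \lceil t \rceil) \\
&\quad + P(|n_1 - n_2| + |m_1 - m_2| = \lfloor t \rfloor \mid s_1, s_2 = h, m_1 \neq m_2)\, P(x - y < t - \lfloor t \rfloor) \\
&\quad + P(|n_1 - n_2| + |m_1 - m_2| \le \lfloor t \rfloor - 1 \mid s_1, s_2 = h, m_1 \neq m_2).
\end{aligned}
$$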


Let us define a function $C_h^1(x)$ as follows,

$$
\begin{aligned}
C_h^1(x) &= P(|n_1 - n_2| + |m_1 - m_2| = x \mid s_1, s_2 = h, m_1 \neq m_2) \\
&= \sum_{i=s(x)}^{t(x)} P(|n_1 - n_2| = x - i)\, P(|m_1 - m_2| = i) \\
&= \sum_{i=s(x)}^{t(x)} \frac{2(m-1-i)}{(m-1)(m-2)} \cdot \frac{2(n-x+i)}{n^2}
 - I(t(x) = x)\, \frac{2(m-1-x)}{(m-1)(m-2)} \cdot \frac{1}{n} \\
&= \frac{4\sum_{i=s(x)}^{t(x)} (m-1-i)(n-x+i)}{(m-1)(m-2)n^2}
 - I(t(x) = x)\, \frac{2(m-1-x)}{n(m-1)(m-2)},
\end{aligned}
$$

where $s(x) = \max\{1, x-(n-1)\}$, $t(x) = \min\{x, m-2\}$, and $I(t(x) = x)$ is an indicator function. Also, we have

$$P(x - y < t - \lfloor t \rfloor) = -\frac{1}{2}(t - \lfloor t \rfloor)^2 + (t - \lfloor t \rfloor) + \frac{1}{2},$$

$$P(x - y < t - \lceil t \rceil) = \frac{1}{2}(1 + t - \lceil t \rceil)^2.$$

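Thus, we are able to write the conditional probability in terms of $C_h^1(x)$,

$$
P(d < t \mid s_1, s_2 = h, m_1 \neq m_2)
= \frac{1}{2} C_h^1(\lceil t \rceil)(1 + t - \lceil t \rceil)^2
+ C_h^1(\lfloor t \rfloor)\left(-\frac{1}{2}(t - \lfloor t \rfloor)^2 + t - \lfloor t \rfloor + \frac{1}{2}\right)
+ \sum_{j=1}^{\lfloor t \rfloor - 1} C_h^1(j). \tag{2.1}
$$

Note: If t is an integer, then

$$
\begin{aligned}
P(d < t \mid s_1, s_2 = h, m_1 \neq m_2)
&= P(|n_1 - n_2| + |m_1 - m_2| = t \mid s_1, s_2 = h, m_1 \neq m_2)\, P(x - y < 0) \\
&\quad + P(|n_1 - n_2| + |m_1 - m_2| \le t - 1 \mid s_1, s_2 = h, m_1 \neq m_2) \\
&= \frac{1}{2} C_h^1(t) + \sum_{j=1}^{t-1} C_h^1(j).
\end{aligned}
$$

In terms of (2.1), this is (2.1) with the second term removed, that is, $\frac{1}{2} C_h^1(\lceil t \rceil)(1 + t - \lceil t \rceil)^2 + \sum_{j=1}^{\lfloor t \rfloor - 1} C_h^1(j)$. Also, $P(m_1 \neq m_2 \mid s_1, s_2 = h) = (m-2)/(m-1)$ is needed to derive the probability that the distance is less than t given that both points lie on horizontal edges.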

(b) When $m_1 = m_2$ and $n_1 \neq n_2$,

$$
P(d < t \mid s_1, s_2 = h, m_1 = m_2, n_1 \neq n_2)
= P(|n_1 - n_2| = \lfloor t \rfloor)\, P(\min(x + y,\, 2 - x - y) < t - \lfloor t \rfloor) + P(|n_1 - n_2| \le \lfloor t \rfloor - 1).
$$


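Let us define a function $C_h^2(x)$ as follows.

$$
C_h^2(x) = P(|n_1 - n_2| = x \mid s_1, s_2 = h, m_1 = m_2, n_1 \neq n_2)
= \begin{cases}
\dfrac{2(n-x)}{n(n-1)} & \text{if } 1 \le x \le n-1, \\[2mm]
0 & \text{if } x > n-1.
\end{cases}
$$

Also, we have

$$
\begin{aligned}
P(\min(x + y,\, 2 - x - y) < t - \lfloor t \rfloor)
&= P(x + y < t - \lfloor t \rfloor) + P(x + y > 2 - t + \lfloor t \rfloor) \\
&= \frac{1}{2}(t - \lfloor t \rfloor)^2 + 1 - \left[-\frac{1}{2}(2 - t + \lfloor t \rfloor)^2 + 2(2 - t + \lfloor t \rfloor) - 1\right] \\
&= (t - \lfloor t \rfloor)^2.
\end{aligned}
$$

Thus, we are able to write the conditional probability in terms of $C_h^2(x)$.

$$
P(d < t \mid s_1, s_2 = h, m_1 = m_2, n_1 \neq n_2)
= C_h^2(\lfloor t \rfloor)(t - \lfloor t \rfloor)^2 + \sum_{j=1}^{\lfloor t \rfloor - 1} C_h^2(j). \tag{2.2}
$$

Note: If t is an integer, $P(d < t \mid s_1, s_2 = h, m_1 = m_2, n_1 \neq n_2) = P(|n_1 - n_2| \le t - 1) = \sum_{j=1}^{t-1} C_h^2(j)$, which is (2.2) with the first term removed. Also, $P(m_1 = m_2, n_1 \neq n_2 \mid s_1, s_2 = h) = (n-1)/(n(m-1))$ is needed for calculating $P(d < t \mid s_1, s_2 = h)$.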

(c) Since $|x - y| \in [0, 1]$ and t is assumed to be greater than 1 in this section, $P(d < t \mid s_1, s_2 = h, m_1 = m_2, n_1 = n_2) = P(|x - y| < t \mid s_1, s_2 = h, m_1 = m_2, n_1 = n_2) = 1$. In addition, $P(m_1 = m_2, n_1 = n_2 \mid s_1, s_2 = h) = 1/(n(m-1))$.

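Therefore, by the Total Probability Theorem,

$$
\begin{aligned}
P(d < t \mid s_1, s_2 = h)
&= \frac{m-2}{m-1}\left[\frac{1}{2} C_h^1(\lceil t \rceil)(1 + t - \lceil t \rceil)^2
+ C_h^1(\lfloor t \rfloor)\left(-\frac{1}{2}(t - \lfloor t \rfloor)^2 + t - \lfloor t \rfloor + \frac{1}{2}\right)
+ \sum_{j=1}^{\lfloor t \rfloor - 1} C_h^1(j)\right] \\
&\quad + \frac{n-1}{n(m-1)}\left[C_h^2(\lfloor t \rfloor)(t - \lfloor t \rfloor)^2 + \sum_{j=1}^{\lfloor t \rfloor - 1} C_h^2(j)\right]
+ \frac{1}{n(m-1)}.
\end{aligned} \tag{2.3}
$$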


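(2) Both vertical. $P(d < t \mid s_1, s_2 = v)$ can be obtained by exchanging m and n in $P(d < t \mid s_1, s_2 = h)$. So let us define functions $C_v^1(x)$ and $C_v^2(x)$ as follows.

$$
C_v^1(x) = \frac{4\sum_{i=s(x)}^{t(x)} (n-1-i)(m-x+i)}{(n-1)(n-2)m^2} - I(t(x) = x)\, \frac{2(n-1-x)}{m(n-1)(n-2)},
$$

$$
C_v^2(x) = \begin{cases}
\dfrac{2(m-x)}{m(m-1)} & \text{if } 1 \le x \le m-1, \\[2mm]
0 & \text{if } x > m-1,
\end{cases}
$$

where $s(x) = \max\{1, x-(m-1)\}$ and $t(x) = \min\{x, n-2\}$. Then we have

$$
\begin{aligned}
P(d < t \mid s_1, s_2 = v)
&= \frac{n-2}{n-1}\left[\frac{1}{2} C_v^1(\lceil t \rceil)(1 + t - \lceil t \rceil)^2
+ C_v^1(\lfloor t \rfloor)\left(-\frac{1}{2}(t - \lfloor t \rfloor)^2 + t - \lfloor t \rfloor + \frac{1}{2}\right)
+ \sum_{j=1}^{\lfloor t \rfloor - 1} C_v^1(j)\right] \\
&\quad + \frac{m-1}{m(n-1)}\left[C_v^2(\lfloor t \rfloor)(t - \lfloor t \rfloor)^2 + \sum_{j=1}^{\lfloor t \rfloor - 1} C_v^2(j)\right]
+ \frac{1}{m(n-1)}.
\end{aligned} \tag{2.4}
$$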

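(3) s1 is horizontal, s2 is vertical. $m_1 \sim \mathrm{DU}(1, m-1)$, $m_2 \sim \mathrm{DU}(1, m)$, $n_1 \sim \mathrm{DU}(1, n)$, $n_2 \sim \mathrm{DU}(1, n-1)$, $x, y \sim \mathrm{UNIF}(0, 1)$.

(a) When $m_2 > m_1$ and $n_2 \ge n_1$,

$$
\begin{aligned}
P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1)
&= P(m_2 - m_1 + n_2 - n_1 - (x - y) < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1) \\
&= P(m_2 - m_1 + n_2 - n_1 = \lceil t \rceil)\, P(x - y > \lceil t \rceil - t) \\
&\quad + P(m_2 - m_1 + n_2 - n_1 = \lfloor t \rfloor)\, P(x - y > \lfloor t \rfloor - t) \\
&\quad + P(m_2 - m_1 + n_2 - n_1 \le \lfloor t \rfloor - 1).
\end{aligned}
$$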

Let us define a function $C_{hv}^1(x)$ as follows.

$$
\begin{aligned}
C_{hv}^1(x) &= P(m_2 - m_1 + n_2 - n_1 = x \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1) \\
&= \sum_{i=s(x)}^{t(x)} P(m_2 - m_1 = i)\, P(n_2 - n_1 = x - i) \\
&= \sum_{i=s(x)}^{t(x)} \frac{2(m-i)}{m(m-1)} \cdot \frac{2(n-1-x+i)}{n(n-1)} \\
&= \frac{4\sum_{i=s(x)}^{t(x)} (m-i)(n-1-x+i)}{mn(m-1)(n-1)},
\end{aligned}
$$

where $s(x) = \max\{1, x-(n-2)\}$ and $t(x) = \min\{x, m-1\}$. Also, we have

$$
P(x - y > \lceil t \rceil - t) = 1 - \left[-\frac{1}{2}(\lceil t \rceil - t)^2 + (\lceil t \rceil - t) + \frac{1}{2}\right]
= \frac{1}{2}(\lceil t \rceil - t)^2 - (\lceil t \rceil - t) + \frac{1}{2},
$$

$$
P(x - y > \lfloor t \rfloor - t) = 1 - \frac{1}{2}(1 + \lfloor t \rfloor - t)^2.
$$

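Thus, we are able to write the conditional probability in terms of $C_{hv}^1(x)$,

$$
\begin{aligned}
P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1)
&= C_{hv}^1(\lceil t \rceil)\left[\frac{1}{2}(\lceil t \rceil - t)^2 - (\lceil t \rceil - t) + \frac{1}{2}\right]
+ C_{hv}^1(\lfloor t \rfloor)\left[1 - \frac{1}{2}(1 + \lfloor t \rfloor - t)^2\right] \\
&\quad + \sum_{j=1}^{\lfloor t \rfloor - 1} C_{hv}^1(j).
\end{aligned} \tag{2.5}
$$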

Note: If t is an integer, $P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1) = P(m_2 - m_1 + n_2 - n_1 = t)\, P(x - y \ge 0) + P(m_2 - m_1 + n_2 - n_1 \le t - 1) = \frac{1}{2} C_{hv}^1(t) + \sum_{j=1}^{t-1} C_{hv}^1(j)$, which is (2.5) with the second term removed. Also, we have $P(m_2 > m_1, n_2 \ge n_1 \mid s_1 = h, s_2 = v) = 1/4$.

(b) When $m_2 > m_1$ and $n_2 < n_1$, the probability is the same as in (a). Also, $P(m_2 > m_1, n_2 < n_1 \mid s_1 = h, s_2 = v) = 1/4$.

(c) When $m_2 \le m_1$ and $n_2 \ge n_1$, the probability is the same as in (a). Also, $P(m_2 \le m_1, n_2 \ge n_1 \mid s_1 = h, s_2 = v) = 1/4$.

(d) When $m_2 \le m_1$ and $n_2 < n_1$, the probability is the same as in (a). Also, $P(m_2 \le m_1, n_2 < n_1 \mid s_1 = h, s_2 = v) = 1/4$.

Again, the conclusions for (b), (c), and (d) follow from the symmetry among the four cases. Each of the four cases accounts for 1/4 of the whole condition, and the four conditional probabilities are equal. Therefore, by the Total Probability Theorem,

$$
\begin{aligned}
P(d < t \mid s_1 = h, s_2 = v)
&= \frac{1}{4} \cdot 4\, P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1) \\
&= P(d < t \mid s_1 = h, s_2 = v, m_2 > m_1, n_2 \ge n_1) \\
&= C_{hv}^1(\lceil t \rceil)\left[\frac{1}{2}(\lceil t \rceil - t)^2 - (\lceil t \rceil - t) + \frac{1}{2}\right]
+ C_{hv}^1(\lfloor t \rfloor)\left[1 - \frac{1}{2}(1 + \lfloor t \rfloor - t)^2\right]
+ \sum_{j=1}^{\lfloor t \rfloor - 1} C_{hv}^1(j).
\end{aligned} \tag{2.6}
$$

(4) s1 is vertical, s2 is horizontal. $P(d < t \mid s_1 = v, s_2 = h)$ can be obtained by exchanging m and n in $P(d < t \mid s_1 = h, s_2 = v)$. In fact, $P(d < t \mid s_1 = v, s_2 = h)$ must be equal to $P(d < t \mid s_1 = h, s_2 = v)$, because a 90° rotation of the grid does not change the distribution of distances.

Finally, we obtain $P(d < t)$ for t > 1 by combining the four cases above. Again, we use

$$P(h) = \frac{(m-1)n}{(m-1)n + (n-1)m} \quad\text{and}\quad P(v) = \frac{(n-1)m}{(m-1)n + (n-1)m}.$$

For t > 1,

$$
\begin{aligned}
P(d < t)
&= P(d < t \mid s_1, s_2 = h)\left[\frac{(m-1)n}{(m-1)n + (n-1)m}\right]^2
+ P(d < t \mid s_1, s_2 = v)\left[\frac{(n-1)m}{(m-1)n + (n-1)m}\right]^2 \\
&\quad + 2\left[\frac{(m-1)n}{(m-1)n + (n-1)m}\right]\left[\frac{(n-1)m}{(m-1)n + (n-1)m}\right] P(d < t \mid s_1 = h, s_2 = v),
\end{aligned} \tag{2.7}
$$

where $P(d < t \mid s_1, s_2 = h)$, $P(d < t \mid s_1, s_2 = v)$, and $P(d < t \mid s_1 = h, s_2 = v)$ refer to (2.3), (2.4), and (2.6).
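Because (2.7) is assembled from several pieces, a small numerical implementation is a useful sanity check. The Python sketch below is a hedged illustration rather than the code used in the project (which was written in R); all function names are hypothetical. It assumes m, n ≥ 3 and writes ⌊t⌋ + 1 in place of ⌈t⌉, so that an integer t automatically gives the values stated in the Notes above.

```python
import math


def c1_same(x, m, n):
    """C^1_h(x); call with (x, n, m) to obtain C^1_v(x)."""
    lo, hi = max(1, x - (n - 1)), min(x, m - 2)
    total = 4 * sum((m - 1 - i) * (n - x + i) for i in range(lo, hi + 1))
    total /= (m - 1) * (m - 2) * n**2
    if 1 <= x <= m - 2:  # indicator I(t(x) = x)
        total -= 2 * (m - 1 - x) / (n * (m - 1) * (m - 2))
    return total


def c2_same(x, n):
    """C^2_h(x); call with (x, m) to obtain C^2_v(x)."""
    return 2 * (n - x) / (n * (n - 1)) if 1 <= x <= n - 1 else 0.0


def c1_cross(x, m, n):
    """C^1_hv(x)."""
    lo, hi = max(1, x - (n - 2)), min(x, m - 1)
    total = sum((m - i) * (n - 1 - x + i) for i in range(lo, hi + 1))
    return 4 * total / (m * n * (m - 1) * (n - 1))


def p_same(t, m, n):
    """Eq. (2.3); p_same(t, n, m) gives eq. (2.4)."""
    k = math.floor(t)
    f = t - k
    part_a = (0.5 * f**2 * c1_same(k + 1, m, n)
              + (-0.5 * f**2 + f + 0.5) * c1_same(k, m, n)
              + sum(c1_same(j, m, n) for j in range(1, k)))
    part_b = f**2 * c2_same(k, n) + sum(c2_same(j, n) for j in range(1, k))
    return ((m - 2) / (m - 1) * part_a
            + (n - 1) / (n * (m - 1)) * part_b
            + 1 / (n * (m - 1)))


def p_cross(t, m, n):
    """Eq. (2.6)."""
    k = math.floor(t)
    f = t - k
    return (0.5 * f**2 * c1_cross(k + 1, m, n)
            + (1 - 0.5 * (1 - f) ** 2) * c1_cross(k, m, n)
            + sum(c1_cross(j, m, n) for j in range(1, k)))


def cdf_gt1(t, m, n):
    """Eq. (2.7): P(d < t) for t >= 1 on an m x n unit grid network."""
    total = (m - 1) * n + (n - 1) * m
    p_h, p_v = (m - 1) * n / total, (n - 1) * m / total
    return (p_h**2 * p_same(t, m, n) + p_v**2 * p_same(t, n, m)
            + 2 * p_h * p_v * p_cross(t, m, n))
```

Two properties make convenient checks: at t = 1 the value should match the t < 1 closed form evaluated at t = 1, and for t beyond the largest possible distance the value should equal 1.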

2.3.2 Simulation

Exactly as for t < 1, I ran 100 CSR simulations on the 11 × 11 grid network in order to check that the result obtained is reasonable. I plot three functions from the simulations (the mean function and the upper and lower envelopes, which are defined in Section 1.2.4, CSR Test Based on Inter-event Distances) together with my theoretical function in one figure. Figure 2.6 shows that the mean function and the theoretical function (blue) coincide, and that the theoretical function lies between the upper and lower envelopes.

Figure 2.5: The 11 × 11 grid network

Figure 2.6: Simulation result and plot for CDF when t > 1 (blue is the theoretical function). Horizontal axis: t; vertical axis: CDF.


Chapter 3

CSR Test Based on Inter-event Distances

As shown in the previous chapter, the theoretical cumulative distribution function (CDF) is complicated and difficult to obtain even for the simplest grid network. In practice, however, we encounter various kinds of networks that are far more complicated than the regular grid network. It is therefore not feasible to use the theoretical CDF of the complete spatial random (CSR) point pattern as a criterion when testing for CSR. Instead, a good approximation to the true function, derived from Monte Carlo simulations, can be used in place of the analytic solution. In this chapter, I present how to implement the CSR test based on inter-event distances using Monte Carlo simulations in real applications.

3.1 CSR Test Implementation

Since the empirical distribution function converges to the cumulative distribution function when the sample size is large, $\bar{H}(t)$ can be used as an approximation to the theoretical CDF of inter-event distances for a CSR point pattern on the given network. Under the hypothesis of CSR, $\hat{H}(t)$, which represents the observed distribution function, should be close to $\bar{H}(t)$, which is regarded as the “true” function. In other words, a plot of $\hat{H}(t)$ against $\bar{H}(t)$ should be roughly linear. To assess the significance of departures from linearity, the upper and lower bounds of the simulated $\hat{H}_i(t)$, denoted $U_1(t)$ and $L_1(t)$, are also evaluated and plotted against $\bar{H}(t)$. Hence, if $\hat{H}(t)$ stays between $L_1(t)$ and $U_1(t)$ throughout its range, the hypothesis of CSR cannot be rejected. On the other hand, if $\hat{H}(t)$ is more extreme than $U_1(t)$ or $L_1(t)$, it is very likely that the underlying data-generating mechanism is not CSR. In summary, we can implement the CSR test based on inter-event distances as follows.

(1) Calculate all unique inter-event distances $t_{ij}$, $1 \le i < j \le n$, in the observed spatial point pattern on the network G and obtain $\hat{H}(t)$;

(2) Simulate s complete spatial random point patterns on the network G independently. To keep the probability that the EDF of a CSR pattern falls on the upper or lower envelope at most 0.1, s is generally taken to be greater than 100;

(3) For the ith simulated CSR pattern, calculate all inter-event distances $t_{ij}$ and obtain $\hat{H}_i(t)$;

(4) Obtain $\bar{H}(t)$, $U_1(t)$, and $L_1(t)$;

(5) Plot $\hat{H}(t)$, $U_1(t)$, and $L_1(t)$ against $\bar{H}(t)$.
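The steps above can be sketched in a few lines of Python. This is only an illustration: the project itself used R, the function names are hypothetical, and the definitions of the mean function and the envelopes are assumed here to be the pointwise mean, maximum, and minimum of the simulated EDFs (the formal definitions are given in Section 1.2.4).

```python
from bisect import bisect_left


def edf(distances, grid):
    """Empirical distribution function: share of distances strictly below each t."""
    ordered = sorted(distances)
    return [bisect_left(ordered, t) / len(ordered) for t in grid]


def envelope(sim_edfs):
    """Pointwise mean, upper envelope and lower envelope of simulated EDFs.

    ASSUMPTION: mean / max / min taken over all simulated patterns.
    """
    columns = list(zip(*sim_edfs))
    mean = [sum(col) / len(col) for col in columns]
    upper = [max(col) for col in columns]
    lower = [min(col) for col in columns]
    return mean, upper, lower


def outside_envelope(observed, upper, lower):
    """Indices of grid points at which the observed EDF leaves the envelope."""
    return [k for k, h in enumerate(observed) if h > upper[k] or h < lower[k]]


if __name__ == "__main__":
    grid = [0.0, 2.5, 5.0]
    sims = [edf([1, 2, 3, 4], grid), edf([1, 3, 3, 4], grid)]
    mean, upper, lower = envelope(sims)
    print(mean, upper, lower)
    print(outside_envelope(edf([1, 1, 2, 4], grid), upper, lower))
```

An empty list from `outside_envelope` corresponds to the case in which CSR cannot be rejected; plotting the observed EDF and both envelopes against the mean gives the envelope plot used in the rest of this chapter.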

In a spatial point process on a network, the distance between a pair of points depends on the geometry of the observed network and is not the usual Euclidean distance used for planar regions. Therefore, the main difficulty is obtaining the network distance, which is an essential part of our CSR test. Only once all distances have been calculated can the observed data be compared with the data simulated from CSR in terms of the CDF. The igraph package of the R computing environment turns out to be a very useful tool for finding shortest distances on a network. It is convenient and efficient for visualizing networks and point patterns and for calculating inter-event distances on a network. It can also visualize simulated network point processes.
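To make the notion of network distance concrete, here is a minimal, self-contained Python sketch of the shortest-path distance between two points that lie on edges. It is not the igraph-based R code used in the project. The representation is an assumption made for illustration: the network is an adjacency dictionary `adj[u][v] = edge length`, and a point is a tuple `(u, v, offset)` located `offset` away from vertex `u` along edge `(u, v)`.

```python
import heapq


def shortest_paths(adj, source):
    """Dijkstra: shortest path length from `source` to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def network_distance(adj, p, q):
    """Shortest-path distance between two points p and q lying on edges."""
    (u1, v1, a), (u2, v2, b) = p, q
    len1, len2 = adj[u1][v1], adj[u2][v2]
    best = float("inf")
    if {u1, v1} == {u2, v2}:
        # Same edge: walk directly along it (both offsets measured from u1).
        b_from_u1 = b if u1 == u2 else len2 - b
        best = abs(a - b_from_u1)
    # Leave p through either end of its edge, enter q's edge through either end.
    for start, d_start in ((u1, a), (v1, len1 - a)):
        dist = shortest_paths(adj, start)
        for end, d_end in ((u2, b), (v2, len2 - b)):
            best = min(best, d_start + dist.get(end, float("inf")) + d_end)
    return best


if __name__ == "__main__":
    # Unit square 0-1-2-3-0 with all edges of length 1.
    square = {0: {1: 1.0, 3: 1.0}, 1: {0: 1.0, 2: 1.0},
              2: {1: 1.0, 3: 1.0}, 3: {2: 1.0, 0: 1.0}}
    print(network_distance(square, (0, 1, 0.25), (2, 3, 0.5)))
```

Running Dijkstra from both endpoints for every pair is wasteful for large patterns; precomputing an all-pairs vertex distance table once per network would be the natural refinement.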

3.2 Simulations

Theoretically, since the CSR test method is formulated for a general setup, the proposed test method should work for both networks and planar regions. I ran simulations for three typical point processes as a demonstration before applying the method to a real application.

3.2.1 Random Process

Based on the definition of CSR, a complete spatial random process is a binomial process if the number of events n is fixed. To generate n complete spatial random points on a network, a binomial process was simulated such that n points are added onto the edges one by one. Let m be the number of edges in the network. The procedure for simulating the random process is summarized as follows.

(a) Choose one edge $e_i$ with probability $|e_i| / \sum_{i=1}^{m} |e_i|$;

(b) The point is distributed uniformly on $e_i$, i.e., $d \sim \mathrm{UNIF}(0, |e_i|)$, where d is the distance to one vertex of $e_i$;

(c) Repeat (a) and (b) n times to get n random points on the network.
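Steps (a) to (c) translate almost directly into code. The following Python sketch is an illustration under stated assumptions, not the R code used in the project: the edge list format `(u, v, length)` and the function name `simulate_csr` are hypothetical.

```python
import random


def simulate_csr(edges, n_points, seed=None):
    """Binomial (fixed-n) CSR pattern on a network.

    edges: list of (u, v, length) tuples.
    Returns a list of (u, v, offset), where offset is the distance from u.
    """
    rng = random.Random(seed)
    lengths = [length for _, _, length in edges]
    # Step (a): pick edges with probability proportional to their length.
    chosen = rng.choices(edges, weights=lengths, k=n_points)
    # Step (b): place each point uniformly along its edge.
    return [(u, v, rng.uniform(0.0, length)) for u, v, length in chosen]


if __name__ == "__main__":
    toy_edges = [(0, 1, 1.0), (1, 2, 3.0)]
    for point in simulate_csr(toy_edges, 5, seed=42):
        print(point)
```

Weighting edges by length is what makes the pattern uniform per unit length of network, rather than uniform per edge.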

I generated complete spatial random point patterns on two different types of networks. The first is a regular grid network, like the one shown in the previous chapter, in which all edges have the same length. The second is a random network in which edges have different lengths and vertices are distributed irregularly. The random network is constructed from randomly distributed vertices, with edges added with connection probability 0.5. To obtain a connected and planar network, some restrictions were added to the generation procedure. For example, the distance between any two vertices is no less than 0.5, and a vertex may only be connected to its 3 nearest vertices. Figures 3.1 and 3.3 show these two types of networks.

Figure 3.1: Random point pattern on the grid network

Figure 3.2: Envelope plot for random process on the grid network (horizontal axis: average; vertical axis: H1)

Figure 3.3: Random point pattern on a random network

Figure 3.4: Envelope plot for random process on a random network (horizontal axis: average; vertical axis: H1)

For the spatial point pattern on the grid network, the envelope plot in Figure 3.2 shows that $\hat{H}(t)$ is roughly equal to $\bar{H}(t)$ and lies between $U_1(t)$ and $L_1(t)$ throughout its range, so CSR is not rejected. For the random network, the envelope plot in Figure 3.4 likewise gives no evidence against CSR. We conclude that these data are compatible with a completely random spatial distribution. This is not surprising since the data are indeed generated using CSR.


3.2.2 Cluster Process

For cluster processes, we generate the data using a two-step procedure. First, parent points are independently generated from the uniform distribution on the network (binomial process) [9]. We then generate child points from each of the parents. The number of children follows a Poisson distribution, and the positions of children relative to their parents are normally distributed. After the child points are generated, we remove all parent points, so the final data set consists only of the children. The procedure for generating a cluster process is summarized as follows.

(a) Generate 20 parents following the steps of the complete spatial randomness simulation;

(b) For each parent, the number of its children follows a Poisson distribution with mean 10. The position of each child relative to its parent is normally distributed, i.e., $N(0, 0.5^2)$;

(c) Remove the parents and keep only the children.
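The procedure can be sketched for the grid network as follows. This Python code is a hedged illustration, not the project's R implementation. In particular, the text does not say how a normally displaced child is placed back on the network; the sketch ASSUMES a planar displacement in each coordinate followed by projection onto the nearest grid line (`snap_to_grid`), and all function names are hypothetical. The grid is taken to have vertices at integer coordinates in [0, m-1] × [0, n-1].

```python
import math
import random


def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def uniform_on_grid(rng, m, n):
    """Uniform point on the m x n unit grid network."""
    horiz, vert = (m - 1) * n, (n - 1) * m   # total edge lengths
    if rng.random() < horiz / (horiz + vert):
        return rng.uniform(0, m - 1), float(rng.randrange(n))
    return float(rng.randrange(m)), rng.uniform(0, n - 1)


def snap_to_grid(x, y, m, n):
    """ASSUMPTION: clamp to the grid, then project onto the nearest grid line."""
    x = min(max(x, 0.0), m - 1.0)
    y = min(max(y, 0.0), n - 1.0)
    if abs(x - round(x)) <= abs(y - round(y)):
        return float(round(x)), y            # lands on a vertical grid line
    return x, float(round(y))                # lands on a horizontal grid line


def simulate_cluster(m, n, parents=20, mean_children=10, sd=0.5, seed=None):
    """Steps (a)-(c): parents are never stored, so only children are returned."""
    rng = random.Random(seed)
    children = []
    for _ in range(parents):
        px, py = uniform_on_grid(rng, m, n)
        for _ in range(poisson(rng, mean_children)):
            child = snap_to_grid(rng.gauss(px, sd), rng.gauss(py, sd), m, n)
            children.append(child)
    return children
```

Other placements (for example, displacing children along the network by a normally distributed network distance) are equally plausible readings and would change only `snap_to_grid` and the displacement line.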

Figure 3.5: Cluster point pattern on the grid network

Figure 3.6: Envelope plot for cluster process on the grid network (horizontal axis: average; vertical axis: H1)

Figure 3.7: Cluster point pattern on a random network

Figure 3.8: Envelope plot for cluster process on a random network (horizontal axis: average; vertical axis: H1)

Figures 3.5 and 3.7 show realizations of cluster processes on a regular grid network and a random network. Figure 3.6 shows that $\hat{H}(t)$ is greater than $\bar{H}(t)$ within the range and lies above $U_1(t)$ for very small values of $\bar{H}(t)$. This is typical for cluster processes because there is an excess of short inter-event distances. The plot suggests a rejection of CSR in favor of clustering. Similarly, Figure 3.8 suggests the rejection of CSR because of the obvious departure from complete spatial randomness. In summary, if the probability of small distances is higher than for a CSR point pattern, the observed point pattern is more likely to be a cluster process.

3.2.3 Regular Process

In order to generate events with regularity, I started with a homogeneous Poisson process with n events and density $\lambda = n / \sum_{i=1}^{m} |e_i|$. Any two events separated by a distance of less than a specified value δ are thinned. The probability of retention for a point x on the network is

$$
P(x \text{ is retained}) = \left(1 - \frac{\pi\delta^2}{\sum_{i=1}^{m} |e_i|}\right)^{n}
= \exp\left\{n \ln\left(1 - \frac{\pi\delta^2}{\sum_{i=1}^{m} |e_i|}\right)\right\}
\approx \exp\left\{-n\, \frac{\pi\delta^2}{\sum_{i=1}^{m} |e_i|}\right\}
= e^{-\lambda\pi\delta^2}.
$$

Here is the procedure for simulating the regular process.

(a) Generate a CSR point pattern with density $\lambda = n / \sum_{i=1}^{m} |e_i|$.

(b) For each point x, x is retained with probability $e^{-\lambda\pi\delta^2}$ whenever the distance between x and another event is less than the specified value δ.

(c) The retained points form the desired regular point pattern.
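The thinning step can be illustrated with a short Python sketch. It is not the project's R code, and it rests on an interpretation: a point counts as "crowded" when some other event lies within δ of it, and a crowded point is kept with probability `keep_prob`. Setting `keep_prob=0.0` gives the strict rule stated in the text (any two events closer than δ are thinned), while passing $e^{-\lambda\pi\delta^2}$ follows step (b). The function name and signature are hypothetical; `dist` can be any distance function, such as a network distance.

```python
import random


def thin_close_pairs(points, dist, delta, keep_prob=0.0, seed=None):
    """Thin a point pattern so that close pairs are (mostly) removed.

    A point with another event closer than `delta` is kept only with
    probability `keep_prob`; isolated points are always kept.
    """
    rng = random.Random(seed)
    kept = []
    for i, p in enumerate(points):
        crowded = any(dist(p, q) < delta
                      for j, q in enumerate(points) if j != i)
        if not crowded or rng.random() < keep_prob:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # One-dimensional toy pattern with the absolute difference as distance.
    pattern = [0.0, 0.1, 1.0, 2.0, 2.05, 3.5]
    print(thin_close_pairs(pattern, lambda a, b: abs(a - b), delta=0.3))
```

Crowding is judged against the original pattern, so both members of a close pair are candidates for removal; judging it against the points kept so far would give a sequential variant that retains more points.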

Figure 3.9: Regular point pattern on the grid network

Figure 3.10: Envelope plot for regular process on the grid network (horizontal axis: average; vertical axis: H1)

As shown in Figures 3.10 and 3.12, $\hat{H}(t)$ is less than $\bar{H}(t)$ and even lower than $L_1(t)$ for small inter-event distances. The reason is that a regular process does not have inter-event distances smaller than a certain lower threshold. Thus the envelope plots indicate a rejection of complete spatial randomness in favor of regularity.

Figure 3.11: Regular point pattern on a random network

Figure 3.12: Envelope plot for regular process on a random network (horizontal axis: average; vertical axis: H1)

3.2.4 Conclusion

In conclusion, we considered Monte Carlo tests for CSR on both a regular grid network and a random network. The results are similar for the two types of networks, and the conclusions based on the envelope plots are consistent with the processes that generated the data. In other words, the proposed CSR test method based on inter-event distances works well in the network case and correctly identified each of the three simulated types of point pattern.


Chapter 4

CSR Test Based on Nearest-neighbor Distances

Besides the inter-event distance, the nearest-neighbor distance is another sensible quantity that can be used to test for complete spatial randomness (CSR). As with the method based on inter-event distances, the essential part of the CSR test using nearest-neighbor distances is to calculate the nearest-neighbor distance of each event. In this chapter, I discuss the test method based on nearest-neighbor distances in the same way as in the previous chapter, so repeated definitions and analysis are omitted.

4.1 CSR Test Implementation

As in the previous chapter, a plot of $\hat{K}(t)$ against $\bar{K}(t)$ should be roughly linear under the hypothesis of CSR. In addition, if $\hat{K}(t)$ always lies between $L_2(t)$ and $U_2(t)$ obtained from a sufficiently large number of simulations, then the hypothesis of CSR cannot be rejected. In summary, we can implement the CSR test based on nearest-neighbor distances as follows.

(1) Calculate all nearest-neighbor distances $r_i$ in the observed spatial point pattern on the network G and obtain $\hat{K}(t)$.

(2) Simulate s (s ≥ 100) complete spatial random point patterns on the network G independently. For the same reason as in the inter-event distance method, s should be at least 100.

(3) For the ith simulated CSR pattern, calculate all the nearest-neighbor distances $r_i$ and obtain $\hat{K}_i(t)$.

(4) Obtain $\bar{K}(t)$, $U_2(t)$, and $L_2(t)$. Plot $\hat{K}(t)$, $U_2(t)$, and $L_2(t)$ against $\bar{K}(t)$.

Again, the main difficulty is obtaining the nearest-neighbor network distance, which is an essential part of the CSR test. I again used the igraph package of R for this. First, we calculate the inter-event distances from a point x to all other events. The minimum of these inter-event distances is then the nearest-neighbor distance of event x.
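Given a full matrix of inter-event network distances, the nearest-neighbor distances are simply the row minima with the diagonal excluded. A minimal Python sketch (the function name is hypothetical, and the project itself did this in R):

```python
def nearest_neighbor_distances(dist_matrix):
    """Row-wise minimum of a symmetric distance matrix, ignoring the diagonal.

    dist_matrix[i][j] is the network distance between events i and j;
    at least two events are required.
    """
    n = len(dist_matrix)
    return [min(dist_matrix[i][j] for j in range(n) if j != i)
            for i in range(n)]


if __name__ == "__main__":
    toy = [[0, 2, 5],
           [2, 0, 1],
           [5, 1, 0]]
    print(nearest_neighbor_distances(toy))
```

Excluding the diagonal by index, rather than by filtering out zeros, keeps coincident events (distance exactly 0) from being skipped.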

4.2 Simulations

Theoretically, since the CSR test method is discussed in a general case, the proposed test

method should work for both network and planar region. To guarantee its correctness

and feasibility, I did simulations for three typical point processes to verify the CSR

test method based on nearest-neighbor distance before I applied this method to a real

application.

In this section, I follow the same models and steps described in Section 3.2. The main difference between the inter-event method and the nearest-neighbor method is the amount of distance data obtained. In contrast with the n(n − 1)/2 inter-event distances used for calculating the EDF, there are only n nearest-neighbor distances. To ensure the accuracy of the approximated cumulative distribution function (CDF), an observed spatial point pattern should have a large number of events. After several trial runs, I found that at least 200 points should be simulated for each CSR point pattern so that the plot of the EDF is smooth enough.


4.2.1 Random Process

Figure 4.1: Grid network and random point pattern

Figure 4.2: Envelope plot for random process on grid network (horizontal axis: average; vertical axis: H1)

Figure 4.3: Random network and random point pattern

Figure 4.4: Envelope plot for random process on random network (horizontal axis: average; vertical axis: H1)

Figure 4.2 shows that $\hat{K}(t)$ is roughly equal to $\bar{K}(t)$ and lies between $U_2(t)$ and $L_2(t)$ throughout its range, so CSR is not rejected. For the spatial point pattern on the randomly generated network, the envelope plot in Figure 4.4 gives no evidence against CSR either. So we conclude that these data are compatible with a completely random spatial distribution.


4.2.2 Cluster Process

Figure 4.5: Grid network and cluster point pattern

Figure 4.6: Envelope plot for cluster process on grid network (horizontal axis: average; vertical axis: H1)

Figure 4.7: Random network and cluster point pattern

Figure 4.8: Envelope plot for cluster process on random network (horizontal axis: average; vertical axis: H1)

Figure 4.6 shows that $\hat{K}(t)$ is greater than $\bar{K}(t)$ within the range and lies above $U_2(t)$ for very small values of $\bar{K}(t)$, so the plot indicates a rejection of CSR. Similarly, the envelope plot in Figure 4.8 leads to a rejection of CSR because of the obvious departure from complete spatial randomness. Since the probability of small nearest-neighbor distances is higher than for a CSR point pattern, the simulated point pattern is more likely to be classified as a cluster process.


4.2.3 Regular Process

Figure 4.9: Grid network and regular point pattern

Figure 4.10: Envelope plot for regular process on grid network (horizontal axis: average; vertical axis: H1)

Figure 4.11: Random network and regular point pattern

Figure 4.12: Envelope plot for regular process on random network (horizontal axis: average; vertical axis: H1)

As shown in Figure 4.10, $\hat{K}(t)$ is less than $\bar{K}(t)$ and even lower than $L_2(t)$ for small nearest-neighbor distances. Thus the envelope plot indicates a rejection of complete spatial randomness. For the randomly generated network, the test result in Figure 4.12 also leads to a rejection of CSR and to the conclusion that the point pattern tends to have larger nearest-neighbor distances than a CSR point pattern. So the simulated point pattern is more likely to be classified as a regular point pattern.


4.2.4 Conclusion

In conclusion, each of the three tests on the corresponding typical point process leads to a conclusion that is consistent with the process that generated the data. In other words, the proposed CSR test method based on nearest-neighbor distances works well in the network case and correctly identified each of the three simulated types of point pattern.


Chapter 5

Car Crash Point Pattern on the Minnesota Major Roads

In this chapter, I tested the car crash point pattern on the Minnesota Major Roads by

the method based on inter-event distance and nearest-neighbor distance respectively.

Both results suggested that car crashes on the Minnesota Roads tend to follow a cluster

point process.

5.1 Dataset

My datasets are

• Locations of car accidents in Minnesota in 2013 from National Highway

Traffic Safety Administration (NHTSA) and

• Major road network of Minnesota (U.S. highway, interstate highway, and

Minnesota highway) from Minnesota Geospatial Commons.

Figure 5.1 shows how the car crash dataset originally appears on the website, and Figure 5.2 shows the Minnesota major roads network as plotted in the R environment.


Figure 5.1: Location of fatal crashes in Minnesota in 2013

Figure 5.2: R Plot of Minnesota Major Roads Network.

Some of the car accidents did not occur on the Minnesota highways. In order to consider car crashes on major roads only, I detected and eliminated crash points that are not on the major road network. We end up with 234 observed points on this specific network for testing, which are shown in Figure 5.3 and Figure 5.4.

Figure 5.3: R Plot of the car crash pattern on the Minnesota major roads.

Figure 5.4: Display of the car crash pattern on the Minnesota major roads in ArcGIS


5.2 Implementation

To implement the CSR test methods proposed in the last two chapters, computing all the inter-event distances is the first issue that must be handled. Unlike the networks that can be generated by the igraph package in R, the Minnesota major roads data is stored in a shapefile and is made up of hundreds of road segments. So the inter-event distances along the Minnesota highways cannot be calculated by the igraph package in R in this case. Fortunately, ArcGIS is very good at processing shapefiles and calculating distances on map data. Thus, I used ArcGIS to compute the inter-event distances of the car crash pattern and of all my simulated CSR patterns on the Minnesota major roads.

Besides computing inter-event distances, simulating CSR point patterns on the Minnesota major roads is also an essential part of testing, since the simulations are the key to obtaining $\bar{H}(t)$. For the simulations, I followed the same steps described in the last two chapters and obtained 100 independent sets, each consisting of the locations of 200 CSR points. One of the 100 independent CSR point patterns is shown in Figure 5.5 and Figure 5.6.

Figure 5.5: R plot of a CSR point pattern on the Minnesota major roads

Figure 5.6: Display of a CSR point pattern on the Minnesota major roads in ArcGIS


5.2.1 CSR Test Based on Inter-event Distances

(1) Simulate 100 CSR point patterns, each consisting of 200 random points, on the MN Major Roads Network in R.

(2) In ArcGIS, use “Feature to Line” to split road segments at intersections. Then generate a spatial network dataset and create an “OD Cost Matrix Layer” for calculating inter-event distances.

(3) Solve the “OD Cost Matrix Layer” in ArcGIS to compute inter-event distances for all 100 simulated point patterns and the car crash pattern.

(4) Calculate $\hat{H}(t)$ and $\hat{H}_i(t)$, i = 1, 2, ..., 100, in R. Plot $\hat{H}(t)$, $U_1(t)$, and $L_1(t)$ against $\bar{H}(t)$.

5.2.2 CSR Test Based on Nearest-neighbor Distances

(1) Simulate 100 CSR point patterns, each consisting of 200 random points, on the MN
Major Roads Network in R.

(2) In ArcGIS, use “Feature to Line” to split the road segments at intersections. Then
generate a spatial network dataset and create an “OD Cost Matrix Layer” for
calculating inter-event distances.

(3) Solve the “OD Cost Matrix Layer” in ArcGIS to compute the inter-event distances for
all 100 simulated point patterns and the car crash pattern.

(4) Obtain K(t), Ki(t), i = 1, 2, · · · , 100 in R by taking, for each point, the minimum
of its inter-event distances to all other points. Plot K(t), U2(t), and L2(t) against
the simulation average (the axis labeled “average” in Figure 5.8).
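
Deriving nearest-neighbor distances from the inter-event distances amounts to a row-wise minimum. A minimal Python sketch follows, assuming the pairwise network distances have already been arranged in a square matrix; the function name and the four-event matrix are hypothetical.

```python
def nearest_neighbor_distances(dist_matrix):
    """Row-wise minimum of a square distance matrix, ignoring the diagonal."""
    n = len(dist_matrix)
    return [min(dist_matrix[i][j] for j in range(n) if j != i) for i in range(n)]


# Hypothetical network distances among four events.
d = [
    [0.0, 2.0, 5.0, 9.0],
    [2.0, 0.0, 3.0, 7.0],
    [5.0, 3.0, 0.0, 4.0],
    [9.0, 7.0, 4.0, 0.0],
]
nn = nearest_neighbor_distances(d)  # [2.0, 2.0, 3.0, 4.0]
```

The diagonal is skipped so that each event's zero distance to itself is not mistaken for its nearest neighbor.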

5.3 Result and Analysis

As shown in Figures 5.7 and 5.8, the curve of the crash location pattern exceeds the upper
envelope formed by the complete spatial random process throughout its range. Moreover, the
probability of small inter-event distances in the crash point pattern is much greater than
that in a random point pattern. We therefore conclude that the car crash point pattern
on the Minnesota major roads is not a complete spatial random process and tends to
be a cluster process instead.

Figure 5.7: Envelope plot for the CSR test of the car crash pattern by the inter-event
method (axes labeled “average” and “H1”, both ranging from 0.0 to 1.0).

Figure 5.8: Envelope plot for the CSR test of the car crash pattern by the nearest-neighbor
method (axes labeled “average” and “H1”, both ranging from 0.0 to 1.0).

It is not surprising that the test points to a cluster process, since there is
obviously a small cluster around the Twin Cities. However, the mechanism underlying the
car crash point pattern, such as the exact factor causing the clustering, is still unknown.
Intuitively, the network density ρ(G) can be defined as the total length of edges per unit
area, and the density of a point pattern ρ(S) as the number of points per unit area. If
ρ(G) = αρ(S) holds with the same constant α all over the spatial point pattern S on the
network G, then the clustered point pattern simply follows the density of the network,
without any external factors. However, we do not yet have the area data of Minnesota that
this analysis requires. It is therefore left for future research into the car crash
pattern of Minnesota.
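
The proposed check ρ(G) = αρ(S) could be sketched as follows once area data are available. The Python sketch assumes the study area has been divided into regions, each with a known area, total edge length, and number of events. The function name, the region records, and all numbers are hypothetical; none of them come from the Minnesota data.

```python
def densities(regions):
    """Return (rho_G, rho_S, alpha) per region, where alpha = rho_G / rho_S."""
    rows = []
    for r in regions:
        rho_g = r["edge_length"] / r["area"]  # network length per unit area
        rho_s = r["n_points"] / r["area"]     # number of points per unit area
        rows.append((rho_g, rho_s, rho_g / rho_s))
    return rows


# Hypothetical regions, for illustration only.
regions = [
    {"area": 100.0, "edge_length": 50.0, "n_points": 10},
    {"area": 400.0, "edge_length": 80.0, "n_points": 16},
    {"area": 250.0, "edge_length": 30.0, "n_points": 12},
]
rows = densities(regions)
alphas = [alpha for _, _, alpha in rows]
spread = max(alphas) - min(alphas)  # near zero if alpha is constant across regions
```

In this toy input the first two regions share the same α while the third has a smaller one, meaning it holds more points than its network density alone would explain.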


Chapter 6

Conclusion

In the project “CSR Test of Spatial Point Pattern on Network”, I obtained the
cumulative distribution function (CDF) of the inter-event distance for a complete spatial
random (CSR) point pattern on the m×n grid network by conditioning on the locations of two
arbitrary points. I also proposed two CSR test methods, based on the inter-event distance
and the nearest-neighbor distance respectively, using Monte Carlo simulations. Both
methods were verified by three typical simulations on grid networks and on randomly
generated networks. Finally, the car crash point pattern on the Minnesota major roads
network was tested by the two proposed methods and found to be a cluster process.


References

[1] Peter J. Diggle, Julian Besag, and J. Timothy Gleaves. Statistical analysis of spatial
point patterns by means of distance methods. Biometrics, 32:659 – 667, 1976.

[2] B. D. Ripley. Tests of randomness for spatial point patterns. J. R. Statist. Soc. B,
41(3):368 – 374, 1979.

[3] Renato Assuncao. Testing spatial randomness by means of angles. Biometrics,
50:531 – 537, June 1994.

[4] Joe N. Perry. Spatial analysis by distance indices. Journal of Animal Ecology, 64:303
– 314, 1995.

[5] Jacques Gignoux, Camille Duby, and Sebastien Barot. Comparing the performances
of Diggle’s tests of spatial randomness for small samples with and without edge-effect
correction: Application to ecological data. Biometrics, 55:156 – 164, March 1999.

[6] Peter J. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns.

CRC Press, Third Edition, 2014.

[7] John Clark and Derek Allan Holton. A First Look at Graph Theory. World Scientific

Publishing Co. Pte. Ltd., 1991.

[8] Atsuyuki Okabe and Kokichi Sugihara. Spatial Analysis along Networks. John Wiley

& Sons, 2012.

[9] Oliver Schabenberger and Carol A. Gotway. Statistical Methods for Spatial Data

Analysis. Chapman & Hall/CRC, 2005.


Appendix A

Glossary and Acronyms

Care has been taken in this thesis to minimize the use of jargon and acronyms, but
this cannot always be achieved. This appendix contains a glossary and a table of acronyms
and their meanings.

A.1 Glossary

• Total Probability Theorem – Given n mutually exclusive events A1, · · · , An whose
probabilities sum to unity, P(B) = P(B|A1)P(A1) + · · · + P(B|An)P(An),
where B is an arbitrary event and P(B|Ai) is the conditional probability of B
given Ai.
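
As a small worked instance of the theorem, the Python sketch below conditions on two mutually exclusive cases, for example a point lying on a horizontal or on a vertical edge. The function name and the probabilities 0.6, 0.4, 0.3, and 0.5 are hypothetical and chosen only for illustration.

```python
def total_probability(priors, conditionals):
    """P(B) = sum over i of P(B | A_i) * P(A_i) for a partition A_1, ..., A_n."""
    if len(priors) != len(conditionals):
        raise ValueError("need one conditional probability per event A_i")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the probabilities P(A_i) must sum to 1")
    return sum(p * c for p, c in zip(priors, conditionals))


# Hypothetical values: P(A_h) = 0.6, P(A_v) = 0.4, P(B|A_h) = 0.3, P(B|A_v) = 0.5.
p_b = total_probability([0.6, 0.4], [0.3, 0.5])  # 0.6*0.3 + 0.4*0.5, about 0.38
```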

A.2 Acronyms

Table A.1: Acronyms

Acronym   Meaning
CSR       Complete Spatial Randomness
CDF       Cumulative Distribution Function
EDF       Empirical Distribution Function
UNIF      Continuous Uniform Distribution
DU        Discrete Uniform Distribution
si = h    D(si) = h, i.e., si is in the horizontal direction
si = v    D(si) = v, i.e., si is in the vertical direction


Appendix B

Code

This appendix contains selected crucial code.

B.1 R Code

B.1.1 Random Process

#function for adding one random point to update the network

AddOnePointOnNetwork <- function(g) {

# add one point on the network

# the point is added uniformly on the length of the edges

#conditional on the number of points, a Poisson process is a binomial process

edges <- get.edges(g, E(g))

# g is an igraph graph object

length <- E(g)$weight

all.length <- sum(length)

n.edges <- ecount(g)

new.vertex <- vcount(g) + 1 #number of vertices after adding this point

new.loc <- sample(1:n.edges, size = 1, prob = length / all.length)

# new.loc is the id of the edge where the new point should be added to

new.length <- length[new.loc] #length of this edge

dist.from.start <- new.length * runif(1)


#choose the location on this edge where the point is added

dist.to.end <- new.length - dist.from.start #distance to the other vertex

start <- edges[new.loc, 1] #one vertex of this edge

end <- edges[new.loc, 2] #the other vertex of this edge

coord <- g$layout #layout of igraph is the coordinates of vertices

x1 <- coord[start, 1]

y1 <- coord[start, 2]

x2 <- coord[end, 1]

y2 <- coord[end, 2]

x0 <- dist.from.start / new.length * (x2 - x1) + x1

y0 <- dist.from.start / new.length * (y2 - y1) + y1

#(x0, y0) is the coordinate of new point

g <- add.vertices(g, 1) #add this new point to igraph as a vertex

#add two new edges (undirected), weight is the distance

g <- g + edges(start, new.vertex, end, new.vertex,

weight = c(dist.from.start, dist.to.end))

g$layout <- rbind(coord, c(x0, y0)) #update layout of igraph

#remove the edge which the point is added on

g[start, end] <- FALSE

return (list(g = g, num.v = vcount(g)))

}

B.1.2 Cluster Process

#function for generating offspring from a parent in poisson cluster process

GenerateOneChild <- function(g, parent) {

#extract information of this parent: edge id, distance to two vertices, etc

edge.index <- as.numeric(parent[1])

dist.from.start <- as.numeric(parent[2])

dist.to.end <- as.numeric(parent[3])

start <- as.numeric(parent[4])

end <- as.numeric(parent[5])

distance <- rnorm(1, mean = 0, sd = 0.5)

#the child's displacement from its parent along the edge is normally distributed

#if the offspring stays on the edge holding the parent

if ((dist.from.start + distance >= 0) & (dist.to.end - distance >= 0)) {

child <- c(edge.index, dist.from.start + distance, dist.to.end - distance,

start, end)}

#if the generated length exceeds dist.from.start

if ((distance < 0) & (distance < -dist.from.start)) {

one.dist <- -(dist.from.start + distance)

edges <- incident(g, start)

#find all edges incident to the start vertex except the parent's edge

cand.edges <- edges[!(edges %in% edge.index)]

#if there is more than one candidate edge, choose one at random

if (length(cand.edges) > 1) {

new.edge <- sample(cand.edges, 1)

}

else {

new.edge <- cand.edges

}

newlength = as.numeric(E(g)[new.edge]$weight)

if (newlength < one.dist && length(cand.edges) != 0) {

one.dist = newlength

}

temp.edges <- get.edges(g, new.edge) #get the candidate edge

child <- c(new.edge, one.dist, newlength - one.dist, start,

temp.edges[!(temp.edges %in% start)])}

#if the generated length exceeds dist.to.end

#same logic as the previous case, applied at the end vertex

if (distance > dist.to.end) {

one.dist <- distance - dist.to.end

edges <- incident(g, end)

cand.edges <- edges[!(edges %in% edge.index)]

if (length(cand.edges) > 1) {


new.edge <- sample(cand.edges, 1)

}

else {

new.edge <- cand.edges

}

newlength = as.numeric(E(g)[new.edge]$weight)

if (newlength < one.dist && length(cand.edges) != 0) {

one.dist = newlength

}

temp.edges <- get.edges(g, new.edge)

child <- c(new.edge, one.dist, newlength - one.dist, end,

temp.edges[!(temp.edges %in% end)])}

#print(child)

return(child)

#print(length(child))

}

#function for poisson cluster process

GeneratePoissonClusterOnNetwork <- function(g, N, S) {

parents <- GenerateNPointsOnNetwork(g, N)

children <- NULL

#num.children <- rpois(1, S) + 1

for (i in 1 : nrow(parents)) {

num.children <- rpois(1, S) + 1 #1 + Poisson(S) children, so at least one per parent

#print(num.children)

ichildren <- GenerateNChildren(g, parents[i, ], num.children)

children <- rbind(children, ichildren)

}

children <- as.data.frame(children)

#print(children)

children.graph <- AddChildrenOnNetwork(g, children)

return(children.graph)

}


B.1.3 Regular Process

#function for generating regular pattern

GenerateRegularOnNetwork <- function(g, num.vertices, N, delta) {

#num.vertices <- g$num.v

meas <- sum(E(g)$weight) #length of all the edges

#print(meas)

lamda <- N / meas

P <- exp(-lamda * pi * delta ^ 2) #probability that each point is retained

#print(P)

#children <- NULL

for (i in 1 : N) {

graph <- AddOnePointOnNetwork(g)

indicator <- sample(0 : 1, size = 1, prob = c(1 - P, P))

#check each point whether it should be retained or not

if (indicator == 1) {

num.points <- graph$num.v

#print(num.points)

newg <- graph$g

if (num.points > num.vertices + 1) {

new.distance <- shortest.paths(newg, v = num.points,

to = c((num.vertices + 1) : (num.points - 1)), weights = E(newg)$weight)

#print(min(new.distance))

#note: the minimum spacing 0.5 is hard-coded here rather than taken from delta

if (min(new.distance) > 0.5) {

g <- newg

}

}

else {

g <- graph$g

}

}

}

return(list(g = g, num.v = vcount(g)))


}

B.1.4 Random Network

#function for generating a random network

GenerateRandomNetwork <- function(m, size, p) {

x <- runif(1, 0, size)

y <- runif(1, 0, size)

coord <- cbind(x, y) #the first point, choose randomly

#each new point is accepted only if it is far enough from all earlier points

for (i in 2 : m) {

shortest <- 0

while (shortest <= 0.5) {

x <- runif(1, 0, size)

y <- runif(1, 0, size)

xy <- cbind(rep(x, i - 1), rep(y, i - 1))

mat <- xy - coord

shortest <- min(mat[, 1]^2 + mat[, 2]^2) #squared distance to the nearest earlier point

#print(shortest)

}

coord <- rbind(coord, c(x, y))

}

#print(coord)

dist.mat <- as.matrix(dist(coord))

#max.distance <- max(dist.mat)

adj.mat <- matrix(0, nrow = m, ncol = m)

near <- 3

#to make the network nicer, only connect the near points

for (i in 1 : (m - near)) {

#find the nearest 3 points

dist <- dist.mat[i, i : m]#no repeat

rank <- rank(dist)

connect <- sample(0:1, size = near, replace = TRUE, prob = c(1 - p, p))


for (j in 1 : near) {

num <- 1 : (m - i + 1)

n <- num[rank == (j + 1)]

#get the place corresponding the jth nearest point

if (adj.mat[i, (i - 1 + n)] == 0 && dist.mat[i, (i - 1 + n)] <= 8) {

adj.mat[i, i - 1 + n] <- connect[j]

adj.mat[i - 1 + n, i] <- connect[j]

}

}

#if a point is isolated, connect to its nearest point

if (max(adj.mat[i, ]) == 0) {

nn <- (1 : (m - i + 1))[rank == 2]

adj.mat[i, i - 1 + nn] <- 1

adj.mat[i - 1 + nn, i] <- 1

}

}

for (i in (m - near + 1) : m) {

d <- dist.mat[i, ]

r <- rank(d)

#print(adj.mat[i, r == 2])

#only connect the nearest point

if (adj.mat[i, r == 2] == 0) {

adj.mat[i, r == 2] <- 1

adj.mat[r == 2, i] <- 1

#print(adj.mat[i, ])

}

}

random.network <- GenerateNetwork(coord, adj.mat)

num.v <- random.network$num.v

g <- random.network$g

all.distance <- c()

new.distance <- c()


for (i in 1 : (num.v - 1)) {

new.distance <- shortest.paths(random.network$g, v = i,

to = c((i + 1) : num.v), weights = E(g)$weight)

all.distance <- append(all.distance, new.distance)

}

return(list(g = g, num.v = num.v, max = max(all.distance)))

}

B.1.5 Car Crash Pattern on the MN Roads

#this script is designed to generate random points on road network

library(sp)

library(rgdal)

library(igraph)

#read the shape file for major road of Minnesota

alldata <- readOGR(dsn = "/Users/xinyuechang/Documents/summer research/major_road",
layer = "mda_major_roads_cartographic")

class(alldata)

slotNames(alldata)

data <- alldata[alldata$ROAD_CLASS != 0,]

#only consider interstate highway and US highway

#plot(data, border = "grey")

coord <- coordinates(data)

length <- data$LENGTH

all.length <- sum(length)

n <- length(coord)

GenerateOnePoint <- function() {

road <- sample(1:n, size = 1, prob = length/all.length)

#print(road)

loc <- length[road]*runif(1)

points <- coord[[road]]


line <- NULL

for (i in 1 : length(points)) {

line <- rbind(line, points[[i]])

}

line <- as.matrix(line)

d <- 0

for (i in 2 : nrow(line)) {

new.d <- dist(rbind(line[i-1, ], line[i, ]), method = "euclidean",

diag = FALSE, p = 2)

d <- d + new.d

if (d > loc) {

#print(i)

res <- d - loc

E1 <- line[i-1,1]

N1 <- line[i-1,2]

E2 <- line[i,1]

N2 <- line[i,2]

E <- E2 + (res/new.d)*(E1 - E2)

N <- N2 + (res/new.d)*(N1 - N2)

break

}

}

return(c(E,N))

}

RANDOM <- NULL

#n <- length(coord)

for (k in 1 : 300) {

random <- GenerateOnePoint()

RANDOM <- rbind(RANDOM, random)

}

RANDOM <- as.matrix(RANDOM)

row.names(RANDOM) <- c()


#points(RANDOM[,1],RANDOM[,2],pch = 8)

# prepare UTM coordinates matrix

utmcoor<-SpatialPoints(RANDOM, proj4string=CRS("+proj=utm +zone=15"))

#utmdata$X and utmdata$Y are corresponding to UTM Easting and Northing.

#zone= UTM zone

# converting

longlatcoor<-spTransform(utmcoor,CRS("+proj=longlat"))

randomcoords <- as.matrix(cbind(longlatcoor$coords.x1, longlatcoor$coords.x2))

write.csv(randomcoords,
file = "/Users/xinyuechang/Documents/summer research/newpointsdata/random100.csv",
row.names = FALSE)

B.2 Python Code

#python script for generating inter-event distances of the 100 random point sets
#this was entered into the python window of ArcGIS
import arcpy
import glob
import os

arcpy.env.workspace = "C:/Users/mathgrad/My Documents/ArcGIS/Projects/MajorRoad2/MajorRoad2.gdb"
INPUT = "C:/Users/mathgrad/summer research/newpointsdata/random[1-9].csv"
for file in glob.glob(INPUT):
    #load the csv of point coordinates (columns V1, V2) as an XY event layer
    arcpy.MakeXYEventLayer_management(file, "V1", "V2", "random_layer", "", "")
    #use the same points as both origins and destinations
    arcpy.na.AddLocations("OD Cost Matrix", "Origins", "random_layer",
                          "CurbApproach # 0", "300 points", append = "CLEAR")
    arcpy.na.AddLocations("OD Cost Matrix", "Destinations", "random_layer",
                          "CurbApproach # 0", "300 points", append = "CLEAR")
    arcpy.na.Solve("OD Cost Matrix", "HALT")
    #name the output after the input file, e.g. random1.csv -> random1d.csv
    filename = os.path.relpath(file, "C:/Users/mathgrad/summer research/newpointsdata")
    name = filename.split('.')[0]
    name = name + "d.csv"
    direct = "C:/Users/mathgrad/summer research/300RandomDistance/"
    #export the solved origin-destination lines, which hold the inter-event distances
    arcpy.CopyRows_management("OD Cost Matrix\\Lines", direct + name)
    arcpy.Delete_management("random_layer", "Layer")