Algebraic Topology of Random Fields and Complexes
Research Thesis
As Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy
Omer Bobrowski
Submitted to the Senate of the Technion—Israel Institute of Technology
Tammuz 5772, Haifa, July 2012
THE RESEARCH THESIS WAS DONE UNDER THE SUPERVISION OF
PROFESSOR ROBERT J. ADLER IN THE DEPARTMENT OF ELECTRICAL
ENGINEERING.
Acknowledgement
First and foremost, I would like to express my deepest gratitude to my advisor, Professor
Robert Adler, for his amazing dedication in guiding and inspiring me through my PhD,
while showing full confidence in me and letting me build my academic independence; for
always encouraging, opening every possible door for me, and giving me much more than I
could have ever expected; for making each and every meeting truly enjoyable; but above
all this - for showing me that it is possible to be a highly professional and successful
scientist, while keeping both feet on the ground and maintaining modesty, a great sense
of humor, and a true capability to appreciate others. My gratitude extends way beyond
this single paragraph. I owe you so much. Thank you.
I would like to thank Professor Shmuel Weinberger from the University of Chicago for
hosting me at the early stages of my PhD, and for the long-lasting and fruitful collaboration. I would also like to thank Matthew Strom Borman from the University of Chicago
for the joint work on the first part of this thesis.
I would like to thank Professor Ron Meir from the Technion, my advisor during my Masters, for his support both during my Masters studies and after, and for many fascinating and friendly chats.
I met wonderful people during my many years in the Technion. Special thanks to Aran
Bergman, Daniel Sigalov, Hadas Benisty, Ronen Talmon and Zigi (Isask’har) Walter for
being such good friends, for being there when I needed you, and for making the Technion
a warm and fun place to come to.
I would like to thank my dearest family: My parents Dov and Lili, for believing in me
and providing me with everything I needed to reach this point, but mostly for accepting
my choices in life in the most understanding and loving way; my sisters and brothers
(in the wide sense) Barak, Hila, Keren, Nitzan, Udi and Yael, for your support and
friendship; and last but not least, my nieces and nephews Eshkar, Inbar, Livnat, Or, Raz,
Sara, Shahar, Tair, Talia, Tamar, Yonatan and Zur. You are the batteries that keep my
energies up. Thank you all. I love you.
Finally, I would like to dedicate this thesis to my grandmother Ester Landoi and my
late grandfather Aharon Landoi. To grandpa, who was a true inspiration for the pursuit
of knowledge and for an amazingly balanced perspective on life; and to grandma, who
has been there from the very early stages, always supported me, showed interest in what
I do, and pushed me forward. I really miss you.
The generous financial help of the Technion and the Adams Fellowship Program of the
Israel Academy of Sciences and Humanities is gratefully acknowledged.
Contents

1 Introduction
  1.1 Background - Algebraic Topology
    1.1.1 Homology
    1.1.2 Homotopy Equivalence
    1.1.3 Morse Theory
  1.2 Persistent Homology of Gaussian Random Fields
  1.3 The Topology of Random Geometric Complexes
  1.4 Noise Crackles
2 Persistent Homology of Gaussian Random Fields
  2.1 Background
    2.1.1 Gaussian Random Fields
    2.1.2 The Geometry of Gaussian Random Fields
    2.1.3 Persistent Homology
    2.1.4 Euler Integration
  2.2 Redefining the Euler Integral
    2.2.1 The Euler Integral and Morse Theory
    2.2.2 The Euler Integral and Persistent Homology
  2.3 The Euler Integral of Gaussian Random Fields
    2.3.1 Real Valued Fields
    2.3.2 Vector Valued Fields
  2.4 Persistent Homology of Gaussian Random Fields
  2.5 Weighted Sum of Critical Values
  2.6 Towards Applications
  2.7 Summary and Future Work
3 The Topology of Random Geometric Complexes
  3.1 Background
    3.1.1 Geometric Complexes
    3.1.2 Motivation and Previous Work
  3.2 The Distance Function
    3.2.1 Definition and Motivation
    3.2.2 Critical Points of the Distance Function
  3.3 Limit Theorems for the Distance Function
    3.3.1 The Subcritical Range (nr_n^d → 0)
    3.3.2 The Critical and Supercritical Ranges (nr_n^d → λ ∈ (0, ∞])
  3.4 The Topology of Random Cech Complexes
    3.4.1 Critical Points and Betti Numbers
    3.4.2 The Limiting Behavior of the Cech Complex
  3.5 Summary and Future Work
    3.5.1 The Supercritical Phase
    3.5.2 The Distance Function on Closed Manifolds
  3.6 Proofs
    3.6.1 Some Notation and Elementary Considerations
    3.6.2 Means for the Subcritical Range (nr_n^d → 0)
    3.6.3 Variances and Limit Distributions for the Subcritical Range
    3.6.4 The Critical and Supercritical Ranges (nr_n^d → λ ∈ (0, ∞])
    3.6.5 Asymptotic Means
    3.6.6 Asymptotic Variance - Poisson Case
    3.6.7 CLT - Poisson Case
    3.6.8 CLT - Random Sample Case
    3.6.9 Euler Characteristic Results
  3.A Palm Theory for Poisson Processes
  3.B Stein's Method
  3.C De-Poissonization
4 Noise Crackles
  4.1 Introduction
  4.2 The Core of Distributions with Unbounded Support
  4.3 How Power-Law Noise Crackles
  4.4 How Exponential Noise Crackles
  4.5 Gaussian Noise Does Not Crackle
  4.6 Summary and Future Work
  4.7 Proofs
    4.7.1 The Core
    4.7.2 Crackle - Notation and General Lemmas
    4.7.3 Crackle - The Power Law Distribution
    4.7.4 Crackle - The Exponential Distribution
    4.7.5 Crackle - The Gaussian Distribution
Bibliography
List of Figures

1.1 The first homology group of the torus
1.2 Morse theory for the height function on the torus
2.1 Capturing the homology of an annulus
2.2 The barcode of a Rips complex
2.3 Barcodes for the excursion sets of a function
3.1 Simplicial complexes in R2
3.2 The Cech and Rips complexes
3.3 Critical points of a distance function in R2
3.4 Generating a critical point of index 2 in R2
3.5 The γk(λ) function
4.1 Crackle layers
Abstract
Algebraic topology studies the topology of spaces using algebraic machinery. One of its main strengths lies in the fact that assigning algebraic structures (e.g. homology groups) to topological spaces can be used to classify them into classes of “similar” (e.g. homotopy equivalent) spaces, to study their properties, and to study the behavior of mappings between them. The field of ‘Applied Algebraic Topology’ focuses on applying algebraic topology methods to study features of surfaces and functions arising in engineering scenarios, as well as for data analysis and manifold learning. This field has generated considerable interest over the past few years. However, despite the fact that many of the problems in this area involve random data collection, and thus randomness, its probabilistic foundations are still at a very preliminary stage. The main goal of this thesis is to explore such problems, and to help supply at least some of them with rigorous probabilistic statements. We focus on two different probabilistic setups that generate intricate topological spaces, which we are interested in studying using the methods of algebraic topology.
Random fields are stochastic processes defined over parameter spaces of dimension greater than one. For example, consider a noisy image as a random field on [0,1] × [0,1] ⊂ R2. As the domain of the process is of dimension greater than one, the graph of the process is typically a (random) manifold, rather than a simple one-dimensional line. Thus, many intriguing probabilistic questions on the geometrical and topological structure of the image arise.
A simplicial complex is a collection of vertices, edges, triangles, tetrahedra, and simplexes of higher dimension, satisfying a few basic rules, so one can think of it as a generalization of a graph. A geometric complex is a simplicial complex in which, to decide whether to include a k-dimensional simplex, we need to verify whether its k + 1 vertices satisfy a certain geometrical property. Choosing the vertices of a geometric complex at random yields a random topological space with many interesting features.
In the first part of the thesis we study the persistent homology of Gaussian random
fields, and compute its expected Euler characteristic. The results we present also have
surprising and interesting consequences related to the critical points of Gaussian fields.
In the second part we focus on the limiting behavior of the Betti numbers of random
geometric complexes, as the number of vertices goes to infinity. We study different ways
to construct a geometric complex, each resulting in a completely different structure.
Notation

(·)⊤      transpose operator
|·|       absolute value / size of a set
1{·}      indicator function
P(·)      probability
E{·}      expectation
Var(·)    variance
‖·‖       Euclidean norm
Sd−1      a unit (d−1)-sphere in Rd
sd−1      the volume of Sd−1
ωd−1      the volume of a unit ball in Rd
Hk        the k-th homology group
βk        the k-th Betti number
χ         the Euler characteristic
PH        persistent homology
Lk        the k-th Lipschitz-Killing curvature
Mk        the k-th Gaussian Minkowski functional
C         a Cech complex
Br(x)     a ball with radius r centered at x
N(·,·)    the Gaussian distribution
φ(x)      the standard Gaussian density function
Φ(x)      the standard Gaussian cumulative distribution function
dP(·)     the distance function from a set of points P
Chapter 1
Introduction
The field of algebraic topology focuses on studying the topology of spaces using algebraic
machinery. Assigning algebraic structures (e.g. homology, cohomology and homotopy groups) to topological spaces can be used to classify them into classes of “similar” (e.g. homotopy equivalent) spaces, study their properties, and study the behavior of mappings between spaces.
Over the past few years there has been a very interesting and exciting effort to establish a new field called ‘Applied Algebraic Topology’. This field focuses on applying
the methods of algebraic topology to study features of surfaces and functions arising in
engineering scenarios, as well as for data analysis and manifold learning. Although at this
point sophisticated applications are still few and mostly at a theoretical stage, there is
a growing feeling that the gap between theory and practice is closing. However, despite
the fact that many of the problems in this area involve random data collection, and thus
randomness, its probabilistic foundations are still at a very preliminary stage.
The main goal of this research is twofold. On the one hand, we are interested in using
probability theory to study concepts and methods from applied algebraic topology in cases
where the data being analyzed are random. This study should significantly contribute
to the development of powerful applied algebraic topology tools. On the other hand, we
believe that our understanding of even well studied stochastic processes, such as Gaussian
random fields, can be significantly enhanced by considering a topological point of view.
In this introduction we are going to give a very brief and sketchy introduction to some
basic notions of algebraic topology. A concise yet very clear introduction can be found
in [13, 26], while [28, 47] are good examples of a thorough coverage of homology theory.
For the details behind Morse theory, see [35]. Once we cover the key concepts in algebraic
topology which are relevant for the current work, we shall discuss the three main topics
dealt with in this thesis, and the main results in each of them.
1.1 Background - Algebraic Topology
The field of algebraic topology is extremely wide, and involves many interesting concepts
and deep theorems. In the following sections we wish to focus on two main topics which are
relevant to the current research - Homology Theory and Morse Theory. We shall describe
each of them in a rather intuitive way, avoiding rigorous definitions and theorems, but at a
level which we believe should suffice for the purposes of understanding the motivation and
ideas in the current work. We also briefly describe the notion of Homotopy Equivalence,
since we will use this term repeatedly throughout this work.
1.1.1 Homology
Let X be a topological space. The homology of X is a set of abelian groups {Hk(X)}∞k=0, called ‘homology groups’. The zeroth homology H0(X) is generated by elements that represent connected components of X. For example, if X has three connected components, then H0(X) ∼= Z ⊕ Z ⊕ Z (where ∼= denotes group isomorphism), and each of the three generators of this group corresponds to a different connected component of X. For k ≥ 1, the k-th homology group Hk(X) is generated by elements representing k-dimensional “holes” in X. Without giving precise definitions, a k-dimensional hole should be thought of as the result of taking the (empty) boundary of a (k + 1)-dimensional body. For example, if X = S1, the unit circle in R2, then H1(X) ∼= Z; if X = S2, the unit sphere in R3, then H2(X) ∼= Z; and in general, if X = Sn is an n-dimensional sphere, then

Hk(X) ∼= Z for k = 0, n, and Hk(X) ∼= {0} otherwise.
A slightly more interesting example is given by the 2-dimensional torus T2 = S1 × S1 (see Figure 1.1). In this case

Hk(T2) ∼= Z for k = 0, 2;  Z ⊕ Z for k = 1;  {0} otherwise.   (1.1.1)
It is clear that T2 has a single connected component as well as a single 2-dimensional hole. However, we can find infinitely many 1-dimensional holes on the surface of the empty torus. The reason we claim that there are only two 1-dimensional holes in this case is that we consider only ‘equivalence classes’ of holes, so that if we can continuously deform one hole into the other, they are considered the same object in homology. Thus, as can be seen in Figure 1.1, we have only two equivalence classes of holes in T2, and therefore H1(T2) ∼= Z ⊕ Z.
Figure 1.1: The first homology group of the torus, H1(T2). (a) All the blue loops correspond to the same equivalence class in H1(T2), since we can continuously deform one loop into the other. (b) The red loops also correspond to a single generator of H1(T2). However, they do not belong to the same equivalence class as the blue ones, since there is no way to deform a blue loop into a red one (without leaving the torus).
Note that, in general, if X is of dimension N, then Hk(X) ∼= {0} for k > N. The rank of Hk(X), denoted by βk, is called the k-th Betti number. Thus, for k ≥ 1, βk is the number of k-dimensional holes in X, while β0 is the number of connected components.
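As a toy illustration of the k = 0 case (a sketch added here, not part of the thesis; the function name and input format are ours): β0 of a finite graph is its number of connected components, which a single union-find pass over the edges computes.

```python
# Hypothetical illustration: beta_0 of a graph equals its number of
# connected components, computable with union-find.

def betti_0(num_vertices, edges):
    """Count connected components (beta_0) of a graph given as an edge list."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1  # each merging edge kills one component
    return components

# Two triangles plus an isolated vertex: beta_0 = 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(betti_0(7, edges))  # -> 3
```

Higher Betti numbers require tracking boundary matrices of the complex, but the degree-zero case already conveys the idea of homology counting features.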
1.1.2 Homotopy Equivalence
Another term we will use often is ‘homotopy equivalence’. Let X, Y be topological spaces,
and let f0, f1 : X → Y be continuous functions. A homotopy between f0 and f1 is a
continuous function H : X × [0, 1] → Y such that H(·, 0) ≡ f0 and H(·, 1) ≡ f1. If
there exists a homotopy between f0 and f1 we say that these functions are homotopic,
and denote this by f0 ≃ f1. Two spaces X, Y are called ‘homotopy equivalent’ (denoted
X ≃ Y ) if there exist continuous functions f : X → Y and g : Y → X such that
g ◦ f ≃ 1X and f ◦ g ≃ 1Y where 1X ,1Y are the identity mappings on X, Y respectively.
Informally, X, Y are homotopy equivalent if there is a continuous deformation from one
space to the other, which is not necessarily invertible. For example, a ball is homotopy
equivalent to a single point, and an annulus is homotopy equivalent to a circle, even
though there exists no one-to-one mapping between the spaces. Note that if X, Y are
homeomorphic, then they are also homotopy equivalent, but the converse is not true (as
the previous examples show). For our purposes, the key property of homotopy equivalent
spaces is that their homology groups are isomorphic, i.e. Hk(X) ∼= Hk(Y ) for all k ≥ 0.
Two homotopy equivalent spaces are also said to have the same ‘homotopy type’. A space
that has the homotopy type of a point is called ‘contractible’.
1.1.3 Morse Theory
The study of homology is strongly connected to the study of critical points of real valued
functions. The link between them is called Morse theory, and we shall describe it briefly.
Let M be a smooth manifold embedded in Rn, and let f : M → R be a C2 function.
A point p ∈ M is called a critical point of f if ∇f(p) = 0, and the number f(p) is called
a critical value of f . A critical point p is called non-degenerate if the Hessian Hf(p) is
non-singular. In that case, the Morse index of f at p, denoted by µ(p), is the number of negative eigenvalues of Hf(p). A C2 function f is a Morse function if all its critical points are non-degenerate, and its critical levels are distinct.
The main idea of Morse theory is as follows. Suppose that M is a closed manifold
(i.e. a compact manifold without a boundary), and let f : M → R be a Morse function.
Denote

Mρ := f−1((−∞, ρ]) = {x ∈ M : f(x) ≤ ρ} ⊂ M

(the sublevel sets of f). If there are no critical levels in (a, b], then Ma and Mb are homotopy equivalent, and in particular have the same homology. Next, suppose that p is a critical point of f with Morse index k, and let v = f(p) be the critical value at p. Then the homology of Mρ changes at v in the following way. For small enough ε, the homology of Mv+ε is obtained from the homology of Mv−ε by either adding a generator to Hk (increasing βk by one) or removing a generator of Hk−1 (decreasing βk−1 by one). In other words, as we pass a critical level, either a new k-dimensional hole is formed, or an existing (k − 1)-dimensional hole is terminated (filled up). Consequently, the change in the Euler characteristic (described in Section 2.1.4) is always ±1.
Figure 1.2 presents a classic visual example of how Morse theory works. Take the torus T2 = S1 × S1, as depicted there. Let h : T2 → R be the function measuring the height of each point p ∈ T2, and consider the filtration of sublevel sets {Mρ}ρ. For ρ < v1 we have Mρ = ∅, and therefore Hk(Mρ) ∼= {0} for all k ≥ 0. At the level ρ = v1 we have a minimum point, i.e. a critical point of index 0. Indeed, as we cross this level we reach Mρ1 (v1 < ρ1 < v2), in which a new connected component appears, and thus H0(Mρ1) ∼= Z. At the level v2 we have a saddle point, or a critical point with index 1. As we cross this level, we reach Mρ2 (v2 < ρ2 < v3), where a 1-dimensional hole shows up, and so H1(Mρ2) ∼= Z. Similarly, v3 adds another generator to H1, so that H1(Mρ3) ∼= Z ⊕ Z. Finally, at level v4 we have a maximum point, or a critical point of index 2. Once we cross this level the surface of the torus is completed, introducing a 2-dimensional hole, and thus H2(Mρ4) ∼= Z. For every ρ > v4 we have Mρ = Mρ4 = T2, so there are no more changes to the sublevel sets, and indeed at the end of this process we retrieve the homology of T2 (see (1.1.1)).
1.2 Persistent Homology of Gaussian Random Fields
In this section we describe the first main topic in this thesis, which is treated in detail
in Chapter 2. Random fields are stochastic processes defined over a parameter space
X of dimension greater than one. For example, X could be a 3-dimensional brain or a
Figure 1.2: Morse theory for the height function h : T 2 → R on the torus. The red crosses mark
the critical points of h, and v1 < v2 < v3 < v4 are the critical levels with Morse index 0, 1, 1, 2,
respectively. We present four sublevel sets, each demonstrating a single change in the homology
as we cross a critical level.
2-dimensional cortical surface, examples which have been of significant practical importance. As the domain of the process is of dimension greater than one, the graph of the process is typically a (random) manifold, rather than a simple one-dimensional line. Thus, many intriguing probabilistic questions on the geometrical and topological structure of such fields arise.
As for random processes on the real line, the distribution of a random field is determined by the multidimensional distribution of any finite collection of elements f(x1), . . . , f(xn), xi ∈ X. A Gaussian random field is a random field where any finite collection of elements f(x1), . . . , f(xn) has a multidimensional Gaussian distribution. Let f : X → Rd be a Gaussian random field. We define its mean value function m : X → Rd by

m(x) = E {f(x)} ,  x ∈ X,

and the covariance function C : X × X → Rd×d by

C(x, y) = E{(f(x) − m(x))(f(y) − m(y))⊤},  x, y ∈ X,

where ⊤ denotes the transpose operator, and we write our vectors as columns. As for Gaussian processes, the distribution of a Gaussian random field is completely determined by these two functions. For more details on Gaussian random fields, see Section 2.1.1.
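A hypothetical numerical sketch of this definition (names and parameters are ours, not the thesis's machinery): a zero-mean, unit-variance Gaussian field on a grid in [0,1]² can be sampled by factoring a chosen covariance matrix, here the squared-exponential covariance C(x, y) = exp(−‖x − y‖²/(2s²)).

```python
# Illustrative only: sample a stationary, zero-mean, unit-variance Gaussian
# random field on an n x n grid via an eigendecomposition of its covariance.
import numpy as np

def sample_gaussian_field(n=20, s=0.2, seed=0):
    rng = np.random.default_rng(seed)
    xs = np.linspace(0.0, 1.0, n)
    grid = np.array([(x, y) for x in xs for y in xs])       # n^2 points in [0,1]^2
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    C = np.exp(-d2 / (2.0 * s ** 2))                        # covariance matrix
    w, V = np.linalg.eigh(C)                                # C is symmetric PSD
    A = V * np.sqrt(np.clip(w, 0.0, None))                  # A @ A.T == C (up to fp)
    f = A @ rng.standard_normal(len(grid))                  # f ~ N(0, C)
    return f.reshape(n, n)

field = sample_gaussian_field()
print(field.shape)  # -> (20, 20)
```

The eigendecomposition (rather than a plain Cholesky factorization) sidesteps the near-singularity of smooth covariance kernels on fine grids.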
The primary mathematical techniques used so far to analyze Gaussian (or Gaussian
related) random fields have come from the area of differential geometry. In the current
research we are interested in studying topological features of the sublevel sets of Gaussian
random fields and in particular, the persistent homology they generate. Briefly, the persis-
tent homology of a real valued function f tracks changes in the homology of sublevel sets
f−1((−∞, u]). As the sublevel sets grow (by increasing u), new homology elements (i.e.
“holes”) are born and others die. Persistent homology keeps a record of this birth/death
process. More details on persistent homology can be found in Section 2.1.3.
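A minimal sketch of this birth/death bookkeeping, for the simplest case only (degree-zero persistence of a function sampled on a line, i.e. a path graph; the function name and the "elder rule" implementation details are ours, not the thesis's construction):

```python
def sublevel_persistence_0d(values):
    """(birth, death) pairs for connected components of the sublevel sets of a
    function sampled on a path graph. Uses the 'elder rule': when two
    components merge, the younger one dies; the oldest never dies."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent, birth, pairs = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v in order:                        # add vertices in increasing order
        parent[v], birth[v] = v, values[v]
        for nb in (v - 1, v + 1):
            if nb in parent:               # neighbour already in the sublevel set
                rv, rn = find(v), find(nb)
                if rv != rn:
                    young, old = (rv, rn) if birth[rv] >= birth[rn] else (rn, rv)
                    if birth[young] < values[v]:
                        pairs.append((birth[young], values[v]))
                    parent[young] = old
    for r in {find(v) for v in parent}:
        pairs.append((birth[r], float("inf")))
    return sorted(pairs)

# Local minima at values 0, 1 and 0.5 create three components; the saddles
# at values 2 and 3 merge them.
print(sublevel_persistence_0d([0, 2, 1, 3, 0.5]))  # -> [(0, inf), (0.5, 3), (1, 2)]
```

Each pair is one bar of the barcode; higher-degree persistence requires full boundary-matrix reduction, which is beyond this sketch.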
The theory of persistent homology is relatively new, and until recently nothing was
known about the persistent homology of Gaussian random fields. This thesis contains the
first result in this area, based on the Gaussian Kinematic Formula (GKF) of Adler and
Taylor (see [3] and Section 2.1.1), which is the state of the art in the theory of Gaussian
fields. An important special case of the GKF gives a formula for computing the mean
value of the Euler characteristic of sublevel sets of Gaussian fields.
The Euler characteristic is an integer-valued topological invariant, which gives a partial description of the ‘shape’ of a topological space (see Section 2.1.4 for more details). For a compact d-dimensional space X, we can compute the Euler characteristic (denoted by χ(X)) from its Betti numbers βk, using the formula

χ(X) = ∑_{k=0}^{d} (−1)^k βk.
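A small illustration of this formula (not from the thesis), using the Betti numbers of the sphere and the torus computed in Section 1.1.1:

```python
# The Euler characteristic as the alternating sum of Betti numbers.
def euler_characteristic(betti):
    """chi(X) = sum_k (-1)^k beta_k, given the list [beta_0, beta_1, ...]."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

print(euler_characteristic([1, 0, 1]))  # sphere S^2: 1 - 0 + 1 -> 2
print(euler_characteristic([1, 2, 1]))  # torus T^2:  1 - 2 + 1 -> 0
```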
In the work presented in Chapter 2, we extend the notion of Euler characteristic to
persistent homology and then compute its expected value for a wide class of Gaussian
and Gaussian related fields. Here is a preview of these results.
Let M be a ‘nice’ space, and let f = (f1, . . . , fk) : M → Rk be a Gaussian random field such that its elements f1, . . . , fk are i.i.d. real valued Gaussian random fields, with zero mean, unit variance, and a ‘nice’ covariance function C. Let G : Rk → R be a ‘nice’ function, and set g = G ◦ f. Then g is a real valued random field called a Gaussian related field. Set gmax = sup_{x∈M} g(x), and consider the filtration of sublevel sets {g−1((−∞, u])}_{u=−∞}^{gmax}. Note that once we pass gmax the sublevel sets remain unchanged, so we can terminate our filtration there. Let PH∗(g, gmax) be the persistent homology of this filtration. Our main result (see Theorem 2.4.1) states that

E {χ(PH∗(g, gmax))} = χ(M) (E {gmax} − E {g}) + ∑_{j=1}^{d} (2π)^{−j/2} Lj(M) ∫_R Mγj(Du) du,   (1.2.1)
where E {g} := E {g(x)} (for any x ∈ M), and Du := G−1((−∞, u]). The Lj's and Mγj's are geometrical measures known as Lipschitz-Killing curvatures and Gaussian Minkowski functionals, respectively. This formula provides a means to evaluate the expected Euler characteristic of the persistent homology, using geometrical features of the space M and the (deterministic) sublevel sets Du. In Chapter 2 we discuss this formula in more detail, and give precise definitions for its ingredients and their ‘niceness’ requirements.
One surprising corollary of the computations in Chapter 2 is related to the expected signed sum of critical values of a Gaussian random field, a functional of considerable interest in the study of Coulomb gases. If f : M → R is a Gaussian random field, and M is a closed manifold, we prove that

E { ∑_{p∈CP(f)} (−1)^{µ(p)} f(p) } = − L1(M) / √(2π),   (1.2.2)

where CP(f) is the set of critical points of the field f, and µ(p) is the Morse index of f at p. The functional L1(M) represents a one-dimensional measure of M. Thus, the expected signed sum of critical values of a Gaussian field does not scale according to the volume of M as one might expect, but rather according to a one-dimensional measure of the space. This result is very surprising and nonintuitive, and we shall discuss it further in Section 2.5.
1.3 The Topology of Random Geometric Complexes
In this section we describe the second part of this thesis, treated in detail in Chapter 3. Let V ⊂ Rd be a set of vertices. A geometric graph G(V, ε) is an undirected graph on the set of vertices V, where we connect a pair of vertices v1, v2 if ‖v1 − v2‖ ≤ ε. The field of random geometric graphs has been thoroughly studied, and many of the known results to date can be found in [39].
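A minimal sketch of the construction G(V, ε), for uniform random points in the unit square (the function name and parameters are illustrative, not from the thesis):

```python
# Build a random geometric graph: n uniform points in [0,1]^2, with an edge
# between every pair of points at Euclidean distance at most eps.
import math
import random

def random_geometric_graph(n, eps, seed=0):
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) <= eps]
    return pts, edges

pts, edges = random_geometric_graph(100, 0.1)
print(len(pts), len(edges))
```

The geometric complexes below are built on exactly this kind of vertex set; the graph is their 1-skeleton.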
A simplicial complex is a collection of vertices, edges, triangles, tetrahedra, and simplexes of higher dimension, satisfying a few basic rules (see Section 3.1.1), so one can think of it as a generalization of a graph. A geometric complex is a simplicial complex in which, to decide whether to include a k-dimensional simplex, we need to verify whether its k + 1 vertices satisfy a certain geometrical property. There are a few ways to choose this property, which typically yield different complexes. For example, in the Cech complex C(V, ε) we need to check whether the intersection of the k + 1 balls with radius ε centered at the vertices is nonempty.
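For points in R², the Cech condition can be checked directly: the k + 1 balls of radius ε have a common point if and only if the minimum enclosing ball of their centers has radius at most ε. A sketch under these assumptions (restricted to simplices of dimension at most 2; the helper names are ours):

```python
# Cech 2-skeleton for points in R^2 via minimum enclosing balls.
import itertools
import math

def min_enclosing_radius(pts):
    """Minimum enclosing ball radius for 2 or 3 points in R^2."""
    if len(pts) == 2:
        return math.dist(pts[0], pts[1]) / 2
    # A ball with two of the points as a diameter may already contain the third.
    for (a, b), c in [((pts[0], pts[1]), pts[2]),
                      ((pts[0], pts[2]), pts[1]),
                      ((pts[1], pts[2]), pts[0])]:
        center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        r = math.dist(a, b) / 2
        if math.dist(center, c) <= r:
            return r
    # Otherwise the minimum enclosing ball is the circumcircle of the triangle.
    (ax, ay), (bx, by), (cx, cy) = pts
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return math.dist((ux, uy), pts[0])

def cech_simplices(points, eps, max_dim=2):
    """All simplices of the Cech complex C(points, eps) up to max_dim."""
    simplices = [(i,) for i in range(len(points))]
    for k in range(1, max_dim + 1):
        for idx in itertools.combinations(range(len(points)), k + 1):
            if min_enclosing_radius([points[i] for i in idx]) <= eps:
                simplices.append(idx)
    return simplices

pts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]   # equilateral, side 1
print(len(cech_simplices(pts, 0.55)))  # -> 6 (3 vertices + 3 edges, no triangle)
print(len(cech_simplices(pts, 0.60)))  # -> 7 (the 2-simplex appears)
```

The example also separates the Cech complex from the Rips complex: at ε = 0.55 all three edges are present, so a Rips complex would include the triangle, but the circumradius 1/√3 ≈ 0.577 exceeds ε, so the Cech complex does not.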
The main motivation for our study of random complexes is the following manifold learning problem. Let M ⊂ Rd be an unknown closed manifold which we wish to recover, and suppose that we are given a set of random points X1, . . . , Xn sampled from some distribution over M. It turns out that, under mild conditions, the Betti numbers of the hidden manifold can be recovered by computing the Betti numbers of the union of d-dimensional balls Ur := ⋃_{i=1}^{n} Br(Xi) centered at the samples with a fixed radius r. This method, however, is highly sensitive to the choice of the radius r. A few methods have been suggested to overcome this sensitivity. In [37, 38] sufficient conditions on n and r are given so that the probability of recovering the correct Betti numbers is sufficiently high. A different approach is to compute the persistent homology of the filtration {Ur}_{r=0}^{∞}, and locate the homology elements that last through a long range of radii, which most likely correspond to real features of the original manifold.
Recent research on such random Cech complexes focuses on the following related setup. Xn = {X1, . . . , Xn} is a set of random points in Rd, sampled from a known distribution f. We would like to study the Betti numbers of Urn in the limit where n → ∞ and rn → 0. This problem can be turned into a simpler combinatorial problem by studying the Cech complex C(Xn, rn). By the celebrated Nerve Theorem (see [11]), Urn and C(Xn, rn) are homotopy equivalent, and in particular have the same Betti numbers. Recent work (see [30, 31]) studied the Cech complex in the setup just described. In this scenario, the behavior of the Cech complex (or the union of balls) splits into three main regimes. If nr_n^d → 0 (the subcritical or ‘dust’ phase), the complex is very sparse, with many small disconnected components and hardly any holes. In the critical phase, nr_n^d → λ ∈ (0, ∞), the complex becomes connected, with many holes of any dimension k < d. Finally, if nr_n^d → ∞, the complex is highly connected, with very few holes, if any. A detailed study of the Betti numbers is possible mostly in the dust phase, and is significantly more complicated in the other regimes. Thus, in this thesis we have adopted an alternative approach, based on distance functions, which yields results in all regimes.
Let dn : Rd → R+ be the distance function from Xn defined as
dn(x) = min1≤k≤n
‖x−Xk‖ .
The key observation is that d−1n ((−∞, r]) = Ur ≃ C(Xn, r). By Morse Theory, changes
in the Betti numbers of d−1n ((−∞, r]) occur at the critical levels of dn. Thus, studying
the critical points of dn should reveal information about the topology of C(Xn, r). Note,
however, that dn is non-differentiable (and so certainly not a Morse function). Nevertheless,
following [24], we can define a special notion of a critical point and Morse index for
dn, and apply Morse theory to it. We then define Nk,n to be the number of critical
points p of dn with index k such that dn(p) ≤ rn. In other words, we count the
number of critical points that “construct” the topology of C(Xn, rn).
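The function dn itself is simple to implement directly; the sketch below (illustrative only, with a hypothetical brute-force `dist_fn` rather than anything from the thesis) verifies on a grid that the sublevel set {dn ≤ r} is exactly the union of the balls Br(Xk):

```python
import math
import random

def dist_fn(x, sample):
    """d_n(x) = min over k of ||x - X_k||: the distance from x to the sample."""
    return min(math.dist(x, p) for p in sample)

random.seed(1)
sample = [(random.random(), random.random()) for _ in range(50)]
r = 0.1

# x lies in the sublevel set {d_n <= r} iff it lies in some ball B_r(X_k).
grid = [(i / 20.0, j / 20.0) for i in range(21) for j in range(21)]
agree = all(
    (dist_fn(x, sample) <= r) == any(math.dist(x, p) <= r for p in sample)
    for x in grid
)
```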
As with the behavior of the Cech complex described above, the limiting behavior of Nk,n
splits into three different regimes, depending on the limit of nr_n^d. In Chapter 3, we present
a significant body of limit theorems for Nk,n in all three regimes. Not surprisingly, there
is a close correspondence between our results and the Betti number results in [31], which
has a Morse-theoretic explanation. However, while the results for the Betti numbers are
mainly restricted to the subcritical phase, the study of the distance function extends to
the other regimes as well. Thus, the indirect approach of studying critical points (rather
than Betti numbers) proves advantageous. For example, using our results, we can
easily derive limit theorems for the Euler characteristic of the Cech complex in all three
regimes.
1.4 Noise Crackles
This is the last part of the thesis, treated in detail in Chapter 4. We wish to
study the behavior of random Cech complexes with a fixed radius, i.e. C(Xn, 1), rather
than C(Xn, rn) as studied in Chapter 3. The setup is the same as described in the previous
section. Obviously, if the sample distribution has a compact support S, then for large
enough n we have ⋃_{k=1}^n B1(Xk) ≈ Tube(S, 1), so there is not much to study
in this case. However, when the support of the distribution is unbounded, interesting
phenomena occur.
In Chapter 4 we study distributions supported on Rd. In this case, there exists a ‘core’,
i.e. a region where the random samples are very dense, so that placing unit balls around
the individual points completely covers the region. Consequently, the Cech complex inside
the core is contractible. The size of the core obviously grows as n → ∞. Outside the core
there may be additional isolated points, but not enough for the associated balls to cover
the entire area. Thus, in this region, the topology of the Cech complex is nontrivial, and
many holes of different dimensions might show up. We call this phenomenon ‘crackling’.
The exact crackling behavior depends on the choice of distribution. We study three
representative examples - the power law, exponential, and Gaussian distributions. These
three distributions are spherically symmetric, and therefore their cores are balls centered
at the origin. The size of the ball is different for each distribution. Denoting by Rc_n the
radius of the core, we show in Section 4.2 that

Rc_n ∼ (n/log n)^{1/α}   if f(x) ∝ 1/(1 + ‖x‖^α),
Rc_n ∼ log n             if f(x) ∝ e^{−‖x‖},
Rc_n ∼ √(2 log n)        if f(x) ∝ e^{−‖x‖²/2}.
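The log n scaling in the exponential case can be seen in a one-dimensional simulation: if the norm of each sample is Exp(1) distributed (mimicking a density proportional to e^{−‖x‖}), the farthest of n samples sits at distance ≈ log n. A rough illustration only, not the precise definition of Rc_n used in Section 4.2:

```python
import math
import random

def max_radius(n, seed=0):
    """Largest norm among n samples whose norm is Exp(1) distributed."""
    rng = random.Random(seed)
    return max(rng.expovariate(1.0) for _ in range(n))

# The maximum of n i.i.d. Exp(1) variables concentrates around log n.
ratios = [max_radius(n) / math.log(n) for n in (10_000, 100_000)]
```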
When studying crackling behavior, however, the Gaussian distribution turns out to be
fundamentally different from the other two distributions. In the power-law case, as well
as in the exponential case, quite a lot is going on outside the core. In Sections 4.3 and
4.4, we show that the exterior of the core can be divided into separate annuli at radii
Rd−1,n ≪ Rd−2,n ≪ · · · ≪ R0,n (defined differently for each distribution). At [R0,n,∞)
there are mostly disconnected points, and no holes. At [R1,n, R0,n) connectivity is a bit
higher, and a finite number of 1-dimensional holes shows up. At [R2,n, R1,n) we have a
finite number of 2-dimensional holes, while the number of 1-dimensional holes grows to
infinity as n → ∞. In general, at [Rk,n, Rk−1,n), as n → ∞ we have a finite number
of k-dimensional holes, infinitely many l-dimensional holes for l < k, and no holes of
dimension l > k. In other words, the crackle starts as pure dust at R0,n and, as we
get closer to the core, higher-dimensional holes gradually appear.
The Gaussian distribution behaves very differently. It does not crackle. In Section 4.5
we show that, for the Gaussian distribution, there are hardly any points located outside
the core. Thus, as n → ∞, the union of balls around the sample points becomes a giant
contractible ball of radius ∼ √(2 log n).
The results presented in Chapter 3 on the homology of the Cech complex C(Xn, rn) do
not cover distributions with unbounded support in the supercritical regime (nr_n^d → ∞).
In Chapter 4 we also discuss how the crackling results may shed some light on the behavior
of C(Xn, rn) in this case. In addition, we discuss how studying the crackling phenomenon
can be useful for noisy manifold learning applications.
Chapter 2

Persistent Homology of Gaussian Random Fields

2.1 Background
The primary mathematical techniques used so far to analyze Gaussian (or Gaussian-related)
random fields have come from differential geometry. Recent advances in
the study of excursion sets of Gaussian fields have produced applications in brain imaging
and astronomy (see [3,14,46,49,50]). In this chapter we extend the toolkit used to study
these objects to include methods of algebraic topology. Specifically, we are interested in
studying the persistent homology of sublevel sets of Gaussian fields. There is no doubt
that studying the algebraic topology of random fields will significantly strengthen existing
applications and introduce others.
In this section we review the probabilistic and topological background needed to
present our results. The results presented in this chapter were published in [2, 10].
2.1.1 Gaussian Random Fields
In this section we give a brief introduction to Gaussian random fields. As described
in the introduction, a random field f : X → Rd is a stochastic process defined over
a topological space X of dimension greater than one (in most cases this will be a manifold
or a stratified space). As for random processes on the real line, the distribution of a random
field is determined by the joint distribution of every finite collection of vector-valued
random variables f(x1), . . . , f(xn), xi ∈ X. A Gaussian random field is a random
field in which any finite collection of values f(x1), . . . , f(xn) has a multidimensional
Gaussian distribution.
We define the mean value function m : X → Rd of a random field f by

m(x) = E{f(x)},

and the covariance function C : X × X → Rd×d by

C(x, y) = E{(f(x) − m(x))(f(y) − m(y))^⊤},
where ⊤ denotes the transpose operator, and our vectors are columns. As for Gaussian
processes, the distribution of a Gaussian random field is completely determined by these
two functions. One way to construct a real-valued Gaussian random field is as follows.
Let {φn}n be a set of functions φn : X → R such that ∑_n φn²(x) < ∞ for all x ∈ X.
Let {ξn}n be a set of i.i.d. random variables with ξn ∼ N(0, 1). Then

f(x) ≜ ∑_n ξn φn(x)

is a Gaussian random field. In this case m(x) ≡ 0, and

C(x, y) = ∑_n φn(x) φn(y).
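A finite version of this construction is easy to simulate: pick a few basis functions φn (the three below are hand-picked assumptions, purely for illustration), draw i.i.d. standard normals ξn, and check empirically that the sample covariance of f(x) = ∑ ξn φn(x) matches ∑ φn(x)φn(y):

```python
import math
import random

# Three hand-picked basis functions on [0, 1] (an assumption for illustration).
basis = [
    lambda x: 1.0,
    lambda x: math.cos(math.pi * x),
    lambda x: 0.5 * math.sin(2 * math.pi * x),
]

def sample_field(rng):
    """One realization of f(x) = sum_n xi_n * phi_n(x)."""
    xi = [rng.gauss(0.0, 1.0) for _ in basis]
    return lambda x: sum(c * phi(x) for c, phi in zip(xi, basis))

def theoretical_cov(x, y):
    """C(x, y) = sum_n phi_n(x) * phi_n(y)."""
    return sum(phi(x) * phi(y) for phi in basis)

rng = random.Random(42)
x, y = 0.2, 0.7
draws = [sample_field(rng) for _ in range(20000)]
empirical = sum(f(x) * f(y) for f in draws) / len(draws)
error = abs(empirical - theoretical_cov(x, y))
```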
The theory of Gaussian random fields is extremely wide and very deep. However, for
the purposes of this work, we mainly need recent results from the study of the geometry
of Gaussian fields, which we describe in the next section.

Throughout this chapter we use the following notation. Denote by

ϕ(x) ≜ (1/√(2π)) e^{−x²/2}

the standard normal density, and by

Φ(x) ≜ ∫_{−∞}^x ϕ(t) dt

the normal cumulative distribution function. Also, denote by γk the Gaussian measure
on Rk, i.e. for A ⊂ Rk,

γk(A) ≜ P(X ∈ A),
where X has a standard multi-normal distribution in Rk (i.e. i.i.d. standard normal
components). Finally, for a nice set D ⊂ Rk, the Gaussian Minkowski functionals M^γ_j(D)
are defined via the Taylor expansion, for small enough ρ ≥ 0,

γk(Tube(D, ρ)) = ∑_{j=0}^∞ (ρ^j / j!) M^γ_j(D),   (2.1.1)

where Tube(D, ρ) = {x ∈ Rk : dist(D, x) ≤ ρ}, and dist(D, x) ≜ inf_{y∈D} ‖x − y‖. The
functionals M^γ_j play a key role in the results of this chapter.
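As a concrete one-dimensional example (a standard computation in [3], included here as an illustration): for k = 1 and D = [u, ∞), Tube(D, ρ) = [u − ρ, ∞), so γ1(Tube(D, ρ)) = 1 − Φ(u − ρ). Differentiating in ρ at 0 gives M^γ_0(D) = 1 − Φ(u) and M^γ_j(D) = H_{j−1}(u)ϕ(u) for j ≥ 1, where Hn are the Hermite polynomials introduced in Section 2.3.1. A numerical sanity check of the expansion (2.1.1) in this case:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n via H_{k+1} = x H_k - k H_{k-1}."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def tube_series(u, rho, terms=12):
    """Truncated expansion (2.1.1) for D = [u, inf) in R^1."""
    total = 1.0 - Phi(u)  # j = 0 term
    for j in range(1, terms):
        total += rho ** j / math.factorial(j) * hermite(j - 1, u) * phi(u)
    return total

u, rho = 1.0, 0.3
exact = 1.0 - Phi(u - rho)   # gamma_1 of the tube, computed directly
approx = tube_series(u, rho)
```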
2.1.2 The Geometry of Gaussian Random Fields
There has been extensive effort over the past few years to study the sample paths of
smooth random fields, f , from a general Riemannian manifold M to Rd. In particular, M
could be a 3-dimensional brain or a 2-dimensional cortical surface, examples which have
been of significant practical importance. The basic (random) geometrical objects studied
were the excursion sets of the random fields, defined by
AD ≡ AD(f; M) ≜ {x ∈ M : f(x) ∈ D} = M ∩ f^{−1}(D)   (2.1.2)
for nice subsets D of Rd, and the tools for quantifying these sets were those of differential
geometry. The theory of this subject has developed rapidly over the past few years
(see [3, 43, 44]). One of its most powerful results is an explicit expression for the mean
value of all Lipschitz-Killing curvatures of excursion sets for centered (i.e. E {f(x)} =
0), constant variance, C2, Gaussian random fields. The result presented in [3] links
random field theory with integral and differential geometry, and leads to approximations of
other important objects in probability and statistics, such as the exceedance probabilities
P(sup_{x∈M} f(x) > u) (cf. [1, 45]).
The main theorem in [3] is called the Gaussian Kinematic Formula (GKF), and the
purpose of this section is to properly state this result, which is at the heart of this chapter.
The Lipschitz-Killing Curvatures
Let M be a Riemannian manifold. Lipschitz-Killing curvatures are geometric objects
that depend on the Riemannian metric on M , such that Lk(M) is a measure of the k-
dimensional ‘size’ of M. This means that if we scale the metric by a constant λ, then
Lk(M) scales by λ^k. For a large class of spaces, including smooth manifolds and
compact convex regions, if M ⊂ Rn is given the Euclidean metric, then the following tube
formula holds for sufficiently small ρ ≥ 0:

µ(Tube(M, ρ)) = ∑_{j=0}^n ωj L_{n−j}(M) ρ^j,   (2.1.3)

where Tube(M, ρ) ≜ {x ∈ Rn : dist(M, x) ≤ ρ} is the tube of radius ρ about M, and µ is
the Lebesgue measure. For example, if M ⊂ R2 is convex and compact, then

L0(M) = 1,  L1(M) = (perimeter of M)/2,  L2(M) = area(M).
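For instance, for a disk of radius r in R2 the tube of radius ρ is just the disk of radius r + ρ, so (2.1.3), with ω0 = 1, ω1 = 2, ω2 = π, reduces to the identity π(r + ρ)² = L2(M) + 2L1(M)ρ + πL0(M)ρ². A quick numerical check of this identity:

```python
import math

def tube_area_via_lk(r, rho):
    """Right-hand side of the tube formula (2.1.3) for a disk of radius r
    in R^2: L0 = 1, L1 = perimeter / 2 = pi * r, L2 = area = pi * r**2,
    with omega_0, omega_1, omega_2 = 1, 2, pi."""
    L0, L1, L2 = 1.0, math.pi * r, math.pi * r * r
    return 1.0 * L2 + 2.0 * L1 * rho + math.pi * L0 * rho ** 2

def tube_area_exact(r, rho):
    """The tube of radius rho about the disk is a disk of radius r + rho."""
    return math.pi * (r + rho) ** 2

checks = [abs(tube_area_via_lk(r, rho) - tube_area_exact(r, rho))
          for r in (0.5, 1.0, 2.0) for rho in (0.0, 0.1, 0.7)]
```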
For a general d-dimensional Riemannian manifold (M, g), the Lipschitz-Killing curvatures
can be expressed in integral form with respect to the Riemannian volume induced
by the metric g. For example, if M is a manifold without boundary, then they are given by

Lj(M) = [1 / ((2π)^{(d−j)/2} ((d−j)/2)!)] ∫_M Tr_M((−R)^{(d−j)/2}) Volg,

when d − j ≥ 0 is even, and Lj(M) = 0 otherwise. Here R is the curvature tensor and Tr_M
is the trace operator on the algebra of double forms on M. For more details see [3]. Note that
Ld(M) ≡ Volg(M) is always the Riemannian volume of M, and L0(M) ≡ χ(M)
is its Euler characteristic.
The Gaussian Kinematic Formula
Suppose that M ⊂ RN is a d-dimensional, C2, Whitney stratified manifold satisfying
some mild side conditions (cf. [3] for details), and that D is a similarly nice stratified
submanifold of Rk. Let f = (f1, . . . , fk) : M → Rk be a vector-valued random process
satisfying the following conditions:
• f1, . . . , fk are i.i.d. real Gaussian fields, with a common covariance function C(s, t),
• for every 1 ≤ i ≤ k and x ∈ M , fi(x) ∼ N (0, 1),
• fi has C2 sample paths almost surely for every 1 ≤ i ≤ k,
• the joint distributions of fi and its first and second order derivatives are non-degenerate,
• there exist K, α > 0 such that for all s, t ∈ M:

max_{i,j} |Cij(t, t) + Cij(s, s) − 2Cij(s, t)| ≤ K |ln |t − s||^{−(1+α)},

where Cij is the covariance function of ∂²fm(t)/∂ti∂tj.

Essentially, this list of conditions ensures that the sample paths of f are almost surely
Morse functions. For example, the covariance function C(s, t) = exp(−‖s − t‖²), s, t ∈ Rd,
satisfies this list of conditions.
Using f, define a Riemannian metric on M by setting

gx(X, Y) ≜ E{(Xfi)(x)(Yfi)(x)},   (2.1.4)

for any i and for X, Y ∈ TxM, the tangent space to M at x ∈ M. In other words, Xfi is
the derivative of fi in the direction represented by the tangent vector X. Next, use this
metric to define the Lipschitz-Killing curvatures Lj, j = 0, . . . , d, on M. With the above
definitions and conditions, we are now ready to state the Gaussian Kinematic Formula
(GKF).
Theorem 2.1.1 (The Gaussian Kinematic Formula, [3]). Let M ⊂ RN and D ⊂ Rk be
nice stratified spaces, and let f = (f1, . . . , fk) : M → Rk be a C2, k-dimensional Gaussian
field satisfying the conditions above. Then

E{Li(f^{−1}(D))} = ∑_{j=0}^{dim M − i} [i+j, j] (2π)^{−j/2} L_{i+j}(M) M^γ_j(D),

where:

• Li(·) is the i-th Lipschitz-Killing curvature of M, computed with respect to the Riemannian metric defined in (2.1.4);

• M^γ_i(·) is the i-th Gaussian Minkowski functional, defined in (2.1.1);

• the combinatorial coefficients [n, k] are the standard flag coefficients of integral geometry, given by

[n, k] = (n choose k) ωn / (ωk ω_{n−k}),

where ωn is the volume of the n-dimensional unit ball.
More details about the Lipschitz-Killing curvatures, the induced Riemannian metric,
the Minkowski functionals, and the niceness of the spaces can be found in [3, 43]. Aside
from the generality of the GKF, the fact that the spaces M and D appear in two completely
separate terms in the formula makes it even more elegant. Each of the terms Lj(M) and
M^γ_j(D) can be computed separately, independently of the other.
An interesting special case of the GKF is i = 0. In this case L0 is just the Euler
characteristic χ, and therefore

E{χ(f^{−1}(D))} = ∑_{j=0}^{dim M} (2π)^{−j/2} Lj(M) M^γ_j(D).   (2.1.5)
2.1.3 Persistent Homology
In this section we present the main ideas behind the theory of persistent homology.
Consider the following situation. Let X be an unknown subspace of Rd with finite Lebesgue
measure, and let X1, . . . , Xn be n independent random samples uniformly distributed on
X. We would like to study the homology of X from the given set of random points. In
many cases we can find an ǫ for which the union of balls

U = ⋃_{i=1}^n Bǫ(Xi)

is homotopy equivalent to X (and hence has the same homology, see Figure 2.1(a)).
However, we do not know a priori what the correct choice of ǫ is. For example, if ǫ is
chosen too small (Figure 2.1(b)), then U is homotopy equivalent to a union of n
distinct points (and hence contains no information about X). On the other hand, if ǫ is
chosen too big (Figure 2.1(c)), then U is just a big contractible blob (which again tells us
nothing about X). Persistent homology tries to overcome this sensitivity to the choice of
ǫ.
The main idea behind persistent homology is to consider the whole range of possible
values of ǫ rather than one particular value. Starting with ǫ = 0, we have n distinct points.
As we increase ǫ, homology elements (i.e. connected components and k-dimensional holes)
are created and destroyed, until we reach a point where U is contractible (a giant blob). The
theory of persistent homology describes very accurately how to follow homology elements
throughout this birth/death process. The result is a set of pairs (bi, di), standing for the
birth and death times (values of ǫ) of each homology element. The key assumption is
that homology elements that “live longer” (or persist) are more likely to represent “real”
homology elements of X, whereas the others are just “noise”.

Figure 2.1: Trying to capture the homology of an annulus (where β0 = 1, β1 = 1) from a
union of balls around a random set of samples. (a) A good choice of radius recovers the correct
homology. (b) The radius chosen is too small, hence the union of balls has the same homology
as n distinct points (β0 = 15, β1 = 0). (c) The radius chosen is too big, and the union is
contractible (β0 = 1, β1 = 0).
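For connected components (H0) the birth/death bookkeeping can be carried out by hand: every sample point is born at ǫ = 0, and two components of the union of balls merge when ǫ reaches half the gap between their points. A sketch for points on the real line (an illustration only, not a general persistence algorithm):

```python
def h0_barcode(points):
    """H0 persistence of the union of intervals [x - eps, x + eps].
    Each point is born at eps = 0; at eps = gap / 2 two adjacent
    components merge, killing one of them. One bar lives forever."""
    xs = sorted(points)
    deaths = sorted((b - a) / 2.0 for a, b in zip(xs, xs[1:]))
    bars = [(0.0, d) for d in deaths]   # finite bars, one per merge
    bars.append((0.0, float("inf")))    # the surviving component
    return bars

bars = h0_barcode([0.0, 1.0, 3.0, 7.0])
```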
The description above is just a special case in which persistent homology is useful,
presented as motivation. However, persistent homology can be defined for any filtration of
spaces. Given a filtration X = {Xu}u such that Xs ⊂ Xt whenever s < t, the persistent homology
of X, denoted by PH∗(X), consists of families of homology elements that ‘persist’ through
time. More explicitly, an element of PHk(X) is a family of homology elements α = {αu}u,
where αu ∈ Hk(Xu) (the k-th homology group of Xu). Let ı^k_{s,t} : Hk(Xs) → Hk(Xt) be the
homomorphism between homology groups induced by the inclusion Xs → Xt. The birth
time b of an element α ∈ PHk(X) can be thought of as the first time α appears, which is
defined by the condition that αb ∉ Im(ı^k_{s,b}) for all s < b. The death time d of an element α
is the moment that α merges with an element that existed before b. Formally, we require
that αt ∉ Im(ı^k_{s,t}) for all s < b and t < d, but αd ∈ Im(ı^k_{s,d}) for some s < b.
A useful way to describe persistent homology is via the notion of barcodes. A barcode
for the persistent homology of a filtration {Xu}u is a collection of graphs, one for each
collection of homology groups of common order. A bar in the k-th graph, starting at b
and ending at d (b ≤ d), indicates the existence of a generator of Hk(Xu) whose birth
and death times are b and d, respectively. For example, Figure 2.2 presents the persistent
homology of a filtration of simplicial complexes known as Rips complexes (see Section
3.1.1 for details). In this example random samples are taken from an annulus in R2, and
the Rips complexes are used in order to recover the homology of the annulus.
Figure 2.2: The barcode of a Rips complex, taken from [25]. The points were sampled from an
annulus in R2. We see that there is a single H0 bar that persists forever. This bar represents the
single connected component of the annulus. In H1 we see a couple of dominant bars indicating
that the sample space contains holes. The longest bar actually represents the real hole of the
annulus. In H2 there is nothing significant, and indeed β2 = 0 in this case.
A particularly interesting case is the persistent homology of functions. Suppose that
M is a nice space, that f : M → R is smooth, and consider the sublevel sets Mρ ≜
f^{−1}((−∞, ρ]). Note that if u ≤ v then Mu ⊂ Mv; thus {Mρ}_{ρ=−∞}^∞ is a valid filtration.
Going from u to v, components of Mu may merge, and new components may be born
and possibly later merge with one another or with the components of Mu. Similarly,
the topology of these components may change, as holes and other structures form and
disappear. Following the topology of the sets in this filtration as a function of ρ, by following
their homology, is another example of persistent homology. From our discussion on Morse
theory in Section 1.1.3, it is clear that the birth/death times of homology elements will be the
critical levels of the function f (since these are the only levels where the topology changes).
Figure 2.3 presents the barcodes of a function f : [0, 1]² → R. In this figure, however,
the persistent homology is computed for superlevel (or ‘excursion’) sets, defined by Aρ ≜
f^{−1}([ρ,∞)), rather than sublevel sets. Thus, to obtain a filtration of sets, we start from a
very high level and gradually decrease it. In this case the excursion sets are subsets of
[0, 1]², and thus have non-trivial homology only in H0 and H1 (i.e. connected components and holes).
For more details on persistent homology see [15, 23, 25].
Figure 2.3: Barcodes for the excursion (superlevel) sets of a function on [0, 1]2. The top seven
boxes show the surfaces generated by a 2-dimensional random field above excursion sets Au for
different levels u. To determine the level for each figure, follow the dotted line down to the
scale at the bottom of the barcode. As the dotted lines pass through the boxes labeled H0 and
H1, the number of intersections with bars in the H0 (H1) box gives the number of connected
components (resp. holes) in Au. Thus, at u ∼ 1.9, Au has 4 connected components but no holes,
while at u ∼ −1.2, Au has only 1 connected component, but 9 holes. The horizontal lengths
of the bars indicate how long the different topological structures (generators of the homology
groups) persist. Computation of the barcodes was carried out in Matlab by Eliran Subag from
the Technion, using Plex (Persistent Homology Computations) from Stanford [16].
2.1.4 Euler Integration
To prove the main result of this chapter (Theorem 2.4.1), we use a relatively new notion
of integration, which treats the Euler characteristic operator as a measure. This integral has
been gaining increasing interest lately and seems to have great potential to become a powerful
data analysis and signal processing tool (see [7, 8, 27], and the recent survey paper [21]).
In this section we review the basic ideas behind the Euler calculus.
The Euler characteristic is an integer value assigned to topological spaces, which pro-
vides a partial description of their shape. It is a topological invariant, meaning that if X
and Y are homeomorphic spaces, then they have the same Euler characteristic. There are
a few equivalent ways to define the Euler characteristic (e.g. using simplicial, cell or basic
complexes). For compact d-dimensional spaces X , the Euler characteristic (denoted by
χ(X)) can be defined by
χ(X) =d∑
k=0
(−1)kβk, (2.1.6)
where βk = rankHk(X) is the k-th Betti number of X . For example, χ(point) =
1, χ(S1) = 0, χ(S2) = 2, χ(T 2) = 0. One of the key properties of the Euler char-
acteristic is that it is additive, in the sense that for nice compact sets A,B we have
that
χ(A ∪B) = χ(A) + χ(B)− χ(A ∩B).
It is therefore tempting to consider χ as a measure and to integrate with respect to it. The
main problem in doing so is that χ is only finitely additive (it is also not positive, but
that can be overcome using the theory of signed measures).
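The additivity formula is easy to test in one dimension, where the Euler characteristic of a finite union of closed intervals is simply its number of connected components. A small sketch:

```python
def euler_char(intervals):
    """Euler characteristic of a finite union of closed intervals in R:
    merge overlapping/touching intervals and count the components."""
    if not intervals:
        return 0
    ivs = sorted(intervals)
    components = 1
    _, hi = ivs[0]
    for a, b in ivs[1:]:
        if a > hi:          # a genuine gap starts a new component
            components += 1
            hi = b
        else:
            hi = max(hi, b)
    return components

def intersect(A, B):
    """Pairwise intersections of two unions of closed intervals."""
    out = []
    for a1, b1 in A:
        for a2, b2 in B:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo <= hi:
                out.append((lo, hi))
    return out

A = [(0.0, 2.0), (5.0, 6.0)]
B = [(1.0, 3.0)]
lhs = euler_char(A + B)  # chi(A u B)
rhs = euler_char(A) + euler_char(B) - euler_char(intersect(A, B))
```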
At first (see [48]), integration with respect to the Euler characteristic was defined for
a small set of functions, called constructible functions, defined by

CF(X) = { h(x) = ∑_{k=1}^n ak 1_{Ak}(x) : ak ∈ Z, Ak are disjoint tame subsets of X },

where ‘tame’ means having a finite Euler characteristic. For this set of functions we can
define the Euler integral analogously to the Lebesgue integral. Let

h(x) = ∑_{k=1}^n ak 1_{Ak}(x),

and define

∫_X h dχ = ∑_{k=1}^n ak χ(Ak).   (2.1.7)
This integral has many nice properties, similar to those of the Lebesgue integral, such as
linearity and a version of the Fubini theorem (see [48]). However, as mentioned above,
the Euler characteristic is not countably additive, and therefore we cannot continue from
here by approximating other functions using functions in CF (X).
In [8], two extensions of the Euler integral to real-valued functions were suggested, using
the notion of a definable function over an O-minimal structure (see [8] and references
therein for more background). Let ⌊x⌋ and ⌈x⌉ denote the floor and ceiling of x,
respectively. In the O-minimal language, if h : X → R is a definable function on a
definable space X, then an important property is that both ⌊h⌋ and ⌈h⌉ are constructible
functions, and hence have well-defined Euler integrals. This leads to the following
Riemann-sum-like definition:

Definition 2.1.2 ([8]). Let h : X → R be a definable function on a definable space X.
The lower Euler integral is defined by

∫_X h⌊dχ⌋ = lim_{n→∞} (1/n) ∫_X ⌊nh⌋ dχ,   (2.1.8)

and the upper Euler integral is defined by

∫_X h⌈dχ⌉ = lim_{n→∞} (1/n) ∫_X ⌈nh⌉ dχ.   (2.1.9)
These two extensions coincide with the original Euler integral in (2.1.7) for constructible
functions. For other functions they might be completely different. Easier
formulae to work with are given in the next proposition.

Proposition 2.1.3 ([8]). If h : X → R is a definable function, then

∫_X h⌊dχ⌋ = ∫_0^∞ [χ(h ≥ u) − χ(h < −u)] du,

and

∫_X h⌈dχ⌉ = ∫_0^∞ [χ(h > u) − χ(h ≤ −u)] du,

where χ(h ≥ u) ≜ χ(h^{−1}([u,∞))), χ(h > u) ≜ χ(h^{−1}((u,∞))), etc.
Unfortunately, these extensions of the Euler integral have many flaws, the most
prominent of which is the lack of additivity. For example, a simple computation shows
that for X = [0, 1],

∫_X x⌊dχ⌋ + ∫_X (1 − x)⌊dχ⌋ = 2 ≠ 1 = ∫_X 1⌊dχ⌋.
Nevertheless, these integrals still have interesting properties, one of which is stated in the
following theorem.

Theorem 2.1.4 ([8]). If M is a closed d-dimensional manifold and h : M → R is a
Morse function, then

∫_M h⌊dχ⌋ = ∑_{p∈CP(h)} (−1)^{d−µ(p)} h(p),

∫_M h⌈dχ⌉ = ∑_{p∈CP(h)} (−1)^{µ(p)} h(p),

where CP(h) is the set of critical points of h, and µ(p) is the Morse index of p (i.e. the
number of negative eigenvalues of the Hessian matrix Hh(p)).
2.2 Redefining the Euler Integral
The definition of the Euler integral as it appears in [8] and stated in Section 2.1.4 uses
the language of O-minimal structures and definable functions. Later in this chapter we
would like to evaluate the expected value of the Euler integral over a Gaussian random
field. Unfortunately, it is not clear if Gaussian random fields can be made to fit inside an
O-minimal setting. Therefore, we now introduce a simplified definition of a tame function,
which we shall use throughout this chapter. We then re-define the Euler integral for this
type of function.
Definition 2.2.1. A continuous function h : X → R on a compact topological space
X with finite Euler characteristic is tame if the homotopy types (and hence the Euler
characteristics) of h^{−1}((−∞, u]) and h^{−1}([u,∞)) change only finitely many times as u
varies over R, and the Euler characteristic of each such set is always finite.
With this broad definition of tame functions, the formulae appearing in Proposition
2.1.3 are well defined. Thus, for tame functions we use these formulae as our definition of
the Euler integral, replacing Definition 2.1.2 with the following.

Definition 2.2.2. Let h : X → R be a tame function. Then the lower and upper Euler
integrals are defined by

∫_X h⌊dχ⌋ = ∫_0^∞ [χ(h ≥ u) − χ(h < −u)] du,

and

∫_X h⌈dχ⌉ = ∫_0^∞ [χ(h > u) − χ(h ≤ −u)] du.
From here on, we focus only on the upper Euler integral. Note, however, that all the
results we present have straightforward lower-integral analogues.
It turns out that the Euler integral is strongly related to both Morse theory and
persistent homology. In the following sections we introduce and discuss these connections.
2.2.1 The Euler Integral and Morse Theory
In this section we discuss the connection between the Euler integral of a tame function
and its critical points. In [8] the Euler integral was given a stratified Morse theory
interpretation. A corollary of this approach was stated in Theorem 2.1.4, asserting that
if h : M → R is a Morse function and M is a closed manifold, then

∫_M h⌈dχ⌉ = ∑_{p∈CP(h)} (−1)^{µ(p)} h(p),   (2.2.1)

where CP(h) is the set of critical points of h, and µ(p) is the index of p as a critical point.
Using our definition of tame functions, we have the following more general proposition.

Proposition 2.2.3. Let h : X → R be a tame function, and let CV(h) be the set of values
at which the homotopy type of h^{−1}((−∞, u]) changes (the critical values of h). Then

∫_X h⌈dχ⌉ = ∑_{v∈CV(h)} ∆χ(h, v) v,

where ∆χ(h, v) = χ(h ≤ v + ǫ) − χ(h ≤ v − ǫ), for sufficiently small ǫ, is the change in
the Euler characteristic of h^{−1}((−∞, u]) as u passes through the critical value v.
Proof. Label the critical values CV(h) = {v1, . . . , vn} in increasing order, so that v1 <
· · · < vi < 0 ≤ v_{i+1} < · · · < vn. If vk < u < v_{k+1}, then via a telescoping sum,

χ(h ≤ u) = ∆χ(h, v1) + · · · + ∆χ(h, vk),
χ(h > u) = χ(X) − χ(h ≤ u) = ∆χ(h, v_{k+1}) + · · · + ∆χ(h, vn).

Therefore, for u ∈ [0,∞) with u ≠ ±vj,

χ(h > u) = ∑_{j=i+1}^n ∆χ(h, vj) 1_{[0,vj]}(u),   and
χ(h ≤ −u) = ∑_{j=1}^i ∆χ(h, vj) 1_{[0,−vj]}(u).

Thus,

∫_X h⌈dχ⌉ = ∫_0^∞ (χ(h > u) − χ(h ≤ −u)) du = ∑_{j=1}^n vj ∆χ(h, vj),

as desired.
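As a concrete check of Proposition 2.2.3, take X = S1 and h = cos. This is a Morse function with critical values −1 (the minimum, where a point appears, so ∆χ = +1) and +1 (the maximum, where the arc closes into the full circle, χ drops from 1 to 0, and ∆χ = −1). The proposition then agrees with the Morse-theoretic formula (2.2.1); a sketch, with the critical data entered by hand:

```python
# Critical values of h = cos on the circle, with the change in Euler
# characteristic of the sublevel set at each one (entered by hand).
critical_data = [(-1.0, +1),   # minimum: a point appears, chi: 0 -> 1
                 (+1.0, -1)]   # maximum: arc closes up,   chi: 1 -> 0

upper_integral = sum(v * dchi for v, dchi in critical_data)

# Morse-theoretic version (2.2.1): sum over critical points of (-1)^mu h(p).
critical_points = [(-1.0, 0),  # value -1, Morse index 0
                   (+1.0, 1)]  # value +1, Morse index 1
morse_sum = sum((-1) ** mu * v for v, mu in critical_points)
```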
This recovers the Morse-theoretic viewpoint in (2.2.1), since if h : M → R is a Morse
function, then Morse theory says that the Euler characteristic changes by the addition of
(−1)^k as h^{−1}((−∞, u]) passes through a critical point of index k (see Section 1.1.3). The
following corollary slightly generalizes Proposition 2.2.3.
Corollary 2.2.4. Let h : X → R be tame, satisfying the conditions of Proposition 2.2.3.
If a ∉ CV(h), then

∑_{v∈CV(h), v<a} ∆χ(h, v) v = ∫_X (h ∧ a)⌈dχ⌉ + a χ(h ≤ a) − a χ(X).
Note that by taking a > sup_{x∈X} h(x) we recover Proposition 2.2.3.
Proof. If v1 < · · · < vn are the critical values of h, and a is such that vk < a < v_{k+1}, then
ha ≜ (h ∧ a) has critical values v1, . . . , vk, a. By Proposition 2.2.3,

∫_X ha⌈dχ⌉ = ∑_{j=1}^k vj ∆χ(ha, vj) + a ∆χ(ha, a)
= ∑_{j=1}^k vj ∆χ(h, vj) + a(χ(X) − χ(h ≤ a)).

This gives the desired result.
2.2.2 The Euler Integral and Persistent Homology
The Euler integral of a tame function h is strongly related to the persistent homology of
h (described in Section 2.1.3). In light of Proposition 2.2.3, this is not surprising, since
the Euler integral is a measure of how the Euler characteristic of h−1((−∞, u]) changes,
while the persistent homology tracks how the homology of h−1((−∞, u]) changes. To
make the relationship precise we introduce the following natural extension of the Euler
characteristic to barcodes.
Recall that a barcode is a graphical representation of persistent homology, as a collec-
tion of bars divided into groups for different homology degrees. A bar in the j-th group,
starting at b and terminating at d, represents a generator for the homology group Hj that
is born at level b and dies at level d. We therefore write PH∗ to denote the persistent
homology of a filtration, thought of as a collection of bars. For each bar B ∈ PH∗ we write
b(B), d(B) for its birth and death levels, ℓ(B) = d(B) − b(B) for its length, and µ(B) for
the degree of homology to which it belongs.
Definition 2.2.5. Suppose PH∗ contains only a finite number of bars, and no bars of
infinite length. We define the Euler characteristic of PH∗ to be

χ(PH∗) = ∑_{B∈PH∗} (−1)^{µ(B)} ℓ(B).
Let h : X → R be a tame function, and let PH∗(h, a) be the barcode of the persistent
homology of the filtration {h^{−1}((−∞, u])}_{u≤a}. The relation between the Euler integral
and the Euler characteristic of the persistent homology PH∗(h, a) is given by the following
proposition.

Proposition 2.2.6. Let h : X → R be a tame function, and set hmax ≜ sup_{x∈X} h(x). Then

χ(PH∗(h, hmax)) = hmax χ(X) − ∫_X h⌈dχ⌉,

and, in general,

χ(PH∗(h, a)) = a χ(X) − ∫_X (h ∧ a)⌈dχ⌉.
Proof. Let B ∈ PH∗(h, a), and denote 1_B(u) ≜ 1_{[b(B),d(B)]}(u). Using Definition 2.2.5
we have

χ(PH∗(h, a)) = ∑_{B∈PH∗(h,a)} (−1)^{µ(B)} ∫_{−∞}^a 1_B(u) du
= ∫_{−∞}^a ∑_{B∈PH∗(h,a)} (−1)^{µ(B)} 1_B(u) du.

Note that the number of bars of index k that intersect a level u is exactly the k-th Betti
number of h^{−1}((−∞, u]), denoted by βk(u). Thus,

∑_{B∈PH∗(h,a)} (−1)^{µ(B)} 1_B(u) = ∑_k (−1)^k βk(u) = χ(h ≤ u),

and therefore

χ(PH∗(h, a)) = ∫_{−∞}^a χ(h ≤ u) du.

If a ≥ hmax, then using Definition 2.2.2 and the fact that χ(h ≤ u) = χ(X) for
u ≥ hmax, we have

∫_{−∞}^a χ(h ≤ u) du = a χ(X) + ∫_0^∞ (χ(h ≤ u) − χ(X)) du + ∫_{−∞}^0 χ(h ≤ u) du
= a χ(X) − ∫_X h⌈dχ⌉
= a χ(X) − ∫_X (h ∧ a)⌈dχ⌉.

On the other hand, if a < hmax, then the maximum of ha ≜ (h ∧ a) is a. Thus, applying
the first part of the proposition to ha yields

χ(PH∗(ha, a)) = a χ(X) − ∫_X ha⌈dχ⌉ = a χ(X) − ∫_X (h ∧ a)⌈dχ⌉.

Finally, the barcodes of h and ha are identical on (−∞, a), and therefore χ(PH∗(ha, a)) =
χ(PH∗(h, a)). This completes the proof.
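The intermediate identity in the proof, namely that the alternating sum of bar lengths equals the integral of χ(h ≤ u) = ∑_k (−1)^k βk(u) over levels, can be checked directly on any finite barcode, since βk(u) just counts the index-k bars containing u. A sketch on a toy barcode (the bar endpoints are invented for illustration):

```python
# Toy barcode: (birth, death, homology degree), invented for illustration.
bars = [(0.0, 5.0, 0), (1.0, 2.0, 0), (1.5, 4.0, 1), (3.0, 3.5, 1)]

# Alternating sum of bar lengths: chi of the barcode (Definition 2.2.5).
lhs = sum((-1) ** k * (d - b) for b, d, k in bars)

# Riemann sum of chi(u) = sum_k (-1)^k beta_k(u), where beta_k(u) counts
# the index-k bars whose interval [b, d] contains the level u.
steps = 20_000
lo, hi = 0.0, 5.0
du = (hi - lo) / steps
rhs = 0.0
for i in range(steps):
    u = lo + (i + 0.5) * du
    chi_u = sum((-1) ** k for b, d, k in bars if b <= u <= d)
    rhs += chi_u * du
```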
2.3 The Euler Integral of Gaussian Random Fields
Let M be a stratified space and let g : M → R be a Gaussian or Gaussian-related random
field. We are interested in computing the expected value of the Euler integral of the field g
over M, which is a precursor to the main result of the chapter (Theorem 2.4.1). As noted
earlier, while we focus on the upper Euler integral, everything we do has a lower Euler
integral analogue. The following result is a consequence of the GKF (Theorem 2.1.1) and
Definition 2.2.2.
Theorem 2.3.1. Let M be a compact d-dimensional stratified space and let f : M → Rk
be a k-dimensional Gaussian random field, both satisfying the GKF conditions. For a
piecewise-C2 tame function G : Rk → R, let g = G ◦ f. Setting Du = G^{−1}((−∞, u]),
we have

E{∫_M g⌈dχ⌉} = χ(M) E{g} − ∑_{j=1}^d (2π)^{−j/2} Lj(M) ∫_R M^γ_j(Du) du,   (2.3.1)

where E{g} = E{g(x)} (g(x) has a constant mean).
The difficulty in evaluating the expression above lies in computing the Minkowski
functionals Mγj (Du). In Sections 2.3.1 and 2.3.2 we present a few cases where they have
been computed, which allows us to simplify (2.3.1).
Proof. Using our definition of the Euler integral (Definition 2.2.2) we have that
\[
\int_M g \lceil d\chi \rceil = \int_0^\infty \left( \chi(g > u) - \chi(g \le -u) \right) du
= \int_0^\infty \left( \chi(M) - \chi(g \le u) \right) du - \int_{-\infty}^0 \chi(g \le u)\, du.
\]
Therefore,
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \int_0^\infty \left( \chi(M) - \mathbb{E}\{\chi(g \le u)\} \right) du - \int_{-\infty}^0 \mathbb{E}\{\chi(g \le u)\}\, du. \tag{2.3.2}
\]
Replacing $D$ with $D_u$ in the GKF (Theorem 2.1.1) yields
\[
\mathbb{E}\{\chi(g \le u)\} = \mathbb{E}\left\{ \chi(f^{-1}(D_u)) \right\}
= \sum_{j=0}^{d} (2\pi)^{-j/2} \mathcal{L}_j(M)\, \mathcal{M}^\gamma_j(D_u).
\]
Substituting this formula into (2.3.2) yields
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\} = \sum_{j=0}^{d} (2\pi)^{-j/2} \rho_j\, \mathcal{L}_j(M),
\]
where
\[
\rho_j =
\begin{cases}
-\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du & j > 0, \\[4pt]
\int_0^\infty \left( 1 - \mathcal{M}^\gamma_0(D_u) \right) du - \int_{-\infty}^0 \mathcal{M}^\gamma_0(D_u)\, du & j = 0.
\end{cases}
\]
The expression for $\rho_0$ can be further simplified. Let $X \in \mathbb{R}^k$ be a standard multivariate
normal vector and $Y = G(X)$; then
\[
\mathcal{M}^\gamma_0(D_u) = \gamma_k(D_u) = \mathbb{P}(X \in D_u) = \mathbb{P}(Y \le u).
\]
Therefore,
\[
\rho_0 = \int_0^\infty \left( 1 - \mathbb{P}(Y \le u) \right) du - \int_{-\infty}^0 \mathbb{P}(Y \le u)\, du
= \int_0^\infty \mathbb{P}(Y > u)\, du - \int_{-\infty}^0 \mathbb{P}(Y \le u)\, du
= \mathbb{E}\{Y\}.
\]
Since for every $x \in M$ the vector $f(x)$ is standard multivariate normal, we can replace $Y$ with
$G(f(x)) = g(x)$. Finally, recalling that $\mathcal{L}_0 \equiv \chi$ completes the proof.
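The last step uses the standard tail-integral formula for the mean, $\mathbb{E}\{Y\} = \int_0^\infty \mathbb{P}(Y > u)\,du - \int_{-\infty}^0 \mathbb{P}(Y \le u)\,du$, valid whenever $\mathbb{E}|Y| < \infty$. A quick numerical sanity check in Python (illustrative only; the choice $G(x) = x + 1/2$ is ours, not from the thesis):

```python
import math

def Phi(x):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Y = G(X) with G(x) = x + 0.5 and X ~ N(0,1), so Y ~ N(0.5, 1), E{Y} = 0.5.
cdf = lambda u: Phi(u - 0.5)

# rho_0 = int_0^inf P(Y > u) du - int_{-inf}^0 P(Y <= u) du, truncated grids.
n, hi = 120_000, 12.0
du = hi / n
pos = sum((1.0 - cdf(i * du)) * du for i in range(n))       # over [0, 12]
neg = sum(cdf(-i * du) * du for i in range(1, n + 1))       # over [-12, 0]
rho0 = pos - neg
print(rho0)  # close to E{Y} = 0.5
```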
2.3.1 Real Valued Fields
For real valued fields (i.e. k = 1) we can improve Theorem 2.3.1 by computing the terms
Mγj (Du) that appear in (2.3.1). First, we need to recall some facts about the family of
Hermite polynomials. For $n \ge 0$, the $n$-th Hermite polynomial is defined as
\[
H_n(x) = (-1)^n \varphi(x)^{-1} \frac{d^n}{dx^n} \varphi(x),
\]
where $\varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the density of the standard Gaussian distribution. This
family of polynomials is orthogonal under the inner product on functions $f, g : \mathbb{R} \to \mathbb{R}$
\[
\langle f, g \rangle = \int_{\mathbb{R}} f(x) g(x) \varphi(x)\, dx.
\]
A consistent and useful convention is
\[
H_{-1}(x) = \varphi(x)^{-1} \int_x^\infty \varphi(u)\, du.
\]
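These are the probabilists' Hermite polynomials, which satisfy the three-term recurrence $H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)$ with $H_0 \equiv 1$, $H_1(x) = x$, and for which $\langle H_m, H_n \rangle = n!\,\delta_{mn}$. A small Python sketch (an illustration, not part of the thesis) checking both facts numerically:

```python
import math

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via the recurrence
    H_{n+1}(x) = x*H_n(x) - n*H_{n-1}(x), with H_0 = 1, H_1 = x."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def inner(f, g, n_grid=100_000, lim=12.0):
    """<f, g> = int f(x) g(x) phi(x) dx, by a midpoint rule on [-lim, lim]."""
    dx = 2.0 * lim / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -lim + (i + 0.5) * dx
        total += f(x) * g(x) * math.exp(-x * x / 2.0) * dx
    return total / math.sqrt(2.0 * math.pi)

print(hermite(3, 2.0))  # H_3(x) = x^3 - 3x, so H_3(2) = 2
print(inner(lambda x: hermite(2, x), lambda x: hermite(3, x)))  # ~ 0
print(inner(lambda x: hermite(3, x), lambda x: hermite(3, x)))  # ~ 3! = 6
```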
Theorem 2.3.2. Let $M$ be a compact $d$-dimensional stratified space, and let $f : M \to \mathbb{R}$
be a real valued Gaussian random field, both satisfying the GKF conditions. Let $G : \mathbb{R} \to \mathbb{R}$
be piecewise $C^2$ and $g = G \circ f$. Then
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \chi(M)\, \mathbb{E}\{g\} + \sum_{j=1}^{d} (-1)^j \mathcal{L}_j(M)\,
\frac{\left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle}{(2\pi)^{j/2}}.
\]
In the case that the function $G$ is strictly monotone, this result can be further simplified,
using the fact that $\operatorname{sign}(G')$ is constant and then integrating by parts.
Corollary 2.3.3. Let $f$ be as in Theorem 2.3.2, and $G$ be a strictly increasing function.
Then
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \sum_{j=0}^{d} (-1)^j \mathcal{L}_j(M)\, \frac{\langle H_j, G \rangle}{(2\pi)^{j/2}}.
\]
If $G$ is strictly decreasing, then
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \sum_{j=0}^{d} \mathcal{L}_j(M)\, \frac{\langle H_j, G \rangle}{(2\pi)^{j/2}}.
\]
To prove Theorem 2.3.2 we will need the following calculus lemma, which is a special
case of Federer’s coarea formula.
Lemma 2.3.4. Let $h : \mathbb{R} \to \mathbb{R}$ be an integrable function and let $G : \mathbb{R} \to \mathbb{R}$ be a piecewise
differentiable continuous function that is nondifferentiable on a countable set. Then
\[
\int_{\mathbb{R}} h(x) \left| G'(x) \right| dx
= \int_{\mathbb{R}} \Bigg( \sum_{x \in G^{-1}(t)} h(x) \Bigg)\, dt.
\]
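As a quick illustration of the lemma (our own example, not from the thesis), take $G(x) = x^2$ and $h = \varphi$. Then $G^{-1}(t) = \{\pm\sqrt{t}\}$ for $t > 0$, and both sides can be checked numerically:

```python
import math

phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Illustrative choice: G(x) = x^2, h = phi.
# Left side:  int h(x) |G'(x)| dx = int phi(x) * 2|x| dx
# Right side: int_0^inf sum_{x in G^{-1}(t)} h(x) dt = int_0^inf 2 phi(sqrt(t)) dt
n, lim = 100_000, 12.0
dx = 2.0 * lim / n
lhs = sum(phi(-lim + (i + 0.5) * dx) * 2.0 * abs(-lim + (i + 0.5) * dx) * dx
          for i in range(n))

dt = lim ** 2 / n
rhs = sum(2.0 * phi(math.sqrt((i + 0.5) * dt)) * dt for i in range(n))

print(lhs, rhs)  # both close to 2*sqrt(2/pi)
```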
Proof of Theorem 2.3.2. By Theorem 2.3.1, it suffices to show that
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= (-1)^{j-1} \left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle, \tag{2.3.3}
\]
for $j \ge 1$, where $D_u = G^{-1}((-\infty, u])$. Since $G$ is continuous, we can write the inverse
image of $(-\infty, u]$ as a disjoint union of closed intervals
\[
D_u = \bigcup_i [a_i, b_i],
\]
where we allow one $a_i$ to be $-\infty$ and one $b_i$ to be $\infty$. Note that for all the finite values
we have $G(a_i) = G(b_i) = u$, $G'(a_i) < 0$ and $G'(b_i) > 0$.
For small enough $\rho$ we have
\[
\operatorname{Tube}(D_u, \rho) = \bigcup_i [a_i - \rho,\, b_i + \rho].
\]
Therefore,
\[
\gamma_1\left( \operatorname{Tube}(D_u, \rho) \right) = \sum_i \left( \Phi(b_i + \rho) - \Phi(a_i - \rho) \right), \tag{2.3.4}
\]
where $\Phi(x) = \int_{-\infty}^x \varphi(u)\, du$, and $\gamma_1$ is the Gaussian measure on $\mathbb{R}$ (see Section 2.1.1). The Taylor
expansion of $\Phi(x + \rho)$ in $\rho$ is
\[
\Phi(x + \rho) = \Phi(x) + \sum_{j=1}^{\infty} \frac{\rho^j}{j!} (-1)^{j-1} H_{j-1}(x) \varphi(x),
\]
so in particular $\mathcal{M}^\gamma_j((-\infty, x]) = (-1)^{j-1} H_{j-1}(x) \varphi(x)$. Therefore we conclude that for
$j \ge 1$,
\[
\mathcal{M}^\gamma_j(D_u) = \sum_i \left( (-1)^{j-1} H_{j-1}(b_i) \varphi(b_i) + H_{j-1}(a_i) \varphi(a_i) \right). \tag{2.3.5}
\]
Note that if $b_i = \infty$ (or $a_i = -\infty$), its contribution to the volume of the tube in (2.3.4) is
independent of $\rho$ ($1$ or $0$, respectively). Thus, it will affect only $\mathcal{M}^\gamma_0$, and we can assume
all the $a_i$ and $b_i$ in (2.3.5) are finite, and hence $\bigcup_i \{a_i, b_i\} = G^{-1}(u)$.
If $j$ is odd, then from (2.3.5) we have that
\[
\mathcal{M}^\gamma_j(D_u)
= \sum_i \left( H_{j-1}(b_i) \varphi(b_i) + H_{j-1}(a_i) \varphi(a_i) \right)
= \sum_{x \in G^{-1}(u)} H_{j-1}(x) \varphi(x).
\]
Using Lemma 2.3.4 we have
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= \int_{\mathbb{R}} \sum_{x \in G^{-1}(u)} H_{j-1}(x) \varphi(x)\, du
= \int_{\mathbb{R}} H_{j-1}(x) \varphi(x) \left| G'(x) \right| dx
= \left\langle H_{j-1}, |G'| \right\rangle
= (-1)^{j-1} \left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle.
\]
If $j$ is even, then from (2.3.5),
\[
\mathcal{M}^\gamma_j(D_u)
= \sum_i \left( -H_{j-1}(b_i) \varphi(b_i) + H_{j-1}(a_i) \varphi(a_i) \right)
= -\sum_{x \in G^{-1}(u)} \operatorname{sign}(G'(x))\, H_{j-1}(x) \varphi(x).
\]
Using Lemma 2.3.4 we have
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= -\int_{-\infty}^{\infty} \sum_{x \in G^{-1}(u)} H_{j-1}(x) \varphi(x) \operatorname{sign}(G'(x))\, du
= -\int_{\mathbb{R}} H_{j-1}(x) \varphi(x) G'(x)\, dx
= -\left\langle H_{j-1}, G' \right\rangle
= (-1)^{j-1} \left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle.
\]
This completes the proof.
2.3.2 Vector Valued Fields
When $f$ is a vector valued Gaussian field, it can be difficult to evaluate the Minkowski
functionals $\mathcal{M}^\gamma_j$. In this subsection we treat two special cases, in the first of which it is
possible to do the calculus and find a nice explicit formula for the mean Euler integral.

The χ2 case

Let $M$ be a compact $d$-dimensional manifold. A $\chi^2$ field with $k$ degrees of freedom is of
the form $g = G \circ f$, where $f = (f_1, \ldots, f_k) : M \to \mathbb{R}^k$ is a Gaussian random field with
i.i.d., mean zero and unit variance components, and $G(x_1, \ldots, x_k) = \sum_{i=1}^k x_i^2$.
Theorem 2.3.5. The mean Euler integral for a $\chi^2$ field with $k$ degrees of freedom, with
$k \ge d$, is given by
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= k \mathcal{L}_0(M)
- \frac{2}{\sqrt{\pi}} \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)} \mathcal{L}_1(M)
+ \frac{1}{\pi} \mathcal{L}_2(M).
\]
Proof. First note that in this case, $\mathcal{M}^\gamma_j(D_u) = \mathcal{M}^\gamma_j(G^{-1}((-\infty, u])) = 0$ when $u < 0$, since
$G$ is nonnegative. In [3, Section 15.10.2] it is shown that for $k \ge d$ and $j \ge 1$,
\[
\mathcal{M}^\gamma_j(D_u) = \left. \frac{d^{j-1} p_k(x)}{dx^{j-1}} \right|_{x = \sqrt{u}},
\qquad \text{where } p_k(x) = \frac{x^{k-1} e^{-x^2/2}}{\Gamma(k/2)\, 2^{(k-2)/2}}.
\]
Therefore,
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= \int_0^\infty \left. \frac{d^{j-1} p_k(x)}{dx^{j-1}} \right|_{x = \sqrt{u}} du
= 2 \int_0^\infty \frac{d^{j-1} p_k(t)}{dt^{j-1}}\, t\, dt.
\]
Computing for $j = 1$, $j = 2$, and $3 \le j \le d$, we have that
\[
\int_0^\infty \mathcal{M}^\gamma_1(D_u)\, du
= 2 \int_0^\infty p_k(t)\, t\, dt
= 2\sqrt{2}\, \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)},
\]
\[
\int_0^\infty \mathcal{M}^\gamma_2(D_u)\, du
= 2 \int_0^\infty p_k'(t)\, t\, dt = -2
\]
(by integration by parts, since $p_k$ is a probability density on $[0, \infty)$, so $\int_0^\infty p_k(t)\, dt = 1$),
and for $3 \le j \le d$ integration by parts yields
\[
\int_0^\infty \mathcal{M}^\gamma_j(D_u)\, du
= 2 \left( \left. \frac{d^{j-2} p_k(t)}{dt^{j-2}}\, t \right|_0^\infty
- \left. \frac{d^{j-3} p_k(t)}{dt^{j-3}} \right|_0^\infty \right) = 0.
\]
Finally, noting that $\mathbb{E}\{g\} = k$ completes the proof.
The F case
Let $M$ be a compact $d$-dimensional manifold and let $f : M \to \mathbb{R}^{n+m}$ be a vector valued
Gaussian field with i.i.d., mean zero and unit variance components,
\[
G(x) = \frac{n}{m} \cdot \frac{\sum_{i=1}^m x_i^2}{\sum_{i=1}^n x_{m+i}^2},
\]
and $g = G \circ f$. In this case, it is proved in [3, Theorem 15.10.3] that for $j \ge 1$,
\[
\mathcal{M}^\gamma_j\left( G^{-1}([u, \infty)) \right)
= \left( 1 + \frac{mu}{n} \right)^{-\frac{m+n-2}{2}}
\sum_{l=0}^{\lfloor \frac{j-1}{2} \rfloor} \sum_{i=0}^{j-2l-1}
C_{m,n,j,l,i} \left( \frac{mu}{n} \right)^{\frac{m-j}{2} + i + l}
\]
for a set of constants $C_{m,n,j,l,i}$.

Using basic calculus we can show that for $n > j + 2$ and for all $m$, the integral
$\int_0^\infty \mathcal{M}^\gamma_j(G^{-1}([u, \infty)))\, du$ converges. This can be used to compute the expected lower Euler
integral $\int_M g \lfloor d\chi \rfloor$ rather than the expected upper integral that we have computed so far.
Thus, we can conclude that for $n > d + 2$ the expected lower Euler integral is finite. For
each $n, m$ it is possible to compute the exact value, but no general formula is known. In
order to compute the upper Euler integral, we need to compute $\mathcal{M}^\gamma_j(G^{-1}((-\infty, u]))$. We
note that this is feasible, but technically too complicated to be pursued here.
2.4 Persistent Homology of Gaussian Random Fields
In Section 2.2.2 we described the connection between the Euler integral of a function
and its persistent homology. This allows us to interpret our computation of the expected
Euler integral for Gaussian random fields as a computation of the expected value of a
quantitative measure of the persistent homology of a Gaussian random field. We consider
this interpretation to be the main result of this chapter. This result is the first of its kind,
giving a precise form for the expected value of a quantitative property of the persistent
homology of random functions.
Theorem 2.4.1. Let $f : M \to \mathbb{R}^k$ be a Gaussian random field satisfying the GKF conditions,
$G : \mathbb{R}^k \to \mathbb{R}$ continuous and piecewise $C^2$, and $g = G \circ f$. Then
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(g, g_{\max})) \right\}
= \chi(M) \left( \mathbb{E}\{g_{\max}\} - \mathbb{E}\{g\} \right)
+ \sum_{j=1}^{d} (2\pi)^{-j/2} \mathcal{L}_j(M) \int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du.
\]
If $f : M \to \mathbb{R}$ is a real valued field, then
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(f, f_{\max})) \right\}
= \mathbb{E}\{f_{\max}\}\, \chi(M) + \frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}.
\]
Proof. By Proposition 2.2.6,
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(g, g_{\max})) \right\}
= \mathbb{E}\{g_{\max}\}\, \chi(M) - \mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\},
\]
and using Theorem 2.3.1 completes the proof.
It can also be useful to consider the partial barcode PH∗(g, a), terminated at some
fixed level a (rather than gmax). For real valued fields we have an explicit formula for this
case as well.
Theorem 2.4.2. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the GKF conditions.
Then for any $a \in \mathbb{R}$,
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(f, a)) \right\}
= \chi(M) \left( \varphi(a) + a \Phi(a) \right)
+ \varphi(a) \sum_{j=1}^{d} (-1)^j (2\pi)^{-j/2} \mathcal{L}_j(M)\, H_{j-2}(a).
\]
To prove Theorem 2.4.2 we need the following lemma.
Lemma 2.4.3. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the GKF conditions.
Then,
\[
\mathbb{E}\left\{ \int_M (f \wedge a) \lceil d\chi \rceil \right\}
= \chi(M) \left( a - a\Phi(a) - \varphi(a) \right)
- \varphi(a) \sum_{j=1}^{d} (-1)^j (2\pi)^{-j/2} \mathcal{L}_j(M)\, H_{j-2}(a).
\]
Proof. We will apply Theorem 2.3.2 to the function $G_a(x) \triangleq (x \wedge a)$. In this case
$G_a'(x) = \mathbf{1}_{(-\infty, a]}(x)$. Therefore,
\[
\left\langle H_{j-1}, (\operatorname{sign}(G_a'))^j G_a' \right\rangle
= \int_{-\infty}^{a} H_{j-1}(u) \varphi(u)\, du = -H_{j-2}(a) \varphi(a).
\]
In addition,
\[
\mathbb{E}\{G_a \circ f\}
= \int_{-\infty}^{a} x \varphi(x)\, dx + \int_a^\infty a \varphi(x)\, dx
= a - a\Phi(a) - \varphi(a).
\]
Thus, by Theorem 2.3.2, we are done.
Proof of Theorem 2.4.2. Using Proposition 2.2.6 we have
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(f, a)) \right\}
= a\,\chi(M) - \mathbb{E}\left\{ \int_M (f \wedge a) \lceil d\chi \rceil \right\},
\]
and applying Lemma 2.4.3 completes the proof.
2.5 Weighted Sum of Critical Values
In this section we use the link between the Euler integral and Morse theory discussed in
Section 2.2.1, to present novel statements about critical points of Gaussian random fields.
Taking G(x) = H1(x) = x in Theorem 2.3.2 and using Proposition 2.2.3 yields the
following compact formula.
Theorem 2.5.1. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the conditions of
the GKF. Then
\[
\mathbb{E}\Bigg\{ \sum_{v \in \mathrm{CV}(f)} \Delta\chi(f, v)\, v \Bigg\}
= -\frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}, \tag{2.5.1}
\]
where $\mathrm{CV}(f)$ is the set of critical values of $f$ and $\Delta\chi(f, v)$ is the change in the Euler
characteristic of $f^{-1}((-\infty, u])$ as $u$ passes through $v$ from below (see Section 2.2.1). In
the case that $M$ is a closed manifold,
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\}
= -\frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}, \tag{2.5.2}
\]
where $\mathrm{CP}(f)$ is the set of critical points of $f$, and $\mu(p)$ is the Morse index of the critical
point $p$.
In the case that $M$ is a closed even-dimensional manifold, $\mathcal{L}_1(M) = 0$, so (2.5.2) states
that
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\} = 0.
\]
Note that this fact has the following alternative proof. If $f$ is a Morse function, then so is
$\tilde{f} \triangleq -f$. In addition, $p$ is a critical point of $f$ with index $\mu(p)$ if and only if $p$ is a critical
point of $\tilde{f}$ with index $\tilde{\mu}(p) = d - \mu(p)$. Finally, $f$ is a Gaussian random field with zero
mean, and therefore $f$ and $\tilde{f}$ have the same probability law. Thus,
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\}
= \mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(\tilde{f})} (-1)^{\tilde{\mu}(p)} \tilde{f}(p) \Bigg\}
= -\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\}.
\]
The first equality holds because $f$ and $\tilde{f}$ have the same probability law. The second
equality holds because $\tilde{\mu} = d - \mu$, $\tilde{f} = -f$, and $d$ is even.
The thing to note about Theorem 2.5.1 is that the expected value of a weighted
sum of the critical values scales like $\mathcal{L}_1(M)$, a 1-dimensional measure of $M$, and not
the volume $\mathcal{L}_d(M)$, as one might have expected. Consider the following example: let
$f : \mathbb{R}^d \to \mathbb{R}$ be a Gaussian random field with covariance function $C : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ given
by $C(x, y) = e^{-\|x - y\|^2/2}$. This covariance function induces the Euclidean metric on $\mathbb{R}^d$, and
Theorem 2.5.1 implies that
\[
\mathbb{E}\left\{ \int_{[0, L]^d} f \lceil d\chi \rceil \right\}
= -\frac{\mathcal{L}_1([0, L]^d)}{\sqrt{2\pi}} = -\frac{d}{\sqrt{2\pi}}\, L.
\]
In comparison to Theorem 2.5.1, letting $G(x) = x^d$ and using Theorem 2.3.2, we get
that $\mathbb{E}\left\{ \int_M f^d \lceil d\chi \rceil \right\}$ depends on the volume $\mathcal{L}_d(M)$ (as well as the other measures). So while
in general the behavior of the critical points and the critical values depends on the volume,
when one takes the weighted sum of the critical values a lot of cancellation occurs, and the
result depends only on a 1-dimensional measure. This phenomenon is very surprising and
is intrinsically interesting. In fact, there is an alternative, non-topological way to prove
this result using the Kac–Rice formula. However, it is not clear if there is a topological
phenomenon behind these cancellations, and so we leave this for future research.
The result in Theorem 2.5.1 can be generalized to the case where we consider only
critical values below some level $a$.

Theorem 2.5.2. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the conditions of
the GKF. Then
\[
\mathbb{E}\Bigg\{ \sum_{\substack{v \in \mathrm{CV}(f) \\ v < a}} \Delta\chi(f, v)\, v \Bigg\}
= -\varphi(a) \mathcal{L}_0(M)
- \varphi(a) \sum_{j=1}^{d} (-1)^j (2\pi)^{-j/2} \mathcal{L}_j(M)
\left( H_{j-2}(a) + a H_{j-1}(a) \right). \tag{2.5.3}
\]
In the case that $M$ is a closed manifold, the left hand side above can be replaced with
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f) :\, f(p) < a} (-1)^{\mu(p)} f(p) \Bigg\}.
\]
Observe that taking a → ∞ recovers the result in Theorem 2.5.1.
Proof. According to Corollary 2.2.4,
\[
\mathbb{E}\Bigg\{ \sum_{\substack{v \in \mathrm{CV}(f) \\ v < a}} \Delta\chi(f, v)\, v \Bigg\}
= \mathbb{E}\left\{ \int_M (f \wedge a) \lceil d\chi \rceil \right\}
+ a\, \mathbb{E}\{\chi(f \le a)\} - a\,\chi(M).
\]
The first term on the right hand side is given by Lemma 2.4.3, and the second term is
given by the GKF (Theorem 2.1.1).
2.6 Towards Applications
An interesting application of the Euler integral is suggested in [7]. Suppose that an
unknown number of targets are located in a space X , and each target α is represented
by its support Uα ⊂ X . Suppose also that the space X is covered with sensors, reporting
only the number of targets each one sees (i.e. no identification). Let $h : X \to \mathbb{Z}$ be the
sensor field, i.e.
\[
h(x) = \#\{\text{targets activating the sensor located at } x\}.
\]
The following theorem states how to combine the readings from all the sensors and get
the exact number of targets.
Theorem 2.6.1 ([7]). If all the target supports $U_\alpha$ satisfy $\chi(U_\alpha) = N$ for some $N \ne 0$,
then
\[
\#\{\text{targets}\} = \frac{1}{N} \int_X h\, d\chi,
\]
where $d\chi$ is the original Euler integration for constructible functions.
Note that we do not need to assume anything about the targets other than that they all have
the same Euler characteristic. For example, we need not assume that they are all convex
or even have the same number of connected components. On the other hand, the theorem
assumes an ideal sensor field, in the sense that the entire (most likely continuous) space
$X$ is covered with extremely accurate sensors (the range of each sensor is a single point
in $X$). In [8] more realizable models using the lower/upper Euler integral are discussed.
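For intuition, Theorem 2.6.1 is easy to simulate in one dimension, where each target support is a closed interval (so $\chi(U_\alpha) = N = 1$), and the Euler integral of the integer-valued sensor field can be computed level by level via $\int_X h\, d\chi = \sum_{u \ge 1} \chi(\{h \ge u\})$, with $\chi$ of a finite union of disjoint intervals equal to the number of its components. A Python sketch (the target data is hypothetical, purely illustrative):

```python
# Hypothetical 1D scenario: each target's support is a closed interval, chi = 1.
targets = [(0.0, 2.0), (1.0, 3.0), (2.5, 6.0), (8.0, 9.0)]  # overlaps allowed

# Sample the sensor field h(x) = #{targets covering x} on a fine grid of [0, 10].
n = 100_000
xs = [10.0 * i / n for i in range(n + 1)]
h = [sum(a <= x <= b for a, b in targets) for x in xs]

def components(mask):
    """chi of a union of intervals = number of connected components."""
    return sum(1 for i, m in enumerate(mask) if m and (i == 0 or not mask[i - 1]))

# Euler integral of an integer-valued field: int h dchi = sum_{u>=1} chi({h >= u}).
euler_integral = sum(components([v >= u for v in h]) for u in range(1, max(h) + 1))
print(euler_integral)  # equals the number of targets, 4
```

Note that the count is recovered even though the supports overlap and no individual target is ever identified.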
Using the results from Section 2.3 we can extend the setup above to the case where
the readings from the sensors are contaminated by a Gaussian (or Gaussian related) noise
f(x). We will use the following proposition.
Proposition 2.6.2. Let $h, f : X \to \mathbb{R}$ be tame functions and suppose that $h(X)$ is
discrete. Then
\[
\int_X (h + f) \lceil d\chi \rceil = \int_X h \lceil d\chi \rceil + \int_X f \lceil d\chi \rceil.
\]
Proof. Let $h(x) = \sum_{i=1}^n a_i \mathbf{1}_{A_i}(x)$, where the $A_i$ are disjoint. Then by the additivity of
the Euler characteristic we have that
\[
\int_X (h + f) \lceil d\chi \rceil = \sum_{i=1}^n \int_{A_i} (h + f) \lceil d\chi \rceil. \tag{2.6.1}
\]
Next,
\[
\int_{A_i} (h + f) \lceil d\chi \rceil
= \int_{A_i} (a_i + f) \lceil d\chi \rceil
= a_i \chi(A_i) + \int_{A_i} f \lceil d\chi \rceil,
\]
where the last equality follows from Proposition 2.2.3, since every critical value is shifted
by $a_i$. Applying this to (2.6.1) completes the proof.
Returning to the target enumeration problem, suppose that we have a deterministic
signal $x = \int_X h \lceil d\chi \rceil$, observed via a noisy measurement $Y = \int_X (h + f) \lceil d\chi \rceil$. By the
above proposition we have that
\[
Y = \int_X (h + f) \lceil d\chi \rceil
= \int_X h \lceil d\chi \rceil + \int_X f \lceil d\chi \rceil
= x + N,
\]
so we have the classical model of parameter estimation with additive noise. If $f(x)$ is a
Gaussian or Gaussian related random field satisfying the conditions in Theorem 2.3.2, then
we can use the estimator $\hat{x} = Y - \mathbb{E}\{N\}$. This is a very naive estimator indeed; however,
it still reduces the mean squared error compared to just taking the measurement $Y$.
Further investigating the properties of the Euler integral might lead to useful estimation
techniques for this model.
2.7 Summary and Future Work
In this chapter we presented novel quantitative claims about the persistent homology of
Gaussian random fields. To do this, we first gave a very general Morse theoretic interpre-
tation for the Euler integral (Proposition 2.2.3). We then used this interpretation to relate
the persistent homology of a function to its Euler integral (Proposition 2.2.6). Finally,
we applied the Gaussian Kinematic Formula (Theorem 2.1.1), to evaluate the expected
value of the Euler integral of Gaussian and Gaussian related fields, and consequently, the
expected value of the Euler characteristic of the persistent homology of these fields.
Persistent homology is a very powerful theoretical and analytical tool that can be
used to study spaces and functions. It is already being used in a variety of data analysis
applications (cf. [12, 15, 17, 19]). However, in order to make it into a statistically compelling
tool, there is a need to introduce rigorous probabilistic models describing the behavior
of persistence diagrams. There has been some work in this direction (cf. [13, 34]), but
including the work presented in this chapter, this is all just the tip of the iceberg.
Possible ways to continue the work presented in this chapter are numerous. In this
chapter we defined the Euler characteristic of a barcode, and computed its expected value.
However, it still remains to further understand what this Euler characteristic can tell
us about excursion sets of random fields. In addition, the results about the signed sum
of critical values deserve further investigation. It would be very interesting to understand
the exact causes of the phenomenon presented in Section 2.5, and to determine if it is
unique to Gaussian fields, or extends beyond those.
From a more general perspective, we believe that there is a lot more to reveal about the
persistent homology of random fields. It is important to find other parameters characterizing
the persistent homology of a function, and to investigate them statistically. As
the nature of persistent homology is abstract, it would be difficult to find such quanti-
tative parameters for which one can also carry out probabilistic computations. Possible
candidates are the average bar length, maximal bar length, number of bars, distribution
of birth/death times of a bar and so forth. In addition, as homology elements and critical
points are strongly connected via Morse theory, it is highly probable that results related
to the homology of excursion sets could lead to novel statements about critical points of
random fields, about which very little is known at this point.
Finally, the following points are a few more ideas for future work, which were raised
while we were working on the results presented in this chapter.
• Similarly to Euler integration, one can make sense of integration with respect to
any of the Lipschitz-Killing curvatures (or any linear combination of them). In this
setting an appropriate generalization of Theorem 2.5.1 should hold. It could be
interesting to study those integrals as well, and see if they also lead to new and
interesting statements about Gaussian random fields.
• Theorems 2.5.1 and 2.4.1 show that if $f : M \to \mathbb{R}$ is a real valued Gaussian random
field satisfying the GKF, we have that $\mathbb{E}\left\{ \int f \lceil d\chi \rceil \right\}$ grows linearly with a
1-dimensional measure of the space $M$. Could this be developed further into a
way to statistically test if a set of given measurements originated from a Gaussian
random field?
• One motivation for computing the expected Euler characteristic of super-level sets of
a random field $f : M \to \mathbb{R}$ is that it allows one to estimate the excursion probabilities
$\mathbb{P}\left( \sup_{x \in M} f(x) \ge u \right)$ (see [3, Chapter 14]). In Theorem 2.5.2 we computed
\[
\mathbb{E}\Bigg\{ \sum_{\substack{v \in \mathrm{CV}(f) \\ v < a}} \Delta\chi(f, v)\, v \Bigg\}.
\]
When $a$ is a large negative number, this value can approximate the number of local
minima that are below the value $a$. Could these results be used to gain meaningful
information about $\mathbb{E}\{f_{\min}\}$?
Chapter 3
The Topology of Random Geometric
Complexes
3.1 Background
In this chapter we study the limiting behavior of critical points of the distance function
(defined in Section 3.2). While the critical points are, by themselves, intrinsically inter-
esting, knowledge of their behavior also has immediate implications (via Morse theory) to
the study of the topology of Cech complexes built over random point sets. In this section
we give a brief introduction to geometric complexes, discuss their use as an applied topol-
ogy tool, and review previous work. The results presented in this chapter were published
in [2, 9].
3.1.1 Geometric Complexes
A $k$-dimensional simplex (or just a '$k$-simplex') in $\mathbb{R}^d$ is the convex hull of $k + 1$ affinely
independent points $x_0, \ldots, x_k \in \mathbb{R}^d$, denoted by $\sigma = [x_0, \ldots, x_k]$. A simplicial complex is a collection of
simplexes satisfying the following conditions.
Definition 3.1.1. A set of simplexes ∆ is a simplicial complex if
1. For any σ ∈ ∆, if σ′ ⊂ σ then σ′ ∈ ∆, and
2. For any σ1, σ2 ∈ ∆, σ1 ∩ σ2 ∈ ∆.
Figure 3.1 depicts two collections of simplexes in $\mathbb{R}^2$, one of which is a simplicial
complex, and one which is not.
(a) (b)
Figure 3.1: Simplicial complexes in R2. (a) This collection of vertices, edges and triangles is a
valid simplicial complex (see Definition 3.1.1). (b) Here we also have a collection of simplexes,
however, the intersection of the two triangles is not included in this collection, and therefore this
is not a simplicial complex. However, representing each simplex by its vertices, this collection
does represent an abstract simplicial complex (see Definition 3.1.2).
We will use the notion of an ‘abstract simplicial complex’, in which the simplexes are
considered just as finite subsets of a global set S, and lose their geometrical meaning.
Definition 3.1.2. Let S be a set. A collection ∆ of finite subsets of S is called an abstract
simplicial complex if, for any σ ∈ ∆, if σ′ ⊂ σ then σ′ ∈ ∆.
From the definition it is clear that any simplicial complex is an abstract simplicial
complex as well (if we think of every k-simplex as a set of k + 1 vertices). The collection
of simplexes in Figure 3.1(b) demonstrates an abstract simplicial complex which is not a
simplicial complex. Two types of abstract simplicial complexes which are commonly used
in applied algebraic topology, are the Cech and Rips complexes.
Definition 3.1.3 (The Cech Complex). Let $\mathcal{P} = \{x_1, x_2, \ldots\}$ be a collection of points in
a metric space $X$. Construct an abstract simplicial complex $C(\mathcal{P}, \epsilon)$ in the following way:

1. The 0-simplexes are the points in $\mathcal{P}$.

2. An $n$-simplex $[x_{i_0}, \ldots, x_{i_n}]$ is in $C(\mathcal{P}, \epsilon)$ if $\bigcap_{k=0}^{n} B_\epsilon(x_{i_k}) \ne \emptyset$,

where $B_\epsilon(x)$ is the ball of radius $\epsilon$ around $x$. The complex $C(\mathcal{P}, \epsilon)$ is called the Cech
complex attached to $\mathcal{P}$ and $\epsilon$.
Definition 3.1.4 (The Vietoris-Rips Complex). Let $\mathcal{P} = \{x_1, x_2, \ldots\}$ be a collection of
points in a metric space $X$. Construct an abstract simplicial complex $R(\mathcal{P}, \epsilon)$ in the
following way:

1. The 0-simplexes are the points in $\mathcal{P}$.

2. An $n$-simplex $[x_{i_0}, \ldots, x_{i_n}]$ is in $R(\mathcal{P}, \epsilon)$ if $B_\epsilon(x_{i_k}) \cap B_\epsilon(x_{i_m}) \ne \emptyset$ for every
$0 \le k < m \le n$.

The complex $R(\mathcal{P}, \epsilon)$ is called the Rips complex attached to $\mathcal{P}$ and $\epsilon$.
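Definition 3.1.4 reduces to a pairwise distance test: two balls of radius $\epsilon$ intersect exactly when their centers are at distance at most $2\epsilon$. A minimal Python sketch (an illustration, not code from the thesis) that builds all simplexes of a Rips complex up to a given dimension:

```python
from itertools import combinations
import math

def rips_complex(points, eps, max_dim=2):
    """All simplexes of R(P, eps) up to max_dim: a vertex subset is a simplex
    iff every pair of its vertices is within distance 2*eps."""
    close = lambda idx: all(math.dist(points[a], points[b]) <= 2 * eps
                            for a, b in combinations(idx, 2))
    return [list(idx)
            for k in range(max_dim + 1)
            for idx in combinations(range(len(points)), k + 1)
            if close(idx)]

# Three points pairwise within 2*eps span a triangle in the Rips complex, even
# though the three balls here have empty common intersection (cf. Figure 3.2).
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.7)]
cx = rips_complex(pts, eps=1.0)
print(cx)  # 3 vertices, 3 edges, and the triangle [0, 1, 2]
```

The triangle in this example belongs to the Rips complex but not to the Cech complex, illustrating the strict inclusion discussed below.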
(a) (b)
Figure 3.2: The Cech and Rips complexes. (a) A Cech complex constructed from a set of points
and a given radius. The complex consists of 6 vertices, 7 edges, and a single triangle. The grey
area represents the balls used to construct the complex, and does not belong to the complex
itself. (b) A Rips complex constructed from the same set of points and the same radius ǫ. Note
that a new triangle was added on the right, since we have pairwise intersection between the
balls. This triangle does not belong to the Cech complex, however, since the intersection of the
three balls is empty.
Figure 3.2 shows a Cech and a Rips complex, constructed from the same set of points
and the same given radius. It is important to note that while the space X might have
a finite dimension d (for example X = Rd), the Cech and Rips complexes might contain
k-simplexes with k > d. Thus, neither of them necessarily embeds into X .
From the definitions above it is obvious that $C(\mathcal{P}, \epsilon) \subset R(\mathcal{P}, \epsilon)$. In addition, it is
proved in [22] that $R(\mathcal{P}, \epsilon') \subset C(\mathcal{P}, \epsilon)$ for $\epsilon/\epsilon' \ge \sqrt{2d/(d+1)}$. In other words, a Cech
complex can be "approximated" by Rips complexes. This fact is used in computational
applications, since working with Rips complexes is much more efficient than working with Cech
complexes. There are occasions when Rips and Cech complexes coincide, as is the case when
X is Euclidean but the metric is the L∞ rather than the more standard L2 norm.
Homology theory can be applied to abstract simplicial complexes as well. This variant
of homology is known as ‘Simplicial Homology’. For simplicial complexes embedded in
Euclidean spaces, we can still think of simplicial homology as describing connected com-
ponents and holes of the complex. The main importance of the Cech complex, and its
relevance to homology theory, is given by the nerve theorem we state next. This theorem
goes back to [11], but can also be found in many other resources (e.g. [32, Theorem 4.4.4]).
Theorem 3.1.5 (The Nerve Theorem). Suppose that the intersections $\bigcap_{x \in \mathcal{P}'} B_\epsilon(x)$ are
either empty or contractible for any subset $\mathcal{P}'$ of $\mathcal{P}$. Then the Cech complex $C(\mathcal{P}, \epsilon)$ is
homotopy equivalent to $\bigcup_{x \in \mathcal{P}} B_\epsilon(x)$. In particular, if $X$ is a finite dimensional normed
linear space, or a compact Riemannian manifold with convexity radius greater than $\epsilon$, and
if $\{B_\epsilon(x)\}_{x \in \mathcal{P}}$ is a cover of the space $X$, then $C(\mathcal{P}, \epsilon)$ is homotopy equivalent to $X$.
More simply, this theorem states that in order to study the homology of the topological
space $\bigcup_{x \in \mathcal{P}} B_\epsilon(x)$, we can study the (simplicial) homology of the combinatorial
space $C(\mathcal{P}, \epsilon)$. This fact can be useful in proving theoretical results, but its main contribution
is to computational applications. We noted above that if X is of a finite dimension
d, then C(P, ǫ) might have higher dimensional simplexes. However, the nerve theorem
asserts that this higher dimensional space will not have homology of dimension greater
than d. This is not true, however, for Rips complexes.
3.1.2 Motivation and Previous Work
There is considerable current interest in the study, from a topological, homological, point
of view of random structures such as graphs and simplicial complexes. Some recent
references are [4, 6, 20, 33, 41] with two reviews, from different aspects, in [2] and [25].
Many of these papers find their raison d'être in essentially statistical problems, in which
data generates these structures.

The main motivation for the work in this chapter is the same manifold learning problem
described in Section 2.1.3. Let $M$ be an unknown manifold, and suppose that we are
given a set of i.i.d. random samples $\mathcal{X} = \{X_1, \ldots, X_n\}$ from this manifold. In order
to recover the homology of $M$, we look at the homology of $U = \bigcup_{k=1}^n B_\epsilon(X_k)$. The
subject of manifold learning goes, obviously, well beyond such an example, and examples
subject of manifold learning goes, obviously, well beyond such an example, and examples
of algorithms for ‘estimating’ an underlying manifold from a finite sample abound in the
statistics and computer science literatures. Very few of them, however, take an algebraic
point of view.
One contribution in the spirit of this chapter is [37], where the problem of estimating
the homology of smooth manifolds from finite samples was studied. For every δ > 0, the
main theorem in [37] provides sufficient conditions on n and ǫ such that the homology
of U is equal to the homology of M with a probability of at least (1 − δ). Of course,
one of the most important issues in dealing with data is noise. In the setting of manifold
learning this translates to the sample points possibly not coming from the submanifold
that theoretically models the phenomenon because of experimental, measurement, or other
error. The work in [38] deals with this issue, as does [18] from a different and enlightening
point of view.
In this chapter we wish to study the homology of $U$ (or, equivalently, $C(\mathcal{X}, \epsilon)$), when
the number of points $|\mathcal{X}| = n$ goes to infinity and $\epsilon \triangleq r_n \to 0$. It turns out that even
in the case where $M = \mathbb{R}^d$, when the underlying manifold is trivial, there is quite a lot
to study about the homology of $C(\mathcal{X}, \epsilon)$, and this is the main focus of this chapter. In
Section 3.5 we discuss how one might extend our results to the case of sampling from
closed smooth manifolds.
Recent work (see [30, 31]) studied the Betti numbers of the Cech complex in the setup
just described. In this scenario, the behavior of the Cech complex (or the union of balls)
splits into three main regimes. If $n r_n^d \to 0$ (the subcritical or 'dust' phase), the complex
is very sparse, with many small disconnected components and hardly any holes. In the
critical phase $n r_n^d \to \lambda \in (0, \infty)$, the complex becomes connected, with many holes of any
dimension $k < d$. Finally, if $n r_n^d \to \infty$, the complex is highly connected, with very few
holes, if any. Detailed study of the Betti numbers is possible mostly in the dust phase,
and is significantly more complicated in the other regimes. Thus, we tried to take an
indirect approach, by studying critical points of the distance function (described in the
next section), and applying Morse theory.
Not surprisingly, there is close correspondence between the results in this chapter, and
the Betti number results in [30, 31]. However, while the results for the Betti numbers apply
mainly to the subcritical phase, the study of the distance function extends to the other
regimes as well. Thus, the indirect approach of studying critical points (rather than Betti
numbers) has some advantages. For example, using our results, we can easily derive limit
theorems for the Euler characteristic of the Cech complex in all three regimes.
In Section 3.2 we define the distance function, and discuss its own version of Morse
theory. Section 3.3 presents all the limit theorems we have for the critical points of the
distance function. In Section 3.4 we return to discuss the topology of Cech complexes
in light of the new results, and compare our results with those in [30, 31]. Proofs are
relegated to Section 3.6, and Section 3.5 contains a summary and some directions for
future research.
3.2 The Distance Function
The distance function is the main object of study in this chapter. In this section we define
the distance function and its critical point theory.
3.2.1 Definition and Motivation
For a finite set $\mathcal{P}$ of points in $\mathbb{R}^d$, of size $|\mathcal{P}|$, let $d_{\mathcal{P}} : \mathbb{R}^d \to \mathbb{R}_+$ be the distance function
for $\mathcal{P}$, so that
\[
d_{\mathcal{P}}(x) \triangleq \min_{p \in \mathcal{P}} \| x - p \|, \qquad x \in \mathbb{R}^d. \tag{3.2.1}
\]
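Equation (3.2.1) is straightforward to evaluate; the small Python sketch below (an illustration with a hypothetical point set, not from the thesis) computes $d_{\mathcal{P}}$ and checks the basic fact exploited later, namely that the sublevel set $\{d_{\mathcal{P}} \le r\}$ is exactly the union of the balls $B_r(p)$:

```python
import math

def dist_fn(P, x):
    """The distance function d_P(x) = min_{p in P} ||x - p||."""
    return min(math.dist(x, p) for p in P)

P = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # hypothetical point set
r = 1.5

# {d_P <= r} coincides with the union of balls B_r(p), point by point.
for x in [(0.5, 0.5), (1.5, 0.0), (2.0, 2.0), (-1.0, -1.0)]:
    in_sublevel = dist_fn(P, x) <= r
    in_union = any(math.dist(x, p) <= r for p in P)
    assert in_sublevel == in_union

print(dist_fn(P, (1.5, 0.0)))  # 1.5: equidistant from the first two points
```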
We are interested in studying the asymptotic behavior (in |P|) of critical points of
this distance function, for random sets P. While the critical points are, by themselves,
intrinsically interesting, knowledge of their behavior also has immediate implications to
the study of the topology of Cech complexes built over random point sets.
Let
\[
U_r \triangleq \bigcup_{x \in \mathcal{P}} B_r(x).
\]
The key observation is that $d_{\mathcal{P}}^{-1}((-\infty, r]) = U_r$ and so, by the Nerve theorem (3.1.5),
$d_{\mathcal{P}}^{-1}((-\infty, r])$ is homotopy equivalent to $C(\mathcal{P}, r)$. By Morse theory, changes in the
homology of $d_{\mathcal{P}}^{-1}((-\infty, r])$ occur at the critical levels of $d_{\mathcal{P}}$. Thus, studying the critical points
of $d_{\mathcal{P}}$ should reveal information about the topology of $C(\mathcal{P}, r)$. Note, however, that $d_{\mathcal{P}}$ is
non-differentiable (and certainly not a Morse function). Nevertheless, following [24], we
can define a special notion of a critical point and Morse index for $d_{\mathcal{P}}$, and apply Morse
theory to it. This will be the focus of the following section.
3.2.2 Critical Points of the Distance Function
Critical points of smooth functions have been studied since the earliest days of calculus,
but took on significant additional importance following the development of Morse theory
(e.g. [35, 36]) which tied them closely to the homologies of manifolds (see Section 1.1.3).
Recall that if M is a nice (closed, smooth) d-dimensional manifold, and f : M → R
a nice (Morse) function, then a point c is called a critical point if ∇f(c) = 0. A non-
degenerate critical point is one for which the Hessian matrix Hf(c) is non-singular. The
Morse index k ∈ {0, 1, . . . , d} of a non-degenerate critical point c is then the number
of negative eigenvalues of Hf(c). Note that critical points of index 0 are local minima,
while critical points of index d are local maxima. The indexes between 0 and d represent
different types of ‘saddle points’. The critical points, along with their indexes, provide
one of the main links between differential and algebraic topology.
Classical Morse theory does not directly apply to the distance function in (3.2.1)
mainly because it is not everywhere differentiable. However, one can still define a notion
of non-degenerate critical points for the distance function, as well as their Morse index,
which we now do. Our arguments follow [24], which we specialize to the case of the
distance function.
Given a set P of points in Rd, and defining the distance function dP (3.2.1), we start
with the local (and global) minima of dP ; viz. the points of P, where dP = 0, and call
these critical points with index 0. For higher indexes, we have the following definition.
Definition 3.2.1. A point c ∈ Rd is a critical point of dP with index 1 ≤ k ≤ d if there exists a subset Y of k + 1 points in P such that:

1. ∀y ∈ Y : dP(c) = ‖c − y‖, and for all p ∈ P\Y we have ‖c − p‖ > dP(c).

2. The points in Y are in general position (i.e. the k + 1 points of Y do not lie in a (k − 1)-dimensional affine space).

3. c ∈ conv◦(Y), where conv◦(Y) is the interior of the convex hull of Y (an open k-simplex in this case).
Figure 3.3: Critical points of a distance function in R2. The grayscale image represents the values of the distance function dP for P = {p1, p2, p3}. Clearly, the minima of dP are the points in P themselves. Looking at c2, we observe that there is one direction in which the function decreases (the green arrows), and one in which the distance function increases (the red arrows). Hence, this point is considered a saddle point, or a critical point of index 1. Note that c2 is located on the edge between p2 and p3, which is their convex hull, in accordance with Definition 3.2.1. The same applies to c1, c3. Finally, c4 is a maximum point, or a critical point of index 2. This point lies inside the triangle whose vertices are p1, p2, p3, which is, again, the convex hull of the points.
Note that the first condition implies that dP ≡ dY in a small neighborhood of c. The second condition implies that the points in Y lie on a unique (k − 1)-dimensional sphere.
Figure 3.3 depicts a distance function from a set of three points in R2, and its critical
points of indexes 0, 1, 2.
We shall use the following notation:
S(Y) = The unique (k − 1)-dimensional sphere containing Y , (3.2.2)
C(Y) = The center of S(Y) in Rd, (3.2.3)
R(Y) = The radius of S(Y), (3.2.4)
B(Y) = The open ball in Rd with radius R(Y) centered at C(Y). (3.2.5)
Note that S(Y) is a (k − 1)-dimensional sphere, whereas B(Y) is a d-dimensional ball.
Obviously, S(Y) ⊂ B(Y), but, unless k = d, S is not the boundary of B. Since the critical
point c in Definition 3.2.1 is equidistant from all the points in Y , we have that c = C(Y).
Thus, we say that c is the unique index k critical point generated by the k + 1 points of
the subset Y . The last statement can be rephrased as follows:
Lemma 3.2.2. A subset Y ⊂ P of k + 1 points in general position generates an index k
critical point if, and only if, the following two conditions hold:
CP1 C(Y) ∈ conv◦(Y),
CP2 P ∩ B(Y) = ∅.
Furthermore, the critical point is C(Y) and the critical value is R(Y).
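The two conditions of Lemma 3.2.2 are purely geometric tests on each (k + 1)-subset, so they are easy to check numerically. The following Python sketch (our own illustration, not part of the thesis; the function names are ours) computes C(Y) and R(Y) by solving a small linear system, and then tests CP1 via barycentric coordinates and CP2 by a direct distance check:

```python
import numpy as np

def circumsphere(Y):
    """Center C(Y) and radius R(Y) of the unique (k-1)-sphere through
    the k+1 points of Y (rows of a (k+1, d) array in general position)."""
    y0, V = Y[0], Y[1:] - Y[0]           # V has shape (k, d)
    b = np.sum(V * V, axis=1)            # b_i = ||y_i - y_0||^2
    a = np.linalg.solve(2 * V @ V.T, b)  # center coordinates in the affine hull
    c = y0 + a @ V
    return c, float(np.linalg.norm(c - Y[0]))

def generates_critical_point(Y, P, eps=1e-12):
    """Test CP1 and CP2 of Lemma 3.2.2: does the subset Y of the point
    set P generate an index-k critical point of the distance function?"""
    c, R = circumsphere(Y)
    # CP1: C(Y) lies in the interior of conv(Y), i.e. all barycentric
    # coordinates of c with respect to the simplex Y are positive.
    A = np.vstack([Y.T, np.ones(len(Y))])
    lam, *_ = np.linalg.lstsq(A, np.append(c, 1.0), rcond=None)
    cp1 = bool(np.all(lam > eps))
    # CP2: the open ball B(Y) contains no point of P; the points of Y
    # themselves lie on its boundary, at distance exactly R from c.
    cp2 = bool(np.all(np.linalg.norm(P - c, axis=1) >= R - eps))
    return cp1 and cp2, c, R
```

For an acute triangle in R2 the circumcenter lies inside the triangle and, in the absence of other points, generates an index-2 critical point; for an obtuse triangle CP1 fails, mirroring the cases drawn in Figure 3.4.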
Figure 3.4 depicts the generation of an index 2 critical point in R2 by subsets of 3 points. We shall also be interested in critical points c that are within distance ε from P, i.e. dP(c) ≤ ε. This adds a third condition, which we will refer to later:

CP3 R(Y) ≤ ε.
The following indicator functions, related to CP1–CP3, will appear often.
Definition 3.2.3. Using the notation above,
h(Y) ≜ 1{C(Y) ∈ conv◦(Y)} (CP1) (3.2.6)
hε(Y) ≜ h(Y) 1_{[0,ε]}(R(Y)) (CP1+CP3) (3.2.7)
gε(Y, P) ≜ hε(Y) 1{P ∩ B(Y) = ∅} (CP1+CP2+CP3) (3.2.8)
Figure 3.4: Generating a critical point of index 2 in R2 (i.e. a maximum point). The small blue disks are the points of P. We examine three subsets of P: Y1 = {y1, y2, y3}, Y2 = {y4, y5, y6}, and Y3 = {y7, y8, y9}. The S(Yi) are the dashed circles, whose centers are C(Yi) = ci. The shaded balls are B(Yi), and the interiors of the triangles are conv◦(Yi). (1) Both C(Y1) ∈ conv◦(Y1) (CP1) and P ∩ B(Y1) = ∅ (CP2) hold, hence c1 is a critical point of index 2. (2) C(Y2) ∉ conv◦(Y2), which means that (CP1) does not hold, and therefore c2 is not a critical point (as can be observed from the flow arrows). (3) C(Y3) ∈ conv◦(Y3), so (CP1) holds. However, P ∩ B(Y3) = {p}, so (CP2) does not hold, and therefore c3 is also not a critical point. Note that in a small neighborhood of c3 we have dP ≡ d{p}, completely ignoring the existence of Y3.
3.3 Limit Theorems for the Distance Function
In this section we present the main results of this chapter. In order to avoid interrupting
the chain of events in this section, we postpone the proofs to Section 3.6.
We wish to study the distance function dP when the set P is random and |P| → ∞.
We shall focus on two different (yet very similar) setups.
Random Sample
Let Xn = {X1, . . . , Xn} be a set of i.i.d. random points in Rd, with a common probability density f, which we assume to be bounded. Denote by CPk(dXn) the set of critical points of dXn with index k. Let {rn}_{n=1}^{∞} be a sequence of positive numbers with lim_{n→∞} rn = 0, and define

N_{k,n} ≜ #{c ∈ CPk(dXn) : dXn(c) ≤ rn},
the number of critical points of dXn with index k, and with a critical value bounded by
rn. In other words, Nk,n counts critical points with index k which are within distance rn
from Xn.
Poisson Process
Let Pn be a spatial Poisson process on Rd with intensity function λn = nf , where f is a
bounded probability density function on Rd (so that E {|Pn|} = n). Denote by CPk(dPn)
the sets of critical points of dPn with index k. Let {rn}∞n=1 be a sequence of positive
numbers with limn→∞ rn = 0, and define
Nk,n , # {c ∈ CPk(dPn) : dPn(c) ≤ rn} .
Our main goal in this section is to study the limits of Nk,n and Nk,n as n → ∞. Since
N0,n = E{N0,n} = n (the minima are the points of Xn or Pn) we shall only be interested in
1 ≤ k ≤ d. The results split into three main regimes, depending on the rate of convergence
of rn to zero, specifically, on the limit of the term nrdn. We shall state all the results in
terms of Nk,n. Unless otherwise stated, exactly the same results apply for Nk,n.
A word on notation: In the formulae presented below, for g : (Rd)k+1 → R and
y = (y1, . . . , yk) ∈ (Rd)k we write g(0,y) for g(0, y1, . . . , yk).
3.3.1 The Subcritical Range (nrdn → 0)
This range is also known as the ‘dust phase’, for reasons that will become clearer later,
when we discuss Cech complexes. We start with the limiting mean.
Theorem 3.3.1 (Limit mean). If n r_n^d → 0, then for 1 ≤ k ≤ d,

lim_{n→∞} (n^{k+1} r_n^{dk})^{−1} E{N_{k,n}} = μ_k,

where

μ_k = (1/(k + 1)!) ∫_{R^d} f^{k+1}(x) dx ∫_{(R^d)^k} h_1(0, y) dy < ∞,

and h_1 is defined by (3.2.7).
In general, as is common for results of this nature, we cannot explicitly compute μ_k. However, when k = 1, y contains only a single point, so h ≡ 1 and R(0, y) = ‖y‖/2. Therefore, h_1(0, y) = 1{‖y‖ ≤ 2}, yielding

μ_1 = 2^{d−1} ω_d ∫_{R^d} f^2(x) dx,

where ω_d is the volume of the unit ball in R^d. Some numerics for other cases are given below.
The observation that, for a specific choice of rn, there is at most one α > 0 such that lim_{n→∞} n^{α+1} r_n^{dα} ∈ (0, ∞) leads to the important fact that there is a 'critical' index,

k_c ≜ { 0,    if lim_{n→∞} n^{α+1} r_n^{dα} = 0 for all α > 0,
        ⌊α⌋,  if lim_{n→∞} n^{α+1} r_n^{dα} ∈ (0, ∞),
        ∞,    if lim_{n→∞} n^{α+1} r_n^{dα} = ∞ for all α > 0,

such that

lim_{n→∞} E{N_{k,n}} = { ∞,  k < k_c,
                         0,  k > k_c,     (3.3.1)

with any value in (0, ∞] possible at k = k_c. That is, there is a phase transition occurring within the subcritical regime itself.
Similar regimes, with identical limits, appear for asymptotic variances.
Theorem 3.3.2 (Limit variance). If n r_n^d → 0, then for 1 ≤ k ≤ d,

lim_{n→∞} (n^{k+1} r_n^{dk})^{−1} Var(N_{k,n}) = μ_k.
Not surprisingly, the three regimes also yield different limit distributions.

Theorem 3.3.3 (Limit distribution). Let n r_n^d → 0 and 1 ≤ k ≤ d.

1. If lim_{n→∞} n^{k+1} r_n^{dk} = 0, then N_{k,n} →^{L²} 0.

2. If lim_{n→∞} n^{k+1} r_n^{dk} = α ∈ (0, ∞), then N_{k,n} →^{L} Poisson(α μ_k).

3. If lim_{n→∞} n^{k+1} r_n^{dk} = ∞, then

(N_{k,n} − E{N_{k,n}}) / (n^{k+1} r_n^{dk})^{1/2} →^{L} N(0, μ_k).
As above, for a specific choice of rn there is going to be at most a single k_c for which the Poisson limit applies. Otherwise, N_{k,n} converges either to zero or to infinity. Thus, in the subcritical regime, the picture is that n = N_{0,n} ≫ N_{1,n} ≫ · · · ≫ N_{k_c,n}, while for k > k_c the value of N_{k,n} will be zero with high probability, which increases with k.
3.3.2 The Critical and Supercritical Ranges (n r_n^d → λ ∈ (0, ∞])
We now look at the critical (n r_n^d → λ ∈ (0, ∞)) and supercritical (n r_n^d → ∞) regimes. While there are differences between the two regimes, the general outline of the results is the same. In both, the correct scaling for N_{k,n} is n (as opposed to n^{k+1} r_n^{dk} in the subcritical range). Consequently, the limit results are similar for all the indexes.
The supercritical regime is significantly more difficult to analyze than either the crit-
ical or subcritical, and we shall require an additional assumption for this case, which
necessitates a definition.
Definition 3.3.4. Let f : Rd → R be a probability density function. We say that f is lower bounded if it has compact support and f_min ≜ inf{f(x) : x ∈ supp(f)} > 0.
Henceforth, when dealing with the supercritical phase, we always assume that f is a
lower bounded probability density, and that supp(f) is convex. As we shall see in Chapter
4, the compact support assumption is crucial here. However, it is not clear at this point
if convexity is a necessary condition, or a consequence of our proofs.
Theorem 3.3.5 (Limit mean). If n r_n^d → λ ∈ (0, ∞], then, for 1 ≤ k ≤ d,

lim_{n→∞} n^{−1} E{N_{k,n}} = γ_k(λ),

where

γ_k(λ) = (λ^k / (k + 1)!) ∫_{(R^d)^{k+1}} f^{k+1}(x) h_1(0, y) e^{−λ ω_d R^d(0, y) f(x)} dx dy,

γ_k(∞) = lim_{λ→∞} γ_k(λ) = (1 / (k + 1)!) ∫_{(R^d)^k} h(0, y) e^{−ω_d R^d(0, y)} dy,

ω_d is the volume of the unit ball in R^d, and R, h, and h_1 are defined in (3.2.4), (3.2.6), and (3.2.7), respectively.
Again, these terms can be evaluated for k = 1, in which case

γ_1(λ) = (λ/2) ∫_{R^d} ∫_{‖y‖≤2} f^2(x) e^{−λ ω_d 2^{−d} ‖y‖^d f(x)} dy dx,

γ_1(∞) = (1/2) ∫_{R^d} e^{−ω_d 2^{−d} ‖y‖^d} dy = 2^{d−1}.
For a uniform distribution on a compact set D ⊂ R^d it is easy to show that γ_1(λ) is given by

γ_1(λ) = 2^{d−1}(1 − e^{−λ ω_d / Vol(D)}), (3.3.2)

from which it is easy to check that γ_1(λ) → γ_1(∞) as λ → ∞. For higher indexes, we have no analytic way to compute γ_k(λ). However, it can be evaluated numerically, and an example is given in Figure 3.5 for the uniform distribution on [0, 1]^3. Note that, in that example, γ_0(∞) − γ_1(∞) + γ_2(∞) − γ_3(∞) ≈ 0. This is not a coincidence, and the explanation for this phenomenon will be given in Section 3.4, where we discuss the mean Euler characteristic of Cech complexes.
[Figure: plot of γ_k(λ) versus λ ∈ [0, 10], with one curve for each of k = 0, 1, 2, 3.]

Figure 3.5: The γ_k(λ) functions. In this example d = 3, and f(x) is the uniform density on [0, 1]^3. For k = 0 we know that n^{−1} N_{0,n} = 1, and for k = 1 we have an explicit formula in (3.3.2). For k = 2, 3 we had to use a numerical approximation, hence the noisiness of the graphs.
Recall that, in the subcritical phase, the limit mean and the limit variance were exactly
the same. For other phases, this is no longer true.
Theorem 3.3.6 (Limit variance). If n r_n^d → λ ∈ (0, ∞] and 1 ≤ k ≤ d, then

lim_{n→∞} n^{−1} Var(N_{k,n}) = σ_k^2(λ),   lim_{n→∞} n^{−1} Var(N̂_{k,n}) = σ̂_k^2(λ),

where 0 < σ_k^2(λ) < σ̂_k^2(λ) < ∞.
The expressions defining σ_k^2(λ) and σ̂_k^2(λ) are rather complicated, and can be found at (3.6.31) and (3.6.24), respectively. Note that this theorem, and the following central limit theorem (CLT), are the only places where the limit values differ between the random sample and Poisson cases.
Theorem 3.3.7 (CLT). If n r_n^d → λ ∈ (0, ∞], then for 1 ≤ k ≤ d,

(N_{k,n} − E{N_{k,n}}) / √n →^{L} N(0, σ_k^2(λ)),

(N̂_{k,n} − E{N̂_{k,n}}) / √n →^{L} N(0, σ̂_k^2(λ)).

Note that as an immediate corollary of these CLTs and Theorem 3.3.6 we have the 'law of large numbers' that, under the conditions of the CLTs,

n^{−1} N_{k,n} →^{L²} γ_k(λ).
To conclude this section, we note an interesting result which is unique to the supercritical regime, for which we define

N^G_{k,n} ≜ |CPk(dXn)|,

the 'global' number of critical points of the distance function dXn in R^d (i.e. without requiring (CP3)). We note first that N_{k,n} and N^G_{k,n} have identical asymptotic behaviors, at least at the level of their first two moments and CLT:

Theorem 3.3.8. If n r_n^d → ∞, and f is lower bounded with convex support, then, for 1 ≤ k ≤ d,

lim_{n→∞} n^{−1} E{N^G_{k,n}} = γ_k(∞),   lim_{n→∞} n^{−1} Var(N^G_{k,n}) = σ_k^2(∞),

and

(N^G_{k,n} − E{N^G_{k,n}}) / √n →^{L} N(0, σ_k^2(∞)),

where γ_k and σ_k^2 are the same as in Theorems 3.3.5 and 3.3.6.
As usual, the results are the same for the Poisson case. An obvious corollary of Theorem 3.3.8 is that n^{−1} E{N^G_{k,n} − N_{k,n}} → 0. However, much more is true:

Proposition 3.3.9. Under the conditions of Theorem 3.3.8, and if n r_n^d ≥ D⋆ log n for sufficiently large (f-dependent) D⋆, then, for 1 ≤ k ≤ d,

lim_{n→∞} E{|N^G_{k,n} − N_{k,n}|} = 0.
Thus, in the supercritical phase, the slow decrease of the radii rn implies that the global and the local numbers of critical points are ultimately equal with high probability, despite the fact that both grow to infinity with increasing n. This is an interesting and unexpected result, and will turn out to be important when we discuss the Euler characteristic of the Cech complex in the next section. However, Proposition 3.3.9 relies heavily on the assumed convexity of supp(f). For example, take f to be the uniform density on the annulus A = {x ∈ R2 : 1 ≤ |x| ≤ 2}. Then, for n large enough, we would expect to have a maximum point (index 2) close to the origin. This critical point will be accounted for in N^G_{2,n}, but will be ignored by N_{2,n}, since its distance to Xn is greater than 1. Thus, we would expect that E{|N^G_{2,n} − N_{2,n}|} → 1, contradicting the conclusion of Proposition 3.3.9 and showing that some assumption on supp(f) is indeed necessary.
3.4 The Topology of Random Cech Complexes
As mentioned already a number of times, the results of the previous section regarding
critical points of the distance function have implications for the homology and Betti
numbers of certain random Cech complexes, and so are related to recent results of [29]
and [31]. Our plan in this section is to describe this connection.
3.4.1 Critical Points and Betti Numbers
The link between the distance function and the Cech complex is given by the following equivalence, which is due to the Nerve theorem (Theorem 3.1.5):

dP^{−1}((−∞, ε]) = ⋃_{p∈P} B_ε(p) ≃ C(P, ε). (3.4.1)

Morse theory, and in particular the version developed in [24] that applies to the distance function, tells us that, in view of the equivalences in (3.4.1), there is a connection between the critical points of dP over the set dP^{−1}([0, ε]), and the Betti numbers of C(P, ε). In particular, for every critical point of dXn at height ε and of index k, for all small enough δ, either

β_k(C(Xn, ε + δ)) = β_k(C(Xn, ε − δ)) + 1,

or

β_{k−1}(C(Xn, ε + δ)) = β_{k−1}(C(Xn, ε − δ)) − 1.
Despite this connection, Betti numbers, dealing, as they do, with ‘holes’, are typically
determined by global phenomena, and this makes them hard to study directly in the
random setting. On the other hand, the structure of critical points is a local phenomenon,
which is why, in the random case, we can say more about critical points than what is known
for Betti numbers to date.
3.4.2 The Limiting Behavior of the Cech Complex
For the remainder of this section we shall treat only the random sample Xn, although similar statements could be made regarding the Poisson case. Retaining the notation of the previous section, and defining β_{k,n} ≜ β_k(C(Xn, rn)), our aim will be to examine relationships between the random variables N_{k,n} and the β_{k,n} and β_{k−1,n}. In addition, we shall compare our results for N_{k,n} to those of [29] and [31] for β_{k,n}, using Morse theory to explain the connections.
In direct analogy to the results of Section 3.3, [29, 31] show that the limiting behavior of C(Xn, rn) splits into three main regimes, depending on the limit of n r_n^d. In the subcritical (n r_n^d → 0) or dust phase, in which the Cech complex consists mostly of small disconnected particles and very few holes, Theorem 3.2 in [29] states that for 1 ≤ k ≤ d − 1,

lim_{n→∞} (n^{k+2} r_n^{d(k+1)})^{−1} E{β_{k,n}} = D_k,

for some constant D_k defined in an integral form and related to the μ_k of our Theorem 3.3.1. In [31] the subcritical phase is explored in more detail, and limit theorems analogous to those of Theorem 3.3.3 are proved. Combining their results with those in Section 3.3.1, we observe that the N_{k,n} and the β_{k−1,n} exhibit similar limiting behavior, and are O(n^{k+1} r_n^{dk}).
Furthermore, we can summarize the relationship between the different N_{k,n} and β_{k,n} as follows:

N_{1,n} ≫ N_{2,n} ≫ N_{3,n} ≫ · · · ≫ N_{k_c,n}
             ≈          ≈                 ≈
           β_{1,n}  ≫ β_{2,n} ≫ · · · ≫ β_{k_c−1,n},

where ≈ means 'same order of magnitude' and k_c is as in (3.3.1). For k > k_c all terms are zero with high probability, which, as before, grows with k.
Recall that Morse theory tells us that each critical point of index k contributes either +1 to β_{k,n} or −1 to β_{k−1,n}. Splitting N_{k,n} and β_{k,n} accordingly as

N_{k,n} = N^+_{k,n} + N^−_{k,n},
β_{k,n} = N^+_{k,n} − N^−_{k+1,n},

the diagram implies that N^+_{k,n} = O(n^{k+2} r_n^{d(k+1)}) and N^−_{k,n} = O(n^{k+1} r_n^{dk}). Hence we conclude that N^−_{k,n} ≫ N^+_{k,n}. In other words, most of the critical points of index k destroy homology generators rather than create new ones. In the case where k = 0, noting that β_{0,n} = N_{0,n} − N^−_{1,n} yields the following corollary.
Corollary 3.4.1. If n r_n^d → 0, then

lim_{n→∞} n^{−1} E{β_{0,n}} = 1.
Recall that β_{0,n} represents the number of connected components of the complex C(Xn, rn), and is of an essentially different nature to that of the other Betti numbers. The study in [31] does not apply to β_0 at all, while in [39] limit theorems for β_{0,n} are proved for the critical phase. The Morse theoretic point of view we use here thus gives additional results not accessible from the direct approach to Betti numbers.
For the other regimes, making statements about the Cech complex becomes extremely difficult, and thus the theory is still incomplete.

In the critical phase (n r_n^d → λ ∈ (0, ∞)), the Cech complex starts to connect and the topology becomes more complex. In addition, once λ passes a certain threshold, a giant component emerges (cf. Chapter 9 of [39]), from which comes the alternate description of this phase as the 'percolation phase'. Theorem 4.1 in [29] states that for 1 ≤ k ≤ d − 1,

lim_{n→∞} n^{−1} E{β_{k,n}} ∈ (0, ∞),

although the exact limit is not computed. This agrees with the results in Section 3.3.2 of this chapter. The main difference between the two sets of results is that for critical points we are able to give a closed form expression for the limit mean of N_{k,n} (Theorem 3.3.5), as well as stronger limit results (Theorems 3.3.7–3.3.9). This will be useful below, when we discuss Euler characteristics.
In the supercritical regime (n r_n^d → ∞) even less is known about the Cech complex. In general, the Cech complex becomes highly connected, the topology becomes simpler and the Betti numbers decrease. Theorem 6.1 of [29] gives the precise result that if f is a uniform density with a compact and convex support, and lim_{n→∞} (log n/n)^{−1/d} r_n > 0, then

lim_{n→∞} P(β_{0,n} = 1, β_{1,n} = · · · = β_{d−1,n} = 0) = 1, (3.4.2)

which is described in [31] by saying that C(Xn, rn) is "asymptotically almost surely contractible". We have no analogous result about critical points, nor could we, since N_{k,n} is O(n) and thus N_{k,n} → ∞ (Section 3.3.2). However, Corollary 3.4.2 below gives information about the Euler characteristic of the Cech complex which is different from, but related to, (3.4.2). (Note that (3.4.2) requires that the underlying probability density is lower bounded with convex support, the same assumption we adopted in Section 3.3.2.)
To conclude this section, we present a novel statement about the Cech complex C(Xn, rn) which can be made based on the results in Section 3.3. The Euler characteristic of a simplicial complex S has a number of equivalent definitions, and a number of important applications. One of the definitions, via Betti numbers, is

χ(S) = Σ_{k=0}^{∞} (−1)^k β_k(S). (3.4.3)

However, χ(S) also has a definition via indexes of critical points of appropriately defined functions supported on S, and this leads to
Corollary 3.4.2. Let χ_n be the Euler characteristic of C(Xn, rn). Then, under the assumptions of Theorems 3.3.1 and 3.3.5, we have

lim_{n→∞} n^{−1} E{χ_n} = { 1,                                   n r_n^d → 0,
                            1 + Σ_{k=1}^{d} (−1)^k γ_k(λ),       n r_n^d → λ ∈ (0, ∞),
                            0,                                   n r_n^d → ∞.      (3.4.4)

Moreover, when n r_n^d → ∞ and n r_n^d ≥ D⋆ log n (see Proposition 3.3.9), then

E{χ_n} → 1.
Note that (3.4.4) cannot be proven using only the existing results on Betti numbers,
since the values of the limiting mean in the critical and supercritical regimes are not
available. This demonstrates one of the advantages of studying the homology of the Cech
complex via the distance function.
In closing we note some of the implications of Corollary 3.4.2. In the subcritical phase, we have that χ_n ∼ n, which agrees with the intuition developed so far that, in this range, the Cech complex consists mostly of small disconnected particles and very few holes. In the critical range we have a non-trivial limit, resulting from the fact that the Cech complex has many holes of all possible dimensions. In the supercritical range, χ_n ∼ 1, which is exactly what we get when β_{0,n} = 1, β_{1,n} = · · · = β_{d−1,n} = 0 (cf. (3.4.3), (3.4.2)). Finally, since n^{−1} E{χ_n} → 0 in this regime, it is clear now why the numerics of Figure 3.5 showed that Σ_{k=0}^{3} (−1)^k γ_k(∞) ≈ 0.
3.5 Summary and Future Work
In this chapter we presented a body of limit theorems for the distance function from a random set of points in Euclidean space R^d. We observed different limiting behavior in three different phases, controlled by the term n r_n^d. Using a special version of Morse theory for the distance function (developed in [24]), we linked our results for the critical points with the results in [30, 31], where limit theorems for the Betti numbers β_{k,n} of the Cech complex C(Xn, rn) were presented.
There is a lot more to study on the theory of random distance functions and their
relationships with random Cech complexes. In this chapter we have already established
a number of novel topological results about random Cech complexes, such as Corollaries
3.4.1 and 3.4.2. We would like to pursue more results of this kind. Of particular interest are the critical (n r_n^d → λ) and supercritical (n r_n^d → ∞) regimes, where the behavior of the Betti numbers of the Cech complex has not yet been fully determined. We believe that the results we have for the distance function could be highly useful for understanding the behavior of the complex in these regimes.
The following two sections discuss in more detail two topics, which remain for future
research, and which seem particularly interesting.
3.5.1 The Supercritical Phase
In this chapter we presented the results for the supercritical phase under the assumption
that f is lower bounded with a convex support. We rely on this assumption in the proofs presented in Section 3.6. We would like to extend our results beyond this set of distributions.
For distributions with compact supports, the result in Corollary 3.4.2 suggests that in
the supercritical phase, the Cech complex captures the topology of the support. However,
under our assumption, the support of f is always contractible, in which case β0 = 1 and
βk = 0 for k ≥ 1. Extending our results to non-convex supports, we believe that the
result in Corollary 3.4.2 could be extended to the claim that for the right choice of rn we
have E {χn} → χ(supp(f)). This also might lead to finding a regime in which we can
prove convergence for the Betti numbers, e.g. that E {βk,n} → βk(supp(f)). Proving such
results would be a significant contribution to the manifold learning problem described as
a motivation for this chapter. It means that we could find conditions on the radius rn,
such that in the limit, the topology of the Cech complex recovers the topology of the
original space.
For distributions with unbounded support, the results we will present in Chapter 4
indicate a very different limiting behavior. In Chapter 4 we will show that the power-
law and exponential distributions ‘crackle’. Briefly, this means that as we add more and
more points a contractible core is formed, but outside this core there are many small
disconnected particles and homology elements of any order. Therefore, the techniques to
prove the results in this chapter may need adjustments to handle such distributions.
3.5.2 The Distance Function on Closed Manifolds
The main assumption in this chapter is that the samples are generated from a nice proba-
bility density f in Rd. This setup is interesting on its own, and the Cech complex behavior
exhibits a rich variety of phenomena yet to be fully studied. However, we are also very
interested in extending the results we have so far to the manifold case. Here, the samples are drawn from an m-dimensional (smooth, closed) manifold M ⊂ R^d, where m < d.
The techniques used in this chapter need significant adaptation, and this is work in
progress. As one would expect, the general behavior is similar, and in particular we
observe the same phase transition phenomena. The main difference is that the term controlling the transition is now n r_n^m rather than n r_n^d.
Similarly to the discussion in Section 3.5.1, Corollary 3.4.2 suggests that after extending our results to manifolds, we would find a sub-case of the supercritical phase (n r_n^m → ∞) where the topology of the Cech complex recovers the topology of M.
In the critical phase (n r_n^m → λ ∈ (0, ∞)) we can easily extend the limit theorems in this chapter to closed manifolds, using the results in [40]. Let M ⊂ R^d be a smooth closed manifold, and let f : M → R be a probability density function on M, i.e. f ≥ 0 and

∫_M f(x) dx = 1,

where dx is the volume form on M. Let Xn = {X1, . . . , Xn} be a set of i.i.d. random samples with density f. In [40], limit theorems for functionals defined on such random sets are introduced. For simplicity, assume that rn = λ n^{−1/m}, although the results can be easily extended to any choice of rn in the critical range.
Following the notation in [40], define

ξ(x, X) ≜ 1{∃ x1, . . . , xk ∈ X : h_λ(x, x1, . . . , xk) = 1},
ξ_n(x, X) ≜ ξ(n^{1/m} x, n^{1/m} X) = 1{∃ x1, . . . , xk ∈ X : h_{r_n}(x, x1, . . . , xk) = 1}.

Then, clearly,

N_{k,n} = (1/(k + 1)) Σ_{X∈Xn} ξ_n(X, Xn).

Thus, adapting Theorem 3.1 in [40] to this special case, we have
Theorem 3.5.1. Let H_α be a homogeneous Poisson process on R^m with rate α. Then

n^{−1} N_{k,n} → (1/(k + 1)) ∫_M E{ξ(0, H_{f(x)})} f(x) dx,

both in L1 and almost surely.
Next, simple computations show that

E{ξ(0, H_{f(x)})} = (λ^k f^k(x) / k!) ∫_{(R^m)^k} h_1(0, y) e^{−λ ω_d f(x) R^d(0, y)} dy.
Therefore, we have that

n^{−1} N_{k,n} → γ_k(λ),

where

γ_k(λ) = (λ^k / (k + 1)!) ∫_M ∫_{(R^m)^k} f^{k+1}(x) h_1(0, y) e^{−λ ω_d R^d(0, y) f(x)} dy dx.
Note that the expression of γk(λ) here is very similar to the one given in Theorem 3.3.5.
The only difference is the domain of integration. In light of this result, it seems very likely
that all the limit theorems presented in this chapter have an equivalent manifold version.
This topic remains as future work.
Sampling from Fractals
A similar idea to the manifold setup is sampling from fractals. Here, we are interested in
generating samples from some distribution over domains with a fractal (e.g. Hausdorff)
dimension m, which is not necessarily an integer. It would be extremely interesting to see whether our results can be extended to this case as well, and how they would depend on the fractal dimension. Numerical simulations on samples taken from the graph of a Brownian motion (whose Hausdorff dimension is m = 1.5) suggest that the phase transition in this case indeed occurs at n r_n^m → λ. While this is very encouraging, we leave fractals as future work as well.
3.6 Proofs
This section is devoted to proving the results in Sections 3.3 and 3.4, and is organized according to regimes: subcritical (dust), critical (percolation), and supercritical (connected). In the proofs below we use theorems from Palm theory, Stein's method, and de-Poissonization. The appendices to this chapter provide a brief background to each of these topics and state the required theorems.
3.6.1 Some Notation and Elementary Considerations
In this section we list some common notation and note some simple facts that will be used throughout the proofs.
• Henceforth, k will be fixed, and whenever we use Y, Y′ or Yi we implicitly assume that |Y| = |Y′| = |Yi| = k + 1, unless stated otherwise.

• Usually, finite subsets of R^d will be denoted calligraphically (X, Y). However, inside integrals we use boldfacing and lower case (x, y).

• For x ∈ R^d, x ∈ (R^d)^{k+1} and y ∈ (R^d)^k, we use the shorthand

f(x) ≜ f(x1) f(x2) · · · f(x_{k+1}),
f(x + rn y) ≜ f(x + rn y1) f(x + rn y2) · · · f(x + rn yk),
h(0, y) ≜ h(0, y1, . . . , yk).

• The symbol 'c⋆' denotes a constant value, which might depend on d (the ambient dimension), f (the probability density of the samples), and k (the Morse index), but on neither n nor rn. The actual value of c⋆ may change between, and even within, lines.

• While not exactly a notational issue, we shall often use the fact that, for every k, n^{−k} (n choose k) → 1/k! as n → ∞, and thus there is a c⋆ such that (n choose k) ≤ c⋆ n^k.
Finally, the following lemma will be used extensively throughout the proofs below.
Lemma 3.6.1. Let X = (X1, . . . , Xk) be a set of k i.i.d. points in R^d sampled from a bounded density f. Then there exists a constant c⋆ such that

P(X is contained in a ball of radius r) ≤ c⋆ r^{d(k−1)}.
Proof. If X is contained in a ball of radius r, then X2, . . . , Xk are all within distance 2r from X1, thus

P(X is contained in a ball of radius r) ≤ ∫_{R^d} (∫_{B_{2r}(x)} f(y) dy)^{k−1} f(x) dx
≤ ∫_{R^d} (f_max Vol(B_{2r}(x)))^{k−1} f(x) dx
= f_max^{k−1} ω_d^{k−1} (2r)^{d(k−1)}
= c⋆ r^{d(k−1)},

where f_max ≜ sup_{x∈R^d} f(x), and ω_d is the volume of the unit ball in R^d.
3.6.2 Means for the Subcritical Range (n r_n^d → 0)
We start by proving Theorem 3.3.1 (the limit expectation), which requires the following important lemma. Note that the lemma has two implications. Firstly, it gives a precise order of magnitude, with constant, for the probability that k + 1 points in the rn-neighborhood of a point in Xn generate an index-k critical point. Secondly, it implies that if an additional, high density set of Poisson points is added to these k + 1 points, the probability that any of them lies in the ball containing the k + 1 original points is asymptotically negligible. Recall the definitions of the indicator functions h, hε, gε, given by (3.2.6), (3.2.7), (3.2.8), respectively.
Lemma 3.6.2. Let Y ⊂ Xn be a subset (chosen in advance) of k + 1 random variables from Xn, and assume that Y is independent of the Poisson process Pn. Then

lim_{n→∞} r_n^{−dk} E{h_{r_n}(Y)} = lim_{n→∞} r_n^{−dk} E{g_{r_n}(Y, Xn)} = lim_{n→∞} r_n^{−dk} E{g_{r_n}(Y, Y ∪ Pn)} = (k + 1)! μ_k,

where μ_k is defined in Theorem 3.3.1.
Proof. Note that from the definition of hǫ(·) (see (3.2.7)), it follows that
hǫ(x, x+ ǫy) , hǫ(x, x+ ǫy1, . . . , x+ ǫyk) = h1(0,y).
72 CHAPTER 3. THE TOPOLOGY OF RANDOM GEOMETRIC COMPLEXES
Thus, using the change of variables $\mathbf{x} \to (x, x+r_n\mathbf{y})$,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\,d\mathbf{x}
= r_n^{dk} \int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k} f(x)\, f(x+r_n\mathbf{y})\, h_{r_n}(x, x+r_n\mathbf{y})\,d\mathbf{y}\,dx
= r_n^{dk} \int_{\mathbb{R}^d} f(x) \int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\,d\mathbf{y}\,dx. \tag{3.6.1}
\]
Now, for $h_1(0,\mathbf{y})$ to be nonzero, all the elements $y_1,\ldots,y_k \in \mathbb{R}^d$ must lie inside $B_2(0)$, the ball of radius 2 around the origin. Therefore,
\[
|f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})| \le f_{\max}^k\,\mathbf{1}_{B_2(0)}(y_1)\cdots \mathbf{1}_{B_2(0)}(y_k),
\]
and applying the dominated convergence theorem (DCT) to (3.6.1) yields
\[
\lim_{n\to\infty} \int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\,d\mathbf{y} = f^k(x)\int_{(\mathbb{R}^d)^k} h_1(0,\mathbf{y})\,d\mathbf{y}, \tag{3.6.2}
\]
from which follows
\[
\lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\} = \int_{\mathbb{R}^d} f^{k+1}(x)\,dx \int_{(\mathbb{R}^d)^k} h_1(0,\mathbf{y})\,d\mathbf{y} = (k+1)!\,\mu_k, \tag{3.6.3}
\]
completing the proof for $h_{r_n}(\mathcal{Y})$. For $g_{r_n}(\mathcal{Y},\mathcal{X}_n)$ we have
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \mathbb{E}\{\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\mid\mathcal{Y}\}\} = \mathbb{E}\{h_{r_n}(\mathcal{Y})(1-p(\mathcal{Y}))^{n-k-1}\},
\]
where $p(\mathcal{Y}) \triangleq \int_{B(\mathcal{Y})} f(z)\,dz$ ($B(\mathcal{Y})$ is defined in (3.2.5)). Thus,
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\,(1-p(\mathbf{x}))^{n-k-1}\,d\mathbf{x}
= r_n^{dk}\int_{\mathbb{R}^d} f(x)\int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\,(1-p(x,x+r_n\mathbf{y}))^{n-k-1}\,d\mathbf{y}\,dx. \tag{3.6.4}
\]
The integrand here is smaller than or equal to the one in (3.6.1); therefore we can safely apply the DCT to it. To find the limit, first note that
\[
n\,p(x, x+r_n\mathbf{y}) = n\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz
= n\operatorname{Vol}(B(x,x+r_n\mathbf{y}))\,\frac{\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+r_n\mathbf{y}))}
= n\,\omega_d (r_n R(0,\mathbf{y}))^d\,\frac{\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+r_n\mathbf{y}))}.
\]
Applying the Lebesgue differentiation theorem yields
\[
\lim_{n\to\infty} \frac{\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+r_n\mathbf{y}))} = f(x).
\]
Therefore, since $nr_n^d \to 0$, we have
\[
\lim_{n\to\infty} n\,p(x, x+r_n\mathbf{y}) = 0. \tag{3.6.5}
\]
Thus, it is easy to show that
\[
\lim_{n\to\infty} (1-p(x,x+r_n\mathbf{y}))^{n-k-1} = 1,
\]
and using (3.6.3) and (3.6.4) yields
\[
\lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\} = (k+1)!\,\mu_k.
\]
Finally, the definition of $\mathcal{P}_n$ as a Poisson process with intensity $nf(x)$ implies
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\mid\mathcal{Y}\} = h_{r_n}(\mathcal{Y})\,\mathbb{P}(B(\mathcal{Y})\cap\mathcal{P}_n = \emptyset\mid\mathcal{Y}) = h_{r_n}(\mathcal{Y})\,e^{-np(\mathcal{Y})}.
\]
Thus,
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\} = \mathbb{E}\{\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\mid\mathcal{Y}\}\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\, e^{-np(\mathbf{x})}\,d\mathbf{x}
= r_n^{dk}\int_{\mathbb{R}^d} f(x)\int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\, e^{-np(x,x+r_n\mathbf{y})}\,d\mathbf{y}\,dx.
\]
Applying the DCT as before, and using (3.6.3) and (3.6.5), yields
\[
\lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\} = \lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\} = (k+1)!\,\mu_k,
\]
and we are done.
Using the previous lemma, it is now easy to prove Theorem 3.3.1.

Proof of Theorem 3.3.1. First, note that
\[
N_{k,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y},\mathcal{X}_n),
\]
where the sum is over all the subsets of size $k+1$. Therefore,
\[
\mathbb{E}\{N_{k,n}\} = \sum_{\mathcal{Y}\subset\mathcal{X}_n}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \binom{n}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{X}_{k+1},\mathcal{X}_n)\}.
\]
Using the fact that $n^{-(k+1)}\binom{n}{k+1} \to \frac{1}{(k+1)!}$, together with Lemma 3.6.2, yields
\[
\lim_{n\to\infty} (n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}\} = \mu_k,
\]
as required. As for the Poisson case, note first that $\hat N_{k,n} = \sum_{\mathcal{Y}\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y},\mathcal{P}_n)$. Applying Theorem 3.A.1 therefore yields that
\[
\mathbb{E}\{\hat N_{k,n}\} = \frac{n^{k+1}}{(k+1)!}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}',\mathcal{Y}'\cup\mathcal{P}_n)\},
\]
where $\mathcal{Y}'$ is a copy of $\mathcal{Y}$ independent of $\mathcal{P}_n$. Lemma 3.6.2 then implies
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat N_{k,n}\} = \mu_k,
\]
as required.
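The $n^{k+1}r_n^{dk}$ normalization can be seen in a simpler surrogate statistic: the number of $(k+1)$-subsets of $\mathcal{X}_n$ whose points lie within distance $2r_n$ of each other, which dominates $N_{k,n}$. The Python sketch below (an illustration only, not part of the thesis; $d=2$, $k=1$, so the surrogate is the count of close pairs) uses $r_n = n^{-3/4}$, for which $n^2r_n^2 = n^{1/2}$, and checks that the normalized count is roughly constant in $n$:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_close_pairs(n, r, trials=40):
    # Average number of pairs at distance <= 2r among n uniform points
    # in the unit square; its expectation is ~ 2*pi*n^2*r^2 for small r.
    total = 0.0
    for _ in range(trials):
        X = rng.uniform(size=(n, 2))
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        total += np.triu(D <= 2 * r, 1).sum()
    return total / trials

# Normalizing by n^2 r_n^2 = n^{1/2} should give roughly the same value
# (about 2*pi here) for both sample sizes.
ratios = [mean_close_pairs(n, n ** -0.75) / n ** 0.5 for n in (200, 800)]
print(ratios)
```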
3.6.3 Variances and Limit Distributions for the Subcritical Range

The proofs of Theorems 3.3.2 and 3.3.3 split into three different cases, depending on the limit of $n^{k+1}r_n^{dk}$.

Case 1: $n^{k+1}r_n^{dk} \to 0$

We start with the limit variance for this case.

Proof of Theorem 3.3.2.
\[
\mathbb{E}\{N_{k,n}^2\} = \mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{X}_n}\sum_{\mathcal{Y}_2\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\bigg\}
= \sum_{j=0}^{k+1}\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{X}_n}\sum_{\mathcal{Y}_2\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j\}\bigg\}
\triangleq \sum_{j=0}^{k+1}\mathbb{E}\{I_j\}.
\]
Note that
\[
I_{k+1} = \sum_{\mathcal{Y}_1\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n) = N_{k,n}.
\]
Thus, from Theorem 3.3.1,
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_{k+1}\} = \mu_k. \tag{3.6.6}
\]
Next, for $0<j<k+1$, if $|\mathcal{Y}_1\cap\mathcal{Y}_2|=j$ and $g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)=1$, then necessarily the $2k+2-j$ points in $\mathcal{Y}_1\cup\mathcal{Y}_2$ are contained in a ball of radius $2r_n$, and using Lemma 3.6.1 we have
\[
\mathbb{E}\{I_j\} = \binom{n}{k+1}\binom{n-k-1}{k+1-j}\binom{k+1}{j}\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j}
\le c^\star n^{2k+2-j} r_n^{d(2k+1-j)}.
\]
Thus,
\[
(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_j\} \le c^\star (nr_n^d)^{k+1-j} \to 0. \tag{3.6.7}
\]
For $j=0$, the sets $\mathcal{Y}_1$ and $\mathcal{Y}_2$ are independent, and since $g_{r_n}(\mathcal{Y}_i,\mathcal{X}_n)\le h_{r_n}(\mathcal{Y}_i)$, we have
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\} \le \mathbb{E}\{h_{r_n}(\mathcal{Y}_1)h_{r_n}(\mathcal{Y}_2)\} = (\mathbb{E}\{h_{r_n}(\mathcal{Y}_1)\})^2.
\]
Therefore,
\[
\mathbb{E}\{I_0\} = \binom{n}{k+1}\binom{n-k-1}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0}
\le c^\star n^{2(k+1)}(\mathbb{E}\{h_{r_n}(\mathcal{Y})\})^2.
\]
Using Lemma 3.6.2 together with the fact that $n^{k+1}r_n^{dk}\to 0$ yields
\[
(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_0\} \le c^\star n^{k+1}r_n^{dk}\big(r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}\big)^2 \to 0. \tag{3.6.8}
\]
Combining (3.6.6), (3.6.7), and (3.6.8) yields
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}^2\} = \mu_k.
\]
In addition, Theorem 3.3.1 implies
\[
(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{N_{k,n}\})^2 = n^{k+1}r_n^{dk}\big((n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}\}\big)^2 \to 0.
\]
Therefore, since $\operatorname{Var}(N_{k,n}) = \mathbb{E}\{N_{k,n}^2\} - (\mathbb{E}\{N_{k,n}\})^2$, we conclude that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\operatorname{Var}(N_{k,n}) = \mu_k,
\]
which gives Theorem 3.3.2 for the random sample case. The proof for the Poisson case (i.e. for $\hat N_{k,n}$) is similar in spirit, but technically more complicated. The main steps of the argument follow. We start by writing
\[
\mathbb{E}\{\hat N_{k,n}^2\} = \mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\bigg\}
= \sum_{j=0}^{k+1}\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j\}\bigg\}
\triangleq \sum_{j=0}^{k+1}\mathbb{E}\{\hat I_j\}.
\]
Again, for $j=k+1$ we have
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_{k+1}\} = \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat N_{k,n}\} = \mu_k. \tag{3.6.9}
\]
For $0\le j<k+1$, using Corollary 3.A.2 we have
\[
\mathbb{E}\{\hat I_j\} = c^\star n^{2k+2-j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j},
\]
where $\mathcal{Y}'_1,\mathcal{Y}'_2$ are sets of $k+1$ i.i.d. points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, such that $|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j$, and $\mathcal{Y}'_{12}=\mathcal{Y}'_1\cup\mathcal{Y}'_2$. Similar arguments to those we used in the previous case then yield that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_j\} = 0.
\]
Furthermore, it is also easy to see that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{\hat N_{k,n}\})^2 = 0.
\]
Thus, we conclude that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\operatorname{Var}(\hat N_{k,n}) = \mu_k,
\]
which completes the proof of the theorem.

Next, we wish to prove the first part of Theorem 3.3.3, i.e. that $N_{k,n}\xrightarrow{L^2} 0$.
Proof of Theorem 3.3.3 - Part 1. Clearly, it suffices to show that
\[
\lim_{n\to\infty}\mathbb{E}\{N_{k,n}^2\} = \lim_{n\to\infty}\mathbb{E}\{\hat N_{k,n}^2\} = 0. \tag{3.6.10}
\]
However, in the previous proof we saw that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}^2\} = \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat N_{k,n}^2\} = \mu_k.
\]
Since $n^{k+1}r_n^{dk}\to 0$, (3.6.10) follows immediately, and we are done.

Case 2: $n^{k+1}r_n^{dk} \to \alpha \in (0,\infty)$

Proof of Theorem 3.3.2. The proof in this case is similar to the previous one, the only difference being in how to bound the terms $\mathbb{E}\{I_0\}$ and $\mathbb{E}\{\hat I_0\}$. For that, a proof in the spirit of Lemma 3.6.2 can be used to show that
\[
\lim_{n\to\infty} r_n^{-2dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0} = ((k+1)!\,\mu_k)^2,
\]
\[
\lim_{n\to\infty} r_n^{-2dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0} = ((k+1)!\,\mu_k)^2.
\]
Therefore,
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_0\}
= \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\binom{n}{k+1}\binom{n-k-1}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0}
= \alpha\mu_k^2.
\]
Similarly, using Corollary 3.A.2, we have
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_0\}
= \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\frac{n^{2k+2}}{((k+1)!)^2}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0}
= \alpha\mu_k^2.
\]
Finally, we also have
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{N_{k,n}\})^2 = \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{\hat N_{k,n}\})^2 = \alpha\mu_k^2.
\]
This completes the proof.
Next, we prove the Poisson limit of Theorem 3.3.3, for which we need

Lemma 3.6.3. Denote the total variation distance by $d_{TV}$. Then:
1. Let $S_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{X}_n} h_{r_n}(\mathcal{Y})$, and let $Z\sim\operatorname{Poisson}(\mathbb{E}\{S_{k,n}\})$. Then
\[ d_{TV}(S_{k,n},Z) \le c^\star nr_n^d. \]
2. Let $\hat S_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_n} h_{r_n}(\mathcal{Y})$, and let $\hat Z\sim\operatorname{Poisson}(\mathbb{E}\{\hat S_{k,n}\})$. Then
\[ d_{TV}(\hat S_{k,n},\hat Z) \le c^\star nr_n^d. \]

Proof. The proof is very similar to that of Theorem 3.4 in [39], and uses the Poisson approximation given in Theorem 3.B.2.
Part 1: Let $\mathcal{I}_n = \{\mathbf{i}\subset\{1,2,\ldots,n\} : |\mathbf{i}|=k+1\}$. Then, for $\mathbf{i}=\{i_0,\ldots,i_k\}$ and $\mathcal{X}_\mathbf{i} = \{X_{i_0},\ldots,X_{i_k}\}$, we can write
\[ S_{k,n} = \sum_{\mathbf{i}\in\mathcal{I}_n} h_{r_n}(\mathcal{X}_\mathbf{i}). \]
Set $N_\mathbf{i} = \{\mathbf{j}\in\mathcal{I}_n : |\mathbf{i}\cap\mathbf{j}|>0\}$, and let $\sim$ be a relation on $\mathcal{I}_n$ such that $\mathbf{i}\sim\mathbf{j}$ if and only if $\mathbf{j}\in N_\mathbf{i}$. For $\mathbf{i}\ne\mathbf{j}$, $\mathcal{X}_\mathbf{i}$ and $\mathcal{X}_\mathbf{j}$ are independent unless $\mathbf{j}\in N_\mathbf{i}$. Thus, the graph $(\mathcal{I}_n,\sim)$ is the dependency graph for $\xi_\mathbf{i} \triangleq h_{r_n}(\mathcal{X}_\mathbf{i})$.
Now, if $h_{r_n}(\mathcal{X}_\mathbf{i})\ne 0$ then the $k+1$ points in $\mathcal{X}_\mathbf{i}$ are contained in a ball of radius $r_n$, and using Lemma 3.6.1 we have
\[ p_\mathbf{i} \triangleq \mathbb{E}\{\xi_\mathbf{i}\} \le c^\star r_n^{dk}. \]
Therefore,
\[
\sum_{\mathbf{i}\in\mathcal{I}_n}\sum_{\mathbf{j}\in N_\mathbf{i}} p_\mathbf{i} p_\mathbf{j}
\le \binom{n}{k+1}\left(\binom{n}{k+1}-\binom{n-k-1}{k+1}\right)c^\star r_n^{2dk}
\le c^\star n^{2k+1} r_n^{2dk}
= c^\star n^{k+1}r_n^{dk}(nr_n^d)^k
\le c^\star n^{k+1}r_n^{dk}(nr_n^d),
\]
where the last inequality uses the facts that $nr_n^d\to 0$ and $k\ge 1$.
Next, if $\mathbf{i}\sim\mathbf{j}$ with $|\mathbf{i}\cap\mathbf{j}| = l > 0$, and $h_{r_n}(\mathcal{X}_\mathbf{i})h_{r_n}(\mathcal{X}_\mathbf{j})\ne 0$, then necessarily the $2k+2-l$ points in $\mathcal{X}_\mathbf{i}\cup\mathcal{X}_\mathbf{j}$ are contained in a ball of radius $2r_n$, and therefore,
\[ p_{\mathbf{i},\mathbf{j}} \triangleq \mathbb{E}\{\xi_\mathbf{i}\xi_\mathbf{j}\} \le c^\star r_n^{d(2k+1-l)}. \]
Thus,
\[
\sum_{\mathbf{i}\in\mathcal{I}_n}\sum_{\mathbf{j}\in N_\mathbf{i}\setminus\{\mathbf{i}\}} p_{\mathbf{i},\mathbf{j}}
\le \sum_{l=1}^{k}\binom{n}{k+1}\binom{n-k-1}{k+1-l}\binom{k+1}{l}c^\star r_n^{d(2k+1-l)}
\le c^\star\sum_{l=1}^{k} n^{2k+2-l}r_n^{d(2k+1-l)}
\le c^\star n^{k+1}r_n^{dk}(nr_n^d).
\]
Finally, using Lemma 3.6.2 it is easy to prove that
\[ \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{S_{k,n}\} = \mu_k, \]
which implies that
\[ \frac{1}{\mathbb{E}\{S_{k,n}\}} \le c^\star (n^{k+1}r_n^{dk})^{-1}. \]
Therefore, from Theorem 3.B.2 we conclude that
\[ d_{TV}(S_{k,n}, Z) \le c^\star nr_n^d. \]
Part 2: The proof here relies on the preceding one, albeit with additional technicalities. We start by conditioning on $|\mathcal{P}_n|$, the number of points in $\mathcal{P}_n$:
\[
\big|\mathbb{P}(\hat S_{k,n}\in A) - \mathbb{P}(\hat Z\in A)\big|
= \Big|\sum_{m=0}^{\infty}\big(\mathbb{P}(\hat S_{k,n}\in A \mid |\mathcal{P}_n|=m) - \mathbb{P}(\hat Z\in A)\big)\,\mathbb{P}(|\mathcal{P}_n|=m)\Big|
\le \sum_{m=0}^{\infty}\big|\mathbb{P}(\hat S_{k,n}\in A\mid |\mathcal{P}_n|=m) - \mathbb{P}(\hat Z\in A)\big|\,\mathbb{P}(|\mathcal{P}_n|=m). \tag{3.6.11}
\]
Given $|\mathcal{P}_n|=m$, using the notation in the proof of Part 1, we can write
\[ \hat S_{k,n} = \sum_{\mathbf{i}\in\mathcal{I}_m} h_{r_n}(\mathcal{X}_\mathbf{i}). \]
Setting $\xi_\mathbf{i} = h_{r_n}(\mathcal{X}_\mathbf{i})$, $p_\mathbf{i} = \mathbb{E}\{\xi_\mathbf{i}\}$ and $p_{\mathbf{i},\mathbf{j}} = \mathbb{E}\{\xi_\mathbf{i}\xi_\mathbf{j}\}$, then, as in the proof of Part 1, it is easy to show that
\[
\sum_{\mathbf{i}\in\mathcal{I}_m}\sum_{\mathbf{j}\in N_\mathbf{i}} p_\mathbf{i}p_\mathbf{j} \le c^\star m^{2k+1}r_n^{2dk},\qquad
\sum_{\mathbf{i}\in\mathcal{I}_m}\sum_{\mathbf{j}\in N_\mathbf{i}\setminus\{\mathbf{i}\}} p_{\mathbf{i},\mathbf{j}} \le c^\star\sum_{l=1}^{k} m^{2k+2-l}r_n^{d(2k+1-l)},\qquad
\frac{1}{\mathbb{E}\{\hat S_{k,n}\}} \le c^\star(n^{k+1}r_n^{dk})^{-1}.
\]
Therefore, from Theorem 3.B.2, we can conclude that
\[
\big|\mathbb{P}(\hat S_{k,n}\in A\mid |\mathcal{P}_n|=m) - \mathbb{P}(\hat Z\in A)\big| \le c^\star n^{-(k+1)}\sum_{l=1}^{k} m^{2k+2-l}r_n^{d(k+1-l)}.
\]
Substituting back into (3.6.11), we have
\[
d_{TV}(\hat S_{k,n},\hat Z) \le c^\star n^{-(k+1)}\sum_{l=1}^{k} r_n^{d(k+1-l)}\,\mathbb{E}\{|\mathcal{P}_n|^{2k+2-l}\}.
\]
Since $|\mathcal{P}_n|\sim\operatorname{Poisson}(n)$, it is easy to find a constant $c^\star$ such that
\[ \mathbb{E}\{|\mathcal{P}_n|^{2k+2-l}\} \le c^\star n^{2k+2-l}, \]
for every $1\le l\le k$. So, finally, we have that
\[
d_{TV}(\hat S_{k,n},\hat Z) \le c^\star\sum_{l=1}^{k} n^{k+1-l}r_n^{d(k+1-l)} \le c^\star nr_n^d,
\]
since $nr_n^d\to 0$ and is therefore bounded.

Note that the previous result did not use the assumption that $n^{k+1}r_n^{dk}\to\alpha\in(0,\infty)$. However, to prove an analogous result for $N_{k,n}$ rather than $S_{k,n}$ we shall need it. We shall also need the following two lemmas.

Lemma 3.6.4. Let $X,Y$ be integer-valued random variables defined on the same probability space, such that $\Delta \triangleq X - Y \ge 0$. Then $d_{TV}(X,Y)\le\mathbb{E}\{\Delta\}$.
Proof. For every $A\in\mathcal{B}(\mathbb{R})$ (the Borel sets of $\mathbb{R}$),
\[
|\mathbb{P}(X\in A) - \mathbb{P}(Y\in A)| = |\mathbb{P}(X\in A, X\ne Y) - \mathbb{P}(Y\in A, X\ne Y)|
= |\mathbb{P}(X\ne Y)\,(\mathbb{P}(X\in A\mid X\ne Y) - \mathbb{P}(Y\in A\mid X\ne Y))|
\le \mathbb{P}(X\ne Y) = \mathbb{P}(\Delta\ge 1) \le \mathbb{E}\{\Delta\}.
\]
Thus,
\[
d_{TV}(X,Y) = \sup_{A\in\mathcal{B}(\mathbb{R})}|\mathbb{P}(X\in A)-\mathbb{P}(Y\in A)| \le \mathbb{E}\{\Delta\},
\]
and we are done.
Lemma 3.6.5. Let $X\sim\operatorname{Poisson}(\lambda_x)$ and $Y\sim\operatorname{Poisson}(\lambda_y)$. Then $d_{TV}(X,Y)\le|\lambda_x-\lambda_y|$.

Proof. Assume that $\lambda_x\ge\lambda_y$. Let $\Delta\sim\operatorname{Poisson}(\lambda_x-\lambda_y)$ be independent of $Y$, and define $\tilde X \triangleq Y+\Delta$. Then $\tilde X\sim\operatorname{Poisson}(\lambda_x)$, and so $d_{TV}(X,Y) = d_{TV}(\tilde X,Y)$. Since $\Delta = \tilde X - Y\ge 0$, it follows from Lemma 3.6.4 that
\[
d_{TV}(\tilde X,Y) \le \mathbb{E}\{\Delta\} = \lambda_x - \lambda_y,
\]
and we are done.
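Lemma 3.6.5 is also easy to verify numerically. The Python sketch below (an illustration only, not part of the thesis) computes truncated Poisson pmfs and checks the bound $d_{TV}\le|\lambda_x-\lambda_y|$ for a few pairs of rates:

```python
import math

def poisson_pmfs(lam, kmax):
    # P(X = j), j = 0..kmax, built iteratively to avoid huge factorials.
    p = math.exp(-lam)
    out = [p]
    for j in range(1, kmax + 1):
        p *= lam / j
        out.append(p)
    return out

def tv_poisson(lx, ly, kmax=100):
    # Total variation distance between Poisson(lx) and Poisson(ly),
    # truncated at kmax (tails beyond kmax are negligible for these rates).
    px, py = poisson_pmfs(lx, kmax), poisson_pmfs(ly, kmax)
    return 0.5 * sum(abs(a - b) for a, b in zip(px, py))

for lx, ly in [(1.0, 1.5), (3.0, 2.0), (10.0, 10.5)]:
    assert tv_poisson(lx, ly) <= abs(lx - ly)  # Lemma 3.6.5
```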
Proof of Theorem 3.3.3 - Part 2. For a start, we need to prove that $d_{TV}(N_{k,n},S_{k,n})\le c^\star nr_n^d$. To this end, define $\Delta \triangleq S_{k,n}-N_{k,n}$ and note that $\Delta$ counts the number of subsets $\mathcal{Y}\subset\mathcal{X}_n$ for which $h_{r_n}(\mathcal{Y})=1$ but $g_{r_n}(\mathcal{Y},\mathcal{X}_n)=0$. This implies that there exists $X\in\mathcal{X}_n\setminus\mathcal{Y}$ for which $X\in B(\mathcal{Y})$. Thus $\Delta$ is bounded from above by $k+2$ times the number of $(k+2)$-subsets contained in a ball of radius $r_n$. From Lemma 3.6.1 and Lemma 3.6.4 we have
\[
d_{TV}(N_{k,n},S_{k,n}) \le \mathbb{E}\{\Delta\} \le c^\star\binom{n}{k+2}r_n^{d(k+1)} \le c^\star(n^{k+1}r_n^{dk})(nr_n^d) \le c^\star nr_n^d,
\]
where we used the fact that $n^{k+1}r_n^{dk}$ is bounded.
Next, if $Z_N\sim\operatorname{Poisson}(\mathbb{E}\{N_{k,n}\})$ and $Z_S\sim\operatorname{Poisson}(\mathbb{E}\{S_{k,n}\})$, then from Part 1 of Lemma 3.6.3 and the triangle inequality,
\[
d_{TV}(N_{k,n},Z_N) \le d_{TV}(N_{k,n},S_{k,n}) + d_{TV}(S_{k,n},Z_S) + d_{TV}(Z_S,Z_N)
\le c^\star nr_n^d + d_{TV}(Z_S,Z_N).
\]
Finally, Lemma 3.6.5 implies that
\[
d_{TV}(Z_S,Z_N) \le |\mathbb{E}\{S_{k,n}\}-\mathbb{E}\{N_{k,n}\}| = |\mathbb{E}\{\Delta\}| \le c^\star nr_n^d.
\]
Thus, we conclude that
\[ d_{TV}(N_{k,n},Z_N) \le c^\star nr_n^d \to 0. \]
From Theorem 3.3.1, since $n^{k+1}r_n^{dk}\to\alpha$, we have that $\mathbb{E}\{N_{k,n}\}\to\alpha\mu_k$. Using the fact that $Z_N\sim\operatorname{Poisson}(\mathbb{E}\{N_{k,n}\})$, it is easy to see that $d_{TV}(N_{k,n},\operatorname{Poisson}(\alpha\mu_k))\to 0$, which implies convergence in distribution.
The proof for the Poisson case (i.e. $\hat N_{k,n}$) is exactly the same, other than using Part 2 of Lemma 3.6.3 rather than Part 1.
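The Poisson-limit phenomenon can be illustrated by simulation on the simpler close-pair count (the analogue of $S_{1,n}$ for $d=2$, $k=1$): when the distance threshold is tuned so the expected count stays constant, the count should be approximately Poisson, so in particular its variance should match its mean. A minimal Python sketch (an illustration only, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(2)

def close_pair_counts(n, t, trials):
    # Number of pairs at distance <= t among n uniform points in [0,1]^2,
    # over independent trials.
    out = np.empty(trials)
    for i in range(trials):
        X = rng.uniform(size=(n, 2))
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        out[i] = np.triu(D <= t, 1).sum()
    return out

# E{count} ~ C(n,2) * pi * t^2 is held near 2, so the count is close to
# Poisson-distributed: its variance should be close to its mean.
counts = close_pair_counts(n=300, t=0.004, trials=1000)
print(counts.mean(), counts.var())
```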
Case 3: $n^{k+1}r_n^{dk} \to \infty$

This is the most complicated case. We start by proving Theorems 3.3.2 and 3.3.3 (variance and CLT) for the Poisson case. Then, using "De-Poissonization" (Appendix 3.C), we treat the random sample case.

CLT for the Poisson Case

Proof of Theorem 3.3.2 - Part 3 ($\hat N_{k,n}$ only). We start with the second moment of $\hat N_{k,n}$:
\[
\mathbb{E}\{\hat N_{k,n}^2\} = \mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\bigg\}
= \sum_{j=0}^{k+1}\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j\}\bigg\}
\triangleq \sum_{j=0}^{k+1}\mathbb{E}\{\hat I_j\}.
\]
As in the proof of Theorem 3.3.2 for the previous cases, we have that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_j\} = 0,\quad 1\le j\le k,\qquad
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_{k+1}\} = \mu_k.
\]
However, in this case, $\hat I_0$ requires a different treatment. Recall that our interest is in the variance $\operatorname{Var}(\hat N_{k,n})$. So we have
\[
\operatorname{Var}(\hat N_{k,n}) = \mathbb{E}\{\hat N_{k,n}^2\} - (\mathbb{E}\{\hat N_{k,n}\})^2
= \mathbb{E}\{\hat I_{k+1}\} + \sum_{j=1}^{k}\mathbb{E}\{\hat I_j\} + \Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big).
\]
Thus, to complete the proof, we need to show that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big) = 0.
\]
Applying Corollary 3.A.2 we have
\[
\mathbb{E}\{\hat I_0\} = \bigg(\frac{n^{k+1}}{(k+1)!}\bigg)^2\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{\mathcal{Y}'_1\cap\mathcal{Y}'_2=\emptyset},
\]
where $\mathcal{Y}'_1$ and $\mathcal{Y}'_2$ are sets of $k+1$ i.i.d. points with density $f$, independent of $\mathcal{P}_n$, and $\mathcal{Y}'_{12}=\mathcal{Y}'_1\cup\mathcal{Y}'_2$. Similarly, applying Theorem 3.A.1, we have
\[
\mathbb{E}\{\hat N_{k,n}\} = \frac{n^{k+1}}{(k+1)!}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\}.
\]
Therefore, we can write
\[
(\mathbb{E}\{\hat N_{k,n}\})^2 = \bigg(\frac{n^{k+1}}{(k+1)!}\bigg)^2\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_2\cup\mathcal{P}'_n)\},
\]
where $\mathcal{P}'_n$ is an independent copy of $\mathcal{P}_n$. Set
\[
\Delta \triangleq g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n) - g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_2\cup\mathcal{P}'_n).
\]
Showing that $n^{k+1}r_n^{-dk}\,\mathbb{E}\{\Delta\}\to 0$ will complete the proof. Set
\[
\Delta_1 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)\ne\emptyset\},\qquad
\Delta_2 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)=\emptyset\}.
\]
If $\Delta_1\ne 0$ then all the elements in $\mathcal{Y}'_1$ and $\mathcal{Y}'_2$ are contained in a ball of radius $2r_n$. Therefore, using Lemma 3.6.1,
\[ \mathbb{E}\{\Delta_1\} \le c^\star r_n^{d(2k+1)}. \]
Next, note that
\[
\Delta_2 = h_{r_n}(\mathcal{Y}'_1)\,h_{r_n}(\mathcal{Y}'_2)\,\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)=\emptyset\}
\times\Big(\mathbf{1}\{\mathcal{P}_n\cap B(\mathcal{Y}'_1)=\emptyset\}\,\mathbf{1}\{\mathcal{P}_n\cap B(\mathcal{Y}'_2)=\emptyset\}
- \mathbf{1}\{\mathcal{P}_n\cap B(\mathcal{Y}'_1)=\emptyset\}\,\mathbf{1}\{\mathcal{P}'_n\cap B(\mathcal{Y}'_2)=\emptyset\}\Big).
\]
If $\Delta_2\ne 0$, then $B(\mathcal{Y}'_1)$ and $B(\mathcal{Y}'_2)$ are disjoint. Therefore, given $\mathcal{Y}'_1$ and $\mathcal{Y}'_2$, the set $\mathcal{P}_n\cap B(\mathcal{Y}'_2)$ is independent of the set $\mathcal{P}_n\cap B(\mathcal{Y}'_1)$ (by the spatial independence of the Poisson process), and has the same distribution as $\mathcal{P}'_n\cap B(\mathcal{Y}'_2)$. Thus, $\mathbb{E}\{\Delta_2\mid\mathcal{Y}'_1,\mathcal{Y}'_2\}=0$, which implies that $\mathbb{E}\{\Delta_2\}=0$.
To conclude, $\mathbb{E}\{\Delta\}\le c^\star r_n^{d(2k+1)}$. Therefore,
\[
\lim_{n\to\infty} n^{k+1}r_n^{-dk}\,\mathbb{E}\{\Delta\} \le \lim_{n\to\infty} c^\star (nr_n^d)^{k+1} = 0.
\]
This completes the proof of the limit variance.
Next, we wish to prove the CLT in Theorem 3.3.3.

Proof of Theorem 3.3.3 - Part 3 ($\hat N_{k,n}$ only). The proof is based on the normal approximation for sums of dependent variables given by Stein's method (Appendix 3.B). We start by counting only critical points located in a compact set $A\subset\mathbb{R}^d$ for which $\int_A f(x)\,dx > 0$.
For fixed $n$, let $\{Q_{i,n}\}_{i\in\mathbb{N}}$ be a partition of $\mathbb{R}^d$ into cubes of side $r_n$, and let $I_A\subset\mathbb{N}$ be the (finite) set of indices $i$ for which $Q_{i,n}\cap A\ne\emptyset$. For $i\in I_A$, set
\[
g_{r_n}^{(i)}(\mathcal{Y},\mathcal{P}_n) \triangleq g_{r_n}(\mathcal{Y},\mathcal{P}_n)\,\mathbf{1}_{A\cap Q_{i,n}}(C(\mathcal{Y})), \tag{3.6.12}
\]
where $C(\mathcal{Y})$ is the critical point in $\mathbb{R}^d$ generated by $\mathcal{Y}$ (cf. (3.2.3)). That is, $g_{r_n}^{(i)}=1$ if and only if $\mathcal{Y}$ generates a critical point located in $A\cap Q_{i,n}$. Then
\[
\hat N_{k,n}^{(i)} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_n} g_{r_n}^{(i)}(\mathcal{Y},\mathcal{P}_n)
\]
is the number of critical points inside $A\cap Q_{i,n}$, and
\[
\hat N_{k,n}^{A} \triangleq \#\{\text{critical points of } d_{\mathcal{P}_n} \text{ inside } A\} = \sum_{i\in I_A}\hat N_{k,n}^{(i)}.
\]
First, as in the proof of Theorem 3.3.2, one can show that
\[
\mu_k(A) \triangleq \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\operatorname{Var}\big(\hat N_{k,n}^A\big) \in (0,\infty). \tag{3.6.13}
\]
Now, for $i,j\in I_A$, define the relation $i\sim j$ if the distance between $Q_{i,n}$ and $Q_{j,n}$ is less than $2r_n$. Then $(I_A,\sim)$ is the dependency graph (cf. (3.B.1)) for the set $\{\hat N_{k,n}^{(i)}\}_{i\in I_A}$. This follows from the fact that a critical point located inside $Q_{i,n}$ is generated by points of $\mathcal{P}_n$
that are within distance $r_n$ of $Q_{i,n}$ (along with the spatial independence of $\mathcal{P}_n$). The degree of this graph is bounded by $5^d$. Consider the normalized random variables
\[
\xi_i \triangleq \frac{\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}}{\big(\operatorname{Var}\big(\hat N_{k,n}^A\big)\big)^{1/2}}.
\]
According to Theorem 3.B.3, in order to prove a CLT for $\hat N_{k,n}^A$, all we have to do now is find bounds for $\mathbb{E}\{|\xi_i|^p\}$, $p=3,4$.
Let $B_{r_n}(Q_{i,n})\subset\mathbb{R}^d$ be the set of points within distance $r_n$ of $Q_{i,n}$, and let $Z_i \triangleq |\mathcal{P}_n\cap B_{r_n}(Q_{i,n})|$ be the number of points of the Poisson process $\mathcal{P}_n$ lying inside $B_{r_n}(Q_{i,n})$. Then $Z_i\sim\operatorname{Poisson}(\lambda_i)$, where $\lambda_i = \int_{B_{r_n}(Q_{i,n})} nf(x)\,dx \le nf_{\max}(3r_n)^d$. Thus, $Z_i$ is stochastically dominated by a Poisson random variable with parameter $c^\star nr_n^d$. Now,
\[
\hat N_{k,n}^{(i)} \le \binom{Z_i}{k+1} \le c^\star Z_i^{k+1}.
\]
Therefore, for any $p\ge 1$,
\[
\mathbb{E}\big\{\big|\hat N_{k,n}^{(i)}\big|^p\big\} \le c^\star\,\mathbb{E}\big\{Z_i^{p(k+1)}\big\} \le c^\star(nr_n^d)^{p(k+1)} \le c^\star(nr_n^d)^{k+1},
\]
since $nr_n^d$ is bounded (note that each of the $c^\star$'s stands for a different value). Thus, it is easy to show that also
\[
\mathbb{E}\big\{\big|\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big|^p\big\} \le c^\star(nr_n^d)^{k+1}.
\]
Since $A$ is compact, there exists a constant $v$ such that $|I_A|\le vr_n^{-d}$. Therefore, for $p=3,4$,
\[
\sum_{i\in I_A}\mathbb{E}\{|\xi_i|^p\}
\le \frac{vr_n^{-d}\, c^\star (nr_n^d)^{k+1}}{\big(\operatorname{Var}\big(\hat N_{k,n}^A\big)\big)^{p/2}}
= vc^\star (n^{k+1}r_n^{dk})^{1-p/2}\Bigg(\frac{n^{k+1}r_n^{dk}}{\operatorname{Var}\big(\hat N_{k,n}^A\big)}\Bigg)^{p/2} \to 0,
\]
where we used the fact that $n^{k+1}r_n^{dk}\to\infty$ and the limit in Theorem 3.3.2. From Theorem 3.B.3, we conclude that
\[
\frac{\hat N_{k,n}^A - \mathbb{E}\big\{\hat N_{k,n}^A\big\}}{\big(\operatorname{Var}\big(\hat N_{k,n}^A\big)\big)^{1/2}} \xrightarrow{\;L\;} \mathcal{N}(0,1). \tag{3.6.14}
\]
Now that we have a CLT for $\hat N_{k,n}^A$, we need to extend it to one for $\hat N_{k,n}$. The method we shall use is exactly the same as the one used in [39], but, for completeness, we nevertheless include it.
Set $A_M = [-M,M]^d$, $\bar A_M = \mathbb{R}^d\setminus A_M$, and suppose that $M$ is large enough that $\int_{A_M} f(z)\,dz > 0$. Set
\[
\zeta_n(A) = \frac{\hat N_{k,n}^A - \mathbb{E}\big\{\hat N_{k,n}^A\big\}}{(n^{k+1}r_n^{dk})^{1/2}},\qquad
\zeta_n = \frac{\hat N_{k,n} - \mathbb{E}\big\{\hat N_{k,n}\big\}}{(n^{k+1}r_n^{dk})^{1/2}}.
\]
To complete the proof we need to show that $\big|\mathbb{P}(\zeta_n\le t) - \Phi(t/\sqrt{\mu_k})\big|\to 0$, where $\Phi(\cdot)$ is the standard normal distribution function. Clearly, $\zeta_n = \zeta_n(A_M) + \zeta_n(\bar A_M)$, and from (3.6.14) we have that
\[
\zeta_n(A_M) \xrightarrow{\;L\;} \mathcal{N}(0,\mu_k(A_M)). \tag{3.6.15}
\]
For every $t\in\mathbb{R}$ and $M,\delta>0$ we have
\[
|\mathbb{P}(\zeta_n\le t) - \Phi(t/\sqrt{\mu_k})|
\le |\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)|
+ \big|\mathbb{P}(\zeta_n(A_M)\le t-\delta) - \Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big)\big|
+ \big|\Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big) - \Phi(t/\sqrt{\mu_k})\big|. \tag{3.6.16}
\]
Now,
\[
\mathbb{P}(\zeta_n\le t) = \mathbb{P}(\zeta_n(A_M)\le t-\delta,\ \zeta_n\le t) + \mathbb{P}(|\zeta_n(A_M)-t|<\delta,\ \zeta_n\le t) + \mathbb{P}(\zeta_n(A_M)\ge t+\delta,\ \zeta_n\le t).
\]
Note that the first term equals
\[
\mathbb{P}(\zeta_n(A_M)\le t-\delta) - \mathbb{P}(\zeta_n(A_M)\le t-\delta,\ \zeta_n > t).
\]
Thus,
\[
|\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)|
\le \mathbb{P}(\zeta_n(A_M)\le t-\delta,\ \zeta_n > t) + \mathbb{P}(|\zeta_n(A_M)-t|<\delta,\ \zeta_n\le t) + \mathbb{P}(\zeta_n(A_M)\ge t+\delta,\ \zeta_n\le t)
\le \mathbb{P}\big(|\zeta_n(\bar A_M)| > \delta\big) + \mathbb{P}(|\zeta_n(A_M)-t|<\delta).
\]
From Chebyshev's inequality we have that $\mathbb{P}\big(|\zeta_n(\bar A_M)|>\delta\big)\le\delta^{-2}\operatorname{Var}\big(\zeta_n(\bar A_M)\big)$. From (3.6.15), we have that
\[
\lim_{n\to\infty}\mathbb{P}(|\zeta_n(A_M)-t|<\delta) = \Phi\big((t+\delta)/\sqrt{\mu_k(A_M)}\big) - \Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big)
\le \frac{2\delta}{\sqrt{2\pi\mu_k(A_M)}}.
\]
Therefore,
\[
\limsup_{n\to\infty}|\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)| \le \frac{\mu_k(\bar A_M)}{\delta^2} + \frac{2\delta}{\sqrt{2\pi\mu_k(A_M)}}.
\]
For $\epsilon > 0$, choose $\delta = \epsilon\sqrt{\pi\mu_k}/4$. Since $\lim_{M\to\infty}\mu_k(A_M) = \mu_k$ and $\lim_{M\to\infty}\mu_k(\bar A_M) = 0$, there exists $M$ large enough that $\mu_k(A_M)\ge\mu_k/2$, $\mu_k(\bar A_M)\le\epsilon\delta^2/2$, and also $\big|\Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big) - \Phi(t/\sqrt{\mu_k})\big| < 2\epsilon$. For this choice of $\delta, M$, using the last displayed inequality, we have
\[
\limsup_{n\to\infty}|\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)| \le \epsilon.
\]
Finally, returning to (3.6.16), there exists $N>0$ such that for every $n>N$,
\[
|\mathbb{P}(\zeta_n\le t) - \Phi(t/\sqrt{\mu_k})| < 4\epsilon.
\]
This completes the proof.
CLT for the Random Sample Case

We shall now return from the Poisson case to the random sample one. Our argument will be based on the De-Poissonization of Theorem 3.C.1.

Proof of Theorems 3.3.2 and 3.3.3 ($N_{k,n}$). Let $D_{m,n}$ denote the increment
\[
D_{m,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_{m+1}} g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) - \sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m).
\]
In other words, $D_{m,n}$ is the change in the number of critical points as we add a new point to our fixed-size set. Let $\gamma$ be an arbitrary number in $(1/2,1)$. We wish to apply Theorem 3.C.1, with $H_n(\mathcal{P}_n) = (nr_n^d)^{-k/2}\hat N_{k,n}$ and $\alpha = 0$. Thus, we need to prove the following:
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m\le n+n^\gamma}\big|(nr_n^d)^{-k/2}\,\mathbb{E}\{D_{m,n}\}\big| = 0, \tag{3.6.17}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\big|(nr_n^d)^{-k}\,\mathbb{E}\{D_{m,n}D_{m',n}\}\big| = 0, \tag{3.6.18}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m\le n+n^\gamma} n^{-1/2}(nr_n^d)^{-k}\,\mathbb{E}\{D_{m,n}^2\} = 0. \tag{3.6.19}
\]
Considering only the cases where $g_{r_n}(\mathcal{Y},\mathcal{X}_m)\ne g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})$, we can write
\[
D_{m,n} = D^+_{m,n} - D^-_{m,n},
\]
where
\[
D^+_{m,n} = \#\{\mathcal{Y}\subset\mathcal{X}_{m+1} : g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) = 1 \text{ and } X_{m+1}\in\mathcal{Y}\},
\]
\[
D^-_{m,n} = \#\{\mathcal{Y}\subset\mathcal{X}_m : g_{r_n}(\mathcal{Y},\mathcal{X}_m) = 1 \text{ and } X_{m+1}\in B(\mathcal{Y})\}.
\]
In other words, $D^+_{m,n}$ counts the critical points added when we move from $\mathcal{X}_m$ to $\mathcal{X}_{m+1}$, and $D^-_{m,n}$ counts those that disappear. We now prove (3.6.17)–(3.6.19), starting with (3.6.17). Note that
\[
|\mathbb{E}\{D_{m,n}\}| \le \mathbb{E}\{D^+_{m,n}\} + \mathbb{E}\{D^-_{m,n}\}.
\]
We shall show that the supremum of each of the terms goes to zero. From the definition of $D^+_{m,n}$ we have that
\[
\mathbb{E}\{D^+_{m,n}\} = \binom{m}{k}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\} \le \binom{n+n^\gamma}{k}\mathbb{E}\{h_{r_n}(\mathcal{Y})\},
\]
where we define $\binom{x}{k} \triangleq \binom{\lfloor x\rfloor}{k}$ if $x$ is non-integer. Thus, using Lemma 3.6.2,
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m\le n+n^\gamma}(nr_n^d)^{-k/2}\,\mathbb{E}\{D^+_{m,n}\}
\le \lim_{n\to\infty}(nr_n^d)^{k/2}\bigg(n^{-k}\binom{n+n^\gamma}{k}\bigg)\big(r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}\big) = 0.
\]
From the definition of $D^-_{m,n}$ we have
\[
\mathbb{E}\{D^-_{m,n}\} = \binom{m}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_m)\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\}
\le \binom{n+n^\gamma}{k+1}\mathbb{E}\{h_{r_n}(\mathcal{Y})\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\}
\le c^\star n^{k+1}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\},
\]
for some constant $c^\star$. Now,
\[
\mathbb{E}\{\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\mid\mathcal{Y}\} = \int_{B(\mathcal{Y})} f(x)\,dx \le f_{\max}\,\omega_d\, r_n^d,
\]
which implies that $\mathbb{E}\{h_{r_n}(\mathcal{Y})\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\}\le c^\star r_n^d\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}$, and so
\[
\lim_{n\to\infty}\bigg(\sup_{n-n^\gamma\le m\le n+n^\gamma}(nr_n^d)^{-k/2}\,\mathbb{E}\{D^-_{m,n}\}\bigg)
\le \lim_{n\to\infty} c^\star (nr_n^d)^{k/2+1}\big(r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}\big) = 0.
\]
This proves (3.6.17). To prove (3.6.18) we need to show that
\[
\lim_{n\to\infty}\bigg(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\big|(nr_n^d)^{-k}\,\mathbb{E}\{D_{m,n}D_{m',n}\}\big|\bigg) = 0.
\]
Recall that $D_{m,n} = D^+_{m,n} - D^-_{m,n}$. Thus we can write $D_{m,n}D_{m',n}$ as a sum of four different product terms. We start by looking at the term $D^+_{m,n}D^+_{m',n}$. Recalling the definition of $D^+_{m,n}$, we can write
\[
D^+_{m,n} = \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}} g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) \le \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}} h_{r_n}(\mathcal{Y}).
\]
Thus,
\[
\mathbb{E}\{D^+_{m,n}D^+_{m',n}\} \le \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}}\ \sum_{\substack{\mathcal{Y}'\subset\mathcal{X}_{m'+1}\\ X_{m'+1}\in\mathcal{Y}'}}\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\}. \tag{3.6.20}
\]
Now, if $|\mathcal{Y}\cap\mathcal{Y}'| = j > 0$ and $h_{r_n}(\mathcal{Y})h_{r_n}(\mathcal{Y}') = 1$, then $\mathcal{Y}\cup\mathcal{Y}'$ must be contained in a ball of radius $2r_n$. This set contains $2k+2-j$ points, so that, by Lemma 3.6.1, we have
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\} \le c^\star r_n^{d(2k+1-j)}.
\]
If $\mathcal{Y}\cap\mathcal{Y}' = \emptyset$, then the two sets are disjoint and independent. Each consists of $k+1$ points and must be contained in a ball of radius $r_n$. Therefore,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\} = (\mathbb{E}\{h_{r_n}(\mathcal{Y})\})^2 \le c^\star r_n^{2dk}.
\]
Applying these bounds to (3.6.20) yields
\[
\mathbb{E}\{D^+_{m,n}D^+_{m',n}\}
\le c^\star\binom{m}{k}\bigg(\binom{m'-k-1}{k}r_n^{2dk} + \sum_{j=1}^{k}\binom{m'-k-1}{k-j}\binom{k+1}{j}r_n^{d(2k+1-j)}\bigg)
\le c^\star\binom{n+n^\gamma}{k}\bigg(\binom{n+n^\gamma}{k}r_n^{2dk} + \sum_{j=1}^{k}\binom{n+n^\gamma}{k-j}r_n^{d(2k+1-j)}\bigg)
\le c^\star\bigg(n^{2k}r_n^{2dk} + \sum_{j=1}^{k} n^{2k-j}r_n^{d(2k+1-j)}\bigg),
\]
where we emphasize that each appearance of $c^\star$ represents a different value. Multiplying by $(nr_n^d)^{-k}$ and taking the limit, we obtain
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m<m'\le n+n^\gamma}(nr_n^d)^{-k}\,\mathbb{E}\{D^+_{m,n}D^+_{m',n}\} = 0.
\]
To handle $D^-_{m,n}$, recall its definition and write
\[
D^-_{m,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m)\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})
\le \sum_{\mathcal{Y}\subset\mathcal{X}_m} h_{r_n}(\mathcal{Y})\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1}).
\]
Thus,
\[
\mathbb{E}\{D^-_{m,n}D^-_{m',n}\} \le \sum_{\mathcal{Y}\subset\mathcal{X}_m}\sum_{\mathcal{Y}'\subset\mathcal{X}_{m'}}\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\}. \tag{3.6.21}
\]
If $X_{m+1}\in\mathcal{Y}'$ and $|\mathcal{Y}\cap\mathcal{Y}'| = j\ge 0$, then $\mathcal{Y}\cup\mathcal{Y}'\cup\{X_{m+1},X_{m'+1}\}$ consists of $2k+3-j$ points, and for the expression inside the expectation to be nonzero all the points must be contained in a ball of radius $2r_n$. Thus, by Lemma 3.6.1,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\} \le c^\star r_n^{d(2k+2-j)}.
\]
If $X_{m+1}\notin\mathcal{Y}'$ and $|\mathcal{Y}\cap\mathcal{Y}'| = j > 0$, then the set $\mathcal{Y}\cup\mathcal{Y}'\cup\{X_{m+1},X_{m'+1}\}$ consists of $2k+4-j$ points, and therefore,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\} \le c^\star r_n^{d(2k+3-j)} \le c^\star r_n^{d(2k+2-j)}.
\]
If $j = 0$, however, then the sets $\mathcal{Y}\cup\{X_{m+1}\}$ and $\mathcal{Y}'\cup\{X_{m'+1}\}$ are disjoint and independent, each containing $k+2$ points. In addition, we need each of these sets to be contained in a ball of radius $r_n$. Therefore,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\} \le c^\star r_n^{d(2k+2)}.
\]
Substituting the above into (3.6.21) we have
\[
\mathbb{E}\{D^-_{m,n}D^-_{m',n}\}
\le c^\star\sum_{j=0}^{k}\binom{m}{k+1}\binom{m'-k-2}{k-j}\binom{k+1}{j}r_n^{d(2k+2-j)}
+ c^\star\sum_{j=0}^{k+1}\binom{m}{k+1}\binom{m'-k-2}{k+1-j}\binom{k+1}{j}r_n^{d(2k+2-j)}
\]
\[
\le c^\star\sum_{j=0}^{k}\binom{n+n^\gamma}{k+1}\binom{n+n^\gamma}{k-j}r_n^{d(2k+2-j)}
+ c^\star\sum_{j=0}^{k+1}\binom{n+n^\gamma}{k+1}\binom{n+n^\gamma}{k+1-j}r_n^{d(2k+2-j)}
\le c^\star\sum_{j=0}^{k} n^{2k+1-j}r_n^{d(2k+2-j)} + c^\star\sum_{j=0}^{k+1} n^{2k+2-j}r_n^{d(2k+2-j)}.
\]
From the above we can conclude that
\[
\lim_{n\to\infty}\bigg(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}(nr_n^d)^{-k}\,\mathbb{E}\{D^-_{m,n}D^-_{m',n}\}\bigg) = 0.
\]
We shall stop the computations here. The convergence of the cross-products (i.e. $D^+_{m,n}D^-_{m',n}$ and $D^-_{m,n}D^+_{m',n}$) can be shown using similar techniques, and these prove (3.6.18). The proof of (3.6.19) is also very similar.
Finally, the last condition in Theorem 3.C.1 requires that
\[
H_n(\mathcal{X}_m) = (nr_n^d)^{-k/2}\sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m) \le \beta(n+m)^\beta,
\]
for some $\beta > 0$. Using the facts that $\sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m)\le\binom{m}{k+1}$, $nr_n^d\to 0$, and $n^{k+1}r_n^{dk}\to\infty$, we have
\[
H_n(\mathcal{X}_m) \le c^\star(nr_n^d)^{-k/2}\, m^{k+1} \le c^\star m^{k+1}\, n\, (nr_n^d)^{k/2}(n^{k+1}r_n^{dk})^{-1} \le c^\star(n+m)^{k+2}.
\]
Thus, taking $\beta = \max(c^\star, k+2)$ completes the De-Poissonization proof. Consequently, we have that both Theorem 3.3.2 and Theorem 3.3.3 hold for the random sample case as well.
3.6.4 The Critical and Supercritical Ranges ($nr_n^d \to \lambda \in (0,\infty]$)

We start with the expectation computations. The following standard lemma is going to play a key role in the supercritical regime.
Lemma 3.6.6. Let $D\subset\mathbb{R}^d$ be a compact convex set with positive Lebesgue measure, and let $B_r(x)\subset\mathbb{R}^d$ be the ball of radius $r$ around $x$. Then there exists a constant $c^\star$ such that for every $r < \operatorname{diam}(D)$ and $x\in D$,
\[
\operatorname{Vol}(B_r(x)\cap D) \ge c^\star r^d.
\]
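For intuition, take $D=[0,1]^2$ and $r$ small: the worst case is a corner of the square, where $B_r(x)\cap D$ still contains a quarter-disk, so the lemma holds with $c^\star = \pi/4$ there. The Python sketch below (an illustration only, not part of the thesis) estimates the intersection volume by Monte Carlo at a corner, an edge, and the center:

```python
import math, random

random.seed(0)

def intersect_vol_over_r2(cx, cy, r, samples=100_000):
    # Monte Carlo estimate of Vol(B_r((cx,cy)) ∩ [0,1]^2) / r^2:
    # sample the bounding box of the ball, keep points in both sets.
    hits = 0
    for _ in range(samples):
        u = cx + r * (2 * random.random() - 1)
        v = cy + r * (2 * random.random() - 1)
        if (u - cx) ** 2 + (v - cy) ** 2 <= r * r and 0 <= u <= 1 and 0 <= v <= 1:
            hits += 1
    return 4 * hits / samples  # bounding-box area is (2r)^2 = 4 r^2

# D = [0,1]^2, r = 0.2: even at a corner the quarter-disk survives,
# so Vol(B_r(x) ∩ D) >= (pi/4) r^2 for every x in D.
for cx, cy in [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]:
    assert intersect_vol_over_r2(cx, cy, 0.2) >= math.pi / 4 - 0.05
```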
The following lemma is analogous to Lemma 3.6.2.

Lemma 3.6.7. Let $\mathcal{Y}\subset\mathcal{X}_n$ be a set of $k+1$ random variables from $\mathcal{X}_n$, and assume that $\mathcal{Y}$ is independent of the Poisson process $\mathcal{P}_n$. Then,
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\} = (k+1)!\,\gamma_k(\lambda).
\]
Proof. We shall give the full proof for the Poisson case ($g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)$); the proof for the random sample case is similar. Setting $s_n = n^{-1/d}$ and mimicking the proof of Lemma 3.6.2, we obtain
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\, e^{-np(\mathbf{x})}\,d\mathbf{x}
= s_n^{dk}\int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k} f(x)\, f(x+s_n\mathbf{y})\, h_{r_n}(x,x+s_n\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\,d\mathbf{y}\,dx
= n^{-k}\int_{\mathbb{R}^d} f(x)\int_{(\mathbb{R}^d)^k} f(x+s_n\mathbf{y})\, h_{\tau_n}(0,\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\,d\mathbf{y}\,dx, \tag{3.6.22}
\]
where $\tau_n = r_n/s_n = n^{1/d}r_n$. We wish to apply the dominated convergence theorem to the last integral; thus, we need to bound the integrand by an integrable expression. In the critical range this is done much as in the subcritical range. Since $nr_n^d\to\lambda<\infty$, we have that $\tau_n$ is bounded by some value $M$. Now, for $h_{\tau_n}(0,\mathbf{y})$ to be nonzero, all the elements $y_1,\ldots,y_k\in\mathbb{R}^d$ must lie inside $B_{2\tau_n}(0)\subset B_{2M}(0)$. Therefore,
\[
\big|f(x+s_n\mathbf{y})\, h_{\tau_n}(0,\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\big| \le f_{\max}^k\,\mathbf{1}_{B_{2M}(0)}(y_1)\cdots\mathbf{1}_{B_{2M}(0)}(y_k),
\]
and this expression is integrable.
The last argument cannot be applied in the supercritical range, since $\tau_n$ is no longer bounded. This is where we use our additional assumption that $f$ is bounded below on its support. Since we now have $f_{\min}>0$, we also have
\[
p(\mathbf{x}) = \int_{B(\mathbf{x})} f(z)\,dz \ge f_{\min}\operatorname{Vol}(B(\mathbf{x})\cap\operatorname{supp}(f)).
\]
If $h_{r_n}(\mathbf{x})\ne 0$, then necessarily $C(\mathbf{x})\in\operatorname{conv}^\circ(\mathbf{x})$ and $R(\mathbf{x})\le r_n$ (cf. (3.2.7)). In addition, if $f(\mathbf{x})\ne 0$, then $\mathbf{x}\subset\operatorname{supp}(f)$. Since we assume that $\operatorname{supp}(f)$ is convex, we have that $C(\mathbf{x})\in\operatorname{supp}(f)$ as well. Thus, $B(\mathbf{x})$ is a ball centered at $C(\mathbf{x})\in\operatorname{supp}(f)$ with radius $R(\mathbf{x})$ small enough, and Lemma 3.6.6 yields
\[
\operatorname{Vol}(B(\mathbf{x})\cap\operatorname{supp}(f)) \ge c^\star R^d(\mathbf{x}).
\]
This can be used to bound the integrand in (3.6.22), so that
\[
\big|f(x+s_n\mathbf{y})\, h_{\tau_n}(0,\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\big|
\le f_{\max}^k\, e^{-nf_{\min}c^\star R^d(x,x+s_n\mathbf{y})}
= f_{\max}^k\, e^{-f_{\min}c^\star R^d(0,\mathbf{y})}. \tag{3.6.23}
\]
Next, note that for $i=1,\ldots,k$, $R(0,\mathbf{y})\ge\|y_i\|/2$. Thus,
\[
R^d(0,\mathbf{y}) \ge \frac{1}{2^d k}\sum_{j=1}^{k}\|y_j\|^d,
\]
which implies that the expression in (3.6.23) is indeed integrable, and so the DCT can be safely applied in both regimes.
Next, we compute the limit of the integral in (3.6.22). Note first that
\[
np(x,x+s_n\mathbf{y}) = n\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz
= n\operatorname{Vol}(B(x,x+s_n\mathbf{y}))\,\frac{\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+s_n\mathbf{y}))}
= n\,\omega_d(s_n R(0,\mathbf{y}))^d\,\frac{\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+s_n\mathbf{y}))}
= \omega_d R^d(0,\mathbf{y})\,\frac{\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+s_n\mathbf{y}))},
\]
and using the Lebesgue differentiation theorem yields
\[
\lim_{n\to\infty} np(x,x+s_n\mathbf{y}) = \omega_d R^d(0,\mathbf{y})\, f(x).
\]
Taking the limit of all the other terms in (3.6.22) we have
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h_{\tau_\infty}(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y}) f(x)}\,d\mathbf{y}\,dx,
\]
where $\tau_\infty = \lim_{n\to\infty}\tau_n$. In the supercritical regime $\tau_\infty = \infty$, and consequently $h_{\tau_\infty} = h_\infty \equiv h$. Thus,
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y}) f(x)}\,d\mathbf{y}\,dx,
\]
and using the change of variables $y_i \to f^{-1/d}(x)\, y_i$ (where $f(x)>0$) we have
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^k} h(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y})}\,d\mathbf{y} = (k+1)!\,\gamma_k(\infty).
\]
In the critical range, $\tau_n \to \lambda^{1/d}$. Therefore,
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h_{\lambda^{1/d}}(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y}) f(x)}\,d\mathbf{y}\,dx
= \lambda^k\int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h_{\lambda^{1/d}}(0,\lambda^{1/d}\mathbf{z})\, e^{-\lambda\omega_d R^d(0,\mathbf{z}) f(x)}\,d\mathbf{z}\,dx = (k+1)!\,\gamma_k(\lambda).
\]
This completes the proof.
3.6.5 Asymptotic Means

Using Lemma 3.6.7 we can prove Theorem 3.3.5.

Proof of Theorem 3.3.5. For the random sample case we have
\[
\mathbb{E}\{N_{k,n}\} = \binom{n}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\},
\]
and, using Lemma 3.6.7,
\[
\lim_{n\to\infty} n^{-1}\,\mathbb{E}\{N_{k,n}\} = \lim_{n\to\infty}\bigg(n^{-(k+1)}\binom{n}{k+1}\bigg)\big(n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\}\big) = \gamma_k(\lambda).
\]
For the Poisson case, using Theorem 3.A.1,
\[
\mathbb{E}\{\hat N_{k,n}\} = \frac{n^{k+1}}{(k+1)!}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}',\mathcal{Y}'\cup\mathcal{P}_n)\},
\]
and, using Lemma 3.6.7,
\[
\lim_{n\to\infty} n^{-1}\,\mathbb{E}\{\hat N_{k,n}\} = \gamma_k(\lambda),
\]
which completes the proof.
3.6.6 Asymptotic Variance - Poisson Case

For the variance and CLT results, as in the subcritical phase, we shall first treat the Poisson case; then, using De-Poissonization, we shall turn to the random sample case.
Proof of Theorem 3.3.6 ($\hat N_{k,n}$ only). As in the proof of Theorem 3.3.2,
\[
\operatorname{Var}(\hat N_{k,n}) = \mathbb{E}\{\hat N_{k,n}\} + \sum_{j=1}^{k}\mathbb{E}\{\hat I_j\} + \Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big),
\]
where
\[
\hat I_j = \sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2| = j\}.
\]
From Corollary 3.A.2,
\[
\mathbb{E}\{\hat I_j\} = \frac{n^{2k+2-j}}{j!((k+1-j)!)^2}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j},
\]
where $\mathcal{Y}'_1,\mathcal{Y}'_2$ are sets of $k+1$ i.i.d. points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, such that $|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j$, and $\mathcal{Y}'_{12}=\mathcal{Y}'_1\cup\mathcal{Y}'_2$. For $0<j<k+1$, as in the proof of Lemma 3.6.7, one can show that
\[
\lim_{n\to\infty} n^{2k+1-j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j}
= \int_{\mathbb{R}^{d(2k+2-j)}} f^{2k+2-j}(x)\, h_{\tau_\infty}(0,\mathbf{y}_1\cup\mathbf{z})\, h_{\tau_\infty}(0,\mathbf{y}_2\cup\mathbf{z})\, e^{-\operatorname{Vol}(B(0,\mathbf{y}_1\cup\mathbf{z})\cup B(0,\mathbf{y}_2\cup\mathbf{z}))f(x)}\,dx\,d\mathbf{y}_1\,d\mathbf{y}_2\,d\mathbf{z},
\]
where $x\in\mathbb{R}^d$, $\mathbf{y}_i\in\mathbb{R}^{d(k+1-j)}$, $\mathbf{z}\in\mathbb{R}^{d(j-1)}$, and $\tau_\infty = \lim_{n\to\infty} n^{1/d}r_n$. Therefore,
\[
\lim_{n\to\infty} n^{-1}\,\mathbb{E}\{\hat I_j\} = \gamma_k^{(j)}(\lambda),
\]
where
\[
\gamma_k^{(j)}(\lambda) \triangleq \frac{\lambda^{2k+1-j}}{j!((k+1-j)!)^2}\int_{\mathbb{R}^{d(2k+2-j)}} f^{2k+2-j}(x)\, h_1(0,\mathbf{y}_1\cup\mathbf{z})\, h_1(0,\mathbf{y}_2\cup\mathbf{z})\, e^{-\lambda\operatorname{Vol}(B(0,\mathbf{y}_1\cup\mathbf{z})\cup B(0,\mathbf{y}_2\cup\mathbf{z}))f(x)}\,dx\,d\mathbf{y}_1\,d\mathbf{y}_2\,d\mathbf{z}
\]
for $\lambda\in(0,\infty)$, and
\[
\gamma_k^{(j)}(\infty) \triangleq \frac{1}{j!((k+1-j)!)^2}\int_{\mathbb{R}^{d(2k+2-j)}} f^{2k+2-j}(x)\, h(0,\mathbf{y}_1\cup\mathbf{z})\, h(0,\mathbf{y}_2\cup\mathbf{z})\, e^{-\operatorname{Vol}(B(0,\mathbf{y}_1\cup\mathbf{z})\cup B(0,\mathbf{y}_2\cup\mathbf{z}))f(x)}\,dx\,d\mathbf{y}_1\,d\mathbf{y}_2\,d\mathbf{z}.
\]
It is easy to show that $0<\gamma_k^{(j)}(\lambda)<\infty$ for $\lambda\in(0,\infty]$. For $j=0$, we define
\[
\Delta \triangleq g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n) - g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_2\cup\mathcal{P}'_n)
\]
so that
\[
\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2 = \frac{n^{2k+2}}{((k+1)!)^2}\,\mathbb{E}\{\Delta\}.
\]
Now set
\[
\Delta_1 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)\ne\emptyset\},\qquad \Delta_2 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)=\emptyset\}.
\]
Then, as in the proof of Theorem 3.3.2, we can show that $\mathbb{E}\{\Delta_2\}=0$, and
\[
\lim_{n\to\infty} n^{2k+1}\,\mathbb{E}\{\Delta_1\}
= \int_{\mathbb{R}^{d(2k+2)}} f^{2k+2}(x)\, h_{\tau_\infty}(0,\mathbf{y}_1)\, h_{\tau_\infty}(0,\mathbf{y}_2)\,\mathbf{1}\{B(0,\mathbf{y}_1)\cap B(z,z+\mathbf{y}_2)\ne\emptyset\}
\times\Big(e^{-\operatorname{Vol}(B(0,\mathbf{y}_1)\cup B(z,z+\mathbf{y}_2))f(x)} - e^{-\omega_d(R^d(0,\mathbf{y}_1)+R^d(0,\mathbf{y}_2))f(x)}\Big)\,dx\,dz\,d\mathbf{y}_1\,d\mathbf{y}_2,
\]
where $x,z\in\mathbb{R}^d$ and $\mathbf{y}_i\in(\mathbb{R}^d)^k$. Thus,
\[
\lim_{n\to\infty} n^{-1}\Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big) = \gamma_k^{(0)}(\lambda),
\]
where
\[
\gamma_k^{(0)}(\lambda) \triangleq \frac{\lambda^{2k+1}}{((k+1)!)^2}\int_{\mathbb{R}^{d(2k+2)}} f^{2k+2}(x)\, h_1(0,\mathbf{y}_1)\, h_1(0,\mathbf{y}_2)\,\mathbf{1}\{B(0,\mathbf{y}_1)\cap B(z,z+\mathbf{y}_2)\ne\emptyset\}
\times\Big(e^{-\lambda\operatorname{Vol}(B(0,\mathbf{y}_1)\cup B(z,z+\mathbf{y}_2))f(x)} - e^{-\lambda\omega_d(R^d(0,\mathbf{y}_1)+R^d(0,\mathbf{y}_2))f(x)}\Big)\,dx\,dz\,d\mathbf{y}_1\,d\mathbf{y}_2
\]
for $\lambda < \infty$, and
\[
\gamma_k^{(0)}(\infty) \triangleq \frac{1}{((k+1)!)^2}\int_{\mathbb{R}^{d(2k+2)}} f^{2k+2}(x)\, h(0,\mathbf{y}_1)\, h(0,\mathbf{y}_2)\,\mathbf{1}\{B(0,\mathbf{y}_1)\cap B(z,z+\mathbf{y}_2)\ne\emptyset\}
\times\Big(e^{-\operatorname{Vol}(B(0,\mathbf{y}_1)\cup B(z,z+\mathbf{y}_2))f(x)} - e^{-\omega_d(R^d(0,\mathbf{y}_1)+R^d(0,\mathbf{y}_2))f(x)}\Big)\,dx\,dz\,d\mathbf{y}_1\,d\mathbf{y}_2.
\]
To conclude, we have proven that
\[
\lim_{n\to\infty} n^{-1}\operatorname{Var}(\hat N_{k,n}) = \gamma_k(\lambda) + \sum_{j=0}^{k}\gamma_k^{(j)}(\lambda) \triangleq \sigma_k^2(\lambda) \in (0,\infty), \tag{3.6.24}
\]
as required.
3.6.7 CLT - Poisson Case

Next, we prove the CLT result in Theorem 3.3.7, again using Stein's method, as in the proof of Theorem 3.3.3.

Proof of Theorem 3.3.7 ($\hat N_{k,n}$ only). We start again by counting only critical points located in a compact set $A\subset\mathbb{R}^d$, with $\int_A f(x)\,dx > 0$. We define $Q_{i,n}$, $\hat N_{k,n}^{(i)}$, $\hat N_{k,n}^A$, $g_{r_n}^{(i)}$, $(I_A,\sim)$ and $\xi_i$ the same way as in the proof of Theorem 3.3.3. Then, as in the proof of Theorem 3.3.6, one can show that
\[
\lim_{n\to\infty} n^{-1}\operatorname{Var}\big(\hat N_{k,n}^A\big) \in (0,\infty). \tag{3.6.25}
\]
According to Theorem 3.B.3, in order to prove a CLT for $\hat N_{k,n}^A$, we need to find bounds for $\mathbb{E}\{|\xi_i|^p\}$, $p=3,4$. We start with $p=3$:
\[
\mathbb{E}\Big\{\big(\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big)^3\Big\}
= \sum_{j=0}^{3}\binom{3}{j}(-1)^{3-j}\big(\mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big)^{3-j}\,\mathbb{E}\Big\{\big(\hat N_{k,n}^{(i)}\big)^j\Big\}.
\]
The computation of the bound here is similar in spirit to the ones we used in the proof of Theorem 3.3.2, but technically more complicated, and we shall not give details. Rather, we shall suffice with a brief description of the main ideas: every element in the sum can be expressed as the expectation of a triple sum of the form
\[
\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n^{(1)}}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n^{(2)}}\sum_{\mathcal{Y}_3\subset\mathcal{P}_n^{(3)}} g_{r_n}^{(i)}(\mathcal{Y}_1,\mathcal{P}_n^{(1)})\,g_{r_n}^{(i)}(\mathcal{Y}_2,\mathcal{P}_n^{(2)})\,g_{r_n}^{(i)}(\mathcal{Y}_3,\mathcal{P}_n^{(3)})\bigg\}, \tag{3.6.26}
\]
where each of the Poisson processes can either be equal to one of the others or be an independent copy, depending on $j$. As for $\mathbb{E}\{\Delta_2\}$ in the proof of Theorem 3.3.6, we can use Palm theory, collect all the terms in which at least one of the balls $B(\mathcal{Y}_i)$ is disjoint from the others, and show that they cancel each other. For each of the remaining terms, we can show that if $|\mathcal{Y}_1\cup\mathcal{Y}_2\cup\mathcal{Y}_3| = 3k+3-j$, with $0\le j\le 3k+3$, then the relevant part of the sum in (3.6.26) is bounded by $c^\star n^{3k+3-j}s_n^{d(3k+2-j)}r_n^d = c^\star nr_n^d$. This bound is achieved using integral evaluations similar to those used in the proof of Theorem 3.3.6, along with the fact that all the points are located within distance $r_n$ of the cube $Q_{i,n}$. Thus, we have
\[
\mathbb{E}\Big\{\big(\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big)^3\Big\} \le c^\star nr_n^d.
\]
Recall that $|I_A| \le c^\star r_n^{-d}$. Therefore,
\[
\sum_{i\in I_A} \mathbb{E}\left\{|\xi_i|^3\right\} \le \frac{c^\star r_n^{-d}\, n r_n^d}{\left(\operatorname{Var}\left(N^A_{k,n}\right)\right)^{3/2}} = \frac{c^\star n}{n^{3/2}\left(n^{-1}\operatorname{Var}\left(N^A_{k,n}\right)\right)^{3/2}} \to 0.
\]
The proof for $p = 4$ is similar, and from Theorem 3.B.3 we have that
\[
\frac{N^A_{k,n} - \mathbb{E}\left\{N^A_{k,n}\right\}}{\left(\operatorname{Var}\left(N^A_{k,n}\right)\right)^{1/2}} \xrightarrow{L} \mathcal{N}(0,1).
\]
To conclude the proof, we need to show that the CLT for $N^A_{k,n}$ implies a CLT for $N_{k,n}$. This is done exactly as for Part 3 of Theorem 3.3.3.
3.6.8 CLT - Random Sample Case
To complete the proof of Theorems 3.3.6 and 3.3.7, we need to show that the same limit
results apply to the random sample case as well. While we again rely on De-Poissonization,
it is worth noting that, as opposed to the subcritical range, here the limiting variances
are different in the Poisson and random sample cases. We start by defining
\[
\eta_k(\lambda) \triangleq \frac{\lambda^{k+1}}{(k+1)!} \int_{(\mathbb{R}^d)^{k+2}} f^{k+2}(x)\, h(0,y)\, \mathbb{1}_{B(0,y)}(z)\, e^{-\lambda\omega_d R^d(0,y)\, f(x)}\, dx\, dy\, dz,
\]
\[
\eta_k(\infty) \triangleq \frac{1}{(k+1)!} \int_{(\mathbb{R}^d)^{k+2}} f^{k+2}(x)\, h(0,y)\, \mathbb{1}_{B(0,y)}(z)\, e^{-\omega_d R^d(0,y)\, f(x)}\, dx\, dy\, dz,
\]
where $\lambda < \infty$, $x \in \mathbb{R}^d$, $y \in (\mathbb{R}^d)^k$, and $z \in \mathbb{R}^d$.

Proof of Theorems 3.3.6 and 3.3.7 ($N_{k,n}$). Let $D_{m,n}$ denote the increment
\[
D_{m,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_{m+1}} g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) - \sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m).
\]
Let $\gamma$ be an arbitrary number in $(1/2,1)$. We wish to apply Theorem 3.C.1, with $H_n(\mathcal{P}_n) = N_{k,n}$ and $\alpha = \alpha_k(\lambda) \triangleq (k+1)\gamma_k(\lambda) - \eta_k(\lambda)$. Thus, we need to prove:
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma \le m \le n+n^\gamma} \left|\mathbb{E}\{D_{m,n}\} - \alpha_k(\lambda)\right| = 0, \tag{3.6.27}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma \le m < m' \le n+n^\gamma} \left|\mathbb{E}\{D_{m,n}D_{m',n}\} - \alpha_k^2(\lambda)\right| = 0, \tag{3.6.28}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma \le m \le n+n^\gamma} n^{-1/2}\,\mathbb{E}\left\{D_{m,n}^2\right\} = 0. \tag{3.6.29}
\]
As in the proof of Theorem 3.3.3, write $D_{m,n} = D^+_{m,n} - D^-_{m,n}$, where
\[
D^+_{m,n} = \#\left\{\mathcal{Y}\subset\mathcal{X}_{m+1} : g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) = 1 \text{ and } X_{m+1}\in\mathcal{Y}\right\},
\]
\[
D^-_{m,n} = \#\left\{\mathcal{Y}\subset\mathcal{X}_m : g_{r_n}(\mathcal{Y},\mathcal{X}_m) = 1 \text{ and } X_{m+1}\in B(\mathcal{Y})\right\}.
\]
From the definition of $D^+_{m,n}$ we have that
\[
\mathbb{E}\left\{D^+_{m,n}\right\} = \binom{m}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\}.
\]
Thus,
\[
\binom{n-n^\gamma}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\} \le \mathbb{E}\left\{D^+_{m,n}\right\} \le \binom{n+n^\gamma}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\}.
\]
As in the proof of Lemma 3.6.7, since $\gamma\in(1/2,1)$ it is easy to show that
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n\pm n^\gamma})\} = (k+1)!\,\gamma_k(\lambda),
\]
and since $\lim_{n\to\infty} n^{-k}\binom{n\pm n^\gamma}{k} = 1/k!$, we have
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma}\left|\mathbb{E}\left\{D^+_{m,n}\right\} - (k+1)\gamma_k(\lambda)\right|\right) = 0.
\]
Next, from the definition of $D^-_{m,n}$ we have
\[
\mathbb{E}\left\{D^-_{m,n}\right\} = \binom{m}{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_m)\,\mathbb{1}_{B(\mathcal{Y})}(X_{m+1})\right\}.
\]
Note that if $X$ is a random variable in $\mathbb{R}^d$ with density $f$, independent of $\mathcal{X}_n$, then we can replace $X_{m+1}$ with $X$ in the last equality. Thus, we have
\[
\mathbb{E}\left\{D^-_{m,n}\right\} \ge \binom{n-n^\gamma}{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\,\mathbb{1}_{B(\mathcal{Y})}(X)\right\},
\]
\[
\mathbb{E}\left\{D^-_{m,n}\right\} \le \binom{n+n^\gamma}{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\,\mathbb{1}_{B(\mathcal{Y})}(X)\right\}.
\]
In addition, it is easy to show that
\[
\lim_{n\to\infty} n^{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n\pm n^\gamma})\,\mathbb{1}_{B(\mathcal{Y})}(X)\right\} = (k+1)!\,\eta_k(\lambda).
\]
Thus,
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma}\left|\mathbb{E}\left\{D^-_{m,n}\right\} - \eta_k(\lambda)\right|\right) = 0.
\]
Finally, since $\left|\mathbb{E}\{D_{m,n}\} - \alpha_k(\lambda)\right| \le \left|\mathbb{E}\{D^+_{m,n}\} - (k+1)\gamma_k(\lambda)\right| + \left|\mathbb{E}\{D^-_{m,n}\} - \eta_k(\lambda)\right|$, we conclude that (3.6.27) holds.

To prove (3.6.28) we need to show that
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\{D_{m,n}D_{m',n}\} - \alpha_k^2(\lambda)\right|\right) = 0.
\]
Recall that $D_{m,n} = D^+_{m,n} - D^-_{m,n}$, and so we can write the product $D_{m,n}D_{m',n}$ as a sum of four different products. We start with $D^+_{m,n}D^+_{m',n}$:
\[
\mathbb{E}\left\{D^+_{m,n}D^+_{m',n}\right\} = \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}}\ \sum_{\substack{\mathcal{Y}'\subset\mathcal{X}_{m'+1}\\ X_{m'+1}\in\mathcal{Y}'}} \mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\}. \tag{3.6.30}
\]
Now, if $|\mathcal{Y}\cap\mathcal{Y}'| = j > 0$, then it is easy to show that
\[
\lim_{n\to\infty} n^{2k-j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} = 0.
\]
Therefore, the relevant part of the sum in (3.6.30) satisfies
\[
\binom{m}{k}\binom{m'-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \ge \binom{n-n^\gamma}{k}\binom{n-n^\gamma-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \to 0,
\]
and
\[
\binom{m}{k}\binom{m'-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \le \binom{n+n^\gamma}{k}\binom{n+n^\gamma-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \to 0.
\]
If $\mathcal{Y}\cap\mathcal{Y}' = \emptyset$, then
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \ge \mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n+n^\gamma})\},
\]
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \le \mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n-n^\gamma})\},
\]
and it is easy to show that
\[
\lim_{n\to\infty} n^{2k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n\pm n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n\pm n^\gamma})\} = \left((k+1)!\,\gamma_k(\lambda)\right)^2.
\]
Therefore, the relevant part of the sum in (3.6.30) satisfies
\[
\binom{m}{k}\binom{m'-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \ge \binom{n-n^\gamma}{k}\binom{n-n^\gamma-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n+n^\gamma})\} \to \left((k+1)\gamma_k(\lambda)\right)^2,
\]
and
\[
\binom{m}{k}\binom{m'-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \le \binom{n+n^\gamma}{k}\binom{n+n^\gamma-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n-n^\gamma})\} \to \left((k+1)\gamma_k(\lambda)\right)^2.
\]
Thus,
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^+_{m,n}D^+_{m',n}\right\} - \left((k+1)\gamma_k(\lambda)\right)^2\right|\right) = 0.
\]
Similarly, we can show that
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^-_{m,n}D^-_{m',n}\right\} - \left(\eta_k(\lambda)\right)^2\right|\right) = 0,
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^-_{m,n}D^+_{m',n}\right\} - (k+1)\gamma_k(\lambda)\,\eta_k(\lambda)\right|\right) = 0,
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^+_{m,n}D^-_{m',n}\right\} - (k+1)\gamma_k(\lambda)\,\eta_k(\lambda)\right|\right) = 0.
\]
Combining all these limits together shows that (3.6.28) holds. Finally, similar computations yield (3.6.29).
For the last condition in Theorem 3.C.1, note that
\[
H_n(\mathcal{X}_m) \le \binom{m}{k+1} \le c^\star m^{k+1} \le c^\star (n+m)^{k+1}.
\]
Thus, taking $\beta = \max(c^\star, k+1)$ completes the De-Poissonization proof, and from Theorem 3.C.1 we conclude that $\alpha_k^2(\lambda) \le \sigma_k^2(\lambda)$, that
\[
\lim_{n\to\infty} n^{-1}\operatorname{Var}(N_{k,n}) = \sigma_k^2(\lambda) - \alpha_k^2(\lambda), \tag{3.6.31}
\]
and that
\[
\frac{N_{k,n} - \mathbb{E}\{N_{k,n}\}}{\sqrt{n}} \xrightarrow{L} \mathcal{N}\left(0,\ \sigma_k^2(\lambda) - \alpha_k^2(\lambda)\right),
\]
which completes the proof of Theorem 3.3.7, as promised.
The only remaining results in Section 3.3 that still require proofs relate to the global number of critical points, $N^G_{k,n}$.

Proof of Theorem 3.3.8. This theorem is proved exactly the same way as Theorems 3.3.5, 3.3.6, and 3.3.7 are proved in the super-critical phase. The only difference is that, throughout, $h(x)$ replaces $h_{\tau_n}(x)$. This, however, does not affect any of the results, since in the limit $h_{\tau_n}(x) \to h(x)$.
Proof of Proposition 3.3.9. We prove the proposition for the Poisson case. The random
sample case is similar.
\[
\mathbb{E}\left\{N^G_{k,n} - N_{k,n}\right\} = \frac{n^{k+1}}{(k+1)!}\, n^{-k} \int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+s_n y)\left(h(0,y) - h_{\tau_n}(0,y)\right) e^{-np(x,\,x+s_n y)}\, dy\, dx
= \frac{1}{(k+1)!} \int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+s_n y)\left(h(0,y) - h_{\tau_n}(0,y)\right) n\, e^{-np(x,\,x+s_n y)}\, dy\, dx. \tag{3.6.32}
\]
As in the proof of Theorem 3.3.5 (cf. (3.6.23)), we can show that the integrand is bounded by
\[
f(x)\, f_{\max}^k \left(h(0,y) - h_{\tau_n}(0,y)\right) n\, e^{-f_{\min} c^\star R^d(0,y)}. \tag{3.6.33}
\]
Now note that if the integrand is nonzero then $h \ne h_{\tau_n}$, and so $R(0,y) > \tau_n$. Therefore, $R^d(0,y) > \tfrac{1}{2}\left(R^d(0,y) + nr_n^d\right)$, and (3.6.33) can be replaced by
\[
f(x)\, f_{\max}^k \left(h(0,y) - h_{\tau_n}(0,y)\right) e^{-f_{\min} c^\star R^d(0,y)/2}\, n\, e^{-f_{\min} c^\star n r_n^d/2}. \tag{3.6.34}
\]
Assuming that $nr_n^d \ge D^\star \log n$, with $D^\star = (f_{\min} c^\star/2)^{-1}$, we have $n\, e^{-f_{\min} c^\star n r_n^d/2} \le 1$, and we obtain an integrable bound for the integrand. Thus, we can apply the DCT to (3.6.32). Finally, note that the bound we found in (3.6.34) converges to zero (since $h_{\tau_n} \to h$), so we are done.
3.6.9 Euler Characteristic Results
In this section we prove Corollary 3.4.2.
Proof of Corollary 3.4.2. Recall that $\chi_n \triangleq \chi(C(\mathcal{X}_n, r_n))$ and $\hat\chi_n \triangleq \chi(C(\mathcal{P}_n, r_n))$, where hats denote the Poisson-case quantities. Morse theory provides an alternative way to compute the Euler characteristic, via the number of critical points. Specifically, in our case we have
\[
\chi_n = \sum_{k=0}^{d} (-1)^k N_{k,n}, \qquad \hat\chi_n = \sum_{k=0}^{d} (-1)^k \hat N_{k,n}.
\]
First note that $N_{0,n} = n$ in the random sample case, and $\mathbb{E}\{\hat N_{0,n}\} = n$ in the Poisson case. Therefore,
\[
\mathbb{E}\{\chi_n\} = n + \sum_{k=1}^{d} (-1)^k\,\mathbb{E}\{N_{k,n}\}, \qquad \mathbb{E}\{\hat\chi_n\} = n + \sum_{k=1}^{d} (-1)^k\,\mathbb{E}\{\hat N_{k,n}\}.
\]
The first two cases of the theorem are now obvious consequences of Theorems 3.3.1 and 3.3.5. For the third case, using Theorem 3.3.8, we have
\[
\lim_{n\to\infty} n^{-1}\chi_n = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{d} (-1)^k N^G_{k,n}.
\]
However, since $N^G_{k,n}$ counts all the critical points in $\mathbb{R}^d$, Morse theory implies
\[
\sum_{k=0}^{d} (-1)^k N^G_{k,n} = \chi(\mathbb{R}^d) = 1,
\]
and we can conclude that $\lim_{n\to\infty} n^{-1}\chi_n = 0$.

If, in addition, $r_n^d$ satisfies the conditions of Proposition 3.3.9 (i.e. $nr_n^d \ge D^\star \log n$), then
\[
0 = \lim_{n\to\infty} \sum_{k=0}^{d} (-1)^k\, \mathbb{E}\left\{N^G_{k,n} - N_{k,n}\right\} = 1 - \lim_{n\to\infty} \mathbb{E}\{\chi_n\},
\]
which implies that $\mathbb{E}\{\chi_n\} \to 1$.
3.A Palm Theory for Poisson Processes
This appendix contains a collection of definitions and theorems which are used in the
proofs of this paper. Most of the results are cited from [39], although they may not
necessarily have originated there. However, for notational reasons we refer the reader
to [39], while other resources include [5, 42]. The following theorem is very useful when
computing expectations related to Poisson processes.
Theorem 3.A.1 (Palm theory for Poisson processes, [39, Theorem 1.6]). Let $f$ be a probability density on $\mathbb{R}^d$, and let $\mathcal{P}_n$ be a Poisson process on $\mathbb{R}^d$ with intensity $\lambda_n = nf$. Let $h(\mathcal{Y},\mathcal{X})$ be a measurable function defined for all finite subsets $\mathcal{Y}\subset\mathcal{X}\subset\mathbb{R}^d$ with $|\mathcal{Y}| = k$. Then
\[
\mathbb{E}\left\{\sum_{\mathcal{Y}\subset\mathcal{P}_n} h(\mathcal{Y},\mathcal{P}_n)\right\} = \frac{n^k}{k!}\,\mathbb{E}\{h(\mathcal{Y}',\mathcal{Y}'\cup\mathcal{P}_n)\},
\]
where $\mathcal{Y}'$ is a set of $k$ iid points in $\mathbb{R}^d$ with density $f$, independent of $\mathcal{P}_n$.
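As an illustrative sanity check (our own addition, not part of the cited text), the Palm formula can be verified numerically in the simplest case $h \equiv 1$: the left-hand side then counts the $k$-element subsets of $\mathcal{P}_n$, and the theorem reduces to the $k$-th factorial moment of a Poisson variable, $\mathbb{E}\{\binom{|\mathcal{P}_n|}{k}\} = n^k/k!$. The sketch below (plain Python; function names are ours) samples only the Poisson number of points, since the spatial locations are irrelevant when $h \equiv 1$:

```python
import math
import random

def poisson_sample(rng, lam):
    """Sample a Poisson(lam) variable via Knuth's multiplicative method
    (adequate for moderate lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def subset_count_moment(n=50.0, k=3, trials=20000, seed=1):
    """Monte Carlo average of the number of k-point subsets of P_n,
    i.e. the left-hand side of Theorem 3.A.1 with h == 1."""
    rng = random.Random(seed)
    total = sum(math.comb(poisson_sample(rng, n), k) for _ in range(trials))
    return total / trials

# Palm formula with h == 1:  E{ sum_{Y subset P_n} 1 } = n^k / k!
est = subset_count_moment()
exact = 50.0**3 / math.factorial(3)
assert abs(est / exact - 1) < 0.05
```

With $n = 50$ and $k = 3$ the Monte Carlo average should land within a few percent of $n^k/k! \approx 20833$.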
We shall also need the following corollary, which treats second moments:
Corollary 3.A.2. With the notation above, assuming $|\mathcal{Y}_1| = |\mathcal{Y}_2| = k$,
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}_1,\mathcal{Y}_2\subset\mathcal{P}_n\\ |\mathcal{Y}_1\cap\mathcal{Y}_2|=j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\right\} = \frac{n^{2k-j}}{j!\,((k-j)!)^2}\,\mathbb{E}\left\{h(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\, h(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\right\},
\]
where $\mathcal{Y}'_{12} = \mathcal{Y}'_1\cup\mathcal{Y}'_2$ is a set of $2k-j$ iid points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, and $|\mathcal{Y}'_1\cap\mathcal{Y}'_2| = j$.
Proof. Given $|\mathcal{P}_n| = m$, the sum on the LHS is finite. Therefore,
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}_1,\mathcal{Y}_2\subset\mathcal{P}_n\\ |\mathcal{Y}_1\cap\mathcal{Y}_2|=j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\Big|\, |\mathcal{P}_n| = m\right\} = \binom{m}{2k-j}\binom{2k-j}{k}\binom{k}{j}\,\mathbb{E}\left\{h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\big|\, |\mathcal{P}_n| = m\right\}_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j}. \tag{3.A.1}
\]
Choosing now all possible subsets $\mathcal{Y}$ of size $2k-j$, and splitting each of them into two arbitrary subsets $\mathcal{Y}_1,\mathcal{Y}_2$ of size $k$ with $|\mathcal{Y}_1\cap\mathcal{Y}_2| = j$, yields
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}\subset\mathcal{P}_n\\ |\mathcal{Y}|=2k-j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\Big|\, |\mathcal{P}_n| = m\right\} = \binom{m}{2k-j}\,\mathbb{E}\left\{h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\big|\, |\mathcal{P}_n| = m\right\}_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j}. \tag{3.A.2}
\]
Combining (3.A.1), (3.A.2), and Theorem 3.A.1 for subsets $\mathcal{Y}$ of size $2k-j$ yields
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}_1,\mathcal{Y}_2\subset\mathcal{P}_n\\ |\mathcal{Y}_1\cap\mathcal{Y}_2|=j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\right\} = \binom{2k-j}{k}\binom{k}{j}\,\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}\subset\mathcal{P}_n\\ |\mathcal{Y}|=2k-j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\right\} = \frac{n^{2k-j}}{j!\,((k-j)!)^2}\,\mathbb{E}\left\{h(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\, h(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\right\},
\]
where $\mathcal{Y}'_{12} = \mathcal{Y}'_1\cup\mathcal{Y}'_2$ is a set of $2k-j$ iid points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, and $|\mathcal{Y}'_1\cap\mathcal{Y}'_2| = j$.
3.B Stein’s Method
In this chapter we heavily used Stein’s method to derive limit theorems for the sums of
dependent Bernoulli variables. We need both the Poisson and normal approximations,
which are presented below.
Definition 3.B.1. Let $(I, E)$ be a graph. For $i,j \in I$ we denote $i \sim j$ if $(i,j) \in E$. Let $\{\xi_i\}_{i\in I}$ be a set of random variables. We say that $(I,\sim)$ is a dependency graph for $\{\xi_i\}$ if for every pair of disjoint sets $I_1, I_2 \subset I$ with no edges between $I_1$ and $I_2$, the set of variables $\{\xi_i\}_{i\in I_1}$ is independent of $\{\xi_i\}_{i\in I_2}$. We also define the neighborhood of $i$ as $N_i \triangleq \{i\}\cup\{j\in I : j\sim i\}$.

Theorem 3.B.2 (Stein's Method for Bernoulli Variables, [39, Theorem 2.1]). Let $\{\xi_i\}_{i\in I}$ be a set of Bernoulli random variables, with dependency graph $(I,\sim)$. Let
\[
p_i \triangleq \mathbb{E}\{\xi_i\}, \quad p_{i,j} \triangleq \mathbb{E}\{\xi_i\xi_j\}, \quad \lambda \triangleq \sum_{i\in I} p_i, \quad W \triangleq \sum_{i\in I} \xi_i, \quad Z \sim \operatorname{Poisson}(\lambda).
\]
Then,
\[
d_{TV}(W,Z) \le \min(3,\lambda^{-1})\left(\sum_{i\in I}\ \sum_{j\in N_i\setminus\{i\}} p_{ij} + \sum_{i\in I}\ \sum_{j\in N_i} p_i p_j\right).
\]
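To make the bound concrete, here is a small numerical example of our own (not from [39]): take $X_1,\ldots,X_n$ iid Bernoulli($p$) and let $\xi_i = \mathbb{1}\{X_i = X_{i+1} = 1\}$, so $W$ counts occurrences of two consecutive successes. Then $\xi_i$ and $\xi_j$ are independent unless $|i-j|\le 1$, with $p_i = p^2$ and $p_{i,i\pm 1} = p^3$, so both the Stein bound and an empirical total-variation distance are easy to compute:

```python
import math
import random

def stein_poisson_bound(n=60, p=0.05):
    """Stein bound (Theorem 3.B.2) for W = #{i : X_i = X_{i+1} = 1},
    where xi_i depends on xi_j only when |i - j| <= 1."""
    m = n - 1                        # number of indicators
    lam = m * p * p                  # sum of p_i
    sum_pij = 2 * (m - 1) * p**3     # ordered adjacent pairs, p_{i,i+1} = p^3
    sum_pipj = (3 * m - 2) * p**4    # ordered pairs with |i-j| <= 1, incl. j = i
    return lam, min(3.0, 1.0 / lam) * (sum_pij + sum_pipj)

def empirical_tv(n=60, p=0.05, trials=30000, seed=2):
    """Empirical d_TV between W and Poisson(lambda), estimated by simulation."""
    rng = random.Random(seed)
    lam, _ = stein_poisson_bound(n, p)
    counts = {}
    for _ in range(trials):
        x = [rng.random() < p for _ in range(n)]
        w = sum(x[i] and x[i + 1] for i in range(n - 1))
        counts[w] = counts.get(w, 0) + 1
    return 0.5 * sum(abs(counts.get(w, 0) / trials
                         - math.exp(-lam) * lam**w / math.factorial(w))
                     for w in range(max(counts) + 20))

lam, bound = stein_poisson_bound()
assert empirical_tv() < bound   # the Stein bound should dominate d_TV
```

The empirical distance sits well below the bound, as the theorem guarantees for the true $d_{TV}$.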
Theorem 3.B.3 (CLT for sums of weakly dependent variables, [39, Theorem 2.4]). Let $(\xi_i)_{i\in I}$ be a finite collection of random variables, with $\mathbb{E}\{\xi_i\} = 0$. Let $(I,\sim)$ be the dependency graph of $(\xi_i)_{i\in I}$, and assume that its maximal degree is $D-1$. Set $W \triangleq \sum_{i\in I}\xi_i$, and suppose that $\mathbb{E}\{W^2\} = 1$. Then for all $w\in\mathbb{R}$,
\[
\left|F_W(w) - \Phi(w)\right| \le 2(2\pi)^{-1/4}\sqrt{D^2\sum_{i\in I}\mathbb{E}\left\{|\xi_i|^3\right\}} + 6\sqrt{D^3\sum_{i\in I}\mathbb{E}\left\{|\xi_i|^4\right\}},
\]
where $F_W$ is the distribution function of $W$ and $\Phi$ that of a standard Gaussian.
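For orientation (an illustration we add, not part of [39]): in the iid case the dependency graph has no edges, so $D = 1$, and for normalized Bernoulli summands $\sum_i \mathbb{E}|\xi_i|^3 \sim n^{-1/2}$ and $\sum_i \mathbb{E}|\xi_i|^4 \sim n^{-1}$, so the bound decays like $n^{-1/4}$:

```python
import math

def stein_clt_bound(n, p=0.5):
    """Bound of Theorem 3.B.3 for W = sum of n iid normalised Bernoulli(p)
    variables (empty dependency graph, maximal degree D - 1 = 0, so D = 1)."""
    var = n * p * (1 - p)
    m3 = p * (1 - p) * ((1 - p)**2 + p**2)   # E|X - p|^3 for one Bernoulli
    m4 = p * (1 - p) * ((1 - p)**3 + p**3)   # E (X - p)^4
    s3 = n * m3 / var**1.5                   # sum_i E|xi_i|^3  ~ n^{-1/2}
    s4 = n * m4 / var**2                     # sum_i E|xi_i|^4  ~ n^{-1}
    return 2 * (2 * math.pi)**-0.25 * math.sqrt(s3) + 6 * math.sqrt(s4)

# The first term dominates, so the bound vanishes at rate roughly n^{-1/4}.
assert stein_clt_bound(10**6) < stein_clt_bound(10**4) < stein_clt_bound(100)
```

This is a far weaker rate than the classical Berry-Esseen $n^{-1/2}$, which is the price paid for allowing dependence.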
3.C De-Poissonization
Recall that the results in this chapter apply both to fixed-size sets $\mathcal{X}_n$ and to Poisson processes $\mathcal{P}_n$. In some cases it is easier to prove the results for $\mathcal{P}_n$ first, and then conclude that similar results apply to $\mathcal{X}_n$. The second step is known as 'De-Poissonization', and our use of it will depend primarily on the following theorem.
Theorem 3.C.1 (De-Poissonization, [39, Theorem 2.12]). For every $n\in\mathbb{N}$, let $H_n(\mathcal{X})$ be a functional defined for all finite sets of points $\mathcal{X}\subset\mathbb{R}^d$. Let $\mathcal{P}_n$ be a Poisson process defined in the same way as in Section 3.3, such that
\[
n^{-1}\operatorname{Var}(H_n(\mathcal{P}_n)) \to \sigma^2 \quad\text{and}\quad \frac{H_n(\mathcal{P}_n) - \mathbb{E}\{H_n(\mathcal{P}_n)\}}{\sqrt{n}} \xrightarrow{D} \mathcal{N}(0,\sigma^2),
\]
as $n\to\infty$. Define
\[
R_{m,n} \triangleq H_n(\mathcal{X}_{m+1}) - H_n(\mathcal{X}_m),
\]
where $\mathcal{X}_m$ is defined as in Section 3.3. In other words, $R_{m,n}$ measures the change in the value of the functional $H_n$ as a single point is added to the random set. Suppose that there exist $\alpha\in\mathbb{R}$ and $\gamma > 1/2$ such that
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma}\left|\mathbb{E}\{R_{m,n}\} - \alpha\right|\right) = 0, \tag{3.C.1}
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\{R_{m,n}R_{m',n}\} - \alpha^2\right|\right) = 0, \tag{3.C.2}
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma} n^{-1/2}\,\mathbb{E}\left\{R_{m,n}^2\right\}\right) = 0, \tag{3.C.3}
\]
and also that there exists $\beta > 0$ such that $|H_n(\mathcal{X}_m)| \le \beta(n+m)^\beta$ (a.s.).

Then $\alpha^2 \le \sigma^2$, and as $n\to\infty$,
\[
n^{-1}\operatorname{Var}(H_n(\mathcal{X}_n)) \to \sigma^2 - \alpha^2 \quad\text{and}\quad \frac{H_n(\mathcal{X}_n) - \mathbb{E}\{H_n(\mathcal{X}_n)\}}{\sqrt{n}} \xrightarrow{D} \mathcal{N}(0,\sigma^2-\alpha^2).
\]
Chapter 4
Noise Crackles
4.1 Introduction
In this chapter we continue to study random Cech complexes constructed from either a random sample $\mathcal{X}_n$ or a Poisson process $\mathcal{P}_n$ (see Section 3.3 for definitions). The main difference from Chapter 3 is that we look at $C(\mathcal{X}_n, 1)$ rather than $C(\mathcal{X}_n, r_n)$, i.e. we take fixed-size balls rather than shrinking ones. Obviously, if the sample distribution has compact support $S$, then for large enough $n$ we have $\bigcup_{k=1}^n B_1(X_k) \approx \operatorname{Tube}(S,1)$. Thus, there is not much to study in this case. However, when the support of the distribution is unbounded, interesting phenomena occur.

We shall study distributions supported on $\mathbb{R}^d$, and find that there exists a 'core' - a region in which the density of points is so high that the unit balls placed around them cover it completely. Consequently, the Cech complex inside the core is contractible. The size of the core obviously grows as $n\to\infty$. Outside the core there may be additional isolated points, but not enough to cover the entire area. Thus, in this region the topology of the Cech complex is nontrivial, and many holes of different dimensions may appear. We call this phenomenon 'crackling'.
The exact crackling behavior depends on the choice of distribution. In this chapter we study three representative examples: the power-law, exponential, and standard Gaussian distributions, whose density functions are given, respectively, by
\[
f_p(x) \triangleq \frac{c_p}{1+\|x\|^\alpha}, \tag{4.1.1}
\]
\[
f_e(x) \triangleq c_e\, e^{-\|x\|}, \tag{4.1.2}
\]
\[
f_g(x) \triangleq c_g\, e^{-\|x\|^2/2}, \tag{4.1.3}
\]
where $\alpha > d$, $\|\cdot\|$ is the standard $L^2$ norm in $\mathbb{R}^d$, and $c_p, c_e, c_g$ are normalization constants.
The motivation for our study is threefold. Firstly, studying how different distributions crackle is an interesting pure probability problem. Secondly, recall the manifold learning problem discussed repeatedly in this thesis - we have a set of random samples from a compact manifold $M \subset \mathbb{R}^d$ and we would like to recover the homology of $M$. One may consider a similar problem, but where noise is added to the samples. For example, in [38], the samples are of the form $Y_k = X_k + N_k$, where $X_k \in M$ and $N_k \in \mathbb{R}^d$ is Gaussian noise lying in the normal direction of $M$ at $X_k$. In such cases, the noise outliers might introduce homology elements which do not belong to the original manifold $M$. Indeed, the algorithm suggested in [38] includes a significant step of throwing away what seem to be outliers. Studying how pure noise crackles would be the first step in understanding how to handle noisy manifold learning schemes, generally and rigorously. Finally, the results in this chapter shed some light on the behavior of the Cech complex $C(\mathcal{X}_n, r_n)$ (studied in Chapter 3) in the super-critical range ($nr_n^d\to\infty$). We will discuss this in Section 4.6.

We note that the work described in this chapter is still in progress. While we have uncovered the main interesting crackling phenomena, there is still more to study on the crackling of pure noise, as well as providing stronger limit statements. Also note that while we present all the results in terms of the random sample $\mathcal{X}_n$, exactly the same results apply to the Poisson process $\mathcal{P}_n$ as well.
4.2 The Core of Distributions with Unbounded Support
We start by examining the core of the power-law, exponential and Gaussian distributions presented in the previous section. These distributions are spherically symmetric and the samples are concentrated near the origin. By 'core' we refer to a centered ball $B_{R_n} \triangleq B_{R_n}(0) \subset \mathbb{R}^d$ containing a very large number of points from the sample $\mathcal{X}_n$ (or $\mathcal{P}_n$), such that
\[
B_{R_n} \subset \bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X),
\]
i.e. the unit balls around the samples cover $B_{R_n}$ completely. Since $B_{R_n}$ is covered, it contains no holes, and therefore the homology of $\bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X)$, or equivalently, of $C(\mathcal{X}_n\cap B_{R_n}, 1)$, is trivial. Obviously, as $n\to\infty$, the radius $R_n$ grows as well.

Let $\{R_n\}_{n=1}^\infty$ be an increasing sequence of positive numbers. Denote by $C_n$ the event that $B_{R_n}$ is covered, i.e.
\[
C_n \triangleq \left\{B_{R_n} \subset \bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X)\right\}.
\]
We wish to find the largest possible value of $R_n$ such that $\mathbb{P}(C_n)\to 1$. The following theorem presents lower bounds for this value.
Theorem 4.2.1. Let $\epsilon > 0$, and define
\[
R_n^c \triangleq
\begin{cases}
\left(\dfrac{\delta_p n}{\log n - e^{-\epsilon}\log\log n} - 1\right)^{1/\alpha} & f = f_p,\\[2mm]
\log n - \log\log\log n - \delta_e - \epsilon & f = f_e,\\[2mm]
\sqrt{2\left(\log n - \log\log\log n - \delta_g - \epsilon\right)} & f = f_g,
\end{cases}
\]
where
\[
\delta_p = c_p\,\alpha\, 2^{-d} d^{-(1+d/2)},\qquad
\delta_e = (1+d/2)\log d + d\log 2 - \log c_e,\qquad
\delta_g = (1+d/2)\log d + (d-1)\log 2 - \log c_g.
\]
If $R_n \le R_n^c$, then $\mathbb{P}(C_n) \to 1$.
We see that the core size has a completely different order of magnitude in the three
distributions we chose. The heavy-tailed power-law distribution has the largest core, while
the core of the Gaussian distribution is the smallest one. In the following sections we will
study the behavior of the Cech complex outside the core.
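To get a feel for these magnitudes, the following sketch (our own numerical illustration; the choices $d = 1$, $\alpha = 3$ are arbitrary, and the normalization constants are computed for that case, using $\int_0^\infty dx/(1+x^3) = 2\pi/(3\sqrt{3})$) evaluates $R_n^c$ of Theorem 4.2.1 for the three densities at $n = 10^6$:

```python
import math

def core_radii(n, eps=0.1, alpha=3.0, d=1):
    """R_n^c of Theorem 4.2.1, specialised to d = 1 with normalisations
    c_p (alpha = 3), c_e = 1/2, c_g = 1/sqrt(2*pi)."""
    c_p = 1.0 / (2 * (2 * math.pi / (3 * math.sqrt(3))))  # 1 / int 1/(1+|x|^3) dx
    c_e, c_g = 0.5, 1.0 / math.sqrt(2 * math.pi)
    # the delta's of Theorem 4.2.1
    d_p = c_p * alpha * 2**-d * d**-(1 + d / 2)
    d_e = (1 + d / 2) * math.log(d) + d * math.log(2) - math.log(c_e)
    d_g = (1 + d / 2) * math.log(d) + (d - 1) * math.log(2) - math.log(c_g)
    L, LL, LLL = math.log(n), math.log(math.log(n)), math.log(math.log(math.log(n)))
    R_p = (d_p * n / (L - math.exp(-eps) * LL) - 1) ** (1 / alpha)
    R_e = L - LLL - d_e - eps
    R_g = math.sqrt(2 * (L - LLL - d_g - eps))
    return R_p, R_e, R_g

R_p, R_e, R_g = core_radii(10**6)
assert R_p > R_e > R_g  # heavy tails -> largest core; Gaussian -> smallest
```

For $n = 10^6$ this gives roughly $R_p \approx 38$, $R_e \approx 11$ and $R_g \approx 5$, matching the ordering discussed above: a polynomial core for the power law against logarithmic and square-root-logarithmic cores for the exponential and Gaussian.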
4.3 How Power-Law Noise Crackles
In this section we explore the crackling phenomenon in the power-law distribution, i.e. $f = f_p$ (defined in (4.1.1)). Let $B_{R_n}\subset\mathbb{R}^d$ be the centered ball with radius $R_n$, and let
\[
\mathcal{C}_n \triangleq C(\mathcal{X}_n\cap(B_{R_n})^c, 1)
\]
be the Cech complex constructed from sample points outside $B_{R_n}$. We wish to study
\[
\beta_{k,n} \triangleq \beta_k(\mathcal{C}_n),
\]
the $k$-th Betti number of $\mathcal{C}_n$.

Note that the minimum number of points required to form a $k$-dimensional hole ($k\ge 1$) is $k+2$. For $k\ge 1$ and $\mathcal{Y}\subset\mathbb{R}^d$, denote
\[
T_k(\mathcal{Y}) \triangleq \mathbb{1}\left\{|\mathcal{Y}| = k+2,\ \beta_k(C(\mathcal{Y},1)) = 1\right\},
\]
i.e. $T_k$ takes the value 1 if $C(\mathcal{Y},1)$ is a minimal $k$-dimensional hole, and 0 otherwise. This indicator function will be used to define the limits of the Betti numbers.
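As a concrete example of $T_k$ (our own addition), take $k = 1$ and three points at the vertices of an equilateral triangle with side $s$: the Cech complex on unit balls has all three edges when $s \le 2$ (pairwise ball intersections), and the filling 2-simplex precisely when the three balls share a common point, i.e. when the circumradius $s/\sqrt{3} \le 1$. Hence $T_1 = 1$ exactly on the window $\sqrt{3} < s \le 2$:

```python
import math

def t1_equilateral(s):
    """T_1 (minimal 1-dimensional hole indicator) for three points forming an
    equilateral triangle of side s, in the Cech complex with unit balls."""
    edges = s <= 2.0                   # all pairwise ball intersections present
    filled = s / math.sqrt(3) <= 1.0   # triple intersection: circumradius <= 1
    return 1 if (edges and not filled) else 0

assert t1_equilateral(1.5) == 0   # balls overlap too much: complex contractible
assert t1_equilateral(1.9) == 1   # triangle boundary with no filling: beta_1 = 1
assert t1_equilateral(2.5) == 0   # three isolated points: no edges at all
```

The same ball-intersection logic, with a smallest-enclosing-ball test replacing the circumradius, extends to general (non-equilateral) triples.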
Theorem 4.3.1. If $\lim_{n\to\infty} nR_n^{-\alpha} = 0$, then
\[
\lim_{n\to\infty}\left(nR_n^{d-\alpha}\right)^{-1}\mathbb{E}\{\beta_{0,n}\} = \mu_{p,0},
\]
\[
\lim_{n\to\infty}\left(n^{k+2}R_n^{d-\alpha(k+2)}\right)^{-1}\mathbb{E}\{\beta_{k,n}\} = \mu_{p,k}, \qquad 1\le k\le d-1,
\]
where
\[
\mu_{p,0} \triangleq \frac{s_{d-1}\, c_p}{\alpha - d}, \tag{4.3.1}
\]
\[
\mu_{p,k} \triangleq \frac{s_{d-1}\, c_p^{k+2}}{(\alpha(k+2)-d)\,(k+2)!} \int_{(\mathbb{R}^d)^{k+1}} T_k(0,y)\, dy, \qquad 1\le k\le d-1, \tag{4.3.2}
\]
and where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional sphere in $\mathbb{R}^d$.

Next, we define the following values, which serve as critical radii for the crackle:
\[
R^\epsilon_{0,n} \triangleq n^{\frac{1}{\alpha-d}+\epsilon}, \qquad R_{0,n} \triangleq R^0_{0,n},
\]
\[
R^\epsilon_{k,n} \triangleq n^{\frac{1}{\alpha-d/(k+2)}+\epsilon} \quad (k\ge 1), \qquad R_{k,n} \triangleq R^0_{k,n}.
\]
The following is a straightforward corollary of Theorem 4.3.1, and summarizes the behavior of $\mathbb{E}\{\beta_{k,n}\}$.

Corollary 4.3.2. For $k\ge 0$ and $\epsilon > 0$,
\[
\lim_{n\to\infty}\mathbb{E}\{\beta_{k,n}\} =
\begin{cases}
0 & R_n = R^\epsilon_{k,n},\\
\mu_{p,k} & R_n = R_{k,n},\\
\infty & R_n = R^{-\epsilon}_{k,n}.
\end{cases}
\]

Theorem 4.3.1 and Corollary 4.3.2 reveal that the crackling behavior is organized into separate 'layers', see Figure 4.1. Dividing $\mathbb{R}^d$ into a sequence of annuli at radii
\[
R^\epsilon_{0,n} \gg R_{0,n} \gg R^\epsilon_{1,n} \gg R_{1,n} \gg \cdots \gg R^\epsilon_{d-1,n} \gg R_{d-1,n} \gg R^c_n,
\]
we observe a different behavior of the Betti numbers in each annulus. We shall briefly review the behavior in each annulus, in decreasing order of radii values. The following description is mainly qualitative, and refers to expected values only.
• $[R^\epsilon_{0,n},\infty)$ - there are hardly any points ($\beta_k \sim 0$, $0\le k\le d-1$).

• $[R_{0,n}, R^\epsilon_{0,n})$ - points start to appear, and $\beta_0 \sim \mu_{p,0}$. The points are very few and scattered, so no holes are generated ($\beta_k \sim 0$, $1\le k\le d-1$).

• $[R^\epsilon_{1,n}, R_{0,n})$ - the number of components grows to infinity, but no holes are formed yet ($\beta_0 \sim \infty$, and $\beta_k = 0$, $1\le k\le d-1$).

• $[R_{1,n}, R^\epsilon_{1,n})$ - a finite number of 1-dimensional holes show up, among the infinite number of components ($\beta_0 \sim \infty$, $\beta_1 \sim \mu_{p,1}$, and $\beta_k = 0$, $2\le k\le d-1$).

• $[R^\epsilon_{2,n}, R_{1,n})$ - we have $\beta_0 \sim \infty$, $\beta_1 \sim \infty$, and $\beta_k \sim 0$ for $k\ge 2$.

This process goes on, until the $(d-1)$-dimensional holes appear -

• $[R_{d-1,n}, R^\epsilon_{d-1,n})$ - we have $\beta_{d-1} \sim \mu_{p,d-1}$ and $\beta_k \sim \infty$ for $0\le k\le d-2$.

• $[R^c_n, R_{d-1,n})$ - just before we reach the core, the complex exhibits the most intricate structure, with $\beta_k \sim \infty$ for $0\le k\le d-1$.
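The nesting of the layers can be read off from the growth exponents alone. The short sketch below (our illustration, with arbitrary choices $\alpha = 5$, $d = 2$) checks that the exponents of $R_{0,n}, R_{1,n}, \ldots$ decrease strictly and stay above the core's rough $n^{1/\alpha}$ growth (the core radius of Theorem 4.2.1 is $(\delta_p n/\log n)^{1/\alpha}$ up to lower-order terms):

```python
def layer_exponents(alpha=5.0, d=2):
    """Growth exponents of the power-law crackle radii: R_{k,n} = n^{e_k} with
    e_0 = 1/(alpha - d) and e_k = 1/(alpha - d/(k+2)) for k >= 1, while the
    core radius grows roughly like n^{1/alpha} (up to logarithmic factors)."""
    exps = [1.0 / (alpha - d)]                              # k = 0
    exps += [1.0 / (alpha - d / (k + 2)) for k in range(1, d)]
    return exps, 1.0 / alpha

exps, core = layer_exponents()
# the annuli are nested: R_{0,n} >> R_{1,n} >> ... >> R_{d-1,n} >> R_n^c
assert all(a > b for a, b in zip(exps, exps[1:])) and exps[-1] > core
```

For $\alpha = 5$, $d = 2$ the exponents are $1/3$ and $3/13$, against $1/5$ for the core, so the two layers and the core are indeed separated on the polynomial scale.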
Note that there is a very fast phase transition as we move from the contractible core to the first crackle layer. At this point we do not know exactly where and how this phase transition takes place. A reasonable conjecture would be that the transition occurs at $R_n = n^{1/\alpha}$ (since at this radius the term $nR_n^{-\alpha}$ that appears in Theorem 4.3.1 changes its limit, affecting the limiting Betti numbers). However, this remains for future work.

Figure 4.1: The layered behavior of crackle. Inside the core ($B_{R^c_n}$) the complex consists of a single component and no holes. The exterior of the core is divided into separate annuli. Going from right to left, we see how the Betti numbers grow. In each annulus we present the Betti number that was most recently changed.
4.4 How Exponential Noise Crackles
In this section we focus on the exponential density function, i.e. $f = f_e$ (defined in (4.1.2)). The results in this section are very similar to those for the power-law distribution, and we shall describe them briefly. Differences will show in (a) the values of $R_{k,n}$, and (b) the exact limits.

Theorem 4.4.1. If $\lim_{n\to\infty} ne^{-R_n} = 0$, then
\[
\lim_{n\to\infty}\left(nR_n^{d-1}e^{-R_n}\right)^{-1}\mathbb{E}\{\beta_{0,n}\} = \mu_{e,0},
\]
\[
\lim_{n\to\infty}\left(n^{k+2}R_n^{d-1}e^{-(k+2)R_n}\right)^{-1}\mathbb{E}\{\beta_{k,n}\} = \mu_{e,k}, \qquad k\ge 1,
\]
where
\[
\mu_{e,0} \triangleq s_{d-1}\, c_e, \tag{4.4.1}
\]
\[
\mu_{e,k} \triangleq \frac{s_{d-1}\, c_e^{k+2}}{(k+2)!} \int_0^\infty\int_{(\mathbb{R}^d)^{k+1}} T_k(0,y)\, e^{-\left((k+2)\rho + \sum_{i=1}^{k+1} y_i^1\right)} \prod_{i=1}^{k+1} \mathbb{1}\{y_i^1 > -\rho\}\, dy\, d\rho, \tag{4.4.2}
\]
and where $y_i^1$ is the first coordinate of $y_i\in\mathbb{R}^d$.
Next, define
\[
R^\epsilon_{0,n} \triangleq \log n + (d-1+\epsilon)\log\log n, \qquad R_{0,n} \triangleq R^0_{0,n},
\]
\[
R^\epsilon_{k,n} \triangleq \log n + \left(\frac{d-1}{k+2}+\epsilon\right)\log\log n \quad (k\ge 1), \qquad R_{k,n} \triangleq R^0_{k,n}.
\]
From Theorem 4.4.1 we can conclude the following.

Corollary 4.4.2. For $k\ge 0$ and $\epsilon > 0$,
\[
\lim_{n\to\infty}\mathbb{E}\{\beta_{k,n}\} =
\begin{cases}
0 & R_n = R^\epsilon_{k,n},\\
\mu_{e,k} & R_n = R_{k,n},\\
\infty & R_n = R^{-\epsilon}_{k,n}.
\end{cases}
\]
As in the power-law case, Theorem 4.4.1 implies the same 'layered' behavior, the only difference being in the values of $R_{k,n}$. From examining the values of $R^c_n$ and $R_{k,n}$, it is reasonable to guess that the phase transition in the exponential case occurs at $R_n = \log n$.
4.5 Gaussian Noise Does Not Crackle
The standard Gaussian distribution (defined in (4.1.3)) exhibits a completely different behavior than the power-law and the exponential distributions. Define
\[
R^\epsilon_{0,n} \triangleq \sqrt{2\log n + (d-2+\epsilon)\log\log n}.
\]

Theorem 4.5.1. If $f = f_g$, $\epsilon > 0$, and $R_n = R^\epsilon_{0,n}$, then for $0\le k\le d-1$,
\[
\lim_{n\to\infty}\mathbb{E}\{\beta_{k,n}\} = 0.
\]

Note that in the Gaussian case $\lim_{n\to\infty}\left(R^\epsilon_{0,n} - R^c_n\right) = 0$. This implies that as $n\to\infty$ we have the core, which is contractible, and outside the core there is hardly anything. In other words, the ball placed around every new point we add to the sample immediately connects to the core, and thus Gaussian noise does not crackle.
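Numerically the gap closes rather slowly, at rate roughly $\log\log n/\sqrt{\log n}$, since both radii are of the form $\sqrt{2\log n + O(\log\log n)}$. The sketch below (our illustration, with $d = 1$, $c_g = 1/\sqrt{2\pi}$, and $\delta_g$ taken from Theorem 4.2.1) evaluates $R^\epsilon_{0,n} - R^c_n$ for a few values of $n$:

```python
import math

def gaussian_gap(n, eps=0.1, d=1):
    """R^eps_{0,n} - R^c_n for the standard Gaussian in d = 1: both radii are
    sqrt(2 log n + O(log log n)), so the difference vanishes as n grows."""
    c_g = 1.0 / math.sqrt(2 * math.pi)
    d_g = (1 + d / 2) * math.log(d) + (d - 1) * math.log(2) - math.log(c_g)
    L, LL, LLL = math.log(n), math.log(math.log(n)), math.log(math.log(math.log(n)))
    r_eps = math.sqrt(2 * L + (d - 2 + eps) * LL)      # crackle threshold
    r_core = math.sqrt(2 * (L - LLL - d_g - eps))      # core radius (Thm 4.2.1)
    return r_eps - r_core

# The gap shrinks monotonically: the 'empty' outer region hugs the core.
assert gaussian_gap(10**12) < gaussian_gap(10**6) < gaussian_gap(10**3)
```

For $n = 10^3, 10^6, 10^{12}$ the gap is roughly $0.24$, $0.16$ and $0.10$ respectively: small, shrinking, but visibly slow, which is consistent with the core and the empty exterior merging only in the limit.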
4.6 Summary and Future Work
In the preceding sections we presented the crackling phenomenon which occurs in some distributions with unbounded support. We examined three prototype distributions - the power-law, exponential and Gaussian. We characterized the 'core' of the distributions and found bounds on its size. Once we move outside the core, the Cech complex crackles - i.e. it splits up into many particles with non-trivial homology. We described the crackling phenomenon in the power-law and exponential distributions, and found that different Betti numbers show up in different layers. Comparing the results in Theorems 4.3.1 and 4.4.1, we see that the exponential distribution crackles much closer to the origin than the heavy-tailed power-law distribution. For the Gaussian distribution, on the other hand, we showed that crackling does not occur. In the Gaussian case the Cech complex consists mainly of its core, and thus remains contractible, even after adding $n\to\infty$ points.
Beyond these results, there remains much to investigate. Firstly, we would like to extend our results beyond expectations and provide stronger limit theorems, as in Chapter 3. In addition, we wish to carefully study the bounds we established for the different crackling radii (i.e. $R^c_n$, $R_{k,n}$, $R^\epsilon_{k,n}$), and see if they can be refined. We also wish to characterize the phase-transition phenomenon as we move from the contractible core into the chaotic first layer of the crackle. Finally, in this chapter we studied the power-law, exponential and Gaussian distributions, but it would be interesting to see if the results we have could be generalized to broader classes of distributions.

In Section 4.1 we discussed the motivation for the study in this chapter. At this point, we may start gaining some intuition about the noisy manifold learning problem discussed there. For example, if the distribution of the noise is Gaussian, our results imply that noise outliers should not significantly interfere with homology recovery, since Gaussian noise does not introduce any artificial homology elements (components, holes). On the other hand, if the distribution of the noise is power-law or exponential, then noise outliers will typically generate extraneous homology elements that will damage the estimation of the original manifold. Thus, in these cases homology recovery algorithms should remove outliers before attempting to analyze the data. In this chapter we studied the regions where crackling occurs for these distributions. Further investigation of the phenomena presented here can later be used to develop outlier removal methods that reduce the probability of artificial homology elements. This, however, remains as future work.
Another motivation mentioned in Section 4.1 is the study of the Cech complex $C(\mathcal{X}_n, r_n)$ in the super-critical phase ($nr_n^d\to\infty$), which was presented in Chapter 3. Recall that in the super-critical phase the results were restricted to distributions with a compact and convex support. Under this assumption, the results of Chapter 3 (combined with [30]) indicate that, in the limit, the Cech complex $C(\mathcal{X}_n, r_n)$ becomes contractible (i.e. $\beta_0 = 1$ and $\beta_k = 0$ for $k\ge 1$). The results in Sections 4.3 and 4.4 imply that if the support of the distribution is non-compact, the behavior of $C(\mathcal{X}_n, r_n)$ is completely different. We saw that the power-law and exponential distributions crackle and have an infinite number of components, even for a fixed radius of 1. Taking an even smaller radius ($r_n\to 0$) can only enhance crackling, and thus there is no reason to believe that the resulting complex would be contractible. On the other hand, since the Gaussian distribution does not crackle, it is possible that it behaves like a compactly supported distribution. At this point we cannot make any concrete statements, but the results in this chapter definitely give us a lead as to where we should look.
4.7 Proofs
4.7.1 The Core
In this section we prove the main result of Section 4.2.

Proof of Theorem 4.2.1. The proof is general for all three distributions. Take a grid on $\mathbb{R}^d$ of size $g = \frac{1}{2\sqrt{d}}$. Let $\mathcal{Q}_n$ be the collection of cubes in this grid that are contained in $B_{R_n}$. Let $\tilde{C}_n$ be the following event:
\[
\tilde{C}_n \triangleq \left\{\forall Q\in\mathcal{Q}_n : Q\cap\mathcal{X}_n \ne \emptyset\right\},
\]
i.e. $\tilde{C}_n$ is the event that every cube in $\mathcal{Q}_n$ contains at least one point from $\mathcal{X}_n$. Recall the definition of $C_n$,
\[
C_n \triangleq \left\{B_{R_n} \subset \bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X)\right\}.
\]
Then it is easy to show that $\tilde{C}_n \subset C_n$. The complementary event $\tilde{C}^c_n$ is the event that at least one cube is empty. Thus,
\[
\mathbb{P}(\tilde{C}^c_n) \le \sum_{Q\in\mathcal{Q}_n}\mathbb{P}(Q\cap\mathcal{X}_n = \emptyset) = \sum_{Q\in\mathcal{Q}_n}(1-p(Q))^n \le \sum_{Q\in\mathcal{Q}_n} e^{-np(Q)},
\]
where
\[
p(Q) = \int_Q f(z)\,dz \ge g^d f(R_n).
\]
In addition, the number of cubes that are contained in $B_{R_n}$ is less than $\left(\frac{2R_n}{g}\right)^d$. Therefore,
\[
\mathbb{P}(\tilde{C}^c_n) \le (2g^{-1})^d R_n^d\, e^{-ng^d f(R_n)}. \tag{4.7.1}
\]
Now, choose any $\epsilon > 0$ and set
\[
R_n = R_n^c \triangleq
\begin{cases}
\left(\dfrac{\delta_p n}{\log n - e^{-\epsilon}\log\log n} - 1\right)^{1/\alpha} & f = f_p,\\[2mm]
\log n - \log\log\log n - \delta_e - \epsilon & f = f_e,\\[2mm]
\sqrt{2\left(\log n - \log\log\log n - \delta_g - \epsilon\right)} & f = f_g,
\end{cases}
\]
where
\[
\delta_p = c_p\,\alpha\, 2^{-d} d^{-(1+d/2)},\qquad
\delta_e = \log d - \log c_e - \log g^d,\qquad
\delta_g = \log(d/2) - \log c_g - \log g^d.
\]
It is easy to verify that in all cases we have
\[
R_n^d\, e^{-ng^d f(R_n)} \to 0.
\]
Thus, from (4.7.1) we conclude that $\mathbb{P}(\tilde{C}_n)\to 1$. Since $\mathbb{P}(C_n) \ge \mathbb{P}(\tilde{C}_n)$, we now have that for $R_n = R_n^c$, in each of the distributions,
\[
\mathbb{P}(C_n)\to 1,
\]
which completes the proof. The proof for the Poisson case ($\mathcal{P}_n$) follows exactly the same steps.
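The grid argument above is easy to simulate. The sketch below (our own illustration: $d = 1$, the Laplace density $f_e$ with $c_e = 1/2$, and modest $n$, so the test radii are chosen by hand rather than at the asymptotic $R_n^c$) checks the sufficient condition that every grid cell inside $[-R, R]$ contains a sample point:

```python
import math
import random

def grid_covered(pts, R, g=0.5):
    """Sufficient condition from the proof of Theorem 4.2.1 in d = 1:
    every grid cell [k*g, (k+1)*g) contained in [-R, R] holds a sample point."""
    occupied = {math.floor(x / g) for x in pts}
    lo, hi = math.ceil(-R / g), math.floor(R / g) - 1
    return all(k in occupied for k in range(lo, hi + 1))

def coverage_prob(n=2000, R=4.0, trials=200, seed=3):
    """Monte Carlo estimate of the probability that the grid event holds,
    sampling n points from the Laplace density (1/2) * exp(-|x|)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [(1 if rng.random() < 0.5 else -1) * -math.log(1.0 - rng.random())
               for _ in range(n)]
        hits += grid_covered(pts, R)
    return hits / trials

assert coverage_prob(R=4.0) > 0.9    # well inside the core: covered w.h.p.
assert coverage_prob(R=12.0) < 0.1   # far beyond log n: edge cells stay empty
```

For $n = 2000$ ($\log n \approx 7.6$) the interval $[-4,4]$ is essentially always covered, while $[-12,12]$ essentially never is, illustrating the sharp drop-off that the theorem quantifies.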
4.7.2 Crackle - Notation and General Lemmas
As noted in Section 4.1, while the results in this chapter are stated for the random sample case ($\mathcal{X}_n$), they apply to the Poisson case ($\mathcal{P}_n$) as well. We will present the proofs for the Poisson case only. The proofs for the random sample case follow exactly the same steps, using the same bounds and yielding the same results. Thus, to avoid duplicated notation and proofs, we omit them.
For $R_n > 0$, denote
\[
\mathcal{P}_{n,R_n} \triangleq \mathcal{P}_n\cap(B_{R_n})^c,
\]
i.e. $\mathcal{P}_{n,R_n}$ consists of the points of $\mathcal{P}_n$ located outside the ball $B_{R_n}$. Next, recall the definition of $T_k$,
\[
T_k(\mathcal{Y}) \triangleq \mathbb{1}\left\{|\mathcal{Y}| = k+2,\ \beta_k(C(\mathcal{Y},1)) = 1\right\},
\]
for $\mathcal{Y}\subset\mathbb{R}^d$, and denote
\[
S_{0,n} \triangleq |\mathcal{P}_{n,R_n}|,
\]
\[
\hat{S}_{0,n} \triangleq \#\left\{X\in\mathcal{P}_{n,R_n} : X \text{ is a connected component of } C(\mathcal{P}_n,1)\right\},
\]
\[
S_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_{n,R_n}} T_k(\mathcal{Y}),
\]
\[
\hat{S}_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_{n,R_n}} T_k(\mathcal{Y})\,\mathbb{1}\{C(\mathcal{Y},1) \text{ is a connected component of } C(\mathcal{P}_n,1)\},
\]
\[
L_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_{n,R_n}} \mathbb{1}\{|\mathcal{Y}| = k+3,\ C(\mathcal{Y},1) \text{ is connected}\},
\]
where $k\ge 1$. Observe that
\[
\hat{S}_{0,n} \le \beta_{0,n} \le S_{0,n}, \tag{4.7.2}
\]
\[
\hat{S}_{k,n} \le \beta_{k,n} \le S_{k,n} + L_{k,n}, \qquad k\ge 1. \tag{4.7.3}
\]
We will evaluate the limits of $\mathbb{E}\{S_{k,n}\}$, $\mathbb{E}\{\hat{S}_{k,n}\}$ and $\mathbb{E}\{L_{k,n}\}$, and deduce from these the limit of $\mathbb{E}\{\beta_{k,n}\}$.
In the following proofs we will use the notation introduced in Section 3.6.1. In addition,
we set
\[
e_1 \triangleq (1,0,\ldots,0)\in\mathbb{R}^d,
\]
\[
f(r) \triangleq f(re_1), \quad r\in\mathbb{R},
\]
\[
U(x) \triangleq \bigcup_{i=1}^k B_2(x_i), \quad x\in(\mathbb{R}^d)^k,
\]
\[
p(x) \triangleq \int_{U(x)} f(z)\,dz, \quad x\in(\mathbb{R}^d)^k.
\]
The following lemmas are purely technical, but will considerably simplify our computations later.
Lemma 4.7.1. Let $f:\mathbb{R}^d\to\mathbb{R}$ be a spherically symmetric probability density. Then,
\[
\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^\infty r^{d-1} f(r)\,dr,
\]
\[
\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^\infty r^{d-1} f(r)\, e^{-np(re_1)}\,dr,
\]
where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional unit sphere.
Proof. Using Palm theory (Theorem 3.A.1) we have
\[
\mathbb{E}\{S_{0,n}\} = n\int_{\mathbb{R}^d} f(x)\,\mathbb{1}\{\|x\| > R_n\}\,dx.
\]
Next, we move to polar coordinates, using the change of variables $x\to r\theta$ where $r\in\mathbb{R}_+$ and $\theta\in S^{d-1}$. This yields
\[
\mathbb{E}\{S_{0,n}\} = n\int_{R_n}^\infty\int_{S^{d-1}} f(r\theta)\, r^{d-1} J(\theta)\,d\theta\,dr,
\]
where $J(\theta) = \left|\frac{\partial x}{\partial\theta}\right|$. Since $f$ is spherically symmetric, $f(r\theta) = f(r)$, and therefore,
\[
\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^\infty r^{d-1} f(r)\,dr.
\]
The proof for $\hat{S}_{0,n}$ is similar, using the fact that the probability that a point $x\in\mathbb{R}^d$ is disconnected from the rest of the complex $C(\mathcal{P}_n,1)$ is $e^{-np(x)}$.
Lemma 4.7.2. Let $f:\mathbb{R}^d\to\mathbb{R}$ be a spherically symmetric probability density. Then for $k\ge 1$,
\[
\mathbb{E}\{S_{k,n}\} = \frac{s_{d-1}\, n^{k+2}}{(k+2)!} \int_{R_n}^\infty r^{d-1} f(r)\, G_k(r)\,dr,
\]
\[
\mathbb{E}\{\hat{S}_{k,n}\} = \frac{s_{d-1}\, n^{k+2}}{(k+2)!} \int_{R_n}^\infty r^{d-1} f(r)\, \hat{G}_k(r)\,dr,
\]
where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional unit sphere, and where
\[
G_k(r) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|re_1+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|re_1+y_i\| > R_n\}\,dy,
\]
\[
\hat{G}_k(r) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|re_1+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|re_1+y_i\| > R_n\}\, e^{-np(re_1,\, re_1+y)}\,dy.
\]
Proof. The proof is in the same spirit as the proof of Lemma 4.7.1, but technically more complicated. Using Palm theory (Theorem 3.A.1), we have that
\[
\mathbb{E}\{S_{k,n}\} = \frac{n^{k+2}}{(k+2)!} \int_{(\mathbb{R}^d)^{k+2}} f(x)\, T_k(x) \prod_{i=1}^{k+2} \mathbb{1}\{\|x_i\| > R_n\}\,dx.
\]
Let $I_k$ denote the integral above. Then, using the change of variables
\[
x_1 \to x, \qquad x_i \to x + y_{i-1} \quad (i > 1),
\]
yields
\[
I_k = \int_{\|x\|\ge R_n}\int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+y)\, T_k(x,\, x+y) \prod_{i=1}^{k+1} \mathbb{1}\{\|x+y_i\| > R_n\}\,dy\,dx
= \int_{\|x\|\ge R_n}\int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+y)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|x+y_i\| > R_n\}\,dy\,dx.
\]
Next, we move to polar coordinates, using the change of variables $x\to r\theta$ where $r\in\mathbb{R}_+$ and $\theta\in S^{d-1}$. This yields
\[
I_k = \int_{R_n}^\infty\int_{S^{d-1}}\int_{(\mathbb{R}^d)^{k+1}} f(r\theta)\, f(r\theta+y)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta+y_i\| > R_n\}\, r^{d-1} J(\theta)\,dy\,d\theta\,dr
= \int_{R_n}^\infty r^{d-1} f(r) \int_{S^{d-1}} J(\theta) \int_{(\mathbb{R}^d)^{k+1}} f(\|r\theta+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta+y_i\| > R_n\}\,dy\,d\theta\,dr,
\]
where $J(\theta) = \left|\frac{\partial x}{\partial\theta}\right|$, and $f(x) = f(\|x\|)$ by the spherical symmetry assumption. Denote
\[
G_k(r,\theta) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|r\theta+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta+y_i\| > R_n\}\,dy.
\]
120 CHAPTER 4. NOISE CRACKLES
Since Tk is rotation invariant, it is easy to show that for every θ ∈ Sd−1
Gk(r, θ) = Gk(r, e1) , Gk(r).
Thus,
Ik = sd−1
∫ ∞
Rn
rd−1f(r)Gk(r)dr, (4.7.4)
where sd−1 is the surface area of the d-dimensional unit ball. This completes the proof
for Sk,n. The proof for Sk,n is similar.
4.7.3 Crackle - The Power Law Distribution
In this section we wish to prove the results in Section 4.3. First, we need a few lemmas.
Lemma 4.7.3. If $f = f_p$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{p,0},$$
where $\mu_{p,0}$ is defined in (4.3.1). If, in addition, $n R_n^{-\alpha} \to 0$, then
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \mu_{p,0}.$$
Proof. From Lemma 4.7.1 we have that
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$
Using the change of variables $r \to R_n\rho$ yields
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{1}^{\infty} \frac{c_p (R_n\rho)^{d-1}}{1 + (R_n\rho)^{\alpha}}\, R_n\,d\rho = s_{d-1} c_p\, n R_n^{d-\alpha} \int_{1}^{\infty} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^{\alpha}}\,d\rho.$$
Applying the DCT to the last integral yields
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_p \int_{1}^{\infty} \rho^{d-1-\alpha}\,d\rho = \frac{s_{d-1} c_p}{\alpha - d} = \mu_{p,0}.$$
This proves the first part of the lemma.
Next, from Lemma 4.7.1 we have that
$$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\, e^{-np(re_1)}\,dr.$$
The exponential term will not affect the DCT conditions, so we only need to evaluate its limit. Now,
$$p(re_1) = \int_{B_2(re_1)} f(z)\,dz = \int_{B_2(0)} \frac{c_p}{1 + \|re_1 + z\|^{\alpha}}\,dz,$$
and after the change of variables $r \to R_n\rho$ we have
$$p(R_n\rho e_1) = c_p R_n^{-\alpha} \int_{B_2(0)} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} z\|^{\alpha}}\,dz.$$
If $n R_n^{-\alpha} \to 0$, then using the DCT we have
$$\lim_{n\to\infty} n p(R_n\rho e_1) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np(R_n\rho e_1)} = 1,$$
and therefore we have
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{p,0}.$$
This completes the proof of the second part of the lemma.
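The DCT limit used above can be checked numerically (an illustrative sketch, not from the thesis; the helper names and the choice $d = 2$, $\alpha = 4$ are mine). After the substitution $\rho = 1/t$, the integral $\int_1^\infty \rho^{d-1}/(R_n^{-\alpha} + \rho^\alpha)\,d\rho$ becomes a proper integral over $[0,1]$ and can be evaluated by Simpson's rule; it approaches $1/(\alpha - d)$ as $R_n \to \infty$.

```python
def simpson(g, a, b, n=2000):
    # composite Simpson's rule; n must be even
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def tail_integral(R, d=2, alpha=4):
    # int_1^inf rho^{d-1} / (R^{-alpha} + rho^alpha) d rho,
    # after substituting rho = 1/t (valid here since alpha - d - 1 >= 0,
    # so the transformed integrand is bounded at t = 0)
    g = lambda t: t ** (alpha - d - 1) / (R ** (-alpha) * t ** alpha + 1.0)
    return simpson(g, 0.0, 1.0)

for R in (10.0, 100.0, 1000.0):
    print(R, tail_integral(R))   # approaches 1/(alpha - d) = 0.5
```

The slow drift toward $1/(\alpha - d)$ as $R$ grows mirrors the DCT argument term by term.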
Lemma 4.7.4. If $f = f_p$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{p,k},$$
where $\mu_{p,k}$ is defined in (4.3.2). If, in addition, $n R_n^{-\alpha} \to 0$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{p,k}.$$
Proof. The proof is in the spirit of that of Lemma 4.7.3, but technically more complicated. From Lemma 4.7.2 we have that
$$\mathbb{E}\{S_{k,n}\} = \frac{n^{k+2}}{(k+2)!}\, I_k, \qquad \text{where} \qquad I_k = s_{d-1} \int_{R_n}^{\infty} r^{d-1} f(r)\, G_k(r)\,dr.$$
Using the change of variables $r \to R_n\rho$ yields
$$I_k = s_{d-1} R_n \int_{1}^{\infty} (R_n\rho)^{d-1} f(R_n\rho)\, G_k(R_n\rho)\,d\rho$$
$$= s_{d-1} c_p^{k+2} R_n^{d-\alpha(k+2)} \int_{1}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^{\alpha}} \prod_{i=1}^{k+1} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} y_i\|^{\alpha}} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\left\{\|\rho e_1 + R_n^{-1} y_i\| > 1\right\}\,d\mathbf{y}\,d\rho.$$
Thus,
$$\left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_p^{k+2}}{(k+2)!} \int_{1}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^{\alpha}} \prod_{i=1}^{k+1} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} y_i\|^{\alpha}} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\left\{\|\rho e_1 + R_n^{-1} y_i\| > 1\right\}\,d\mathbf{y}\,d\rho.$$
It is easy to show that the integrand is bounded properly, so the DCT applies, yielding
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_p^{k+2}}{(k+2)!} \int_{1}^{\infty} \rho^{d-1-\alpha(k+2)}\,d\rho \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y}$$
$$= \frac{s_{d-1} c_p^{k+2}}{(\alpha(k+2) - d)(k+2)!} \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y} = \mu_{p,k}.$$
This proves the first part of the lemma.

Next, the terms $G_k(r)$ and $\hat{G}_k(r)$ in Lemma 4.7.2 differ only by the term $e^{-np(re_1,\, re_1 + \mathbf{y})}$, so the DCT still applies. Now,
$$p(re_1, re_1 + \mathbf{y}) = \int_{U(re_1,\, re_1 + \mathbf{y})} f(z)\,dz = \int_{U(0, \mathbf{y})} f(re_1 + z)\,dz,$$
and substituting $r \to R_n\rho$ yields
$$p(R_n\rho e_1, R_n\rho e_1 + \mathbf{y}) = c_p R_n^{-\alpha} \int_{U(0, \mathbf{y})} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} z\|^{\alpha}}\,dz.$$
If $n R_n^{-\alpha} \to 0$, then using the DCT we have
$$\lim_{n\to\infty} n p(R_n\rho e_1, R_n\rho e_1 + \mathbf{y}) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np(R_n\rho e_1,\, R_n\rho e_1 + \mathbf{y})} = 1,$$
and therefore,
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{p,k}.$$
This completes the proof of the second part of the lemma.
Lemma 4.7.5. If $f = f_p$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+3} R_n^{d-\alpha(k+3)}\right)^{-1} \mathbb{E}\{L_{k,n}\} = \hat{\mu}_{p,k},$$
for some $\hat{\mu}_{p,k} > 0$.
Proof. The proof is very similar to that of Lemma 4.7.4; just replace $T_k$ with an indicator function that tests whether the sub-complex generated by $k+3$ points is connected. The exact value of $\hat{\mu}_{p,k}$ will not be needed anywhere.
We can now prove Theorem 4.3.1.
Proof of Theorem 4.3.1. To prove the limit for $\beta_{0,n}$, combine Lemma 4.7.3 with inequality (4.7.2). To prove the limit for $\beta_{k,n}$, $k \ge 1$, combine Lemmas 4.7.4 and 4.7.5 with the inequality in (4.7.3).
4.7.4 Crackle - The Exponential Distribution
In this section we wish to prove Theorem 4.4.1. We start with the following lemmas.
Lemma 4.7.6. If $f = f_e$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{e,0},$$
where $\mu_{e,0}$ is defined in (4.4.1). If, in addition, $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \mu_{e,0}.$$
Proof. From Lemma 4.7.1 we have that
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$
Using the change of variables $r \to \rho + R_n$ yields
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{0}^{\infty} (\rho + R_n)^{d-1} c_e e^{-(\rho + R_n)}\,d\rho = s_{d-1} c_e\, n R_n^{d-1} e^{-R_n} \int_{0}^{\infty} \left(\frac{\rho}{R_n} + 1\right)^{d-1} e^{-\rho}\,d\rho.$$
Applying the DCT to the last integral yields
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_e \int_{0}^{\infty} e^{-\rho}\,d\rho = s_{d-1} c_e = \mu_{e,0}.$$
This proves the first part of the lemma.

Next, from Lemma 4.7.1 we have that
$$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\, e^{-np(re_1)}\,dr.$$
The exponential term will not affect the DCT conditions, so we only need to evaluate its limit. Now,
$$p(re_1) = \int_{B_2(re_1)} f(z)\,dz = \int_{B_2(0)} c_e e^{-\|re_1 + z\|}\,dz,$$
and after the change of variables $r \to \rho + R_n$ we have
$$p((\rho + R_n)e_1) = \int_{B_2(0)} c_e e^{-\|(\rho + R_n)e_1 + z\|}\,dz \le e^{-(R_n + \rho)} \int_{B_2(0)} c_e e^{\|z\|}\,dz.$$
If $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} n p((\rho + R_n)e_1) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np((\rho + R_n)e_1)} = 1,$$
and therefore we have
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{e,0}.$$
This completes the proof of the second part of the lemma.
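The DCT step in Lemma 4.7.6 can also be checked numerically: for $d = 3$ the integral $\int_0^\infty (\rho/R + 1)^{d-1} e^{-\rho}\,d\rho$ equals $1 + 2/R + 2/R^2$ exactly, and tends to $\int_0^\infty e^{-\rho}\,d\rho = 1$ as $R \to \infty$. A small sketch (illustrative only; the names and the truncation at $\rho = 50$ are mine):

```python
import math

def simpson(g, a, b, n=2000):
    # composite Simpson's rule; n must be even
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def moment_integral(R, d=3):
    # int_0^inf (rho/R + 1)^{d-1} e^{-rho} d rho, truncated at rho = 50
    # (the truncation error is of order e^{-50}, i.e. negligible)
    g = lambda rho: (rho / R + 1.0) ** (d - 1) * math.exp(-rho)
    return simpson(g, 0.0, 50.0)

for R in (2.0, 10.0, 100.0):
    print(R, moment_integral(R), 1 + 2.0 / R + 2.0 / R ** 2)
```

The quadrature matches the closed form $1 + 2/R + 2/R^2$ and converges to $1$, the constant absorbed into $\mu_{e,0} = s_{d-1} c_e$.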
Lemma 4.7.7. If $f = f_e$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{e,k},$$
where $\mu_{e,k}$ is defined in (4.4.2). If, in addition, $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{e,k}.$$
Proof. From Lemma 4.7.2 we have that
$$\mathbb{E}\{S_{k,n}\} = \frac{n^{k+2}}{(k+2)!}\, I_k, \qquad \text{where} \qquad I_k = s_{d-1} \int_{R_n}^{\infty} r^{d-1} f(r)\, G_k(r)\,dr.$$
Using the change of variables $r \to \rho + R_n$ yields
$$I_k = s_{d-1} \int_{0}^{\infty} (\rho + R_n)^{d-1} f(\rho + R_n)\, G_k(\rho + R_n)\,d\rho$$
$$= s_{d-1} c_e^{k+2} \int_{0}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} (\rho + R_n)^{d-1} e^{-(\rho + R_n)} \prod_{i=1}^{k+1} e^{-\|(\rho + R_n)e_1 + y_i\|} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\{\|(\rho + R_n)e_1 + y_i\| > R_n\}\,d\mathbf{y}\,d\rho$$
$$= s_{d-1} c_e^{k+2} e^{-(k+2)R_n} R_n^{d-1} \int_{0}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} \left(\frac{\rho}{R_n} + 1\right)^{d-1} e^{-\rho} \prod_{i=1}^{k+1} e^{-\|(\rho + R_n)e_1 + y_i\|} e^{R_n} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\{\|(\rho + R_n)e_1 + y_i\| > R_n\}\,d\mathbf{y}\,d\rho.$$
The last integral can be easily shown to satisfy the DCT conditions. In addition, it is easy to show that
$$\lim_{n\to\infty} e^{-\|(\rho + R_n)e_1 + y_i\|} e^{R_n} = e^{-(\rho + \langle e_1, y_i \rangle)} = e^{-(\rho + y_i^1)},$$
where $y_i^1$ is the first coordinate of $y_i \in \mathbb{R}^d$, and also that
$$\lim_{n\to\infty} \mathbf{1}\{\|(\rho + R_n)e_1 + y_i\| > R_n\} = \mathbf{1}\{y_i^1 \ge -\rho\}.$$
Altogether, we have that
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_e^{k+2}}{(k+2)!} \int_{0}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\, e^{-\left((k+2)\rho + \sum_{i=1}^{k+1} y_i^1\right)} \prod_{i=1}^{k+1} \mathbf{1}\{y_i^1 \ge -\rho\}\,d\mathbf{y}\,d\rho,$$
proving the first part of the lemma.

Next, as in the proof of Lemma 4.7.4, we need to evaluate the term $p(re_1, re_1 + \mathbf{y})$:
$$p(re_1, re_1 + \mathbf{y}) = \int_{U(0, \mathbf{y})} c_e e^{-\|re_1 + z\|}\,dz \le \int_{U(0, \mathbf{y})} c_e e^{-(r - \|z\|)}\,dz.$$
The change of variables $r \to \rho + R_n$ yields
$$p((\rho + R_n)e_1, (\rho + R_n)e_1 + \mathbf{y}) \le e^{-R_n} e^{-\rho} \int_{U(0, \mathbf{y})} c_e e^{\|z\|}\,dz.$$
If $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} n p((\rho + R_n)e_1, (\rho + R_n)e_1 + \mathbf{y}) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np((\rho + R_n)e_1,\, (\rho + R_n)e_1 + \mathbf{y})} = 1,$$
and therefore,
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{e,k}.$$
This completes the proof.
Lemma 4.7.8. If $f = f_e$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+3} R_n^{d-1} e^{-(k+3)R_n}\right)^{-1} \mathbb{E}\{L_{k,n}\} = \hat{\mu}_{e,k},$$
where $\hat{\mu}_{e,k} > 0$.
Proof. The proof is very similar to that of Lemma 4.7.7; just replace $T_k$ with an indicator function that tests whether the sub-complex generated by $k+3$ points is connected. The exact value of $\hat{\mu}_{e,k}$ will not be needed anywhere, so we do not attempt to compute it.
Proof of Theorem 4.4.1. The proof follows the same steps as the proof of Theorem 4.3.1.
4.7.5 Crackle - The Gaussian Distribution
In this section we wish to prove Theorem 4.5.1.
Proof of Theorem 4.5.1. From Lemma 4.7.1 we have that
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$
Now, use the change of variables $r \to (\rho^2 + R_n^2)^{1/2}$, $dr = \frac{\rho}{(\rho^2 + R_n^2)^{1/2}}\,d\rho$; then
$$\mathbb{E}\{S_{0,n}\} = s_{d-1} c_g\, n e^{-R_n^2/2} \int_{0}^{\infty} (\rho^2 + R_n^2)^{(d-2)/2}\, \rho e^{-\rho^2/2}\,d\rho = s_{d-1} c_g\, n e^{-R_n^2/2} R_n^{d-2} \int_{0}^{\infty} \left((\rho/R_n)^2 + 1\right)^{(d-2)/2} \rho e^{-\rho^2/2}\,d\rho.$$
The integrand is bounded, and using the DCT we have
$$\lim_{n\to\infty} \left(n e^{-R_n^2/2} R_n^{d-2}\right)^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_g.$$
Taking $R_n = R^{\epsilon}_{0,n} \triangleq \sqrt{2\log n + (d-2+\epsilon)\log\log n}$, we have
$$e^{-R_n^2/2} = n^{-1} (\log n)^{-(d-2+\epsilon)/2},$$
and so
$$\lim_{n\to\infty} n e^{-R_n^2/2} R_n^{d-2} = 0,$$
which implies that
$$\mathbb{E}\{S_{0,n}\} \to 0.$$
Finally, for every $0 \le k \le d-1$,
$$\beta_{k,n} \le S_{0,n}.$$
Therefore,
$$\lim_{n\to\infty} \mathbb{E}\{\beta_{k,n}\} = 0,$$
completing the proof.
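To see the rate at which the bound vanishes under this choice of $R_n$: for $d = 2$ it reduces to $(\log n)^{-\epsilon/2}$, which decays to zero, though very slowly. A quick numeric illustration (not from the thesis; the function name and the choices $d = 2$, $\epsilon = 1/2$ are mine):

```python
import math

def expected_outside(n, d=2, eps=0.5):
    # R_n = sqrt(2 log n + (d - 2 + eps) log log n); for d = 2 the bound
    # n e^{-R_n^2/2} R_n^{d-2} simplifies to (log n)^{-eps/2}
    Rn = math.sqrt(2 * math.log(n) + (d - 2 + eps) * math.log(math.log(n)))
    return n * math.exp(-Rn * Rn / 2) * Rn ** (d - 2)

vals = [expected_outside(n) for n in (10 ** 3, 10 ** 6, 10 ** 12)]
print(vals)   # slowly decreasing toward 0
```

Even at $n = 10^{12}$ the bound is only about $0.44$, illustrating how slowly the Gaussian sample "stops crackling".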
Bibliography
[1] Robert J. Adler. On excursion sets, tube formulas and maxima of random fields. The
Annals of Applied Probability, 10(1):1–74, 2000.
[2] Robert J. Adler, Omer Bobrowski, Matthew S. Borman, Eliran Subag, and Shmuel
Weinberger. Persistent homology for random fields and complexes. Institute of Math-
ematical Statistics Collections, 6:124–143, 2010.
[3] Robert J. Adler and Jonathan E. Taylor. Random fields and geometry. Springer
Monographs in Mathematics. Springer, New York, 2007.
[4] Lior Aronshtam, Nathan Linial, Tomasz Łuczak, and Roy Meshulam. Vanishing of
the top homology of a random complex. arXiv preprint arXiv:1010.1400, 2010.
[5] Richard Arratia, Larry Goldstein, and Louis Gordon. Two moments suffice for Pois-
son approximations: the Chen–Stein method. The Annals of Probability, 17(1):9–25,
1989.
[6] Eric Babson, Christopher Hoffman, and Matthew Kahle. The fundamental group of
random 2-complexes. J. Amer. Math. Soc., 24(1):1–28, 2011.
[7] Yuliy Baryshnikov and Robert Ghrist. Target enumeration via Euler characteristic
integrals. SIAM Journal on Applied Mathematics, 70(3):825–844, 2009.
[8] Yuliy Baryshnikov and Robert Ghrist. Euler integration over definable functions.
Proceedings of the National Academy of Sciences of the United States of America,
107(21):9525–9530, 2010.
[9] Omer Bobrowski and Robert J. Adler. Distance functions, critical points, and topol-
ogy for some random complexes. arXiv:1107.4775, July 2011.
[10] Omer Bobrowski and Matthew Strom Borman. Euler integration of Gaussian random
fields and persistent homology. Journal of Topology and Analysis, 4(1), 2012.
[11] Karol Borsuk. On the imbedding of systems of compacta in simplicial complexes.
Fund. Math, 35(217-234):5, 1948.
[12] Peter Bubenik, Gunnar Carlsson, Peter T. Kim, and Zhiming Luo. Statistical topol-
ogy via Morse theory, persistence and nonparametric estimation. Contemporary
Mathematics, 516:75–92, 2010. arXiv:0908.3668.
[13] Peter Bubenik and Peter T. Kim. A statistical approach to persistent homology.
Homology, Homotopy and Applications, 9(2):337–362, 2007.
[14] Jin Cao. The geometry of correlation fields with an application to functional con-
nectivity of the brain. The Annals of Applied Probability, 9(4):1021–1057, November
1999.
[15] Gunnar Carlsson. Topology and data. American Mathematical Society. Bulletin.
New Series, 46(2):255–308, 2009.
[16] Gunnar Carlsson and Vin de Silva. Plex: MATLAB software for
computing persistent homology of finite simplicial complexes, 2006.
http://comptop.stanford.edu/programs/plex.html.
[17] Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, and Afra Zomorodian. On the
local behavior of spaces of natural images. International Journal of Computer Vision,
76(1):1–12, January 2008.
[18] Frédéric Chazal, David Cohen-Steiner, and Quentin Mérigot. Geometric inference
for measures based on distance functions. 2010.
[19] Moo K. Chung, Peter Bubenik, and Peter T. Kim. Persistence diagrams of cortical
surface data. In Information Processing in Medical Imaging, pages 386–397, 2009.
[20] Daniel C. Cohen, Michael Farber, and Thomas Kappeler. The homotopical dimension
of random 2-complexes. arXiv preprint arXiv:1005.3383, 2010.
[21] Justin Curry, Robert Ghrist, and Michael Robinson. Euler calculus with applications
to signals and sensing. arXiv:1202.0275, January 2012.
[22] Vin de Silva and Robert Ghrist. Coverage in sensor networks via persistent homology.
Algebraic & Geometric Topology, 7:339–358, 2007.
[23] Herbert Edelsbrunner and John Harer. Persistent homology - a survey. In Surveys on
discrete and computational geometry, volume 453 of Contemp. Math., pages 257–282.
Amer. Math. Soc., Providence, RI, 2008.
[24] Vladimir Gershkovich and Hyam Rubinstein. Morse theory for min-type functions.
The Asian Journal of Mathematics, 1(4):696–715, 1997.
[25] Robert Ghrist. Barcodes: the persistent topology of data. American Mathematical
Society. Bulletin. New Series, 45(1):61–75, 2008.
[26] Robert Ghrist. Applied algebraic topology & sensor networks, 2010.
http://www.math.upenn.edu/˜ghrist/preprints/ATSN.pdf.
[27] Robert Ghrist and Michael Robinson. Euler-Bessel and Euler-Fourier transforms.
Inverse Problems, 27(12):124006, December 2011.
[28] Allen Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
[29] Matthew Kahle. Topology of random clique complexes. Discrete Mathematics,
309(6):1658–1671, 2009.
[30] Matthew Kahle. Random geometric complexes. Discrete & Computational Geome-
try. An International Journal of Mathematics and Computer Science, 45(3):553–573,
2011.
[31] Matthew Kahle and Elizabeth Meckes. Limit theorems for Betti numbers of random
simplicial complexes. arXiv preprint arXiv:1009.4130, September 2010.
[32] Jiří Matoušek. Using the Borsuk-Ulam theorem: lectures on topological methods in
combinatorics and geometry. Springer-Verlag, 2003.
[33] Roy Meshulam and Nathan Wallach. Homological connectivity of random k-
dimensional complexes. Random Structures & Algorithms, 34(3):408–417, 2009.
[34] Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space
of persistence diagrams. Inverse Problems, 27(12):124007, December 2011.
[35] John W. Milnor. Morse theory. Based on lecture notes by M. Spivak and R. Wells.
Annals of Mathematics Studies, No. 51. Princeton University Press, Princeton, N.J.,
1963.
[36] Marston Morse and Stewart Scott Cairns. Critical point theory in global analysis and
differential topology: an introduction. Academic Press, 1969.
[37] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of
submanifolds with high confidence from random samples. Discrete & Computational
Geometry. An International Journal of Mathematics and Computer Science, 39(1-
3):419–441, 2008.
[38] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. A topological view of unsu-
pervised learning from noisy data. SIAM Journal on Computing, 40(3):646, 2011.
[39] Mathew D. Penrose. Random geometric graphs, volume 5 of Oxford Studies in Prob-
ability. Oxford University Press, Oxford, 2003.
[40] Mathew D. Penrose and Joseph E. Yukich. Limit theory for point processes in
manifolds. arXiv preprint arXiv:1104.0914, April 2011.
[41] Nicholas Pippenger and Kristin Schleich. Topological characteristics of random tri-
angulated surfaces. Random Structures & Algorithms, 28(3):247–288, May 2006.
[42] Dietrich Stoyan, Wilfried S. Kendall, and Joseph Mecke. Stochastic geometry and its
applications. Wiley Series in Probability and Mathematical Statistics: Applied Prob-
ability and Statistics. John Wiley & Sons Ltd., Chichester, 1987. With a foreword
by D. G. Kendall.
[43] Jonathan E. Taylor. A Gaussian kinematic formula. The Annals of Probability,
34(1):122–158, 2006.
[44] Jonathan E. Taylor and Robert J. Adler. Euler characteristics for Gaussian fields on
manifolds. The Annals of Probability, 31(2):533–563, 2003.
[45] Jonathan E. Taylor, Akimichi Takemura, and Robert J. Adler. Validity of the ex-
pected Euler characteristic heuristic. The Annals of Probability, 33(4):1362–1396,
2005.
[46] Jonathan E. Taylor, Keith J. Worsley, and Frederic Gosselin. Maxima of discretely
sampled random fields, with an application to ‘bubbles’. Biometrika, 94(1):1–18,
March 2007.
[47] James W. Vick. Homology theory. Academic Press, New York, 1973. An introduction
to algebraic topology, Pure and Applied Mathematics, Vol. 53.
[48] O. Ya. Viro. Some integral calculus based on Euler characteristic. In Topology and
geometry – Rohlin Seminar, volume 1346 of Lecture Notes in Math., pages 127–138.
Springer, Berlin, 1988.
[49] Keith J. Worsley. Boundary corrections for the expected Euler characteristic of excur-
sion sets of random fields, with an application to astrophysics. Advances in Applied
Probability, pages 943–959, 1995.
[50] Keith J. Worsley. Estimating the number of peaks in a random field using the Had-
wiger characteristic of excursion sets, with applications to medical images. The An-
nals of Statistics, 23(2):640–669, April 1995.
Acknowledgements

First, I wish to thank my advisor, Professor Robert Adler, for the devoted guidance, help, and patience all along the way; for directing and developing me professionally while placing full confidence in me and granting me a sense of research independence; for the encouragement, the push forward, the doors opened, and the giving above and beyond; for the warm attitude, and for meetings each of which was an experience in its own right. But beyond all of this, for showing me that it is possible to be a highly successful and professional academic while keeping both feet on the ground, maintaining modesty, an excellent sense of humor, and a genuine ability to appreciate others. My appreciation and gratitude extend far beyond this single paragraph. Thank you.

Thanks to Professor Shmuel Weinberger of the University of Chicago, for hosting me at the beginning of the doctorate and for the fruitful collaboration that has continued throughout. Thanks also to Matthew Strom Borman of the University of Chicago, with whom the first part of this thesis is joint work.

Thanks to Professor Ron Meir of the Technion, who advised me during my Master's degree and continued to support me afterwards; for fascinating conversations, the friendly attitude, and an always-open door.

Thanks to the many friends I gained during this period. Special thanks to Aran Bergman, Daniel Sigalov, Hadas Benisty, Zigi (Isask'har) Walter, and Ronen Talmon; thank you for the support and encouragement in the harder moments and the easier ones, and for making the Technion a place that was great fun to come to and that is now very hard to leave.

I wish to thank my very dear family. To my parents, Dov and Lili, for the support, the encouragement, and the unconditional giving that allowed me to reach this point under the best conditions one could ask for; and above all, for accepting the paths I choose for myself in life with understanding and much love. To my siblings (in the broad sense), Udi, Barak, Hila, Yael, Nitzan, and Keren, for the close friendship and support. Last but not least, to my nephews and nieces, Or, Eshkar, Raz, Shahar, Sara, Tair, Talia, Yonatan, Livnat, Inbar, Tzur, and Tamar: you are like little chargers that give me the energy to keep going. Thank you all. I love you.

Finally, I wish to dedicate this work to my grandparents, Aharon (of blessed memory) and Esther Landau. To my grandfather, who in his quiet and special way was an immense source of inspiration in the aspiration for knowledge and in a clear-eyed view of life; and to my grandmother, who from the very beginning was there at my side, took an interest in every detail, and encouraged me onward. I miss you.

I thank the Technion and the Adams Fellowship Program of the Israel Academy of Sciences and Humanities for the generous financial support of my studies.
Abstract

Algebraic topology is concerned with characterizing topological spaces by algebraic means. By assigning algebraic structures (for example, homology groups) to topological spaces, one can classify them into classes of "similar" spaces, study their qualitative features, and study the behavior of functions between different spaces. The field known as "applied algebraic topology" focuses on combining tools from algebraic topology to characterize the surfaces and functions that appear in various engineering problems, to create tools for data analysis, and to reconstruct manifolds (manifold learning). This field has attracted considerable interest in recent years. Nevertheless, although most of the problems in this field involve random data sources, the probabilistic foundations on which the tools developed in this framework rest are still only at a preliminary stage. The main goal of the research in this thesis is to investigate problems from this field and to provide them with a deep and complete probabilistic foundation.

The thesis is divided into three main chapters. The first chapter deals with the persistent homology of Gaussian random fields, the second with the limiting behavior of random geometric complexes, and the last with the "crackle" phenomenon for distributions with unbounded support. We note that while all three chapters deal with topics combining probability and algebraic topology, the research in the first chapter has no direct connection to that in the other two.
1. Persistent homology of Gaussian random fields

Random fields are random processes defined over a parameter space M of dimension greater than one. A useful example: M may represent the three-dimensional structure of the brain, or a two-dimensional pattern on the cortex, and the measurements produced by various imaging devices are random processes on these spaces. Since the domain on which the process is defined is multi-dimensional, the graph generated by such a random process is a random space or random manifold (in contrast to the one-dimensional curve generated by simple random processes). Consequently, interesting questions arise concerning the geometry and topology of these graphs.

To date, the main tools for analyzing Gaussian random fields have relied mostly on differential geometry. In this research we aim to study the topological (as opposed to geometric) features of sub-level sets, and in particular the persistent homology they generate. In a nutshell, the persistent homology of a function f tracks the changes in the homology of sub-level sets of the form f⁻¹((−∞, u]) as the threshold u grows. As u increases, the sub-level sets grow; in this process, homology elements (that is, "holes" and connected components) are created and destroyed, and the persistent homology records this process. The theory of persistent homology is relatively new, and prior to this work no result had been proved concerning the persistent homology of any random field; the result presented in this thesis is the first of its kind. We define the notion of the "Euler characteristic of the persistent homology" and compute the expectation of this quantity for a wide class of Gaussian random fields. Beyond the topological result, this research yields a surprising conclusion concerning the critical points of these fields: as a by-product of the study of persistent homology, we discovered that for Gaussian fields on closed manifolds, the "alternating" sum of the critical values grows not like the volume of the manifold but like a one-dimensional measure of it, independently of the dimension of the manifold.
2. The topology of random geometric complexes

The main motivation for this research comes from the field of manifold learning. Let M ⊂ ℝᵈ be an unknown closed manifold whose topological features we wish to recover from a set of points X₁, …, Xₙ sampled at random from the manifold. The Betti numbers represent the numbers of connected components and "holes" of topological spaces. Under suitable conditions, the Betti numbers of the manifold can be recovered by computing the Betti numbers of the union of d-dimensional balls U_r = ⋃ᵢ₌₁ⁿ B_r(Xᵢ) of radius r centered at the sample points. The main problem with this method is its sensitivity to the choice of the radius r. Several solutions to this problem have been proposed. In [37, 38], sufficient conditions on the values of n and r are given under which the Betti numbers are recovered correctly with probability as high as desired. Another solution is to compute the persistent homology of the filtration {U_r}_{r≥0} and to identify the homology elements that persist over a substantial range of radii, under the assumption that these elements represent homology elements of the original manifold.

In this research we studied the following problem, related to the motivation above. Let 𝒳ₙ = {X₁, …, Xₙ} be a set of independent random points in ℝᵈ with a known density function f. We wish to study the Betti numbers of the union of balls U_{rₙ} in the limit where n → ∞ and rₙ → 0. The problem can be simplified by considering the Čech complex C(𝒳ₙ, rₙ) (a simplicial complex containing a k-dimensional simplex for every collection of k+1 balls with non-empty intersection). By the Nerve theorem, the spaces U_{rₙ} and C(𝒳ₙ, rₙ) are homotopy equivalent, and hence their Betti numbers coincide. In [30, 31], the Betti numbers of C(𝒳ₙ, rₙ) were studied directly. Under the assumptions above, the limiting behavior of the complex splits into three different regimes. When nrₙᵈ → 0 (the subcritical regime), the complex consists of many small connected components and very few holes. In the critical regime, nrₙᵈ → λ ∈ (0, ∞), the connectivity of the complex is high, and holes of all dimensions can be found. In the supercritical regime, nrₙᵈ → ∞, the complex contains a very small number of components and holes.

Computing the Betti numbers directly is feasible mainly in the subcritical regime, and becomes significantly more complicated in the other regimes. In this research we therefore tried to study them by a different route, using distance functions. Let dₙ : ℝᵈ → ℝ be the distance function from the random set 𝒳ₙ, defined by dₙ(x) = min_{1≤k≤n} ‖x − X_k‖. Note that dₙ⁻¹([0, r]) = U_r; that is, the sub-level sets of the distance function are exactly the unions of balls around the random points. From Morse theory we know that the Betti numbers of the sub-level sets change at the critical levels of the function. Therefore, if we know how the critical points of the distance function dₙ behave, we can learn from this about the Betti numbers of C(𝒳ₙ, rₙ). Our results show that the limiting behavior of the critical points of the distance function likewise splits into three different regimes, according to the limit of nrₙᵈ. In this work we present limit theorems for the numbers of critical points (classified by Morse index) in each of the regimes. We then relate our results to the known results on the Betti numbers of Čech complexes, and show how the study of critical points extends our knowledge of the topology of limiting Čech complexes.
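To make the discussion of Betti numbers concrete, here is a minimal Python illustration (my own sketch, not part of the thesis): it computes β₀, the number of connected components, for the complex built on random points in the unit square, using the fact that two balls of radius r intersect iff their centers are at distance at most 2r, together with a union-find structure. As r grows, components can only merge, so β₀ is non-increasing in r.

```python
import random

def betti0(points, r):
    # beta_0 of the union of balls of radius r: the number of connected
    # components of the graph linking centers at distance <= 2r
    # (two balls of radius r intersect iff centers are <= 2r apart).
    n = len(points)
    parent = list(range(n))

    def find(i):
        # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= (2 * r) ** 2:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return sum(1 for i in range(n) if find(i) == i)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
counts = [betti0(pts, r) for r in (0.001, 0.02, 0.1)]
print(counts)   # non-increasing: components merge as r grows
```

In the subcritical regime (tiny r) almost every point is its own component, matching the description above; at larger r the complex connects up.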
3. The crackle of noise samples

In this chapter we study a problem similar to that of the previous chapter, but in the setting where the radius of the balls is kept fixed, i.e. rₙ ≡ 1. When the distribution from which the points are sampled has a bounded support S, then for a sufficiently large number of points we obtain
⋃_{k=1}ⁿ B₁(X_k) ≈ Tube(S, 1) ≜ {x ∈ ℝᵈ : dist(x, S) ≤ 1}.
This case is less interesting and we shall not deal with it. When the support is unbounded, however, interesting phenomena occur.

We examine distributions whose support is the entire Euclidean space ℝᵈ. In these cases there is a region containing such a high concentration of points that the union of the balls around them covers it entirely. We call this region the "core" of the distribution. As the number of points (n) grows, the core expands. Outside the core one finds a large number of points, but these points are not dense enough, so that the union of balls (or the Čech complex) in these regions exhibits very many connected components with high Betti numbers (a large number of holes). We call this phenomenon "crackle".

The characteristics of the crackle phenomenon depend very strongly on the choice of the distribution from which the samples are drawn. We examine three representative distributions: the power-law distribution, the exponential distribution, and the Gaussian distribution. These distributions are spherically symmetric, and therefore their core is a ball centered at the origin. In the first stage we examine the size of the core of each of the distributions (as a function of n). We then examine how the Betti numbers of the union of balls (or the complex) behave outside the core. For the power-law distribution, as well as for the exponential distribution, the Euclidean space divides into "layers", with different Betti numbers appearing in each layer. The Gaussian distribution is fundamentally different: for this distribution the crackle phenomenon does not occur. We show that for the Gaussian distribution very few points are sampled outside the core, and for large n the union of balls is approximately a single ball, containing no holes at all.
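As a concrete illustration of the crackle scaling (my own computation, not from the thesis): take the planar power-law density f_p(x) = c_p/(1 + ‖x‖⁴) (d = 2, α = 4). Its radial CDF is F(r) = (2/π) arctan(r²), so the expected number of the n sample points lying beyond radius R is n(1 − F(R)). Evaluating this at R = √n shows that a non-vanishing expected number of points, about 2/π, always lies beyond radius √n, no matter how large n is: this is the persistent "dust" outside the core.

```python
import math

def expected_beyond(n, R):
    # d = 2, alpha = 4: the radial CDF of f_p is F(r) = (2/pi) arctan(r^2),
    # so the expected number of the n sample points lying beyond radius R
    # is n * (1 - F(R)).
    return n * (1.0 - (2.0 / math.pi) * math.atan(R * R))

for n in (10 ** 2, 10 ** 4, 10 ** 8):
    print(n, expected_beyond(n, math.sqrt(n)))   # -> 2/pi ~ 0.6366
```

A constant expected number of outliers at radius √n, each typically isolated, is exactly the mechanism behind the layered Betti-number picture described above.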