Algebraic Topology of Random Fields and Complexes
Research Thesis
As Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy
Omer Bobrowski
Submitted to the Senate of the Technion—Israel Institute of Technology
Tammuz 5772, Haifa, July 2012
THE RESEARCH THESIS WAS DONE UNDER THE SUPERVISION OF
PROFESSOR ROBERT J. ADLER IN THE DEPARTMENT OF ELECTRICAL
ENGINEERING.
Acknowledgement
First and foremost, I would like to express my deepest gratitude to my advisor, Professor
Robert Adler, for his amazing dedication in guiding and inspiring me through my PhD,
while showing full confidence in me and letting me build my academic independence; for
always encouraging, opening every possible door for me, and giving me much more than I
could have ever expected; for making each and every meeting truly enjoyable; but above
all this - for showing me that it is possible to be a highly professional and successful
scientist, while keeping both feet on the ground and maintaining modesty, a great sense
of humor, and a true capability to appreciate others. My gratitude extends way beyond
this single paragraph. I owe you so much. Thank you.
I would like to thank Professor Shmuel Weinberger from the University of Chicago for
hosting me at the early stages of my PhD, and for the long-lasting and fruitful collaboration. I would also like to thank Matthew Strom Borman from the University of Chicago
for the joint work on the first part of this thesis.
I would like to thank Professor Ron Meir from the Technion, my advisor during my Masters, for his support both during my Masters studies and after, and for many fascinating and friendly chats.
I met wonderful people during my many years in the Technion. Special thanks to Aran
Bergman, Daniel Sigalov, Hadas Benisty, Ronen Talmon and Zigi (Isask’har) Walter for
being such good friends, for being there when I needed you, and for making the Technion
a warm and fun place to come to.
I would like to thank my dearest family: My parents Dov and Lili, for believing in me
and providing me with everything I needed to reach this point, but mostly for accepting
my choices in life in the most understanding and loving way; my sisters and brothers
(in the wide sense) Barak, Hila, Keren, Nitzan, Udi and Yael, for your support and
friendship; and last but not least, my nieces and nephews Eshkar, Inbar, Livnat, Or, Raz,
Sara, Shahar, Tair, Talia, Tamar, Yonatan and Zur. You are the batteries that keep my
energies up. Thank you all. I love you.
Finally, I would like to dedicate this thesis to my grandmother Ester Landoi and my
late grandfather Aharon Landoi. To grandpa, who was a true inspiration for the pursuit
of knowledge and for an amazingly balanced perspective on life; and to grandma, who
has been there from the very early stages, always supported me, showed interest in what
I do, and pushed me forward. I really miss you.
The generous financial help of the Technion and the Adams Fellowship Program of the
Israel Academy of Sciences and Humanities is gratefully acknowledged.
Contents

1 Introduction
  1.1 Background - Algebraic Topology
    1.1.1 Homology
    1.1.2 Homotopy Equivalence
    1.1.3 Morse Theory
  1.2 Persistent Homology of Gaussian Random Fields
  1.3 The Topology of Random Geometric Complexes
  1.4 Noise Crackles
2 Persistent Homology of Gaussian Random Fields
  2.1 Background
    2.1.1 Gaussian Random Fields
    2.1.2 The Geometry of Gaussian Random Fields
    2.1.3 Persistent Homology
    2.1.4 Euler Integration
  2.2 Redefining the Euler Integral
    2.2.1 The Euler Integral and Morse Theory
    2.2.2 The Euler Integral and Persistent Homology
  2.3 The Euler Integral of Gaussian Random Fields
    2.3.1 Real Valued Fields
    2.3.2 Vector Valued Fields
  2.4 Persistent Homology of Gaussian Random Fields
  2.5 Weighted Sum of Critical Values
  2.6 Towards Applications
  2.7 Summary and Future Work
3 The Topology of Random Geometric Complexes
  3.1 Background
    3.1.1 Geometric Complexes
    3.1.2 Motivation and Previous Work
  3.2 The Distance Function
    3.2.1 Definition and Motivation
    3.2.2 Critical Points of the Distance Function
  3.3 Limit Theorems for the Distance Function
    3.3.1 The Subcritical Range (nr_n^d → 0)
    3.3.2 The Critical and Supercritical Ranges (nr_n^d → λ ∈ (0, ∞])
  3.4 The Topology of Random Cech Complexes
    3.4.1 Critical Points and Betti Numbers
    3.4.2 The Limiting Behavior of the Cech Complex
  3.5 Summary and Future Work
    3.5.1 The Supercritical Phase
    3.5.2 The Distance Function on Closed Manifolds
  3.6 Proofs
    3.6.1 Some Notation and Elementary Considerations
    3.6.2 Means for the Subcritical Range (nr_n^d → 0)
    3.6.3 Variances and Limit Distributions for the Subcritical Range
    3.6.4 The Critical and Supercritical Ranges (nr_n^d → λ ∈ (0, ∞])
    3.6.5 Asymptotic Means
    3.6.6 Asymptotic Variance - Poisson Case
    3.6.7 CLT - Poisson Case
    3.6.8 CLT - Random Sample Case
    3.6.9 Euler Characteristic Results
  3.A Palm Theory for Poisson Processes
  3.B Stein's Method
  3.C De-Poissonization
4 Noise Crackles
  4.1 Introduction
  4.2 The Core of Distributions with Unbounded Support
  4.3 How Power-Law Noise Crackles
  4.4 How Exponential Noise Crackles
  4.5 Gaussian Noise Does Not Crackle
  4.6 Summary and Future Work
  4.7 Proofs
    4.7.1 The Core
    4.7.2 Crackle - Notation and General Lemmas
    4.7.3 Crackle - The Power Law Distribution
    4.7.4 Crackle - The Exponential Distribution
    4.7.5 Crackle - The Gaussian Distribution
Bibliography
List of Figures

1.1 The first homology group of the torus
1.2 Morse theory for the height function on the torus
2.1 Capturing the homology of an annulus
2.2 The barcode of a Rips complex
2.3 Barcodes for the excursion sets of a function
3.1 Simplicial complexes in R2
3.2 The Cech and Rips complexes
3.3 Critical points of a distance function in R2
3.4 Generating a critical point of index 2 in R2
3.5 The γk(λ) function
4.1 Crackle layers
Abstract
Algebraic topology studies the topology of spaces using algebraic machinery. One of its main strengths lies in the fact that assigning algebraic structures (e.g. homology groups) to topological spaces can be used to classify them into classes of “similar” (e.g. homotopy equivalent) spaces, to study their properties, and to study the behavior of mappings between them. The field of ‘Applied Algebraic Topology’ focuses on applying algebraic topology methods to study features of surfaces and functions arising in engineering scenarios, as well as for data analysis and manifold learning. This field has generated considerable interest over the past few years. However, despite the fact that many of the problems in this area involve random data collection, and thus randomness, its probabilistic foundations are still at a very preliminary stage. The main goal of this thesis is to explore such problems, and to help supply at least some of them with rigorous probabilistic statements. We focus on two different probabilistic setups that generate intricate topological spaces, which we are interested in studying using the methods of algebraic topology.
Random fields are stochastic processes defined over parameter spaces of dimension greater than one. For example, consider a noisy image as a random field on [0,1] × [0,1] ⊂ R2. As the domain of the process is of dimension greater than one, the graph of the process is typically a (random) manifold, rather than a simple one-dimensional line. Thus, many intriguing probabilistic questions on the geometrical and topological structure of the image arise.
A simplicial complex is a collection of vertices, edges, triangles, tetrahedra, and simplexes of higher dimension, satisfying a few basic rules, so one can think of it as a generalization of a graph. A geometric complex is a simplicial complex in which, to decide whether to include a k-dimensional simplex, we need to verify whether its k + 1 vertices satisfy a certain geometrical property. Choosing the vertices of a geometric complex at random yields a random topological space with many interesting features.
In the first part of the thesis we study the persistent homology of Gaussian random
fields, and compute its expected Euler characteristic. The results we present also have
surprising and interesting consequences related to the critical points of Gaussian fields.
In the second part we focus on the limiting behavior of the Betti numbers of random
geometric complexes, as the number of vertices goes to infinity. We study different ways
to construct a geometric complex, each resulting in a completely different structure.
Notation

(·)⊤      transpose operator
|·|       absolute value / size of a set
1{·}      indicator function
P(·)      probability
E{·}      expectation
Var(·)    variance
‖·‖       Euclidean norm
Sd−1      a unit (d−1)-sphere in Rd
sd−1      the volume of Sd−1
ωd−1      the volume of a unit ball in Rd
Hk        the k-th homology group
βk        the k-th Betti number
χ         the Euler characteristic
PH        persistent homology
Lk        the k-th Lipschitz-Killing curvature
Mk        the k-th Gaussian Minkowski functional
C         a Cech complex
Br(x)     a ball with radius r centered at x
N(·,·)    the Gaussian distribution
φ(x)      the standard Gaussian density function
Φ(x)      the standard Gaussian cumulative distribution function
dP(·)     the distance function from a set of points P
Chapter 1
Introduction
The field of algebraic topology focuses on studying the topology of spaces using algebraic
machinery. Assigning algebraic structures (e.g. homology, cohomology and homotopy groups) to topological spaces can be used to classify them into classes of “similar” (e.g. homotopy equivalent) spaces, study their properties, and study the behavior of mappings between spaces.
Over the past few years there has been a very interesting and exciting effort to establish a new field called ‘Applied Algebraic Topology’. This field focuses on applying
the methods of algebraic topology to study features of surfaces and functions arising in
engineering scenarios, as well as for data analysis and manifold learning. Although at this
point sophisticated applications are still few and mostly at a theoretical stage, there is
a growing feeling that the gap between theory and practice is closing. However, despite
the fact that many of the problems in this area involve random data collection, and thus
randomness, its probabilistic foundations are still at a very preliminary stage.
The main goal of this research is twofold. On the one hand, we are interested in using
probability theory to study concepts and methods from applied algebraic topology in cases
where the data being analyzed are random. This study should significantly contribute
to the development of powerful applied algebraic topology tools. On the other hand, we
believe that our understanding of even well studied stochastic processes, such as Gaussian
random fields, can be significantly enhanced by considering a topological point of view.
In this introduction we are going to give a very brief and sketchy introduction to some
basic notions of algebraic topology. A concise yet very clear introduction can be found
in [13, 26], while [28, 47] are good examples of a thorough coverage of homology theory.
For the details behind Morse theory, see [35]. Once we cover the key concepts in algebraic
topology which are relevant for the current work, we shall discuss the three main topics
dealt with in this thesis, and the main results in each of them.
1.1 Background - Algebraic Topology
The field of algebraic topology is extremely wide, and involves many interesting concepts
and deep theorems. In the following sections we wish to focus on two main topics which are
relevant to the current research - Homology Theory and Morse Theory. We shall describe
each of them in a rather intuitive way, avoiding rigorous definitions and theorems, but at a
level which we believe should suffice for the purposes of understanding the motivation and
ideas in the current work. We also briefly describe the notion of Homotopy Equivalence,
since we will use this term repeatedly throughout this work.
1.1.1 Homology
Let X be a topological space. The homology of X is a set of abelian groups {Hk(X)}∞k=0, called ‘homology groups’. The zeroth homology H0(X) is generated by elements that represent connected components of X. For example, if X has three connected components, then H0(X) ∼= Z ⊕ Z ⊕ Z (where ∼= denotes group isomorphism), and each of the three generators of this group corresponds to a different connected component of X. For k ≥ 1, the k-th homology group Hk(X) is generated by elements representing k-dimensional “holes” in X. Without giving precise definitions, a k-dimensional hole should be thought of as the result of taking the (empty) boundary of a (k + 1)-dimensional body. For example, if X = S1, the unit circle in R2, then H1(X) ∼= Z; if X = S2, the unit sphere in R3, then H2(X) ∼= Z; and in general, if X = Sn is an n-dimensional sphere, then

Hk(X) ∼= Z for k = 0, n, and Hk(X) ∼= {0} otherwise.
A slightly more interesting example is given by the 2-dimensional torus T2 = S1 × S1 (see Figure 1.1). In this case

Hk(T2) ∼= Z for k = 0, 2;  Z ⊕ Z for k = 1;  {0} otherwise.   (1.1.1)
It is clear that T2 has a single connected component as well as a single 2-dimensional hole. However, we can find infinitely many 1-dimensional holes on the surface of the empty torus. The reason we claim that there are only two 1-dimensional holes in this case is that we consider only ‘equivalence classes’ of holes, so that if we can continuously deform one hole into the other, they are considered the same object in homology. Thus, as can be seen in Figure 1.1, we have only two equivalence classes of holes in T2, and therefore H1(T2) ∼= Z ⊕ Z.
Figure 1.1: The first homology group of the torus, H1(T2). (a) All the blue loops correspond to the same equivalence class in H1(T2), since we can continuously deform one loop into the other. (b) The red loops also correspond to a single generator of H1(T2). However, they do not belong to the same equivalence class as the blue ones, since there is no way to deform a blue loop into a red one (without leaving the torus).
Note that, in general, if X is of dimension N, then Hk(X) ∼= {0} for k > N. The rank of Hk(X), denoted by βk, is called the k-th Betti number. Thus, for k ≥ 1, βk is the number of k-dimensional holes in X, while β0 is the number of connected components.
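As a toy illustration of the k = 0 case (a sketch added here, not part of the thesis; the function name and input format are ours): β0 of a finite graph is its number of connected components, which a single union-find pass over the edges computes.

```python
# Hypothetical illustration: beta_0 of a graph equals its number of
# connected components, computable with union-find.

def betti_0(num_vertices, edges):
    """Count connected components (beta_0) of a graph given as an edge list."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1  # each merging edge kills one component
    return components

# Two triangles plus an isolated vertex: beta_0 = 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(betti_0(7, edges))  # -> 3
```

Higher Betti numbers require tracking boundary matrices of the complex, but the degree-zero case already conveys the idea of homology counting features.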
1.1.2 Homotopy Equivalence
Another term we will use often is ‘homotopy equivalence’. Let X, Y be topological spaces,
and let f0, f1 : X → Y be continuous functions. A homotopy between f0 and f1 is a
continuous function H : X × [0, 1] → Y such that H(·, 0) ≡ f0 and H(·, 1) ≡ f1. If
there exists a homotopy between f0 and f1 we say that these functions are homotopic,
and denote this by f0 ≃ f1. Two spaces X, Y are called ‘homotopy equivalent’ (denoted
X ≃ Y ) if there exist continuous functions f : X → Y and g : Y → X such that
g ◦ f ≃ 1X and f ◦ g ≃ 1Y where 1X ,1Y are the identity mappings on X, Y respectively.
Informally, X, Y are homotopy equivalent if there is a continuous deformation from one
space to the other, which is not necessarily invertible. For example, a ball is homotopy
equivalent to a single point, and an annulus is homotopy equivalent to a circle, even
though there exists no one-to-one mapping between the spaces. Note that if X, Y are
homeomorphic, then they are also homotopy equivalent, but the converse is not true (as
the previous examples show). For our purposes, the key property of homotopy equivalent
spaces is that their homology groups are isomorphic, i.e. Hk(X) ∼= Hk(Y ) for all k ≥ 0.
Two homotopy equivalent spaces are also said to have the same ‘homotopy type’. A space
that has the homotopy type of a point is called ‘contractible’.
1.1.3 Morse Theory
The study of homology is strongly connected to the study of critical points of real valued
functions. The link between them is called Morse theory, and we shall describe it briefly.
Let M be a smooth manifold embedded in Rn, and let f : M → R be a C2 function.
A point p ∈ M is called a critical point of f if ∇f(p) = 0, and the number f(p) is called
a critical value of f . A critical point p is called non-degenerate if the Hessian Hf(p) is
non-singular. In that case, the Morse index of f at p, denoted by µ(p), is the number of negative eigenvalues of Hf(p). A C2 function f is a Morse function if all its critical points are non-degenerate, and its critical levels are distinct.
The main idea of Morse theory is as follows. Suppose that M is a closed manifold
(i.e. a compact manifold without a boundary), and let f : M → R be a Morse function.
Denote

Mρ := f−1((−∞, ρ]) = {x ∈ M : f(x) ≤ ρ} ⊂ M

(the sublevel sets of f). If there are no critical levels in (a, b], then Ma and Mb are homotopy equivalent, and in particular have the same homology. Next, suppose that p is a critical point of f with Morse index k, and let v = f(p) be the critical value at p. Then the homology of Mρ changes at v in the following way. For small enough ε, the homology of Mv+ε is obtained from the homology of Mv−ε by either adding a generator to Hk (increasing βk by one) or removing a generator of Hk−1 (decreasing βk−1 by one). In other words, as we pass a critical level, either a new k-dimensional hole is formed, or an existing (k − 1)-dimensional hole is terminated (filled up). Consequently, the change in the Euler characteristic (described in Section 2.1.4) is always ±1.
Figure 1.2 presents a classic visual example of how Morse theory works. Take the torus T2 = S1 × S1, as depicted there. Let h : T2 → R be the function measuring the height of each point p ∈ T2, and consider the filtration of sublevel sets {Mρ}ρ. For ρ < v1 we have Mρ = ∅, and therefore Hk(Mρ) ∼= {0} for all k ≥ 0. At the level ρ = v1 we have a minimum point, i.e. a critical point of index 0. Indeed, as we cross this level we reach Mρ1 (v1 < ρ1 < v2), in which a new connected component appears, and thus H0(Mρ1) ∼= Z. At the level v2 we have a saddle point, or a critical point with index 1. As we cross this level, we reach Mρ2 (v2 < ρ2 < v3), where a 1-dimensional hole shows up, and so H1(Mρ2) ∼= Z. Similarly, v3 adds another generator to H1, so that H1(Mρ3) ∼= Z ⊕ Z. Finally, at level v4 we have a maximum point, or a critical point of index 2. Once we cross this level the surface of the torus is completed, introducing a 2-dimensional hole, and thus H2(Mρ4) ∼= Z. For every ρ > v4 we have Mρ = Mρ4 = T2, so there are no more changes to the sublevel sets, and indeed at the end of this process we retrieve the homology of T2 (see (1.1.1)).
1.2 Persistent Homology of Gaussian Random Fields
In this section we describe the first main topic in this thesis, which is treated in detail
in Chapter 2. Random fields are stochastic processes defined over a parameter space
X of dimension greater than one. For example, X could be a 3-dimensional brain or a
Figure 1.2: Morse theory for the height function h : T 2 → R on the torus. The red crosses mark
the critical points of h, and v1 < v2 < v3 < v4 are the critical levels with Morse index 0, 1, 1, 2,
respectively. We present four sublevel sets, each demonstrating a single change in the homology
as we cross a critical level.
2-dimensional cortical surface, examples which have been of significant practical importance. As the domain of the process is of dimension greater than one, the graph of the process is typically a (random) manifold, rather than a simple one-dimensional line. Thus, many intriguing probabilistic questions on the geometrical and topological structure of such fields arise.
As for random processes on the real line, the distribution of a random field is determined by the multidimensional distribution of any finite collection of elements f(x1), . . . , f(xn), xi ∈ X. A Gaussian random field is a random field where any finite collection of elements f(x1), . . . , f(xn) has a multidimensional Gaussian distribution. Let f : X → Rd be a Gaussian random field. We define its mean value function m : X → Rd by

m(x) = E {f(x)} ,  x ∈ X,

and the covariance function C : X × X → Rd×d by

C(x, y) = E{(f(x) − m(x))(f(y) − m(y))⊤},  x, y ∈ X,

where ⊤ denotes the transpose operator, and we write our vectors as columns. As for Gaussian processes, the distribution of a Gaussian random field is completely determined by these two functions. For more details on Gaussian random fields, see Section 2.1.1.
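A hypothetical numerical sketch of this definition (names and parameters are ours, not the thesis's machinery): a zero-mean, unit-variance Gaussian field on a grid in [0,1]² can be sampled by factoring a chosen covariance matrix, here the squared-exponential covariance C(x, y) = exp(−‖x − y‖²/(2s²)).

```python
# Illustrative only: sample a stationary, zero-mean, unit-variance Gaussian
# random field on an n x n grid via an eigendecomposition of its covariance.
import numpy as np

def sample_gaussian_field(n=20, s=0.2, seed=0):
    rng = np.random.default_rng(seed)
    xs = np.linspace(0.0, 1.0, n)
    grid = np.array([(x, y) for x in xs for y in xs])       # n^2 points in [0,1]^2
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    C = np.exp(-d2 / (2.0 * s ** 2))                        # covariance matrix
    w, V = np.linalg.eigh(C)                                # C is symmetric PSD
    A = V * np.sqrt(np.clip(w, 0.0, None))                  # A @ A.T == C (up to fp)
    f = A @ rng.standard_normal(len(grid))                  # f ~ N(0, C)
    return f.reshape(n, n)

field = sample_gaussian_field()
print(field.shape)  # -> (20, 20)
```

The eigendecomposition (rather than a plain Cholesky factorization) sidesteps the near-singularity of smooth covariance kernels on fine grids.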
The primary mathematical techniques used so far to analyze Gaussian (or Gaussian
related) random fields have come from the area of differential geometry. In the current
research we are interested in studying topological features of the sublevel sets of Gaussian
random fields and in particular, the persistent homology they generate. Briefly, the persis-
tent homology of a real valued function f tracks changes in the homology of sublevel sets
f−1((−∞, u]). As the sublevel sets grow (by increasing u), new homology elements (i.e.
“holes”) are born and others die. Persistent homology keeps a record of this birth/death
process. More details on persistent homology can be found in Section 2.1.3.
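A minimal sketch of this birth/death bookkeeping, for the simplest case only (degree-zero persistence of a function sampled on a line, i.e. a path graph; the function name and the "elder rule" implementation details are ours, not the thesis's construction):

```python
def sublevel_persistence_0d(values):
    """(birth, death) pairs for connected components of the sublevel sets of a
    function sampled on a path graph. Uses the 'elder rule': when two
    components merge, the younger one dies; the oldest never dies."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent, birth, pairs = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v in order:                        # add vertices in increasing order
        parent[v], birth[v] = v, values[v]
        for nb in (v - 1, v + 1):
            if nb in parent:               # neighbour already in the sublevel set
                rv, rn = find(v), find(nb)
                if rv != rn:
                    young, old = (rv, rn) if birth[rv] >= birth[rn] else (rn, rv)
                    if birth[young] < values[v]:
                        pairs.append((birth[young], values[v]))
                    parent[young] = old
    for r in {find(v) for v in parent}:
        pairs.append((birth[r], float("inf")))
    return sorted(pairs)

# Local minima at values 0, 1 and 0.5 create three components; the saddles
# at values 2 and 3 merge them.
print(sublevel_persistence_0d([0, 2, 1, 3, 0.5]))  # -> [(0, inf), (0.5, 3), (1, 2)]
```

Each pair is one bar of the barcode; higher-degree persistence requires full boundary-matrix reduction, which is beyond this sketch.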
The theory of persistent homology is relatively new, and until recently nothing was
known about the persistent homology of Gaussian random fields. This thesis contains the
first result in this area, based on the Gaussian Kinematic Formula (GKF) of Adler and
Taylor (see [3] and Section 2.1.1), which is the state of the art in the theory of Gaussian
fields. An important special case of the GKF gives a formula for computing the mean
value of the Euler characteristic of sublevel sets of Gaussian fields.
The Euler characteristic is an integer-valued topological invariant, which gives a partial description of the ‘shape’ of a topological space (see Section 2.1.4 for more details). For a compact d-dimensional space X, we can compute the Euler characteristic (denoted by χ(X)) from its Betti numbers βk, using the formula

χ(X) = ∑_{k=0}^{d} (−1)^k βk.
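A small illustration of this formula (not from the thesis), using the Betti numbers of the sphere and the torus computed in Section 1.1.1:

```python
# The Euler characteristic as the alternating sum of Betti numbers.
def euler_characteristic(betti):
    """chi(X) = sum_k (-1)^k beta_k, given the list [beta_0, beta_1, ...]."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

print(euler_characteristic([1, 0, 1]))  # sphere S^2: 1 - 0 + 1 -> 2
print(euler_characteristic([1, 2, 1]))  # torus T^2:  1 - 2 + 1 -> 0
```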
In the work presented in Chapter 2, we extend the notion of Euler characteristic to
persistent homology and then compute its expected value for a wide class of Gaussian
and Gaussian related fields. Here is a preview of these results.
Let M be a ‘nice’ space, and let f = (f1, . . . , fk) : M → Rk be a Gaussian random field such that its elements f1, . . . , fk are i.i.d. real valued Gaussian random fields, with zero mean, unit variance, and a ‘nice’ covariance function C. Let G : Rk → R be a ‘nice’ function, and set g = G ◦ f. Then g is a real valued random field called a Gaussian related field. Set gmax = sup_{x∈M} g(x), and consider the filtration of sublevel sets {g−1((−∞, u])}_{u=−∞}^{gmax}. Note that once we pass gmax the sublevel sets remain unchanged, so we can terminate our filtration there. Let PH∗(g, gmax) be the persistent homology of this filtration. Our main result (see Theorem 2.4.1) states that

E {χ(PH∗(g, gmax))} = χ(M) (E {gmax} − E {g}) + ∑_{j=1}^{d} (2π)^{−j/2} Lj(M) ∫_R Mγj(Du) du,   (1.2.1)
where E {g} := E {g(x)} (for any x ∈ M), and Du := G−1((−∞, u]). The Lj's and Mγj's are geometrical measures known as Lipschitz-Killing curvatures and Gaussian Minkowski functionals, respectively. This formula provides a means to evaluate the expected Euler characteristic of the persistent homology, using geometrical features of the space M and the (deterministic) sublevel sets Du. In Chapter 2 we discuss this formula in more detail, and give precise definitions for its ingredients and their ‘niceness’ requirements.
One surprising corollary of the computations in Chapter 2 is related to the expected signed sum of critical values of a Gaussian random field, a functional of considerable interest in the study of Coulomb gases. If f : M → R is a Gaussian random field, and M is a closed manifold, we prove that

E { ∑_{p∈CP(f)} (−1)^{µ(p)} f(p) } = − L1(M) / √(2π),   (1.2.2)

where CP(f) is the set of critical points of the field f, and µ(p) is the Morse index of f at p. The functional L1(M) represents a one-dimensional measure of M. Thus, the expected signed sum of critical values of a Gaussian field does not scale according to the volume of M as one might expect, but rather according to a one-dimensional measure of the space. This result is very surprising and nonintuitive, and we shall discuss it further in Section 2.5.
1.3 The Topology of Random Geometric Complexes
In this section we describe the second part of this thesis, treated in detail in Chapter 3. Let V ⊂ Rd be a set of vertices. A geometric graph G(V, ε) is an undirected graph on the set of vertices V, where we connect a pair of vertices v1, v2 if ‖v1 − v2‖ ≤ ε. The field of random geometric graphs has been thoroughly studied, and many of the known results to date can be found in [39].
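A minimal sketch of the construction G(V, ε), for uniform random points in the unit square (the function name and parameters are illustrative, not from the thesis):

```python
# Build a random geometric graph: n uniform points in [0,1]^2, with an edge
# between every pair of points at Euclidean distance at most eps.
import math
import random

def random_geometric_graph(n, eps, seed=0):
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) <= eps]
    return pts, edges

pts, edges = random_geometric_graph(100, 0.1)
print(len(pts), len(edges))
```

The geometric complexes below are built on exactly this kind of vertex set; the graph is their 1-skeleton.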
A simplicial complex is a collection of vertices, edges, triangles, tetrahedra, and simplexes of higher dimension, satisfying a few basic rules (see Section 3.1.1), so one can think of it as a generalization of a graph. A geometric complex is a simplicial complex in which, to decide whether to include a k-dimensional simplex, we need to verify whether its k + 1 vertices satisfy a certain geometrical property. There are a few ways to choose this property, which typically yield different complexes. For example, in the Cech complex C(V, ε) we need to check whether the intersection of the k + 1 balls with radius ε centered at the vertices is nonempty.
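For points in R², the Cech condition can be checked directly: the k + 1 balls of radius ε have a common point if and only if the minimum enclosing ball of their centers has radius at most ε. A sketch under these assumptions (restricted to simplices of dimension at most 2; the helper names are ours):

```python
# Cech 2-skeleton for points in R^2 via minimum enclosing balls.
import itertools
import math

def min_enclosing_radius(pts):
    """Minimum enclosing ball radius for 2 or 3 points in R^2."""
    if len(pts) == 2:
        return math.dist(pts[0], pts[1]) / 2
    # A ball with two of the points as a diameter may already contain the third.
    for (a, b), c in [((pts[0], pts[1]), pts[2]),
                      ((pts[0], pts[2]), pts[1]),
                      ((pts[1], pts[2]), pts[0])]:
        center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        r = math.dist(a, b) / 2
        if math.dist(center, c) <= r:
            return r
    # Otherwise the minimum enclosing ball is the circumcircle of the triangle.
    (ax, ay), (bx, by), (cx, cy) = pts
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return math.dist((ux, uy), pts[0])

def cech_simplices(points, eps, max_dim=2):
    """All simplices of the Cech complex C(points, eps) up to max_dim."""
    simplices = [(i,) for i in range(len(points))]
    for k in range(1, max_dim + 1):
        for idx in itertools.combinations(range(len(points)), k + 1):
            if min_enclosing_radius([points[i] for i in idx]) <= eps:
                simplices.append(idx)
    return simplices

pts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]   # equilateral, side 1
print(len(cech_simplices(pts, 0.55)))  # -> 6 (3 vertices + 3 edges, no triangle)
print(len(cech_simplices(pts, 0.60)))  # -> 7 (the 2-simplex appears)
```

The example also separates the Cech complex from the Rips complex: at ε = 0.55 all three edges are present, so a Rips complex would include the triangle, but the circumradius 1/√3 ≈ 0.577 exceeds ε, so the Cech complex does not.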
The main motivation for our study of random complexes is the following manifold learning problem. Let M ⊂ Rd be an unknown closed manifold which we wish to recover, and suppose that we are given a set of random points X1, . . . , Xn sampled from some distribution over M. It turns out that, under mild conditions, the Betti numbers of the hidden manifold can be recovered by computing the Betti numbers of the union of d-dimensional balls Ur := ⋃_{i=1}^{n} Br(Xi) centered at the samples with a fixed radius r. This method, however, is highly sensitive to the choice of the radius r. A few methods have been suggested to overcome this sensitivity. In [37, 38] sufficient conditions on n and r are given so that the probability of recovering the correct Betti numbers is sufficiently high. A different approach is to compute the persistent homology of the filtration {Ur}_{r=0}^{∞}, and locate the homology elements that last through a long range of radii, which most likely correspond to real features of the original manifold.
Recent research on such random Cech complexes focuses on the following related setup. Xn = {X1, . . . , Xn} is a set of random points in Rd, sampled from a known distribution f. We would like to study the Betti numbers of Urn in the limit where n → ∞ and rn → 0. This problem can be turned into a simpler combinatorial problem by studying the Cech complex C(Xn, rn). By the celebrated Nerve Theorem (see [11]), Urn and C(Xn, rn) are homotopy equivalent, and in particular have the same Betti numbers. Recent work (see [30, 31]) studied the Cech complex in the setup just described. In this scenario, the behavior of the Cech complex (or the union of balls) splits into three main regimes. If nr_n^d → 0 (the subcritical or ‘dust’ phase), the complex is very sparse, with many small disconnected components and hardly any holes. In the critical phase, nr_n^d → λ ∈ (0, ∞), the complex becomes connected, with many holes of any dimension k < d. Finally, if nr_n^d → ∞, the complex is highly connected, with very few holes, if any. A detailed study of the Betti numbers is possible mostly in the dust phase, and is significantly more complicated in the other regimes. Thus, in this thesis we have adopted an alternative approach, based on distance functions, which yields results in all regimes.
Let dn : Rd → R+ be the distance function from Xn defined as
dn(x) = min1≤k≤n
‖x−Xk‖ .
The key observation is that d−1n ((−∞, r]) = Ur ≃ C(Xn, r). By Morse Theory, changes
in the Betti numbers of d−1n ((−∞, r]) occur at the critical levels of dn. Thus, studying
the critical points of dn should reveal information about the topology of C(Xn, r). Note,
however, that dn is non-differentiable (and so certainly not a Morse function). Nevertheless,
following [24], we can define a special notion of a critical point and Morse index for
dn, and apply Morse theory to it. We then define Nk,n to be the number of critical
points p of dn with index k such that dn(p) ≤ rn. In other words, we count the
number of critical points that “construct” the topology of C(Xn, rn).
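The function dn itself is simple to implement directly; the sketch below (illustrative only, with a hypothetical brute-force `dist_fn` rather than anything from the thesis) verifies on a grid that the sublevel set {dn ≤ r} is exactly the union of the balls Br(Xk):

```python
import math
import random

def dist_fn(x, sample):
    """d_n(x) = min over k of ||x - X_k||: the distance from x to the sample."""
    return min(math.dist(x, p) for p in sample)

random.seed(1)
sample = [(random.random(), random.random()) for _ in range(50)]
r = 0.1

# x lies in the sublevel set {d_n <= r} iff it lies in some ball B_r(X_k).
grid = [(i / 20.0, j / 20.0) for i in range(21) for j in range(21)]
agree = all(
    (dist_fn(x, sample) <= r) == any(math.dist(x, p) <= r for p in sample)
    for x in grid
)
```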
As with the behavior of the Cech complex described above, the limiting behavior of Nk,n
splits into three different regimes, depending on the limit of nr_n^d. In Chapter 3, we present
a significant body of limit theorems for Nk,n in all three regimes. Not surprisingly, there
is a close correspondence between our results and the Betti number results in [31], which
has a Morse-theoretic explanation. However, while the results for the Betti numbers are
mainly restricted to the subcritical phase, the study of the distance function extends to
the other regimes as well. Thus, the indirect approach of studying critical points (rather
than Betti numbers) proves advantageous. For example, using our results, we can
easily derive limit theorems for the Euler characteristic of the Cech complex in all three
regimes.
1.4 Noise Crackles
This is the last part of the thesis, treated in detail in Chapter 4. We wish to
study the behavior of random Cech complexes with a fixed radius, i.e. C(Xn, 1), rather
than C(Xn, rn) as studied in Chapter 3. The setup is the same as described in the previous
section. Obviously, if the sample distribution has a compact support S, then for large
enough n we have ⋃_{k=1}^n B1(Xk) ≈ Tube(S, 1), so there is not much to study
in this case. However, when the support of the distribution is unbounded, interesting
phenomena occur.
In Chapter 4 we study distributions supported on Rd. In this case, there exists a ‘core’,
i.e. a region where the random samples are very dense, so that placing unit balls around
the individual points completely covers the region. Consequently, the Cech complex inside
the core is contractible. The size of the core obviously grows as n → ∞. Outside the core
there may be additional isolated points, but not enough for the associated balls to cover
the entire area. Thus, in this region, the topology of the Cech complex is nontrivial, and
many holes of different dimensions might show up. We call this phenomenon ‘crackling’.
The exact crackling behavior depends on the choice of distribution. We study three
representative examples - the power law, exponential, and Gaussian distributions. These
three distributions are spherically symmetric, and therefore their cores are balls centered
at the origin. The size of the ball is different for each distribution. Denoting by Rc_n the
radius of the core, we show in Section 4.2 that

Rc_n ∼ (n/log n)^{1/α}   if f(x) ∝ 1/(1 + ‖x‖^α),
Rc_n ∼ log n             if f(x) ∝ e^{−‖x‖},
Rc_n ∼ √(2 log n)        if f(x) ∝ e^{−‖x‖²/2}.
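The log n scaling in the exponential case can be seen in a one-dimensional simulation: if the norm of each sample is Exp(1) distributed (mimicking a density proportional to e^{−‖x‖}), the farthest of n samples sits at distance ≈ log n. A rough illustration only, not the precise definition of Rc_n used in Section 4.2:

```python
import math
import random

def max_radius(n, seed=0):
    """Largest norm among n samples whose norm is Exp(1) distributed."""
    rng = random.Random(seed)
    return max(rng.expovariate(1.0) for _ in range(n))

# The maximum of n i.i.d. Exp(1) variables concentrates around log n.
ratios = [max_radius(n) / math.log(n) for n in (10_000, 100_000)]
```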
When studying crackling behavior, however, the Gaussian distribution turns out to be
fundamentally different from the other two distributions. In the power-law case, as well
as in the exponential case, quite a lot is going on outside the core. In Sections 4.3 and
4.4, we show that the exterior of the core can be divided into separate annuli at radii
Rd−1,n ≪ Rd−2,n ≪ · · · ≪ R0,n (defined differently for each distribution). At [R0,n,∞)
there are mostly disconnected points, and no holes. At [R1,n, R0,n) connectivity is a bit
higher, and a finite number of 1-dimensional holes shows up. At [R2,n, R1,n) we have a
finite number of 2-dimensional holes, while the number of 1-dimensional holes grows to
infinity as n → ∞. In general, at [Rk,n, Rk−1,n), as n → ∞ we have a finite number
of k-dimensional holes, infinitely many l-dimensional holes for l < k, and no holes of
dimension l > k. In other words, the crackle starts as pure dust at R0,n and, as we
get closer to the core, higher-dimensional holes gradually appear.
The Gaussian distribution behaves very differently. It does not crackle. In Section 4.5
we show that, for the Gaussian distribution, there are hardly any points located outside
the core. Thus, as n → ∞, the union of balls around the sample points becomes a giant
contractible ball of radius ∼ √(2 log n).
The results presented in Chapter 3 on the homology of the Cech complex C(Xn, rn) do
not cover distributions with unbounded support in the supercritical regime (nr_n^d → ∞).
In Chapter 4 we also discuss how the crackling results may shed some light on the behavior
of C(Xn, rn) in this case. In addition, we discuss how studying the crackling phenomenon
can be useful for noisy manifold learning applications.
Chapter 2

Persistent Homology of Gaussian Random Fields

2.1 Background
The primary mathematical techniques used so far to analyze Gaussian (or Gaussian-related)
random fields have come from differential geometry. Recent advances in
the study of excursion sets of Gaussian fields have produced applications in brain imaging
and astronomy (see [3,14,46,49,50]). In this chapter we extend the toolkit used to study
these objects to include methods of algebraic topology. Specifically, we are interested in
studying the persistent homology of sublevel sets of Gaussian fields. There is no doubt
that studying the algebraic topology of random fields will significantly strengthen existing
applications and introduce others.
In this section we review the probabilistic and topological background needed to
present our results. The results presented in this chapter were published in [2, 10].
2.1.1 Gaussian Random Fields
In this section we give a brief introduction to Gaussian random fields. As described
in the introduction, a random field f : X → Rd is a stochastic process defined over
a topological space X of dimension greater than one (in most cases this will be a manifold
or a stratified space). As for random processes on the real line, the distribution of a random
field is determined by the joint distribution of every finite collection of vector-valued
random variables f(x1), . . . , f(xn), xi ∈ X. A Gaussian random field is a random
field in which any finite collection of values f(x1), . . . , f(xn) has a multidimensional
Gaussian distribution.
We define the mean value function m : X → Rd of a random field f by

m(x) = E{f(x)},

and the covariance function C : X × X → Rd×d by

C(x, y) = E{(f(x) − m(x))(f(y) − m(y))^⊤},
where ⊤ denotes the transpose operator, and our vectors are columns. As for Gaussian
processes, the distribution of a Gaussian random field is completely determined by these
two functions. One way to construct a real-valued Gaussian random field is as follows.
Let {φn}n be a set of functions φn : X → R such that ∑_n φn²(x) < ∞ for all x ∈ X.
Let {ξn}n be a set of i.i.d. random variables with ξn ∼ N(0, 1). Then

f(x) ≜ ∑_n ξn φn(x)

is a Gaussian random field. In this case m(x) ≡ 0, and

C(x, y) = ∑_n φn(x) φn(y).
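A finite version of this construction is easy to simulate: pick a few basis functions φn (the three below are hand-picked assumptions, purely for illustration), draw i.i.d. standard normals ξn, and check empirically that the sample covariance of f(x) = ∑ ξn φn(x) matches ∑ φn(x)φn(y):

```python
import math
import random

# Three hand-picked basis functions on [0, 1] (an assumption for illustration).
basis = [
    lambda x: 1.0,
    lambda x: math.cos(math.pi * x),
    lambda x: 0.5 * math.sin(2 * math.pi * x),
]

def sample_field(rng):
    """One realization of f(x) = sum_n xi_n * phi_n(x)."""
    xi = [rng.gauss(0.0, 1.0) for _ in basis]
    return lambda x: sum(c * phi(x) for c, phi in zip(xi, basis))

def theoretical_cov(x, y):
    """C(x, y) = sum_n phi_n(x) * phi_n(y)."""
    return sum(phi(x) * phi(y) for phi in basis)

rng = random.Random(42)
x, y = 0.2, 0.7
draws = [sample_field(rng) for _ in range(20000)]
empirical = sum(f(x) * f(y) for f in draws) / len(draws)
error = abs(empirical - theoretical_cov(x, y))
```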
The theory of Gaussian random fields is extremely wide and very deep. However, for
the purposes of this work, we mainly need recent results from the study of the geometry
of Gaussian fields, which we describe in the next section.

Throughout this chapter we use the following notation. Denote by

ϕ(x) ≜ (1/√(2π)) e^{−x²/2}

the standard normal density, and by

Φ(x) ≜ ∫_{−∞}^x ϕ(t) dt

the normal cumulative distribution function. Also, denote by γk the Gaussian measure
on Rk, i.e. for A ⊂ Rk,

γk(A) ≜ P(X ∈ A),
where X has a standard multi-normal distribution in Rk (i.e. i.i.d. standard normal
components). Finally, for a nice set D ⊂ Rk, the Gaussian Minkowski functionals M^γ_j(D)
are defined via the Taylor expansion, for small enough ρ ≥ 0,

γk(Tube(D, ρ)) = ∑_{j=0}^∞ (ρ^j / j!) M^γ_j(D),   (2.1.1)

where Tube(D, ρ) = {x ∈ Rk : dist(D, x) ≤ ρ}, and dist(D, x) ≜ inf_{y∈D} ‖x − y‖. The
functionals M^γ_j play a key role in the results of this chapter.
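As a concrete one-dimensional example (a standard computation in [3], included here as an illustration): for k = 1 and D = [u, ∞), Tube(D, ρ) = [u − ρ, ∞), so γ1(Tube(D, ρ)) = 1 − Φ(u − ρ). Differentiating in ρ at 0 gives M^γ_0(D) = 1 − Φ(u) and M^γ_j(D) = H_{j−1}(u)ϕ(u) for j ≥ 1, where Hn are the Hermite polynomials introduced in Section 2.3.1. A numerical sanity check of the expansion (2.1.1) in this case:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n via H_{k+1} = x H_k - k H_{k-1}."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def tube_series(u, rho, terms=12):
    """Truncated expansion (2.1.1) for D = [u, inf) in R^1."""
    total = 1.0 - Phi(u)  # j = 0 term
    for j in range(1, terms):
        total += rho ** j / math.factorial(j) * hermite(j - 1, u) * phi(u)
    return total

u, rho = 1.0, 0.3
exact = 1.0 - Phi(u - rho)   # gamma_1 of the tube, computed directly
approx = tube_series(u, rho)
```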
2.1.2 The Geometry of Gaussian Random Fields
There has been extensive effort over the past few years to study the sample paths of
smooth random fields, f , from a general Riemannian manifold M to Rd. In particular, M
could be a 3-dimensional brain or a 2-dimensional cortical surface, examples which have
been of significant practical importance. The basic (random) geometrical objects studied
were the excursion sets of the random fields, defined by
AD ≡ AD(f; M) ≜ {x ∈ M : f(x) ∈ D} = M ∩ f^{−1}(D)   (2.1.2)
for nice subsets D of Rd, and the tools for quantifying these sets were those of differential
geometry. The theory of this subject has developed rapidly over the past few years
(see [3, 43, 44]). One of its most powerful results is an explicit expression for the mean
value of all Lipschitz-Killing curvatures of excursion sets for centered (i.e. E {f(x)} =
0), constant variance, C2, Gaussian random fields. The result presented in [3] links
random field theory with integral and differential geometry, and leads to approximations of
other important objects in probability and statistics, such as the exceedance probabilities
P(sup_{x∈M} f(x) > u) (cf. [1, 45]).
The main theorem in [3] is called the Gaussian Kinematic Formula (GKF), and the
purpose of this section is to properly state this result, which is at the heart of this chapter.
The Lipschitz-Killing Curvatures
Let M be a Riemannian manifold. Lipschitz-Killing curvatures are geometric objects
that depend on the Riemannian metric on M , such that Lk(M) is a measure of the k-
dimensional ‘size’ of M. This means that if we scale the metric by a constant λ, then
Lk(M) scales by λ^k. For a large class of spaces, including smooth manifolds and
compact convex regions, if M ⊂ Rn is given the Euclidean metric, then the following tube
formula holds for sufficiently small ρ ≥ 0:

µ(Tube(M, ρ)) = ∑_{j=0}^n ωj L_{n−j}(M) ρ^j,   (2.1.3)

where Tube(M, ρ) ≜ {x ∈ Rn : dist(M, x) ≤ ρ} is the tube of radius ρ about M, and µ is
the Lebesgue measure. For example, if M ⊂ R2 is convex and compact, then

L0(M) = 1,  L1(M) = (perimeter of M)/2,  L2(M) = area(M).
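For instance, for a disk of radius r in R2 the tube of radius ρ is just the disk of radius r + ρ, so (2.1.3), with ω0 = 1, ω1 = 2, ω2 = π, reduces to the identity π(r + ρ)² = L2(M) + 2L1(M)ρ + πL0(M)ρ². A quick numerical check of this identity:

```python
import math

def tube_area_via_lk(r, rho):
    """Right-hand side of the tube formula (2.1.3) for a disk of radius r
    in R^2: L0 = 1, L1 = perimeter / 2 = pi * r, L2 = area = pi * r**2,
    with omega_0, omega_1, omega_2 = 1, 2, pi."""
    L0, L1, L2 = 1.0, math.pi * r, math.pi * r * r
    return 1.0 * L2 + 2.0 * L1 * rho + math.pi * L0 * rho ** 2

def tube_area_exact(r, rho):
    """The tube of radius rho about the disk is a disk of radius r + rho."""
    return math.pi * (r + rho) ** 2

checks = [abs(tube_area_via_lk(r, rho) - tube_area_exact(r, rho))
          for r in (0.5, 1.0, 2.0) for rho in (0.0, 0.1, 0.7)]
```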
For a general d-dimensional Riemannian manifold (M, g), the Lipschitz-Killing curvatures
can be expressed in integral form with respect to the Riemannian volume induced
by the metric g. For example, if M is a manifold without boundary, then they are given by

Lj(M) = [1 / ((2π)^{(d−j)/2} ((d−j)/2)!)] ∫_M Tr_M((−R)^{(d−j)/2}) Volg,

when d − j ≥ 0 is even, and Lj(M) = 0 otherwise. Here R is the curvature tensor and Tr_M
is the trace operator on the algebra of double forms on M. For more details see [3]. Note that
Ld(M) ≡ Volg(M) is always the Riemannian volume of M, and L0(M) ≡ χ(M)
is its Euler characteristic.
The Gaussian Kinematic Formula
Suppose that M ⊂ RN is a d-dimensional, C2, Whitney stratified manifold satisfying
some mild side conditions (cf. [3] for details), and that D is a similarly nice stratified
submanifold of Rk. Let f = (f1, . . . , fk) : M → Rk be a vector-valued random process
satisfying the following conditions:
• f1, . . . , fk are i.i.d. real Gaussian fields, with a common covariance function C(s, t),
• for every 1 ≤ i ≤ k and x ∈ M , fi(x) ∼ N (0, 1),
• fi has C2 sample paths almost surely for every 1 ≤ i ≤ k,
• the joint distributions of fi and its first and second order derivatives are non-degenerate,
• there exist K, α > 0 such that for all s, t ∈ M:

max_{i,j} |Cij(t, t) + Cij(s, s) − 2Cij(s, t)| ≤ K |ln |t − s||^{−(1+α)},

where Cij is the covariance function of ∂²fm(t)/∂ti∂tj.

Essentially, this list of conditions ensures that the sample paths of f are almost surely
Morse functions. For example, the covariance function C(s, t) = exp(−‖s − t‖²), s, t ∈ Rd,
satisfies this list of conditions.
Using f, define a Riemannian metric on M by setting

gx(X, Y) ≜ E{(Xfi)(x)(Yfi)(x)},   (2.1.4)

for any i and for X, Y ∈ TxM, the tangent space to M at x ∈ M. In other words, Xfi is
the derivative of fi in the direction represented by the tangent vector X. Next, use this
metric to define the Lipschitz-Killing curvatures Lj, j = 0, . . . , d, on M. With the above
definitions and conditions, we are now ready to state the Gaussian Kinematic Formula
(GKF).
Theorem 2.1.1 (The Gaussian Kinematic Formula, [3]). Let M ⊂ RN and D ⊂ Rk be
nice stratified spaces, and let f = (f1, . . . , fk) : M → Rk be a C2, k-dimensional Gaussian
field satisfying the conditions above. Then

E{Li(f^{−1}(D))} = ∑_{j=0}^{dim M − i} [i+j, j] (2π)^{−j/2} L_{i+j}(M) M^γ_j(D),

where:

• Li(·) is the i-th Lipschitz-Killing curvature of M, computed with respect to the Riemannian metric defined in (2.1.4);

• M^γ_i(·) is the i-th Gaussian Minkowski functional, defined in (2.1.1);

• the combinatorial coefficients [n, k] are the standard flag coefficients of integral geometry, given by

[n, k] = (n choose k) ωn / (ωk ω_{n−k}),

where ωn is the volume of the n-dimensional unit ball.
More details about the Lipschitz-Killing curvatures, the induced Riemannian metric,
the Minkowski functionals, and the niceness of the spaces can be found in [3, 43]. Aside
from the generality of the GKF, the fact that the spaces M and D appear in two completely
separate terms in the formula makes it even more elegant. Each of the terms Lj(M) and
M^γ_j(D) can be computed separately, independently of the other.
An interesting special case of the GKF is i = 0. In this case L0 is just the Euler
characteristic χ, and therefore

E{χ(f^{−1}(D))} = ∑_{j=0}^{dim M} (2π)^{−j/2} Lj(M) M^γ_j(D).   (2.1.5)
2.1.3 Persistent Homology
In this section we present the main ideas behind the theory of persistent homology.
Consider the following situation. Let X be an unknown subspace of Rd with finite Lebesgue
measure, and let X1, . . . , Xn be n independent random samples uniformly distributed on
X. We would like to study the homology of X from the given set of random points. In
many cases we can find an ǫ for which the union of balls

U = ⋃_{i=1}^n Bǫ(Xi)

is homotopy equivalent to X (and hence has the same homology, see Figure 2.1(a)).
However, we do not know a priori what the correct choice of ǫ is. For example, if ǫ is
chosen too small (Figure 2.1(b)), then U is homotopy equivalent to a union of n
distinct points (and hence contains no information about X). On the other hand, if ǫ is
chosen too big (Figure 2.1(c)), then U is just a big contractible blob (which again tells us
nothing about X). Persistent homology tries to overcome this sensitivity to the choice of
ǫ.
The main idea behind persistent homology is to consider the whole range of possible
values of ǫ rather than one particular value. Starting with ǫ = 0, we have n distinct points.
As we increase ǫ, homology elements (i.e. connected components and k-dimensional holes)
are created and destroyed, until we reach a point where U is contractible (a giant blob). The
theory of persistent homology describes very accurately how to follow homology elements
throughout this birth/death process. The result is a set of pairs (bi, di), standing for the
birth and death times (values of ǫ) of each homology element. The key assumption is
that homology elements that “live longer” (or persist) are more likely to represent “real”
homology elements of X, whereas the others are just “noise”.

Figure 2.1: Trying to capture the homology of an annulus (where β0 = 1, β1 = 1) from a
union of balls around a random set of samples. (a) A good choice of radius recovers the correct
homology. (b) The radius chosen is too small, hence the union of balls has the same homology
as n distinct points (β0 = 15, β1 = 0). (c) The radius chosen is too big, and the union is
contractible (β0 = 1, β1 = 0).
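For connected components (H0) the birth/death bookkeeping can be carried out by hand: every sample point is born at ǫ = 0, and two components of the union of balls merge when ǫ reaches half the gap between their points. A sketch for points on the real line (an illustration only, not a general persistence algorithm):

```python
def h0_barcode(points):
    """H0 persistence of the union of intervals [x - eps, x + eps].
    Each point is born at eps = 0; at eps = gap / 2 two adjacent
    components merge, killing one of them. One bar lives forever."""
    xs = sorted(points)
    deaths = sorted((b - a) / 2.0 for a, b in zip(xs, xs[1:]))
    bars = [(0.0, d) for d in deaths]   # finite bars, one per merge
    bars.append((0.0, float("inf")))    # the surviving component
    return bars

bars = h0_barcode([0.0, 1.0, 3.0, 7.0])
```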
The description above is just a special case in which persistent homology is useful,
presented as motivation. However, persistent homology can be defined for any filtration of
spaces. Given a filtration X = {Xu}u such that Xs ⊂ Xt whenever s < t, the persistent homology
of X, denoted by PH∗(X), consists of families of homology elements that ‘persist’ through
time. More explicitly, an element of PHk(X) is a family of homology elements α = {αu}u,
where αu ∈ Hk(Xu) (the k-th homology group of Xu). Let ı^k_{s,t} : Hk(Xs) → Hk(Xt) be the
homomorphism between homology groups induced by the inclusion Xs → Xt. The birth
time b of an element α ∈ PHk(X) can be thought of as the first time α appears, which is
defined by the condition that αb ∉ Im(ı^k_{s,b}) for all s < b. The death time d of an element α
is the moment that α merges with an element that existed before b. Formally, we require
that αt ∉ Im(ı^k_{s,t}) for all s < b and t < d, but αd ∈ Im(ı^k_{s,d}) for some s < b.
A useful way to describe persistent homology is via the notion of barcodes. A barcode
for the persistent homology of a filtration {Xu}u is a collection of graphs, one for each
collection of homology groups of common order. A bar in the k-th graph, starting at b
and ending at d (b ≤ d), indicates the existence of a generator of Hk(Xu) whose birth
and death times are b and d, respectively. For example, Figure 2.2 presents the persistent
homology of a filtration of simplicial complexes known as Rips complexes (see Section
3.1.1 for details). In this example random samples are taken from an annulus in R2, and
the Rips complexes are used in order to recover the homology of the annulus.
Figure 2.2: The barcode of a Rips complex, taken from [25]. The points were sampled from an
annulus in R2. We see that there is a single H0 bar that persists forever. This bar represents the
single connected component of the annulus. In H1 we see a couple of dominant bars indicating
that the sample space contains holes. The longest bar actually represents the real hole of the
annulus. In H2 there is nothing significant, and indeed β2 = 0 in this case.
A particularly interesting case is the persistent homology of functions. Suppose that
M is a nice space, that f : M → R is smooth, and consider the sublevel sets Mρ ≜
f^{−1}((−∞, ρ]). Note that if u ≤ v then Mu ⊂ Mv; thus {Mρ}_{ρ=−∞}^∞ is a valid filtration.
Going from u to v, components of Mu may merge, and new components may be born
and possibly later merge with one another or with the components of Mu. Similarly,
the topology of these components may change, as holes and other structures form and
disappear. Following the topology of the sets in this filtration as a function of ρ, by following
their homology, is another example of persistent homology. From our discussion on Morse
theory in Section 1.1.3, it is clear that the birth/death times of homology elements will be the
critical levels of the function f (since these are the only levels where the topology changes).
Figure 2.3 presents the barcodes of a function f : [0, 1]² → R. In this figure, however,
the persistent homology is computed for superlevel (or ‘excursion’) sets, defined by Aρ ≜
f^{−1}([ρ,∞)), rather than sublevel sets. Thus, to obtain a filtration of sets, we start from a
very high level and gradually decrease it. In this case the excursion sets are subsets of
[0, 1]², and thus have non-trivial homology only in H0 and H1 (i.e. connected components and holes).
For more details on persistent homology see [15, 23, 25].
Figure 2.3: Barcodes for the excursion (superlevel) sets of a function on [0, 1]2. The top seven
boxes show the surfaces generated by a 2-dimensional random field above excursion sets Au for
different levels u. To determine the level for each figure, follow the dotted line down to the
scale at the bottom of the barcode. As the dotted lines pass through the boxes labeled H0 and
H1, the number of intersections with bars in the H0 (H1) box gives the number of connected
components (resp. holes) in Au. Thus, at u ∼ 1.9, Au has 4 connected components but no holes,
while at u ∼ −1.2, Au has only 1 connected component, but 9 holes. The horizontal lengths
of the bars indicate how long the different topological structures (generators of the homology
groups) persist. Computation of the barcodes was carried out in Matlab by Eliran Subag from
the Technion, using Plex (Persistent Homology Computations) from Stanford [16].
2.1.4 Euler Integration
To prove the main result of this chapter (Theorem 2.4.1), we use a relatively new notion
of integration, which treats the Euler characteristic operator as a measure. This integral has
been gaining increasing interest lately and seems to have great potential to become a powerful
data analysis and signal processing tool (see [7, 8, 27], and the recent survey paper [21]).
In this section we review the basic ideas behind the Euler calculus.
The Euler characteristic is an integer value assigned to topological spaces, which pro-
vides a partial description of their shape. It is a topological invariant, meaning that if X
and Y are homeomorphic spaces, then they have the same Euler characteristic. There are
a few equivalent ways to define the Euler characteristic (e.g. using simplicial, cell or basic
complexes). For compact d-dimensional spaces X , the Euler characteristic (denoted by
χ(X)) can be defined by
χ(X) =d∑
k=0
(−1)kβk, (2.1.6)
where βk = rankHk(X) is the k-th Betti number of X . For example, χ(point) =
1, χ(S1) = 0, χ(S2) = 2, χ(T 2) = 0. One of the key properties of the Euler char-
acteristic is that it is additive, in the sense that for nice compact sets A,B we have
that
χ(A ∪B) = χ(A) + χ(B)− χ(A ∩B).
It is therefore tempting to consider χ as a measure and to integrate with respect to it. The
main problem in doing so is that χ is only finitely additive (it is also not positive, but
that can be overcome using the theory of signed measures).
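The additivity formula is easy to test in one dimension, where the Euler characteristic of a finite union of closed intervals is simply its number of connected components. A small sketch:

```python
def euler_char(intervals):
    """Euler characteristic of a finite union of closed intervals in R:
    merge overlapping/touching intervals and count the components."""
    if not intervals:
        return 0
    ivs = sorted(intervals)
    components = 1
    _, hi = ivs[0]
    for a, b in ivs[1:]:
        if a > hi:          # a genuine gap starts a new component
            components += 1
            hi = b
        else:
            hi = max(hi, b)
    return components

def intersect(A, B):
    """Pairwise intersections of two unions of closed intervals."""
    out = []
    for a1, b1 in A:
        for a2, b2 in B:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo <= hi:
                out.append((lo, hi))
    return out

A = [(0.0, 2.0), (5.0, 6.0)]
B = [(1.0, 3.0)]
lhs = euler_char(A + B)  # chi(A u B)
rhs = euler_char(A) + euler_char(B) - euler_char(intersect(A, B))
```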
At first (see [48]), integration with respect to the Euler characteristic was defined for
a small set of functions, called constructible functions, defined by

CF(X) = { h(x) = ∑_{k=1}^n ak 1_{Ak}(x) : ak ∈ Z, Ak are disjoint tame subsets of X },

where ‘tame’ means having a finite Euler characteristic. For this set of functions we can
define the Euler integral analogously to the Lebesgue integral. Let

h(x) = ∑_{k=1}^n ak 1_{Ak}(x),

and define

∫_X h dχ = ∑_{k=1}^n ak χ(Ak).   (2.1.7)
This integral has many nice properties, similar to those of the Lebesgue integral, such as
linearity and a version of the Fubini theorem (see [48]). However, as mentioned above,
the Euler characteristic is not countably additive, and therefore we cannot continue from
here by approximating other functions using functions in CF (X).
In [8], two extensions of the Euler integral to real-valued functions were suggested, using
the notion of a definable function over an O-minimal structure (see [8] and references
therein for more background). Let ⌊x⌋ and ⌈x⌉ denote the floor and ceiling of x,
respectively. In the O-minimal language, if h : X → R is a definable function on a
definable space X, then an important property is that both ⌊h⌋ and ⌈h⌉ are constructible
functions, and hence have well-defined Euler integrals. This leads to the following
Riemann-sum-like definition:

Definition 2.1.2 ([8]). Let h : X → R be a definable function on a definable space X.
The lower Euler integral is defined by

∫_X h⌊dχ⌋ = lim_{n→∞} (1/n) ∫_X ⌊nh⌋ dχ,   (2.1.8)

and the upper Euler integral is defined by

∫_X h⌈dχ⌉ = lim_{n→∞} (1/n) ∫_X ⌈nh⌉ dχ.   (2.1.9)
These two extensions coincide with the original Euler integral in (2.1.7) for constructible
functions. For other functions they might be completely different. Easier
formulae to work with are given in the next proposition.

Proposition 2.1.3 ([8]). If h : X → R is a definable function, then

∫_X h⌊dχ⌋ = ∫_0^∞ [χ(h ≥ u) − χ(h < −u)] du,

and

∫_X h⌈dχ⌉ = ∫_0^∞ [χ(h > u) − χ(h ≤ −u)] du,

where χ(h ≥ u) ≜ χ(h^{−1}([u,∞))), χ(h > u) ≜ χ(h^{−1}((u,∞))), etc.
Unfortunately, these extensions of the Euler integral have many flaws, the most
prominent of which is the lack of additivity. For example, a simple computation shows
that for X = [0, 1],

∫_X x⌊dχ⌋ + ∫_X (1 − x)⌊dχ⌋ = 2 ≠ 1 = ∫_X 1⌊dχ⌋.
Nevertheless, these integrals still have interesting properties, one of which is stated in the
following theorem.

Theorem 2.1.4 ([8]). If M is a closed d-dimensional manifold and h : M → R is a
Morse function, then

∫_M h⌊dχ⌋ = ∑_{p∈CP(h)} (−1)^{d−µ(p)} h(p),

∫_M h⌈dχ⌉ = ∑_{p∈CP(h)} (−1)^{µ(p)} h(p),

where CP(h) is the set of critical points of h, and µ(p) is the Morse index of p (i.e. the
number of negative eigenvalues of the Hessian matrix Hh(p)).
2.2 Redefining the Euler Integral
The definition of the Euler integral as it appears in [8] and stated in Section 2.1.4 uses
the language of O-minimal structures and definable functions. Later in this chapter we
would like to evaluate the expected value of the Euler integral over a Gaussian random
field. Unfortunately, it is not clear if Gaussian random fields can be made to fit inside an
O-minimal setting. Therefore, we now introduce a simplified definition of a tame function,
which we shall use throughout this chapter. We then re-define the Euler integral for this
type of function.
Definition 2.2.1. A continuous function h : X → R on a compact topological space
X with finite Euler characteristic is tame if the homotopy types (and hence the Euler
characteristics) of h^{−1}((−∞, u]) and h^{−1}([u,∞)) change only finitely many times as u
varies over R, and the Euler characteristic of each such set is always finite.
With this broad definition of tame functions, the formulae appearing in Proposition
2.1.3 are well defined. Thus, for tame functions we use these formulae as our definition of
the Euler integral, replacing Definition 2.1.2 with the following.

Definition 2.2.2. Let h : X → R be a tame function. Then the lower and upper Euler
integrals are defined by

∫_X h⌊dχ⌋ = ∫_0^∞ [χ(h ≥ u) − χ(h < −u)] du,

and

∫_X h⌈dχ⌉ = ∫_0^∞ [χ(h > u) − χ(h ≤ −u)] du.
From here on, we focus only on the upper Euler integral. Note, however, that all the
results we present have straightforward lower-integral analogues.
It turns out that the Euler integral is strongly related to both Morse theory and
persistent homology. In the following sections we introduce and discuss these connections.
2.2.1 The Euler Integral and Morse Theory
In this section we discuss the connection between the Euler integral of a tame function
and its critical points. In [8] the Euler integral was given a stratified Morse theory
interpretation. A corollary of this approach was stated in Theorem 2.1.4, asserting that
if h : M → R is a Morse function and M is a closed manifold, then

∫_M h⌈dχ⌉ = ∑_{p∈CP(h)} (−1)^{µ(p)} h(p),   (2.2.1)

where CP(h) is the set of critical points of h, and µ(p) is the index of p as a critical point.
Using our definition of tame functions, we have the following more general proposition.

Proposition 2.2.3. Let h : X → R be a tame function, and let CV(h) be the set of values
at which the homotopy type of h^{−1}((−∞, u]) changes (the critical values of h). Then

∫_X h⌈dχ⌉ = ∑_{v∈CV(h)} ∆χ(h, v) v,

where ∆χ(h, v) = χ(h ≤ v + ǫ) − χ(h ≤ v − ǫ), for sufficiently small ǫ, is the change in
the Euler characteristic of h^{−1}((−∞, u]) as u passes through the critical value v.
Proof. Label the critical values CV(h) = {v1, . . . , vn} in increasing order, so that v1 <
· · · < vi < 0 ≤ v_{i+1} < · · · < vn. If vk < u < v_{k+1}, then via a telescoping sum,

χ(h ≤ u) = ∆χ(h, v1) + · · · + ∆χ(h, vk),
χ(h > u) = χ(X) − χ(h ≤ u) = ∆χ(h, v_{k+1}) + · · · + ∆χ(h, vn).

Therefore, for u ∈ [0,∞) with u ≠ ±vj,

χ(h > u) = ∑_{j=i+1}^n ∆χ(h, vj) 1_{[0,vj]}(u),   and
χ(h ≤ −u) = ∑_{j=1}^i ∆χ(h, vj) 1_{[0,−vj]}(u).

Thus,

∫_X h⌈dχ⌉ = ∫_0^∞ (χ(h > u) − χ(h ≤ −u)) du = ∑_{j=1}^n vj ∆χ(h, vj),

as desired.
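As a concrete check of Proposition 2.2.3, take X = S1 and h = cos. This is a Morse function with critical values −1 (the minimum, where a point appears, so ∆χ = +1) and +1 (the maximum, where the arc closes into the full circle, χ drops from 1 to 0, and ∆χ = −1). The proposition then agrees with the Morse-theoretic formula (2.2.1); a sketch, with the critical data entered by hand:

```python
# Critical values of h = cos on the circle, with the change in Euler
# characteristic of the sublevel set at each one (entered by hand).
critical_data = [(-1.0, +1),   # minimum: a point appears, chi: 0 -> 1
                 (+1.0, -1)]   # maximum: arc closes up,   chi: 1 -> 0

upper_integral = sum(v * dchi for v, dchi in critical_data)

# Morse-theoretic version (2.2.1): sum over critical points of (-1)^mu h(p).
critical_points = [(-1.0, 0),  # value -1, Morse index 0
                   (+1.0, 1)]  # value +1, Morse index 1
morse_sum = sum((-1) ** mu * v for v, mu in critical_points)
```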
This recovers the Morse-theoretic viewpoint in (2.2.1), since if h : M → R is a Morse
function, then Morse theory says that the Euler characteristic changes by the addition of
(−1)^k as h^{−1}((−∞, u]) passes through a critical point of index k (see Section 1.1.3). The
following corollary slightly generalizes Proposition 2.2.3.
Corollary 2.2.4. Let h : X → R be tame, satisfying the conditions of Proposition 2.2.3.
If a ∉ CV(h), then

∑_{v∈CV(h), v<a} ∆χ(h, v) v = ∫_X (h ∧ a)⌈dχ⌉ + a χ(h ≤ a) − a χ(X).
Note that by taking a > sup_{x∈X} h(x) we recover Proposition 2.2.3.
Proof. If v1 < · · · < vn are the critical values of h, and a is such that vk < a < v_{k+1}, then
ha ≜ (h ∧ a) has critical values v1, . . . , vk, a. By Proposition 2.2.3,

∫_X ha⌈dχ⌉ = ∑_{j=1}^k vj ∆χ(ha, vj) + a ∆χ(ha, a)
= ∑_{j=1}^k vj ∆χ(h, vj) + a(χ(X) − χ(h ≤ a)).

This gives the desired result.
2.2.2 The Euler Integral and Persistent Homology
The Euler integral of a tame function h is strongly related to the persistent homology of
h (described in Section 2.1.3). In light of Proposition 2.2.3, this is not surprising, since
the Euler integral is a measure of how the Euler characteristic of h−1((−∞, u]) changes,
while the persistent homology tracks how the homology of h−1((−∞, u]) changes. To
make the relationship precise we introduce the following natural extension of the Euler
characteristic to barcodes.
Recall that a barcode is a graphical representation of persistent homology, as a collec-
tion of bars divided into groups for different homology degrees. A bar in the j-th group,
starting at b and terminating at d, represents a generator for the homology group Hj that
is born at level b and dies at level d. We therefore write PH∗ to denote the persistent
homology of a filtration, thought of as a collection of bars. For each bar B ∈ PH∗ we write
b(B), d(B) for its birth and death levels, ℓ(B) = d(B) − b(B) for its length, and µ(B) for
the degree of homology to which it belongs.
Definition 2.2.5. Suppose PH∗ contains only a finite number of bars, and no bars of
infinite length. We define the Euler characteristic of PH∗ to be

χ(PH∗) = ∑_{B∈PH∗} (−1)^{µ(B)} ℓ(B).
Let h : X → R be a tame function, and let PH∗(h, a) be the barcode of the persistent
homology of the filtration {h^{−1}((−∞, u])}_{u≤a}. The relation between the Euler integral
and the Euler characteristic of the persistent homology PH∗(h, a) is given by the following
proposition.

Proposition 2.2.6. Let h : X → R be a tame function, and set hmax ≜ sup_{x∈X} h(x). Then

χ(PH∗(h, hmax)) = hmax χ(X) − ∫_X h⌈dχ⌉,

and, in general,

χ(PH∗(h, a)) = a χ(X) − ∫_X (h ∧ a)⌈dχ⌉.
Proof. Let B ∈ PH∗(h, a), and denote 1_B(u) ≜ 1_{[b(B),d(B)]}(u). Using Definition 2.2.5
we have

χ(PH∗(h, a)) = ∑_{B∈PH∗(h,a)} (−1)^{µ(B)} ∫_{−∞}^a 1_B(u) du
= ∫_{−∞}^a ∑_{B∈PH∗(h,a)} (−1)^{µ(B)} 1_B(u) du.

Note that the number of bars of index k that intersect a level u is exactly the k-th Betti
number of h^{−1}((−∞, u]), denoted by βk(u). Thus,

∑_{B∈PH∗(h,a)} (−1)^{µ(B)} 1_B(u) = ∑_k (−1)^k βk(u) = χ(h ≤ u),

and therefore

χ(PH∗(h, a)) = ∫_{−∞}^a χ(h ≤ u) du.

If a ≥ hmax, then using Definition 2.2.2 and the fact that χ(h ≤ u) = χ(X) for
u ≥ hmax, we have

∫_{−∞}^a χ(h ≤ u) du = a χ(X) + ∫_0^∞ (χ(h ≤ u) − χ(X)) du + ∫_{−∞}^0 χ(h ≤ u) du
= a χ(X) − ∫_X h⌈dχ⌉
= a χ(X) − ∫_X (h ∧ a)⌈dχ⌉.

On the other hand, if a < hmax, then the maximum of ha ≜ (h ∧ a) is a. Thus, applying
the first part of the proposition to ha yields

χ(PH∗(ha, a)) = a χ(X) − ∫_X ha⌈dχ⌉ = a χ(X) − ∫_X (h ∧ a)⌈dχ⌉.

Finally, the barcodes of h and ha are identical on (−∞, a), and therefore χ(PH∗(ha, a)) =
χ(PH∗(h, a)). This completes the proof.
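The intermediate identity in the proof, namely that the alternating sum of bar lengths equals the integral of χ(h ≤ u) = ∑_k (−1)^k βk(u) over levels, can be checked directly on any finite barcode, since βk(u) just counts the index-k bars containing u. A sketch on a toy barcode (the bar endpoints are invented for illustration):

```python
# Toy barcode: (birth, death, homology degree), invented for illustration.
bars = [(0.0, 5.0, 0), (1.0, 2.0, 0), (1.5, 4.0, 1), (3.0, 3.5, 1)]

# Alternating sum of bar lengths: chi of the barcode (Definition 2.2.5).
lhs = sum((-1) ** k * (d - b) for b, d, k in bars)

# Riemann sum of chi(u) = sum_k (-1)^k beta_k(u), where beta_k(u) counts
# the index-k bars whose interval [b, d] contains the level u.
steps = 20_000
lo, hi = 0.0, 5.0
du = (hi - lo) / steps
rhs = 0.0
for i in range(steps):
    u = lo + (i + 0.5) * du
    chi_u = sum((-1) ** k for b, d, k in bars if b <= u <= d)
    rhs += chi_u * du
```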
2.3 The Euler Integral of Gaussian Random Fields
Let M be a stratified space and let g : M → R be a Gaussian or Gaussian-related random
field. We are interested in computing the expected value of the Euler integral of the field g
over M, which is a precursor to the main result of the chapter (Theorem 2.4.1). As noted
earlier, while we focus on the upper Euler integral, everything we do has a lower Euler
integral analogue. The following result is a consequence of the GKF (Theorem 2.1.1) and
Definition 2.2.2.
Theorem 2.3.1. Let M be a compact d-dimensional stratified space and let f : M → Rk
be a k-dimensional Gaussian random field, both satisfying the GKF conditions. For a
piecewise-C2 tame function G : Rk → R, let g = G ◦ f. Setting Du = G^{−1}((−∞, u]),
we have

E{∫_M g⌈dχ⌉} = χ(M) E{g} − ∑_{j=1}^d (2π)^{−j/2} Lj(M) ∫_R M^γ_j(Du) du,   (2.3.1)

where E{g} = E{g(x)} (g(x) has a constant mean).
The difficulty in evaluating the expression above lies in computing the Minkowski
functionals Mγj (Du). In Sections 2.3.1 and 2.3.2 we present a few cases where they have
been computed, which allows us to simplify (2.3.1).
Proof. Using our definition of the Euler integral (Definition 2.2.2) we have that
\[
\int_M g \lceil d\chi \rceil = \int_0^\infty \left( \chi(g > u) - \chi(g \le -u) \right) du
= \int_0^\infty \left( \chi(M) - \chi(g \le u) \right) du - \int_{-\infty}^0 \chi(g \le u)\, du.
\]
Therefore,
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \int_0^\infty \left( \chi(M) - \mathbb{E}\{\chi(g \le u)\} \right) du - \int_{-\infty}^0 \mathbb{E}\{\chi(g \le u)\}\, du. \tag{2.3.2}
\]
Replacing $D$ with $D_u$ in the GKF (Theorem 2.1.1) yields
\[
\mathbb{E}\{\chi(g \le u)\} = \mathbb{E}\left\{ \chi(f^{-1}(D_u)) \right\}
= \sum_{j=0}^{d} (2\pi)^{-j/2} \mathcal{L}_j(M)\, \mathcal{M}^\gamma_j(D_u).
\]
Substituting this formula into (2.3.2) yields
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\} = \sum_{j=0}^{d} (2\pi)^{-j/2} \rho_j\, \mathcal{L}_j(M),
\]
where
\[
\rho_j =
\begin{cases}
-\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du & j > 0, \\[4pt]
\int_0^\infty \left( 1 - \mathcal{M}^\gamma_0(D_u) \right) du - \int_{-\infty}^0 \mathcal{M}^\gamma_0(D_u)\, du & j = 0.
\end{cases}
\]
The expression for $\rho_0$ can be further simplified. Let $X \in \mathbb{R}^k$ be a standard multivariate
normal vector and $Y = G(X)$; then
\[
\mathcal{M}^\gamma_0(D_u) = \gamma_k(D_u) = \mathbb{P}(X \in D_u) = \mathbb{P}(Y \le u).
\]
Therefore,
\[
\rho_0 = \int_0^\infty \left( 1 - \mathbb{P}(Y \le u) \right) du - \int_{-\infty}^0 \mathbb{P}(Y \le u)\, du
= \int_0^\infty \mathbb{P}(Y > u)\, du - \int_{-\infty}^0 \mathbb{P}(Y \le u)\, du
= \mathbb{E}\{Y\}.
\]
Since for every $x \in M$ the vector $f(x)$ is standard multivariate normal, we can replace $Y$ with
$G(f(x)) = g(x)$. Finally, recalling that $\mathcal{L}_0 \equiv \chi$ completes the proof.
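The last step uses the standard tail-integral formula for the mean, $\mathbb{E}\{Y\} = \int_0^\infty \mathbb{P}(Y > u)\,du - \int_{-\infty}^0 \mathbb{P}(Y \le u)\,du$, valid whenever $\mathbb{E}|Y| < \infty$. A quick numerical sanity check in Python (illustrative only; the choice $G(x) = x + 1/2$ is ours, not from the thesis):

```python
import math

def Phi(x):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Y = G(X) with G(x) = x + 0.5 and X ~ N(0,1), so Y ~ N(0.5, 1), E{Y} = 0.5.
cdf = lambda u: Phi(u - 0.5)

# rho_0 = int_0^inf P(Y > u) du - int_{-inf}^0 P(Y <= u) du, truncated grids.
n, hi = 120_000, 12.0
du = hi / n
pos = sum((1.0 - cdf(i * du)) * du for i in range(n))       # over [0, 12]
neg = sum(cdf(-i * du) * du for i in range(1, n + 1))       # over [-12, 0]
rho0 = pos - neg
print(rho0)  # close to E{Y} = 0.5
```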
2.3.1 Real Valued Fields
For real valued fields (i.e. k = 1) we can improve Theorem 2.3.1 by computing the terms
Mγj (Du) that appear in (2.3.1). First, we need to recall some facts about the family of
Hermite polynomials. For $n \ge 0$, the $n$-th Hermite polynomial is defined as
\[
H_n(x) = (-1)^n \varphi(x)^{-1} \frac{d^n}{dx^n} \varphi(x),
\]
where $\varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the density of the standard Gaussian distribution. This
family of polynomials is orthogonal under the inner product on functions $f, g : \mathbb{R} \to \mathbb{R}$
\[
\langle f, g \rangle = \int_{\mathbb{R}} f(x) g(x) \varphi(x)\, dx.
\]
A consistent and useful convention is
\[
H_{-1}(x) = \varphi(x)^{-1} \int_x^\infty \varphi(u)\, du.
\]
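These are the probabilists' Hermite polynomials, which satisfy the three-term recurrence $H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)$ with $H_0 \equiv 1$, $H_1(x) = x$, and for which $\langle H_m, H_n \rangle = n!\,\delta_{mn}$. A small Python sketch (an illustration, not part of the thesis) checking both facts numerically:

```python
import math

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via the recurrence
    H_{n+1}(x) = x*H_n(x) - n*H_{n-1}(x), with H_0 = 1, H_1 = x."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def inner(f, g, n_grid=100_000, lim=12.0):
    """<f, g> = int f(x) g(x) phi(x) dx, by a midpoint rule on [-lim, lim]."""
    dx = 2.0 * lim / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -lim + (i + 0.5) * dx
        total += f(x) * g(x) * math.exp(-x * x / 2.0) * dx
    return total / math.sqrt(2.0 * math.pi)

print(hermite(3, 2.0))  # H_3(x) = x^3 - 3x, so H_3(2) = 2
print(inner(lambda x: hermite(2, x), lambda x: hermite(3, x)))  # ~ 0
print(inner(lambda x: hermite(3, x), lambda x: hermite(3, x)))  # ~ 3! = 6
```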
Theorem 2.3.2. Let $M$ be a compact $d$-dimensional stratified space, and let $f : M \to \mathbb{R}$
be a real valued Gaussian random field, both satisfying the GKF conditions. Let $G : \mathbb{R} \to \mathbb{R}$
be piecewise $C^2$ and $g = G \circ f$. Then
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \chi(M)\, \mathbb{E}\{g\} + \sum_{j=1}^{d} (-1)^j \mathcal{L}_j(M)\,
\frac{\left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle}{(2\pi)^{j/2}}.
\]
In the case that the function $G$ is strictly monotone, this result can be further simplified,
using the fact that $\operatorname{sign}(G')$ is constant and then integrating by parts.
Corollary 2.3.3. Let $f$ be as in Theorem 2.3.2, and $G$ be a strictly increasing function.
Then
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \sum_{j=0}^{d} (-1)^j \mathcal{L}_j(M)\, \frac{\langle H_j, G \rangle}{(2\pi)^{j/2}}.
\]
If $G$ is strictly decreasing, then
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= \sum_{j=0}^{d} \mathcal{L}_j(M)\, \frac{\langle H_j, G \rangle}{(2\pi)^{j/2}}.
\]
To prove Theorem 2.3.2 we will need the following calculus lemma, which is a special
case of Federer’s coarea formula.
Lemma 2.3.4. Let $h : \mathbb{R} \to \mathbb{R}$ be an integrable function and let $G : \mathbb{R} \to \mathbb{R}$ be a piecewise
differentiable continuous function that is nondifferentiable on a countable set. Then
\[
\int_{\mathbb{R}} h(x) \left| G'(x) \right| dx
= \int_{\mathbb{R}} \Bigg( \sum_{x \in G^{-1}(t)} h(x) \Bigg)\, dt.
\]
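As a quick illustration of the lemma (our own example, not from the thesis), take $G(x) = x^2$ and $h = \varphi$. Then $G^{-1}(t) = \{\pm\sqrt{t}\}$ for $t > 0$, and both sides can be checked numerically:

```python
import math

phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Illustrative choice: G(x) = x^2, h = phi.
# Left side:  int h(x) |G'(x)| dx = int phi(x) * 2|x| dx
# Right side: int_0^inf sum_{x in G^{-1}(t)} h(x) dt = int_0^inf 2 phi(sqrt(t)) dt
n, lim = 100_000, 12.0
dx = 2.0 * lim / n
lhs = sum(phi(-lim + (i + 0.5) * dx) * 2.0 * abs(-lim + (i + 0.5) * dx) * dx
          for i in range(n))

dt = lim ** 2 / n
rhs = sum(2.0 * phi(math.sqrt((i + 0.5) * dt)) * dt for i in range(n))

print(lhs, rhs)  # both close to 2*sqrt(2/pi)
```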
Proof of Theorem 2.3.2. By Theorem 2.3.1, it suffices to show that
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= (-1)^{j-1} \left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle, \tag{2.3.3}
\]
for $j \ge 1$, where $D_u = G^{-1}((-\infty, u])$. Since $G$ is continuous, we can write the inverse
image of $(-\infty, u]$ as a disjoint union of closed intervals
\[
D_u = \bigcup_i [a_i, b_i],
\]
where we allow one $a_i$ to be $-\infty$ and one $b_i$ to be $\infty$. Note that for all the finite values
we have $G(a_i) = G(b_i) = u$, $G'(a_i) < 0$ and $G'(b_i) > 0$.
For small enough $\rho$ we have
\[
\operatorname{Tube}(D_u, \rho) = \bigcup_i [a_i - \rho,\, b_i + \rho].
\]
Therefore,
\[
\gamma_1\left( \operatorname{Tube}(D_u, \rho) \right) = \sum_i \left( \Phi(b_i + \rho) - \Phi(a_i - \rho) \right), \tag{2.3.4}
\]
where $\Phi(x) = \int_{-\infty}^x \varphi(u)\, du$, and $\gamma_1$ is the Gaussian measure on $\mathbb{R}$ (see Section 2.1.1). The Taylor
expansion of $\Phi(x + \rho)$ in $\rho$ is
\[
\Phi(x + \rho) = \Phi(x) + \sum_{j=1}^{\infty} \frac{\rho^j}{j!} (-1)^{j-1} H_{j-1}(x) \varphi(x),
\]
so in particular $\mathcal{M}^\gamma_j((-\infty, x]) = (-1)^{j-1} H_{j-1}(x) \varphi(x)$. Therefore we conclude that for
$j \ge 1$,
\[
\mathcal{M}^\gamma_j(D_u) = \sum_i \left( (-1)^{j-1} H_{j-1}(b_i) \varphi(b_i) + H_{j-1}(a_i) \varphi(a_i) \right). \tag{2.3.5}
\]
Note that if $b_i = \infty$ (or $a_i = -\infty$), its contribution to the volume of the tube in (2.3.4) is
independent of $\rho$ ($1$ or $0$, respectively). Thus, it will affect only $\mathcal{M}^\gamma_0$, and we can assume
all the $a_i$ and $b_i$ in (2.3.5) are finite, and hence $\bigcup_i \{a_i, b_i\} = G^{-1}(u)$.
If $j$ is odd, then from (2.3.5) we have that
\[
\mathcal{M}^\gamma_j(D_u)
= \sum_i \left( H_{j-1}(b_i) \varphi(b_i) + H_{j-1}(a_i) \varphi(a_i) \right)
= \sum_{x \in G^{-1}(u)} H_{j-1}(x) \varphi(x).
\]
Using Lemma 2.3.4 we have
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= \int_{\mathbb{R}} \sum_{x \in G^{-1}(u)} H_{j-1}(x) \varphi(x)\, du
= \int_{\mathbb{R}} H_{j-1}(x) \varphi(x) \left| G'(x) \right| dx
= \left\langle H_{j-1}, |G'| \right\rangle
= (-1)^{j-1} \left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle.
\]
If $j$ is even, then from (2.3.5),
\[
\mathcal{M}^\gamma_j(D_u)
= \sum_i \left( -H_{j-1}(b_i) \varphi(b_i) + H_{j-1}(a_i) \varphi(a_i) \right)
= -\sum_{x \in G^{-1}(u)} \operatorname{sign}(G'(x))\, H_{j-1}(x) \varphi(x).
\]
Using Lemma 2.3.4 we have
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= -\int_{-\infty}^{\infty} \sum_{x \in G^{-1}(u)} H_{j-1}(x) \varphi(x) \operatorname{sign}(G'(x))\, du
= -\int_{\mathbb{R}} H_{j-1}(x) \varphi(x) G'(x)\, dx
= -\left\langle H_{j-1}, G' \right\rangle
= (-1)^{j-1} \left\langle H_{j-1}, (\operatorname{sign}(G'))^j G' \right\rangle.
\]
This completes the proof.
2.3.2 Vector Valued Fields
When $f$ is a vector valued Gaussian field, it can be difficult to evaluate the Minkowski
functionals $\mathcal{M}^\gamma_j$. In this subsection we treat two special cases, in the first of which it is
possible to do the calculus and find a nice explicit formula for the mean Euler integral.

The χ2 case

Let $M$ be a compact $d$-dimensional manifold. A $\chi^2$ field with $k$ degrees of freedom is of
the form $g = G \circ f$, where $f = (f_1, \ldots, f_k) : M \to \mathbb{R}^k$ is a Gaussian random field with
i.i.d., mean zero and unit variance components, and $G(x_1, \ldots, x_k) = \sum_{i=1}^k x_i^2$.
Theorem 2.3.5. The mean Euler integral for a $\chi^2$ field with $k$ degrees of freedom, with
$k \ge d$, is given by
\[
\mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\}
= k \mathcal{L}_0(M)
- \frac{2}{\sqrt{\pi}} \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)} \mathcal{L}_1(M)
+ \frac{1}{\pi} \mathcal{L}_2(M).
\]
Proof. First note that in this case, $\mathcal{M}^\gamma_j(D_u) = \mathcal{M}^\gamma_j(G^{-1}((-\infty, u])) = 0$ when $u < 0$, since
$G$ is nonnegative. In [3, Section 15.10.2] it is shown that for $k \ge d$ and $j \ge 1$,
\[
\mathcal{M}^\gamma_j(D_u) = \left. \frac{d^{j-1} p_k(x)}{dx^{j-1}} \right|_{x = \sqrt{u}},
\qquad \text{where } p_k(x) = \frac{x^{k-1} e^{-x^2/2}}{\Gamma(k/2)\, 2^{(k-2)/2}}.
\]
Therefore,
\[
\int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du
= \int_0^\infty \left. \frac{d^{j-1} p_k(x)}{dx^{j-1}} \right|_{x = \sqrt{u}} du
= 2 \int_0^\infty \frac{d^{j-1} p_k(t)}{dt^{j-1}}\, t\, dt.
\]
Computing for $j = 1$, $j = 2$, and $3 \le j \le d$, we have that
\[
\int_0^\infty \mathcal{M}^\gamma_1(D_u)\, du
= 2 \int_0^\infty p_k(t)\, t\, dt
= 2\sqrt{2}\, \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)},
\]
\[
\int_0^\infty \mathcal{M}^\gamma_2(D_u)\, du
= 2 \int_0^\infty p_k'(t)\, t\, dt = -2
\]
(by integration by parts, since $p_k$ is a probability density on $[0, \infty)$, so $\int_0^\infty p_k(t)\, dt = 1$),
and for $3 \le j \le d$ integration by parts yields
\[
\int_0^\infty \mathcal{M}^\gamma_j(D_u)\, du
= 2 \left( \left. \frac{d^{j-2} p_k(t)}{dt^{j-2}}\, t \right|_0^\infty
- \left. \frac{d^{j-3} p_k(t)}{dt^{j-3}} \right|_0^\infty \right) = 0.
\]
Finally, noting that $\mathbb{E}\{g\} = k$ completes the proof.
The F case
Let $M$ be a compact $d$-dimensional manifold and let $f : M \to \mathbb{R}^{n+m}$ be a vector valued
Gaussian field with i.i.d., mean zero and unit variance components,
\[
G(x) = \frac{n}{m} \cdot \frac{\sum_{i=1}^m x_i^2}{\sum_{i=1}^n x_{m+i}^2},
\]
and $g = G \circ f$. In this case, it is proved in [3, Theorem 15.10.3] that for $j \ge 1$,
\[
\mathcal{M}^\gamma_j\left( G^{-1}([u, \infty)) \right)
= \left( 1 + \frac{mu}{n} \right)^{-\frac{m+n-2}{2}}
\sum_{l=0}^{\lfloor \frac{j-1}{2} \rfloor} \sum_{i=0}^{j-2l-1}
C_{m,n,j,l,i} \left( \frac{mu}{n} \right)^{\frac{m-j}{2} + i + l}
\]
for a set of constants $C_{m,n,j,l,i}$.

Using basic calculus we can show that for $n > j + 2$ and for all $m$, the integral
$\int_0^\infty \mathcal{M}^\gamma_j(G^{-1}([u, \infty)))\, du$ converges. This can be used to compute the expected lower Euler
integral $\int_M g \lfloor d\chi \rfloor$ rather than the expected upper integral that we have computed so far.
Thus, we can conclude that for $n > d + 2$ the expected lower Euler integral is finite. For
each $n, m$ it is possible to compute the exact value, but no general formula is known. In
order to compute the upper Euler integral, we need to compute $\mathcal{M}^\gamma_j(G^{-1}((-\infty, u]))$. We
note that this is feasible, but technically too complicated to be pursued here.
2.4 Persistent Homology of Gaussian Random Fields
In Section 2.2.2 we described the connection between the Euler integral of a function
and its persistent homology. This allows us to interpret our computation of the expected
Euler integral for Gaussian random fields as a computation of the expected value of a
quantitative measure of the persistent homology of a Gaussian random field. We consider
this interpretation to be the main result of this chapter. This result is the first of its kind,
giving a precise form for the expected value of a quantitative property of the persistent
homology of random functions.
Theorem 2.4.1. Let $f : M \to \mathbb{R}^k$ be a Gaussian random field satisfying the GKF conditions,
$G : \mathbb{R}^k \to \mathbb{R}$ continuous and piecewise $C^2$, and $g = G \circ f$. Then
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(g, g_{\max})) \right\}
= \chi(M) \left( \mathbb{E}\{g_{\max}\} - \mathbb{E}\{g\} \right)
+ \sum_{j=1}^{d} (2\pi)^{-j/2} \mathcal{L}_j(M) \int_{\mathbb{R}} \mathcal{M}^\gamma_j(D_u)\, du.
\]
If $f : M \to \mathbb{R}$ is a real valued field, then
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(f, f_{\max})) \right\}
= \mathbb{E}\{f_{\max}\}\, \chi(M) + \frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}.
\]
Proof. By Proposition 2.2.6,
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(g, g_{\max})) \right\}
= \mathbb{E}\{g_{\max}\}\, \chi(M) - \mathbb{E}\left\{ \int_M g \lceil d\chi \rceil \right\},
\]
and using Theorem 2.3.1 completes the proof.
It can also be useful to consider the partial barcode PH∗(g, a), terminated at some
fixed level a (rather than gmax). For real valued fields we have an explicit formula for this
case as well.
Theorem 2.4.2. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the GKF conditions.
Then for any $a \in \mathbb{R}$,
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(f, a)) \right\}
= \chi(M) \left( \varphi(a) + a \Phi(a) \right)
+ \varphi(a) \sum_{j=1}^{d} (-1)^j (2\pi)^{-j/2} \mathcal{L}_j(M)\, H_{j-2}(a).
\]
To prove Theorem 2.4.2 we need the following lemma.
Lemma 2.4.3. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the GKF conditions.
Then,
\[
\mathbb{E}\left\{ \int_M (f \wedge a) \lceil d\chi \rceil \right\}
= \chi(M) \left( a - a\Phi(a) - \varphi(a) \right)
- \varphi(a) \sum_{j=1}^{d} (-1)^j (2\pi)^{-j/2} \mathcal{L}_j(M)\, H_{j-2}(a).
\]
Proof. We will apply Theorem 2.3.2 to the function $G_a(x) \triangleq (x \wedge a)$. In this case
$G_a'(x) = \mathbf{1}_{(-\infty, a]}(x)$. Therefore,
\[
\left\langle H_{j-1}, (\operatorname{sign}(G_a'))^j G_a' \right\rangle
= \int_{-\infty}^{a} H_{j-1}(u) \varphi(u)\, du = -H_{j-2}(a) \varphi(a).
\]
In addition,
\[
\mathbb{E}\{G_a \circ f\}
= \int_{-\infty}^{a} x \varphi(x)\, dx + \int_a^\infty a \varphi(x)\, dx
= a - a\Phi(a) - \varphi(a).
\]
Thus, by Theorem 2.3.2, we are done.
Proof of Theorem 2.4.2. Using Proposition 2.2.6 we have
\[
\mathbb{E}\left\{ \chi(\mathrm{PH}_*(f, a)) \right\}
= a\,\chi(M) - \mathbb{E}\left\{ \int_M (f \wedge a) \lceil d\chi \rceil \right\},
\]
and applying Lemma 2.4.3 completes the proof.
2.5 Weighted Sum of Critical Values
In this section we use the link between the Euler integral and Morse theory discussed in
Section 2.2.1, to present novel statements about critical points of Gaussian random fields.
Taking G(x) = H1(x) = x in Theorem 2.3.2 and using Proposition 2.2.3 yields the
following compact formula.
Theorem 2.5.1. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the conditions of
the GKF. Then
\[
\mathbb{E}\Bigg\{ \sum_{v \in \mathrm{CV}(f)} \Delta\chi(f, v)\, v \Bigg\}
= -\frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}, \tag{2.5.1}
\]
where $\mathrm{CV}(f)$ is the set of critical values of $f$ and $\Delta\chi(f, v)$ is the change in the Euler
characteristic of $f^{-1}((-\infty, u])$ as $u$ passes through $v$ from below (see Section 2.2.1). In
the case that $M$ is a closed manifold,
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\}
= -\frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}, \tag{2.5.2}
\]
where $\mathrm{CP}(f)$ is the set of critical points of $f$, and $\mu(p)$ is the Morse index of the critical
point $p$.
In the case that $M$ is a closed even-dimensional manifold, $\mathcal{L}_1(M) = 0$, so (2.5.2) states
that
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\} = 0.
\]
Note that this fact has the following alternative proof. If $f$ is a Morse function, then so is
$\tilde{f} \triangleq -f$. In addition, $p$ is a critical point of $f$ with index $\mu(p)$ if and only if $p$ is a critical
point of $\tilde{f}$ with index $\tilde{\mu}(p) = d - \mu(p)$. Finally, $f$ is a Gaussian random field with zero
mean, and therefore $f$ and $\tilde{f}$ have the same probability law. Thus,
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\}
= \mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(\tilde{f})} (-1)^{\tilde{\mu}(p)} \tilde{f}(p) \Bigg\}
= -\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f)} (-1)^{\mu(p)} f(p) \Bigg\}.
\]
The first equality holds because $f$ and $\tilde{f}$ have the same probability law. The second
equality holds because $\tilde{\mu} = d - \mu$, $\tilde{f} = -f$, and $d$ is even.
The thing to note about Theorem 2.5.1 is that the expected value of a weighted
sum of the critical values scales like $\mathcal{L}_1(M)$, a 1-dimensional measure of $M$, and not
the volume $\mathcal{L}_d(M)$, as one might have expected. Consider the following example: let
$f : \mathbb{R}^d \to \mathbb{R}$ be a Gaussian random field with covariance function $C : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ given
by $C(x, y) = e^{-\|x - y\|^2/2}$. This covariance function induces the Euclidean metric on $\mathbb{R}^d$, and
Theorem 2.5.1 implies that
\[
\mathbb{E}\left\{ \int_{[0, L]^d} f \lceil d\chi \rceil \right\}
= -\frac{\mathcal{L}_1([0, L]^d)}{\sqrt{2\pi}} = -\frac{d}{\sqrt{2\pi}}\, L.
\]
In comparison to Theorem 2.5.1, letting $G(x) = x^d$ and using Theorem 2.3.2, we get
that $\mathbb{E}\left\{ \int_M f^d \lceil d\chi \rceil \right\}$ depends on the volume $\mathcal{L}_d(M)$ (as well as the other measures). So while
in general the behavior of the critical points and the critical values depends on the volume,
when one takes the weighted sum of the critical values a lot of cancellation occurs, and the
result depends only on a 1-dimensional measure. This phenomenon is very surprising and
is intrinsically interesting. In fact, there is an alternative, non-topological way to prove
this result using the Kac–Rice formula. However, it is not clear if there is a topological
phenomenon behind these cancellations, and so we leave this for future research.
The result in Theorem 2.5.1 can be generalized to the case where we consider only
critical values below some level $a$.

Theorem 2.5.2. Let $f : M \to \mathbb{R}$ be a Gaussian random field satisfying the conditions of
the GKF. Then
\[
\mathbb{E}\Bigg\{ \sum_{\substack{v \in \mathrm{CV}(f) \\ v < a}} \Delta\chi(f, v)\, v \Bigg\}
= -\varphi(a) \mathcal{L}_0(M)
- \varphi(a) \sum_{j=1}^{d} (-1)^j (2\pi)^{-j/2} \mathcal{L}_j(M)
\left( H_{j-2}(a) + a H_{j-1}(a) \right). \tag{2.5.3}
\]
In the case that $M$ is a closed manifold, the left hand side above can be replaced with
\[
\mathbb{E}\Bigg\{ \sum_{p \in \mathrm{CP}(f) :\, f(p) < a} (-1)^{\mu(p)} f(p) \Bigg\}.
\]
Observe that taking a → ∞ recovers the result in Theorem 2.5.1.
Proof. According to Corollary 2.2.4,
\[
\mathbb{E}\Bigg\{ \sum_{\substack{v \in \mathrm{CV}(f) \\ v < a}} \Delta\chi(f, v)\, v \Bigg\}
= \mathbb{E}\left\{ \int_M (f \wedge a) \lceil d\chi \rceil \right\}
+ a\, \mathbb{E}\{\chi(f \le a)\} - a\,\chi(M).
\]
The first term on the right hand side is given by Lemma 2.4.3, and the second term is
given by the GKF (Theorem 2.1.1).
2.6 Towards Applications
An interesting application of the Euler integral is suggested in [7]. Suppose that an
unknown number of targets are located in a space X , and each target α is represented
by its support Uα ⊂ X . Suppose also that the space X is covered with sensors, reporting
only the number of targets each one sees (i.e. no identification). Let $h : X \to \mathbb{Z}$ be the
sensor field, i.e.
\[
h(x) = \#\{\text{targets activating the sensor located at } x\}.
\]
The following theorem states how to combine the readings from all the sensors and get
the exact number of targets.
Theorem 2.6.1 ([7]). If all the target supports $U_\alpha$ satisfy $\chi(U_\alpha) = N$ for some $N \ne 0$,
then
\[
\#\{\text{targets}\} = \frac{1}{N} \int_X h\, d\chi,
\]
where $d\chi$ is the original Euler integration for constructible functions.
Note that we do not need to assume anything about the targets other than that they all have
the same Euler characteristic. For example, we need not assume that they are all convex
or even have the same number of connected components. On the other hand, the theorem
assumes an ideal sensor field, in the sense that the entire (most likely continuous) space
$X$ is covered with extremely accurate sensors (the range of each sensor is a single point
in $X$). In [8] more realizable models using the lower/upper Euler integral are discussed.
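For intuition, Theorem 2.6.1 is easy to simulate in one dimension, where each target support is a closed interval (so $\chi(U_\alpha) = N = 1$), and the Euler integral of the integer-valued sensor field can be computed level by level via $\int_X h\, d\chi = \sum_{u \ge 1} \chi(\{h \ge u\})$, with $\chi$ of a finite union of disjoint intervals equal to the number of its components. A Python sketch (the target data is hypothetical, purely illustrative):

```python
# Hypothetical 1D scenario: each target's support is a closed interval, chi = 1.
targets = [(0.0, 2.0), (1.0, 3.0), (2.5, 6.0), (8.0, 9.0)]  # overlaps allowed

# Sample the sensor field h(x) = #{targets covering x} on a fine grid of [0, 10].
n = 100_000
xs = [10.0 * i / n for i in range(n + 1)]
h = [sum(a <= x <= b for a, b in targets) for x in xs]

def components(mask):
    """chi of a union of intervals = number of connected components."""
    return sum(1 for i, m in enumerate(mask) if m and (i == 0 or not mask[i - 1]))

# Euler integral of an integer-valued field: int h dchi = sum_{u>=1} chi({h >= u}).
euler_integral = sum(components([v >= u for v in h]) for u in range(1, max(h) + 1))
print(euler_integral)  # equals the number of targets, 4
```

Note that the count is recovered even though the supports overlap and no individual target is ever identified.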
Using the results from Section 2.3 we can extend the setup above to the case where
the readings from the sensors are contaminated by a Gaussian (or Gaussian related) noise
f(x). We will use the following proposition.
Proposition 2.6.2. Let $h, f : X \to \mathbb{R}$ be tame functions and suppose that $h(X)$ is
discrete. Then
\[
\int_X (h + f) \lceil d\chi \rceil = \int_X h \lceil d\chi \rceil + \int_X f \lceil d\chi \rceil.
\]
Proof. Let $h(x) = \sum_{i=1}^n a_i \mathbf{1}_{A_i}(x)$, where the $A_i$ are disjoint. Then by the additivity of
the Euler characteristic we have that
\[
\int_X (h + f) \lceil d\chi \rceil = \sum_{i=1}^n \int_{A_i} (h + f) \lceil d\chi \rceil. \tag{2.6.1}
\]
Next,
\[
\int_{A_i} (h + f) \lceil d\chi \rceil
= \int_{A_i} (a_i + f) \lceil d\chi \rceil
= a_i \chi(A_i) + \int_{A_i} f \lceil d\chi \rceil,
\]
where the last equality follows from Proposition 2.2.3, since every critical value is shifted
by $a_i$. Applying this to (2.6.1) completes the proof.
Returning to the target enumeration problem, suppose that we have a deterministic
signal $x = \int_X h \lceil d\chi \rceil$, observed via a noisy measurement $Y = \int_X (h + f) \lceil d\chi \rceil$. By the
above proposition we have that
\[
Y = \int_X (h + f) \lceil d\chi \rceil
= \int_X h \lceil d\chi \rceil + \int_X f \lceil d\chi \rceil
= x + N,
\]
so we have the classical model of parameter estimation with additive noise. If $f(x)$ is a
Gaussian or Gaussian related random field satisfying the conditions in Theorem 2.3.2, then
we can use the estimator $\hat{x} = Y - \mathbb{E}\{N\}$. This is a very naive estimator indeed; however,
it still reduces the mean squared error compared to just taking the measurement $Y$.
Further investigating the properties of the Euler integral might lead to useful estimation
techniques for this model.
2.7 Summary and Future Work
In this chapter we presented novel quantitative claims about the persistent homology of
Gaussian random fields. To do this, we first gave a very general Morse theoretic interpre-
tation for the Euler integral (Proposition 2.2.3). We then used this interpretation to relate
the persistent homology of a function to its Euler integral (Proposition 2.2.6). Finally,
we applied the Gaussian Kinematic Formula (Theorem 2.1.1), to evaluate the expected
value of the Euler integral of Gaussian and Gaussian related fields, and consequently, the
expected value of the Euler characteristic of the persistent homology of these fields.
Persistent homology is a very powerful theoretical and analytical tool that can be
used to study spaces and functions. It is already being used in a variety of data analysis
applications (cf. [12, 15, 17, 19]). However, in order to make it into a statistically compelling
tool, there is a need to introduce rigorous probabilistic models describing the behavior
of persistence diagrams. There has been some work in this direction (cf. [13, 34]), but
including the work presented in this chapter, this is all just the tip of the iceberg.
Possible ways to continue the work presented in this chapter are numerous. In this
chapter we defined the Euler characteristic of a barcode, and computed its expected value.
However, it still remains to further understand what this Euler characteristic can tell
us about excursion sets of random fields. In addition, the results about the signed sum
of critical values deserve further investigation. It would be very interesting to understand
the exact causes of the phenomenon presented in Section 2.5, and to determine if it is
unique to Gaussian fields, or extends beyond those.
From a more general perspective, we believe that there is a lot more to reveal about the
persistent homology of random fields. It is important to find other parameters characterizing
the persistent homology of a function, and to investigate them statistically. As
the nature of persistent homology is abstract, it would be difficult to find such quanti-
tative parameters for which one can also carry out probabilistic computations. Possible
candidates are the average bar length, maximal bar length, number of bars, distribution
of birth/death times of a bar and so forth. In addition, as homology elements and critical
points are strongly connected via Morse theory, it is highly probable that results related
to the homology of excursion sets could lead to novel statements about critical points of
random fields, about which very little is known at this point.
Finally, the following points are a few more ideas for future work, which were raised
while we were working on the results presented in this chapter.
• Similarly to Euler integration, one can make sense of integration with respect to
any of the Lipschitz-Killing curvatures (or any linear combination of them). In this
setting an appropriate generalization of Theorem 2.5.1 should hold. It could be
interesting to study those integrals as well, and see if they also lead to new and
interesting statements about Gaussian random fields.
• Theorems 2.5.1 and 2.4.1 show that if $f : M \to \mathbb{R}$ is a real valued Gaussian random
field satisfying the GKF, we have that $\mathbb{E}\left\{ \int f \lceil d\chi \rceil \right\}$ grows linearly with a
1-dimensional measure of the space $M$. Could this be developed further into a
way to statistically test if a set of given measurements originated from a Gaussian
random field?
• One motivation for computing the expected Euler characteristic of super-level sets of
a random field $f : M \to \mathbb{R}$ is that it allows one to estimate the excursion probabilities
$\mathbb{P}\left( \sup_{x \in M} f(x) \ge u \right)$ (see [3, Chapter 14]). In Theorem 2.5.2 we computed
\[
\mathbb{E}\Bigg\{ \sum_{\substack{v \in \mathrm{CV}(f) \\ v < a}} \Delta\chi(f, v)\, v \Bigg\}.
\]
When $a$ is a large negative number, this value can approximate the number of local
minima that are below the value $a$. Could these results be used to gain meaningful
information about $\mathbb{E}\{f_{\min}\}$?
Chapter 3
The Topology of Random Geometric
Complexes
3.1 Background
In this chapter we study the limiting behavior of critical points of the distance function
(defined in Section 3.2). While the critical points are, by themselves, intrinsically inter-
esting, knowledge of their behavior also has immediate implications (via Morse theory) to
the study of the topology of Cech complexes built over random point sets. In this section
we give a brief introduction to geometric complexes, discuss their use as an applied topol-
ogy tool, and review previous work. The results presented in this chapter were published
in [2, 9].
3.1.1 Geometric Complexes
A $k$-dimensional simplex (or just a '$k$-simplex') in $\mathbb{R}^d$ is the convex hull of $k + 1$ affinely
independent points $x_0, \ldots, x_k \in \mathbb{R}^d$, denoted by $\sigma = [x_0, \ldots, x_k]$. A simplicial complex is a collection of
simplexes satisfying the following conditions.
Definition 3.1.1. A set of simplexes ∆ is a simplicial complex if
1. For any σ ∈ ∆, if σ′ ⊂ σ then σ′ ∈ ∆, and
2. For any σ1, σ2 ∈ ∆, σ1 ∩ σ2 ∈ ∆.
Figure 3.1 depicts two collections of simplexes in $\mathbb{R}^2$, one of which is a simplicial
complex, and one which is not.
(a) (b)
Figure 3.1: Simplicial complexes in R2. (a) This collection of vertices, edges and triangles is a
valid simplicial complex (see Definition 3.1.1). (b) Here we also have a collection of simplexes,
however, the intersection of the two triangles is not included in this collection, and therefore this
is not a simplicial complex. However, representing each simplex by its vertices, this collection
does represent an abstract simplicial complex (see Definition 3.1.2).
We will use the notion of an ‘abstract simplicial complex’, in which the simplexes are
considered just as finite subsets of a global set S, and lose their geometrical meaning.
Definition 3.1.2. Let S be a set. A collection ∆ of finite subsets of S is called an abstract
simplicial complex if, for any σ ∈ ∆, if σ′ ⊂ σ then σ′ ∈ ∆.
From the definition it is clear that any simplicial complex is an abstract simplicial
complex as well (if we think of every k-simplex as a set of k + 1 vertices). The collection
of simplexes in Figure 3.1(b) demonstrates an abstract simplicial complex which is not a
simplicial complex. Two types of abstract simplicial complexes which are commonly used
in applied algebraic topology, are the Cech and Rips complexes.
Definition 3.1.3 (The Cech Complex). Let $\mathcal{P} = \{x_1, x_2, \ldots\}$ be a collection of points in
a metric space $X$. Construct an abstract simplicial complex $C(\mathcal{P}, \epsilon)$ in the following way:

1. The 0-simplexes are the points in $\mathcal{P}$.

2. An $n$-simplex $[x_{i_0}, \ldots, x_{i_n}]$ is in $C(\mathcal{P}, \epsilon)$ if $\bigcap_{k=0}^{n} B_\epsilon(x_{i_k}) \ne \emptyset$,

where $B_\epsilon(x)$ is the ball of radius $\epsilon$ around $x$. The complex $C(\mathcal{P}, \epsilon)$ is called the Cech
complex attached to $\mathcal{P}$ and $\epsilon$.
Definition 3.1.4 (The Vietoris-Rips Complex). Let $\mathcal{P} = \{x_1, x_2, \ldots\}$ be a collection of
points in a metric space $X$. Construct an abstract simplicial complex $R(\mathcal{P}, \epsilon)$ in the
following way:

1. The 0-simplexes are the points in $\mathcal{P}$.

2. An $n$-simplex $[x_{i_0}, \ldots, x_{i_n}]$ is in $R(\mathcal{P}, \epsilon)$ if $B_\epsilon(x_{i_k}) \cap B_\epsilon(x_{i_m}) \ne \emptyset$ for every
$0 \le k < m \le n$.

The complex $R(\mathcal{P}, \epsilon)$ is called the Rips complex attached to $\mathcal{P}$ and $\epsilon$.
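Definition 3.1.4 reduces to a pairwise distance test: two balls of radius $\epsilon$ intersect exactly when their centers are at distance at most $2\epsilon$. A minimal Python sketch (an illustration, not code from the thesis) that builds all simplexes of a Rips complex up to a given dimension:

```python
from itertools import combinations
import math

def rips_complex(points, eps, max_dim=2):
    """All simplexes of R(P, eps) up to max_dim: a vertex subset is a simplex
    iff every pair of its vertices is within distance 2*eps."""
    close = lambda idx: all(math.dist(points[a], points[b]) <= 2 * eps
                            for a, b in combinations(idx, 2))
    return [list(idx)
            for k in range(max_dim + 1)
            for idx in combinations(range(len(points)), k + 1)
            if close(idx)]

# Three points pairwise within 2*eps span a triangle in the Rips complex, even
# though the three balls here have empty common intersection (cf. Figure 3.2).
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.7)]
cx = rips_complex(pts, eps=1.0)
print(cx)  # 3 vertices, 3 edges, and the triangle [0, 1, 2]
```

The triangle in this example belongs to the Rips complex but not to the Cech complex, illustrating the strict inclusion discussed below.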
(a) (b)
Figure 3.2: The Cech and Rips complexes. (a) A Cech complex constructed from a set of points
and a given radius. The complex consists of 6 vertices, 7 edges, and a single triangle. The grey
area represents the balls used to construct the complex, and does not belong to the complex
itself. (b) A Rips complex constructed from the same set of points and the same radius ǫ. Note
that a new triangle was added on the right, since we have pairwise intersection between the
balls. This triangle does not belong to the Cech complex, however, since the intersection of the
three balls is empty.
Figure 3.2 shows a Cech and a Rips complex, constructed from the same set of points
and the same given radius. It is important to note that while the space X might have
a finite dimension d (for example X = Rd), the Cech and Rips complexes might contain
k-simplexes with k > d. Thus, neither of them necessarily embeds into X .
From the definitions above it is obvious that $C(\mathcal{P}, \epsilon) \subset R(\mathcal{P}, \epsilon)$. In addition, it is
proved in [22] that $R(\mathcal{P}, \epsilon') \subset C(\mathcal{P}, \epsilon)$ for $\epsilon/\epsilon' \ge \sqrt{2d/(d+1)}$. In other words, a Cech
complex can be "approximated" by Rips complexes. This fact is used in computational
applications, since working with Rips complexes is much more efficient than working with Cech
complexes. There are occasions when Rips and Cech complexes coincide, as is the case when
X is Euclidean but the metric is the L∞ rather than the more standard L2 norm.
Homology theory can be applied to abstract simplicial complexes as well. This variant
of homology is known as ‘Simplicial Homology’. For simplicial complexes embedded in
Euclidean spaces, we can still think of simplicial homology as describing connected com-
ponents and holes of the complex. The main importance of the Cech complex, and its
relevance to homology theory, is given by the nerve theorem we state next. This theorem
goes back to [11], but can also be found in many other resources (e.g. [32, Theorem 4.4.4]).
Theorem 3.1.5 (The Nerve Theorem). Suppose that the intersections $\bigcap_{x \in \mathcal{P}'} B_\epsilon(x)$ are
either empty or contractible for any subset $\mathcal{P}'$ of $\mathcal{P}$. Then the Cech complex $C(\mathcal{P}, \epsilon)$ is
homotopy equivalent to $\bigcup_{x \in \mathcal{P}} B_\epsilon(x)$. In particular, if $X$ is a finite dimensional normed
linear space, or a compact Riemannian manifold with convexity radius greater than $\epsilon$, and
if $\{B_\epsilon(x)\}_{x \in \mathcal{P}}$ is a cover of the space $X$, then $C(\mathcal{P}, \epsilon)$ is homotopy equivalent to $X$.
More simply, this theorem states that in order to study the homology of the topological
space $\bigcup_{x \in \mathcal{P}} B_\epsilon(x)$, we can study the (simplicial) homology of the combinatorial
space $C(\mathcal{P}, \epsilon)$. This fact can be useful in proving theoretical results, but its main contribution
is to computational applications. We noted above that if X is of a finite dimension
d, then C(P, ǫ) might have higher dimensional simplexes. However, the nerve theorem
asserts that this higher dimensional space will not have homology of dimension greater
than d. This is not true, however, for Rips complexes.
3.1.2 Motivation and Previous Work
There is considerable current interest in the study, from a topological, homological, point
of view of random structures such as graphs and simplicial complexes. Some recent
references are [4, 6, 20, 33, 41] with two reviews, from different aspects, in [2] and [25].
Many of these papers find their raison d'être in essentially statistical problems, in which
data generates these structures.

The main motivation for the work in this chapter is the same manifold learning problem
described in Section 2.1.3. Let $M$ be an unknown manifold, and suppose that we are
given a set of i.i.d. random samples $\mathcal{X} = \{X_1, \ldots, X_n\}$ from this manifold. In order
to recover the homology of $M$, we look at the homology of $U = \bigcup_{k=1}^n B_\epsilon(X_k)$. The
subject of manifold learning goes, obviously, well beyond such an example, and examples
subject of manifold learning goes, obviously, well beyond such an example, and examples
of algorithms for ‘estimating’ an underlying manifold from a finite sample abound in the
statistics and computer science literatures. Very few of them, however, take an algebraic
point of view.
One contribution in the spirit of this chapter is [37], where the problem of estimating
the homology of smooth manifolds from finite samples was studied. For every δ > 0, the
main theorem in [37] provides sufficient conditions on n and ǫ such that the homology
of U is equal to the homology of M with a probability of at least (1 − δ). Of course,
one of the most important issues in dealing with data is noise. In the setting of manifold
learning this translates to the sample points possibly not coming from the submanifold
that theoretically models the phenomenon because of experimental, measurement, or other
error. The work in [38] deals with this issue, as does [18] from a different and enlightening
point of view.
In this chapter we wish to study the homology of $U$ (or, equivalently, $C(\mathcal{X}, \epsilon)$), when
the number of points $|\mathcal{X}| = n$ goes to infinity and $\epsilon \triangleq r_n \to 0$. It turns out that even
in the case where $M = \mathbb{R}^d$, when the underlying manifold is trivial, there is quite a lot
to study about the homology of $C(\mathcal{X}, \epsilon)$, and this is the main focus of this chapter. In
Section 3.5 we discuss how one might extend our results to the case of sampling from
closed smooth manifolds.
Recent work (see [30, 31]) studied the Betti numbers of the Cech complex in the setup
just described. In this scenario, the behavior of the Cech complex (or the union of balls)
splits into three main regimes. If $n r_n^d \to 0$ (the subcritical or 'dust' phase), the complex
is very sparse, with many small disconnected components and hardly any holes. In the
critical phase $n r_n^d \to \lambda \in (0, \infty)$, the complex becomes connected, with many holes of any
dimension $k < d$. Finally, if $n r_n^d \to \infty$, the complex is highly connected, with very few
holes, if any. Detailed study of the Betti numbers is possible mostly in the dust phase,
and is significantly more complicated in the other regimes. Thus, we tried to take an
indirect approach, by studying critical points of the distance function (described in the
next section), and applying Morse theory.
Not surprisingly, there is close correspondence between the results in this chapter, and
the Betti number results in [30, 31]. However, while the results for the Betti numbers apply
mainly to the subcritical phase, the study of the distance function extends to the other
regimes as well. Thus, the indirect approach of studying critical points (rather than Betti
numbers) has some advantages. For example, using our results, we can easily derive limit
theorems for the Euler characteristic of the Cech complex in all three regimes.
In Section 3.2 we define the distance function, and discuss its own version of Morse
theory. Section 3.3 presents all the limit theorems we have for the critical points of the
distance function. In Section 3.4 we return to discuss the topology of Cech complexes
in light of the new results, and compare our results with those in [30, 31]. Proofs are
relegated to Section 3.6, and Section 3.5 contains a summary and some directions for
future research.
3.2 The Distance Function
The distance function is the main object of study in this chapter. In this section we define
the distance function and its critical point theory.
3.2.1 Definition and Motivation
For a finite set $\mathcal{P}$ of points in $\mathbb{R}^d$, of size $|\mathcal{P}|$, let $d_{\mathcal{P}} : \mathbb{R}^d \to \mathbb{R}_+$ be the distance function
for $\mathcal{P}$, so that
\[
d_{\mathcal{P}}(x) \triangleq \min_{p \in \mathcal{P}} \| x - p \|, \qquad x \in \mathbb{R}^d. \tag{3.2.1}
\]
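Equation (3.2.1) is straightforward to evaluate; the small Python sketch below (an illustration with a hypothetical point set, not from the thesis) computes $d_{\mathcal{P}}$ and checks the basic fact exploited later, namely that the sublevel set $\{d_{\mathcal{P}} \le r\}$ is exactly the union of the balls $B_r(p)$:

```python
import math

def dist_fn(P, x):
    """The distance function d_P(x) = min_{p in P} ||x - p||."""
    return min(math.dist(x, p) for p in P)

P = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # hypothetical point set
r = 1.5

# {d_P <= r} coincides with the union of balls B_r(p), point by point.
for x in [(0.5, 0.5), (1.5, 0.0), (2.0, 2.0), (-1.0, -1.0)]:
    in_sublevel = dist_fn(P, x) <= r
    in_union = any(math.dist(x, p) <= r for p in P)
    assert in_sublevel == in_union

print(dist_fn(P, (1.5, 0.0)))  # 1.5: equidistant from the first two points
```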
We are interested in studying the asymptotic behavior (in |P|) of critical points of
this distance function, for random sets P. While the critical points are, by themselves,
intrinsically interesting, knowledge of their behavior also has immediate implications to
the study of the topology of Cech complexes built over random point sets.
Let
\[
U_r \triangleq \bigcup_{x \in \mathcal{P}} B_r(x).
\]
The key observation is that $d_{\mathcal{P}}^{-1}((-\infty, r]) = U_r$ and so, by the Nerve theorem (3.1.5),
$d_{\mathcal{P}}^{-1}((-\infty, r])$ is homotopy equivalent to $C(\mathcal{P}, r)$. By Morse theory, changes in the
homology of $d_{\mathcal{P}}^{-1}((-\infty, r])$ occur at the critical levels of $d_{\mathcal{P}}$. Thus, studying the critical points
of $d_{\mathcal{P}}$ should reveal information about the topology of $C(\mathcal{P}, r)$. Note, however, that $d_{\mathcal{P}}$ is
non-differentiable (and certainly not a Morse function). Nevertheless, following [24], we
can define a special notion of a critical point and Morse index for $d_{\mathcal{P}}$, and apply Morse
theory to it. This will be the focus of the following section.
3.2.2 Critical Points of the Distance Function
Critical points of smooth functions have been studied since the earliest days of calculus,
but took on significant additional importance following the development of Morse theory
(e.g. [35, 36]) which tied them closely to the homologies of manifolds (see Section 1.1.3).
Recall that if M is a nice (closed, smooth) d-dimensional manifold, and f : M → R
a nice (Morse) function, then a point c is called a critical point if ∇f(c) = 0. A non-
degenerate critical point is one for which the Hessian matrix Hf(c) is non-singular. The
Morse index k ∈ {0, 1, . . . , d} of a non-degenerate critical point c is then the number
of negative eigenvalues of Hf(c). Note that critical points of index 0 are local minima,
while critical points of index d are local maxima. The indexes between 0 and d represent
different types of ‘saddle points’. The critical points, along with their indexes, provide
one of the main links between differential and algebraic topology.
Classical Morse theory does not directly apply to the distance function in (3.2.1)
mainly because it is not everywhere differentiable. However, one can still define a notion
of non-degenerate critical points for the distance function, as well as their Morse index,
which we now do. Our arguments follow [24], which we specialize to the case of the
distance function.
Given a set P of points in Rd, and defining the distance function dP (3.2.1), we start
with the local (and global) minima of dP ; viz. the points of P, where dP = 0, and call
these critical points with index 0. For higher indexes, we have the following definition.
Definition 3.2.1. A point c ∈ Rd is a critical point of dP with index 1 ≤ k ≤ d if there exists a subset Y of k + 1 points in P such that:

1. ∀y ∈ Y : dP(c) = ‖c − y‖, and for all p ∈ P\Y we have ‖c − p‖ > dP(c).

2. The points in Y are in general position (i.e. the k + 1 points of Y do not lie in a (k − 1)-dimensional affine space).

3. c ∈ conv◦(Y), where conv◦(Y) is the interior of the convex hull of Y (an open k-simplex in this case).
Figure 3.3: Critical points of a distance function in R2. The grayscale image represents the values of the distance function dP for P = {p1, p2, p3}. Clearly, the minima of dP are the points in P themselves. Looking at c2, we observe that there is one direction in which the function decreases (the green arrows), and one in which the distance function increases (the red arrows). Hence, this point is considered a saddle point, or a critical point of index 1. Note that c2 is located on the edge between p2 and p3, which is their convex hull, in accordance with Definition 3.2.1. The same applies to c1, c3. Finally, c4 is a maximum point, or a critical point of index 2. This point lies inside the triangle whose vertices are p1, p2, p3, which is, again, the convex hull of the points.
Note that the first condition implies that dP ≡ dY in a small neighborhood of c. The second condition implies that the points in Y lie on a unique (k − 1)-dimensional sphere.
Figure 3.3 depicts a distance function from a set of three points in R2, and its critical
points of indexes 0, 1, 2.
We shall use the following notation:
S(Y) = The unique (k − 1)-dimensional sphere containing Y , (3.2.2)
C(Y) = The center of S(Y) in Rd, (3.2.3)
R(Y) = The radius of S(Y), (3.2.4)
B(Y) = The open ball in Rd with radius R(Y) centered at C(Y). (3.2.5)
Note that S(Y) is a (k − 1)-dimensional sphere, whereas B(Y) is a d-dimensional ball.
Obviously, S(Y) ⊂ B(Y), but, unless k = d, S is not the boundary of B. Since the critical
point c in Definition 3.2.1 is equidistant from all the points in Y , we have that c = C(Y).
Thus, we say that c is the unique index k critical point generated by the k + 1 points of
the subset Y . The last statement can be rephrased as follows:
Lemma 3.2.2. A subset Y ⊂ P of k + 1 points in general position generates an index k
critical point if, and only if, the following two conditions hold:
CP1 C(Y) ∈ conv◦(Y),
CP2 P ∩ B(Y) = ∅.
Furthermore, the critical point is C(Y) and the critical value is R(Y).
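The two conditions of Lemma 3.2.2 are purely geometric tests on each (k + 1)-subset, so they are easy to check numerically. The following Python sketch (our own illustration, not part of the thesis; the function names are ours) computes C(Y) and R(Y) by solving a small linear system, and then tests CP1 via barycentric coordinates and CP2 by a direct distance check:

```python
import numpy as np

def circumsphere(Y):
    """Center C(Y) and radius R(Y) of the unique (k-1)-sphere through
    the k+1 points of Y (rows of a (k+1, d) array in general position)."""
    y0, V = Y[0], Y[1:] - Y[0]           # V has shape (k, d)
    b = np.sum(V * V, axis=1)            # b_i = ||y_i - y_0||^2
    a = np.linalg.solve(2 * V @ V.T, b)  # center coordinates in the affine hull
    c = y0 + a @ V
    return c, float(np.linalg.norm(c - Y[0]))

def generates_critical_point(Y, P, eps=1e-12):
    """Test CP1 and CP2 of Lemma 3.2.2: does the subset Y of the point
    set P generate an index-k critical point of the distance function?"""
    c, R = circumsphere(Y)
    # CP1: C(Y) lies in the interior of conv(Y), i.e. all barycentric
    # coordinates of c with respect to the simplex Y are positive.
    A = np.vstack([Y.T, np.ones(len(Y))])
    lam, *_ = np.linalg.lstsq(A, np.append(c, 1.0), rcond=None)
    cp1 = bool(np.all(lam > eps))
    # CP2: the open ball B(Y) contains no point of P; the points of Y
    # themselves lie on its boundary, at distance exactly R from c.
    cp2 = bool(np.all(np.linalg.norm(P - c, axis=1) >= R - eps))
    return cp1 and cp2, c, R
```

For an acute triangle in R2 the circumcenter lies inside the triangle and, in the absence of other points, generates an index-2 critical point; for an obtuse triangle CP1 fails, mirroring the cases drawn in Figure 3.4.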
Figure 3.4 depicts the generation of an index 2 critical point in R2 by subsets of 3 points. We shall also be interested in critical points c that are within distance ε from P, i.e. dP(c) ≤ ε. This adds a third condition, which we will refer to later:

CP3 R(Y) ≤ ε.
The following indicator functions, related to CP1–CP3, will appear often.
Definition 3.2.3. Using the notation above,
h(Y) ≜ 1{C(Y) ∈ conv◦(Y)} (CP1) (3.2.6)
hε(Y) ≜ h(Y) 1_{[0,ε]}(R(Y)) (CP1+CP3) (3.2.7)
gε(Y, P) ≜ hε(Y) 1{P ∩ B(Y) = ∅} (CP1+CP2+CP3) (3.2.8)
Figure 3.4: Generating a critical point of index 2 in R2 (i.e. a maximum point). The small blue disks are the points of P. We examine three subsets of P: Y1 = {y1, y2, y3}, Y2 = {y4, y5, y6}, and Y3 = {y7, y8, y9}. The S(Yi) are the dashed circles, whose centers are C(Yi) = ci. The shaded balls are B(Yi), and the interiors of the triangles are conv◦(Yi). (1) Both C(Y1) ∈ conv◦(Y1) (CP1) and P ∩ B(Y1) = ∅ (CP2) hold, hence c1 is a critical point of index 2. (2) C(Y2) ∉ conv◦(Y2), which means that (CP1) does not hold, and therefore c2 is not a critical point (as can be observed from the flow arrows). (3) C(Y3) ∈ conv◦(Y3), so (CP1) holds. However, P ∩ B(Y3) = {p}, so (CP2) does not hold, and therefore c3 is also not a critical point. Note that in a small neighborhood of c3 we have dP ≡ d{p}, completely ignoring the existence of Y3.
3.3 Limit Theorems for the Distance Function
In this section we present the main results of this chapter. In order to avoid interrupting
the chain of events in this section, we postpone the proofs to Section 3.6.
We wish to study the distance function dP when the set P is random and |P| → ∞.
We shall focus on two different (yet very similar) setups.
Random Sample
Let Xn = {X1, . . . , Xn} be a set of i.i.d. random points in Rd, with a common probability density f, which we assume to be bounded. Denote by CPk(dXn) the set of critical points of dXn with index k. Let {rn}_{n=1}^{∞} be a sequence of positive numbers with lim_{n→∞} rn = 0, and define

N_{k,n} ≜ #{c ∈ CPk(dXn) : dXn(c) ≤ rn},
the number of critical points of dXn with index k, and with a critical value bounded by
rn. In other words, Nk,n counts critical points with index k which are within distance rn
from Xn.
Poisson Process
Let Pn be a spatial Poisson process on Rd with intensity function λn = nf , where f is a
bounded probability density function on Rd (so that E {|Pn|} = n). Denote by CPk(dPn)
the sets of critical points of dPn with index k. Let {rn}∞n=1 be a sequence of positive
numbers with limn→∞ rn = 0, and define
Nk,n , # {c ∈ CPk(dPn) : dPn(c) ≤ rn} .
Our main goal in this section is to study the limits of Nk,n and Nk,n as n → ∞. Since
N0,n = E{N0,n} = n (the minima are the points of Xn or Pn) we shall only be interested in
1 ≤ k ≤ d. The results split into three main regimes, depending on the rate of convergence
of rn to zero, specifically, on the limit of the term nrdn. We shall state all the results in
terms of Nk,n. Unless otherwise stated, exactly the same results apply for Nk,n.
A word on notation: In the formulae presented below, for g : (Rd)k+1 → R and
y = (y1, . . . , yk) ∈ (Rd)k we write g(0,y) for g(0, y1, . . . , yk).
3.3.1 The Subcritical Range (nrdn → 0)
This range is also known as the ‘dust phase’, for reasons that will become clearer later,
when we discuss Cech complexes. We start with the limiting mean.
Theorem 3.3.1 (Limit mean). If n r_n^d → 0, then for 1 ≤ k ≤ d,

lim_{n→∞} (n^{k+1} r_n^{dk})^{−1} E{N_{k,n}} = μ_k,

where

μ_k = (1/(k + 1)!) ∫_{R^d} f^{k+1}(x) dx ∫_{(R^d)^k} h_1(0, y) dy < ∞,

and h_1 is defined by (3.2.7).
In general, as is common for results of this nature, we cannot explicitly compute μ_k. However, when k = 1, y contains only a single point, so h ≡ 1 and R(0, y) = ‖y‖/2. Therefore, h_1(0, y) = 1{‖y‖ ≤ 2}, yielding

μ_1 = 2^{d−1} ω_d ∫_{R^d} f^2(x) dx,

where ω_d is the volume of the unit ball in R^d. Some numerics for other cases are given below.
The observation that, for a specific choice of rn, there is at most one α > 0 such that lim_{n→∞} n^{α+1} r_n^{dα} ∈ (0, ∞) leads to the important fact that there is a 'critical' index,

k_c ≜ { 0,    if lim_{n→∞} n^{α+1} r_n^{dα} = 0 for all α > 0,
        ⌊α⌋,  if lim_{n→∞} n^{α+1} r_n^{dα} ∈ (0, ∞),
        ∞,    if lim_{n→∞} n^{α+1} r_n^{dα} = ∞ for all α > 0,

such that

lim_{n→∞} E{N_{k,n}} = { ∞,  k < k_c,
                         0,  k > k_c,     (3.3.1)

with any value in (0, ∞] possible at k = k_c. That is, there is a phase transition occurring within the subcritical regime itself.
Similar regimes, with identical limits, appear for asymptotic variances.
Theorem 3.3.2 (Limit variance). If n r_n^d → 0, then for 1 ≤ k ≤ d,

lim_{n→∞} (n^{k+1} r_n^{dk})^{−1} Var(N_{k,n}) = μ_k.
Not surprisingly, the three regimes also yield different limit distributions.

Theorem 3.3.3 (Limit distribution). Let n r_n^d → 0 and 1 ≤ k ≤ d.

1. If lim_{n→∞} n^{k+1} r_n^{dk} = 0, then N_{k,n} →^{L²} 0.

2. If lim_{n→∞} n^{k+1} r_n^{dk} = α ∈ (0, ∞), then N_{k,n} →^{L} Poisson(α μ_k).

3. If lim_{n→∞} n^{k+1} r_n^{dk} = ∞, then

(N_{k,n} − E{N_{k,n}}) / (n^{k+1} r_n^{dk})^{1/2} →^{L} N(0, μ_k).
As above, for a specific choice of rn there is going to be at most a single k_c for which the Poisson limit applies. Otherwise, N_{k,n} converges either to zero or to infinity. Thus, in the subcritical regime, the picture is that n = N_{0,n} ≫ N_{1,n} ≫ · · · ≫ N_{k_c,n}, while for k > k_c the value of N_{k,n} will be zero with high probability, which increases with k.
3.3.2 The Critical and Supercritical Ranges (n r_n^d → λ ∈ (0, ∞])
We now look at the critical (n r_n^d → λ ∈ (0, ∞)) and supercritical (n r_n^d → ∞) regimes. While there are differences between the two regimes, the general outline of the results is the same. In both, the correct scaling for N_{k,n} is n (as opposed to n^{k+1} r_n^{dk} in the subcritical range). Consequently, the limit results are similar for all the indexes.
The supercritical regime is significantly more difficult to analyze than either the crit-
ical or subcritical, and we shall require an additional assumption for this case, which
necessitates a definition.
Definition 3.3.4. Let f : Rd → R be a probability density function. We say that f is lower bounded if it has compact support and f_min ≜ inf{f(x) : x ∈ supp(f)} > 0.
Henceforth, when dealing with the supercritical phase, we always assume that f is a
lower bounded probability density, and that supp(f) is convex. As we shall see in Chapter
4, the compact support assumption is crucial here. However, it is not clear at this point
if convexity is a necessary condition, or a consequence of our proofs.
Theorem 3.3.5 (Limit mean). If n r_n^d → λ ∈ (0, ∞], then, for 1 ≤ k ≤ d,

lim_{n→∞} n^{−1} E{N_{k,n}} = γ_k(λ),

where

γ_k(λ) = (λ^k / (k + 1)!) ∫_{(R^d)^{k+1}} f^{k+1}(x) h_1(0, y) e^{−λ ω_d R^d(0, y) f(x)} dx dy,

γ_k(∞) = lim_{λ→∞} γ_k(λ) = (1 / (k + 1)!) ∫_{(R^d)^k} h(0, y) e^{−ω_d R^d(0, y)} dy,

ω_d is the volume of the unit ball in R^d, and R, h, and h_1 are defined in (3.2.4), (3.2.6), and (3.2.7), respectively.
Again, these terms can be evaluated for k = 1, in which case

γ_1(λ) = (λ/2) ∫_{R^d} ∫_{‖y‖≤2} f^2(x) e^{−λ ω_d 2^{−d} ‖y‖^d f(x)} dy dx,

γ_1(∞) = (1/2) ∫_{R^d} e^{−ω_d 2^{−d} ‖y‖^d} dy = 2^{d−1}.
For a uniform distribution on a compact set D ⊂ R^d it is easy to show that γ_1(λ) is given by

γ_1(λ) = 2^{d−1}(1 − e^{−λ ω_d / Vol(D)}), (3.3.2)

from which it is easy to check that γ_1(λ) → γ_1(∞) as λ → ∞. For higher indexes, we have no analytic way to compute γ_k(λ). However, it can be evaluated numerically, and an example is given in Figure 3.5 for the uniform distribution on [0, 1]^3. Note that, in that example, γ_0(∞) − γ_1(∞) + γ_2(∞) − γ_3(∞) ≈ 0. This is not a coincidence, and the explanation for this phenomenon will be given in Section 3.4, where we discuss the mean Euler characteristic of Cech complexes.
[Figure: plot of γ_k(λ) versus λ ∈ [0, 10], with one curve for each of k = 0, 1, 2, 3.]

Figure 3.5: The γ_k(λ) functions. In this example d = 3, and f(x) is the uniform density on [0, 1]^3. For k = 0 we know that n^{−1} N_{0,n} = 1, and for k = 1 we have an explicit formula in (3.3.2). For k = 2, 3 we had to use a numerical approximation, hence the noisiness of the graphs.
Recall that, in the subcritical phase, the limit mean and the limit variance were exactly
the same. For other phases, this is no longer true.
Theorem 3.3.6 (Limit variance). If n r_n^d → λ ∈ (0, ∞] and 1 ≤ k ≤ d, then

lim_{n→∞} n^{−1} Var(N_{k,n}) = σ_k^2(λ),   lim_{n→∞} n^{−1} Var(N̂_{k,n}) = σ̂_k^2(λ),

where 0 < σ_k^2(λ) < σ̂_k^2(λ) < ∞.
The expressions defining σ_k^2(λ) and σ̂_k^2(λ) are rather complicated, and can be found at (3.6.31) and (3.6.24), respectively. Note that this theorem, and the following central limit theorem (CLT), are the only places where the limit values differ between the random sample and Poisson cases.
Theorem 3.3.7 (CLT). If n r_n^d → λ ∈ (0, ∞], then for 1 ≤ k ≤ d,

(N_{k,n} − E{N_{k,n}}) / √n →^{L} N(0, σ_k^2(λ)),

(N̂_{k,n} − E{N̂_{k,n}}) / √n →^{L} N(0, σ̂_k^2(λ)).

Note that as an immediate corollary of these CLTs and Theorem 3.3.6 we have the 'law of large numbers' that, under the conditions of the CLTs,

n^{−1} N_{k,n} →^{L²} γ_k(λ).
To conclude this section, we note an interesting result which is unique to the supercritical regime, for which we define

N^G_{k,n} ≜ |CPk(dXn)|,

the 'global' number of critical points of the distance function dXn in R^d (i.e. without requiring (CP3)). We note first that N_{k,n} and N^G_{k,n} have identical asymptotic behaviors, at least at the level of their first two moments and CLT:

Theorem 3.3.8. If n r_n^d → ∞, and f is lower bounded with convex support, then, for 1 ≤ k ≤ d,

lim_{n→∞} n^{−1} E{N^G_{k,n}} = γ_k(∞),   lim_{n→∞} n^{−1} Var(N^G_{k,n}) = σ_k^2(∞),

and

(N^G_{k,n} − E{N^G_{k,n}}) / √n →^{L} N(0, σ_k^2(∞)),

where γ_k and σ_k^2 are the same as in Theorems 3.3.5 and 3.3.6.
As usual, the results are the same for the Poisson case. An obvious corollary of Theorem 3.3.8 is that n^{−1} E{N^G_{k,n} − N_{k,n}} → 0. However, much more is true:

Proposition 3.3.9. Under the conditions of Theorem 3.3.8, and if n r_n^d ≥ D⋆ log n for sufficiently large (f-dependent) D⋆, then, for 1 ≤ k ≤ d,

lim_{n→∞} E{|N^G_{k,n} − N_{k,n}|} = 0.
Thus, in the supercritical phase, the slow decrease of the radii rn implies that the global and the local numbers of critical points are ultimately equal with high probability, despite the fact that both grow to infinity with increasing n. This is an interesting and unexpected result, and will turn out to be important when we discuss the Euler characteristic of the Cech complex in the next section. However, Proposition 3.3.9 relies heavily on the assumed convexity of supp(f). For example, take f to be the uniform density on the annulus A = {x ∈ R2 : 1 ≤ |x| ≤ 2}. Then, for n large enough, we would expect to have a maximum point (index 2) close to the origin. This critical point will be accounted for in N^G_{2,n}, but will be ignored by N_{2,n}, since its distance to Xn is greater than 1. Thus, we would expect that E{|N^G_{2,n} − N_{2,n}|} → 1, contradicting the conclusion of Proposition 3.3.9 and showing that some assumption on supp(f) is indeed necessary.
3.4 The Topology of Random Cech Complexes
As mentioned already a number of times, the results of the previous section regarding
critical points of the distance function have implications for the homology and Betti
numbers of certain random Cech complexes, and so are related to recent results of [29]
and [31]. Our plan in this section is to describe this connection.
3.4.1 Critical Points and Betti Numbers
The link between the distance function and the Cech complex is given by the following equivalence, which is due to the Nerve theorem (Theorem 3.1.5):

dP^{−1}((−∞, ε]) = ⋃_{p∈P} B_ε(p) ≃ C(P, ε). (3.4.1)

Morse theory, and in particular the version developed in [24] that applies to the distance function, tells us that, in view of the equivalences in (3.4.1), there is a connection between the critical points of dP over the set dP^{−1}([0, ε]), and the Betti numbers of C(P, ε). In particular, for every critical point of dXn at height ε and of index k, for all small enough δ, either

β_k(C(Xn, ε + δ)) = β_k(C(Xn, ε − δ)) + 1,

or

β_{k−1}(C(Xn, ε + δ)) = β_{k−1}(C(Xn, ε − δ)) − 1.
Despite this connection, Betti numbers, dealing, as they do, with ‘holes’, are typically
determined by global phenomena, and this makes them hard to study directly in the
random setting. On the other hand, the structure of critical points is a local phenomenon,
which is why, in the random case, we can say more about critical points than what is known
for Betti numbers to date.
3.4.2 The Limiting Behavior of the Cech Complex
For the remainder of this section we shall treat only the random sample Xn, although similar statements could be made regarding the Poisson case. Retaining the notation of the previous section, and defining β_{k,n} ≜ β_k(C(Xn, rn)), our aim will be to examine relationships between the random variables N_{k,n} and the β_{k,n} and β_{k−1,n}. In addition, we shall compare our results for N_{k,n} to those of [29] and [31] for β_{k,n}, using Morse theory to explain the connections.
In direct analogy to the results of Section 3.3, [29, 31] show that the limiting behavior of C(Xn, rn) splits into three main regimes, depending on the limit of n r_n^d. In the subcritical (n r_n^d → 0) or dust phase, in which the Cech complex consists mostly of small disconnected particles and very few holes, Theorem 3.2 in [29] states that for 1 ≤ k ≤ d − 1,

lim_{n→∞} (n^{k+2} r_n^{d(k+1)})^{−1} E{β_{k,n}} = D_k,

for some constant D_k defined in an integral form and related to the μ_k of our Theorem 3.3.1. In [31] the subcritical phase is explored in more detail, and limit theorems analogous to those of Theorem 3.3.3 are proved. Combining their results with those in Section 3.3.1, we observe that the N_{k,n} and the β_{k−1,n} exhibit similar limiting behavior, and are O(n^{k+1} r_n^{dk}).
Furthermore, we can summarize the relationship between the different N_{k,n} and β_{k,n} as follows:

N_{1,n} ≫ N_{2,n} ≫ N_{3,n} ≫ · · · ≫ N_{k_c,n}
             ≈          ≈                 ≈
           β_{1,n}  ≫ β_{2,n} ≫ · · · ≫ β_{k_c−1,n},

where ≈ means 'same order of magnitude' and k_c is as in (3.3.1). For k > k_c all terms are zero with high probability, which, as before, grows with k.
Recall that Morse theory tells us that each critical point of index k contributes either +1 to β_{k,n} or −1 to β_{k−1,n}. Splitting N_{k,n} and β_{k,n} accordingly as

N_{k,n} = N^+_{k,n} + N^−_{k,n},
β_{k,n} = N^+_{k,n} − N^−_{k+1,n},

the diagram implies that N^+_{k,n} = O(n^{k+2} r_n^{d(k+1)}) and N^−_{k,n} = O(n^{k+1} r_n^{dk}). Hence we conclude that N^−_{k,n} ≫ N^+_{k,n}. In other words, most of the critical points of index k destroy homology generators rather than create new ones. In the case where k = 0, noting that β_{0,n} = N_{0,n} − N^−_{1,n} yields the following corollary.
Corollary 3.4.1. If n r_n^d → 0, then

lim_{n→∞} n^{−1} E{β_{0,n}} = 1.
Recall that β_{0,n} represents the number of connected components of the complex C(Xn, rn), and is of an essentially different nature to that of the other Betti numbers. The study in [31] does not apply to β_0 at all, while in [39] limit theorems for β_{0,n} are proved for the critical phase. The Morse theoretic point of view we use here thus gives additional results not accessible from the direct approach to Betti numbers.
For the other regimes, making statements about the Cech complex becomes extremely difficult, and thus the theory is still incomplete.

In the critical phase (n r_n^d → λ ∈ (0, ∞)), the Cech complex starts to connect and the topology becomes more complex. In addition, once λ passes a certain threshold, a giant component emerges (cf. Chapter 9 of [39]), from which comes the alternate description of this phase as the 'percolation phase'. Theorem 4.1 in [29] states that for 1 ≤ k ≤ d − 1,

lim_{n→∞} n^{−1} E{β_{k,n}} ∈ (0, ∞),

although the exact limit is not computed. This agrees with the results in Section 3.3.2 of this chapter. The main difference between the two sets of results is that for critical points we are able to give a closed form expression for the limit mean of N_{k,n} (Theorem 3.3.5), as well as stronger limit results (Theorems 3.3.7–3.3.9). This will be useful below, when we discuss Euler characteristics.
In the supercritical regime (n r_n^d → ∞) even less is known about the Cech complex. In general, the Cech complex becomes highly connected, the topology becomes simpler and the Betti numbers decrease. Theorem 6.1 of [29] gives the precise result that if f is a uniform density with a compact and convex support, and lim_{n→∞} (log n/n)^{−1/d} r_n > 0, then

lim_{n→∞} P(β_{0,n} = 1, β_{1,n} = · · · = β_{d−1,n} = 0) = 1, (3.4.2)

which is described in [31] by saying that C(Xn, rn) is "asymptotically almost surely contractible". We have no analogous result about critical points, nor could we, since N_{k,n} is O(n) and thus N_{k,n} → ∞ (Section 3.3.2). However, Corollary 3.4.2 below gives information about the Euler characteristic of the Cech complex which is different from, but related to, (3.4.2). (Note that (3.4.2) requires that the underlying probability density is lower bounded with convex support, the same assumption we adopted in Section 3.3.2.)
To conclude this section, we present a novel statement about the Cech complex C(Xn, rn) which can be made based on the results in Section 3.3. The Euler characteristic of a simplicial complex S has a number of equivalent definitions, and a number of important applications. One of the definitions, via Betti numbers, is

χ(S) = Σ_{k=0}^{∞} (−1)^k β_k(S). (3.4.3)

However, χ(S) also has a definition via indexes of critical points of appropriately defined functions supported on S, and this leads to
Corollary 3.4.2. Let χ_n be the Euler characteristic of C(Xn, rn). Then, under the assumptions of Theorems 3.3.1 and 3.3.5, we have

lim_{n→∞} n^{−1} E{χ_n} = { 1,                                   n r_n^d → 0,
                            1 + Σ_{k=1}^{d} (−1)^k γ_k(λ),       n r_n^d → λ ∈ (0, ∞),
                            0,                                   n r_n^d → ∞.      (3.4.4)

Moreover, when n r_n^d → ∞ and n r_n^d ≥ D⋆ log n (see Proposition 3.3.9), then

E{χ_n} → 1.
Note that (3.4.4) cannot be proven using only the existing results on Betti numbers,
since the values of the limiting mean in the critical and supercritical regimes are not
available. This demonstrates one of the advantages of studying the homology of the Cech
complex via the distance function.
In closing we note some of the implications of Corollary 3.4.2. In the subcritical phase, we have that χ_n ∼ n, which agrees with the intuition developed so far that, in this range, the Cech complex consists mostly of small disconnected particles and very few holes. In the critical range we have a non-trivial limit, resulting from the fact that the Cech complex has many holes of all possible dimensions. In the supercritical range, χ_n ∼ 1, which is exactly what we get when β_{0,n} = 1, β_{1,n} = · · · = β_{d−1,n} = 0 (cf. (3.4.3), (3.4.2)). Finally, since n^{−1} E{χ_n} → 0 in this regime, it is clear now why the numerics of Figure 3.5 showed that Σ_{k=0}^{3} (−1)^k γ_k(∞) ≈ 0.
3.5 Summary and Future Work
In this chapter we presented a body of limit theorems for the distance function from a random set of points in Euclidean space R^d. We observed different limiting behavior in three different phases, controlled by the term n r_n^d. Using a special version of Morse theory for the distance function (developed in [24]), we linked our results for the critical points with the results in [30, 31], where limit theorems for the Betti numbers β_{k,n} of the Cech complex C(Xn, rn) were presented.
There is a lot more to study on the theory of random distance functions and their
relationships with random Cech complexes. In this chapter we have already established
a number of novel topological results about random Cech complexes, such as Corollaries
3.4.1 and 3.4.2. We would like to pursue more results of this kind. Of particular interest are the critical (n r_n^d → λ) and supercritical (n r_n^d → ∞) regimes, where the behavior of the Betti numbers of the Cech complex has not yet been fully determined. We believe that the results we have for the distance function could be highly useful for understanding the behavior of the complex in these regimes.
The following two sections discuss in more detail two topics, which remain for future
research, and which seem particularly interesting.
3.5.1 The Supercritical Phase
In this chapter we presented the results for the supercritical phase under the assumption
that f is lower bounded with a convex support. We rely on this assumption in the proofs presented in Section 3.6. We would like to extend our results beyond this set of distributions.
For distributions with compact supports, the result in Corollary 3.4.2 suggests that in
the supercritical phase, the Cech complex captures the topology of the support. However,
under our assumption, the support of f is always contractible, in which case β0 = 1 and
βk = 0 for k ≥ 1. Extending our results to non-convex supports, we believe that the
result in Corollary 3.4.2 could be extended to the claim that for the right choice of rn we
have E {χn} → χ(supp(f)). This also might lead to finding a regime in which we can
prove convergence for the Betti numbers, e.g. that E {βk,n} → βk(supp(f)). Proving such
results would be a significant contribution to the manifold learning problem described as
a motivation for this chapter. It means that we could find conditions on the radius rn,
such that in the limit, the topology of the Cech complex recovers the topology of the
original space.
For distributions with unbounded support, the results we will present in Chapter 4
indicate a very different limiting behavior. In Chapter 4 we will show that the power-
law and exponential distributions ‘crackle’. Briefly, this means that as we add more and
more points a contractible core is formed, but outside this core there are many small
disconnected particles and homology elements of any order. Therefore, the techniques to
prove the results in this chapter may need adjustments to handle such distributions.
3.5.2 The Distance Function on Closed Manifolds
The main assumption in this chapter is that the samples are generated from a nice proba-
bility density f in Rd. This setup is interesting on its own, and the Cech complex behavior
exhibits a rich variety of phenomena yet to be fully studied. However, we are also very
interested in extending the results we have so far to the manifold case. Here, the samples are drawn from an m-dimensional (smooth, closed) manifold M ⊂ R^d, where m < d.
The techniques used in this chapter need significant adaptation, and this is work in
progress. As one would expect, the general behavior is similar, and in particular we
observe the same phase transition phenomena. The main difference is that the term controlling the transition is now n r_n^m rather than n r_n^d.
Similarly to the discussion in Section 3.5.1, Corollary 3.4.2 suggests that after extending our results to manifolds, we would find a sub-case of the supercritical phase (n r_n^m → ∞) where the topology of the Cech complex recovers the topology of M.
In the critical phase (n r_n^m → λ ∈ (0, ∞)) we can easily extend the limit theorems in this chapter to closed manifolds, using the results in [40]. Let M ⊂ R^d be a smooth closed manifold, and let f : M → R be a probability density function on M, i.e. f ≥ 0 and

∫_M f(x) dx = 1,

where dx is the volume form on M. Let Xn = {X1, . . . , Xn} be a set of i.i.d. random samples with density f. In [40], limit theorems for functionals defined on such random sets are introduced. For simplicity, assume that rn = λ n^{−1/m}, although the results can be easily extended to any choice of rn in the critical range.
Following the notation in [40], define

ξ(x, X) ≜ 1{∃ x1, . . . , xk ∈ X : h_λ(x, x1, . . . , xk) = 1},
ξ_n(x, X) ≜ ξ(n^{1/m} x, n^{1/m} X) = 1{∃ x1, . . . , xk ∈ X : h_{r_n}(x, x1, . . . , xk) = 1}.

Then, clearly,

N_{k,n} = (1/(k + 1)) Σ_{X∈Xn} ξ_n(X, Xn).

Thus, adapting Theorem 3.1 in [40] to this special case, we have
Theorem 3.5.1. Let H_α be a homogeneous Poisson process on R^m with rate α. Then

n^{−1} N_{k,n} → (1/(k + 1)) ∫_M E{ξ(0, H_{f(x)})} f(x) dx,

both in L1 and almost surely.
Next, simple computations show that

E{ξ(0, H_{f(x)})} = (λ^k f^k(x) / k!) ∫_{(R^m)^k} h_1(0, y) e^{−λ ω_d f(x) R^d(0, y)} dy.
Therefore, we have that

n^{−1} N_{k,n} → γ_k(λ),

where

γ_k(λ) = (λ^k / (k + 1)!) ∫_M ∫_{(R^m)^k} f^{k+1}(x) h_1(0, y) e^{−λ ω_d R^d(0, y) f(x)} dy dx.
Note that the expression of γk(λ) here is very similar to the one given in Theorem 3.3.5.
The only difference is the domain of integration. In light of this result, it seems very likely
that all the limit theorems presented in this chapter have an equivalent manifold version.
This topic remains as future work.
Sampling from Fractals
A similar idea to the manifold setup is sampling from fractals. Here, we are interested in
generating samples from some distribution over domains with a fractal (e.g. Hausdorff)
dimension m, which is not necessarily an integer. It would be extremely interesting to see whether our results can be extended to this case as well, and how they would depend on the fractal dimension. Numerical simulations on samples taken from the graph of a Brownian motion (whose Hausdorff dimension is m = 1.5) suggest that the phase transition in this case indeed occurs at n r_n^m → λ. While this is very encouraging, we leave fractals as future work as well.
3.6 Proofs
This section is devoted to proving the results in Sections 3.3 and 3.4, and is organized according to regimes: subcritical (dust), critical (percolation), and supercritical (connected). In the proofs below we use theorems from Palm theory, Stein's method, and de-Poissonization. The appendices to this chapter provide a brief background to each of these topics and state the required theorems.
3.6.1 Some Notation and Elementary Considerations
In this section we list some common notation and note some simple facts that will be used throughout the proofs.
• Henceforth, k will be fixed, and whenever we use Y, Y′ or Yi we implicitly assume that |Y| = |Y′| = |Yi| = k + 1, unless stated otherwise.

• Usually, finite subsets of R^d will be denoted calligraphically (X, Y). However, inside integrals we use boldfacing and lower case (x, y).

• For x ∈ R^d, x ∈ (R^d)^{k+1} and y ∈ (R^d)^k, we use the shorthand

f(x) ≜ f(x1) f(x2) · · · f(x_{k+1}),
f(x + rn y) ≜ f(x + rn y1) f(x + rn y2) · · · f(x + rn yk),
h(0, y) ≜ h(0, y1, . . . , yk).

• The symbol 'c⋆' denotes a constant value, which might depend on d (the ambient dimension), f (the probability density of the samples), and k (the Morse index), but on neither n nor rn. The actual value of c⋆ may change between, and even within, lines.

• While not exactly a notational issue, we shall often use the fact that, for every k, n^{−k} (n choose k) → 1/k! as n → ∞, and thus there is a c⋆ such that (n choose k) ≤ c⋆ n^k.
Finally, the following lemma will be used extensively throughout the proofs below.
Lemma 3.6.1. Let X = (X1, . . . , Xk) be a set of k i.i.d. points in R^d sampled from a bounded density f. Then there exists a constant c⋆ such that

P(X is contained in a ball of radius r) ≤ c⋆ r^{d(k−1)}.
Proof. If X is contained in a ball of radius r, then X2, . . . , Xk are all within distance 2r from X1, thus

P(X is contained in a ball of radius r) ≤ ∫_{R^d} (∫_{B_{2r}(x)} f(y) dy)^{k−1} f(x) dx
≤ ∫_{R^d} (f_max Vol(B_{2r}(x)))^{k−1} f(x) dx
= f_max^{k−1} ω_d^{k−1} (2r)^{d(k−1)}
= c⋆ r^{d(k−1)},

where f_max ≜ sup_{x∈R^d} f(x), and ω_d is the volume of the unit ball in R^d.
3.6.2 Means for the Subcritical Range (n r_n^d → 0)
We start by proving Theorem 3.3.1 (the limit expectation), which requires the following important lemma. Note that the lemma has two implications. Firstly, it gives a precise order of magnitude, with constant, for the probability that k + 1 points in the rn-neighborhood of a point in Xn generate an index-k critical point. Secondly, it implies that if an additional, high density set of Poisson points is added to these k + 1 points, the probability that any of them lies in the ball containing the k + 1 original points is asymptotically negligible. Recall the definitions of the indicator functions h, hε, gε, given by (3.2.6), (3.2.7), (3.2.8), respectively.
Lemma 3.6.2. Let Y ⊂ Xn be a subset (chosen in advance) of k + 1 random variables from Xn, and assume that Y is independent of the Poisson process Pn. Then

lim_{n→∞} r_n^{−dk} E{h_{r_n}(Y)} = lim_{n→∞} r_n^{−dk} E{g_{r_n}(Y, Xn)} = lim_{n→∞} r_n^{−dk} E{g_{r_n}(Y, Y ∪ Pn)} = (k + 1)! μ_k,

where μ_k is defined in Theorem 3.3.1.
Proof. Note that from the definition of hǫ(·) (see (3.2.7)), it follows that
hǫ(x, x+ ǫy) , hǫ(x, x+ ǫy1, . . . , x+ ǫyk) = h1(0,y).
72 CHAPTER 3. THE TOPOLOGY OF RANDOM GEOMETRIC COMPLEXES
Thus, using the change of variables $\mathbf{x} \to (x, x+r_n\mathbf{y})$,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\,d\mathbf{x}
= r_n^{dk} \int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k} f(x)\, f(x+r_n\mathbf{y})\, h_{r_n}(x, x+r_n\mathbf{y})\,d\mathbf{y}\,dx
= r_n^{dk} \int_{\mathbb{R}^d} f(x) \int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\,d\mathbf{y}\,dx. \tag{3.6.1}
\]
Now, for $h_1(0,\mathbf{y})$ to be nonzero, all the elements $y_1,\ldots,y_k \in \mathbb{R}^d$ must lie inside $B_2(0)$, the ball of radius 2 around the origin. Therefore,
\[
|f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})| \le f_{\max}^k\,\mathbf{1}_{B_2(0)}(y_1)\cdots \mathbf{1}_{B_2(0)}(y_k),
\]
and applying the dominated convergence theorem (DCT) to (3.6.1) yields
\[
\lim_{n\to\infty} \int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\,d\mathbf{y} = f^k(x)\int_{(\mathbb{R}^d)^k} h_1(0,\mathbf{y})\,d\mathbf{y}, \tag{3.6.2}
\]
from which follows
\[
\lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\} = \int_{\mathbb{R}^d} f^{k+1}(x)\,dx \int_{(\mathbb{R}^d)^k} h_1(0,\mathbf{y})\,d\mathbf{y} = (k+1)!\,\mu_k, \tag{3.6.3}
\]
completing the proof for $h_{r_n}(\mathcal{Y})$. For $g_{r_n}(\mathcal{Y},\mathcal{X}_n)$ we have
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \mathbb{E}\{\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\mid\mathcal{Y}\}\} = \mathbb{E}\{h_{r_n}(\mathcal{Y})(1-p(\mathcal{Y}))^{n-k-1}\},
\]
where $p(\mathcal{Y}) \triangleq \int_{B(\mathcal{Y})} f(z)\,dz$ ($B(\mathcal{Y})$ is defined in (3.2.5)). Thus,
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\,(1-p(\mathbf{x}))^{n-k-1}\,d\mathbf{x}
= r_n^{dk}\int_{\mathbb{R}^d} f(x)\int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\,(1-p(x,x+r_n\mathbf{y}))^{n-k-1}\,d\mathbf{y}\,dx. \tag{3.6.4}
\]
The integrand here is smaller than or equal to the one in (3.6.1); therefore we can safely apply the DCT to it. To find the limit, first note that
\[
n\,p(x, x+r_n\mathbf{y}) = n\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz
= n\operatorname{Vol}(B(x,x+r_n\mathbf{y}))\,\frac{\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+r_n\mathbf{y}))}
= n\,\omega_d (r_n R(0,\mathbf{y}))^d\,\frac{\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+r_n\mathbf{y}))}.
\]
Applying the Lebesgue differentiation theorem yields
\[
\lim_{n\to\infty} \frac{\int_{B(x,x+r_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+r_n\mathbf{y}))} = f(x).
\]
Therefore, since $nr_n^d \to 0$, we have
\[
\lim_{n\to\infty} n\,p(x, x+r_n\mathbf{y}) = 0. \tag{3.6.5}
\]
Thus, it is easy to show that
\[
\lim_{n\to\infty} (1-p(x,x+r_n\mathbf{y}))^{n-k-1} = 1,
\]
and using (3.6.3) and (3.6.4) yields
\[
\lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\} = (k+1)!\,\mu_k.
\]
Finally, the definition of $\mathcal{P}_n$ as a Poisson process with intensity $nf(x)$ implies
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\mid\mathcal{Y}\} = h_{r_n}(\mathcal{Y})\,\mathbb{P}(B(\mathcal{Y})\cap\mathcal{P}_n = \emptyset\mid\mathcal{Y}) = h_{r_n}(\mathcal{Y})\,e^{-np(\mathcal{Y})}.
\]
Thus,
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\} = \mathbb{E}\{\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\mid\mathcal{Y}\}\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\, e^{-np(\mathbf{x})}\,d\mathbf{x}
= r_n^{dk}\int_{\mathbb{R}^d} f(x)\int_{(\mathbb{R}^d)^k} f(x+r_n\mathbf{y})\, h_1(0,\mathbf{y})\, e^{-np(x,x+r_n\mathbf{y})}\,d\mathbf{y}\,dx.
\]
Applying the DCT as before, and using (3.6.3) and (3.6.5), yields
\[
\lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\} = \lim_{n\to\infty} r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\} = (k+1)!\,\mu_k,
\]
and we are done.
Using the previous lemma, it is now easy to prove Theorem 3.3.1.

Proof of Theorem 3.3.1. First, note that
\[
N_{k,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y},\mathcal{X}_n),
\]
where the sum is over all the subsets of size $k+1$. Therefore,
\[
\mathbb{E}\{N_{k,n}\} = \sum_{\mathcal{Y}\subset\mathcal{X}_n}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \binom{n}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{X}_{k+1},\mathcal{X}_n)\}.
\]
Using the fact that $n^{-(k+1)}\binom{n}{k+1} \to \frac{1}{(k+1)!}$, together with Lemma 3.6.2, yields
\[
\lim_{n\to\infty} (n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}\} = \mu_k,
\]
as required. As for the Poisson case, note first that $\hat N_{k,n} = \sum_{\mathcal{Y}\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y},\mathcal{P}_n)$. Applying Theorem 3.A.1 therefore yields that
\[
\mathbb{E}\{\hat N_{k,n}\} = \frac{n^{k+1}}{(k+1)!}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}',\mathcal{Y}'\cup\mathcal{P}_n)\},
\]
where $\mathcal{Y}'$ is a copy of $\mathcal{Y}$ independent of $\mathcal{P}_n$. Lemma 3.6.2 then implies
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat N_{k,n}\} = \mu_k,
\]
as required.
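The $n^{k+1}r_n^{dk}$ normalization can be seen in a simpler surrogate statistic: the number of $(k+1)$-subsets of $\mathcal{X}_n$ whose points lie within distance $2r_n$ of each other, which dominates $N_{k,n}$. The Python sketch below (an illustration only, not part of the thesis; $d=2$, $k=1$, so the surrogate is the count of close pairs) uses $r_n = n^{-3/4}$, for which $n^2r_n^2 = n^{1/2}$, and checks that the normalized count is roughly constant in $n$:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_close_pairs(n, r, trials=40):
    # Average number of pairs at distance <= 2r among n uniform points
    # in the unit square; its expectation is ~ 2*pi*n^2*r^2 for small r.
    total = 0.0
    for _ in range(trials):
        X = rng.uniform(size=(n, 2))
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        total += np.triu(D <= 2 * r, 1).sum()
    return total / trials

# Normalizing by n^2 r_n^2 = n^{1/2} should give roughly the same value
# (about 2*pi here) for both sample sizes.
ratios = [mean_close_pairs(n, n ** -0.75) / n ** 0.5 for n in (200, 800)]
print(ratios)
```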
3.6.3 Variances and Limit Distributions for the Subcritical Range

The proofs of Theorems 3.3.2 and 3.3.3 split into three different cases, depending on the limit of $n^{k+1}r_n^{dk}$.

Case 1: $n^{k+1}r_n^{dk} \to 0$

We start with the limit variance for this case.

Proof of Theorem 3.3.2.
\[
\mathbb{E}\{N_{k,n}^2\} = \mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{X}_n}\sum_{\mathcal{Y}_2\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\bigg\}
= \sum_{j=0}^{k+1}\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{X}_n}\sum_{\mathcal{Y}_2\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j\}\bigg\}
\triangleq \sum_{j=0}^{k+1}\mathbb{E}\{I_j\}.
\]
Note that
\[
I_{k+1} = \sum_{\mathcal{Y}_1\subset\mathcal{X}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n) = N_{k,n}.
\]
Thus, from Theorem 3.3.1,
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_{k+1}\} = \mu_k. \tag{3.6.6}
\]
Next, for $0<j<k+1$, if $|\mathcal{Y}_1\cap\mathcal{Y}_2|=j$ and $g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)=1$, then necessarily the $2k+2-j$ points in $\mathcal{Y}_1\cup\mathcal{Y}_2$ are contained in a ball of radius $2r_n$, and using Lemma 3.6.1 we have
\[
\mathbb{E}\{I_j\} = \binom{n}{k+1}\binom{n-k-1}{k+1-j}\binom{k+1}{j}\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j}
\le c^\star n^{2k+2-j} r_n^{d(2k+1-j)}.
\]
Thus,
\[
(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_j\} \le c^\star (nr_n^d)^{k+1-j} \to 0. \tag{3.6.7}
\]
For $j=0$, the sets $\mathcal{Y}_1$ and $\mathcal{Y}_2$ are independent, and since $g_{r_n}(\mathcal{Y}_i,\mathcal{X}_n)\le h_{r_n}(\mathcal{Y}_i)$, we have
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\} \le \mathbb{E}\{h_{r_n}(\mathcal{Y}_1)h_{r_n}(\mathcal{Y}_2)\} = (\mathbb{E}\{h_{r_n}(\mathcal{Y}_1)\})^2.
\]
Therefore,
\[
\mathbb{E}\{I_0\} = \binom{n}{k+1}\binom{n-k-1}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0}
\le c^\star n^{2(k+1)}(\mathbb{E}\{h_{r_n}(\mathcal{Y})\})^2.
\]
Using Lemma 3.6.2 together with the fact that $n^{k+1}r_n^{dk}\to 0$ yields
\[
(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_0\} \le c^\star n^{k+1}r_n^{dk}\big(r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}\big)^2 \to 0. \tag{3.6.8}
\]
Combining (3.6.6), (3.6.7), and (3.6.8) yields
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}^2\} = \mu_k.
\]
In addition, Theorem 3.3.1 implies
\[
(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{N_{k,n}\})^2 = n^{k+1}r_n^{dk}\big((n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}\}\big)^2 \to 0.
\]
Therefore, since $\operatorname{Var}(N_{k,n}) = \mathbb{E}\{N_{k,n}^2\} - (\mathbb{E}\{N_{k,n}\})^2$, we conclude that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\operatorname{Var}(N_{k,n}) = \mu_k,
\]
which gives Theorem 3.3.2 for the random sample case. The proof for the Poisson case (i.e. for $\hat N_{k,n}$) is similar in spirit, but technically more complicated. The main steps of the argument follow. We start by writing
\[
\mathbb{E}\{\hat N_{k,n}^2\} = \mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\bigg\}
= \sum_{j=0}^{k+1}\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j\}\bigg\}
\triangleq \sum_{j=0}^{k+1}\mathbb{E}\{\hat I_j\}.
\]
Again, for $j=k+1$ we have
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_{k+1}\} = \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat N_{k,n}\} = \mu_k. \tag{3.6.9}
\]
For $0\le j<k+1$, using Corollary 3.A.2 we have
\[
\mathbb{E}\{\hat I_j\} = c^\star n^{2k+2-j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j},
\]
where $\mathcal{Y}'_1,\mathcal{Y}'_2$ are sets of $k+1$ i.i.d. points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, such that $|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j$, and $\mathcal{Y}'_{12}=\mathcal{Y}'_1\cup\mathcal{Y}'_2$. Similar arguments to those we used in the previous case then yield that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_j\} = 0.
\]
Furthermore, it is also easy to see that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{\hat N_{k,n}\})^2 = 0.
\]
Thus, we conclude that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\operatorname{Var}(\hat N_{k,n}) = \mu_k,
\]
which completes the proof of the theorem.

Next, we wish to prove the first part of Theorem 3.3.3, i.e. that $N_{k,n}\xrightarrow{L^2} 0$.
Proof of Theorem 3.3.3 - Part 1. Clearly, it suffices to show that
\[
\lim_{n\to\infty}\mathbb{E}\{N_{k,n}^2\} = \lim_{n\to\infty}\mathbb{E}\{\hat N_{k,n}^2\} = 0. \tag{3.6.10}
\]
However, in the previous proof we saw that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{N_{k,n}^2\} = \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat N_{k,n}^2\} = \mu_k.
\]
Since $n^{k+1}r_n^{dk}\to 0$, (3.6.10) follows immediately, and we are done.

Case 2: $n^{k+1}r_n^{dk} \to \alpha \in (0,\infty)$

Proof of Theorem 3.3.2. The proof in this case is similar to the previous one, the only difference being in how to bound the terms $\mathbb{E}\{I_0\}$ and $\mathbb{E}\{\hat I_0\}$. For that, a proof in the spirit of Lemma 3.6.2 can be used to show that
\[
\lim_{n\to\infty} r_n^{-2dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0} = ((k+1)!\,\mu_k)^2,
\]
\[
\lim_{n\to\infty} r_n^{-2dk}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0} = ((k+1)!\,\mu_k)^2.
\]
Therefore,
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{I_0\}
= \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\binom{n}{k+1}\binom{n-k-1}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{X}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{X}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0}
= \alpha\mu_k^2.
\]
Similarly, using Corollary 3.A.2, we have
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_0\}
= \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\frac{n^{2k+2}}{((k+1)!)^2}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\}\big|_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=0}
= \alpha\mu_k^2.
\]
Finally, we also have
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{N_{k,n}\})^2 = \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}(\mathbb{E}\{\hat N_{k,n}\})^2 = \alpha\mu_k^2.
\]
This completes the proof.
Next, we prove the Poisson limit of Theorem 3.3.3, for which we need

Lemma 3.6.3. Denote the total variation distance by $d_{TV}$. Then:
1. Let $S_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{X}_n} h_{r_n}(\mathcal{Y})$, and let $Z\sim\operatorname{Poisson}(\mathbb{E}\{S_{k,n}\})$. Then
\[ d_{TV}(S_{k,n},Z) \le c^\star nr_n^d. \]
2. Let $\hat S_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_n} h_{r_n}(\mathcal{Y})$, and let $\hat Z\sim\operatorname{Poisson}(\mathbb{E}\{\hat S_{k,n}\})$. Then
\[ d_{TV}(\hat S_{k,n},\hat Z) \le c^\star nr_n^d. \]

Proof. The proof is very similar to that of Theorem 3.4 in [39], and uses the Poisson approximation given in Theorem 3.B.2.
Part 1: Let $\mathcal{I}_n = \{\mathbf{i}\subset\{1,2,\ldots,n\} : |\mathbf{i}|=k+1\}$. Then, for $\mathbf{i}=\{i_0,\ldots,i_k\}$ and $\mathcal{X}_\mathbf{i} = \{X_{i_0},\ldots,X_{i_k}\}$, we can write
\[ S_{k,n} = \sum_{\mathbf{i}\in\mathcal{I}_n} h_{r_n}(\mathcal{X}_\mathbf{i}). \]
Set $N_\mathbf{i} = \{\mathbf{j}\in\mathcal{I}_n : |\mathbf{i}\cap\mathbf{j}|>0\}$, and let $\sim$ be a relation on $\mathcal{I}_n$ such that $\mathbf{i}\sim\mathbf{j}$ if and only if $\mathbf{j}\in N_\mathbf{i}$. For $\mathbf{i}\ne\mathbf{j}$, $\mathcal{X}_\mathbf{i}$ and $\mathcal{X}_\mathbf{j}$ are independent unless $\mathbf{j}\in N_\mathbf{i}$. Thus, the graph $(\mathcal{I}_n,\sim)$ is the dependency graph for $\xi_\mathbf{i} \triangleq h_{r_n}(\mathcal{X}_\mathbf{i})$.
Now, if $h_{r_n}(\mathcal{X}_\mathbf{i})\ne 0$ then the $k+1$ points in $\mathcal{X}_\mathbf{i}$ are contained in a ball of radius $r_n$, and using Lemma 3.6.1 we have
\[ p_\mathbf{i} \triangleq \mathbb{E}\{\xi_\mathbf{i}\} \le c^\star r_n^{dk}. \]
Therefore,
\[
\sum_{\mathbf{i}\in\mathcal{I}_n}\sum_{\mathbf{j}\in N_\mathbf{i}} p_\mathbf{i} p_\mathbf{j}
\le \binom{n}{k+1}\left(\binom{n}{k+1}-\binom{n-k-1}{k+1}\right)c^\star r_n^{2dk}
\le c^\star n^{2k+1} r_n^{2dk}
= c^\star n^{k+1}r_n^{dk}(nr_n^d)^k
\le c^\star n^{k+1}r_n^{dk}(nr_n^d),
\]
where the last inequality uses the facts that $nr_n^d\to 0$ and $k\ge 1$.
Next, if $\mathbf{i}\sim\mathbf{j}$ with $|\mathbf{i}\cap\mathbf{j}| = l > 0$, and $h_{r_n}(\mathcal{X}_\mathbf{i})h_{r_n}(\mathcal{X}_\mathbf{j})\ne 0$, then necessarily the $2k+2-l$ points in $\mathcal{X}_\mathbf{i}\cup\mathcal{X}_\mathbf{j}$ are contained in a ball of radius $2r_n$, and therefore,
\[ p_{\mathbf{i},\mathbf{j}} \triangleq \mathbb{E}\{\xi_\mathbf{i}\xi_\mathbf{j}\} \le c^\star r_n^{d(2k+1-l)}. \]
Thus,
\[
\sum_{\mathbf{i}\in\mathcal{I}_n}\sum_{\mathbf{j}\in N_\mathbf{i}\setminus\{\mathbf{i}\}} p_{\mathbf{i},\mathbf{j}}
\le \sum_{l=1}^{k}\binom{n}{k+1}\binom{n-k-1}{k+1-l}\binom{k+1}{l}c^\star r_n^{d(2k+1-l)}
\le c^\star\sum_{l=1}^{k} n^{2k+2-l}r_n^{d(2k+1-l)}
\le c^\star n^{k+1}r_n^{dk}(nr_n^d).
\]
Finally, using Lemma 3.6.2 it is easy to prove that
\[ \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{S_{k,n}\} = \mu_k, \]
which implies that
\[ \frac{1}{\mathbb{E}\{S_{k,n}\}} \le c^\star (n^{k+1}r_n^{dk})^{-1}. \]
Therefore, from Theorem 3.B.2 we conclude that
\[ d_{TV}(S_{k,n}, Z) \le c^\star nr_n^d. \]
Part 2: The proof here relies on the preceding one, albeit with additional technicalities. We start by conditioning on $|\mathcal{P}_n|$, the number of points in $\mathcal{P}_n$:
\[
\big|\mathbb{P}(\hat S_{k,n}\in A) - \mathbb{P}(\hat Z\in A)\big|
= \Big|\sum_{m=0}^{\infty}\big(\mathbb{P}(\hat S_{k,n}\in A \mid |\mathcal{P}_n|=m) - \mathbb{P}(\hat Z\in A)\big)\,\mathbb{P}(|\mathcal{P}_n|=m)\Big|
\le \sum_{m=0}^{\infty}\big|\mathbb{P}(\hat S_{k,n}\in A\mid |\mathcal{P}_n|=m) - \mathbb{P}(\hat Z\in A)\big|\,\mathbb{P}(|\mathcal{P}_n|=m). \tag{3.6.11}
\]
Given $|\mathcal{P}_n|=m$, using the notation in the proof of Part 1, we can write
\[ \hat S_{k,n} = \sum_{\mathbf{i}\in\mathcal{I}_m} h_{r_n}(\mathcal{X}_\mathbf{i}). \]
Setting $\xi_\mathbf{i} = h_{r_n}(\mathcal{X}_\mathbf{i})$, $p_\mathbf{i} = \mathbb{E}\{\xi_\mathbf{i}\}$ and $p_{\mathbf{i},\mathbf{j}} = \mathbb{E}\{\xi_\mathbf{i}\xi_\mathbf{j}\}$, then, as in the proof of Part 1, it is easy to show that
\[
\sum_{\mathbf{i}\in\mathcal{I}_m}\sum_{\mathbf{j}\in N_\mathbf{i}} p_\mathbf{i}p_\mathbf{j} \le c^\star m^{2k+1}r_n^{2dk},\qquad
\sum_{\mathbf{i}\in\mathcal{I}_m}\sum_{\mathbf{j}\in N_\mathbf{i}\setminus\{\mathbf{i}\}} p_{\mathbf{i},\mathbf{j}} \le c^\star\sum_{l=1}^{k} m^{2k+2-l}r_n^{d(2k+1-l)},\qquad
\frac{1}{\mathbb{E}\{\hat S_{k,n}\}} \le c^\star(n^{k+1}r_n^{dk})^{-1}.
\]
Therefore, from Theorem 3.B.2, we can conclude that
\[
\big|\mathbb{P}(\hat S_{k,n}\in A\mid |\mathcal{P}_n|=m) - \mathbb{P}(\hat Z\in A)\big| \le c^\star n^{-(k+1)}\sum_{l=1}^{k} m^{2k+2-l}r_n^{d(k+1-l)}.
\]
Substituting back into (3.6.11), we have
\[
d_{TV}(\hat S_{k,n},\hat Z) \le c^\star n^{-(k+1)}\sum_{l=1}^{k} r_n^{d(k+1-l)}\,\mathbb{E}\{|\mathcal{P}_n|^{2k+2-l}\}.
\]
Since $|\mathcal{P}_n|\sim\operatorname{Poisson}(n)$, it is easy to find a constant $c^\star$ such that
\[ \mathbb{E}\{|\mathcal{P}_n|^{2k+2-l}\} \le c^\star n^{2k+2-l}, \]
for every $1\le l\le k$. So, finally, we have that
\[
d_{TV}(\hat S_{k,n},\hat Z) \le c^\star\sum_{l=1}^{k} n^{k+1-l}r_n^{d(k+1-l)} \le c^\star nr_n^d,
\]
since $nr_n^d\to 0$ and is therefore bounded.

Note that the previous result did not use the assumption that $n^{k+1}r_n^{dk}\to\alpha\in(0,\infty)$. However, to prove an analogous result for $N_{k,n}$ rather than $S_{k,n}$ we shall need it. We shall also need the following two lemmas.

Lemma 3.6.4. Let $X,Y$ be integer-valued random variables defined on the same probability space, such that $\Delta \triangleq X - Y \ge 0$. Then $d_{TV}(X,Y)\le\mathbb{E}\{\Delta\}$.
Proof. For every $A\in\mathcal{B}(\mathbb{R})$ (the Borel sets of $\mathbb{R}$),
\[
|\mathbb{P}(X\in A) - \mathbb{P}(Y\in A)| = |\mathbb{P}(X\in A, X\ne Y) - \mathbb{P}(Y\in A, X\ne Y)|
= |\mathbb{P}(X\ne Y)\,(\mathbb{P}(X\in A\mid X\ne Y) - \mathbb{P}(Y\in A\mid X\ne Y))|
\le \mathbb{P}(X\ne Y) = \mathbb{P}(\Delta\ge 1) \le \mathbb{E}\{\Delta\}.
\]
Thus,
\[
d_{TV}(X,Y) = \sup_{A\in\mathcal{B}(\mathbb{R})}|\mathbb{P}(X\in A)-\mathbb{P}(Y\in A)| \le \mathbb{E}\{\Delta\},
\]
and we are done.
Lemma 3.6.5. Let $X\sim\operatorname{Poisson}(\lambda_x)$ and $Y\sim\operatorname{Poisson}(\lambda_y)$. Then $d_{TV}(X,Y)\le|\lambda_x-\lambda_y|$.

Proof. Assume that $\lambda_x\ge\lambda_y$. Let $\Delta\sim\operatorname{Poisson}(\lambda_x-\lambda_y)$ be independent of $Y$, and define $\tilde X \triangleq Y+\Delta$. Then $\tilde X\sim\operatorname{Poisson}(\lambda_x)$, and so $d_{TV}(X,Y) = d_{TV}(\tilde X,Y)$. Since $\Delta = \tilde X - Y\ge 0$, it follows from Lemma 3.6.4 that
\[
d_{TV}(\tilde X,Y) \le \mathbb{E}\{\Delta\} = \lambda_x - \lambda_y,
\]
and we are done.
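Lemma 3.6.5 is also easy to verify numerically. The Python sketch below (an illustration only, not part of the thesis) computes truncated Poisson pmfs and checks the bound $d_{TV}\le|\lambda_x-\lambda_y|$ for a few pairs of rates:

```python
import math

def poisson_pmfs(lam, kmax):
    # P(X = j), j = 0..kmax, built iteratively to avoid huge factorials.
    p = math.exp(-lam)
    out = [p]
    for j in range(1, kmax + 1):
        p *= lam / j
        out.append(p)
    return out

def tv_poisson(lx, ly, kmax=100):
    # Total variation distance between Poisson(lx) and Poisson(ly),
    # truncated at kmax (tails beyond kmax are negligible for these rates).
    px, py = poisson_pmfs(lx, kmax), poisson_pmfs(ly, kmax)
    return 0.5 * sum(abs(a - b) for a, b in zip(px, py))

for lx, ly in [(1.0, 1.5), (3.0, 2.0), (10.0, 10.5)]:
    assert tv_poisson(lx, ly) <= abs(lx - ly)  # Lemma 3.6.5
```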
Proof of Theorem 3.3.3 - Part 2. For a start, we need to prove that $d_{TV}(N_{k,n},S_{k,n})\le c^\star nr_n^d$. To this end, define $\Delta \triangleq S_{k,n}-N_{k,n}$ and note that $\Delta$ counts the number of subsets $\mathcal{Y}\subset\mathcal{X}_n$ for which $h_{r_n}(\mathcal{Y})=1$ but $g_{r_n}(\mathcal{Y},\mathcal{X}_n)=0$. This implies that there exists $X\in\mathcal{X}_n\setminus\mathcal{Y}$ for which $X\in B(\mathcal{Y})$. Thus $\Delta$ is bounded from above by $k+2$ times the number of $(k+2)$-subsets contained in a ball of radius $r_n$. From Lemma 3.6.1 and Lemma 3.6.4 we have
\[
d_{TV}(N_{k,n},S_{k,n}) \le \mathbb{E}\{\Delta\} \le c^\star\binom{n}{k+2}r_n^{d(k+1)} \le c^\star(n^{k+1}r_n^{dk})(nr_n^d) \le c^\star nr_n^d,
\]
where we used the fact that $n^{k+1}r_n^{dk}$ is bounded.
Next, if $Z_N\sim\operatorname{Poisson}(\mathbb{E}\{N_{k,n}\})$ and $Z_S\sim\operatorname{Poisson}(\mathbb{E}\{S_{k,n}\})$, then from Part 1 of Lemma 3.6.3 and the triangle inequality,
\[
d_{TV}(N_{k,n},Z_N) \le d_{TV}(N_{k,n},S_{k,n}) + d_{TV}(S_{k,n},Z_S) + d_{TV}(Z_S,Z_N)
\le c^\star nr_n^d + d_{TV}(Z_S,Z_N).
\]
Finally, Lemma 3.6.5 implies that
\[
d_{TV}(Z_S,Z_N) \le |\mathbb{E}\{S_{k,n}\}-\mathbb{E}\{N_{k,n}\}| = |\mathbb{E}\{\Delta\}| \le c^\star nr_n^d.
\]
Thus, we conclude that
\[ d_{TV}(N_{k,n},Z_N) \le c^\star nr_n^d \to 0. \]
From Theorem 3.3.1, since $n^{k+1}r_n^{dk}\to\alpha$, we have that $\mathbb{E}\{N_{k,n}\}\to\alpha\mu_k$. Using the fact that $Z_N\sim\operatorname{Poisson}(\mathbb{E}\{N_{k,n}\})$, it is easy to see that $d_{TV}(N_{k,n},\operatorname{Poisson}(\alpha\mu_k))\to 0$, which implies convergence in distribution.
The proof for the Poisson case (i.e. $\hat N_{k,n}$) is exactly the same, other than using Part 2 of Lemma 3.6.3 rather than Part 1.
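The Poisson-limit phenomenon can be illustrated by simulation on the simpler close-pair count (the analogue of $S_{1,n}$ for $d=2$, $k=1$): when the distance threshold is tuned so the expected count stays constant, the count should be approximately Poisson, so in particular its variance should match its mean. A minimal Python sketch (an illustration only, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(2)

def close_pair_counts(n, t, trials):
    # Number of pairs at distance <= t among n uniform points in [0,1]^2,
    # over independent trials.
    out = np.empty(trials)
    for i in range(trials):
        X = rng.uniform(size=(n, 2))
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        out[i] = np.triu(D <= t, 1).sum()
    return out

# E{count} ~ C(n,2) * pi * t^2 is held near 2, so the count is close to
# Poisson-distributed: its variance should be close to its mean.
counts = close_pair_counts(n=300, t=0.004, trials=1000)
print(counts.mean(), counts.var())
```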
Case 3: $n^{k+1}r_n^{dk} \to \infty$

This is the most complicated case. We start by proving Theorems 3.3.2 and 3.3.3 (variance and CLT) for the Poisson case. Then, using "De-Poissonization" (Appendix 3.C), we treat the random sample case.

CLT for the Poisson Case

Proof of Theorem 3.3.2 - Part 3 ($\hat N_{k,n}$ only). We start with the second moment of $\hat N_{k,n}$:
\[
\mathbb{E}\{\hat N_{k,n}^2\} = \mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\bigg\}
= \sum_{j=0}^{k+1}\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j\}\bigg\}
\triangleq \sum_{j=0}^{k+1}\mathbb{E}\{\hat I_j\}.
\]
As in the proof of Theorem 3.3.2 for the previous cases, we have that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_j\} = 0,\quad 1\le j\le k,\qquad
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\,\mathbb{E}\{\hat I_{k+1}\} = \mu_k.
\]
However, in this case, $\hat I_0$ requires a different treatment. Recall that our interest is in the variance $\operatorname{Var}(\hat N_{k,n})$. So we have
\[
\operatorname{Var}(\hat N_{k,n}) = \mathbb{E}\{\hat N_{k,n}^2\} - (\mathbb{E}\{\hat N_{k,n}\})^2
= \mathbb{E}\{\hat I_{k+1}\} + \sum_{j=1}^{k}\mathbb{E}\{\hat I_j\} + \Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big).
\]
Thus, to complete the proof, we need to show that
\[
\lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big) = 0.
\]
Applying Corollary 3.A.2 we have
\[
\mathbb{E}\{\hat I_0\} = \bigg(\frac{n^{k+1}}{(k+1)!}\bigg)^2\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{\mathcal{Y}'_1\cap\mathcal{Y}'_2=\emptyset},
\]
where $\mathcal{Y}'_1$ and $\mathcal{Y}'_2$ are sets of $k+1$ i.i.d. points with density $f$, independent of $\mathcal{P}_n$, and $\mathcal{Y}'_{12}=\mathcal{Y}'_1\cup\mathcal{Y}'_2$. Similarly, applying Theorem 3.A.1, we have
\[
\mathbb{E}\{\hat N_{k,n}\} = \frac{n^{k+1}}{(k+1)!}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\}.
\]
Therefore, we can write
\[
(\mathbb{E}\{\hat N_{k,n}\})^2 = \bigg(\frac{n^{k+1}}{(k+1)!}\bigg)^2\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_2\cup\mathcal{P}'_n)\},
\]
where $\mathcal{P}'_n$ is an independent copy of $\mathcal{P}_n$. Set
\[
\Delta \triangleq g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n) - g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_2\cup\mathcal{P}'_n).
\]
Showing that $n^{k+1}r_n^{-dk}\,\mathbb{E}\{\Delta\}\to 0$ will complete the proof. Set
\[
\Delta_1 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)\ne\emptyset\},\qquad
\Delta_2 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)=\emptyset\}.
\]
If $\Delta_1\ne 0$ then all the elements in $\mathcal{Y}'_1$ and $\mathcal{Y}'_2$ are contained in a ball of radius $2r_n$. Therefore, using Lemma 3.6.1,
\[ \mathbb{E}\{\Delta_1\} \le c^\star r_n^{d(2k+1)}. \]
Next, note that
\[
\Delta_2 = h_{r_n}(\mathcal{Y}'_1)\,h_{r_n}(\mathcal{Y}'_2)\,\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)=\emptyset\}
\times\Big(\mathbf{1}\{\mathcal{P}_n\cap B(\mathcal{Y}'_1)=\emptyset\}\,\mathbf{1}\{\mathcal{P}_n\cap B(\mathcal{Y}'_2)=\emptyset\}
- \mathbf{1}\{\mathcal{P}_n\cap B(\mathcal{Y}'_1)=\emptyset\}\,\mathbf{1}\{\mathcal{P}'_n\cap B(\mathcal{Y}'_2)=\emptyset\}\Big).
\]
If $\Delta_2\ne 0$, then $B(\mathcal{Y}'_1)$ and $B(\mathcal{Y}'_2)$ are disjoint. Therefore, given $\mathcal{Y}'_1$ and $\mathcal{Y}'_2$, the set $\mathcal{P}_n\cap B(\mathcal{Y}'_2)$ is independent of the set $\mathcal{P}_n\cap B(\mathcal{Y}'_1)$ (by the spatial independence of the Poisson process), and has the same distribution as $\mathcal{P}'_n\cap B(\mathcal{Y}'_2)$. Thus, $\mathbb{E}\{\Delta_2\mid\mathcal{Y}'_1,\mathcal{Y}'_2\}=0$, which implies that $\mathbb{E}\{\Delta_2\}=0$.
To conclude, $\mathbb{E}\{\Delta\}\le c^\star r_n^{d(2k+1)}$. Therefore,
\[
\lim_{n\to\infty} n^{k+1}r_n^{-dk}\,\mathbb{E}\{\Delta\} \le \lim_{n\to\infty} c^\star (nr_n^d)^{k+1} = 0.
\]
This completes the proof of the limit variance.
Next, we wish to prove the CLT in Theorem 3.3.3.

Proof of Theorem 3.3.3 - Part 3 ($\hat N_{k,n}$ only). The proof is based on the normal approximation for sums of dependent variables given by Stein's method (Appendix 3.B). We start by counting only critical points located in a compact set $A\subset\mathbb{R}^d$ for which $\int_A f(x)\,dx > 0$.
For fixed $n$, let $\{Q_{i,n}\}_{i\in\mathbb{N}}$ be a partition of $\mathbb{R}^d$ into cubes of side $r_n$, and let $I_A\subset\mathbb{N}$ be the (finite) set of indices $i$ for which $Q_{i,n}\cap A\ne\emptyset$. For $i\in I_A$, set
\[
g_{r_n}^{(i)}(\mathcal{Y},\mathcal{P}_n) \triangleq g_{r_n}(\mathcal{Y},\mathcal{P}_n)\,\mathbf{1}_{A\cap Q_{i,n}}(C(\mathcal{Y})), \tag{3.6.12}
\]
where $C(\mathcal{Y})$ is the critical point in $\mathbb{R}^d$ generated by $\mathcal{Y}$ (cf. (3.2.3)). That is, $g_{r_n}^{(i)}=1$ if and only if $\mathcal{Y}$ generates a critical point located in $A\cap Q_{i,n}$. Then
\[
\hat N_{k,n}^{(i)} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_n} g_{r_n}^{(i)}(\mathcal{Y},\mathcal{P}_n)
\]
is the number of critical points inside $A\cap Q_{i,n}$, and
\[
\hat N_{k,n}^{A} \triangleq \#\{\text{critical points of } d_{\mathcal{P}_n} \text{ inside } A\} = \sum_{i\in I_A}\hat N_{k,n}^{(i)}.
\]
First, as in the proof of Theorem 3.3.2, one can show that
\[
\mu_k(A) \triangleq \lim_{n\to\infty}(n^{k+1}r_n^{dk})^{-1}\operatorname{Var}\big(\hat N_{k,n}^A\big) \in (0,\infty). \tag{3.6.13}
\]
Now, for $i,j\in I_A$, define the relation $i\sim j$ if the distance between $Q_{i,n}$ and $Q_{j,n}$ is less than $2r_n$. Then $(I_A,\sim)$ is the dependency graph (cf. (3.B.1)) for the set $\{\hat N_{k,n}^{(i)}\}_{i\in I_A}$. This follows from the fact that a critical point located inside $Q_{i,n}$ is generated by points of $\mathcal{P}_n$
that are within distance $r_n$ of $Q_{i,n}$ (along with the spatial independence of $\mathcal{P}_n$). The degree of this graph is bounded by $5^d$. Consider the normalized random variables
\[
\xi_i \triangleq \frac{\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}}{\big(\operatorname{Var}\big(\hat N_{k,n}^A\big)\big)^{1/2}}.
\]
According to Theorem 3.B.3, in order to prove a CLT for $\hat N_{k,n}^A$, all we have to do now is find bounds for $\mathbb{E}\{|\xi_i|^p\}$, $p=3,4$.
Let $B_{r_n}(Q_{i,n})\subset\mathbb{R}^d$ be the set of points within distance $r_n$ of $Q_{i,n}$, and let $Z_i \triangleq |\mathcal{P}_n\cap B_{r_n}(Q_{i,n})|$ be the number of points of the Poisson process $\mathcal{P}_n$ lying inside $B_{r_n}(Q_{i,n})$. Then $Z_i\sim\operatorname{Poisson}(\lambda_i)$, where $\lambda_i = \int_{B_{r_n}(Q_{i,n})} nf(x)\,dx \le nf_{\max}(3r_n)^d$. Thus, $Z_i$ is stochastically dominated by a Poisson random variable with parameter $c^\star nr_n^d$. Now,
\[
\hat N_{k,n}^{(i)} \le \binom{Z_i}{k+1} \le c^\star Z_i^{k+1}.
\]
Therefore, for any $p\ge 1$,
\[
\mathbb{E}\big\{\big|\hat N_{k,n}^{(i)}\big|^p\big\} \le c^\star\,\mathbb{E}\big\{Z_i^{p(k+1)}\big\} \le c^\star(nr_n^d)^{p(k+1)} \le c^\star(nr_n^d)^{k+1},
\]
since $nr_n^d$ is bounded (note that each of the $c^\star$'s stands for a different value). Thus, it is easy to show that also
\[
\mathbb{E}\big\{\big|\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big|^p\big\} \le c^\star(nr_n^d)^{k+1}.
\]
Since $A$ is compact, there exists a constant $v$ such that $|I_A|\le vr_n^{-d}$. Therefore, for $p=3,4$,
\[
\sum_{i\in I_A}\mathbb{E}\{|\xi_i|^p\}
\le \frac{vr_n^{-d}\, c^\star (nr_n^d)^{k+1}}{\big(\operatorname{Var}\big(\hat N_{k,n}^A\big)\big)^{p/2}}
= vc^\star (n^{k+1}r_n^{dk})^{1-p/2}\Bigg(\frac{n^{k+1}r_n^{dk}}{\operatorname{Var}\big(\hat N_{k,n}^A\big)}\Bigg)^{p/2} \to 0,
\]
where we used the fact that $n^{k+1}r_n^{dk}\to\infty$ and the limit in Theorem 3.3.2. From Theorem 3.B.3, we conclude that
\[
\frac{\hat N_{k,n}^A - \mathbb{E}\big\{\hat N_{k,n}^A\big\}}{\big(\operatorname{Var}\big(\hat N_{k,n}^A\big)\big)^{1/2}} \xrightarrow{\;L\;} \mathcal{N}(0,1). \tag{3.6.14}
\]
Now that we have a CLT for $\hat N_{k,n}^A$, we need to extend it to one for $\hat N_{k,n}$. The method we shall use is exactly the same as the one used in [39], but, for completeness, we nevertheless include it.
Set $A_M = [-M,M]^d$, $\bar A_M = \mathbb{R}^d\setminus A_M$, and suppose that $M$ is large enough that $\int_{A_M} f(z)\,dz > 0$. Set
\[
\zeta_n(A) = \frac{\hat N_{k,n}^A - \mathbb{E}\big\{\hat N_{k,n}^A\big\}}{(n^{k+1}r_n^{dk})^{1/2}},\qquad
\zeta_n = \frac{\hat N_{k,n} - \mathbb{E}\big\{\hat N_{k,n}\big\}}{(n^{k+1}r_n^{dk})^{1/2}}.
\]
To complete the proof we need to show that $\big|\mathbb{P}(\zeta_n\le t) - \Phi(t/\sqrt{\mu_k})\big|\to 0$, where $\Phi(\cdot)$ is the standard normal distribution function. Clearly, $\zeta_n = \zeta_n(A_M) + \zeta_n(\bar A_M)$, and from (3.6.14) we have that
\[
\zeta_n(A_M) \xrightarrow{\;L\;} \mathcal{N}(0,\mu_k(A_M)). \tag{3.6.15}
\]
For every $t\in\mathbb{R}$ and $M,\delta>0$ we have
\[
|\mathbb{P}(\zeta_n\le t) - \Phi(t/\sqrt{\mu_k})|
\le |\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)|
+ \big|\mathbb{P}(\zeta_n(A_M)\le t-\delta) - \Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big)\big|
+ \big|\Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big) - \Phi(t/\sqrt{\mu_k})\big|. \tag{3.6.16}
\]
Now,
\[
\mathbb{P}(\zeta_n\le t) = \mathbb{P}(\zeta_n(A_M)\le t-\delta,\ \zeta_n\le t) + \mathbb{P}(|\zeta_n(A_M)-t|<\delta,\ \zeta_n\le t) + \mathbb{P}(\zeta_n(A_M)\ge t+\delta,\ \zeta_n\le t).
\]
Note that the first term equals
\[
\mathbb{P}(\zeta_n(A_M)\le t-\delta) - \mathbb{P}(\zeta_n(A_M)\le t-\delta,\ \zeta_n > t).
\]
Thus,
\[
|\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)|
\le \mathbb{P}(\zeta_n(A_M)\le t-\delta,\ \zeta_n > t) + \mathbb{P}(|\zeta_n(A_M)-t|<\delta,\ \zeta_n\le t) + \mathbb{P}(\zeta_n(A_M)\ge t+\delta,\ \zeta_n\le t)
\le \mathbb{P}\big(|\zeta_n(\bar A_M)| > \delta\big) + \mathbb{P}(|\zeta_n(A_M)-t|<\delta).
\]
From Chebyshev's inequality we have that $\mathbb{P}\big(|\zeta_n(\bar A_M)|>\delta\big)\le\delta^{-2}\operatorname{Var}\big(\zeta_n(\bar A_M)\big)$. From (3.6.15), we have that
\[
\lim_{n\to\infty}\mathbb{P}(|\zeta_n(A_M)-t|<\delta) = \Phi\big((t+\delta)/\sqrt{\mu_k(A_M)}\big) - \Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big)
\le \frac{2\delta}{\sqrt{2\pi\mu_k(A_M)}}.
\]
Therefore,
\[
\limsup_{n\to\infty}|\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)| \le \frac{\mu_k(\bar A_M)}{\delta^2} + \frac{2\delta}{\sqrt{2\pi\mu_k(A_M)}}.
\]
For $\epsilon > 0$, choose $\delta = \epsilon\sqrt{\pi\mu_k}/4$. Since $\lim_{M\to\infty}\mu_k(A_M) = \mu_k$ and $\lim_{M\to\infty}\mu_k(\bar A_M) = 0$, there exists $M$ large enough that $\mu_k(A_M)\ge\mu_k/2$, $\mu_k(\bar A_M)\le\epsilon\delta^2/2$, and also $\big|\Phi\big((t-\delta)/\sqrt{\mu_k(A_M)}\big) - \Phi(t/\sqrt{\mu_k})\big| < 2\epsilon$. For this choice of $\delta, M$, using the last displayed inequality, we have
\[
\limsup_{n\to\infty}|\mathbb{P}(\zeta_n\le t) - \mathbb{P}(\zeta_n(A_M)\le t-\delta)| \le \epsilon.
\]
Finally, returning to (3.6.16), there exists $N>0$ such that for every $n>N$,
\[
|\mathbb{P}(\zeta_n\le t) - \Phi(t/\sqrt{\mu_k})| < 4\epsilon.
\]
This completes the proof.
CLT for the Random Sample Case

We shall now return from the Poisson case to the random sample one. Our argument will be based on the De-Poissonization of Theorem 3.C.1.

Proof of Theorems 3.3.2 and 3.3.3 ($N_{k,n}$). Let $D_{m,n}$ denote the increment
\[
D_{m,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_{m+1}} g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) - \sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m).
\]
In other words, $D_{m,n}$ is the change in the number of critical points as we add a new point to our fixed-size set. Let $\gamma$ be an arbitrary number in $(1/2,1)$. We wish to apply Theorem 3.C.1, with $H_n(\mathcal{P}_n) = (nr_n^d)^{-k/2}\hat N_{k,n}$ and $\alpha = 0$. Thus, we need to prove the following:
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m\le n+n^\gamma}\big|(nr_n^d)^{-k/2}\,\mathbb{E}\{D_{m,n}\}\big| = 0, \tag{3.6.17}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\big|(nr_n^d)^{-k}\,\mathbb{E}\{D_{m,n}D_{m',n}\}\big| = 0, \tag{3.6.18}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m\le n+n^\gamma} n^{-1/2}(nr_n^d)^{-k}\,\mathbb{E}\{D_{m,n}^2\} = 0. \tag{3.6.19}
\]
Considering only the cases where $g_{r_n}(\mathcal{Y},\mathcal{X}_m)\ne g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})$, we can write
\[
D_{m,n} = D^+_{m,n} - D^-_{m,n},
\]
where
\[
D^+_{m,n} = \#\{\mathcal{Y}\subset\mathcal{X}_{m+1} : g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) = 1 \text{ and } X_{m+1}\in\mathcal{Y}\},
\]
\[
D^-_{m,n} = \#\{\mathcal{Y}\subset\mathcal{X}_m : g_{r_n}(\mathcal{Y},\mathcal{X}_m) = 1 \text{ and } X_{m+1}\in B(\mathcal{Y})\}.
\]
In other words, $D^+_{m,n}$ counts the critical points added when we move from $\mathcal{X}_m$ to $\mathcal{X}_{m+1}$, and $D^-_{m,n}$ counts those that disappear. We now prove (3.6.17)–(3.6.19), starting with (3.6.17). Note that
\[
|\mathbb{E}\{D_{m,n}\}| \le \mathbb{E}\{D^+_{m,n}\} + \mathbb{E}\{D^-_{m,n}\}.
\]
We shall show that the supremum of each of the terms goes to zero. From the definition of $D^+_{m,n}$ we have that
\[
\mathbb{E}\{D^+_{m,n}\} = \binom{m}{k}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\} \le \binom{n+n^\gamma}{k}\mathbb{E}\{h_{r_n}(\mathcal{Y})\},
\]
where we define $\binom{x}{k} \triangleq \binom{\lfloor x\rfloor}{k}$ if $x$ is non-integer. Thus, using Lemma 3.6.2,
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m\le n+n^\gamma}(nr_n^d)^{-k/2}\,\mathbb{E}\{D^+_{m,n}\}
\le \lim_{n\to\infty}(nr_n^d)^{k/2}\bigg(n^{-k}\binom{n+n^\gamma}{k}\bigg)\big(r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}\big) = 0.
\]
From the definition of $D^-_{m,n}$ we have
\[
\mathbb{E}\{D^-_{m,n}\} = \binom{m}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_m)\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\}
\le \binom{n+n^\gamma}{k+1}\mathbb{E}\{h_{r_n}(\mathcal{Y})\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\}
\le c^\star n^{k+1}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\},
\]
for some constant $c^\star$. Now,
\[
\mathbb{E}\{\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\mid\mathcal{Y}\} = \int_{B(\mathcal{Y})} f(x)\,dx \le f_{\max}\,\omega_d\, r_n^d,
\]
which implies that $\mathbb{E}\{h_{r_n}(\mathcal{Y})\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\}\le c^\star r_n^d\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}$, and so
\[
\lim_{n\to\infty}\bigg(\sup_{n-n^\gamma\le m\le n+n^\gamma}(nr_n^d)^{-k/2}\,\mathbb{E}\{D^-_{m,n}\}\bigg)
\le \lim_{n\to\infty} c^\star (nr_n^d)^{k/2+1}\big(r_n^{-dk}\,\mathbb{E}\{h_{r_n}(\mathcal{Y})\}\big) = 0.
\]
This proves (3.6.17). To prove (3.6.18) we need to show that
\[
\lim_{n\to\infty}\bigg(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\big|(nr_n^d)^{-k}\,\mathbb{E}\{D_{m,n}D_{m',n}\}\big|\bigg) = 0.
\]
Recall that $D_{m,n} = D^+_{m,n} - D^-_{m,n}$. Thus we can write $D_{m,n}D_{m',n}$ as a sum of four different product terms. We start by looking at the term $D^+_{m,n}D^+_{m',n}$. Recalling the definition of $D^+_{m,n}$, we can write
\[
D^+_{m,n} = \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}} g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) \le \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}} h_{r_n}(\mathcal{Y}).
\]
Thus,
\[
\mathbb{E}\{D^+_{m,n}D^+_{m',n}\} \le \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}}\ \sum_{\substack{\mathcal{Y}'\subset\mathcal{X}_{m'+1}\\ X_{m'+1}\in\mathcal{Y}'}}\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\}. \tag{3.6.20}
\]
Now, if $|\mathcal{Y}\cap\mathcal{Y}'| = j > 0$ and $h_{r_n}(\mathcal{Y})h_{r_n}(\mathcal{Y}') = 1$, then $\mathcal{Y}\cup\mathcal{Y}'$ must be contained in a ball of radius $2r_n$. This set contains $2k+2-j$ points, so that, by Lemma 3.6.1, we have
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\} \le c^\star r_n^{d(2k+1-j)}.
\]
If $\mathcal{Y}\cap\mathcal{Y}' = \emptyset$, then the two sets are disjoint and independent. Each consists of $k+1$ points and must be contained in a ball of radius $r_n$. Therefore,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\} = (\mathbb{E}\{h_{r_n}(\mathcal{Y})\})^2 \le c^\star r_n^{2dk}.
\]
Applying these bounds to (3.6.20) yields
\[
\mathbb{E}\{D^+_{m,n}D^+_{m',n}\}
\le c^\star\binom{m}{k}\bigg(\binom{m'-k-1}{k}r_n^{2dk} + \sum_{j=1}^{k}\binom{m'-k-1}{k-j}\binom{k+1}{j}r_n^{d(2k+1-j)}\bigg)
\le c^\star\binom{n+n^\gamma}{k}\bigg(\binom{n+n^\gamma}{k}r_n^{2dk} + \sum_{j=1}^{k}\binom{n+n^\gamma}{k-j}r_n^{d(2k+1-j)}\bigg)
\le c^\star\bigg(n^{2k}r_n^{2dk} + \sum_{j=1}^{k} n^{2k-j}r_n^{d(2k+1-j)}\bigg),
\]
where we emphasize that each appearance of $c^\star$ represents a different value. Multiplying by $(nr_n^d)^{-k}$ and taking the limit, we obtain
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma\le m<m'\le n+n^\gamma}(nr_n^d)^{-k}\,\mathbb{E}\{D^+_{m,n}D^+_{m',n}\} = 0.
\]
To handle $D^-_{m,n}$, recall its definition and write
\[
D^-_{m,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m)\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})
\le \sum_{\mathcal{Y}\subset\mathcal{X}_m} h_{r_n}(\mathcal{Y})\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1}).
\]
Thus,
\[
\mathbb{E}\{D^-_{m,n}D^-_{m',n}\} \le \sum_{\mathcal{Y}\subset\mathcal{X}_m}\sum_{\mathcal{Y}'\subset\mathcal{X}_{m'}}\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\}. \tag{3.6.21}
\]
If $X_{m+1}\in\mathcal{Y}'$ and $|\mathcal{Y}\cap\mathcal{Y}'| = j\ge 0$, then $\mathcal{Y}\cup\mathcal{Y}'\cup\{X_{m+1},X_{m'+1}\}$ consists of $2k+3-j$ points, and for the expression inside the expectation to be nonzero all the points must be contained in a ball of radius $2r_n$. Thus, by Lemma 3.6.1,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\} \le c^\star r_n^{d(2k+2-j)}.
\]
If $X_{m+1}\notin\mathcal{Y}'$ and $|\mathcal{Y}\cap\mathcal{Y}'| = j > 0$, then the set $\mathcal{Y}\cup\mathcal{Y}'\cup\{X_{m+1},X_{m'+1}\}$ consists of $2k+4-j$ points, and therefore,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\} \le c^\star r_n^{d(2k+3-j)} \le c^\star r_n^{d(2k+2-j)}.
\]
If $j = 0$, however, then the sets $\mathcal{Y}\cup\{X_{m+1}\}$ and $\mathcal{Y}'\cup\{X_{m'+1}\}$ are disjoint and independent, each containing $k+2$ points. In addition, we need each of these sets to be contained in a ball of radius $r_n$. Therefore,
\[
\mathbb{E}\{h_{r_n}(\mathcal{Y})\,h_{r_n}(\mathcal{Y}')\,\mathbf{1}_{B(\mathcal{Y})}(X_{m+1})\,\mathbf{1}_{B(\mathcal{Y}')}(X_{m'+1})\} \le c^\star r_n^{d(2k+2)}.
\]
Substituting the above into (3.6.21) we have
\[
\mathbb{E}\{D^-_{m,n}D^-_{m',n}\}
\le c^\star\sum_{j=0}^{k}\binom{m}{k+1}\binom{m'-k-2}{k-j}\binom{k+1}{j}r_n^{d(2k+2-j)}
+ c^\star\sum_{j=0}^{k+1}\binom{m}{k+1}\binom{m'-k-2}{k+1-j}\binom{k+1}{j}r_n^{d(2k+2-j)}
\]
\[
\le c^\star\sum_{j=0}^{k}\binom{n+n^\gamma}{k+1}\binom{n+n^\gamma}{k-j}r_n^{d(2k+2-j)}
+ c^\star\sum_{j=0}^{k+1}\binom{n+n^\gamma}{k+1}\binom{n+n^\gamma}{k+1-j}r_n^{d(2k+2-j)}
\le c^\star\sum_{j=0}^{k} n^{2k+1-j}r_n^{d(2k+2-j)} + c^\star\sum_{j=0}^{k+1} n^{2k+2-j}r_n^{d(2k+2-j)}.
\]
From the above we can conclude that
\[
\lim_{n\to\infty}\bigg(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}(nr_n^d)^{-k}\,\mathbb{E}\{D^-_{m,n}D^-_{m',n}\}\bigg) = 0.
\]
We shall stop the computations here. The convergence of the cross-products (i.e. $D^+_{m,n}D^-_{m',n}$ and $D^-_{m,n}D^+_{m',n}$) can be shown using similar techniques, and these prove (3.6.18). The proof of (3.6.19) is also very similar.
Finally, the last condition in Theorem 3.C.1 requires that
\[
H_n(\mathcal{X}_m) = (nr_n^d)^{-k/2}\sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m) \le \beta(n+m)^\beta,
\]
for some $\beta > 0$. Using the facts that $\sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m)\le\binom{m}{k+1}$, $nr_n^d\to 0$, and $n^{k+1}r_n^{dk}\to\infty$, we have
\[
H_n(\mathcal{X}_m) \le c^\star(nr_n^d)^{-k/2}\, m^{k+1} \le c^\star m^{k+1}\, n\, (nr_n^d)^{k/2}(n^{k+1}r_n^{dk})^{-1} \le c^\star(n+m)^{k+2}.
\]
Thus, taking $\beta = \max(c^\star, k+2)$ completes the De-Poissonization proof. Consequently, we have that both Theorem 3.3.2 and Theorem 3.3.3 hold for the random sample case as well.
3.6.4 The Critical and Supercritical Ranges ($nr_n^d \to \lambda \in (0,\infty]$)

We start with the expectation computations. The following standard lemma is going to play a key role in the supercritical regime.
Lemma 3.6.6. Let $D\subset\mathbb{R}^d$ be a compact convex set with positive Lebesgue measure, and let $B_r(x)\subset\mathbb{R}^d$ be the ball of radius $r$ around $x$. Then there exists a constant $c^\star$ such that for every $r < \operatorname{diam}(D)$ and $x\in D$,
\[
\operatorname{Vol}(B_r(x)\cap D) \ge c^\star r^d.
\]
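For intuition, take $D=[0,1]^2$ and $r$ small: the worst case is a corner of the square, where $B_r(x)\cap D$ still contains a quarter-disk, so the lemma holds with $c^\star = \pi/4$ there. The Python sketch below (an illustration only, not part of the thesis) estimates the intersection volume by Monte Carlo at a corner, an edge, and the center:

```python
import math, random

random.seed(0)

def intersect_vol_over_r2(cx, cy, r, samples=100_000):
    # Monte Carlo estimate of Vol(B_r((cx,cy)) ∩ [0,1]^2) / r^2:
    # sample the bounding box of the ball, keep points in both sets.
    hits = 0
    for _ in range(samples):
        u = cx + r * (2 * random.random() - 1)
        v = cy + r * (2 * random.random() - 1)
        if (u - cx) ** 2 + (v - cy) ** 2 <= r * r and 0 <= u <= 1 and 0 <= v <= 1:
            hits += 1
    return 4 * hits / samples  # bounding-box area is (2r)^2 = 4 r^2

# D = [0,1]^2, r = 0.2: even at a corner the quarter-disk survives,
# so Vol(B_r(x) ∩ D) >= (pi/4) r^2 for every x in D.
for cx, cy in [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]:
    assert intersect_vol_over_r2(cx, cy, 0.2) >= math.pi / 4 - 0.05
```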
The following lemma is analogous to Lemma 3.6.2.

Lemma 3.6.7. Let $\mathcal{Y}\subset\mathcal{X}_n$ be a set of $k+1$ random variables from $\mathcal{X}_n$, and assume that $\mathcal{Y}$ is independent of the Poisson process $\mathcal{P}_n$. Then,
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\} = \lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\} = (k+1)!\,\gamma_k(\lambda).
\]
Proof. We shall give the full proof for the Poisson case ($g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)$); the proof for the random sample case is similar. Setting $s_n = n^{-1/d}$ and mimicking the proof of Lemma 3.6.2, we obtain
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f(\mathbf{x})\, h_{r_n}(\mathbf{x})\, e^{-np(\mathbf{x})}\,d\mathbf{x}
= s_n^{dk}\int_{\mathbb{R}^d}\int_{(\mathbb{R}^d)^k} f(x)\, f(x+s_n\mathbf{y})\, h_{r_n}(x,x+s_n\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\,d\mathbf{y}\,dx
= n^{-k}\int_{\mathbb{R}^d} f(x)\int_{(\mathbb{R}^d)^k} f(x+s_n\mathbf{y})\, h_{\tau_n}(0,\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\,d\mathbf{y}\,dx, \tag{3.6.22}
\]
where $\tau_n = r_n/s_n = n^{1/d}r_n$. We wish to apply the dominated convergence theorem to the last integral; thus, we need to bound the integrand by an integrable expression. In the critical range this is done much as in the subcritical range. Since $nr_n^d\to\lambda<\infty$, we have that $\tau_n$ is bounded by some value $M$. Now, for $h_{\tau_n}(0,\mathbf{y})$ to be nonzero, all the elements $y_1,\ldots,y_k\in\mathbb{R}^d$ must lie inside $B_{2\tau_n}(0)\subset B_{2M}(0)$. Therefore,
\[
\big|f(x+s_n\mathbf{y})\, h_{\tau_n}(0,\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\big| \le f_{\max}^k\,\mathbf{1}_{B_{2M}(0)}(y_1)\cdots\mathbf{1}_{B_{2M}(0)}(y_k),
\]
and this expression is integrable.
The last argument cannot be applied in the supercritical range, since $\tau_n$ is no longer bounded. This is where we use our additional assumption that $f$ is bounded below on its support. Since we now have $f_{\min}>0$, we also have
\[
p(\mathbf{x}) = \int_{B(\mathbf{x})} f(z)\,dz \ge f_{\min}\operatorname{Vol}(B(\mathbf{x})\cap\operatorname{supp}(f)).
\]
If $h_{r_n}(\mathbf{x})\ne 0$, then necessarily $C(\mathbf{x})\in\operatorname{conv}^\circ(\mathbf{x})$ and $R(\mathbf{x})\le r_n$ (cf. (3.2.7)). In addition, if $f(\mathbf{x})\ne 0$, then $\mathbf{x}\subset\operatorname{supp}(f)$. Since we assume that $\operatorname{supp}(f)$ is convex, we have that $C(\mathbf{x})\in\operatorname{supp}(f)$ as well. Thus, $B(\mathbf{x})$ is a ball centered at $C(\mathbf{x})\in\operatorname{supp}(f)$ with radius $R(\mathbf{x})$ small enough, and Lemma 3.6.6 yields
\[
\operatorname{Vol}(B(\mathbf{x})\cap\operatorname{supp}(f)) \ge c^\star R^d(\mathbf{x}).
\]
This can be used to bound the integrand in (3.6.22), so that
\[
\big|f(x+s_n\mathbf{y})\, h_{\tau_n}(0,\mathbf{y})\, e^{-np(x,x+s_n\mathbf{y})}\big|
\le f_{\max}^k\, e^{-nf_{\min}c^\star R^d(x,x+s_n\mathbf{y})}
= f_{\max}^k\, e^{-f_{\min}c^\star R^d(0,\mathbf{y})}. \tag{3.6.23}
\]
Next, note that for $i=1,\ldots,k$, $R(0,\mathbf{y})\ge\|y_i\|/2$. Thus,
\[
R^d(0,\mathbf{y}) \ge \frac{1}{2^d k}\sum_{j=1}^{k}\|y_j\|^d,
\]
which implies that the expression in (3.6.23) is indeed integrable, and so the DCT can be safely applied in both regimes.
Next, we compute the limit of the integral in (3.6.22). Note first that
\[
np(x,x+s_n\mathbf{y}) = n\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz
= n\operatorname{Vol}(B(x,x+s_n\mathbf{y}))\,\frac{\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+s_n\mathbf{y}))}
= n\,\omega_d(s_n R(0,\mathbf{y}))^d\,\frac{\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+s_n\mathbf{y}))}
= \omega_d R^d(0,\mathbf{y})\,\frac{\int_{B(x,x+s_n\mathbf{y})} f(z)\,dz}{\operatorname{Vol}(B(x,x+s_n\mathbf{y}))},
\]
and using the Lebesgue differentiation theorem yields
\[
\lim_{n\to\infty} np(x,x+s_n\mathbf{y}) = \omega_d R^d(0,\mathbf{y})\, f(x).
\]
Taking the limit of all the other terms in (3.6.22) we have
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h_{\tau_\infty}(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y}) f(x)}\,d\mathbf{y}\,dx,
\]
where $\tau_\infty = \lim_{n\to\infty}\tau_n$. In the supercritical regime $\tau_\infty = \infty$, and consequently $h_{\tau_\infty} = h_\infty \equiv h$. Thus,
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y}) f(x)}\,d\mathbf{y}\,dx,
\]
and using the change of variables $y_i \to f^{-1/d}(x)\, y_i$ (where $f(x)>0$) we have
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^k} h(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y})}\,d\mathbf{y} = (k+1)!\,\gamma_k(\infty).
\]
In the critical range, $\tau_n \to \lambda^{1/d}$. Therefore,
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{Y}\cup\mathcal{P}_n)\}
= \int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h_{\lambda^{1/d}}(0,\mathbf{y})\, e^{-\omega_d R^d(0,\mathbf{y}) f(x)}\,d\mathbf{y}\,dx
= \lambda^k\int_{(\mathbb{R}^d)^{k+1}} f^{k+1}(x)\, h_{\lambda^{1/d}}(0,\lambda^{1/d}\mathbf{z})\, e^{-\lambda\omega_d R^d(0,\mathbf{z}) f(x)}\,d\mathbf{z}\,dx = (k+1)!\,\gamma_k(\lambda).
\]
This completes the proof.
3.6.5 Asymptotic Means

Using Lemma 3.6.7 we can prove Theorem 3.3.5.

Proof of Theorem 3.3.5. For the random sample case we have
\[
\mathbb{E}\{N_{k,n}\} = \binom{n}{k+1}\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\},
\]
and, using Lemma 3.6.7,
\[
\lim_{n\to\infty} n^{-1}\,\mathbb{E}\{N_{k,n}\} = \lim_{n\to\infty}\bigg(n^{-(k+1)}\binom{n}{k+1}\bigg)\big(n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_n)\}\big) = \gamma_k(\lambda).
\]
For the Poisson case, using Theorem 3.A.1,
\[
\mathbb{E}\{\hat N_{k,n}\} = \frac{n^{k+1}}{(k+1)!}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}',\mathcal{Y}'\cup\mathcal{P}_n)\},
\]
and, using Lemma 3.6.7,
\[
\lim_{n\to\infty} n^{-1}\,\mathbb{E}\{\hat N_{k,n}\} = \gamma_k(\lambda),
\]
which completes the proof.
3.6.6 Asymptotic Variance - Poisson Case

For the variance and CLT results, as in the subcritical phase, we shall first treat the Poisson case; then, using De-Poissonization, we shall turn to the random sample case.
Proof of Theorem 3.3.6 ($\hat N_{k,n}$ only). As in the proof of Theorem 3.3.2,
\[
\operatorname{Var}(\hat N_{k,n}) = \mathbb{E}\{\hat N_{k,n}\} + \sum_{j=1}^{k}\mathbb{E}\{\hat I_j\} + \Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big),
\]
where
\[
\hat I_j = \sum_{\mathcal{Y}_1\subset\mathcal{P}_n}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n} g_{r_n}(\mathcal{Y}_1,\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}_2,\mathcal{P}_n)\,\mathbf{1}\{|\mathcal{Y}_1\cap\mathcal{Y}_2| = j\}.
\]
From Corollary 3.A.2,
\[
\mathbb{E}\{\hat I_j\} = \frac{n^{2k+2-j}}{j!((k+1-j)!)^2}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j},
\]
where $\mathcal{Y}'_1,\mathcal{Y}'_2$ are sets of $k+1$ i.i.d. points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, such that $|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j$, and $\mathcal{Y}'_{12}=\mathcal{Y}'_1\cup\mathcal{Y}'_2$. For $0<j<k+1$, as in the proof of Lemma 3.6.7, one can show that
\[
\lim_{n\to\infty} n^{2k+1-j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\}\big|_{|\mathcal{Y}'_1\cap\mathcal{Y}'_2|=j}
= \int_{\mathbb{R}^{d(2k+2-j)}} f^{2k+2-j}(x)\, h_{\tau_\infty}(0,\mathbf{y}_1\cup\mathbf{z})\, h_{\tau_\infty}(0,\mathbf{y}_2\cup\mathbf{z})\, e^{-\operatorname{Vol}(B(0,\mathbf{y}_1\cup\mathbf{z})\cup B(0,\mathbf{y}_2\cup\mathbf{z}))f(x)}\,dx\,d\mathbf{y}_1\,d\mathbf{y}_2\,d\mathbf{z},
\]
where $x\in\mathbb{R}^d$, $\mathbf{y}_i\in\mathbb{R}^{d(k+1-j)}$, $\mathbf{z}\in\mathbb{R}^{d(j-1)}$, and $\tau_\infty = \lim_{n\to\infty} n^{1/d}r_n$. Therefore,
\[
\lim_{n\to\infty} n^{-1}\,\mathbb{E}\{\hat I_j\} = \gamma_k^{(j)}(\lambda),
\]
where
\[
\gamma_k^{(j)}(\lambda) \triangleq \frac{\lambda^{2k+1-j}}{j!((k+1-j)!)^2}\int_{\mathbb{R}^{d(2k+2-j)}} f^{2k+2-j}(x)\, h_1(0,\mathbf{y}_1\cup\mathbf{z})\, h_1(0,\mathbf{y}_2\cup\mathbf{z})\, e^{-\lambda\operatorname{Vol}(B(0,\mathbf{y}_1\cup\mathbf{z})\cup B(0,\mathbf{y}_2\cup\mathbf{z}))f(x)}\,dx\,d\mathbf{y}_1\,d\mathbf{y}_2\,d\mathbf{z}
\]
for $\lambda\in(0,\infty)$, and
\[
\gamma_k^{(j)}(\infty) \triangleq \frac{1}{j!((k+1-j)!)^2}\int_{\mathbb{R}^{d(2k+2-j)}} f^{2k+2-j}(x)\, h(0,\mathbf{y}_1\cup\mathbf{z})\, h(0,\mathbf{y}_2\cup\mathbf{z})\, e^{-\operatorname{Vol}(B(0,\mathbf{y}_1\cup\mathbf{z})\cup B(0,\mathbf{y}_2\cup\mathbf{z}))f(x)}\,dx\,d\mathbf{y}_1\,d\mathbf{y}_2\,d\mathbf{z}.
\]
It is easy to show that $0<\gamma_k^{(j)}(\lambda)<\infty$ for $\lambda\in(0,\infty]$. For $j=0$, we define
\[
\Delta \triangleq g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n) - g_{r_n}(\mathcal{Y}'_1,\mathcal{Y}'_1\cup\mathcal{P}_n)\,g_{r_n}(\mathcal{Y}'_2,\mathcal{Y}'_2\cup\mathcal{P}'_n)
\]
so that
\[
\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2 = \frac{n^{2k+2}}{((k+1)!)^2}\,\mathbb{E}\{\Delta\}.
\]
Now set
\[
\Delta_1 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)\ne\emptyset\},\qquad \Delta_2 = \Delta\cdot\mathbf{1}\{B(\mathcal{Y}'_1)\cap B(\mathcal{Y}'_2)=\emptyset\}.
\]
Then, as in the proof of Theorem 3.3.2, we can show that $\mathbb{E}\{\Delta_2\}=0$, and
\[
\lim_{n\to\infty} n^{2k+1}\,\mathbb{E}\{\Delta_1\}
= \int_{\mathbb{R}^{d(2k+2)}} f^{2k+2}(x)\, h_{\tau_\infty}(0,\mathbf{y}_1)\, h_{\tau_\infty}(0,\mathbf{y}_2)\,\mathbf{1}\{B(0,\mathbf{y}_1)\cap B(z,z+\mathbf{y}_2)\ne\emptyset\}
\times\Big(e^{-\operatorname{Vol}(B(0,\mathbf{y}_1)\cup B(z,z+\mathbf{y}_2))f(x)} - e^{-\omega_d(R^d(0,\mathbf{y}_1)+R^d(0,\mathbf{y}_2))f(x)}\Big)\,dx\,dz\,d\mathbf{y}_1\,d\mathbf{y}_2,
\]
where $x,z\in\mathbb{R}^d$ and $\mathbf{y}_i\in(\mathbb{R}^d)^k$. Thus,
\[
\lim_{n\to\infty} n^{-1}\Big(\mathbb{E}\{\hat I_0\} - (\mathbb{E}\{\hat N_{k,n}\})^2\Big) = \gamma_k^{(0)}(\lambda),
\]
where
\[
\gamma_k^{(0)}(\lambda) \triangleq \frac{\lambda^{2k+1}}{((k+1)!)^2}\int_{\mathbb{R}^{d(2k+2)}} f^{2k+2}(x)\, h_1(0,\mathbf{y}_1)\, h_1(0,\mathbf{y}_2)\,\mathbf{1}\{B(0,\mathbf{y}_1)\cap B(z,z+\mathbf{y}_2)\ne\emptyset\}
\times\Big(e^{-\lambda\operatorname{Vol}(B(0,\mathbf{y}_1)\cup B(z,z+\mathbf{y}_2))f(x)} - e^{-\lambda\omega_d(R^d(0,\mathbf{y}_1)+R^d(0,\mathbf{y}_2))f(x)}\Big)\,dx\,dz\,d\mathbf{y}_1\,d\mathbf{y}_2
\]
for $\lambda < \infty$, and
\[
\gamma_k^{(0)}(\infty) \triangleq \frac{1}{((k+1)!)^2}\int_{\mathbb{R}^{d(2k+2)}} f^{2k+2}(x)\, h(0,\mathbf{y}_1)\, h(0,\mathbf{y}_2)\,\mathbf{1}\{B(0,\mathbf{y}_1)\cap B(z,z+\mathbf{y}_2)\ne\emptyset\}
\times\Big(e^{-\operatorname{Vol}(B(0,\mathbf{y}_1)\cup B(z,z+\mathbf{y}_2))f(x)} - e^{-\omega_d(R^d(0,\mathbf{y}_1)+R^d(0,\mathbf{y}_2))f(x)}\Big)\,dx\,dz\,d\mathbf{y}_1\,d\mathbf{y}_2.
\]
To conclude, we have proven that
\[
\lim_{n\to\infty} n^{-1}\operatorname{Var}(\hat N_{k,n}) = \gamma_k(\lambda) + \sum_{j=0}^{k}\gamma_k^{(j)}(\lambda) \triangleq \sigma_k^2(\lambda) \in (0,\infty), \tag{3.6.24}
\]
as required.
3.6.7 CLT - Poisson Case

Next, we prove the CLT result in Theorem 3.3.7, again using Stein's method, as in the proof of Theorem 3.3.3.

Proof of Theorem 3.3.7 ($\hat N_{k,n}$ only). We start again by counting only critical points located in a compact set $A\subset\mathbb{R}^d$, with $\int_A f(x)\,dx > 0$. We define $Q_{i,n}$, $\hat N_{k,n}^{(i)}$, $\hat N_{k,n}^A$, $g_{r_n}^{(i)}$, $(I_A,\sim)$ and $\xi_i$ the same way as in the proof of Theorem 3.3.3. Then, as in the proof of Theorem 3.3.6, one can show that
\[
\lim_{n\to\infty} n^{-1}\operatorname{Var}\big(\hat N_{k,n}^A\big) \in (0,\infty). \tag{3.6.25}
\]
According to Theorem 3.B.3, in order to prove a CLT for $\hat N_{k,n}^A$, we need to find bounds for $\mathbb{E}\{|\xi_i|^p\}$, $p=3,4$. We start with $p=3$:
\[
\mathbb{E}\Big\{\big(\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big)^3\Big\}
= \sum_{j=0}^{3}\binom{3}{j}(-1)^{3-j}\big(\mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big)^{3-j}\,\mathbb{E}\Big\{\big(\hat N_{k,n}^{(i)}\big)^j\Big\}.
\]
The computation of the bound here is similar in spirit to the ones we used in the proof of Theorem 3.3.2, but technically more complicated, and we shall not give details. Rather, we shall suffice with a brief description of the main ideas: every element in the sum can be expressed as the expectation of a triple sum of the form
\[
\mathbb{E}\bigg\{\sum_{\mathcal{Y}_1\subset\mathcal{P}_n^{(1)}}\sum_{\mathcal{Y}_2\subset\mathcal{P}_n^{(2)}}\sum_{\mathcal{Y}_3\subset\mathcal{P}_n^{(3)}} g_{r_n}^{(i)}(\mathcal{Y}_1,\mathcal{P}_n^{(1)})\,g_{r_n}^{(i)}(\mathcal{Y}_2,\mathcal{P}_n^{(2)})\,g_{r_n}^{(i)}(\mathcal{Y}_3,\mathcal{P}_n^{(3)})\bigg\}, \tag{3.6.26}
\]
where each of the Poisson processes can either be equal to one of the others or be an independent copy, depending on $j$. As for $\mathbb{E}\{\Delta_2\}$ in the proof of Theorem 3.3.6, we can use Palm theory, collect all the terms in which at least one of the balls $B(\mathcal{Y}_i)$ is disjoint from the others, and show that they cancel each other. For each of the remaining terms, we can show that if $|\mathcal{Y}_1\cup\mathcal{Y}_2\cup\mathcal{Y}_3| = 3k+3-j$, with $0\le j\le 3k+3$, then the relevant part of the sum in (3.6.26) is bounded by $c^\star n^{3k+3-j}s_n^{d(3k+2-j)}r_n^d = c^\star nr_n^d$. This bound is achieved using integral evaluations similar to those used in the proof of Theorem 3.3.6, along with the fact that all the points are located within distance $r_n$ of the cube $Q_{i,n}$. Thus, we have
\[
\mathbb{E}\Big\{\big(\hat N_{k,n}^{(i)} - \mathbb{E}\big\{\hat N_{k,n}^{(i)}\big\}\big)^3\Big\} \le c^\star nr_n^d.
\]
Recall that $|I_A| \le c^\star r_n^{-d}$. Therefore,
\[
\sum_{i\in I_A} \mathbb{E}\left\{|\xi_i|^3\right\} \le \frac{c^\star r_n^{-d}\, n r_n^d}{\left(\operatorname{Var}\left(N^A_{k,n}\right)\right)^{3/2}} = \frac{c^\star n}{n^{3/2}\left(n^{-1}\operatorname{Var}\left(N^A_{k,n}\right)\right)^{3/2}} \to 0.
\]
The proof for $p = 4$ is similar, and from Theorem 3.B.3 we have that
\[
\frac{N^A_{k,n} - \mathbb{E}\left\{N^A_{k,n}\right\}}{\left(\operatorname{Var}\left(N^A_{k,n}\right)\right)^{1/2}} \xrightarrow{L} \mathcal{N}(0,1).
\]
To conclude the proof, we need to show that the CLT for $N^A_{k,n}$ implies a CLT for $N_{k,n}$. This is done exactly as for Part 3 of Theorem 3.3.3.
3.6.8 CLT - Random Sample Case
To complete the proof of Theorems 3.3.6 and 3.3.7, we need to show that the same limit
results apply to the random sample case as well. While we again rely on De-Poissonization,
it is worth noting that, as opposed to the subcritical range, here the limiting variances
are different in the Poisson and random sample cases. We start by defining
\[
\eta_k(\lambda) \triangleq \frac{\lambda^{k+1}}{(k+1)!} \int_{(\mathbb{R}^d)^{k+2}} f^{k+2}(x)\, h(0,y)\, \mathbb{1}_{B(0,y)}(z)\, e^{-\lambda\omega_d R^d(0,y)\, f(x)}\, dx\, dy\, dz,
\]
\[
\eta_k(\infty) \triangleq \frac{1}{(k+1)!} \int_{(\mathbb{R}^d)^{k+2}} f^{k+2}(x)\, h(0,y)\, \mathbb{1}_{B(0,y)}(z)\, e^{-\omega_d R^d(0,y)\, f(x)}\, dx\, dy\, dz,
\]
where $\lambda < \infty$, $x \in \mathbb{R}^d$, $y \in (\mathbb{R}^d)^k$, and $z \in \mathbb{R}^d$.

Proof of Theorems 3.3.6 and 3.3.7 ($N_{k,n}$). Let $D_{m,n}$ denote the increment
\[
D_{m,n} = \sum_{\mathcal{Y}\subset\mathcal{X}_{m+1}} g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) - \sum_{\mathcal{Y}\subset\mathcal{X}_m} g_{r_n}(\mathcal{Y},\mathcal{X}_m).
\]
Let $\gamma$ be an arbitrary number in $(1/2,1)$. We wish to apply Theorem 3.C.1, with $H_n(\mathcal{P}_n) = N_{k,n}$ and $\alpha = \alpha_k(\lambda) \triangleq (k+1)\gamma_k(\lambda) - \eta_k(\lambda)$. Thus, we need to prove:
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma \le m \le n+n^\gamma} \left|\mathbb{E}\{D_{m,n}\} - \alpha_k(\lambda)\right| = 0, \tag{3.6.27}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma \le m < m' \le n+n^\gamma} \left|\mathbb{E}\{D_{m,n}D_{m',n}\} - \alpha_k^2(\lambda)\right| = 0, \tag{3.6.28}
\]
\[
\lim_{n\to\infty}\ \sup_{n-n^\gamma \le m \le n+n^\gamma} n^{-1/2}\,\mathbb{E}\left\{D_{m,n}^2\right\} = 0. \tag{3.6.29}
\]
As in the proof of Theorem 3.3.3, write $D_{m,n} = D^+_{m,n} - D^-_{m,n}$, where
\[
D^+_{m,n} = \#\left\{\mathcal{Y}\subset\mathcal{X}_{m+1} : g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1}) = 1 \text{ and } X_{m+1}\in\mathcal{Y}\right\},
\]
\[
D^-_{m,n} = \#\left\{\mathcal{Y}\subset\mathcal{X}_m : g_{r_n}(\mathcal{Y},\mathcal{X}_m) = 1 \text{ and } X_{m+1}\in B(\mathcal{Y})\right\}.
\]
From the definition of $D^+_{m,n}$ we have that
\[
\mathbb{E}\left\{D^+_{m,n}\right\} = \binom{m}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\}.
\]
Thus,
\[
\binom{n-n^\gamma}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\} \le \mathbb{E}\left\{D^+_{m,n}\right\} \le \binom{n+n^\gamma}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\}.
\]
As in the proof of Lemma 3.6.7, since $\gamma\in(1/2,1)$ it is easy to show that
\[
\lim_{n\to\infty} n^k\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n\pm n^\gamma})\} = (k+1)!\,\gamma_k(\lambda),
\]
and since $\lim_{n\to\infty} n^{-k}\binom{n\pm n^\gamma}{k} = 1/k!$, we have
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma}\left|\mathbb{E}\left\{D^+_{m,n}\right\} - (k+1)\gamma_k(\lambda)\right|\right) = 0.
\]
Next, from the definition of $D^-_{m,n}$ we have
\[
\mathbb{E}\left\{D^-_{m,n}\right\} = \binom{m}{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_m)\,\mathbb{1}_{B(\mathcal{Y})}(X_{m+1})\right\}.
\]
Note that if $X$ is a random variable in $\mathbb{R}^d$ with density $f$, independent of $\mathcal{X}_n$, then we can replace $X_{m+1}$ with $X$ in the last equality. Thus, we have
\[
\mathbb{E}\left\{D^-_{m,n}\right\} \ge \binom{n-n^\gamma}{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\,\mathbb{1}_{B(\mathcal{Y})}(X)\right\},
\]
\[
\mathbb{E}\left\{D^-_{m,n}\right\} \le \binom{n+n^\gamma}{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\,\mathbb{1}_{B(\mathcal{Y})}(X)\right\}.
\]
In addition, it is easy to show that
\[
\lim_{n\to\infty} n^{k+1}\,\mathbb{E}\left\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n\pm n^\gamma})\,\mathbb{1}_{B(\mathcal{Y})}(X)\right\} = (k+1)!\,\eta_k(\lambda).
\]
Thus,
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma}\left|\mathbb{E}\left\{D^-_{m,n}\right\} - \eta_k(\lambda)\right|\right) = 0.
\]
Finally, since $\left|\mathbb{E}\{D_{m,n}\} - \alpha_k(\lambda)\right| \le \left|\mathbb{E}\{D^+_{m,n}\} - (k+1)\gamma_k(\lambda)\right| + \left|\mathbb{E}\{D^-_{m,n}\} - \eta_k(\lambda)\right|$, we conclude that (3.6.27) holds.

To prove (3.6.28) we need to show that
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\{D_{m,n}D_{m',n}\} - \alpha_k^2(\lambda)\right|\right) = 0.
\]
Recall that $D_{m,n} = D^+_{m,n} - D^-_{m,n}$, and so we can write the product $D_{m,n}D_{m',n}$ as a sum of four different products. We start with $D^+_{m,n}D^+_{m',n}$:
\[
\mathbb{E}\left\{D^+_{m,n}D^+_{m',n}\right\} = \sum_{\substack{\mathcal{Y}\subset\mathcal{X}_{m+1}\\ X_{m+1}\in\mathcal{Y}}}\ \sum_{\substack{\mathcal{Y}'\subset\mathcal{X}_{m'+1}\\ X_{m'+1}\in\mathcal{Y}'}} \mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\}. \tag{3.6.30}
\]
Now, if $|\mathcal{Y}\cap\mathcal{Y}'| = j > 0$, then it is easy to show that
\[
\lim_{n\to\infty} n^{2k-j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} = 0.
\]
Therefore, the relevant part of the sum in (3.6.30) satisfies
\[
\binom{m}{k}\binom{m'-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \ge \binom{n-n^\gamma}{k}\binom{n-n^\gamma-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \to 0,
\]
and
\[
\binom{m}{k}\binom{m'-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \le \binom{n+n^\gamma}{k}\binom{n+n^\gamma-k-1}{k-j}\binom{k+1}{j}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \to 0.
\]
If $\mathcal{Y}\cap\mathcal{Y}' = \emptyset$, then
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \ge \mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n+n^\gamma})\},
\]
\[
\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \le \mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n-n^\gamma})\},
\]
and it is easy to show that
\[
\lim_{n\to\infty} n^{2k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n\pm n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n\pm n^\gamma})\} = \left((k+1)!\,\gamma_k(\lambda)\right)^2.
\]
Therefore, the relevant part of the sum in (3.6.30) satisfies
\[
\binom{m}{k}\binom{m'-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \ge \binom{n-n^\gamma}{k}\binom{n-n^\gamma-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n+n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n+n^\gamma})\} \to \left((k+1)\gamma_k(\lambda)\right)^2,
\]
and
\[
\binom{m}{k}\binom{m'-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{m+1})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{m'+1})\} \le \binom{n+n^\gamma}{k}\binom{n+n^\gamma-k-1}{k}\,\mathbb{E}\{g_{r_n}(\mathcal{Y},\mathcal{X}_{n-n^\gamma})\, g_{r_n}(\mathcal{Y}',\mathcal{X}_{n-n^\gamma})\} \to \left((k+1)\gamma_k(\lambda)\right)^2.
\]
Thus,
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^+_{m,n}D^+_{m',n}\right\} - \left((k+1)\gamma_k(\lambda)\right)^2\right|\right) = 0.
\]
Similarly, we can show that
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^-_{m,n}D^-_{m',n}\right\} - \left(\eta_k(\lambda)\right)^2\right|\right) = 0,
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^-_{m,n}D^+_{m',n}\right\} - (k+1)\gamma_k(\lambda)\,\eta_k(\lambda)\right|\right) = 0,
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\left\{D^+_{m,n}D^-_{m',n}\right\} - (k+1)\gamma_k(\lambda)\,\eta_k(\lambda)\right|\right) = 0.
\]
Combining all these limits together shows that (3.6.28) holds. Finally, similar computations yield (3.6.29).
For the last condition in Theorem 3.C.1, note that
\[
H_n(\mathcal{X}_m) \le \binom{m}{k+1} \le c^\star m^{k+1} \le c^\star (n+m)^{k+1}.
\]
Thus, taking $\beta = \max(c^\star, k+1)$ completes the De-Poissonization proof, and from Theorem 3.C.1 we conclude that $\alpha_k^2(\lambda) \le \sigma_k^2(\lambda)$, that
\[
\lim_{n\to\infty} n^{-1}\operatorname{Var}(N_{k,n}) = \sigma_k^2(\lambda) - \alpha_k^2(\lambda), \tag{3.6.31}
\]
and that
\[
\frac{N_{k,n} - \mathbb{E}\{N_{k,n}\}}{\sqrt{n}} \xrightarrow{L} \mathcal{N}\left(0,\ \sigma_k^2(\lambda) - \alpha_k^2(\lambda)\right),
\]
which completes the proof of Theorem 3.3.7, as promised.
The only remaining results in Section 3.3 that still require proofs relate to the global number of critical points, $N^G_{k,n}$.

Proof of Theorem 3.3.8. This theorem is proved exactly the same way as Theorems 3.3.5, 3.3.6, and 3.3.7 are proved in the super-critical phase. The only difference is that, throughout, $h(x)$ replaces $h_{\tau_n}(x)$. This, however, does not affect any of the results, since in the limit $h_{\tau_n}(x) \to h(x)$.
Proof of Proposition 3.3.9. We prove the proposition for the Poisson case. The random
sample case is similar.
\[
\mathbb{E}\left\{N^G_{k,n} - N_{k,n}\right\} = \frac{n^{k+1}}{(k+1)!}\, n^{-k} \int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+s_n y)\left(h(0,y) - h_{\tau_n}(0,y)\right) e^{-np(x,\,x+s_n y)}\, dy\, dx
= \frac{1}{(k+1)!} \int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+s_n y)\left(h(0,y) - h_{\tau_n}(0,y)\right) n\, e^{-np(x,\,x+s_n y)}\, dy\, dx. \tag{3.6.32}
\]
As in the proof of Theorem 3.3.5 (cf. (3.6.23)), we can show that the integrand is bounded by
\[
f(x)\, f_{\max}^k \left(h(0,y) - h_{\tau_n}(0,y)\right) n\, e^{-f_{\min} c^\star R^d(0,y)}. \tag{3.6.33}
\]
Now note that if the integrand is nonzero then $h \ne h_{\tau_n}$, and so $R(0,y) > \tau_n$. Therefore, $R^d(0,y) > \tfrac{1}{2}\left(R^d(0,y) + nr_n^d\right)$, and (3.6.33) can be replaced by
\[
f(x)\, f_{\max}^k \left(h(0,y) - h_{\tau_n}(0,y)\right) e^{-f_{\min} c^\star R^d(0,y)/2}\, n\, e^{-f_{\min} c^\star n r_n^d/2}. \tag{3.6.34}
\]
Assuming that $nr_n^d \ge D^\star \log n$, with $D^\star = (f_{\min} c^\star/2)^{-1}$, we have $n\, e^{-f_{\min} c^\star n r_n^d/2} \le 1$, and we obtain an integrable bound for the integrand. Thus, we can apply the DCT to (3.6.32). Finally, note that the bound we found in (3.6.34) converges to zero (since $h_{\tau_n} \to h$), so we are done.
3.6.9 Euler Characteristic Results
In this section we prove Corollary 3.4.2.
Proof of Corollary 3.4.2. Recall that $\chi_n \triangleq \chi(C(\mathcal{X}_n, r_n))$ and $\hat\chi_n \triangleq \chi(C(\mathcal{P}_n, r_n))$, where hats denote the Poisson-case quantities. Morse theory provides an alternative way to compute the Euler characteristic, via the number of critical points. Specifically, in our case we have
\[
\chi_n = \sum_{k=0}^{d} (-1)^k N_{k,n}, \qquad \hat\chi_n = \sum_{k=0}^{d} (-1)^k \hat N_{k,n}.
\]
First note that $N_{0,n} = n$ in the random sample case, and $\mathbb{E}\{\hat N_{0,n}\} = n$ in the Poisson case. Therefore,
\[
\mathbb{E}\{\chi_n\} = n + \sum_{k=1}^{d} (-1)^k\,\mathbb{E}\{N_{k,n}\}, \qquad \mathbb{E}\{\hat\chi_n\} = n + \sum_{k=1}^{d} (-1)^k\,\mathbb{E}\{\hat N_{k,n}\}.
\]
The first two cases of the theorem are now obvious consequences of Theorems 3.3.1 and 3.3.5. For the third case, using Theorem 3.3.8, we have
\[
\lim_{n\to\infty} n^{-1}\chi_n = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{d} (-1)^k N^G_{k,n}.
\]
However, since $N^G_{k,n}$ counts all the critical points in $\mathbb{R}^d$, Morse theory implies
\[
\sum_{k=0}^{d} (-1)^k N^G_{k,n} = \chi(\mathbb{R}^d) = 1,
\]
and we can conclude that $\lim_{n\to\infty} n^{-1}\chi_n = 0$.

If, in addition, $r_n^d$ satisfies the conditions of Proposition 3.3.9 (i.e. $nr_n^d \ge D^\star \log n$), then
\[
0 = \lim_{n\to\infty} \sum_{k=0}^{d} (-1)^k\, \mathbb{E}\left\{N^G_{k,n} - N_{k,n}\right\} = 1 - \lim_{n\to\infty} \mathbb{E}\{\chi_n\},
\]
which implies that $\mathbb{E}\{\chi_n\} \to 1$.
3.A Palm Theory for Poisson Processes
This appendix contains a collection of definitions and theorems which are used in the
proofs of this paper. Most of the results are cited from [39], although they may not
necessarily have originated there. However, for notational reasons we refer the reader
to [39], while other resources include [5, 42]. The following theorem is very useful when
computing expectations related to Poisson processes.
Theorem 3.A.1 (Palm theory for Poisson processes, [39, Theorem 1.6]). Let $f$ be a probability density on $\mathbb{R}^d$, and let $\mathcal{P}_n$ be a Poisson process on $\mathbb{R}^d$ with intensity $\lambda_n = nf$. Let $h(\mathcal{Y},\mathcal{X})$ be a measurable function defined for all finite subsets $\mathcal{Y}\subset\mathcal{X}\subset\mathbb{R}^d$ with $|\mathcal{Y}| = k$. Then
\[
\mathbb{E}\left\{\sum_{\mathcal{Y}\subset\mathcal{P}_n} h(\mathcal{Y},\mathcal{P}_n)\right\} = \frac{n^k}{k!}\,\mathbb{E}\{h(\mathcal{Y}',\mathcal{Y}'\cup\mathcal{P}_n)\},
\]
where $\mathcal{Y}'$ is a set of $k$ iid points in $\mathbb{R}^d$ with density $f$, independent of $\mathcal{P}_n$.
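As an illustrative sanity check (our own addition, not part of the cited text), the Palm formula can be verified numerically in the simplest case $h \equiv 1$: the left-hand side then counts the $k$-element subsets of $\mathcal{P}_n$, and the theorem reduces to the $k$-th factorial moment of a Poisson variable, $\mathbb{E}\{\binom{|\mathcal{P}_n|}{k}\} = n^k/k!$. The sketch below (plain Python; function names are ours) samples only the Poisson number of points, since the spatial locations are irrelevant when $h \equiv 1$:

```python
import math
import random

def poisson_sample(rng, lam):
    """Sample a Poisson(lam) variable via Knuth's multiplicative method
    (adequate for moderate lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def subset_count_moment(n=50.0, k=3, trials=20000, seed=1):
    """Monte Carlo average of the number of k-point subsets of P_n,
    i.e. the left-hand side of Theorem 3.A.1 with h == 1."""
    rng = random.Random(seed)
    total = sum(math.comb(poisson_sample(rng, n), k) for _ in range(trials))
    return total / trials

# Palm formula with h == 1:  E{ sum_{Y subset P_n} 1 } = n^k / k!
est = subset_count_moment()
exact = 50.0**3 / math.factorial(3)
assert abs(est / exact - 1) < 0.05
```

With $n = 50$ and $k = 3$ the Monte Carlo average should land within a few percent of $n^k/k! \approx 20833$.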
We shall also need the following corollary, which treats second moments:
Corollary 3.A.2. With the notation above, assuming $|\mathcal{Y}_1| = |\mathcal{Y}_2| = k$,
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}_1,\mathcal{Y}_2\subset\mathcal{P}_n\\ |\mathcal{Y}_1\cap\mathcal{Y}_2|=j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\right\} = \frac{n^{2k-j}}{j!\,((k-j)!)^2}\,\mathbb{E}\left\{h(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\, h(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\right\},
\]
where $\mathcal{Y}'_{12} = \mathcal{Y}'_1\cup\mathcal{Y}'_2$ is a set of $2k-j$ iid points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, and $|\mathcal{Y}'_1\cap\mathcal{Y}'_2| = j$.
Proof. Given $|\mathcal{P}_n| = m$, the sum on the LHS is finite. Therefore,
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}_1,\mathcal{Y}_2\subset\mathcal{P}_n\\ |\mathcal{Y}_1\cap\mathcal{Y}_2|=j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\Big|\, |\mathcal{P}_n| = m\right\} = \binom{m}{2k-j}\binom{2k-j}{k}\binom{k}{j}\,\mathbb{E}\left\{h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\big|\, |\mathcal{P}_n| = m\right\}_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j}. \tag{3.A.1}
\]
Choosing now all possible subsets $\mathcal{Y}$ of size $2k-j$, and splitting each of them into two arbitrary subsets $\mathcal{Y}_1,\mathcal{Y}_2$ of size $k$ with $|\mathcal{Y}_1\cap\mathcal{Y}_2| = j$, yields
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}\subset\mathcal{P}_n\\ |\mathcal{Y}|=2k-j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\Big|\, |\mathcal{P}_n| = m\right\} = \binom{m}{2k-j}\,\mathbb{E}\left\{h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\,\big|\, |\mathcal{P}_n| = m\right\}_{|\mathcal{Y}_1\cap\mathcal{Y}_2|=j}. \tag{3.A.2}
\]
Combining (3.A.1), (3.A.2), and Theorem 3.A.1 for subsets $\mathcal{Y}$ of size $2k-j$ yields
\[
\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}_1,\mathcal{Y}_2\subset\mathcal{P}_n\\ |\mathcal{Y}_1\cap\mathcal{Y}_2|=j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\right\} = \binom{2k-j}{k}\binom{k}{j}\,\mathbb{E}\left\{\sum_{\substack{\mathcal{Y}\subset\mathcal{P}_n\\ |\mathcal{Y}|=2k-j}} h(\mathcal{Y}_1,\mathcal{P}_n)\, h(\mathcal{Y}_2,\mathcal{P}_n)\right\} = \frac{n^{2k-j}}{j!\,((k-j)!)^2}\,\mathbb{E}\left\{h(\mathcal{Y}'_1,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\, h(\mathcal{Y}'_2,\mathcal{Y}'_{12}\cup\mathcal{P}_n)\right\},
\]
where $\mathcal{Y}'_{12} = \mathcal{Y}'_1\cup\mathcal{Y}'_2$ is a set of $2k-j$ iid points in $\mathbb{R}^d$ with density $f(x)$, independent of $\mathcal{P}_n$, and $|\mathcal{Y}'_1\cap\mathcal{Y}'_2| = j$.
3.B Stein’s Method
In this chapter we heavily used Stein’s method to derive limit theorems for the sums of
dependent Bernoulli variables. We need both the Poisson and normal approximations,
which are presented below.
Definition 3.B.1. Let $(I, E)$ be a graph. For $i,j \in I$ we denote $i \sim j$ if $(i,j) \in E$. Let $\{\xi_i\}_{i\in I}$ be a set of random variables. We say that $(I,\sim)$ is a dependency graph for $\{\xi_i\}$ if for every pair of disjoint sets $I_1, I_2 \subset I$ with no edges between $I_1$ and $I_2$, the set of variables $\{\xi_i\}_{i\in I_1}$ is independent of $\{\xi_i\}_{i\in I_2}$. We also define the neighborhood of $i$ as $N_i \triangleq \{i\}\cup\{j\in I : j\sim i\}$.

Theorem 3.B.2 (Stein's Method for Bernoulli Variables, [39, Theorem 2.1]). Let $\{\xi_i\}_{i\in I}$ be a set of Bernoulli random variables, with dependency graph $(I,\sim)$. Let
\[
p_i \triangleq \mathbb{E}\{\xi_i\}, \quad p_{i,j} \triangleq \mathbb{E}\{\xi_i\xi_j\}, \quad \lambda \triangleq \sum_{i\in I} p_i, \quad W \triangleq \sum_{i\in I} \xi_i, \quad Z \sim \operatorname{Poisson}(\lambda).
\]
Then,
\[
d_{TV}(W,Z) \le \min(3,\lambda^{-1})\left(\sum_{i\in I}\ \sum_{j\in N_i\setminus\{i\}} p_{ij} + \sum_{i\in I}\ \sum_{j\in N_i} p_i p_j\right).
\]
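To make the bound concrete, here is a small numerical example of our own (not from [39]): take $X_1,\ldots,X_n$ iid Bernoulli($p$) and let $\xi_i = \mathbb{1}\{X_i = X_{i+1} = 1\}$, so $W$ counts occurrences of two consecutive successes. Then $\xi_i$ and $\xi_j$ are independent unless $|i-j|\le 1$, with $p_i = p^2$ and $p_{i,i\pm 1} = p^3$, so both the Stein bound and an empirical total-variation distance are easy to compute:

```python
import math
import random

def stein_poisson_bound(n=60, p=0.05):
    """Stein bound (Theorem 3.B.2) for W = #{i : X_i = X_{i+1} = 1},
    where xi_i depends on xi_j only when |i - j| <= 1."""
    m = n - 1                        # number of indicators
    lam = m * p * p                  # sum of p_i
    sum_pij = 2 * (m - 1) * p**3     # ordered adjacent pairs, p_{i,i+1} = p^3
    sum_pipj = (3 * m - 2) * p**4    # ordered pairs with |i-j| <= 1, incl. j = i
    return lam, min(3.0, 1.0 / lam) * (sum_pij + sum_pipj)

def empirical_tv(n=60, p=0.05, trials=30000, seed=2):
    """Empirical d_TV between W and Poisson(lambda), estimated by simulation."""
    rng = random.Random(seed)
    lam, _ = stein_poisson_bound(n, p)
    counts = {}
    for _ in range(trials):
        x = [rng.random() < p for _ in range(n)]
        w = sum(x[i] and x[i + 1] for i in range(n - 1))
        counts[w] = counts.get(w, 0) + 1
    return 0.5 * sum(abs(counts.get(w, 0) / trials
                         - math.exp(-lam) * lam**w / math.factorial(w))
                     for w in range(max(counts) + 20))

lam, bound = stein_poisson_bound()
assert empirical_tv() < bound   # the Stein bound should dominate d_TV
```

The empirical distance sits well below the bound, as the theorem guarantees for the true $d_{TV}$.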
Theorem 3.B.3 (CLT for sums of weakly dependent variables, [39, Theorem 2.4]). Let $(\xi_i)_{i\in I}$ be a finite collection of random variables, with $\mathbb{E}\{\xi_i\} = 0$. Let $(I,\sim)$ be the dependency graph of $(\xi_i)_{i\in I}$, and assume that its maximal degree is $D-1$. Set $W \triangleq \sum_{i\in I}\xi_i$, and suppose that $\mathbb{E}\{W^2\} = 1$. Then for all $w\in\mathbb{R}$,
\[
\left|F_W(w) - \Phi(w)\right| \le 2(2\pi)^{-1/4}\sqrt{D^2\sum_{i\in I}\mathbb{E}\left\{|\xi_i|^3\right\}} + 6\sqrt{D^3\sum_{i\in I}\mathbb{E}\left\{|\xi_i|^4\right\}},
\]
where $F_W$ is the distribution function of $W$ and $\Phi$ that of a standard Gaussian.
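For orientation (an illustration we add, not part of [39]): in the iid case the dependency graph has no edges, so $D = 1$, and for normalized Bernoulli summands $\sum_i \mathbb{E}|\xi_i|^3 \sim n^{-1/2}$ and $\sum_i \mathbb{E}|\xi_i|^4 \sim n^{-1}$, so the bound decays like $n^{-1/4}$:

```python
import math

def stein_clt_bound(n, p=0.5):
    """Bound of Theorem 3.B.3 for W = sum of n iid normalised Bernoulli(p)
    variables (empty dependency graph, maximal degree D - 1 = 0, so D = 1)."""
    var = n * p * (1 - p)
    m3 = p * (1 - p) * ((1 - p)**2 + p**2)   # E|X - p|^3 for one Bernoulli
    m4 = p * (1 - p) * ((1 - p)**3 + p**3)   # E (X - p)^4
    s3 = n * m3 / var**1.5                   # sum_i E|xi_i|^3  ~ n^{-1/2}
    s4 = n * m4 / var**2                     # sum_i E|xi_i|^4  ~ n^{-1}
    return 2 * (2 * math.pi)**-0.25 * math.sqrt(s3) + 6 * math.sqrt(s4)

# The first term dominates, so the bound vanishes at rate roughly n^{-1/4}.
assert stein_clt_bound(10**6) < stein_clt_bound(10**4) < stein_clt_bound(100)
```

This is a far weaker rate than the classical Berry-Esseen $n^{-1/2}$, which is the price paid for allowing dependence.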
3.C De-Poissonization
Recall that the results in this chapter apply both to fixed-size sets $\mathcal{X}_n$ and to Poisson processes $\mathcal{P}_n$. In some cases it is easier to prove the results for $\mathcal{P}_n$ first, and then conclude that similar results apply to $\mathcal{X}_n$. The second step is known as 'De-Poissonization', and our use of it will depend primarily on the following theorem.
Theorem 3.C.1 (De-Poissonization, [39, Theorem 2.12]). For every $n\in\mathbb{N}$, let $H_n(\mathcal{X})$ be a functional defined for all finite sets of points $\mathcal{X}\subset\mathbb{R}^d$. Let $\mathcal{P}_n$ be a Poisson process defined in the same way as in Section 3.3, such that
\[
n^{-1}\operatorname{Var}(H_n(\mathcal{P}_n)) \to \sigma^2 \quad\text{and}\quad \frac{H_n(\mathcal{P}_n) - \mathbb{E}\{H_n(\mathcal{P}_n)\}}{\sqrt{n}} \xrightarrow{D} \mathcal{N}(0,\sigma^2),
\]
as $n\to\infty$. Define
\[
R_{m,n} \triangleq H_n(\mathcal{X}_{m+1}) - H_n(\mathcal{X}_m),
\]
where $\mathcal{X}_m$ is defined as in Section 3.3. In other words, $R_{m,n}$ measures the change in the value of the functional $H_n$ as a single point is added to the random set. Suppose that there exist $\alpha\in\mathbb{R}$ and $\gamma > 1/2$ such that
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma}\left|\mathbb{E}\{R_{m,n}\} - \alpha\right|\right) = 0, \tag{3.C.1}
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m<m'\le n+n^\gamma}\left|\mathbb{E}\{R_{m,n}R_{m',n}\} - \alpha^2\right|\right) = 0, \tag{3.C.2}
\]
\[
\lim_{n\to\infty}\left(\sup_{n-n^\gamma\le m\le n+n^\gamma} n^{-1/2}\,\mathbb{E}\left\{R_{m,n}^2\right\}\right) = 0, \tag{3.C.3}
\]
and also that there exists $\beta > 0$ such that $|H_n(\mathcal{X}_m)| \le \beta(n+m)^\beta$ (a.s.).

Then $\alpha^2 \le \sigma^2$, and as $n\to\infty$,
\[
n^{-1}\operatorname{Var}(H_n(\mathcal{X}_n)) \to \sigma^2 - \alpha^2 \quad\text{and}\quad \frac{H_n(\mathcal{X}_n) - \mathbb{E}\{H_n(\mathcal{X}_n)\}}{\sqrt{n}} \xrightarrow{D} \mathcal{N}(0,\sigma^2-\alpha^2).
\]
Chapter 4
Noise Crackles
4.1 Introduction
In this chapter we continue to study random Cech complexes constructed from either a random sample $\mathcal{X}_n$ or a Poisson process $\mathcal{P}_n$ (see Section 3.3 for definitions). The main difference from Chapter 3 is that we look at $C(\mathcal{X}_n, 1)$ rather than $C(\mathcal{X}_n, r_n)$, i.e. we take fixed-size balls rather than shrinking ones. Obviously, if the sample distribution has compact support $S$, then for large enough $n$ we have $\bigcup_{k=1}^n B_1(X_k) \approx \operatorname{Tube}(S,1)$. Thus, there is not much to study in this case. However, when the support of the distribution is unbounded, interesting phenomena occur.

We shall study distributions supported on $\mathbb{R}^d$, and find that there exists a 'core' - a region in which the density of points is so high that the unit balls placed around them cover it completely. Consequently, the Cech complex inside the core is contractible. The size of the core obviously grows as $n\to\infty$. Outside the core there may be additional isolated points, but not enough to cover the entire area. Thus, in this region the topology of the Cech complex is nontrivial, and many holes of different dimensions may appear. We call this phenomenon 'crackling'.
The exact crackling behavior depends on the choice of distribution. In this chapter we study three representative examples: the power-law, exponential, and standard Gaussian distributions, whose density functions are given, respectively, by
\[
f_p(x) \triangleq \frac{c_p}{1+\|x\|^\alpha}, \tag{4.1.1}
\]
\[
f_e(x) \triangleq c_e\, e^{-\|x\|}, \tag{4.1.2}
\]
\[
f_g(x) \triangleq c_g\, e^{-\|x\|^2/2}, \tag{4.1.3}
\]
where $\alpha > d$, $\|\cdot\|$ is the standard $L^2$ norm in $\mathbb{R}^d$, and $c_p, c_e, c_g$ are normalization constants.
The motivation for our study is threefold. Firstly, studying how different distributions crackle is an interesting pure probability problem. Secondly, recall the manifold learning problem discussed repeatedly in this thesis - we have a set of random samples from a compact manifold $M \subset \mathbb{R}^d$ and we would like to recover the homology of $M$. One may consider a similar problem, but where noise is added to the samples. For example, in [38], the samples are of the form $Y_k = X_k + N_k$, where $X_k \in M$ and $N_k \in \mathbb{R}^d$ is Gaussian noise lying in the normal direction of $M$ at $X_k$. In such cases, the noise outliers might introduce homology elements which do not belong to the original manifold $M$. Indeed, the algorithm suggested in [38] includes a significant step of throwing away what seem to be outliers. Studying how pure noise crackles would be the first step in understanding how to handle noisy manifold learning schemes, generally and rigorously. Finally, the results in this chapter shed some light on the behavior of the Cech complex $C(\mathcal{X}_n, r_n)$ (studied in Chapter 3) in the super-critical range ($nr_n^d\to\infty$). We will discuss this in Section 4.6.

We note that the work described in this chapter is still in progress. While we have uncovered the main interesting crackling phenomena, there is still more to study on the crackling of pure noise, as well as providing stronger limit statements. Also note that while we present all the results in terms of the random sample $\mathcal{X}_n$, exactly the same results apply to the Poisson process $\mathcal{P}_n$ as well.
4.2 The Core of Distributions with Unbounded Support
We start by examining the core of the power-law, exponential and Gaussian distributions presented in the previous section. These distributions are spherically symmetric and the samples are concentrated near the origin. By 'core' we refer to a centered ball $B_{R_n} \triangleq B_{R_n}(0) \subset \mathbb{R}^d$ containing a very large number of points from the sample $\mathcal{X}_n$ (or $\mathcal{P}_n$), such that
\[
B_{R_n} \subset \bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X),
\]
i.e. the unit balls around the samples cover $B_{R_n}$ completely. Since $B_{R_n}$ is covered, it contains no holes, and therefore the homology of $\bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X)$, or equivalently, of $C(\mathcal{X}_n\cap B_{R_n}, 1)$, is trivial. Obviously, as $n\to\infty$, the radius $R_n$ grows as well.

Let $\{R_n\}_{n=1}^\infty$ be an increasing sequence of positive numbers. Denote by $C_n$ the event that $B_{R_n}$ is covered, i.e.
\[
C_n \triangleq \left\{B_{R_n} \subset \bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X)\right\}.
\]
We wish to find the largest possible value of $R_n$ such that $\mathbb{P}(C_n)\to 1$. The following theorem presents lower bounds for this value.
Theorem 4.2.1. Let $\epsilon > 0$, and define
\[
R_n^c \triangleq
\begin{cases}
\left(\dfrac{\delta_p n}{\log n - e^{-\epsilon}\log\log n} - 1\right)^{1/\alpha} & f = f_p,\\[2mm]
\log n - \log\log\log n - \delta_e - \epsilon & f = f_e,\\[2mm]
\sqrt{2\left(\log n - \log\log\log n - \delta_g - \epsilon\right)} & f = f_g,
\end{cases}
\]
where
\[
\delta_p = c_p\,\alpha\, 2^{-d} d^{-(1+d/2)},\qquad
\delta_e = (1+d/2)\log d + d\log 2 - \log c_e,\qquad
\delta_g = (1+d/2)\log d + (d-1)\log 2 - \log c_g.
\]
If $R_n \le R_n^c$, then $\mathbb{P}(C_n) \to 1$.
We see that the core size has a completely different order of magnitude in the three
distributions we chose. The heavy-tailed power-law distribution has the largest core, while
the core of the Gaussian distribution is the smallest one. In the following sections we will
study the behavior of the Cech complex outside the core.
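To get a feel for these magnitudes, the following sketch (our own numerical illustration; the choices $d = 1$, $\alpha = 3$ are arbitrary, and the normalization constants are computed for that case, using $\int_0^\infty dx/(1+x^3) = 2\pi/(3\sqrt{3})$) evaluates $R_n^c$ of Theorem 4.2.1 for the three densities at $n = 10^6$:

```python
import math

def core_radii(n, eps=0.1, alpha=3.0, d=1):
    """R_n^c of Theorem 4.2.1, specialised to d = 1 with normalisations
    c_p (alpha = 3), c_e = 1/2, c_g = 1/sqrt(2*pi)."""
    c_p = 1.0 / (2 * (2 * math.pi / (3 * math.sqrt(3))))  # 1 / int 1/(1+|x|^3) dx
    c_e, c_g = 0.5, 1.0 / math.sqrt(2 * math.pi)
    # the delta's of Theorem 4.2.1
    d_p = c_p * alpha * 2**-d * d**-(1 + d / 2)
    d_e = (1 + d / 2) * math.log(d) + d * math.log(2) - math.log(c_e)
    d_g = (1 + d / 2) * math.log(d) + (d - 1) * math.log(2) - math.log(c_g)
    L, LL, LLL = math.log(n), math.log(math.log(n)), math.log(math.log(math.log(n)))
    R_p = (d_p * n / (L - math.exp(-eps) * LL) - 1) ** (1 / alpha)
    R_e = L - LLL - d_e - eps
    R_g = math.sqrt(2 * (L - LLL - d_g - eps))
    return R_p, R_e, R_g

R_p, R_e, R_g = core_radii(10**6)
assert R_p > R_e > R_g  # heavy tails -> largest core; Gaussian -> smallest
```

For $n = 10^6$ this gives roughly $R_p \approx 38$, $R_e \approx 11$ and $R_g \approx 5$, matching the ordering discussed above: a polynomial core for the power law against logarithmic and square-root-logarithmic cores for the exponential and Gaussian.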
4.3 How Power-Law Noise Crackles
In this section we explore the crackling phenomenon in the power-law distribution, i.e. $f = f_p$ (defined in (4.1.1)). Let $B_{R_n}\subset\mathbb{R}^d$ be the centered ball with radius $R_n$, and let
\[
\mathcal{C}_n \triangleq C(\mathcal{X}_n\cap(B_{R_n})^c, 1)
\]
be the Cech complex constructed from sample points outside $B_{R_n}$. We wish to study
\[
\beta_{k,n} \triangleq \beta_k(\mathcal{C}_n),
\]
the $k$-th Betti number of $\mathcal{C}_n$.

Note that the minimum number of points required to form a $k$-dimensional hole ($k\ge 1$) is $k+2$. For $k\ge 1$ and $\mathcal{Y}\subset\mathbb{R}^d$, denote
\[
T_k(\mathcal{Y}) \triangleq \mathbb{1}\left\{|\mathcal{Y}| = k+2,\ \beta_k(C(\mathcal{Y},1)) = 1\right\},
\]
i.e. $T_k$ takes the value 1 if $C(\mathcal{Y},1)$ is a minimal $k$-dimensional hole, and 0 otherwise. This indicator function will be used to define the limits of the Betti numbers.
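As a concrete example of $T_k$ (our own addition), take $k = 1$ and three points at the vertices of an equilateral triangle with side $s$: the Cech complex on unit balls has all three edges when $s \le 2$ (pairwise ball intersections), and the filling 2-simplex precisely when the three balls share a common point, i.e. when the circumradius $s/\sqrt{3} \le 1$. Hence $T_1 = 1$ exactly on the window $\sqrt{3} < s \le 2$:

```python
import math

def t1_equilateral(s):
    """T_1 (minimal 1-dimensional hole indicator) for three points forming an
    equilateral triangle of side s, in the Cech complex with unit balls."""
    edges = s <= 2.0                   # all pairwise ball intersections present
    filled = s / math.sqrt(3) <= 1.0   # triple intersection: circumradius <= 1
    return 1 if (edges and not filled) else 0

assert t1_equilateral(1.5) == 0   # balls overlap too much: complex contractible
assert t1_equilateral(1.9) == 1   # triangle boundary with no filling: beta_1 = 1
assert t1_equilateral(2.5) == 0   # three isolated points: no edges at all
```

The same ball-intersection logic, with a smallest-enclosing-ball test replacing the circumradius, extends to general (non-equilateral) triples.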
Theorem 4.3.1. If $\lim_{n\to\infty} nR_n^{-\alpha} = 0$, then
\[
\lim_{n\to\infty}\left(nR_n^{d-\alpha}\right)^{-1}\mathbb{E}\{\beta_{0,n}\} = \mu_{p,0},
\]
\[
\lim_{n\to\infty}\left(n^{k+2}R_n^{d-\alpha(k+2)}\right)^{-1}\mathbb{E}\{\beta_{k,n}\} = \mu_{p,k}, \qquad 1\le k\le d-1,
\]
where
\[
\mu_{p,0} \triangleq \frac{s_{d-1}\, c_p}{\alpha - d}, \tag{4.3.1}
\]
\[
\mu_{p,k} \triangleq \frac{s_{d-1}\, c_p^{k+2}}{(\alpha(k+2)-d)\,(k+2)!} \int_{(\mathbb{R}^d)^{k+1}} T_k(0,y)\, dy, \qquad 1\le k\le d-1, \tag{4.3.2}
\]
and where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional sphere in $\mathbb{R}^d$.

Next, we define the following values, which serve as critical radii for the crackle:
\[
R^\epsilon_{0,n} \triangleq n^{\frac{1}{\alpha-d}+\epsilon}, \qquad R_{0,n} \triangleq R^0_{0,n},
\]
\[
R^\epsilon_{k,n} \triangleq n^{\frac{1}{\alpha-d/(k+2)}+\epsilon} \quad (k\ge 1), \qquad R_{k,n} \triangleq R^0_{k,n}.
\]
The following is a straightforward corollary of Theorem 4.3.1, and summarizes the behavior of $\mathbb{E}\{\beta_{k,n}\}$.

Corollary 4.3.2. For $k\ge 0$ and $\epsilon > 0$,
\[
\lim_{n\to\infty}\mathbb{E}\{\beta_{k,n}\} =
\begin{cases}
0 & R_n = R^\epsilon_{k,n},\\
\mu_{p,k} & R_n = R_{k,n},\\
\infty & R_n = R^{-\epsilon}_{k,n}.
\end{cases}
\]

Theorem 4.3.1 and Corollary 4.3.2 reveal that the crackling behavior is organized into separate 'layers', see Figure 4.1. Dividing $\mathbb{R}^d$ into a sequence of annuli at radii
\[
R^\epsilon_{0,n} \gg R_{0,n} \gg R^\epsilon_{1,n} \gg R_{1,n} \gg \cdots \gg R^\epsilon_{d-1,n} \gg R_{d-1,n} \gg R^c_n,
\]
we observe a different behavior of the Betti numbers in each annulus. We shall briefly review the behavior in each annulus, in decreasing order of radii values. The following description is mainly qualitative, and refers to expected values only.
• $[R^\epsilon_{0,n},\infty)$ - there are hardly any points ($\beta_k \sim 0$, $0\le k\le d-1$).

• $[R_{0,n}, R^\epsilon_{0,n})$ - points start to appear, and $\beta_0 \sim \mu_{p,0}$. The points are very few and scattered, so no holes are generated ($\beta_k \sim 0$, $1\le k\le d-1$).

• $[R^\epsilon_{1,n}, R_{0,n})$ - the number of components grows to infinity, but no holes are formed yet ($\beta_0 \sim \infty$, and $\beta_k = 0$, $1\le k\le d-1$).

• $[R_{1,n}, R^\epsilon_{1,n})$ - a finite number of 1-dimensional holes show up, among the infinite number of components ($\beta_0 \sim \infty$, $\beta_1 \sim \mu_{p,1}$, and $\beta_k = 0$, $2\le k\le d-1$).

• $[R^\epsilon_{2,n}, R_{1,n})$ - we have $\beta_0 \sim \infty$, $\beta_1 \sim \infty$, and $\beta_k \sim 0$ for $k\ge 2$.

This process goes on, until the $(d-1)$-dimensional holes appear -

• $[R_{d-1,n}, R^\epsilon_{d-1,n})$ - we have $\beta_{d-1} \sim \mu_{p,d-1}$ and $\beta_k \sim \infty$ for $0\le k\le d-2$.

• $[R^c_n, R_{d-1,n})$ - just before we reach the core, the complex exhibits the most intricate structure, with $\beta_k \sim \infty$ for $0\le k\le d-1$.
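The nesting of the layers can be read off from the growth exponents alone. The short sketch below (our illustration, with arbitrary choices $\alpha = 5$, $d = 2$) checks that the exponents of $R_{0,n}, R_{1,n}, \ldots$ decrease strictly and stay above the core's rough $n^{1/\alpha}$ growth (the core radius of Theorem 4.2.1 is $(\delta_p n/\log n)^{1/\alpha}$ up to lower-order terms):

```python
def layer_exponents(alpha=5.0, d=2):
    """Growth exponents of the power-law crackle radii: R_{k,n} = n^{e_k} with
    e_0 = 1/(alpha - d) and e_k = 1/(alpha - d/(k+2)) for k >= 1, while the
    core radius grows roughly like n^{1/alpha} (up to logarithmic factors)."""
    exps = [1.0 / (alpha - d)]                              # k = 0
    exps += [1.0 / (alpha - d / (k + 2)) for k in range(1, d)]
    return exps, 1.0 / alpha

exps, core = layer_exponents()
# the annuli are nested: R_{0,n} >> R_{1,n} >> ... >> R_{d-1,n} >> R_n^c
assert all(a > b for a, b in zip(exps, exps[1:])) and exps[-1] > core
```

For $\alpha = 5$, $d = 2$ the exponents are $1/3$ and $3/13$, against $1/5$ for the core, so the two layers and the core are indeed separated on the polynomial scale.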
Note that there is a very fast phase transition as we move from the contractible core to the first crackle layer. At this point we do not know exactly where and how this phase transition takes place. A reasonable conjecture would be that the transition occurs at $R_n = n^{1/\alpha}$ (since at this radius the term $nR_n^{-\alpha}$ that appears in Theorem 4.3.1 changes its limit, affecting the limiting Betti numbers). However, this remains for future work.

Figure 4.1: The layered behavior of crackle. Inside the core ($B_{R^c_n}$) the complex consists of a single component and no holes. The exterior of the core is divided into separate annuli. Going from right to left, we see how the Betti numbers grow. In each annulus we present the Betti number that was most recently changed.
4.4 How Exponential Noise Crackles
In this section we focus on the exponential density function, i.e. $f = f_e$ (defined in (4.1.2)). The results in this section are very similar to those for the power-law distribution, and we shall describe them briefly. Differences will show in (a) the values of $R_{k,n}$, and (b) the exact limits.

Theorem 4.4.1. If $\lim_{n\to\infty} ne^{-R_n} = 0$, then
\[
\lim_{n\to\infty}\left(nR_n^{d-1}e^{-R_n}\right)^{-1}\mathbb{E}\{\beta_{0,n}\} = \mu_{e,0},
\]
\[
\lim_{n\to\infty}\left(n^{k+2}R_n^{d-1}e^{-(k+2)R_n}\right)^{-1}\mathbb{E}\{\beta_{k,n}\} = \mu_{e,k}, \qquad k\ge 1,
\]
where
\[
\mu_{e,0} \triangleq s_{d-1}\, c_e, \tag{4.4.1}
\]
\[
\mu_{e,k} \triangleq \frac{s_{d-1}\, c_e^{k+2}}{(k+2)!} \int_0^\infty\int_{(\mathbb{R}^d)^{k+1}} T_k(0,y)\, e^{-\left((k+2)\rho + \sum_{i=1}^{k+1} y_i^1\right)} \prod_{i=1}^{k+1} \mathbb{1}\{y_i^1 > -\rho\}\, dy\, d\rho, \tag{4.4.2}
\]
and where $y_i^1$ is the first coordinate of $y_i\in\mathbb{R}^d$.
Next, define
\[
R^\epsilon_{0,n} \triangleq \log n + (d-1+\epsilon)\log\log n, \qquad R_{0,n} \triangleq R^0_{0,n},
\]
\[
R^\epsilon_{k,n} \triangleq \log n + \left(\frac{d-1}{k+2}+\epsilon\right)\log\log n \quad (k\ge 1), \qquad R_{k,n} \triangleq R^0_{k,n}.
\]
From Theorem 4.4.1 we can conclude the following.

Corollary 4.4.2. For $k\ge 0$ and $\epsilon > 0$,
\[
\lim_{n\to\infty}\mathbb{E}\{\beta_{k,n}\} =
\begin{cases}
0 & R_n = R^\epsilon_{k,n},\\
\mu_{e,k} & R_n = R_{k,n},\\
\infty & R_n = R^{-\epsilon}_{k,n}.
\end{cases}
\]
As in the power-law case, Theorem 4.4.1 implies the same 'layered' behavior, the only difference being in the values of $R_{k,n}$. From examining the values of $R^c_n$ and $R_{k,n}$, it is reasonable to guess that the phase transition in the exponential case occurs at $R_n = \log n$.
4.5 Gaussian Noise Does Not Crackle
The standard Gaussian distribution (defined in (4.1.3)) exhibits a completely different behavior than the power-law and the exponential distributions. Define
\[
R^\epsilon_{0,n} \triangleq \sqrt{2\log n + (d-2+\epsilon)\log\log n}.
\]

Theorem 4.5.1. If $f = f_g$, $\epsilon > 0$, and $R_n = R^\epsilon_{0,n}$, then for $0\le k\le d-1$,
\[
\lim_{n\to\infty}\mathbb{E}\{\beta_{k,n}\} = 0.
\]

Note that in the Gaussian case $\lim_{n\to\infty}\left(R^\epsilon_{0,n} - R^c_n\right) = 0$. This implies that as $n\to\infty$ we have the core, which is contractible, and outside the core there is hardly anything. In other words, the ball placed around every new point we add to the sample immediately connects to the core, and thus Gaussian noise does not crackle.
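Numerically the gap closes rather slowly, at rate roughly $\log\log n/\sqrt{\log n}$, since both radii are of the form $\sqrt{2\log n + O(\log\log n)}$. The sketch below (our illustration, with $d = 1$, $c_g = 1/\sqrt{2\pi}$, and $\delta_g$ taken from Theorem 4.2.1) evaluates $R^\epsilon_{0,n} - R^c_n$ for a few values of $n$:

```python
import math

def gaussian_gap(n, eps=0.1, d=1):
    """R^eps_{0,n} - R^c_n for the standard Gaussian in d = 1: both radii are
    sqrt(2 log n + O(log log n)), so the difference vanishes as n grows."""
    c_g = 1.0 / math.sqrt(2 * math.pi)
    d_g = (1 + d / 2) * math.log(d) + (d - 1) * math.log(2) - math.log(c_g)
    L, LL, LLL = math.log(n), math.log(math.log(n)), math.log(math.log(math.log(n)))
    r_eps = math.sqrt(2 * L + (d - 2 + eps) * LL)      # crackle threshold
    r_core = math.sqrt(2 * (L - LLL - d_g - eps))      # core radius (Thm 4.2.1)
    return r_eps - r_core

# The gap shrinks monotonically: the 'empty' outer region hugs the core.
assert gaussian_gap(10**12) < gaussian_gap(10**6) < gaussian_gap(10**3)
```

For $n = 10^3, 10^6, 10^{12}$ the gap is roughly $0.24$, $0.16$ and $0.10$ respectively: small, shrinking, but visibly slow, which is consistent with the core and the empty exterior merging only in the limit.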
4.6 Summary and Future Work
In the preceding sections we presented the crackling phenomenon which occurs in some distributions with unbounded support. We examined three prototype distributions - the power-law, exponential and Gaussian. We characterized the 'core' of the distributions and found bounds on its size. Once we move outside the core, the Cech complex crackles - i.e. it splits up into many particles with non-trivial homology. We described the crackling phenomenon in the power-law and exponential distributions, and found that different Betti numbers show up in different layers. Comparing the results in Theorems 4.3.1 and 4.4.1, we see that the exponential distribution crackles much closer to the origin than the heavy-tailed power-law distribution. For the Gaussian distribution, on the other hand, we showed that crackling does not occur. In the Gaussian case the Cech complex consists mainly of its core, and thus remains contractible, even after adding $n\to\infty$ points.
Beyond these results, there remains much to investigate. Firstly, we would like to extend our results beyond expectations and provide stronger limit theorems, as in Chapter 3. In addition, we wish to carefully study the bounds we established for the different crackling radii (i.e. $R^c_n$, $R_{k,n}$, $R^\epsilon_{k,n}$), and see if they can be refined. We also wish to characterize the phase-transition phenomenon as we move from the contractible core into the chaotic first layer of the crackle. Finally, in this chapter we studied the power-law, exponential and Gaussian distributions, but it would be interesting to see if the results we have could be generalized to broader classes of distributions.

In Section 4.1 we discussed the motivation for the study in this chapter. At this point, we may start gaining some intuition about the noisy manifold learning problem discussed there. For example, if the distribution of the noise is Gaussian, our results imply that noise outliers should not significantly interfere with homology recovery, since Gaussian noise does not introduce any artificial homology elements (components, holes). On the other hand, if the distribution of the noise is power-law or exponential, then noise outliers will typically generate extraneous homology elements that will damage the estimation of the original manifold. Thus, in these cases homology recovery algorithms should remove outliers before attempting to analyze the data. In this chapter we studied the regions where crackling occurs for these distributions. Further investigation of the phenomena presented here can later be used to develop outlier removal methods that reduce the probability of artificial homology elements. This, however, remains as future work.
Another motivation mentioned in Section 4.1 is the study of the Cech complex $C(\mathcal{X}_n, r_n)$ in the super-critical phase ($nr_n^d\to\infty$), which was presented in Chapter 3. Recall that in the super-critical phase the results were restricted to distributions with a compact and convex support. Under this assumption, the results of Chapter 3 (combined with [30]) indicate that, in the limit, the Cech complex $C(\mathcal{X}_n, r_n)$ becomes contractible (i.e. $\beta_0 = 1$ and $\beta_k = 0$ for $k\ge 1$). The results in Sections 4.3 and 4.4 imply that if the support of the distribution is non-compact, the behavior of $C(\mathcal{X}_n, r_n)$ is completely different. We saw that the power-law and exponential distributions crackle and have an infinite number of components, even for a fixed radius of 1. Taking an even smaller radius ($r_n\to 0$) can only enhance crackling, and thus there is no reason to believe that the resulting complex would be contractible. On the other hand, since the Gaussian distribution does not crackle, it is possible that it behaves like a compactly supported distribution. At this point we cannot make any concrete statements, but the results in this chapter definitely give us a lead as to where we should look.
4.7 Proofs
4.7.1 The Core
In this section we prove the main result of Section 4.2.

Proof of Theorem 4.2.1. The proof is general for all three distributions. Take a grid on $\mathbb{R}^d$ of size $g = \frac{1}{2\sqrt{d}}$. Let $\mathcal{Q}_n$ be the collection of cubes in this grid that are contained in $B_{R_n}$. Let $\tilde{C}_n$ be the following event:
\[
\tilde{C}_n \triangleq \left\{\forall Q\in\mathcal{Q}_n : Q\cap\mathcal{X}_n \ne \emptyset\right\},
\]
i.e. $\tilde{C}_n$ is the event that every cube in $\mathcal{Q}_n$ contains at least one point from $\mathcal{X}_n$. Recall the definition of $C_n$,
\[
C_n \triangleq \left\{B_{R_n} \subset \bigcup_{X\in\mathcal{X}_n\cap B_{R_n}} B_1(X)\right\}.
\]
Then it is easy to show that $\tilde{C}_n \subset C_n$. The complementary event $\tilde{C}^c_n$ is the event that at least one cube is empty. Thus,
\[
\mathbb{P}(\tilde{C}^c_n) \le \sum_{Q\in\mathcal{Q}_n}\mathbb{P}(Q\cap\mathcal{X}_n = \emptyset) = \sum_{Q\in\mathcal{Q}_n}(1-p(Q))^n \le \sum_{Q\in\mathcal{Q}_n} e^{-np(Q)},
\]
where
\[
p(Q) = \int_Q f(z)\,dz \ge g^d f(R_n).
\]
In addition, the number of cubes that are contained in $B_{R_n}$ is less than $\left(\frac{2R_n}{g}\right)^d$. Therefore,
\[
\mathbb{P}(\tilde{C}^c_n) \le (2g^{-1})^d R_n^d\, e^{-ng^d f(R_n)}. \tag{4.7.1}
\]
Now, choose any $\epsilon > 0$ and set
\[
R_n = R_n^c \triangleq
\begin{cases}
\left(\dfrac{\delta_p n}{\log n - e^{-\epsilon}\log\log n} - 1\right)^{1/\alpha} & f = f_p,\\[2mm]
\log n - \log\log\log n - \delta_e - \epsilon & f = f_e,\\[2mm]
\sqrt{2\left(\log n - \log\log\log n - \delta_g - \epsilon\right)} & f = f_g,
\end{cases}
\]
where
\[
\delta_p = c_p\,\alpha\, 2^{-d} d^{-(1+d/2)},\qquad
\delta_e = \log d - \log c_e - \log g^d,\qquad
\delta_g = \log(d/2) - \log c_g - \log g^d.
\]
It is easy to verify that in all cases we have
\[
R_n^d\, e^{-ng^d f(R_n)} \to 0.
\]
Thus, from (4.7.1) we conclude that $\mathbb{P}(\tilde{C}_n)\to 1$. Since $\mathbb{P}(C_n) \ge \mathbb{P}(\tilde{C}_n)$, we now have that for $R_n = R_n^c$, in each of the distributions,
\[
\mathbb{P}(C_n)\to 1,
\]
which completes the proof. The proof for the Poisson case ($\mathcal{P}_n$) follows exactly the same steps.
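The grid argument above is easy to simulate. The sketch below (our own illustration: $d = 1$, the Laplace density $f_e$ with $c_e = 1/2$, and modest $n$, so the test radii are chosen by hand rather than at the asymptotic $R_n^c$) checks the sufficient condition that every grid cell inside $[-R, R]$ contains a sample point:

```python
import math
import random

def grid_covered(pts, R, g=0.5):
    """Sufficient condition from the proof of Theorem 4.2.1 in d = 1:
    every grid cell [k*g, (k+1)*g) contained in [-R, R] holds a sample point."""
    occupied = {math.floor(x / g) for x in pts}
    lo, hi = math.ceil(-R / g), math.floor(R / g) - 1
    return all(k in occupied for k in range(lo, hi + 1))

def coverage_prob(n=2000, R=4.0, trials=200, seed=3):
    """Monte Carlo estimate of the probability that the grid event holds,
    sampling n points from the Laplace density (1/2) * exp(-|x|)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [(1 if rng.random() < 0.5 else -1) * -math.log(1.0 - rng.random())
               for _ in range(n)]
        hits += grid_covered(pts, R)
    return hits / trials

assert coverage_prob(R=4.0) > 0.9    # well inside the core: covered w.h.p.
assert coverage_prob(R=12.0) < 0.1   # far beyond log n: edge cells stay empty
```

For $n = 2000$ ($\log n \approx 7.6$) the interval $[-4,4]$ is essentially always covered, while $[-12,12]$ essentially never is, illustrating the sharp drop-off that the theorem quantifies.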
4.7.2 Crackle - Notation and General Lemmas
As noted in Section 4.1, while the results in this chapter are stated for the random sample case ($\mathcal{X}_n$), they apply to the Poisson case ($\mathcal{P}_n$) as well. We will present the proofs for the Poisson case only. The proofs for the random sample case follow exactly the same steps, using the same bounds and yielding the same results. Thus, to avoid duplicated notation and proofs, we omit them.
For $R_n > 0$, denote
\[
\mathcal{P}_{n,R_n} \triangleq \mathcal{P}_n\cap(B_{R_n})^c,
\]
i.e. $\mathcal{P}_{n,R_n}$ consists of the points of $\mathcal{P}_n$ located outside the ball $B_{R_n}$. Next, recall the definition of $T_k$,
\[
T_k(\mathcal{Y}) \triangleq \mathbb{1}\left\{|\mathcal{Y}| = k+2,\ \beta_k(C(\mathcal{Y},1)) = 1\right\},
\]
for $\mathcal{Y}\subset\mathbb{R}^d$, and denote
\[
S_{0,n} \triangleq |\mathcal{P}_{n,R_n}|,
\]
\[
\hat{S}_{0,n} \triangleq \#\left\{X\in\mathcal{P}_{n,R_n} : X \text{ is a connected component of } C(\mathcal{P}_n,1)\right\},
\]
\[
S_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_{n,R_n}} T_k(\mathcal{Y}),
\]
\[
\hat{S}_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_{n,R_n}} T_k(\mathcal{Y})\,\mathbb{1}\{C(\mathcal{Y},1) \text{ is a connected component of } C(\mathcal{P}_n,1)\},
\]
\[
L_{k,n} \triangleq \sum_{\mathcal{Y}\subset\mathcal{P}_{n,R_n}} \mathbb{1}\{|\mathcal{Y}| = k+3,\ C(\mathcal{Y},1) \text{ is connected}\},
\]
where $k\ge 1$. Observe that
\[
\hat{S}_{0,n} \le \beta_{0,n} \le S_{0,n}, \tag{4.7.2}
\]
\[
\hat{S}_{k,n} \le \beta_{k,n} \le S_{k,n} + L_{k,n}, \qquad k\ge 1. \tag{4.7.3}
\]
We will evaluate the limits of $\mathbb{E}\{S_{k,n}\}$, $\mathbb{E}\{\hat{S}_{k,n}\}$ and $\mathbb{E}\{L_{k,n}\}$, and deduce from these the limit of $\mathbb{E}\{\beta_{k,n}\}$.
In the following proofs we will use the notation introduced in Section 3.6.1. In addition,
we set
\[
e_1 \triangleq (1,0,\ldots,0)\in\mathbb{R}^d,
\]
\[
f(r) \triangleq f(re_1), \quad r\in\mathbb{R},
\]
\[
U(x) \triangleq \bigcup_{i=1}^k B_2(x_i), \quad x\in(\mathbb{R}^d)^k,
\]
\[
p(x) \triangleq \int_{U(x)} f(z)\,dz, \quad x\in(\mathbb{R}^d)^k.
\]
The following lemmas are purely technical, but will considerably simplify our computations later.
Lemma 4.7.1. Let $f:\mathbb{R}^d\to\mathbb{R}$ be a spherically symmetric probability density. Then,
\[
\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^\infty r^{d-1} f(r)\,dr,
\]
\[
\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^\infty r^{d-1} f(r)\, e^{-np(re_1)}\,dr,
\]
where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional unit sphere.
Proof. Using Palm theory (Theorem 3.A.1) we have
\[
\mathbb{E}\{S_{0,n}\} = n\int_{\mathbb{R}^d} f(x)\,\mathbb{1}\{\|x\| > R_n\}\,dx.
\]
Next, we move to polar coordinates, using the change of variables $x\to r\theta$ where $r\in\mathbb{R}_+$ and $\theta\in S^{d-1}$. This yields
\[
\mathbb{E}\{S_{0,n}\} = n\int_{R_n}^\infty\int_{S^{d-1}} f(r\theta)\, r^{d-1} J(\theta)\,d\theta\,dr,
\]
where $J(\theta) = \left|\frac{\partial x}{\partial\theta}\right|$. Since $f$ is spherically symmetric, $f(r\theta) = f(r)$, and therefore,
\[
\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^\infty r^{d-1} f(r)\,dr.
\]
The proof for $\hat{S}_{0,n}$ is similar, using the fact that the probability that a point $x\in\mathbb{R}^d$ is disconnected from the rest of the complex $C(\mathcal{P}_n,1)$ is $e^{-np(x)}$.
Lemma 4.7.2. Let $f:\mathbb{R}^d\to\mathbb{R}$ be a spherically symmetric probability density. Then for $k\ge 1$,
\[
\mathbb{E}\{S_{k,n}\} = \frac{s_{d-1}\, n^{k+2}}{(k+2)!} \int_{R_n}^\infty r^{d-1} f(r)\, G_k(r)\,dr,
\]
\[
\mathbb{E}\{\hat{S}_{k,n}\} = \frac{s_{d-1}\, n^{k+2}}{(k+2)!} \int_{R_n}^\infty r^{d-1} f(r)\, \hat{G}_k(r)\,dr,
\]
where $s_{d-1}$ is the surface area of the $(d-1)$-dimensional unit sphere, and where
\[
G_k(r) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|re_1+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|re_1+y_i\| > R_n\}\,dy,
\]
\[
\hat{G}_k(r) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|re_1+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|re_1+y_i\| > R_n\}\, e^{-np(re_1,\, re_1+y)}\,dy.
\]
Proof. The proof is in the same spirit as the proof of Lemma 4.7.1, but technically more complicated. Using Palm theory (Theorem 3.A.1), we have that
\[
\mathbb{E}\{S_{k,n}\} = \frac{n^{k+2}}{(k+2)!} \int_{(\mathbb{R}^d)^{k+2}} f(x)\, T_k(x) \prod_{i=1}^{k+2} \mathbb{1}\{\|x_i\| > R_n\}\,dx.
\]
Let $I_k$ denote the integral above. Then, using the change of variables
\[
x_1 \to x, \qquad x_i \to x + y_{i-1} \quad (i > 1),
\]
yields
\[
I_k = \int_{\|x\|\ge R_n}\int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+y)\, T_k(x,\, x+y) \prod_{i=1}^{k+1} \mathbb{1}\{\|x+y_i\| > R_n\}\,dy\,dx
= \int_{\|x\|\ge R_n}\int_{(\mathbb{R}^d)^{k+1}} f(x)\, f(x+y)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|x+y_i\| > R_n\}\,dy\,dx.
\]
Next, we move to polar coordinates, using the change of variables $x\to r\theta$ where $r\in\mathbb{R}_+$ and $\theta\in S^{d-1}$. This yields
\[
I_k = \int_{R_n}^\infty\int_{S^{d-1}}\int_{(\mathbb{R}^d)^{k+1}} f(r\theta)\, f(r\theta+y)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta+y_i\| > R_n\}\, r^{d-1} J(\theta)\,dy\,d\theta\,dr
= \int_{R_n}^\infty r^{d-1} f(r) \int_{S^{d-1}} J(\theta) \int_{(\mathbb{R}^d)^{k+1}} f(\|r\theta+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta+y_i\| > R_n\}\,dy\,d\theta\,dr,
\]
where $J(\theta) = \left|\frac{\partial x}{\partial\theta}\right|$, and $f(x) = f(\|x\|)$ by the spherical symmetry assumption. Denote
\[
G_k(r,\theta) \triangleq \int_{(\mathbb{R}^d)^{k+1}} f(\|r\theta+y\|)\, T_k(0,y) \prod_{i=1}^{k+1} \mathbb{1}\{\|r\theta+y_i\| > R_n\}\,dy.
\]
120 CHAPTER 4. NOISE CRACKLES
Since Tk is rotation invariant, it is easy to show that for every θ ∈ Sd−1
Gk(r, θ) = Gk(r, e1) , Gk(r).
Thus,
Ik = sd−1
∫ ∞
Rn
rd−1f(r)Gk(r)dr, (4.7.4)
where sd−1 is the surface area of the d-dimensional unit ball. This completes the proof
for Sk,n. The proof for Sk,n is similar.
4.7.3 Crackle - The Power Law Distribution
In this section we wish to prove the results in Section 4.3. First, we need a few lemmas.
Lemma 4.7.3. If $f = f_p$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{p,0},$$
where $\mu_{p,0}$ is defined in (4.3.1). If, in addition, $n R_n^{-\alpha} \to 0$, then
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \mu_{p,0}.$$
Proof. From Lemma 4.7.1 we have that
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$
Using the change of variables $r \to R_n\rho$ yields
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{1}^{\infty} \frac{c_p (R_n\rho)^{d-1}}{1 + (R_n\rho)^{\alpha}}\, R_n\,d\rho = s_{d-1} c_p\, n R_n^{d-\alpha} \int_{1}^{\infty} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^{\alpha}}\,d\rho.$$
Applying the DCT to the last integral yields
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_p \int_{1}^{\infty} \rho^{d-1-\alpha}\,d\rho = \frac{s_{d-1} c_p}{\alpha - d} = \mu_{p,0}.$$
This proves the first part of the lemma.
Next, from Lemma 4.7.1 we have that
$$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\, e^{-np(re_1)}\,dr.$$
The exponential term will not affect the DCT conditions, so we only need to evaluate its limit. Now,
$$p(re_1) = \int_{B_2(re_1)} f(z)\,dz = \int_{B_2(0)} \frac{c_p}{1 + \|re_1 + z\|^{\alpha}}\,dz,$$
and after the change of variables $r \to R_n\rho$ we have
$$p(R_n\rho e_1) = c_p R_n^{-\alpha} \int_{B_2(0)} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} z\|^{\alpha}}\,dz.$$
If $n R_n^{-\alpha} \to 0$, then using the DCT we have
$$\lim_{n\to\infty} n p(R_n\rho e_1) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np(R_n\rho e_1)} = 1,$$
and therefore we have
$$\lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \lim_{n\to\infty} \left(n R_n^{d-\alpha}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{p,0}.$$
This completes the proof of the second part of the lemma.
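The DCT limit used above can be checked numerically (an illustrative sketch, not from the thesis; the helper names and the choice $d = 2$, $\alpha = 4$ are mine). After the substitution $\rho = 1/t$, the integral $\int_1^\infty \rho^{d-1}/(R_n^{-\alpha} + \rho^\alpha)\,d\rho$ becomes a proper integral over $[0,1]$ and can be evaluated by Simpson's rule; it approaches $1/(\alpha - d)$ as $R_n \to \infty$.

```python
def simpson(g, a, b, n=2000):
    # composite Simpson's rule; n must be even
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def tail_integral(R, d=2, alpha=4):
    # int_1^inf rho^{d-1} / (R^{-alpha} + rho^alpha) d rho,
    # after substituting rho = 1/t (valid here since alpha - d - 1 >= 0,
    # so the transformed integrand is bounded at t = 0)
    g = lambda t: t ** (alpha - d - 1) / (R ** (-alpha) * t ** alpha + 1.0)
    return simpson(g, 0.0, 1.0)

for R in (10.0, 100.0, 1000.0):
    print(R, tail_integral(R))   # approaches 1/(alpha - d) = 0.5
```

The slow drift toward $1/(\alpha - d)$ as $R$ grows mirrors the DCT argument term by term.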
Lemma 4.7.4. If $f = f_p$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{p,k},$$
where $\mu_{p,k}$ is defined in (4.3.2). If, in addition, $n R_n^{-\alpha} \to 0$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{p,k}.$$
Proof. The proof is in the spirit of that of Lemma 4.7.3, but technically more complicated. From Lemma 4.7.2 we have that
$$\mathbb{E}\{S_{k,n}\} = \frac{n^{k+2}}{(k+2)!}\, I_k, \qquad \text{where} \qquad I_k = s_{d-1} \int_{R_n}^{\infty} r^{d-1} f(r)\, G_k(r)\,dr.$$
Using the change of variables $r \to R_n\rho$ yields
$$I_k = s_{d-1} R_n \int_{1}^{\infty} (R_n\rho)^{d-1} f(R_n\rho)\, G_k(R_n\rho)\,d\rho$$
$$= s_{d-1} c_p^{k+2} R_n^{d-\alpha(k+2)} \int_{1}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^{\alpha}} \prod_{i=1}^{k+1} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} y_i\|^{\alpha}} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\left\{\|\rho e_1 + R_n^{-1} y_i\| > 1\right\}\,d\mathbf{y}\,d\rho.$$
Thus,
$$\left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_p^{k+2}}{(k+2)!} \int_{1}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} \frac{\rho^{d-1}}{R_n^{-\alpha} + \rho^{\alpha}} \prod_{i=1}^{k+1} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} y_i\|^{\alpha}} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\left\{\|\rho e_1 + R_n^{-1} y_i\| > 1\right\}\,d\mathbf{y}\,d\rho.$$
It is easy to show that the integrand is bounded properly, so the DCT applies, yielding
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_p^{k+2}}{(k+2)!} \int_{1}^{\infty} \rho^{d-1-\alpha(k+2)}\,d\rho \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y}$$
$$= \frac{s_{d-1} c_p^{k+2}}{(\alpha(k+2) - d)(k+2)!} \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\,d\mathbf{y} = \mu_{p,k}.$$
This proves the first part of the lemma.

Next, the terms $G_k(r)$ and $\hat{G}_k(r)$ in Lemma 4.7.2 differ only by the term $e^{-np(re_1,\, re_1 + \mathbf{y})}$, so the DCT still applies. Now,
$$p(re_1, re_1 + \mathbf{y}) = \int_{U(re_1,\, re_1 + \mathbf{y})} f(z)\,dz = \int_{U(0, \mathbf{y})} f(re_1 + z)\,dz,$$
and substituting $r \to R_n\rho$ yields
$$p(R_n\rho e_1, R_n\rho e_1 + \mathbf{y}) = c_p R_n^{-\alpha} \int_{U(0, \mathbf{y})} \frac{1}{R_n^{-\alpha} + \|\rho e_1 + R_n^{-1} z\|^{\alpha}}\,dz.$$
If $n R_n^{-\alpha} \to 0$, then using the DCT we have
$$\lim_{n\to\infty} n p(R_n\rho e_1, R_n\rho e_1 + \mathbf{y}) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np(R_n\rho e_1,\, R_n\rho e_1 + \mathbf{y})} = 1,$$
and therefore,
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \lim_{n\to\infty} \left(n^{k+2} R_n^{d-\alpha(k+2)}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{p,k}.$$
This completes the proof of the second part of the lemma.
Lemma 4.7.5. If $f = f_p$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+3} R_n^{d-\alpha(k+3)}\right)^{-1} \mathbb{E}\{L_{k,n}\} = \hat{\mu}_{p,k},$$
for some $\hat{\mu}_{p,k} > 0$.
Proof. The proof is very similar to that of Lemma 4.7.4; just replace $T_k$ with an indicator function that tests whether the sub-complex generated by $k+3$ points is connected. The exact value of $\hat{\mu}_{p,k}$ will not be needed anywhere.
We can now prove Theorem 4.3.1.
Proof of Theorem 4.3.1. To prove the limit for $\beta_{0,n}$, combine Lemma 4.7.3 with inequality (4.7.2). To prove the limit for $\beta_{k,n}$, $k \ge 1$, combine Lemmas 4.7.4 and 4.7.5 with the inequality in (4.7.3).
4.7.4 Crackle - The Exponential Distribution
In this section we wish to prove Theorem 4.4.1. We start with the following lemmas.
Lemma 4.7.6. If $f = f_e$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{e,0},$$
where $\mu_{e,0}$ is defined in (4.4.1). If, in addition, $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \mu_{e,0}.$$
Proof. From Lemma 4.7.1 we have that
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$
Using the change of variables $r \to \rho + R_n$ yields
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{0}^{\infty} (\rho + R_n)^{d-1} c_e e^{-(\rho + R_n)}\,d\rho = s_{d-1} c_e\, n R_n^{d-1} e^{-R_n} \int_{0}^{\infty} \left(\frac{\rho}{R_n} + 1\right)^{d-1} e^{-\rho}\,d\rho.$$
Applying the DCT to the last integral yields
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_e \int_{0}^{\infty} e^{-\rho}\,d\rho = s_{d-1} c_e = \mu_{e,0}.$$
This proves the first part of the lemma.

Next, from Lemma 4.7.1 we have that
$$\mathbb{E}\{\hat{S}_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\, e^{-np(re_1)}\,dr.$$
The exponential term will not affect the DCT conditions, so we only need to evaluate its limit. Now,
$$p(re_1) = \int_{B_2(re_1)} f(z)\,dz = \int_{B_2(0)} c_e e^{-\|re_1 + z\|}\,dz,$$
and after the change of variables $r \to \rho + R_n$ we have
$$p((\rho + R_n)e_1) = \int_{B_2(0)} c_e e^{-\|(\rho + R_n)e_1 + z\|}\,dz \le e^{-(R_n + \rho)} \int_{B_2(0)} c_e e^{\|z\|}\,dz.$$
If $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} n p((\rho + R_n)e_1) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np((\rho + R_n)e_1)} = 1,$$
and therefore we have
$$\lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{0,n}\} = \lim_{n\to\infty} \left(n R_n^{d-1} e^{-R_n}\right)^{-1} \mathbb{E}\{S_{0,n}\} = \mu_{e,0}.$$
This completes the proof of the second part of the lemma.
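The DCT step in Lemma 4.7.6 can also be checked numerically: for $d = 3$ the integral $\int_0^\infty (\rho/R + 1)^{d-1} e^{-\rho}\,d\rho$ equals $1 + 2/R + 2/R^2$ exactly, and tends to $\int_0^\infty e^{-\rho}\,d\rho = 1$ as $R \to \infty$. A small sketch (illustrative only; the names and the truncation at $\rho = 50$ are mine):

```python
import math

def simpson(g, a, b, n=2000):
    # composite Simpson's rule; n must be even
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def moment_integral(R, d=3):
    # int_0^inf (rho/R + 1)^{d-1} e^{-rho} d rho, truncated at rho = 50
    # (the truncation error is of order e^{-50}, i.e. negligible)
    g = lambda rho: (rho / R + 1.0) ** (d - 1) * math.exp(-rho)
    return simpson(g, 0.0, 50.0)

for R in (2.0, 10.0, 100.0):
    print(R, moment_integral(R), 1 + 2.0 / R + 2.0 / R ** 2)
```

The quadrature matches the closed form $1 + 2/R + 2/R^2$ and converges to $1$, the constant absorbed into $\mu_{e,0} = s_{d-1} c_e$.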
Lemma 4.7.7. If $f = f_e$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{e,k},$$
where $\mu_{e,k}$ is defined in (4.4.2). If, in addition, $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \mu_{e,k}.$$
Proof. From Lemma 4.7.2 we have that
$$\mathbb{E}\{S_{k,n}\} = \frac{n^{k+2}}{(k+2)!}\, I_k, \qquad \text{where} \qquad I_k = s_{d-1} \int_{R_n}^{\infty} r^{d-1} f(r)\, G_k(r)\,dr.$$
Using the change of variables $r \to \rho + R_n$ yields
$$I_k = s_{d-1} \int_{0}^{\infty} (\rho + R_n)^{d-1} f(\rho + R_n)\, G_k(\rho + R_n)\,d\rho$$
$$= s_{d-1} c_e^{k+2} \int_{0}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} (\rho + R_n)^{d-1} e^{-(\rho + R_n)} \prod_{i=1}^{k+1} e^{-\|(\rho + R_n)e_1 + y_i\|} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\{\|(\rho + R_n)e_1 + y_i\| > R_n\}\,d\mathbf{y}\,d\rho$$
$$= s_{d-1} c_e^{k+2} e^{-(k+2)R_n} R_n^{d-1} \int_{0}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} \left(\frac{\rho}{R_n} + 1\right)^{d-1} e^{-\rho} \prod_{i=1}^{k+1} e^{-\|(\rho + R_n)e_1 + y_i\|} e^{R_n} \times T_k(0, \mathbf{y}) \prod_{i=1}^{k+1} \mathbf{1}\{\|(\rho + R_n)e_1 + y_i\| > R_n\}\,d\mathbf{y}\,d\rho.$$
The last integral can be easily shown to satisfy the DCT conditions. In addition, it is easy to show that
$$\lim_{n\to\infty} e^{-\|(\rho + R_n)e_1 + y_i\|} e^{R_n} = e^{-(\rho + \langle e_1, y_i \rangle)} = e^{-(\rho + y_i^1)},$$
where $y_i^1$ is the first coordinate of $y_i \in \mathbb{R}^d$, and also that
$$\lim_{n\to\infty} \mathbf{1}\{\|(\rho + R_n)e_1 + y_i\| > R_n\} = \mathbf{1}\{y_i^1 \ge -\rho\}.$$
Altogether, we have that
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \frac{s_{d-1} c_e^{k+2}}{(k+2)!} \int_{0}^{\infty} \int_{(\mathbb{R}^d)^{k+1}} T_k(0, \mathbf{y})\, e^{-\left((k+2)\rho + \sum_{i=1}^{k+1} y_i^1\right)} \prod_{i=1}^{k+1} \mathbf{1}\{y_i^1 \ge -\rho\}\,d\mathbf{y}\,d\rho,$$
proving the first part of the lemma.

Next, as in the proof of Lemma 4.7.4, we need to evaluate the term $p(re_1, re_1 + \mathbf{y})$:
$$p(re_1, re_1 + \mathbf{y}) = \int_{U(0, \mathbf{y})} c_e e^{-\|re_1 + z\|}\,dz \le \int_{U(0, \mathbf{y})} c_e e^{-(r - \|z\|)}\,dz.$$
The change of variables $r \to \rho + R_n$ yields
$$p((\rho + R_n)e_1, (\rho + R_n)e_1 + \mathbf{y}) \le e^{-R_n} e^{-\rho} \int_{U(0, \mathbf{y})} c_e e^{\|z\|}\,dz.$$
If $n e^{-R_n} \to 0$, then
$$\lim_{n\to\infty} n p((\rho + R_n)e_1, (\rho + R_n)e_1 + \mathbf{y}) = 0.$$
Thus,
$$\lim_{n\to\infty} e^{-np((\rho + R_n)e_1,\, (\rho + R_n)e_1 + \mathbf{y})} = 1,$$
and therefore,
$$\lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{\hat{S}_{k,n}\} = \lim_{n\to\infty} \left(n^{k+2} R_n^{d-1} e^{-(k+2)R_n}\right)^{-1} \mathbb{E}\{S_{k,n}\} = \mu_{e,k}.$$
This completes the proof.
Lemma 4.7.8. If $f = f_e$ and $R_n \to \infty$, then
$$\lim_{n\to\infty} \left(n^{k+3} R_n^{d-1} e^{-(k+3)R_n}\right)^{-1} \mathbb{E}\{L_{k,n}\} = \hat{\mu}_{e,k},$$
where $\hat{\mu}_{e,k} > 0$.
Proof. The proof is very similar to that of Lemma 4.7.7; just replace $T_k$ with an indicator function that tests whether the sub-complex generated by $k+3$ points is connected. The exact value of $\hat{\mu}_{e,k}$ will not be needed anywhere, so we do not attempt to compute it.
Proof of Theorem 4.4.1. The proof follows the same steps as the proof of Theorem 4.3.1.
4.7.5 Crackle - The Gaussian Distribution
In this section we wish to prove Theorem 4.5.1.
Proof of Theorem 4.5.1. From Lemma 4.7.1 we have that
$$\mathbb{E}\{S_{0,n}\} = s_{d-1}\, n \int_{R_n}^{\infty} r^{d-1} f(r)\,dr.$$
Now, use the change of variables $r \to (\rho^2 + R_n^2)^{1/2}$, $dr = \frac{\rho}{(\rho^2 + R_n^2)^{1/2}}\,d\rho$; then
$$\mathbb{E}\{S_{0,n}\} = s_{d-1} c_g\, n e^{-R_n^2/2} \int_{0}^{\infty} (\rho^2 + R_n^2)^{(d-2)/2}\, \rho e^{-\rho^2/2}\,d\rho = s_{d-1} c_g\, n e^{-R_n^2/2} R_n^{d-2} \int_{0}^{\infty} \left((\rho/R_n)^2 + 1\right)^{(d-2)/2} \rho e^{-\rho^2/2}\,d\rho.$$
The integrand is bounded, and using the DCT we have
$$\lim_{n\to\infty} \left(n e^{-R_n^2/2} R_n^{d-2}\right)^{-1} \mathbb{E}\{S_{0,n}\} = s_{d-1} c_g.$$
Taking $R_n = R^{\epsilon}_{0,n} \triangleq \sqrt{2\log n + (d-2+\epsilon)\log\log n}$, we have
$$e^{-R_n^2/2} = n^{-1} (\log n)^{-(d-2+\epsilon)/2},$$
and so
$$\lim_{n\to\infty} n e^{-R_n^2/2} R_n^{d-2} = 0,$$
which implies that
$$\mathbb{E}\{S_{0,n}\} \to 0.$$
Finally, for every $0 \le k \le d-1$,
$$\beta_{k,n} \le S_{0,n}.$$
Therefore,
$$\lim_{n\to\infty} \mathbb{E}\{\beta_{k,n}\} = 0,$$
completing the proof.
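To see the rate at which the bound vanishes under this choice of $R_n$: for $d = 2$ it reduces to $(\log n)^{-\epsilon/2}$, which decays to zero, though very slowly. A quick numeric illustration (not from the thesis; the function name and the choices $d = 2$, $\epsilon = 1/2$ are mine):

```python
import math

def expected_outside(n, d=2, eps=0.5):
    # R_n = sqrt(2 log n + (d - 2 + eps) log log n); for d = 2 the bound
    # n e^{-R_n^2/2} R_n^{d-2} simplifies to (log n)^{-eps/2}
    Rn = math.sqrt(2 * math.log(n) + (d - 2 + eps) * math.log(math.log(n)))
    return n * math.exp(-Rn * Rn / 2) * Rn ** (d - 2)

vals = [expected_outside(n) for n in (10 ** 3, 10 ** 6, 10 ** 12)]
print(vals)   # slowly decreasing toward 0
```

Even at $n = 10^{12}$ the bound is only about $0.44$, illustrating how slowly the Gaussian sample "stops crackling".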
Bibliography
[1] Robert J. Adler. On excursion sets, tube formulas and maxima of random fields. The
Annals of Applied Probability, 10(1):1–74, 2000.
[2] Robert J. Adler, Omer Bobrowski, Matthew S. Borman, Eliran Subag, and Shmuel
Weinberger. Persistent homology for random fields and complexes. Institute of Math-
ematical Statistics Collections, 6:124–143, 2010.
[3] Robert J. Adler and Jonathan E. Taylor. Random fields and geometry. Springer
Monographs in Mathematics. Springer, New York, 2007.
[4] Lior Aronshtam, Nathan Linial, Tomasz Łuczak, and Roy Meshulam. Vanishing of
the top homology of a random complex. arXiv preprint arXiv:1010.1400, 2010.
[5] Richard Arratia, Larry Goldstein, and Louis Gordon. Two moments suffice for Pois-
son approximations: the Chen–Stein method. The Annals of Probability, 17(1):9–25,
1989.
[6] Eric Babson, Christopher Hoffman, and Matthew Kahle. The fundamental group of
random 2-complexes. J. Amer. Math. Soc., 24(1):1–28, 2011.
[7] Yuliy Baryshnikov and Robert Ghrist. Target enumeration via Euler characteristic
integrals. SIAM Journal on Applied Mathematics, 70(3):825–844, 2009.
[8] Yuliy Baryshnikov and Robert Ghrist. Euler integration over definable functions.
Proceedings of the National Academy of Sciences of the United States of America,
107(21):9525–9530, 2010.
[9] Omer Bobrowski and Robert J. Adler. Distance functions, critical points, and topol-
ogy for some random complexes. arXiv:1107.4775, July 2011.
[10] Omer Bobrowski and Matthew Strom Borman. Euler integration of Gaussian random
fields and persistent homology. Journal of Topology and Analysis, 4(1), 2012.
[11] Karol Borsuk. On the imbedding of systems of compacta in simplicial complexes.
Fund. Math, 35(217-234):5, 1948.
[12] Peter Bubenik, Gunnar Carlsson, Peter T. Kim, and Zhiming Luo. Statistical topol-
ogy via Morse theory, persistence and nonparametric estimation. Contemporary
Mathematics, 516:75–92, 2010. arXiv:0908.3668.
[13] Peter Bubenik and Peter T. Kim. A statistical approach to persistent homology.
Homology, Homotopy and Applications, 9(2):337–362, 2007.
[14] Jin Cao. The geometry of correlation fields with an application to functional con-
nectivity of the brain. The Annals of Applied Probability, 9(4):1021–1057, November
1999.
[15] Gunnar Carlsson. Topology and data. American Mathematical Society. Bulletin.
New Series, 46(2):255–308, 2009.
[16] Gunnar Carlsson and Vin de Silva. Plex: MATLAB software for
computing persistent homology of finite simplicial complexes, 2006.
http://comptop.stanford.edu/programs/plex.html.
[17] Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, and Afra Zomorodian. On the
local behavior of spaces of natural images. International Journal of Computer Vision,
76(1):1–12, January 2008.
[18] Frédéric Chazal, David Cohen-Steiner, and Quentin Mérigot. Geometric inference
for measures based on distance functions. 2010.
[19] Moo K. Chung, Peter Bubenik, and Peter T. Kim. Persistence diagrams of cortical
surface data. In Information Processing in Medical Imaging, pages 386–397, 2009.
[20] Daniel C. Cohen, Michael Farber, and Thomas Kappeler. The homotopical dimension
of random 2-complexes. arXiv preprint arXiv:1005.3383, 2010.
[21] Justin Curry, Robert Ghrist, and Michael Robinson. Euler calculus with applications
to signals and sensing. arXiv:1202.0275, January 2012.
[22] Vin de Silva and Robert Ghrist. Coverage in sensor networks via persistent homology.
Algebraic & Geometric Topology, 7:339–358, 2007.
[23] Herbert Edelsbrunner and John Harer. Persistent homology - a survey. In Surveys on
discrete and computational geometry, volume 453 of Contemp. Math., pages 257–282.
Amer. Math. Soc., Providence, RI, 2008.
[24] Vladimir Gershkovich and Hyam Rubinstein. Morse theory for min-type functions.
The Asian Journal of Mathematics, 1(4):696–715, 1997.
[25] Robert Ghrist. Barcodes: the persistent topology of data. American Mathematical
Society. Bulletin. New Series, 45(1):61–75, 2008.
[26] Robert Ghrist. Applied algebraic topology & sensor networks, 2010.
http://www.math.upenn.edu/˜ghrist/preprints/ATSN.pdf.
[27] Robert Ghrist and Michael Robinson. Euler-Bessel and Euler-Fourier transforms.
Inverse Problems, 27(12):124006, December 2011.
[28] Allen Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
[29] Matthew Kahle. Topology of random clique complexes. Discrete Mathematics,
309(6):1658–1671, 2009.
[30] Matthew Kahle. Random geometric complexes. Discrete & Computational Geome-
try. An International Journal of Mathematics and Computer Science, 45(3):553–573,
2011.
[31] Matthew Kahle and Elizabeth Meckes. Limit theorems for Betti numbers of random
simplicial complexes. arXiv preprint arXiv:1009.4130, September 2010.
[32] Jiří Matoušek. Using the Borsuk-Ulam theorem: lectures on topological methods in
combinatorics and geometry. Springer-Verlag, 2003.
[33] Roy Meshulam and Nathan Wallach. Homological connectivity of random k-
dimensional complexes. Random Structures & Algorithms, 34(3):408–417, 2009.
[34] Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space
of persistence diagrams. Inverse Problems, 27(12):124007, December 2011.
[35] John W. Milnor. Morse theory. Based on lecture notes by M. Spivak and R. Wells.
Annals of Mathematics Studies, No. 51. Princeton University Press, Princeton, N.J.,
1963.
[36] Marston Morse and Stewart Scott Cairns. Critical point theory in global analysis and
differential topology: an introduction. Academic Press, 1969.
[37] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of
submanifolds with high confidence from random samples. Discrete & Computational
Geometry. An International Journal of Mathematics and Computer Science, 39(1-
3):419–441, 2008.
[38] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. A topological view of unsu-
pervised learning from noisy data. SIAM Journal on Computing, 40(3):646, 2011.
[39] Mathew D. Penrose. Random geometric graphs, volume 5 of Oxford Studies in Prob-
ability. Oxford University Press, Oxford, 2003.
[40] Mathew D. Penrose and Joseph E. Yukich. Limit theory for point processes in
manifolds. arXiv preprint arXiv:1104.0914, April 2011.
[41] Nicholas Pippenger and Kristin Schleich. Topological characteristics of random tri-
angulated surfaces. Random Structures & Algorithms, 28(3):247–288, May 2006.
[42] Dietrich Stoyan, Wilfried S. Kendall, and Joseph Mecke. Stochastic geometry and its
applications. Wiley Series in Probability and Mathematical Statistics: Applied Prob-
ability and Statistics. John Wiley & Sons Ltd., Chichester, 1987. With a foreword
by D. G. Kendall.
[43] Jonathan E. Taylor. A Gaussian kinematic formula. The Annals of Probability,
34(1):122–158, 2006.
[44] Jonathan E. Taylor and Robert J. Adler. Euler characteristics for Gaussian fields on
manifolds. The Annals of Probability, 31(2):533–563, 2003.
[45] Jonathan E. Taylor, Akimichi Takemura, and Robert J. Adler. Validity of the ex-
pected Euler characteristic heuristic. The Annals of Probability, 33(4):1362–1396,
2005.
[46] Jonathan E. Taylor, Keith J. Worsley, and Frederic Gosselin. Maxima of discretely
sampled random fields, with an application to ‘bubbles’. Biometrika, 94(1):1–18,
March 2007.
[47] James W. Vick. Homology theory. Academic Press, New York, 1973. An introduction
to algebraic topology, Pure and Applied Mathematics, Vol. 53.
[48] O. Ya. Viro. Some integral calculus based on Euler characteristic. In Topology and
geometry – Rohlin Seminar, volume 1346 of Lecture Notes in Math., pages 127–138.
Springer, Berlin, 1988.
[49] Keith J. Worsley. Boundary corrections for the expected Euler characteristic of excur-
sion sets of random fields, with an application to astrophysics. Advances in Applied
Probability, pages 943–959, 1995.
[50] Keith J. Worsley. Estimating the number of peaks in a random field using the Had-
wiger characteristic of excursion sets, with applications to medical images. The An-
nals of Statistics, 23(2):640–669, April 1995.
Acknowledgements

First, I wish to thank my advisor, Professor Robert Adler, for the devoted guidance, help, and patience all along the way; for directing and developing me professionally while placing full confidence in me and granting me a sense of research independence; for the encouragement, the push forward, the doors opened, and the giving above and beyond; for the warm attitude, and for meetings each of which was an experience in its own right. But beyond all of this, for showing me that it is possible to be a highly successful and professional academic while keeping both feet on the ground, maintaining modesty, an excellent sense of humor, and a genuine ability to appreciate others. My appreciation and gratitude extend far beyond this single paragraph. Thank you.

Thanks to Professor Shmuel Weinberger of the University of Chicago, for hosting me at the beginning of the doctorate and for the fruitful collaboration that has continued throughout. Thanks also to Matthew Strom Borman of the University of Chicago, with whom the first part of this thesis is joint work.

Thanks to Professor Ron Meir of the Technion, who advised me during my Master's degree and continued to support me afterwards; for fascinating conversations, the friendly attitude, and an always-open door.

Thanks to the many friends I gained during this period. Special thanks to Aran Bergman, Daniel Sigalov, Hadas Benisty, Zigi (Isask'har) Walter, and Ronen Talmon; thank you for the support and encouragement in the harder moments and the easier ones, and for making the Technion a place that was great fun to come to and that is now very hard to leave.

I wish to thank my very dear family. To my parents, Dov and Lili, for the support, the encouragement, and the unconditional giving that allowed me to reach this point under the best conditions one could ask for; and above all, for accepting the paths I choose for myself in life with understanding and much love. To my siblings (in the broad sense), Udi, Barak, Hila, Yael, Nitzan, and Keren, for the close friendship and support. Last but not least, to my nephews and nieces, Or, Eshkar, Raz, Shahar, Sara, Tair, Talia, Yonatan, Livnat, Inbar, Tzur, and Tamar: you are like little chargers that give me the energy to keep going. Thank you all. I love you.

Finally, I wish to dedicate this work to my grandparents, Aharon (of blessed memory) and Esther Landau. To my grandfather, who in his quiet and special way was an immense source of inspiration in the aspiration for knowledge and in a clear-eyed view of life; and to my grandmother, who from the very beginning was there at my side, took an interest in every detail, and encouraged me onward. I miss you.

I thank the Technion and the Adams Fellowship Program of the Israel Academy of Sciences and Humanities for the generous financial support of my studies.
Abstract

Algebraic topology is concerned with characterizing topological spaces by algebraic means. By assigning algebraic structures (for example, homology groups) to topological spaces, one can classify them into classes of "similar" spaces, study their qualitative features, and study the behavior of functions between different spaces. The field known as "applied algebraic topology" focuses on combining tools from algebraic topology to characterize the surfaces and functions that appear in various engineering problems, to create tools for data analysis, and to reconstruct manifolds (manifold learning). This field has attracted considerable interest in recent years. Nevertheless, although most of the problems in this field involve random data sources, the probabilistic foundations on which the tools developed in this framework rest are still only at a preliminary stage. The main goal of the research in this thesis is to investigate problems from this field and to provide them with a deep and complete probabilistic foundation.

The thesis is divided into three main chapters. The first chapter deals with the persistent homology of Gaussian random fields, the second with the limiting behavior of random geometric complexes, and the last with the "crackle" phenomenon for distributions with unbounded support. We note that while all three chapters deal with topics combining probability and algebraic topology, the research in the first chapter has no direct connection to that in the other two.
1. Persistent homology of Gaussian random fields

Random fields are random processes defined over a parameter space M of dimension greater than one. A useful example: M may represent the three-dimensional structure of the brain, or a two-dimensional pattern on the cortex, and the measurements produced by various imaging devices are random processes on these spaces. Since the domain on which the process is defined is multi-dimensional, the graph generated by such a random process is a random space or random manifold (in contrast to the one-dimensional curve generated by simple random processes). Consequently, interesting questions arise concerning the geometry and topology of these graphs.

To date, the main tools for analyzing Gaussian random fields have relied mostly on differential geometry. In this research we aim to study the topological (as opposed to geometric) features of sub-level sets, and in particular the persistent homology they generate. In a nutshell, the persistent homology of a function f tracks the changes in the homology of sub-level sets of the form f⁻¹((−∞, u]) as the threshold u grows. As u increases, the sub-level sets grow; in this process, homology elements (that is, "holes" and connected components) are created and destroyed, and the persistent homology records this process. The theory of persistent homology is relatively new, and prior to this work no result had been proved concerning the persistent homology of any random field; the result presented in this thesis is the first of its kind. We define the notion of the "Euler characteristic of the persistent homology" and compute the expectation of this quantity for a wide class of Gaussian random fields. Beyond the topological result, this research yields a surprising conclusion concerning the critical points of these fields: as a by-product of the study of persistent homology, we discovered that for Gaussian fields on closed manifolds, the "alternating" sum of the critical values grows not like the volume of the manifold but like a one-dimensional measure of it, independently of the dimension of the manifold.
2. The topology of random geometric complexes

The main motivation for this research comes from the field of manifold learning. Let M ⊂ ℝᵈ be an unknown closed manifold whose topological features we wish to recover from a set of points X₁, …, Xₙ sampled at random from the manifold. The Betti numbers represent the numbers of connected components and "holes" of topological spaces. Under suitable conditions, the Betti numbers of the manifold can be recovered by computing the Betti numbers of the union of d-dimensional balls U_r = ⋃ᵢ₌₁ⁿ B_r(Xᵢ) of radius r centered at the sample points. The main problem with this method is its sensitivity to the choice of the radius r. Several solutions to this problem have been proposed. In [37, 38], sufficient conditions on the values of n and r are given under which the Betti numbers are recovered correctly with probability as high as desired. Another solution is to compute the persistent homology of the filtration {U_r}_{r≥0} and to identify the homology elements that persist over a substantial range of radii, under the assumption that these elements represent homology elements of the original manifold.

In this research we studied the following problem, related to the motivation above. Let 𝒳ₙ = {X₁, …, Xₙ} be a set of independent random points in ℝᵈ with a known density function f. We wish to study the Betti numbers of the union of balls U_{rₙ} in the limit where n → ∞ and rₙ → 0. The problem can be simplified by considering the Čech complex C(𝒳ₙ, rₙ) (a simplicial complex containing a k-dimensional simplex for every collection of k+1 balls with non-empty intersection). By the Nerve theorem, the spaces U_{rₙ} and C(𝒳ₙ, rₙ) are homotopy equivalent, and hence their Betti numbers coincide. In [30, 31], the Betti numbers of C(𝒳ₙ, rₙ) were studied directly. Under the assumptions above, the limiting behavior of the complex splits into three different regimes. When nrₙᵈ → 0 (the subcritical regime), the complex consists of many small connected components and very few holes. In the critical regime, nrₙᵈ → λ ∈ (0, ∞), the connectivity of the complex is high, and holes of all dimensions can be found. In the supercritical regime, nrₙᵈ → ∞, the complex contains a very small number of components and holes.

Computing the Betti numbers directly is feasible mainly in the subcritical regime, and becomes significantly more complicated in the other regimes. In this research we therefore tried to study them by a different route, using distance functions. Let dₙ : ℝᵈ → ℝ be the distance function from the random set 𝒳ₙ, defined by dₙ(x) = min_{1≤k≤n} ‖x − X_k‖. Note that dₙ⁻¹([0, r]) = U_r; that is, the sub-level sets of the distance function are exactly the unions of balls around the random points. From Morse theory we know that the Betti numbers of the sub-level sets change at the critical levels of the function. Therefore, if we know how the critical points of the distance function dₙ behave, we can learn from this about the Betti numbers of C(𝒳ₙ, rₙ). Our results show that the limiting behavior of the critical points of the distance function likewise splits into three different regimes, according to the limit of nrₙᵈ. In this work we present limit theorems for the numbers of critical points (classified by Morse index) in each of the regimes. We then relate our results to the known results on the Betti numbers of Čech complexes, and show how the study of critical points extends our knowledge of the topology of limiting Čech complexes.
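To make the discussion of Betti numbers concrete, here is a minimal Python illustration (my own sketch, not part of the thesis): it computes β₀, the number of connected components, for the complex built on random points in the unit square, using the fact that two balls of radius r intersect iff their centers are at distance at most 2r, together with a union-find structure. As r grows, components can only merge, so β₀ is non-increasing in r.

```python
import random

def betti0(points, r):
    # beta_0 of the union of balls of radius r: the number of connected
    # components of the graph linking centers at distance <= 2r
    # (two balls of radius r intersect iff centers are <= 2r apart).
    n = len(points)
    parent = list(range(n))

    def find(i):
        # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= (2 * r) ** 2:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return sum(1 for i in range(n) if find(i) == i)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
counts = [betti0(pts, r) for r in (0.001, 0.02, 0.1)]
print(counts)   # non-increasing: components merge as r grows
```

In the subcritical regime (tiny r) almost every point is its own component, matching the description above; at larger r the complex connects up.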
3. The crackle of noise samples

In this chapter we study a problem similar to that of the previous chapter, but in the setting where the radius of the balls is kept fixed, i.e. rₙ ≡ 1. When the distribution from which the points are sampled has a bounded support S, then for a sufficiently large number of points we obtain
⋃_{k=1}ⁿ B₁(X_k) ≈ Tube(S, 1) ≜ {x ∈ ℝᵈ : dist(x, S) ≤ 1}.
This case is less interesting and we shall not deal with it. When the support is unbounded, however, interesting phenomena occur.

We examine distributions whose support is the entire Euclidean space ℝᵈ. In these cases there is a region containing such a high concentration of points that the union of the balls around them covers it entirely. We call this region the "core" of the distribution. As the number of points (n) grows, the core expands. Outside the core one finds a large number of points, but these points are not dense enough, so that the union of balls (or the Čech complex) in these regions exhibits very many connected components with high Betti numbers (a large number of holes). We call this phenomenon "crackle".

The characteristics of the crackle phenomenon depend very strongly on the choice of the distribution from which the samples are drawn. We examine three representative distributions: the power-law distribution, the exponential distribution, and the Gaussian distribution. These distributions are spherically symmetric, and therefore their core is a ball centered at the origin. In the first stage we examine the size of the core of each of the distributions (as a function of n). We then examine how the Betti numbers of the union of balls (or the complex) behave outside the core. For the power-law distribution, as well as for the exponential distribution, the Euclidean space divides into "layers", with different Betti numbers appearing in each layer. The Gaussian distribution is fundamentally different: for this distribution the crackle phenomenon does not occur. We show that for the Gaussian distribution very few points are sampled outside the core, and for large n the union of balls is approximately a single ball, containing no holes at all.
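As a concrete illustration of the crackle scaling (my own computation, not from the thesis): take the planar power-law density f_p(x) = c_p/(1 + ‖x‖⁴) (d = 2, α = 4). Its radial CDF is F(r) = (2/π) arctan(r²), so the expected number of the n sample points lying beyond radius R is n(1 − F(R)). Evaluating this at R = √n shows that a non-vanishing expected number of points, about 2/π, always lies beyond radius √n, no matter how large n is: this is the persistent "dust" outside the core.

```python
import math

def expected_beyond(n, R):
    # d = 2, alpha = 4: the radial CDF of f_p is F(r) = (2/pi) arctan(r^2),
    # so the expected number of the n sample points lying beyond radius R
    # is n * (1 - F(R)).
    return n * (1.0 - (2.0 / math.pi) * math.atan(R * R))

for n in (10 ** 2, 10 ** 4, 10 ** 8):
    print(n, expected_beyond(n, math.sqrt(n)))   # -> 2/pi ~ 0.6366
```

A constant expected number of outliers at radius √n, each typically isolated, is exactly the mechanism behind the layered Betti-number picture described above.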