7 Majorizing measures
    7.1 Introduction
    7.2 An illuminating example
    7.3 Measures versus partitions
        7.3.1 From measures to partitions
        7.3.2 From partitions to majorizing measure
    7.4 From partitions to expected suprema
    7.5 The role of Gaussianity
    7.6 From functional to nested partitions
    7.7 Uniformly continuous Gaussian sample paths
    7.8 Problems
    7.9 Notes
version: 1jan2016, printed: 1 January 2016
Mini-empirical © David Pollard
Chapter 7
Majorizing measures
Section 7.1 reminds you of the three quantities that are inter-related in the majorizing measure approach to chaining: an integral that depends on the probability of small balls around each point in the index set; a weighted sum involving the diameters of sets in a sequence of finite partitions of the index set; and the expected supremum of a stochastic process whose increments are controlled by a Young function involved in the definition of the first two quantities.

Section 7.2 presents a variation on the classical example for illustrating the differences between packing number and majorizing measure assumptions. It describes a simple Gaussian process whose supremum has a finite expected value, which satisfies the majorizing measure condition, but which fails the usual packing number condition.

Section 7.3 explains the connection between majorizing measures and sequences of partitions of the index set.

Section 7.4 explains how a suitable sequence of partitions leads to a chaining upper bound on expected suprema of the stochastic process.

Section 7.5 identifies the special properties of Gaussian processes that lead to the lower bound that complements the chaining upper bound.

Section 7.6 recursively applies a partitioning scheme, derived from the Gaussian inequality in Section 7.5, to generate a nested sequence of partitions from properties of the supremum functional.

Section 7.7 describes the effect on the majorizing measure lower bound if a centered Gaussian process has uniformly continuous sample paths.
7.1 Introduction
My overall goal for this Chapter is to have you understand the Fernique-Talagrand result that relates finiteness of the expected supremum of a centered Gaussian process to the existence of a majorizing measure, and the equivalent characterizations using sequences of finite partitions of the index set. To make the Chapter more nearly self-contained I repeat (in slightly greater generality) some of the material from Sections 4.6 and 4.7.
My exposition is based largely on Talagrand (1992, 1996, 2001), and two books, Talagrand (2005, Sections 1.3 and 2.1) and Talagrand (2014, Chapter 2).
Throughout the Chapter there lurks a stochastic process {Xt : t ∈ T} indexed by a metric space (T, d), with

    D := diam(T) := sup{d(s, t) : s, t ∈ T}

denoting its diameter. I assume, for a given Young function Ψ, that the increments of the X process are controlled by the Ψ-Orlicz norm, in the sense that

<1>    ‖Xs − Xt‖_Ψ ≤ d(s, t)    for all s, t ∈ T,
which gives a tail bound

<2>    P{|Xs − Xt| ≥ η d(s, t)} ≤ β(η) := min(1, 1/Ψ(η)).
I also assume that the function Ψ satisfies, for each x0 > 0, an inequality

<3>    Ψ(x)Ψ(y) ≤ Ψ(c0(x + y))    whenever x ∧ y ≥ x0,
where c0 is a constant depending on x0. Equivalently (cf. Ledoux and Talagrand 1991, page 310), for each u1 > 0,

<4>    Ψ^{-1}(uv) ≤ c1 (Ψ^{-1}(u) + Ψ^{-1}(v))    whenever u ∧ v ≥ u1,

where c1 is a constant depending on u1. As shown by Problem [1], such inequalities force Ψ(x)/e^{αx} → ∞ as x → ∞, for some α > 0.
Each of the Young functions Ψα (growing like exp(x^α)) from Section 2.3 satisfies <3>. The special case where Ψ2(x) = exp(x²) − 1 takes center stage from Section 7.5 onwards, for the analysis of centered Gaussian processes.
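As a quick sanity check (my addition, not part of the text), the Ψ2 case of <3> can be verified numerically with c0 = 1, using (e^a − 1)(e^b − 1) ≤ e^{a+b} − 1 together with (x + y)² ≥ x² + y²:

```python
import math

def psi2(x):
    """The Young function Psi_2(x) = exp(x^2) - 1."""
    return math.expm1(x * x)

# check the growth condition <3> for Psi_2 with c0 = 1 on a grid:
# Psi_2(x) * Psi_2(y) <= Psi_2(x + y) for all x, y >= 0
for i in range(60):
    for j in range(60):
        x, y = 0.1 * i, 0.1 * j
        assert psi2(x) * psi2(y) <= psi2(x + y) * (1 + 1e-12)
print("growth condition <3> holds for Psi_2 with c0 = 1 on the grid")
```
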
Initially for Gaussian processes X, but subsequently for a variety of different X, Talagrand derived upper and lower bounds for analogs of the functional

<5>    F(A) := P sup_{t∈A} Xt    for A ⊆ T.
Remark. To avoid measurability issues you may assume that T is countable. Or you could work with the methods described in Chapter 5 to reduce the analysis to countable subsets of T. Or you could interpret F(A) as sup{P max_{t∈S} Xt : finite S ⊆ A}.
As mentioned in Section 4.6, Talagrand replaced chaining bounds based on packing numbers by chaining bounds based on admissible sequences, π = {πk : k ∈ N0}, of finite partitions of T that are nested (each π_{k+1} is a refinement of the preceding πk), with π0 = {T} and #πk ≤ nk := 2^{2^k} for k ≥ 1. He worked with functionals
    γα(T, π) := sup_{t∈T} ∑_{k∈N0} 2^{k/α} diam(Ek(t)),

where Ek(t) denotes the member of the partition πk that contains the point t. The subscript α suggests the connection with the Young function Ψα, for which Ψα^{-1}(nk) grows like 2^{k/α}.

To emphasize the general connection with Ψ, I decided to work with a closely related concept that I call (Ψ, λ)-admissibility of a sequence π = {πk : k ∈ N0} of (not necessarily nested) finite partitions: again π0 = {T} but #πk ≤ Ψ(λ^k) for some fixed λ > 1. The corresponding functionals are

<6>    γλ(T, π) := sup_{t∈T} γλ(t, π, 0)    where γλ(t, π, p) := ∑_{k≥p} λ^k diam(Ek(t)) for p ∈ N0.
Remark. Nonzero values of p become relevant when studying continuity properties of sample paths, as in Sections 7.4 and 7.7.
Section 7.3 shows that it is easy to enforce the nestedness property on the partitions, at the cost of a slight increase in some of the constants. Alternatively (Section 7.4), one can just slightly modify the chaining argument. The nesting is just a notational convenience, not a necessity.
Even though it is no longer essential to the “majorizing measure” method, I also want you to see the connections between the functional γλ(T, π) and the original idea of a majorizing (Borel probability) measure µ on T:
<7>    γ(T, µ) := sup_{t∈T} γ(t, µ, 1)    where γ(t, µ, δ) := ∫_0^δ Ψ^{-1}(1/µB[t, rD]) dr for 0 < δ ≤ 1.

Here B[t, rD] denotes the closed ball with center t and radius rD. The µ is called a majorizing measure if γ(T, µ) < ∞.
Remark. The unorthodox presence of the rD in the definition of γ(t, µ, δ) essentially rescales the metric to a diameter of one. It ensures that γ(T, µ) is always bounded below by a fixed constant no matter what value D takes. To see why, consider any pair of points s, t in T for which d(s, t) > 2D/3. The balls B[t, D/3] and B[s, D/3] are disjoint, which implies that at least one of them, suppose it is B[t, D/3], has µ measure ≤ 1/2. It follows that

    γ(T, µ) ≥ ∫_0^{1/3} Ψ^{-1}(1/µB[t, D/3]) dr ≥ Ψ^{-1}(2)/3.

Put another way, B[t, rD] is just the closed ball of radius r for the rescaled metric d̃(s, t) = d(s, t)/D, and the process X̃t = Xt/D has its increments controlled by d̃, in the sense that ‖X̃s − X̃t‖_Ψ ≤ d̃(s, t) for all s, t ∈ T.
The following Theorem, whose proof is spread out across the Sections of the Chapter, summarizes the interconnections between the various functionals.
<8> Theorem. Suppose {Xt : t ∈ T} has increments for which ‖Xs − Xt‖_Ψ ≤ d(s, t), where the Young function Ψ satisfies the growth assumption <3>. Then there exists a finite constant Cλ, depending on λ (and Ψ), for which the following hold.

(i) for each majorizing measure µ there exists a (Ψ, λ)-admissible sequence of partitions π for which γλ(T, π) ≤ Cλ D γ(T, µ)

(ii) for each (Ψ, λ)-admissible sequence of partitions π there exists a majorizing measure µ for which D γ(T, µ) ≤ Cλ γλ(T, π)

(iii) if there exists a (Ψ, λ)-admissible sequence of partitions π and if X is centered then F(T) ≤ Cλ γλ(T, π)

(iv) if X is a centered Gaussian process with F(T) < ∞ then there exists a (Ψ2, λ)-admissible sequence of partitions π for which γλ(T, π) ≤ Cλ F(T).
Remark. Even though (i) and (ii) show that majorizing measures and admissible partitions are equivalent (up to constant multipliers), the partition functional γλ(t, π, p) has several advantages over the majorizing measure functional γ(t, µ, δ). For example, suppose T1 is a subset of T. It is not obvious that finiteness of sup_{t∈T} γ(t, µ, 1) for a probability µ on T should imply existence of a Borel probability measure ν on T1 for which sup_{t∈T1} γ(t, ν, 1) is finite; but it is a trivial matter to restrict the partitions πk to T1. Compare with Problem [2].
7.2 An illuminating example
The following example is based on an analogous construction by Fernique (1975, Exemple 5.5.2). (See also Talagrand (2005, page 44).) It brings out some subtle differences between conditions for bounded sample paths based on packing numbers and the more refined majorizing measure conditions.
As before define nk := 2^{2^k}, so that Ψ2^{-1}(nk) grows like 2^{k/2}. Define N0 = 0 and Nk := ∑_{i≤k} ni = nk(1 + o(1)). Partition N into blocks B1, B2, . . . of lengths n1, n2, . . . :

    Bk := (N_{k−1}, Nk] := {t ∈ N : N_{k−1} < t ≤ Nk}.

The process X = {Xt : t ∈ N} consists of a sequence of independent random variables with Xt ∼ N(0, σt²), where σt² = δk² := 2^{−k} for all t in block Bk. The natural metric is given by

    dX(s, t)² = P|Xs − Xt|² = σs² + σt² = δj² + δk²    if s ∈ Bj and t ∈ Bk.

Under dX the index set N has diameter 1 and diam(∪_{j≥k} Bj) = √2 δk, so that pack(r, N, dX) < ∞ for each r > 0.
The X process and the metric space (N, dX) have the following properties.

(i) P sup_t |Xt| < ∞;

(ii) ∫_0^D Ψ2^{-1}(pack(r, N, dX)) dr = ∞;

(iii) there exist nested partitions πk of N with #πk = O(nk) for which sup_t ∑_{k∈N} 2^{k/2} diam(Ek(t)) < ∞;

(iv) there exists a majorizing measure.
Remark. The results in this Chapter show that (i), (iii), and (iv) are equivalent. Nevertheless, it helped my intuition to carry out explicit calculations for each assertion.
Remember that

<9>    Φ̄(u) := (2π)^{−1/2} ∫_u^∞ e^{−y²/2} dy ≤ e^{−u²/2}    for u ≥ 0.
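Since Φ̄(u) = ½ erfc(u/√2), inequality <9> is easy to confirm numerically (a check of my own, not part of the text):

```python
import math

def normal_tail(u):
    """Upper standard normal tail (2*pi)^(-1/2) * int_u^inf exp(-y^2/2) dy."""
    return 0.5 * math.erfc(u / math.sqrt(2))

for u in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    assert normal_tail(u) <= math.exp(-u * u / 2)
print("tail bound <9> confirmed at sample points")
```
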
Proof of (i). Convert the expected value to an integral of tail bounds:

    P sup_t |Xt| = ∫_0^∞ P{sup_t |Xt| > x} dx
                ≤ 2 + ∫_2^∞ ∑_t P{|Xt| > x} dx
                = 2 + ∑_{k∈N} nk ∫_2^∞ 2Φ̄(x/δk) dx.

The union bound would be too crude for x too close to zero. Invoke <9> twice to bound the integrals:

    ∫_2^∞ Φ̄(x/δk) dx ≤ δk ∫_{2/δk}^∞ exp(−y²/2) dy ≤ √(2π) exp(−(2/δk)²/2).

The kth term in the sum is less than a constant times exp(2^k log 2 − 2 × 2^k); the sum converges.
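To see how fast that sum converges, here is a small numerical check (mine, with the terms computed in log space to dodge overflow):

```python
import math

def log_term(k):
    """log of the k-th bound: n_k * sqrt(2*pi) * exp(-(2/delta_k)^2 / 2),
    with n_k = 2^(2^k) and delta_k^2 = 2^(-k)."""
    return (2 ** k) * math.log(2) + 0.5 * math.log(2 * math.pi) - 2.0 * (2 ** k)

total = sum(math.exp(log_term(k)) for k in range(1, 40))
assert 0.19 < total < 0.25      # the series converges and is already small
print("sum of the tail-bound terms ~", round(total, 4))
```
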
Remark. Even though the sample paths are bounded almost surely they do oscillate a lot. As shown by Problem [5], max_{t∈Bk} Xt concentrates near δk √(2 log nk) = c := √(2 log 2). By symmetry, max_{s∈Bk}(−Xs) also concentrates near c. Thus max_{t,s∈Bk} |Xt − Xs| concentrates near 2c, even though diam(Bk) → 0 as k → ∞. A similar calculation provides another explanation for why sup_t |Xt| is well behaved.
Proof of (ii). Note that dX(s, t) = √2 δk for distinct s, t in Bk, implying that

    pack(r, N, dX) ≥ nk    for √2 δk > r,

and

    ∫_{√2 δ_{k+1}}^{√2 δk} Ψ2^{-1}(pack(r, N, dX)) dr ≥ √2 (δk − δ_{k+1}) Ψ2^{-1}(nk) ≥ c,

where c = (√2 − 1)√(log 2). The sum over k is infinite.
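A numerical check of the per-band lower bound (my addition; here Ψ2^{-1}(n) = √log(1 + n), and the float range limits k to single digits):

```python
import math

def psi2_inv(n):
    """Inverse of Psi_2: Psi_2^{-1}(n) = sqrt(log(1 + n))."""
    return math.sqrt(math.log1p(n))

c = (math.sqrt(2) - 1) * math.sqrt(math.log(2))
for k in range(1, 10):                 # 2.0**(2**10) would overflow a float
    delta_k = 2 ** (-k / 2)
    delta_k1 = 2 ** (-(k + 1) / 2)
    n_k = 2.0 ** (2 ** k)
    band = math.sqrt(2) * (delta_k - delta_k1) * psi2_inv(n_k)
    assert band >= c - 1e-9            # each dyadic band contributes (about) c
print("each band of the packing integral contributes at least", round(c, 4))
```
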
Proof of (iii). Define πk to consist of all the singleton sets {t} for 1 ≤ t ≤ Nk, the blocks B_{k+1}, . . . , B_{2k}, and I_{2k} := (N_{2k}, ∞) = {t ∈ N : t > N_{2k}}. The size of πk equals Nk + k + 1 = O(nk). If t ∈ Bj then

    diam(Ek(t)) = diam({t}) = 0    if j ≤ k,
    diam(Ek(t)) = diam(Bj) = √2 δj    if k < j ≤ 2k,
    diam(Ek(t)) ≤ diam(I_{2k}) ≤ √2 δ_{2k+1}    if j > 2k.

It follows, for some constant C, that

    ∑_{k∈N} 2^{k/2} diam(Ek(t)) ≤ C ∑_k 2^{k/2} (δ_{2k+1} {k < j/2} + δj {j/2 ≤ k < j})
        ≤ C2 ∑_{k<j/2} 2^{−k/2} + C2 2^{−j/2} ∑_{k<j} 2^{k/2},

which is bounded uniformly in j.
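The uniform bound in j can be checked numerically from the three cases for diam(Ek(t)) (my illustration of the displayed estimate):

```python
import math

def partition_sum(j, kmax=80):
    """sum_k 2^(k/2) * diam(E_k(t)) for t in block B_j, with diam taken from
    the three cases in the proof of (iii)."""
    total = 0.0
    for k in range(1, kmax):
        if j <= k:                       # E_k(t) = {t}, a singleton
            diam = 0.0
        elif 2 * k >= j:                 # j/2 <= k < j: E_k(t) = B_j
            diam = math.sqrt(2) * 2 ** (-j / 2)
        else:                            # k < j/2: E_k(t) = I_{2k}
            diam = math.sqrt(2) * 2 ** (-(2 * k + 1) / 2)
        total += 2 ** (k / 2) * diam
    return total

sums = [partition_sum(j) for j in range(1, 40)]
assert max(sums) < 10                    # bounded uniformly in j
print("sup over j of the weighted-diameter sum ~", round(max(sums), 3))
```
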
Proof of (iv). Define µk to be the uniform distribution on Bk and µ := ∑_{k∈N} 2^{−k} µk. Once again consider a t in some Bj. The distance from t to a point s in Bk is Dk := √(δj² + δk²). For 0 < r ≤ 1,

    B[t, r] = N    if r ≥ D1,
    B[t, r] = {t} ∪ (Nk, ∞)    if Dk > r ≥ D_{k+1} for k ∈ N,
    B[t, r] = {t}    for δj ≥ r,

so that

    µB[t, r] = 1    if r ≥ D1,
    µB[t, r] ≥ 2^{−k}    if Dk > r ≥ D_{k+1} for k ∈ N,
    µB[t, r] = (2^j nj)^{−1}    for δj ≥ r.

Note that Dk − D_{k+1} = (δk² − δ_{k+1}²)/(Dk + D_{k+1}) = O(δk), uniformly in j. Thus

    ∫_0^D Ψ2^{-1}(1/µB[t, r]) dr
        ≤ Ψ2^{-1}(1)(1 − D1) + ∑_{k∈N} Ψ2^{-1}(2^k)(Dk − D_{k+1}) + δj Ψ2^{-1}(2^j nj)
        ≤ Ψ2^{-1}(1) + O(∑_{k∈N} √k 2^{−k/2}) + O(δj √(2^j)),

which is uniformly bounded in j.
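The three pieces of that bound can be evaluated numerically (my check, writing Ψ2^{-1}(2^j nj) = √((j + 2^j) log 2)):

```python
import math

def psi2_inv(n):
    return math.sqrt(math.log1p(n))

def mm_integral_bound(j, kmax=40):
    """The three pieces bounding the majorizing-measure integral for t in B_j."""
    delta = lambda k: 2.0 ** (-k / 2)
    D = lambda k: math.hypot(delta(j), delta(k))
    total = psi2_inv(1) * (1 - D(1))
    for k in range(1, kmax):
        total += psi2_inv(2.0 ** k) * (D(k) - D(k + 1))
    total += delta(j) * math.sqrt((j + 2.0 ** j) * math.log(2))
    return total

vals = [mm_integral_bound(j) for j in range(1, 30)]
assert max(vals) < 6                     # uniformly bounded in j
print("majorizing-measure integral bound, sup over j ~", round(max(vals), 3))
```
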
7.3 Measures versus partitions
This Section explains the essential equivalence between majorizing measures and admissible sequences of partitions.
7.3.1 From measures to partitions
The construction is easier to understand when written in a form that has nothing to do with Young functions and convexity. The next Lemma proves existence of a (not necessarily nested) sequence of partitions πk of T that has properties analogous to <6>. Be sure to notice the simple trick where the properties of a measure are needed: if A1, . . . , An are disjoint Borel subsets of T with min_i µAi ≥ ε then n ≤ 1/ε.
<10> Lemma. Suppose h : (0, 1] → R+ is continuous and strictly decreasing, with h(r) → ∞ as r → 0 and h(1) = 1. Let µ be a Borel probability measure on T. Define

    γh(t, µ, δ) := ∫_0^δ h(r µB[t, rD]) dr    for t ∈ T and δ ∈ (0, 1],
    γh(T, µ, δ) := sup_{t∈T} γh(t, µ, δ).

Suppose γh(T, µ, 1) < ∞. Then for each λ > 1 and each k ∈ N there exists a finite partition πk of T for which

(i) #πk ≤ 2/h^{-1}(λ^k)

(ii) for each p ∈ N and t ∈ T,

    ∑_{k≥p} λ^k diam(Ek(t)) ≤ (8Dλ/(λ − 1)) γh(T, µ, δp),

where δp := γh(T, µ, 1)/λ^p.
Proof. Without loss of generality we may assume D = 1. For each j ∈ N0 define rj := 2^{−j} and abbreviate B[t, rj] to Bj[t].

The function H(t, r) := h(r µB[t, r]) is decreasing in r. The sequence Hj(t) := H(t, rj) is increasing in j and

    1 = H0(t) ≤ H1(t) ≤ · · · → ∞ as j → ∞, for each t ∈ T.

For each k ∈ N define εk := h^{-1}(λ^k) and

    Jk(t) := max{j ∈ N0 : Hj(t) ≤ λ^k} = max{j ∈ N0 : rj µBj[t] ≥ εk}.
By construction,

<11>    Jk(t) = j  iff  Hj(t) ≤ λ^k < H_{j+1}(t)  iff  rj µBj[t] ≥ εk > r_{j+1} µB_{j+1}[t].

For each k ∈ N the sets Cj,k := {t ∈ T : Jk(t) = j} form a partition of T. A refinement of that partition will define πk.
[Figure: the sets Cj = {t ∈ T : Jk(t) = j}, for j = 0, 1, . . . , with each Cj split into pieces Ei,j around points ti.]
Construction of πk. With k fixed for the moment, let me omit some j and k subscripts while defining πk, in order to avoid a lot of notational clutter. Partition each nonempty Cj into subsets {Ei,j : i = 1, . . . , nj}, each having diameter at most 4rj, by first choosing a maximal 2rj-separated subset {ti : i = 1, . . . , nj} of Cj. The balls Bj[ti] are disjoint and, by <11>, each has µ measure at least εk/rj. From the fact that µ(T) = 1 it follows (the key idea) that nj ≤ rj/εk. By maximality of the separated set, each t in Cj must lie within a distance 2rj of some ti. Define Ei,j to be the set of those t in Cj for which ti is the closest, with the smallest i being chosen in the case of ties. The total number of Ei,j's, the sum of all the nj's, is at most ∑_{j≥0} rj/εk = 2/εk, as asserted by (i).

If Jk(t) = j then Ek(t) is one of the Ei,j sets, which has diameter at most 4rj. Thus

    diam(Ek(t)) ≤ 4 × 2^{−Jk(t)} = 4 ∑_{j∈N0} {Jk(t) < j} rj.
The last expansion opens the way for another cute argument based on interchanging the order of summation in two geometric series.
[Figure: the decreasing function r ↦ H(t, r), with the level λ^p crossed at δp(t), and the values Hj(t) at the grid points rj.]
Control of the sum in (ii). By the equivalences in <11>, the inequality Jk(t) < j is equivalent to the inequality Hj(t) > λ^k. In addition, if k ≥ p then we also have Hj(t) > λ^p, that is, rj < δp(t), where δp(t) is defined by H(t, δp(t)) = λ^p. To eliminate the dependence of δp(t) on t, note that

    λ^p δp(t) ≤ ∫_0^1 H(t, r) dr = γh(t, µ, 1) ≤ γh(T, µ, 1),
which implies δp(t) ≤ δp := γh(T, µ, 1)/λ^p. Thus

    ∑_{k≥p} λ^k diam(Ek(t)) ≤ 4 ∑_{k≥p} λ^k ∑_{j∈N0} {Jk(t) < j} rj
        ≤ 4 ∑_{j∈N0} rj {rj < δp} ∑_{k≥p} λ^k {Hj(t) > λ^k}.

The sum over k is at most 1 + λ + · · · + λ^K ≤ λ^{K+1}/(λ − 1), where K is the value for which λ^K < Hj(t) ≤ λ^{K+1}. The double sum is bounded by

    4 ∑_{j∈N0} rj {rj < δp} λ Hj(t)/(λ − 1) ≤ (8λ/(λ − 1)) γh(t, µ, δp),

as asserted.
<12> Example. Suppose µ is a majorizing measure on T, that is, suppose γ(T, µ) = sup_{t∈T} γ(t, µ, 1) is finite, where

    γ(t, µ, δ) := ∫_0^δ Ψ^{-1}(1/µB[t, Dr]) dr.

Define h(x) := Ψ^{-1}(κ/x), where κ := Ψ(1). Then, for some constant c1 (corresponding to u1 = 1 in inequality <4>),

    γh(t, µ, δ) = ∫_0^δ h(r µB[t, rD]) dr ≤ c1 (γ(t, µ, δ) + ∫_0^δ Ψ^{-1}(κ/r) dr).

The growth condition <3> ensures (Problem [1]) that g(δ) := ∫_0^δ Ψ^{-1}(κ/r) dr is finite for 0 < δ ≤ 1. In particular, there exists some constant CΨ depending only on Ψ for which

    γh(T, µ, 1) ≤ CΨ γ(T, µ, 1).

Lemma <10> gives a sequence of (not necessarily nested) partitions πk with

    #πk ≤ 2/h^{-1}(λ^k) = 2Ψ(λ^k)/κ

and, for some constant K = K_{λ,Ψ},

<13>    γλ(T, π, p) ≤ DK (γ(T, µ, δp) + g(δp)).
In particular,

    γλ(T, π) ≤ DK (γ(T, µ) + g(1)).

Because γ(T, µ) ≥ Ψ^{-1}(2)/3 and g(1) < ∞, there exists some constant K1 (depending on Ψ) for which g(1) ≤ K1 γ(T, µ). At the cost of an extra factor 1 + K1, the g(1) can be discarded from the upper bound.
You might well object that the 2Ψ(λ^k)/κ bound for the size of πk does not quite match the Ψ(λ^k) upper bound for (Ψ, λ)-admissibility. That is true, but not a great concern. You could safely skip the following nitpicking explanation.
We could just absorb an extra factor of 2/κ into the upper bounds. Alternatively, we could find a positive integer ℓ for which 2/κ ≤ λ^ℓ, so that

    #πk ≤ λ^ℓ Ψ(λ^k) ≤ Ψ(λ^{k+ℓ}).

We could then define a new sequence of partitions π̃ with

    π̃k = π0 for k ≤ ℓ    and    π̃k = π_{k−ℓ} for k > ℓ,

for which #π̃k ≤ Ψ(λ^k) and γλ(T, π̃) ≤ ∑_{k≤ℓ} λ^k D + λ^ℓ γλ(T, π). The fact that γλ(T, π) ≥ D ensures that γλ(T, π̃) ≤ K2 γλ(T, π), for some K2 depending on λ and Ψ. And so on.
A similar modification bridges the gap between Talagrand's admissibility and (Ψ, λ)-admissibility. Suppose Ψ(x) ≤ exp(x^α) for some α > 0. If we choose λ = 2^{1/α} then

    #πk ≤ Ψ(λ^k) ≤ exp(2^k) ≤ 2^{2^{k+1}}.

While we are at it we can also take care of the nesting, replacing πk by the common refinement of π0, . . . , πk, a partition containing at most

    ∏_{i≤k} 2^{2^{i+1}} = 2^{∑_{i≤k} 2^{i+1}} ≤ 2^{2^{k+2}}

members. More relabelling, with two more copies of π0 at the start of the sequence, then gives the desired nested sequence with the desired upper bound on the cardinality.
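The cardinality bound for the common refinement is exact-integer arithmetic, so it can be checked directly (my check):

```python
# common refinement of pi_0, ..., pi_k, each with #pi_i <= 2^(2^(i+1)):
# the product of the bounds telescopes to 2^(2^(k+2) - 2) <= 2^(2^(k+2))
for k in range(8):
    product = 1
    for i in range(k + 1):
        product *= 2 ** (2 ** (i + 1))
    assert product == 2 ** (2 ** (k + 2) - 2)
    assert product <= 2 ** (2 ** (k + 2))
print("refinement cardinality bound verified for k = 0, ..., 7")
```
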
7.3.2 From partitions to majorizing measure
The path from (Ψ, λ)-admissible partitions back to a majorizing measure is much more direct, particularly so if the partitions are nested.
For each k let Sk consist of a single point from each E in πk, so that

    #Sk = Nk := #πk ≤ Ψ(λ^k).

Let µk denote the uniform distribution on Sk, with mass 1/Nk on each point. The exponential growth of Ψ ensures convergence of ∑_k 1/Ψ(λ^k) and existence of an ℓ ∈ N for which ∑_{k∈N0} θk ≤ 1, where θk := 1/Ψ(λ^{k+ℓ}). Define µ := ν + ∑_{k∈N0} θk µk, with ν an arbitrary measure having total mass 1 − ∑_k θk.

Write ∆k(t) for diam(Ek(t))/D. The assumed finiteness of γλ(T, π) ensures that ∆k(t) → 0 as k → ∞, although the convergence might not be monotone if the partitions are not nested. For each t ∈ T and each r ∈ [0, 1], the closed ball B[t, rD] contains the set Ek(t) if r ≥ ∆k(t). In that case,

    µB[t, rD] ≥ µEk(t) ≥ θk/Nk,

which implies

    G(t, r) := Ψ^{-1}(1/µB[t, rD]) ≤ Ψ^{-1}(Ψ(λ^{k+ℓ})Ψ(λ^k)) ≤ c0 (λ^{k+ℓ} + λ^k).
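For a concrete feel (my illustration, with Ψ = Ψ2 and λ = 2): already ℓ = 1 makes ∑_k θk well below 1, because the later terms die off doubly exponentially:

```python
import math

lam, ell = 2, 1
# theta_k = 1 / Psi_2(lam^(k+ell)) = 1 / (exp(4^(k+1)) - 1); beyond k = 3 the
# exponent 4^(k+1) exceeds 1000, so those terms are below exp(-1000): negligible
total = sum(1.0 / math.expm1(4.0 ** (k + ell)) for k in range(4))
assert total < 0.02
print("sum of theta_k ~ %.6f (plus a tail smaller than exp(-1000))" % total)
```
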
For each t ∈ T and r ∈ (0, 1) there exists a positive integer k for which ∆k(t) ≤ r < ∆_{k−1}(t): just choose the first integer k with ∆k(t) ≤ r. That is, for some subset Nt of N, the open interval (0, 1) is covered by the union of intervals ∪_{k∈Nt} [∆k(t), ∆_{k−1}(t)). It follows that

    γ(t, µ, 1) = ∫_0^1 G(t, r) dr
        ≤ ∑_{k∈Nt} ∫ {∆k(t) ≤ r < ∆_{k−1}(t)} G(t, r) dr
        ≤ ∑_{k∈Nt} 2c0 λ^{k+ℓ} ∆_{k−1}(t)
        ≤ 2c0 λ^{ℓ+1} γλ(t, π, 0)/D ≤ Cλ γλ(T, π)/D,

in agreement with part (ii) of Theorem <8>.
Remark. You might have been expecting an analog of inequality <13>, something of the form

    γ(T, µ, δp) ≤ Cλ γλ(T, π, p)    for all p ∈ N0.

Such a universal bound could not be true in general. For example, consider the case where T consists of exactly n points. There must be some t for which µ{t} ≤ 1/n, which implies

    γ(t, µ, δ) ≥ ∫_0^δ Ψ^{-1}(n) dr = δ Ψ^{-1}(n),

whereas γλ(T, π, p) is zero when p is large enough that each Ep(t) is a singleton.
7.4 From partitions to expected suprema
The chaining argument, starting from a sequence π = {πk : k ∈ N0} of (Ψ, λ)-admissible partitions (with γλ(T, π) < ∞), is similar to one already described in Section 4.7 if the partitions are nested. As Example <12> showed, at least for any of the Ψα Young functions, some refining and relabelling would create a nested sequence. However, I have asserted that the nesting is more of a convenience than a necessity, so let me take this opportunity to show you how chaining works when the approximating subsets Si are not nested.
As in Subsection 7.3.2, for each k let Sk = {tE : E ∈ πk} consist of a single point tE chosen from each E in πk, so that

    #Sk = Nk := #πk ≤ Ψ(λ^k).

Define maps ℓk : T → Sk by ℓk t = tE if Ek(t) = E ∈ πk. For each t in Sm and each p ≤ m we again have a chain of points

    t ↦ ℓ_{m−1} t ↦ ℓ_{m−2} t ↦ · · · ↦ ℓp t ∈ Sp

linking t to a point of Sp.

As in Section 4.7 we still have an upper bound via the triangle inequality applied to the sum of increments down the chain,

<14>    |X(t) − X(ℓp t)| ≤ ∑_{i=p}^{m−1} |X(ℓ_{i+1} t) − X(ℓi t)|.

The choice of the ℓi's ensures that

    d(ℓ_{i+1} t, ℓi t) ≤ d(ℓ_{i+1} t, t) + d(t, ℓi t) ≤ ∆i(t) + ∆_{i+1}(t),

where ∆k(t) := diam(Ek(t)). Instead of the uniform control over the diameters ∆i(t) provided by the packing numbers, we now have indirect control via the bound

<15>    ∑_{k≥p} λ^k ∆k(t) ≤ γλ(T, π, p).
The lack of nesting has also created a subtle difference from the previous approach, due to the new choice of the linking functions ℓk.

Remark. Actually inequality <15> does give us a uniform bound, ∆k(t) ≤ γλ(T, π, p)/λ^k, but that rate of decrease is not enough to offset the growth of #Sk.
In Section 4.7 the ℓk was defined as a map from S_{k+1} to Sk, with Lk defined by following a chain of ℓi-links between successive S_{i+1} and Si; the domain of ℓk is now the whole of T and there are no Lk maps. Previously, if L_{i+1} t = L_{i+1} t′ then Li t = Li t′; now two distinct points t and t′ in Sm might map to the same point of S_{i+1} but different points of Si. That is, t and t′ might belong to the same E ∈ π_{i+1}, which ensures that ℓ_{i+1} t = ℓ_{i+1} t′, but to different sets, F and F′, of πi. We still have an analog of the bound <14> for |X(t′) − X(ℓp t′)|, but the increments X(ℓ_{i+1} t) − X(ℓi t) and X(ℓ_{i+1} t′) − X(ℓi t′) need not be the same even though E_{i+1}(t) = E_{i+1}(t′).

[Figure: a set E ∈ π_{i+1} containing both t and t′, with ℓi t = tF ∈ F and ℓi t′ = tF′ ∈ F′ for different sets F, F′ ∈ πi.]

Not to despair. The difference X(ℓ_{i+1} t) − X(ℓi t) is still taken between one point in S_{i+1} and one point in Si. There are at most Ni N_{i+1} such pairs; with nested partitions we had at most N_{i+1} links to worry about. Fortunately the growth assumption on Ψ makes it just as easy to handle a set of Ni N_{i+1} increments as to handle a set of N_{i+1} increments. Define

<16>    Mi := max{ |X(s_{i+1}) − X(si)| / d(s_{i+1}, si) : si ∈ Si, s_{i+1} ∈ S_{i+1}, d(s_{i+1}, si) > 0 }.
Then, for each t in Sm,

    |X(ℓ_{i+1} t) − X(ℓi t)| ≤ Mi d(ℓ_{i+1} t, ℓi t) ≤ Mi [∆i(t) + ∆_{i+1}(t)],

and on the set Ωy := {Mi ≤ 2yλ^i for all i},

    |X(t) − X(ℓp t)| ≤ 2y ∑_{i=p}^{m−1} λ^i [∆i(t) + ∆_{i+1}(t)] ≤ 4y γλ(T, π, p).
The inequality holds (on Ωy) uniformly for t in Sm, even though we do not have strong enough uniform control over the individual ∆i(t) diameters. We have only to check that PΩy is close enough to 1 if y is large enough:

    PΩy^c = P{∃ i ≥ p : Mi > 2yλ^i}
        ≤ ∑_{i≥p} P ∪_{si, s_{i+1}} {|X(s_{i+1}) − X(si)| > 2yλ^i d(si, s_{i+1})}
        ≤ ∑_{i≥p} Ni N_{i+1} β(2yλ^i)    by <2>
        ≤ ∑_{i≥p} Ψ(λ^i) Ψ(λ^{i+1}) / Ψ(2yλ^i).

As usual, the β contribution has to be small enough to kill off the Ni N_{i+1} contribution, with enough in reserve to make the sum nicely convergent and small. Two appeals to the growth assumption <3> ensure that

    Ψ(λ^i) Ψ(λ^{i+1}) Ψ(yλ^i/c0) ≤ Ψ(2yλ^i)    if y ≥ Kλ := 2c0²(1 + λ).
For such y,

    PΩy^c ≤ ∑_{i≥p} 1/Ψ(yλ^i/c0) ≤ ∑_{i≥p} 1/(λ^i Ψ(y/c0)) ≤ K′λ/Ψ(y/c0),

where K′λ := ∑_{i≥0} λ^{−i} < ∞. That is,
<17>    P{max_{t∈Sm} |X(t) − X(ℓp t)| > 4y γλ(T, π, p)} ≤ K′λ/Ψ(y/c0)    for y ≥ Kλ,

where Kλ and K′λ are constants that depend on λ and Ψ. If we integrate the last inequality with respect to y we recover a bound on the expected value:

    P max_{t∈Sm} |X(t) − X(ℓp t)| ≤ 4 γλ(T, π, p) (∫_0^{Kλ} dy + ∫_{Kλ}^∞ (K′λ/Ψ(y/c0)) dy).

That is,

<18>    P max_{t∈Sm} |X(t) − X(ℓp t)| ≤ Cλ γλ(T, π, p)    for each m ≥ p ∈ N0,

where the constant Cλ depends only on λ and Ψ. In particular, if the X process is centered (zero means) and we take p = 0 and let m tend to infinity, we get

    F(T) ≤ P sup_{t∈T} |Xt| ≤ Cλ γλ(T, π).
The stronger inequality <18> also delivers a result stronger than boundedness of sample paths if γλ(T, π, p) goes to zero as p tends to infinity. Recall from Section 5.3 that X has uniformly continuous sample paths on the countable index set T if: for each η > 0 and ε > 0 there exists a δ > 0 such that P{osc(δ, X, T) > η} < ε. It suffices to prove existence of such a δ (not depending on m) for which

<19>    P{osc(δ, X, Sm) > η} < ε    for every m.
<20> Example. An analog of the argument used in Section 4.5 to bound a norm of the oscillation also establishes <19> if γλ(T, π, p) → 0 as p → ∞. In fact the argument is a little easier because π has already created the equivalence classes used in the proof.

Fix m. For each p define Λp := max_{t∈Sm} |X(t) − X(ℓp t)|. Invoking <18>, choose p so that PΛp ≤ Cλ γλ(T, π, p) < ηε.

For distinct E, F ∈ πp choose points τ_{E,F} ∈ E and τ_{F,E} ∈ F such that

    d(τ_{E,F}, τ_{F,E}) := d(E, F) := min{d(s, t) : s ∈ E ∩ Sm, t ∈ F ∩ Sm}.

Define

    Mp := max{ |X(τ_{E,F}) − X(τ_{F,E})| / d(E, F) : E, F ∈ πp and d(E, F) > 0 },

a maximum of fewer than (Np choose 2) < Np² random variables, each with Ψ-norm at most 1. From one of the maximal inequalities in Section 4.4, we have the upper bound PMp ≤ Ψ^{-1}(Np²).

If s, t ∈ Sm ∩ E for some E ∈ πp then |Xs − Xt| ≤ 2Λp. If s ∈ Sm ∩ E and t ∈ Sm ∩ F for distinct E, F in πp then ℓp s = ℓp τ_{E,F} and ℓp τ_{F,E} = ℓp t. If, in addition, d(s, t) < δ then d(E, F) < δ and

    |Xs − Xt| ≤ |Xs − X(ℓp s)| + |X(ℓp τ_{E,F}) − X(τ_{E,F})| + |X(τ_{E,F}) − X(τ_{F,E})|
                + |X(τ_{F,E}) − X(ℓp τ_{F,E})| + |X(ℓp t) − Xt|
        ≤ 4Λp + δ Mp.

It follows that

    P osc(δ, X, Sm) ≤ 4 PΛp + δ PMp ≤ 4ηε + δ Ψ^{-1}(Np²).

If δ is small enough that δ Ψ^{-1}(Np²) ≤ ηε then, by Markov's inequality, P{osc(δ, X, Sm) > η} ≤ 5ηε/η = 5ε.
7.5 The role of Gaussianity

Suppose X = {Xt : t ∈ T} is a centered Gaussian process, with T at worst countably infinite. Equip T with the metric dX for which dX(s, t)² = var(Xs − Xt).

For each λ > 1 and Ψ2(x) = e^{x²} − 1, Theorem <8> part (iv) asserts:

    if F(T) := P sup_{t∈T} Xt is finite then there exists a (Ψ2, λ)-admissible sequence of partitions π for which γλ(T, π) ≤ Cλ F(T).

Talagrand's argument, adapted to my setting, depends on just two properties of Gaussians, both of which are discussed in Chapter 6.
<21> Borell's inequality. There exists a universal constant CBor for which: if Y = {Yt : t ∈ T} is a Gaussian process with T finite or countably infinite, and both m := P sup_{t∈T} Yt and σ² := sup_{t∈T} var(Yt) are finite, then

    ‖sup_{t∈T} Yt − m‖_{Ψ2} ≤ CBor σ.
<22> Sudakov's inequality. There exists a universal constant CSud > 0 for which: if Y := (Y1, Y2, . . . , Yn) has a centered multivariate normal distribution with P|Yj − Yk|² ≥ δ² for all j ≠ k, then

    P max_{i≤n} Yi ≥ CSud δ Ψ2^{-1}(n)    for n ≥ 2.

Remark. I have replaced the lower bound, which in Chapter 6 was a constant multiple of δ√(log2 n), by a multiple of δ Ψ2^{-1}(n), in order to stress the role played by Ψ2. The substitution is justified by the fact that log2 n / log(1 + n) > (0.95)² for n ≥ 2.
The next Lemma, which corresponds to Talagrand (2005, Proposition 2.1.4) and Talagrand (2014, Proposition 2.4.9), captures everything we need to know about Gaussianity before constructing the admissible sequence of partitions.

Remark. It is truly amazing that Talagrand's method also works for other sorts of functional, making it applicable to many nongaussian problems. The relevant theory for general functionals, and a proof of how they generate nice nested partitions of the index set, was presented in a brutal way by Talagrand (2005, Section 1.3) and more gently by Talagrand (2014, Section 2.3). I found it enlightening to see how the Gaussian case works before plunging into the general setting.
<23> Lemma. There exists a universal constant r0 > 2 for which the following property holds for every δ > 0. If {t1, . . . , tn}, with n ≥ 2, is an r0δ-separated subset of T, and if Hi ⊆ B[ti, δ] for each i, then

    F(∪_{i≤n} Hi) ≥ δ Ψ2^{-1}(n) + min_{i≤n} F(Hi).

Remark. The balls B[ti, δ] and B[tj, δ] are disjoint for i ≠ j.
Proof. Write H for ∪_{i≤n} Hi and define Yi := sup_{t∈Hi} (Xt − X_{ti}). Note that PYi = F(Hi) and sup_{t∈Hi} var(Xt − X_{ti}) ≤ δ², because dX(t, ti) ≤ δ for t in Hi. By Borell's inequality, the variable Wi := Yi − F(Hi) is centered subgaussian with ‖Wi‖_{Ψ2} ≤ CBor δ. As in Section 4.4,

    P max_{i≤n} |Wi| ≤ CBor δ Ψ2^{-1}(n)

because Ψ2(P max_{i≤n} |Wi|/(CBor δ)) ≤ ∑_{i≤n} P Ψ2(|Wi|/(CBor δ)) ≤ n. By Sudakov's inequality,

    P max_{i≤n} X_{ti} ≥ CSud r0 δ Ψ2^{-1}(n).

Calculate:

    sup_{t∈H} Xt = max_{i≤n} (X_{ti} + Yi)
                = max_{i≤n} (X_{ti} + F(Hi) + Wi)
                ≥ max_{i≤n} X_{ti} + min_{i≤n} F(Hi) − max_{i≤n} |Wi|.

Take expectations to deduce

    F(H) ≥ CSud r0 δ Ψ2^{-1}(n) + min_{i≤n} F(Hi) − CBor δ Ψ2^{-1}(n).

Choose r0 := max(2.1, (1 + CBor)/CSud).
7.6 From functional to nested partitions
Once again let X = {Xt : t ∈ T} be a centered Gaussian process, with index set T at worst countably infinite (to ward off measurability evils) and equipped with the metric dX. For each λ > 1 this Section shows how to construct a (Ψ2, λ)-admissible sequence of partitions π for which γλ(T, π) ≤ Cλ F(T), where Cλ is a constant that depends only on λ.

The partitions will actually be nested (despite all that talk in Section 7.4 explaining why nesting is not essential). The argument is recursive. To give myself a little more wiggle room, I'll relax the constraint that #πk ≤ Ψ2(λ^k), by requiring only that #πk ≤ mk := ⌊Ψ2(λ^{k+ℓ})⌋, where the integer ℓ is chosen so that m1 − 1 > Ψ2(λ^{1+ℓ}/2) ≥ 2. You would need to engage in some relabelling in the nitpicking style of Example <12> to expunge the superfluous ℓ.
The recursive strategy makes use of a localized form of F,

    F(A, δ) := sup_{t∈T} F(A ∩ B[t, δ]) = sup_{t∈T} P sup{Xs : s ∈ A and d(s, t) ≤ δ}.
The strategy is slightly greedy. We start with π0 = {T}, constructing π1 as a partition of T into at most m1 subsets. Most of those subsets will have diameter smaller than 2D/r², where r is chosen (strategically) larger than the r0 of Lemma <23>. Those will be called the ‘small’ sets. At most one of the sets (call it Ep) in π1 will be ‘big’, in the sense that its diameter might not be smaller than D, but, via Lemma <23>, we gain control over F(Ep, D/r²).

We then continue recursively, partitioning each member of π1 into at most m2 subsets, mostly small, with no more than one of them ‘big’. And so on. The trick is to anticipate at each step what will happen after two more recursive steps. If the sets of π_{k+1} are thought of as the children of πk then the strategy is: be greedy for the great-grandchildren. As Talagrand (2005, page 22, last line) noted, this is the main clever idea in the proof.
For each partitioning subset E the proof keeps track of not just its diameter but also of a closed ball, which contains E, of radius ρ(E) and centered at a point τ(E) in T. The center τ(E) need not belong to E. The diameter of E is at most 2ρ(E).
The next Lemma captures the details of the recursive partitioning step.
<24> Lemma. Suppose A is a subset of T contained within some closed ball B[τ, ρ] and r − 1 is greater than the r_0 from Lemma <23>. Then for each positive integer m ≥ 2 and each ε > 0 there exists a finite partition π = {E_1, . . . , E_p} of A with p ≤ m and points t_i ∈ T for which:

(i) E_i ⊆ B[t_i, ρ/r], that is, τ(E_i) = t_i and ρ(E_i) = ρ/r, for each i < m (the 'small sets');

(ii) if p = m (a 'big set') then E_m ⊆ B[τ, ρ] (of course, so τ(E_m) = τ and ρ(E_m) = ρ) and

F(E_m, ρ/r²) + (ρ/r²) Ψ_2^{-1}(m − 1) ≤ F(A) + ε.
Proof. Define δ = ρ/r². Construct the E_i sets recursively in a semi-greedy fashion: instead of trying to make each F(E_i) as large as possible, look ahead to a recursive application of the Lemma within each E_i by trying to make F(E_i, δ) as large as possible. Algorithmically:
procedure make small sets
    i ← 1 and A_i ← A
    while i < m do
        choose t_i ∈ T with F(A_i ∩ B[t_i, δ]) > F(A_i, δ) − ε
        H_i ← A_i ∩ B[t_i, δ]
        E_i ← A_i ∩ B[t_i, rδ]        % so τ(E_i) = t_i and ρ(E_i) = rδ
        A_{i+1} ← A_i \ E_i
        if A_{i+1} = ∅ then
            partition complete; exit loop
        end if
        i ← i + 1
    end while
end procedure
The procedure ends either when A_{i+1} = ∅, which means that A is already completely partitioned into fewer than m small sets, or when i = m. In the latter case, we are left with a nonempty subset A_m and disjoint subsets E_1, . . . , E_{m−1}, with each E_i containing a subset H_i.
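For a finite index set the loop can be sketched directly in Python. Everything here is illustrative: the functional F is passed in as an oracle (standing in for the expected supremum P sup{X_s : s ∈ S}), and the names make_small_sets and F_loc are mine, not the text's.

```python
def make_small_sets(A, T, d, F, m, delta, r, eps):
    """One partitioning step in the style of Lemma <24> (a sketch).

    A     -- subset of the index set to partition
    T     -- the full (finite) index set, source of candidate centers
    d     -- metric d(s, t)
    F     -- oracle on subsets, a stand-in for P sup{X_s : s in S}
    m     -- maximum number of pieces
    delta -- localization radius (rho/r**2 in the text)
    r     -- expansion ratio (r - 1 > r_0 in the text)
    eps   -- slack for the almost-maximizing choice of center
    Returns (small_pieces, centers, leftover); a nonempty leftover
    plays the role of the possibly 'big' set A_m.
    """
    def ball(t, rad):
        return frozenset(s for s in T if d(s, t) <= rad)

    def F_loc(S, rad):
        # localized functional F(S, rad) = sup_t F(S \cap B[t, rad])
        return max(F(S & ball(t, rad)) for t in T)

    pieces, centers = [], []
    Ai = frozenset(A)
    i = 1
    while i < m and Ai:
        target = F_loc(Ai, delta) - eps
        # pick a center whose delta-ball (nearly) achieves the localized sup
        ti = next(t for t in T if F(Ai & ball(t, delta)) > target)
        pieces.append(Ai & ball(ti, r * delta))   # small set E_i
        centers.append(ti)
        Ai = Ai - pieces[-1]
        i += 1
    return pieces, centers, Ai
```

Running it with, say, ten points on a line and F = cardinality produces disjoint small sets, each within distance rδ of its center, with any unexhausted remainder returned as the candidate big set.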
[Figure: the set A, inside the big ball B[τ, ρ], together with the balls B[t_1, δ], B[t_1, rδ], and B[t_1, r_0δ], and the remainder A_2.]
The picture shows the situation after the first passage through the loop. The set A (which is represented by a rectangle, rather than a region defined by intersections of closed balls and their complements) sits inside the big ball B[τ, ρ]. For no good reason, the point t_1 is shown sitting inside the same ball. The shaded region shows E_1, the part of A carved out by B[t_1, rδ]. The dotted circle is the ball B[t_1, r_0δ], which has not yet been mentioned in the construction. If A_2 ≠ ∅, I claim that the point t_2 cannot lie within the
ball B[t_1, r_0δ], for otherwise (because r > 1 + r_0, so that B[t_2, δ] ⊆ B[t_1, (1 + r_0)δ] ⊆ B[t_1, rδ])

A_2 ∩ B[t_2, δ] ⊆ A_2 ∩ B[t_1, rδ] = ∅.
Maybe I should have defined F(∅) = −∞ to make sure that an empty set cannot get close to the supremum that defines F(A_2, δ). A similar argument works for each of the succeeding runs through the loop, which ensures that the set {t_1, . . . , t_{m−1}} is r_0δ-separated. Lemma <23> then tells us that

F(A) ≥ F(∪_{i<m} H_i) ≥ δ Ψ_2^{-1}(m − 1) + min_{i<m} (F(A_i, δ) − ε).

If we choose E_m = A_m, a subset of every A_i, then the minimum is greater than F(E_m, δ), as asserted by (ii).
Remark. In an earlier version of the construction from Lemma <24>, Talagrand (1992) allowed the loop to run until A_{i+1} = ∅. It had to stop after a finite number of steps because finiteness of F(T) implies that T is totally bounded, and all the t_i's are r_0δ-separated: at some point A_{i+1} must be empty. (Actually, Talagrand assumed finiteness of T in that paper; total boundedness was not an issue.)
Finally I can explain in detail how to construct the nested sequence of partitions for which

<25>    sup_{t∈T} Σ_{k≥0} λ^k diam(E_k(t)) ≤ C_λ F(T).

We start with π_0 = {T} and ρ(T) = D. With m_k = ⌊Ψ_2(λ^{k+ℓ})⌋ for a large enough ℓ we ensure that m_k ≥ 2 and Ψ_2^{-1}(m_k − 1) ≥ λ^{k+ℓ}/2 for k ∈ ℕ. Define ε_k = F(T)/2^k. Define r := 2λ + r_0. Apply Lemma <24> first with A = T and m = m_1 and ε = ε_1 to construct π_1. Then apply the Lemma to each A in π_1, with m = m_2 and ε = ε_2, to construct π_2. And so on.
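As a numerical sanity check, the bookkeeping conditions on m_k can be verified for a concrete Young function. Here I take Ψ_2(x) = exp(x²) − 1, a standard choice but an assumption on my part; the particular values λ = 1.2, ℓ = 1 are also mine.

```python
import math

def psi2(x):
    # assumed Young function: Psi_2(x) = exp(x**2) - 1
    return math.exp(x * x) - 1.0

def psi2_inv(y):
    # its inverse on [0, infinity)
    return math.sqrt(math.log(1.0 + y))

lam, ell = 1.2, 1
# partition-size budgets m_k = floor(Psi_2(lambda**(k + ell)))
m = {k: math.floor(psi2(lam ** (k + ell))) for k in range(1, 11)}

# the two conditions the text asks of ell:
assert all(m[k] >= 2 for k in m)
assert all(psi2_inv(m[k] - 1) >= lam ** (k + ell) / 2 for k in m)
```

With these choices m_1 = 6, and the lower bound Ψ_2^{-1}(m_k − 1) ≥ λ^{k+ℓ}/2 holds with plenty of room for every k checked.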
Consider any t in T. As that t stays fixed for the rest of the proof I'll drop it from the notation, writing E_k instead of E_k(t) and ρ_k instead of ρ(E_k(t)). The k = 0 term in <25> causes no problem because (Problem [3])

diam(E_0) = D ≤ √(2π) F(T).
For ease of notation let me call E_0 a 'big' term. The sequence of sets E_0 ⊇ E_1 ⊇ E_2 ⊇ . . . consists of 'big' and 'small' sets. If E_k is small then ρ_k = ρ_{k−1}/r. If k ≥ 1 and E_k is big then ρ_k = ρ_{k−1} and

<26>    ε_k + F(E_{k−1}) ≥ F(E_k, ρ_k/r²) + c_0 ρ_k λ^k
where c_0 := ½λ^ℓ/(2λ + r_0)² is a positive constant that depends only on λ. The inequality 2F(T) ≥ ε_k + F(E_{k−1}, ρ_{k−1}) ensures ρ_k λ^k ≤ 2F(T)/c_0 for big E_k. There can be no infinite string of consecutive big sets in the E_k sequence: if E_k, E_{k+1}, . . . , E_{k+n} are all big then ρ_k = ρ_{k+1} = · · · = ρ_{k+n} and λ^n ≤ 2F(T)/(c_0 ρ_k λ^k), constraining the size of n.
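For the record, here is how <26> falls out of Lemma <24>(ii), applied with A = E_{k−1}, ρ = ρ_{k−1} = ρ_k, m = m_k, and ε = ε_k (a sketch, using r = 2λ + r_0 and the lower bound Ψ_2^{-1}(m_k − 1) ≥ λ^{k+ℓ}/2):

```latex
\varepsilon_k + F(E_{k-1})
  \ge F(E_k, \rho_k/r^2) + (\rho_k/r^2)\,\Psi_2^{-1}(m_k - 1)
  \ge F(E_k, \rho_k/r^2) + \frac{\rho_k \lambda^{k+\ell}}{2(2\lambda + r_0)^2}
  = F(E_k, \rho_k/r^2) + c_0 \rho_k \lambda^k .
```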
Each finite stretch B_1, B_2, . . . of consecutive big E_k's is interspersed with a stretch of small sets. In fact one stretch of small E_k's might be infinite, in which case there are no more big sets. The pattern looks like:

    B_1    S_1      B_2    S_2    B_3      S_3      B_4      S_4    B_5    S_5
    bbb    sssss    bb     ss     bbbbb    sssss    bbbbb    sss    b      sssss . . .
      ↓                ↓             ↓                 ↓             ↓
     k_1              k_2           k_3               k_4           k_5

The index k_j = k(j) gives the position of the last big E_k in block B_j. The set K = {k_1, k_2, . . .} might be finite or infinite.
I assert that

Σ_{k≥1} ρ_k λ^k ≤ c_1 Σ_{k∈K} ρ_k λ^k

for a constant c_1 that depends only on λ. Indeed, if k_j = α then ρ_α λ^α dominates the sum of all the terms in blocks B_j and S_j, in the sense that

Σ_{k∈B_j} ρ_k λ^k ≤ ρ_α (λ^α + λ^{α−1} + . . . ) ≤ ρ_α λ^α/(1 − 1/λ),
Σ_{k∈S_j} ρ_k λ^k ≤ (λ^{α+1}ρ_α/r + λ^{α+2}ρ_α/r² + . . . ) ≤ ρ_α λ^α/(1 − λ/r).

Now you know why I made sure that r > 2λ. If #K ≤ 2 then Σ_{k∈K} ρ_k λ^k ≤ 4F(T)/c_0. The case where #K > 2 is more interesting; it involves great-grandchildren.
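Adding the two geometric bounds over all blocks gives an explicit value for c_1 (my bookkeeping; the text leaves the constant implicit). With r = 2λ + r_0 > 2λ the second factor is below 2:

```latex
\sum_{k\ge 1} \rho_k\lambda^k
  \le \sum_j \Bigl(\sum_{k\in B_j} + \sum_{k\in S_j}\Bigr)\rho_k\lambda^k
  \le \Bigl(\frac{1}{1-1/\lambda} + \frac{1}{1-\lambda/r}\Bigr)
      \sum_{j} \rho_{k_j}\lambda^{k_j},
\qquad\text{so } c_1 = \frac{\lambda}{\lambda-1} + \frac{1}{1-\lambda/r}
  \le \frac{\lambda}{\lambda-1} + 2 .
```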
Write f_k for F(E_k, ρ_k/r²). For big E_k, inequality <26> becomes

ε_k + F(E_{k−1}) ≥ f_k + c_0 ρ_k λ^k.

Suppose α = k_j and β = k_{j+2}. Both E_α and E_β are big and between them are at least two small E_k sets: at worst bsbsb. We have ρ_α/r² ≥ ρ_β = ρ_{β−1} and

ε_β + F(E_{β−1}) ≥ f_β + c_0 ρ_β λ^β.
The set E_{β−1} is either a great-grandchild of E_α or a descendant of such a great-grandchild. In any case,

f_α = F(E_α, ρ_α/r²)
    ≥ F(E_α ∩ B[τ(E_{β−1}), ρ_{β−1}])
    ≥ F(E_{β−1})        because E_{β−1} ⊆ E_α ∩ B[τ(E_{β−1}), ρ_{β−1}]
    ≥ f_β + c_0 ρ_β λ^β − ε_β.

That is,

c_0 ρ_{k(j+2)} λ^{k(j+2)} ≤ ε_{k(j+2)} + f_{k(j)} − f_{k(j+2)}.
If we sum over all odd values of j the right-hand side telescopes, leaving something smaller than a constant multiple of F(T). The argument for even values is similar. The sum Σ_{k∈K} ρ_k λ^k is bounded above by a constant multiple of F(T).
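In symbols, the odd-j half of the telescoping looks like this (a sketch, using Σ_{k≥1} ε_k = F(T), f_k ≥ 0, and f_{k(1)} ≤ F(T)):

```latex
c_0 \sum_{j\ \text{odd}} \rho_{k(j+2)}\lambda^{k(j+2)}
  \le \sum_{j\ \text{odd}} \bigl(\varepsilon_{k(j+2)} + f_{k(j)} - f_{k(j+2)}\bigr)
  \le \sum_{k\ge 1}\varepsilon_k + f_{k(1)}
  \le 2F(T).
```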
7.7 Uniformly continuous Gaussian sample paths
Suppose X = {X_t : t ∈ T} is a centered Gaussian process, with T countable. Even without Gaussianity, Example <20> shows that existence of a (Ψ, λ)-admissible sequence π for which γ_λ(T, π, p) → 0 as p → ∞ implies uniform continuity of the sample paths. The converse is also true.
By virtue of Lemma <10> and Example <12> it suffices to prove an analogous result for majorizing probability measures.
<27> Theorem. If X is a centered Gaussian process with uniformly continuous sample paths and F(T) < ∞ then there exists a Borel probability measure µ on T for which

sup_{t∈T} ∫_0^δ Ψ_2^{-1}(1/µB[t, r]) dr → 0 as δ → 0.
Remark. The following argument is based on a proof by Talagrand (1987, pages 122–124), who attributed the method of proof to Fernique. See Talagrand (2014, Appendix B) for the corresponding proof via admissible sequences of partitions. The proof combines the virtues of packing numbers and majorizing measures.

The assumption that F(T) is finite can be weakened to an assumption that T is totally bounded under its d_X metric, because uniformly continuous functions on totally bounded sets are necessarily bounded.
Proof. Uniform continuity of sample paths implies that

osc(δ, X, T) := sup{|X_s − X_t| : d(s, t) < δ} → 0 almost surely as δ → 0.

And the function sup_{t∈T} |X_t| is integrable, because

P sup_{t∈T} |X_t| ≤ P sup_{t∈T} X_t + P sup_{t∈T} (−X_t) = 2F(T) < ∞.

It follows by Dominated Convergence that β_k := P osc(δ_k, X, T) → 0 for any sequence δ_k → 0.
Finiteness of F(T) and Sudakov's inequality together imply that T is totally bounded. As a consequence, for each δ_k there is a partition π_k of T into finitely many, say N_k, sets of diameter at most δ_k.
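For a finite index set, such a partition can be produced from a greedy net, as in this sketch (the function net_partition and its greedy strategy are my illustration, not a construction from the text):

```python
def net_partition(T, d, delta):
    """Partition the finite set T into pieces of diameter <= delta by
    assigning each point to the nearest center of a greedy (delta/2)-net."""
    centers = []
    for t in T:
        # keep t as a center only if it is far from all existing centers
        if all(d(t, c) > delta / 2 for c in centers):
            centers.append(t)
    cells = {c: set() for c in centers}
    for t in T:
        # every point lies within delta/2 of some center, so each cell
        # has diameter at most delta
        cells[min(centers, key=lambda c: d(t, c))].add(t)
    return list(cells.values())
```

Each cell sits inside a closed ball of radius δ/2 around its center, which forces the diameter bound.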
For each member E of the partition π_k choose a point τ(E). Then

F(E) := P sup_{t∈E} X_t = P sup_{t∈E} (X_t − X_{τ(E)}) ≤ β_k,

the final inequality coming from the fact that d_X(t, τ(E)) ≤ diam(E) ≤ δ_k. Through the equivalences noted in Theorem <8>, there exists a Borel probability measure µ_E on E for which

sup_{t∈E} ∫_0^{diam(E)} Ψ_2^{-1}(1/µ_E B[t, r]) dr ≤ C β_k,

where C is a universal constant. Define a probability measure on T by

µ := Σ_{k≥1} (2^k N_k)^{-1} Σ_{E∈π_k} µ_E.
To show that µ has the property asserted by the Theorem, consider a fixed t in T. To simplify notation, abbreviate E_k(t) to E_k and µ_{E_k(t)} to µ_k. Also remember that Ψ_2^{-1}(uv) ≤ Ψ_2^{-1}(u) + Ψ_2^{-1}(v) for all u, v ≥ 0. Then for each k and each δ ≤ δ_k,

∫_0^δ Ψ_2^{-1}(1/µB[t, r]) dr ≤ ∫_0^δ Ψ_2^{-1}(2^k N_k / µ_k B[t, r]) dr        because µ ≥ (2^k N_k)^{-1} µ_k
    ≤ ∫_0^δ Ψ_2^{-1}(2^k N_k) dr + ∫_0^{δ_k} Ψ_2^{-1}(1/µ_k B[t, r]) dr
    ≤ δ Ψ_2^{-1}(2^k N_k) + C β_k + δ_k Ψ_2^{-1}(1).

The last inequality holds because µ_k B[t, r] = 1 for r ≥ diam(E_k). The upper bound is uniform in t. The rest is easy. Given ε > 0, just choose k then δ to make the upper bound less than ε.
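The subadditivity fact Ψ_2^{-1}(uv) ≤ Ψ_2^{-1}(u) + Ψ_2^{-1}(v) can be spot-checked numerically for the standard choice Ψ_2(x) = exp(x²) − 1 (my assumption; the chapter's Ψ_2 may differ by constants). It follows from log(1 + uv) ≤ log(1 + u) + log(1 + v) for u, v ≥ 0 together with √(a + b) ≤ √a + √b.

```python
import itertools
import math

def psi2_inv(y):
    # inverse of the assumed Young function Psi_2(x) = exp(x**2) - 1
    return math.sqrt(math.log(1.0 + y))

# check Psi_2^{-1}(u v) <= Psi_2^{-1}(u) + Psi_2^{-1}(v) on a grid
grid = [0.0, 0.25, 1.0, 3.0, 10.0, 1e3, 1e6]
assert all(psi2_inv(u * v) <= psi2_inv(u) + psi2_inv(v) + 1e-12
           for u, v in itertools.product(grid, repeat=2))
```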
7.8 Problems
[1] Suppose the Young function Ψ satisfies the growth condition

Ψ(x)² ≤ Ψ(cx) whenever x ≥ x_0

for some constant c ≥ 2. For some α > 0, show that Ψ(y)/exp(y^α) → ∞ as y → ∞. Hint: For some b > x_0 we have Ψ(b) ≥ e. Show that Ψ(c^k b) ≥ exp(2^k). Consider y with c^{k+1}b > y ≥ c^k b.
[2] Suppose µ is a probability measure that concentrates on a countable subset T_1 of T and γ(T_1, µ) < ∞. It can also be regarded as a probability measure on T. Show that γ(t, µ, δ) ≤ sup_{s∈T_1} γ(s, µ, δ) for each t ∈ T and 0 < δ ≤ 1. Hint: If t ∈ T and s ∈ T_1 with d(s, t) < εD then B[t, (r + ε)D] ⊃ B[s, rD].
[3] Show that F(T) ≥ D/√(2π). Hint: P max(X_s, X_t) = P X_s + P(X_t − X_s)^+.
[4] Under the conditions of Theorem <27>, prove that δ Ψ_2^{-1}(pack(δ, T, d)) → 0 as δ → 0.
[5] (Repeated from Chapter 4.) The classical bounds (Feller, 1968, Section VII.1 and Problem 7.1) show that the normal tail probability Φ̄(x) = P{N(0, 1) > x} behaves asymptotically like φ(x)/x. More precisely, 1 − x^{−2} ≤ xΦ̄(x)/φ(x) ≤ 1 for all x > 0. Less precisely, −log(c_0 Φ̄(x)) = ½x² + log x + O(x^{−2}) as x → ∞, where c_0 = √(2π).

(i) (Compare with Leadbetter et al. 1983, Theorem 1.5.3.) Define a_n = √(2 log n) and L_n = log a_n. For each constant τ define m_{τ,n} = a_n − (1 + τ)L_n/a_n. Show that

c_0 Φ̄(m_{τ,n}) = n^{−1} exp(τL_n + o(1)).

(ii) Define M_n = max_{i≤n} Z_i for iid N(0, 1) random variables Z_1, Z_2, . . . . Show that

P{M_n ≤ m_{τ,n}} = (1 − Φ̄(m_{τ,n}))^n = exp(−a_n^τ (c_0^{−1} + o(1))).

(iii) If n increases rapidly enough, show that there exists an interval I_n of length o(L_n/a_n) containing the point a_n − L_n/a_n for which P{M_n ∉ I_n} → 0 very fast.
7.9 Notes
What more can I say? Everything in this Chapter resulted from my attempts to understand some of the work by Talagrand on majorizing measures. What I have covered amounts to parts of two (out of sixteen) chapters in the definitive book by Talagrand (2014).
References

Feller, W. (1968). An Introduction to Probability Theory and Its Applications (third ed.), Volume 1. New York: Wiley.

Fernique, X. (1975). Régularité des trajectoires des fonctions aléatoires gaussiennes. Springer Lecture Notes in Mathematics 480, 1–97. École d'Été de Probabilités de Saint-Flour IV—1974.

Leadbetter, M. R., G. Lindgren, and H. Rootzén (1983). Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag.

Ledoux, M. and M. Talagrand (1991). Probability in Banach Spaces: Isoperimetry and Processes. New York: Springer.

Talagrand, M. (1987). Regularity of Gaussian processes. Acta Mathematica 159, 99–149.

Talagrand, M. (1992). A simple proof of the majorizing measure theorem. Geometric and Functional Analysis 2(1), 118–125.

Talagrand, M. (1996). Majorizing measures: The generic chaining. Annals of Probability 24, 1049–1103.

Talagrand, M. (2001). Majorizing measures without measures. The Annals of Probability 29(1), 411–417.

Talagrand, M. (2005). The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer-Verlag.

Talagrand, M. (2014). Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer-Verlag.