
Dynamics and Number Theory: an introduction through the lens of fibred systems

Joseph Vandehey

Fall, 2015

Contents

1 Introduction: the whys and wherefores

2 A first look at fibred systems
2.1 Base-b expansions
2.2 Dynamical systems
2.2.1 Decimal expansions as a dynamical system
2.3 Fibred systems
2.3.1 Admissible sequences and full cylinders
2.3.2 Proving measure-preserving-ness and cylinders
2.3.3 Relating x to $T^n x$, or, When is an expansion an expansion?
2.4 More examples of fibred systems
2.4.1 Base-b expansions
2.4.2 β-expansions
2.4.3 The Lüroth series
2.4.4 Generalized Lüroth series
2.4.5 Bernoulli shifts
2.4.6 The Baker’s transformation
2.4.7 Engel’s series
2.4.8 Continued fractions
2.4.9 Even more continued fractions

3 Ergodicity
3.1 What does ergodicity mean?
3.2 The ergodic theorem
3.3 Normality and equidistribution
3.4 Proving ergodicity
3.4.1 Ergodicity of continued fraction expansions
3.4.2 Ergodicity for β-expansions
3.4.3 A wrap-up on ergodicity

4 Normal numbers
4.0.4 A quick refresher on asymptotic notation
4.1 The combinatorial method: Copeland-Erdős
4.1.1 The combinatorial method for other systems
4.2 The analytic method: Davenport-Erdős
4.3 The rational method: Bailey-Crandall

5 The set of normal numbers
5.1 The Pyatetskii-Shapiro normality criterion
5.1.1 Applications of the normality criterion
5.2 Augmented systems
5.2.1 Applications of augmented systems

6 Hyperbolic geometry
6.1 Flows
6.1.1 Billiard flow
6.1.2 Special flow under a ceiling function
6.1.3 Cross-sections
6.2 Natural extensions
6.2.1 The natural extension of the continued fraction map
6.3 Hyperbolic geometry
6.3.1 Lines, lengths and geodesics
6.3.2 The unit tangent bundle and geodesic flow
6.4 Areas and integrals
6.5 Fuchsian groups, lattices, and fundamental regions
6.6 Ergodicity of the geodesic flow
6.7 Continued fractions and geodesics
6.7.1 Building up the cross-section
6.7.2 Applications of this connection
6.7.3 Other lattices, other expansions

7 Explicit ergodic estimates, operators, and mixing
7.1 Estimates on rates of convergence
7.1.1 Applications
7.2 The Perron-Frobenius operator
7.2.1 Examples
7.2.2 The Koopman operator
7.2.3 The spectral decomposition theorem
7.3 Mixing estimates for continued fractions
7.4 The quest for ACIMs
7.4.1 Using Perron-Frobenius to find ACIMs
7.4.2 The Lasota-Yorke method

Chapter 1

Introduction: the whys and wherefores

Growing up as a mathematician in a decidedly non-mathematical family, I have often had to face the dreaded question: “So, what exactly is it you do?” At one point, my mother decided to ask me this question while I was in my first semester of graduate school, half-way through a course about modular forms. About five minutes into my fifteen-minute answer she was begging me to stop.

It was not a good answer.

Since then, I’ve been practicing to make my answer more accessible. Rather than try to describe the high-level, deep results we try to achieve, I start with the simple questions that we cannot, for whatever reason, answer.

So consider π. Most people with some college-level math under their belt will say that π is about 3.14. Math lovers probably know more, and might say π is about 3.1415926. They might even have competitions memorizing as many digits as they can, or writing piems: poems which constrain the number of letters in each word by the successive digits of π.

Here comes the simple question: “How many 7’s are there in the decimal expansion of π?” We might expect that roughly one-tenth of the digits of π should be 7, but in fact, we don’t even know if there are infinitely many 7’s in the expansion of π. It’s possible that once we get far enough in the expansion, π is 7-less. Worse still, this question is likely quite hard: unlike in sieve methods, where we can sometimes clearly identify the impediment to solving the problem, here we are not close enough to a solution to even clearly understand why we cannot solve it.

These notes are largely concerned with the techniques we have developed to try to answer this question, unsuccessful though they may be in actually giving an answer to it. More broadly, these notes will give an overview of a particular intersection between number theory, dynamical systems, and ergodic theory, as seen through the lens of fibred systems: expansions, such as decimal expansions and continued fractions. If pressed, we may call this the ergodic theory of numbers.

In an effort to make these notes more useful, although we will frequently talk about questions regarding fibred systems, we will touch on a number of techniques and how they may be useful in other problems as well.

Finally, we remark on books that have influenced these notes. In particular, there is “Ergodic Theory of Numbers” by Dajani and Kraaikamp, “Ergodic Theory with a view towards Number Theory” by Einsiedler and Ward (available as a free e-book through UGA), “Uniform Distribution of Sequences” by Kuipers and Niederreiter, “Distribution Modulo One and Diophantine Approximation” by Bugeaud, “Ergodic Theory of Fibred Systems and Metric Number Theory” by Schweiger, and “Metric Number Theory” by Harman.

Chapter 2

A first look at fibred systems

2.1 Base-b expansions

We start in a familiar place: good, old decimal expansions. We learn in elementary school that when we write “394” we mean “three hundreds, nine tens, and four ones” or

\[ 3 \times 100 + 9 \times 10 + 4 \times 1. \]

Maybe we think of place-values in terms of powers and instead write this as

\[ 3 \times 10^2 + 9 \times 10^1 + 4 \times 10^0. \]

We also learn, at some point, that when we write out π = 3.14..., we mean that π is approximately

\[ 3 \times 10^0 + 1 \times 10^{-1} + 4 \times 10^{-2}. \]

And by “approximately” here, we mean that π is between 3.14 and 3.15.

But all of this rather dances around the main idea of the decimal expansion. So let us call

\[ x = \sum_{n=-k}^{\infty} \frac{a_n}{10^n}, \qquad a_n \in \{0, 1, \dots, 9\} \tag{2.1} \]

for some $k \in \mathbb{N}_{\ge 0}$, a decimal expansion for a non-negative real number $x$. The $a_n$’s we refer to as the digits, and the set $D = \{0, 1, \dots, 9\}$ as the digit set.

This leads to a number of questions:

(Q1) Does such an expansion exist for all non-negative numbers x? If not, where does it fail?

(Q2) Is this expansion unique for all non-negative numbers x? If not, where does it fail?

(Q3) What properties of the number x (such as “being rational”) relate to properties of the decimal expansion for x (such as “being eventually periodic”)?

(Q4) Assuming existence and uniqueness, how does one find the digits of the expansion?

The decimal expansion is so common that I suspect several of the answers to these questions have already started leaping into the forefront of your mind. For example, the answer to (Q1) is “Yes, everywhere.”

But to emphasize how non-trivial these questions can be, what happens if I replace the 1 in the digit set with 11? Can we still find an expansion of the form (2.1) for all non-negative numbers? No, because 13/100 now has no expansion. (We would have to have that $a_1 = 0$ and from there on, even if $a_k = 11$ for all $k \ge 2$, the expansion would be smaller than 13/100.)
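To spell out the arithmetic behind this claim (a worked check, taking the digit set to be $\{0, 11, 2, 3, \dots, 9\}$ as described above): every non-zero digit is at least 2, so a non-zero digit in the tenths place already contributes at least 2/10 > 13/100, and a non-zero digit in the ones place or higher contributes at least 2. All of those digits must therefore be 0, and the best the remaining places can do is

```latex
\[
  \sum_{k=2}^{\infty} \frac{11}{10^k}
  = 11 \cdot \frac{10^{-2}}{1 - 10^{-1}}
  = \frac{11}{90}
  \approx 0.1222
  < 0.13 = \frac{13}{100}.
\]
```

So no choice of digits reaches 13/100.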

What if we take the digit set to be $D = \{-1, 0, 1, 2, \dots, 8\}$? Here, perhaps a bit surprisingly, all real x’s have an expansion; yes, even the negative ones. Don’t worry if it’s not obvious why. We’ll come back to this one after we’ve built up a bit more notation. By the way, a close cousin of this expansion, the balanced ternary expansion, was briefly considered for use in computers by the Soviets, because, among other things, it requires fewer carries when adding than regular ternary.

More questions of this type have been studied by Bruce Reznick and several of his students.

But let’s get back to the answers to our questions for good, old decimal.

(A1) Yes, everywhere.

(A2) No. Any number (besides zero itself) that has an expansion that ends in repeated 0’s has an expansion that ends in repeated 9’s as well, in much the way that $1 = 0.\overline{9}$. These are the only exceptions, however.

(A3) There are many, many answers to this. One, as hinted in the question, is well known: all rational numbers have eventually periodic decimal expansions. A number x can be expressed as a rational number $a/10^b$, $a, b \in \mathbb{N}$, if and only if it has an expansion that ends in repeated 0’s. Another one that is not as commonly known, although still quick: if there exist infinitely many $N$ such that $a_{N+i} = 0$ for $0 \le i \le N^2$, and the expansion does not end in repeated 0’s, then x is a Liouville number; that is, for every $n \ge 1$ there exist infinitely many rationals $p/q$ in lowest terms with $0 < |x - p/q| \le 1/q^n$.

(A4) Let’s demonstrate how to do this with a specific x, namely $\sqrt{2}$. Let’s assume we know that $\sqrt{2}$ is between 0 and 10 exclusive, so that we know there is no digit in the tens place or greater. How do we know that the ones digit of $\sqrt{2}$, that is $a_0$, is 1? Because we know that $x^2 = 2$, so we test each possible value of $a_0$ from 0 to 9 and see that $1^2 < 2$ but $2^2 > 2$.

Okay, now what about the first digit, $a_1$? Again we can test all the possible values 1.0, 1.1, 1.2, ..., 1.9 and we see that $1.4^2 < 2$ but $1.5^2 > 2$, so $a_1 = 4$.

Next $a_2$. We test 1.40, ..., 1.49 and see that $1.41^2 < 2$ but $1.42^2 > 2$, so $a_2 = 1$.

And so on.
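The trial-and-error procedure in (A4) can be sketched in a few lines of Python. This is only an illustration, not part of the notes: the function name `sqrt_digits_by_search` is made up here, and exact rational arithmetic is used so that the comparisons with 2 are not affected by rounding.

```python
from fractions import Fraction

def sqrt_digits_by_search(target, num_digits):
    """Return [a_0, a_1, ..., a_num_digits] for sqrt(target), found by testing
    candidate digits as in (A4). Assumes 0 <= sqrt(target) < 10."""
    approx = Fraction(0)              # best lower approximation found so far
    digits = []
    for n in range(num_digits + 1):
        place = Fraction(1, 10 ** n)  # value of the n-th decimal place
        # Largest digit c whose trial value still squares to at most target.
        d = max(c for c in range(10) if (approx + c * place) ** 2 <= target)
        digits.append(d)
        approx += d * place
    return digits

print(sqrt_digits_by_search(2, 5))    # [1, 4, 1, 4, 2, 1]
```

Each pass of the loop repeats the same ten tests one decimal place further to the right, which is the sense in which every step "looks almost the same".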

Answer (A4) here is giving a little hint of things to come. We are following an algorithm, and at each step, the algorithm looks almost the same as what we did before. In fact, we can alter it so we truly do the same thing at every stage.

Definition 2.1.1. Let $\lfloor x \rfloor$, the floor of x or the integer part of x, denote the unique integer $n$ such that $n \le x < n + 1$. Let $\{x\}$, the fractional part of x, denote $x - \lfloor x \rfloor$.

First, let’s consider $x = \sqrt{2} - 1$ so that we already have a number that’s in the interval [0, 1), and let’s call this number $x_0$. To find $a_1$, multiply $x_0$ by 10 and let $a_1$ be the integer part of this point (the largest integer less than or equal to the point). Then let $x_1 = 10x_0 - a_1$. Note that $x_1 = \{10x_0\}$. Now to find $a_2$, multiply $x_1$ by 10 and let $a_2$ be the integer part of $10x_1$. Let $x_2 = 10x_1 - a_2 = \{10x_1\} = \{10^2 x_0\}$ and so on.

Now we truly do have an algorithm where we do the exact same thing at every stage. We have a dynamical system. So let’s talk about dynamical systems.
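Here is a minimal sketch of that "same thing at every stage" loop. Two assumptions are made for the sake of a runnable example: the helper name `decimal_digits` is invented here, and the starting point is the rational number 1/7 rather than $\sqrt{2} - 1$, because a rational can be stored exactly with Python's `Fraction` type.

```python
from fractions import Fraction
from math import floor

def decimal_digits(x0, num_digits):
    """Return [a_1, ..., a_num_digits] for x0 in [0, 1) by iterating
    a_n = floor(10 * x_{n-1}) and x_n = 10 * x_{n-1} - a_n."""
    x = Fraction(x0)
    digits = []
    for _ in range(num_digits):
        a = floor(10 * x)   # integer part of 10 * x_{n-1}
        x = 10 * x - a      # fractional part {10 * x_{n-1}}
        digits.append(a)
    return digits

print(decimal_digits(Fraction(1, 7), 12))  # [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7]
```

The loop body never changes from one digit to the next; only the current point `x` does.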

Exercise 2.1.2. Show that every positive real number has a decimal expansion. (Be careful, you need to show that the digits we gave will give an expansion which equals the original point.)

Exercise 2.1.3. Prove that if $x \in [0, 1)$ is a rational number which is of the form $a/b$ with $b$ relatively prime to 10 then the decimal expansion for x is purely periodic; that is, it can be written in the form $0.\overline{a_1 a_2 a_3 \dots a_k}$. (Hint: what values can $x_n$ take?)

2.2 Dynamical systems

A dynamical system as we think of it consists of several parts:

1. A (non-empty) space $X$. This can really be any set, but we will most often take this to be a subset of $\mathbb{R}^n$, commonly the interval $[0, 1)$ or unit boxes like $[0, 1)^n$.

2. A σ-algebra $\mathcal{F}$ on $X$. A σ-algebra is a collection of subsets of $X$ satisfying the following: $X \in \mathcal{F}$; if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$; and if $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$. On its own, the pair $(X, \mathcal{F})$ is referred to as a measurable space.

3. A measure $\mu$ on $(X, \mathcal{F})$, where here a measure means a function $\mu : \mathcal{F} \to [0, \infty]$ satisfying $\mu(\emptyset) = 0$ and $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$ for any disjoint collection of $A_n$’s from $\mathcal{F}$.

If $\mu(X) < \infty$, then we will almost always normalize $\mu$ (that is, multiply by a constant) so that $\mu(X) = 1$. In such a case we refer to $\mu$ as a probability measure and $(X, \mathcal{F}, \mu)$ as a probability space (or as a normalized finite measure space). Sometimes, however, we will be interested in measures with $\mu(X) = \infty$.

4. A transformation $T : X \to X$. We say that $T$ is measurable on the space $(X, \mathcal{F}, \mu)$ if for any $A \in \mathcal{F}$ we have $T^{-1}A \in \mathcal{F}$ as well. We say that $T$ is measure-preserving on $(X, \mathcal{F}, \mu)$ if $T$ is measurable, and if for any $A \in \mathcal{F}$, we have $\mu(T^{-1}A) = \mu(A)$.

Why all this talk about $T^{-1}$ instead of $T$? The best answer I have is that $T^{-1}$ essentially doesn’t lose information, whereas $T$ can. Think back to our decimal example earlier. If we have $x_1$, we have a lot of information about $x_0$, but not all of it. We’ve lost one digit and have no way to recover it. On the other hand, if I hand you a possible suite of $x_0$’s corresponding to some $x_1$, you still have all the possible information about $x_1$. Nothing has been lost.

A dynamical system is then a quadruple $(X, \mathcal{F}, \mu, T)$, where $X$ is a non-empty set, $\mathcal{F}$ is a σ-algebra on $X$, $\mu$ is a measure on $(X, \mathcal{F})$, and $T : X \to X$ is a surjective, $\mu$-measure-preserving transformation. (Sometimes we will be interested in dynamical systems that don’t preserve the measure, but are still measurable. We’ll point these out specifically when we encounter them. Non-surjective transformations will probably never appear in this class.)

At this point, we will basically stop talking about σ-algebras. The reason for this is that almost all of our spaces $X$ will be subsets of $\mathbb{R}^n$, and almost all of our $\mathcal{F}$’s will just be the Lebesgue σ-algebra on $X$. To briefly describe what the Lebesgue σ-algebra is: the Borel σ-algebra on a subset $X$ of $\mathbb{R}^n$ is the smallest σ-algebra that contains all open rectangles. The Lebesgue σ-algebra is the σ-algebra generated by the Borel σ-algebra together with all subsets of Borel sets of Lebesgue measure 0; i.e., it is the completion of the Borel σ-algebra. For all purposes here, the properties of σ-algebras are really just designed so that measures exist with the desired properties.

Likewise, we will often, but not always, use the Lebesgue measure (denoted by $\lambda$ for subsets of $\mathbb{R}^1$) as our measure. In the same way that the Lebesgue σ-algebra is built up from rectangles, the Lebesgue measure is built up from the assumption that the measure of a rectangle is the product of the side lengths of the rectangle.

2.2.1 Decimal expansions as a dynamical system

So, as mentioned before, the decimal expansions form a dynamical system. Here is the system we use. $X$ is the interval $[0, 1)$. $\mathcal{F}$ is the usual Lebesgue σ-algebra. $\mu$ is the Lebesgue measure $\lambda$. And $T$ is the map $Tx = 10x \pmod 1 = \{10x\}$. (You’ll note we haven’t shown that $T$ preserves the Lebesgue measure, which is technically a part of our definition. We will put this off for a minute until some more results make it a bit easier.)

A very important object of study for a dynamical system is the sequence $\{T^n x\}_{n=0}^{\infty}$, the orbit of the point $x$. Recall that in our algorithm to determine the digits of the decimal expansion, in order to go from $x_n$ to $x_{n+1}$, we multiplied by 10 and took the fractional part. In other words, we applied $T$. So the terms $x_n$ in our algorithm were just the elements $T^n x$ of the orbit.

We also defined the digit $a_n$ to be the integer part of $10x_{n-1}$. But we can cut out the middle man, forget about multiplying by 10, and say that the digit $a_n$ equals the value of $i \in \mathbb{Z}$ for which $x_{n-1}$ is in the interval $[i/10, (i+1)/10)$.

So remember back in the introduction, we asked the question “How many 7’s are there in the decimal expansion of π?” Well, now we have a way of framing that question with our new terminology: we want to know how often the orbit of π − 3 lies in the interval $[7/10, 8/10)$.
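The reframing can be tried out numerically. The sketch below is an illustration under stated assumptions: it uses $\sqrt{2} - 1$ in place of π − 3, since the digits of $\sqrt{2}$ can be computed exactly with the standard library's `math.isqrt`, and the function names are invented for this example. A visit of $T^n x$ to $[d/10, (d+1)/10)$ is the same event as $a_{n+1} = d$, so counting visits is counting digits.

```python
from math import isqrt

def sqrt2_fractional_digits(num_digits):
    """Exact digits a_1, ..., a_N of sqrt(2): floor(10^N * sqrt(2)) equals
    isqrt(2 * 10^(2N)), whose decimal string is '1' followed by the digits."""
    s = str(isqrt(2 * 10 ** (2 * num_digits)))
    return [int(c) for c in s[1:]]

def orbit_visits(digits, d):
    """Number of n < len(digits) with T^n x in [d/10, (d+1)/10), i.e. a_{n+1} = d."""
    return sum(1 for a in digits if a == d)

N = 1000
digits = sqrt2_fractional_digits(N)
for d in range(10):
    # Fraction of the first N orbit points that land in the d-th interval.
    print(d, orbit_visits(digits, d) / N)
```

Running this only produces empirical frequencies for the first `N` orbit points; it says nothing about the limiting behaviour, which is exactly the open question.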

As a final note, this system is also useful because it already guarantees uniqueness of the expansion: we only ever get infinite 0’s at the end of an expansion, never infinite 9’s. We could swap that around by using the ceiling instead of the floor and replacing $[i/10, (i+1)/10)$ with $(i/10, (i+1)/10]$.

2.3 Fibred systems

Now we have the language of what a dynamical system is: it is a transformation on a nice space. However, not all dynamical systems are equally interesting. For example, the dynamical system given by $X = \mathbb{R}$, $\mu = \lambda$ the Lebesgue measure, and $Tx = x + 1$ is not that interesting of a system, broadly speaking. (Note we’ve already forgotten about the σ-algebra.)

Consider the dynamical system given for some $\alpha \in (0, 1)$ by $X = [0, 1)$, $\mu = \lambda$, and $Tx = x + \alpha \pmod 1$. This is not so interesting if $\alpha$ is rational, because there is some iterate $T^n$ which is just the identity. On the other hand, the case where $\alpha$ is irrational turns out to be a fairly interesting topic and a subject of modern research. We may come back to these “irrational rotation maps” later in the semester.
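A quick way to see the rational case is to iterate the rotation with exact fractions. The values $\alpha = 3/7$ and $x = 1/5$ below are arbitrary choices made for this sketch, as are the function names.

```python
from fractions import Fraction

def rotate(x, alpha):
    """One step of the rotation T x = x + alpha (mod 1) on [0, 1)."""
    return (x + alpha) % 1

def iterate(x, alpha, n):
    """Apply the rotation n times."""
    for _ in range(n):
        x = rotate(x, alpha)
    return x

alpha = Fraction(3, 7)
x = Fraction(1, 5)
print(iterate(x, alpha, 7) == x)  # True: 7 * (3/7) is an integer, so T^7 is the identity
```

With $\alpha = p/q$, applying $T$ a total of $q$ times adds the integer $p$, which disappears modulo 1.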

In this course, we will focus on a specific subset of dynamical systems which give rise to some way of expressing a number or point by an expansion, like a decimal expansion. These systems we call fibred systems (also called piecewise invertible dynamical systems).

For the purposes of this course, a fibred system consists of a dynamical system $(X, \mu, T)$ together with a digit set $D$ that is finite or countable, and an indexed collection $\mathcal{X} = \{X_d\}_{d \in D}$ of subsets of $X$, such that:

1. the sets $\{X_d\}_{d \in D}$ form a partition of $X$; that is, $\bigcup_{d \in D} X_d = X$ and $X_d \cap X_{d'}$ is empty if $d \ne d'$; and,

2. the restriction of $T$ to each $X_d$ is injective.

Our decimal expansion dynamical system is a fibred system with $D = \{0, 1, 2, 3, \dots, 9\}$ and $X_d = [d/10, (d+1)/10)$.

Now this isn’t the only way to create a fibred system from the dynamical system. If we take those intervals and shift them by 1/20 (modulo 1 of course), we get another partition that satisfies all the above restrictions. For that matter, we could take $D = \{0, 1, 2, \dots, 99\}$ and $X_d = [d/100, (d+1)/100)$, but these choices appear less natural. In general, we will study transformations $T$ that are piecewise continuous and we will assume, when it makes sense to do so, that the $X_d$ are maximal, connected sets on which $T$ is injective and continuous. Here, we mean maximal in the sense that there should not be a proper superset of $X_d$ which has these same properties.

However, recall our earlier questions about decimal expansions, especially (Q1): does every point in our space have an expansion that equals the original point? How can we answer this question for a generic $T$, like $Tx = 4x(1 - x)$ or $Tx = \sqrt{5}\,x \pmod 1$?

Recall from our discussion of the decimal expansion that $a_n = d$ if and only if $x_{n-1} \in [d/10, (d+1)/10)$, or, in other words, $T^{n-1}x \in X_d$. This leads to a very natural definition.

Definition 2.3.1. For an arbitrary fibred system $(X, \mu, T, D, \mathcal{X})$, let $a_n(x)$, the nth digit of $x$, be defined as the value of $d$ such that $T^{n-1}x \in X_d$. Moreover, let $\Omega = D^{\mathbb{N}}$, the set of all (countably) infinite sequences of digits, and let $\sigma : \Omega \to \Omega$ be the left-shift transformation, $\sigma([a_1, a_2, a_3, \dots]) = [a_2, a_3, a_4, \dots]$.

We define the symbolic representation map $\varphi : X \to \Omega$ to be given by $\varphi(x) = [a_1(x), a_2(x), a_3(x), \dots]$. We say that a fibred system is representative [1] if $\varphi$ is injective.
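Definition 2.3.1 translates almost word for word into code. The sketch below is illustrative only: it fixes the decimal system as the example, computes finite prefixes of the symbolic representation, and uses the hypothetical names `T`, `digit_of` and `phi`. Other fibred systems could be plugged in by passing a different map and a different partition lookup.

```python
from fractions import Fraction
from math import floor

def T(x):
    """Decimal map T x = 10 x (mod 1) on [0, 1)."""
    return (10 * x) % 1

def digit_of(x):
    """The d for which x lies in the partition piece X_d = [d/10, (d+1)/10)."""
    return floor(10 * x)

def phi(x, length, T=T, digit_of=digit_of):
    """First `length` entries of the symbolic representation of x:
    the n-th entry is the d with T^(n-1) x in X_d."""
    out = []
    for _ in range(length):
        out.append(digit_of(x))
        x = T(x)
    return out

x = Fraction(314159, 10 ** 6)
print(phi(x, 6))      # [3, 1, 4, 1, 5, 9]
print(phi(T(x), 5))   # [1, 4, 1, 5, 9], the left shift of the line above
```

The second line of output shows the left shift at work: the digits of $Tx$ are the digits of $x$ with the first one dropped.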

The importance of representativeness to studying expansions cannot be overstated: without it, it is possible to have two different points with the same exact representation in the system! This is the opposite of the problem we faced with decimal expansions, where we had potentially two ways of representing a given number, but could always pick one of them in a canonical fashion. Without representativeness, we could give an expansion for a point and not know which point it is a representative for.

To help us understand when a fibred system is representative, consider the branch of the map $T$ given by $T_d = T|_{X_d}$, which maps $X_d \to X$. We have that $a_1(x) = d$ if and only if $x \in X_d$, or, equivalently, $x \in T_d^{-1}X$. Let’s simplify notation a little and write $a_1$ instead of $a_1(x)$ if the choice of $x$ is clear, and just say that we always have that $x \in T_{a_1}^{-1}X$. Likewise, we have that $Tx \in T_{a_2}^{-1}X$, or, equivalently, $x \in T^{-1}T_{a_2}^{-1}X$. Going on in this manner, we see that we always have that $x \in T^{-k}T_{a_{k+1}}^{-1}X$ for all $k \ge 0$.

We can rewrite this in a slightly more compact form. Consider that $x \in T_{a_1}^{-1}X \cap T^{-1}T_{a_2}^{-1}X$. But we can just write this intersection as $T_{a_1}^{-1}T_{a_2}^{-1}X$. So in general $x \in T_{a_1}^{-1}T_{a_2}^{-1} \cdots T_{a_k}^{-1}X$ for any $k \ge 0$. We call the sets $T_{a_1}^{-1}T_{a_2}^{-1} \cdots T_{a_k}^{-1}X$ cylinder sets and write them as $C_s$ for $s = [a_1, a_2, \dots, a_k]$. We let $|s|$ denote the length of the string $s$ (in this case, $k$) and say $C_s$ is a cylinder of rank $k$.

[1] Schweiger says that “φ is a valid representation” instead.

To reemphasize, if $s = [a_1, a_2, \dots, a_k]$, then $x \in C_s$ if and only if $a_i(x) = a_i$ for $1 \le i \le k$. In other words, the cylinder sets are precisely the sets of points whose first so-many digits are specified. We could consider $s$ to be the empty string, in which case we let $C_s = X$.

Let us give some illustrative examples using the decimal expansion once again. Here $T^{-1}$ is very easy to calculate. As $Tx = 10x \pmod 1$, we have that $T^{-1}x$ is the set of points $\{i/10 + x/10 : 0 \le i \le 9\}$. Each choice of $i$ here corresponds to a different integer that can be subtracted when we take the mod-1. Thus $T_5^{-1}X = [5/10, 6/10)$, $T^{-1}T_5^{-1}X = \bigcup_{i=0}^{9} [i/10 + 5/100, i/10 + 6/100)$, and $C_{[3,1,4]} = [314/1000, 315/1000)$.
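These interval computations are easy to reproduce. The following sketch covers the decimal system only and uses names made up for the example (`cylinder`, `preimage`); intervals are stored as pairs of exact endpoints and understood as half-open.

```python
from fractions import Fraction

def cylinder(s):
    """Endpoints (lo, hi) of the decimal cylinder C_s = [lo, hi) for a digit list s."""
    k = len(s)
    lo = sum((Fraction(a, 10 ** (i + 1)) for i, a in enumerate(s)), Fraction(0))
    return lo, lo + Fraction(1, 10 ** k)

def preimage(interval):
    """T^{-1}[lo, hi) for T x = 10 x (mod 1): one interval per branch i = 0, ..., 9."""
    lo, hi = interval
    return [(Fraction(i, 10) + lo / 10, Fraction(i, 10) + hi / 10) for i in range(10)]

lo, hi = cylinder([3, 1, 4])
print(lo == Fraction(314, 1000), hi == Fraction(315, 1000))  # True True
print(preimage(cylinder([5]))[3])  # the i = 3 piece of T^{-1} T_5^{-1} X
```

In this system the length `hi - lo` of a rank-$k$ cylinder is always $10^{-k}$.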

Theorem 2.3.2. Consider a fibred system $(X, \mu, T, D, \mathcal{X})$ and suppose $X$ is a bounded metric space. Then the system is representative if for every $x \in X$ we have that

\[ \lim_{\substack{|s| \to \infty \\ x \in C_s}} \operatorname{diam} C_s = 0. \tag{2.2} \]

Note that for our decimal expansion fibred system, we have that the diameter of $C_s$ equals $10^{-|s|}$, confirming that the system is representative.

Definition 2.3.3. A metric space is a space $X$ together with a distance $d : X \times X \to \mathbb{R}$, which satisfies:

1. d(x, y) ≥ 0;

2. d(x, y) = 0 if and only if x = y;

3. d(x, y) = d(y, x); and,

4. d(x, z) ≤ d(x, y) + d(y, z).

Since we will often be interested in taking $X$ to be a subset of $\mathbb{R}^n$, we note that the standard Euclidean distance satisfies all the above conditions.

The diameter of a set, $\operatorname{diam} X$, is given by $\sup_{x, y \in X} d(x, y)$.

Proof of Theorem 2.3.2. Suppose the system is not representative. Then there exist two points $x, y \in X$ with $x \ne y$ and $\varphi(x) = \varphi(y)$. Fix these two points.

Since $\varphi(x) = \varphi(y)$, then if $x \in C_s$ we must also have $y \in C_s$, since $x$ and $y$ share their first $|s|$ digits (and many more besides). But $d(x, y) > 0$, say $d(x, y) = \delta > 0$. Thus $\operatorname{diam} C_s \ge \delta$ for any $s$ such that $x \in C_s$ and so (2.2) does not hold.

As an interesting outlier example, consider the case of rotations again, where $X = [0, 1)$ and $Tx = x + \alpha \pmod 1$ for some $\alpha \in (0, 1)$. Then this is a fibred system with a single digit 0 and a single $X_0 = X$. Therefore every cylinder set $C_s = X$ and so this is not representative (which makes sense because there is only one possible expansion but infinitely many points).


2.3.1 Admissible sequences and full cylinders

The problem with using the decimal expansion as our go-to example of a fibred system is that it is, in many ways, too nice. There are many things which are true for the decimal expansion which are absolutely untrue for any other system, and there are facts which are easy to prove for the decimal expansion (see: mixing of all orders) which give mathematicians nightmares to consider for any other system.

Definition 2.3.4. A cylinder $C_s$ for a fibred system $(X, \mu, T, D, \mathcal{X})$ is said to be full if $T^{|s|}C_s = X$.

This seems like a silly definition for the decimal expansion since all cylinders are full. (Recall that $C_s$ is an interval of length $10^{-|s|}$ and $T^{|s|}$ acts by multiplying by $10^{|s|}$ modulo 1.) However, this is not a property that we should expect to hold in all cases. It will for some interesting systems and it won’t for other interesting systems.

Proposition 2.3.5. Suppose in a given fibred system $(X, \mu, T, D, \mathcal{X})$, all rank-1 cylinders are full. Then all cylinders are full.

Proof. The maps $T_d : X_d \to X$ are, by assumption, injective. However, as all the rank-1 cylinders are full, they must also be surjective and hence bijective. Since $T_d$ is bijective, so must $T_d^{-1}$ be bijective.

Consider an arbitrary string $s = [a_1, a_2, \dots, a_k]$. Then $T_{a_1}^{-1}T_{a_2}^{-1} \cdots T_{a_k}^{-1}$, by definition, maps $X$ onto $C_s$. Moreover, this is a composition of bijective functions so is bijective. Thus, its inverse, $T_{a_k}T_{a_{k-1}} \cdots T_{a_1}$, maps $C_s$ onto $X$, and this is equivalent to the restriction of $T^k$ to $C_s$, showing that $C_s$ is full.

Exercise 2.3.6. Give an example of a fibred system such that a non-full cylinder contains a full cylinder (of a different rank) as a subset.

A related problem that can come up is that a cylinder might be empty.

Definition 2.3.7. A string $s$ is said to be admissible if $C_s$ is non-empty.

Again, a silly definition for the decimal expansions; all cylinders are full and hence non-empty, thus all strings are admissible. But again, not a property we should always expect to hold. Roughly speaking, a string is admissible if it shows up in the expansion of some point $x$.
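For a system where these two definitions do real work, here is a short worked sketch using the golden-ratio map, a special case of the β-expansions listed as Section 2.4.2 in the contents. The choice of $\beta$ and of the two-piece partition below is an assumption made for illustration; it has not yet been developed at this point in the notes.

```latex
% Assumed example: \beta = (1 + \sqrt{5})/2, so \beta^2 = \beta + 1 and 1/\beta = \beta - 1.
\[
  Tx = \beta x \pmod 1 \ \text{ on } X = [0, 1), \qquad
  X_0 = [0, 1/\beta), \quad X_1 = [1/\beta, 1).
\]
\[
  T C_{[0]} = [0, 1) = X, \qquad
  T C_{[1]} = [0, \beta - 1) = [0, 1/\beta) \neq X,
\]
\[
  C_{[1,1]} = \{ x \in X_1 : Tx \in X_1 \} = \emptyset .
\]
```

So under these assumptions $C_{[0]}$ is full, $C_{[1]}$ is not, and the string $[1, 1]$ is not admissible, since every point of $X_1$ is sent into $X_0$.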

We noted before that $C_s$ is precisely the set of $x \in X$ such that the first $|s|$ digits of $x$ are the string $s$. What happens when we apply $T$ or $T^{-1}$?

$T^{-1}$ acts in the expected way. If $s = [a_1, a_2, \dots, a_k]$, then $T^{-1}C_s$ is precisely the set of points $x \in X$ such that the digits of $x$ starting from $a_2(x)$ form the string $s$. We can see this because $T^{-1}C_s = \bigcup_{d \in D} T_d^{-1}C_s = \bigcup_{d \in D} C_{[d, a_1, a_2, \dots, a_k]}$.

On the other hand $T$ does not act as one might expect. $TC_s$ may not be equal to $C_{[a_2, a_3, \dots, a_k]}$; it need only be a subset. (This is obvious in the case where $|s| = 1$ and $C_s$ is not full.) Specifically speaking, it is the set of points whose expansions start with the string $[a_2, a_3, \dots, a_k]$, and which could have $a_1$ prepended to it and still be admissible.

Likewise, we can consider the effect that $T$ and $T^{-1}$ have on the expansion of a point $x$. Recall the map $\varphi$ which took $x$ to its sequence of digits $[a_1(x), a_2(x), \dots]$. We also have the left-shift map $\sigma$ on $\Omega$ which takes $[a_1, a_2, a_3, \dots]$ to $[a_2, a_3, \dots]$. Ironically, the way $T$ interacts with $\varphi$ is the opposite of how it interacts with $C_s$: that is, $T$ here is nicer than $T^{-1}$. In particular, since $TC_s \subset C_{\sigma(s)}$ (abusing notation here to let $\sigma$ operate on finite length strings), we see that $\varphi(Tx) = \sigma(\varphi(x))$, that is, $T$ acts on $x$ by shifting forward all of its digits. On the other hand, it is not necessarily true that $\varphi(T^{-1}x) = \sigma^{-1}(\varphi(x))$, because we cannot necessarily prepend any digit we want to a sequence and still have it be admissible. (If we restrict $\varphi$ and $\sigma$ to act on $\varphi(X)$ instead of all of $\Omega$, then it is always true that $\varphi(T^{-1}x) = \sigma^{-1}(\varphi(x))$.)

2.3.2 Proving measure-preserving-ness and cylinders

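Okay, so I’ve thrown a lot of definitions at you, but so far we haven’t really seen an easy way to prove one of the big properties we want: measure-preserving.

Consider the decimal expansion dynamical system and $T^{-1}(a, b)$:

\[
\lambda(T^{-1}(a, b))
= \lambda\left( \bigcup_{i=0}^{9} \left( \frac{i}{10} + \frac{a}{10}, \frac{i}{10} + \frac{b}{10} \right) \right)
= \sum_{i=0}^{9} \lambda\left( \left( \frac{i}{10} + \frac{a}{10}, \frac{i}{10} + \frac{b}{10} \right) \right)
= \sum_{i=0}^{9} \frac{b - a}{10}
= b - a = \lambda((a, b)).
\]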
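The same computation can be checked mechanically for any particular interval. This is a sketch with an invented helper name, `preimage_length`, and it assumes $0 \le a < b \le 1$ with exact rational endpoints.

```python
from fractions import Fraction

def preimage_length(a, b):
    """Lebesgue measure of T^{-1}(a, b) for T x = 10 x (mod 1): the total length
    of the ten disjoint intervals (i/10 + a/10, i/10 + b/10), i = 0, ..., 9."""
    pieces = [(Fraction(i, 10) + a / 10, Fraction(i, 10) + b / 10) for i in range(10)]
    return sum(hi - lo for lo, hi in pieces)

a, b = Fraction(1, 3), Fraction(5, 7)
print(preimage_length(a, b) == b - a)  # True
```

Each of the ten pieces has length $(b - a)/10$, so the check succeeds for every choice of endpoints, not just this one.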

So $T$ preserves the measure of intervals, but there are many more sets in the Lebesgue σ-algebra than intervals. And remember, in this class, we don’t like to deal with σ-algebras if we can avoid them.

So instead we start with semi-algebras.

Definition 2.3.8. We call a collection of sets $\mathcal{A}$ a semi-algebra if it is closed under finite intersections and the complement of any set in $\mathcal{A}$ is a finite disjoint union of elements in $\mathcal{A}$. We say that $\mathcal{A}$ generates $\mathcal{F}$ if $\mathcal{F}$ is the smallest σ-algebra containing $\mathcal{A}$.

As an example, the collection of all half-open intervals $[a, b)$ contained in $[0, 1)$, together with the empty set, is a semi-algebra, and that semi-algebra generates (after completion) the Lebesgue σ-algebra on $[0, 1)$. Here’s a useful fact about semi-algebras.


Theorem 2.3.9 (Theorem 1.2.7 in Dajani and Kraaikamp). Let $(X, \mathcal{F}, \mu)$ be a probability space, and let $\mathcal{A}$ be a semi-algebra that generates $\mathcal{F}$. For every $B \in \mathcal{F}$ and for every $\varepsilon > 0$ there exists a $C$ that is a finite disjoint union of elements from $\mathcal{A}$ such that $\mu(B \,\triangle\, C) < \varepsilon$.

Lemma 2.3.10. Let $(X, \mathcal{F}, \mu)$ be a probability space and assume $T$ is a measurable transformation on $X$. If $\mu(T^{-1}A) = \mu(A)$ for any $A$ in a semi-algebra $\mathcal{A}$ generating $\mathcal{F}$, then $T$ preserves $\mu$.

(I looked up several proofs of these two facts, but could not find one that didn’t require several definitions and subsidiary lemmas that we would never use again.)

This is great. It allows us to forget about all the complexity involved with σ-algebras and focus on proving measure-preserving only for the elements of a generating semi-algebra.

Remark 2.3.11. One can also show that completing a measure by tossing in all subsets of null-measure sets does not alter the properties of being measurable or measure-preserving.

As an exercise, convince yourself that the cylinder sets for the decimal expansion truly generate the Lebesgue σ-algebra. (Hint: just prove they generate all intervals.)

But in general, can we just take the cylinder sets to be a semi-algebra and prove that it suffices to check measure-preserving on cylinders? Maybe. The nice part about cylinders is that they satisfy part of the semi-algebra condition trivially. Two cylinder sets are either disjoint or one is a subset of the other. On the other hand, the complement-rule for semi-algebras will be satisfied for cylinder sets if and only if there are only finitely many digits.

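Proposition 2.3.12. Let $(X, \mu, T, D, \mathcal{X})$ be a fibred system. Assume that $D \subset \mathbb{N}$. Let $\mathcal{A}$ denote the following collection:

\[ \left\{ \bigcup_{d \ge j} C_{[a_1, a_2, \dots, a_k, d]} : j \in \mathbb{N} \right\}. \]

Then $\mathcal{A}$ is a semi-algebra.

Proof. Note that since $j$ can equal 1 and $\bigcup_{d \ge 1} C_{[a_1, a_2, \dots, a_k, d]} = C_{[a_1, a_2, \dots, a_k]}$, $\mathcal{A}$ includes all cylinder sets.

First let us consider complements. The complement of a set $A = \bigcup_{d \ge j} C_{[a_1, a_2, \dots, a_k, d]}$ is the (non-disjoint) union of all cylinder sets whose strings disagree with $[a_1, a_2, \dots, a_k, d]$ at some digit. In particular, $A^c$ can be expressed as the union over the following finite collection of sets, all of which are disjoint:

\[
\left\{ C_{[a_1, a_2, \dots, a_\ell, b]} : 0 \le \ell \le k - 1,\ b < a_{\ell+1} \right\}
\cup \left\{ C_{[a_1, a_2, \dots, a_k, d]} : d < j \right\}
\cup \left\{ \bigcup_{b \ge a_{\ell+1} + 1} C_{[a_1, a_2, \dots, a_\ell, b]} : 0 \le \ell \le k - 1 \right\}.
\]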


For intersections, consider two sets

\[ A = \bigcup_{d \ge j} C_{[a_1, a_2, \dots, a_k, d]} \quad\text{and}\quad B = \bigcup_{d \ge j'} C_{[a'_1, a'_2, \dots, a'_{k'}, d]}. \]

Without loss of generality, we may assume that $k \le k'$. If $a_i \ne a'_i$ for any $1 \le i \le k$, then it is clear the intersection A ∩ B is empty. If $a_i = a'_i$ for all $1 \le i \le k$ and if $k = k'$, then it is clear that A ∩ B will equal either A or B depending on whether $j$ or $j'$ is larger. If $a_i = a'_i$ for all $1 \le i \le k$ and if $k < k'$, then it is clear that A ∩ B = B if and only if $a'_{k+1} \ge j$, and otherwise A ∩ B = ∅.

It’s an important distinction to make (even though most of the time we’ll just check intervals/rectangles instead of cylinder sets). Also, note that this all only works if we have a probability space, and hence a finite measure.

What if we have an infinite measure? Do we have any recourse? Yes, and it turns out to still be (thankfully!) easy to deal with in many cases. We simply go back to the definition of a measure... but now in terms of integrals:

\[ \mu(A) = \int_X \chi_A \, d\mu = \int \chi_A \, d\mu, \]

where here $\chi_A$ is the characteristic function for A. So it suffices to show that

\[ \mu(T^{-1}A) = \int \chi_{T^{-1}A} \, d\mu = \int \chi_A \circ T \, d\mu \overset{?}{=} \int \chi_A \, d\mu = \mu(A). \]

The only step in question is whether the integral of $\chi_A \circ T$ equals the integral of $\chi_A$.

Let’s see this played out for the decimal expansion again. Assume A is any Lebesgue-measurable set, not necessarily an interval.

\[ \begin{aligned} \int_{[0,1)} \chi_A \circ T(x) \, dx &= \sum_{i=0}^{9} \int_{[i/10,(i+1)/10)} \chi_A(10x - i) \, dx \\ &= \sum_{i=0}^{9} \int_{[0,1)} \chi_A(x) \, \frac{dx}{10} \\ &= \int_{[0,1)} \chi_A(x) \, dx. \end{aligned} \]

The key step in proving this was applying a change of variables. This is normally no big deal when everything is finite and, in many cases, still quite manageable even when the measure is infinite.
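The decimal computation above can be sanity-checked on intervals with exact rational arithmetic. The following is only a sketch: the helper names `preimage_pieces` and `total_length` and the sample endpoints 1/7 and 3/5 are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def preimage_pieces(a, b, base=10):
    """Intervals [lo, hi) making up T^{-1}[a, b) for Tx = base*x mod 1.

    Each branch T_i^{-1} x = (i + x)/base contributes one piece,
    because every rank-1 cylinder [i/base, (i+1)/base) is full.
    """
    return [((i + a) / base, (i + b) / base) for i in range(base)]

def total_length(pieces):
    """Lebesgue measure of a disjoint union of intervals."""
    return sum(hi - lo for lo, hi in pieces)

# Hypothetical sample interval [1/7, 3/5); any 0 <= a < b <= 1 would do.
a, b = Fraction(1, 7), Fraction(3, 5)
pieces = preimage_pieces(a, b)
print(total_length(pieces) == b - a)   # each piece has length (b - a)/10
```

Each of the ten pieces is a copy of [a, b) shrunk by the factor 1/10, which is the change of variables in the displayed computation.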

Proofs of measure-preserving-ness necessarily are much easier when all the cylinders are full (did you catch where we used that fact in the second proof for decimal expansions?). Provided at least you can state what all the rank-1 cylinders are, however, the calculations may not be too bad. How do we do changes of variable for non-Lebesgue measures? Often, by turning them into Lebesgue measures via densities.

Definition 2.3.13. We say that ν ≪ µ (read as ν is absolutely continuous with respect to µ) if µ(A) = 0 implies that ν(A) = 0.

If these measures are σ-finite, then the Radon-Nikodym theorem says that we can write

\[ \nu(A) = \int_A f \, d\mu. \]

This function f is sometimes referred to as the density of ν if µ is the Lebesgue measure. Functionally speaking, the density allows one to perform operations over integrals dν by instead using f dµ.

If ν ≪ µ and µ ≪ ν, then we say the two measures are equivalent.

All this, of course, starts with the perspective of having a nice measure to begin with. What if we have a measure space and a transformation T: can we then prove that there exists a nice measure which is preserved by T? In general, yes, but we will likely come back to that later in the semester.

2.3.3 Relating x to $T^n x$; or, When is an expansion an expansion?

Now we’re faced with a small conundrum. What do we mean when we say we have a decimal expansion for a number x ∈ [0, 1)?

From the perspective we started with, we would say that x has a decimal expansion given by

\[ x = \sum_{n=1}^{\infty} \frac{a_n}{10^n}, \qquad a_n \in \{0, 1, 2, \dots, 9\}. \]

And, more precisely, we actually mean that there exists an infinite sequence $\{a_n\}_{n=1}^{\infty}$ with $a_n \in \{0, 1, \dots, 9\}$, such that

\[ x = \lim_{N \to \infty} \sum_{n=1}^{N} \frac{a_n}{10^n}. \]

In particular, the point to emphasize here is that the way we think of these expansions is that they are the limit of the finite truncations of the expansion.

In contrast, from the dynamical point of view, we have a very different notion of having an expansion. Here, if the system is representative, we might say that x has an expansion $[a_1, a_2, \dots]$ if x belongs to all the cylinder sets $C_{[a_1, a_2, \dots, a_k]}$, k ≥ 1. In fact, it should be the unique point with this property, and the cylinder sets are nested, so that $C_{[a_1]} \supset C_{[a_1, a_2]} \supset C_{[a_1, a_2, a_3]} \supset \dots$. So, thinking of this in a sense as convergence, we can rewrite this as

\[ \{x\} = \bigcap_{n=1}^{\infty} C_{[a_1, a_2, \dots, a_n]} = \lim_{N \to \infty} \bigcap_{n=1}^{N} C_{[a_1, a_2, \dots, a_n]}. \]

This is not ideal. It doesn’t quite jibe with the way we think of an expansion converging. So let’s try a different perspective, one that looks at x as a point rather than as a singleton set:

\[ x = T_{a_1}^{-1}(Tx) = T_{a_1}^{-1} \circ T_{a_2}^{-1}(T^2 x) = \dots \]

or, equivalently,

\[ x = \frac{a_1}{10} + \frac{Tx}{10} = \frac{a_1}{10} + \frac{a_2}{10^2} + \frac{T^2 x}{10^2} = \dots = \sum_{n=1}^{N} \frac{a_n}{10^n} + \frac{T^N x}{10^N}. \]

This looks very similar to our notion of convergence from before, except instead of $T^N x$, we had 0 (and a limit). So let’s make that formal:

Definition 2.3.14. Suppose we have a fibred system (X, µ, T, D, X) with X being a subset of $\mathbb{R}^n$ or $\mathbb{C}^n$ containing the origin. We say the fibred system is analytically representative if for all x ∈ X, we have that

\[ x = \lim_{N \to \infty} T_{a_1}^{-1} \circ T_{a_2}^{-1} \circ \dots \circ T_{a_N}^{-1}\, 0 \]

where $\varphi(x) = [a_1, a_2, a_3, \dots]$. If these points would not exist, then we allow the functions $T_d^{-1}$ to be extended analytically.

Note that being analytically representative implies being representative, because a sequence cannot converge to two distinct points simultaneously.

I will add that this is my own definition. I have not seen it used elsewhere. I am including it here really to emphasize the multiple ways we can look at the meaning of “expansion” and how to relate them to one another. In particular, if a system is analytically representative, then we get, for free, that our common notion of (Q1) is “Yes, everywhere.” In fact, we even get the answer to (Q2) as “Yes, everywhere” for free as well, since the dynamical system is completely deterministic and only allows one expansion.

Proposition 2.3.15. Suppose (X, µ, T, D, X) is a fibred system and X is a metric space. Suppose that $0 \in T^{|s|} C_s$ for all admissible strings s and that for all x ∈ X we have that

\[ \lim_{\substack{x \in C_s \\ |s| \to \infty}} \operatorname{diam} C_s = 0. \]

Then the system is analytically representative.

Proof. This is extremely straightforward. If $0 \in T^{|s|} C_s$ for all s, then $T_{a_1}^{-1} \circ T_{a_2}^{-1} \circ \dots \circ T_{a_n}^{-1}\, 0 \in C_{[a_1, a_2, \dots, a_n]}$ for all admissible strings $s = [a_1, a_2, \dots, a_n]$. In particular, if $[a_1, a_2, \dots]$ are the digits of the expansion of x, we have that both x and $T_{a_1}^{-1} \circ T_{a_2}^{-1} \circ \dots \circ T_{a_n}^{-1}\, 0$ are in the same rank n cylinder set, and by our assumption about the diameter, the distance between them must shrink to 0 as n goes to infinity.

The conditions here are a little technical, so we offer the following simple corollary.

Corollary 2.3.16. Suppose (X, µ, T, D, X) is a fibred system and X is a metric space. Suppose that all cylinders are full and that $\lim_{|s| \to \infty} \operatorname{diam} C_s = 0$. Then the system is analytically representative.

2.4 More examples of fibred systems

2.4.1 Base-b expansions

The simplest examples of fibred systems after the decimal expansion are the other base-b expansions (also sometimes called n-ary expansions).

Here, for an integer base b ≥ 2, we have X = [0, 1), µ = λ, Tx = bx (mod 1), D = {0, 1, 2, ..., b − 1}, and $X_d = [d/b, (d+1)/b)$, for d ∈ D. Basically everything that we’ve said about base-10 (decimal) expansions holds true in the base-b case. The system is analytically representative, measure-preserving, all cylinders are full, cylinders shrink with the length of the string, and so on.

We just remark that $T_d^{-1} x = d/b + x/b$, so

\[ x = \sum_{n=1}^{N} \frac{a_n(x)}{b^n} + \frac{T^N x}{b^N} = \sum_{n=1}^{\infty} \frac{a_n(x)}{b^n}. \]

We also remark, for interest, that the facts about rational numbers and periodic expansions hold here too. A point x has a finite (that is, ending on infinite 0’s) expansion if and only if it can be written as $x = a/b^k$ for some integers a, k. A point x has a purely periodic expansion if and only if it can be written as x = p/q with gcd(q, b) = 1. And a point x has an eventually periodic (possibly finite) expansion if and only if it is rational.
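As a small illustration of the base-b identity and of the periodicity remarks, here is a sketch using exact fractions. The function name `base_b_digits` and the sample point 3/7 are assumptions made for the example only.

```python
from fractions import Fraction

def base_b_digits(x, b, N):
    """Return the first N digits a_n(x) and the remainder T^N x for Tx = b*x mod 1."""
    digits = []
    for _ in range(N):
        y = b * x
        d = y.numerator // y.denominator   # floor(b*x): index of the X_d containing x
        digits.append(d)
        x = y - d                          # T x
    return digits, x

# Hypothetical sample point; gcd(7, 10) = 1, so the expansion is purely periodic.
x = Fraction(3, 7)
digits, tail = base_b_digits(x, 10, 6)
partial = sum(Fraction(a, 10**n) for n, a in enumerate(digits, start=1))
print(digits)                        # digits of 3/7 = 0.428571...
print(partial + tail / 10**6 == x)   # x = sum_{n<=N} a_n/b^n + T^N x / b^N
print(tail == x)                     # the orbit returns to x after one period
```

Working with `Fraction` rather than floats keeps the orbit exact, so periodicity can be detected by an equality test.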

2.4.2 β-expansions

And from beautiful and simple, we move straight into the dragon’s lair.

β-expansions. Oh, β-expansions.

The set-up is (mostly) simple. Let β be a real, non-integer number greater than 1. Let X = [0, 1) and Tx = βx (mod 1). From here, it gets a bit more complicated. We let D = {0, 1, ..., ⌊β⌋}, and

\[ X_d = \begin{cases} \left[ \dfrac{d}{\beta}, \dfrac{d+1}{\beta} \right), & d < \lfloor \beta \rfloor, \\[2ex] \left[ \dfrac{\lfloor \beta \rfloor}{\beta}, 1 \right), & d = \lfloor \beta \rfloor. \end{cases} \]

There is also a measure, called the Parry measure, which is invariant under this map T. We’ll talk about it more in a moment.

This all looks so similar to base-b expansions that we might get lulled into a false sense of security. But as it turns out, these expansions are wildly, wildly different from base-b expansions.

The first problem we notice is that not all cylinders are full. In particular $X_{\lfloor \beta \rfloor}$ is not full. Thus, there are non-admissible sequences of digits. But is it representative? Yes, and for the same reason as base-b expansions. The maps $T_d^{-1}$ act by $T_d^{-1} x = d/\beta + x/\beta$ and thus have derivative 1/β, so they always shrink intervals by a factor of at least 1/β. Thus $\operatorname{diam} C_s \le \beta^{-|s|}$.

Exercise 2.4.1. Prove that 0 is contained in $T^{|s|} C_s$ for all admissible strings s and thus β-expansions are analytically representative.

Exercise 2.4.2. Prove that T does not preserve Lebesgue measure for any β > 1 that is not an integer.

Let us consider what these expansions look like:

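\[ \begin{aligned} x &= \frac{a_1(x)}{\beta} + \frac{Tx}{\beta} \\ &= \frac{a_1(x)}{\beta} + \frac{a_2(x)}{\beta^2} + \frac{T^2 x}{\beta^2} = \dots \\ &= \sum_{n=1}^{N} \frac{a_n(x)}{\beta^n} + \frac{T^N x}{\beta^N} \\ &= \sum_{n=1}^{\infty} \frac{a_n(x)}{\beta^n}. \end{aligned} \]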
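To see these features in one concrete case, the sketch below takes β = 3/2, an assumed sample value chosen only because a rational β keeps the arithmetic exact. The helper names are illustrative. It checks the finite identity above, shows that the digit pair 1, 1 never occurs for this β, and computes the Lebesgue measure of one preimage.

```python
from fractions import Fraction

BETA = Fraction(3, 2)   # assumed sample base: rational, non-integer, > 1

def beta_digits(x, N, beta=BETA):
    """First N digits of the beta-expansion of x in [0, 1), plus T^N x."""
    digits = []
    for _ in range(N):
        y = beta * x
        d = y.numerator // y.denominator   # floor(beta*x)
        digits.append(d)
        x = y - d                          # T x = beta*x mod 1
    return digits, x

def preimage_length(a, b, beta=BETA):
    """Lebesgue measure of T^{-1}[a, b), summed branch by branch."""
    top = beta.numerator // beta.denominator             # floor(beta), the largest digit
    total = Fraction(0)
    for d in range(top + 1):
        lo, hi = (d + a) / beta, (d + b) / beta           # T_d^{-1}[a, b)
        right = min(Fraction(d + 1) / beta, Fraction(1))  # right endpoint of X_d
        total += max(Fraction(0), min(hi, right) - lo)    # keep only the part inside X_d
    return total

x = Fraction(4, 5)
digits, tail = beta_digits(x, 8)
partial = sum(a / BETA**n for n, a in enumerate(digits, start=1))
print(digits)
print(partial + tail / BETA**8 == x)    # the finite identity above
print(preimage_length(Fraction(0), Fraction(1, 2)))   # compare with lambda([0, 1/2)) = 1/2
```

For this β the last cylinder $X_1 = [2/3, 1)$ is mapped onto [0, 1/2) only, which is why a digit 1 is always followed by a 0 and why the preimage of [0, 1/2) is too large.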

So what sequences are admissible and what is the Parry measure? Remember that we had alternate ways of writing 1 in the decimal expansion: either as 1.0 or as 0.999.... While we could get an alternate expansion by swapping all intervals from [∗, ∗) to (∗, ∗], this would not always give us the expansion we want. Instead, we get our alternate expansion just by extending T to act as it naturally should on all of [0, 1]. This removes the surjectivity, but everything else follows through. The resulting sequence of digits in Ω we denote by d(1, β), and the answer to both of these questions is related to this alternate representation. We let

\[ d^*(\beta) = \begin{cases} \overline{[a_1, a_2, \dots, a_{n-1}, a_n - 1]}, & d(1, \beta) = [a_1, a_2, \dots, a_n, \overline{0}], \\ d(1, \beta), & \text{otherwise,} \end{cases} \]

so that $d^*(\beta)$ is the sequence of digits in a non-terminating β-expansion of 1. (Here an overline denotes a block of digits repeated forever.)

Definition 2.4.3. Given two strings s and s′, potentially infinite, with digits $a_i$ and $a'_i$, we say that s lexicographically precedes s′, written $s <_{\mathrm{lex}} s'$, if there exists some n such that $a_i = a'_i$ for i < n and $a_n < a'_n$. By assumption the empty string lexicographically precedes all non-empty strings.

Proposition 2.4.4. A string s of digits from D is admissible for a β-expansion if and only if $\sigma^n s <_{\mathrm{lex}} d^*(\beta)$ for all n ≥ 0.

Proof left as an exercise. Also, though we have not mentioned it, we may define an infinite string s to be admissible if $\bigcap_{s'} C_{s'}$ is non-empty, where s′ are the finite truncations of s. This is equivalent to saying that there exists some x such that φ(x) = s.

For the Parry measure, this is given by

\[ \nu_\beta(A) = \int_A h_\beta(x) \, dx \]

where

\[ h_\beta(x) = \frac{1}{F(\beta)} \sum_{\varphi(x) <_{\mathrm{lex}} \sigma^n d(1, \beta)} \frac{1}{\beta^n} \]

and F(β) is the constant specifically chosen so that $\nu_\beta([0, 1)) = 1$. (Note: in the definition of $h_\beta$ the sum ranges over all n ≥ 0, so, since we always have that $\varphi(x) <_{\mathrm{lex}} d(1, \beta)$, the sum is always at least 1.)

In fact, one can prove that $1 - \beta^{-1} \le h_\beta(x) \le (1 - \beta^{-1})^{-1}$, so the Parry measure isn’t too far removed from Lebesgue measure.

Exercise 2.4.5. If $\beta = \tfrac{1}{2}(1 + \sqrt{5})$, then $\beta^2 - \beta - 1 = 0$. One can show that d(1, β) = [1, 1, 0, 0, 0, ...], so it ends on repeating zeroes very quickly. Calculate $h_\beta(x)$ for this β. (It will be a piecewise function with two pieces.)

What about the other properties we like, like a connection between rational numbers and periodic expansions? In fact, we don’t even have a full answer to that.

Definition 2.4.6. A real number β > 1 is a Pisot number if it is an algebraic integer all of whose conjugates z satisfy |z| < 1. A real number β > 1 is a Salem number if it is an algebraic integer all of whose conjugates z satisfy |z| ≤ 1, with equality holding for at least one z.

Let Per(β) denote the set of x ∈ [0, 1) with eventually periodic β-expansions, and let Q(β) denote the smallest field containing both Q and β. It’s easy to see (by replicating the similar proof for base-b expansions) that Per(β) ⊂ Q(β) ∩ [0, 1).

Theorem 2.4.7 (Bertrand, ’77, and Schmidt, ’80). Let β > 1, not an integer.

If Q ∩ [0, 1) ⊂ Per(β), then β is either a Pisot number or a Salem number.

If β is a Pisot number, then Per(β) = Q(β) ∩ [0, 1).

Conjecture 2.4.8 (Schmidt). Let β > 1 be a Salem number. Then Per(β) = Q(β) ∩ [0, 1).

This gives just a hint of how bad things can be for β-expansions. Rational numbers, as the theorem above tells us, need not have periodic expansions. But it’s even worse. In general, one does not expect that the sum of two numbers with periodic (or finite) expansions is still periodic (or finite)!

Because of their unique properties, β-expansions are a popular topic within symbolic dynamics.

2.4.3 The Luroth series

This is perhaps not nearly as well known, but it has some interesting properties.

Here, we define things in a non-standard order. We let X = [0, 1), µ = λ, D = {2, 3, 4, ...} ∪ {∞}, $X_d = [1/d, 1/(d-1))$ for d ∈ {2, 3, ...} and $X_\infty = \{0\}$, and T is defined by

\[ Tx = \begin{cases} d(d-1)x \pmod 1 = d(d-1)x - (d-1), & x \in X_d,\ d \in \{2, 3, \dots\}, \\ 0, & x = 0. \end{cases} \]

Here we see our first introduction of the “infinite” rank-1 cylinder, $X_\infty$. While we’ve talked about finite base-b expansions, what we really meant was that we could represent them with only finitely many terms. On the other hand, this $X_\infty$ cylinder really does indicate a stop to the expansion. If the orbit of x enters $X_\infty$, it never leaves; nothing else can happen in the expansion but be ∞ over and over again.

Because the properties of $X_\infty$ are so weird, we tend to ignore it when discussing the properties of an expansion. Note that $\bigcup_{n=0}^{\infty} T^{-n} X_\infty$, the set of all points that eventually enter $X_\infty$, has Lebesgue measure 0. In fact, in this case, it’s countable.

So if we ignore $X_\infty$ (which, if we are being technical, means removing $\bigcup_{n=0}^{\infty} T^{-n} X_\infty$ from X), then all cylinders are full, the system is analytically representative (once we extend all the inverse branches to include 0 once again), it’s measure-preserving, and so on. (Note that $T_d^{-1}$ always has a derivative at most 1/2, so cylinders shrink as the length of the string goes to infinity again.)


What does this expansion look like? Supposing that the orbit of x avoids X∞, we have

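\[ \begin{aligned} x &= \frac{1}{a_1} + \frac{Tx}{a_1(a_1 - 1)} \\ &= \frac{1}{a_1} + \frac{1}{a_1(a_1 - 1)a_2} + \frac{T^2 x}{a_1(a_1 - 1)a_2(a_2 - 1)} = \dots \\ &= \sum_{n=1}^{N} \frac{1}{a_n \prod_{i=1}^{n-1} a_i(a_i - 1)} + \frac{T^N x}{\prod_{i=1}^{N} a_i(a_i - 1)} \\ &= \sum_{n=1}^{\infty} \frac{1}{a_n \prod_{i=1}^{n-1} a_i(a_i - 1)}. \end{aligned} \]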
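A quick way to get a feel for this series is to compute digits of a few rationals exactly. The sketch below is illustrative only: the names `luroth_digits` and `luroth_partial_sum` and the sample points 2/7 and 2/5 are assumptions for the example, and the orbit is stopped once it lands on 0 (that is, in $X_\infty$).

```python
from fractions import Fraction
import math

def luroth_digits(x, N):
    """Up to N Luroth digits of x in [0, 1), stopping early if the orbit reaches 0."""
    digits = []
    for _ in range(N):
        if x == 0:                      # orbit has entered X_infinity
            break
        d = math.ceil(1 / x)            # the d with 1/d <= x < 1/(d-1)
        digits.append(d)
        x = d * (d - 1) * x - (d - 1)   # T x on X_d
    return digits, x

def luroth_partial_sum(digits):
    """Return (sum_{n<=N} 1/(a_n prod_{i<n} a_i(a_i-1)), 1/prod_{i<=N} a_i(a_i-1))."""
    total, scale = Fraction(0), Fraction(1)
    for a in digits:
        total += scale / a
        scale /= a * (a - 1)
    return total, scale

for x in (Fraction(2, 7), Fraction(2, 5)):   # hypothetical sample points
    digits, tail = luroth_digits(x, 6)
    partial, scale = luroth_partial_sum(digits)
    print(x, digits, tail)
    print(partial + tail * scale == x)       # the finite identity above
```

The first sample point falls into $X_\infty$ after finitely many steps, while the second is a fixed point of T and so has a constant digit sequence.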

Exercise 2.4.9. Prove that all rational numbers have a Luroth series expansion that is eventually periodic.

2.4.4 Generalized Luroth series

There is a much more generalized form of Luroth series which we will briefly comment on. We won’t go through all the details.

For these so-called GLS expansions, we start by picking a choice of disjoint $X_d = [\ell_d, r_d) \subset [0, 1)$ such that $\lambda(\bigcup X_d) = 1$, and let $X_\infty = [0, 1) \setminus \bigcup X_d$, which necessarily now has Lebesgue measure 0. We assume there are at least 2 such sets, $X_1$ and $X_2$. We let D be the collection of these d’s, together with ∞. We also choose a function $\varepsilon : D \setminus \{\infty\} \to \{-1, +1\}$.

Then we let X = [0, 1), µ = λ, and T is given by

\[ Tx = \begin{cases} \dfrac{1}{r_d - \ell_d}\, x - \dfrac{\ell_d}{r_d - \ell_d}, & x \in X_d,\ \varepsilon(d) = +1, \\[2ex] \dfrac{r_d}{r_d - \ell_d} - \dfrac{1}{r_d - \ell_d}\, x, & x \in X_d,\ \varepsilon(d) = -1, \\[2ex] 0, & x \in X_\infty. \end{cases} \]

We won’t go into much more detail at the moment, other than to say that, yes, these again have all full cylinders, are analytically representative, and preserve Lebesgue measure. We can again show that the diameters of cylinders shrink to 0, as $T_d^{-1}$ has a maximum derivative which is bounded away from 1. All the base-b expansions, as well as the first Luroth series we saw, are examples of Generalized Luroth series.

Exercise 2.4.10. Prove that this system preserves Lebesgue measure.

GLS expansions are a natural object of study from a dynamical perspective, more so than base-b expansions, much in the way that continuous functions are a slightly more natural object of study for real analysis than polynomials. As an extra note as to why these are studied: there’s a deep connection between β-expansions and an associated GLS expansion. Dajani and Kraaikamp go into great detail on this, using it to find a simpler explanation for the Parry measure.


2.4.5 Bernoulli shifts

So far we’ve looked at transformations on [0, 1). Let’s change that a bit. Let’s revisit another system which we saw briefly.

Let (Y, G, ν) be some probability space. Let $(X, F, \mu) = \prod_{n=1}^{\infty} (Y, G, \nu)$, so that $X = Y^{\mathbb{N}}$, the space of all one-sided sequences of elements from Y; F is the σ-algebra generated by cylinders of the form

\[ \big\{ x = \{x_k\}_{k=1}^{\infty} : x_i \in A_1, \dots, x_{i+n-1} \in A_n \big\}, \qquad A_1, A_2, \dots, A_n \in G,\ i, n \in \mathbb{N}, \]

and µ is the product measure defined on cylinders by

\[ \mu\big( \big\{ x = \{x_k\}_{k=1}^{\infty} : x_i \in A_1, \dots, x_{i+n-1} \in A_n \big\} \big) = \prod_{j=1}^{n} \nu(A_j). \]

Then we simply let T be the forward shift on X so that $T(\{x_n\}_{n=1}^{\infty}) = \{x_{n+1}\}_{n=1}^{\infty}$. Such a system is a (one-sided) Bernoulli shift. We can make it two-sided by replacing N with Z.

A priori, this is not a fibred system, as it could map uncountably many elements to one, but if Y is countable, then it is with D = Y and $X_d = \{ \{x_k\}_{k=1}^{\infty} : x_1 = d \}$.

In fact, here’s one that ought to look really familiar. Take Y = {0, 1, ..., 9}, G to be the power set of Y, and ν just be defined as 1/10 on each singleton set. Then this Bernoulli shift is just the space Ω for the decimal expansion, with T here acting as the left-shift map σ. In fact, one can show that the decimal expansion dynamical system is isomorphic to this dynamical system.

Definition 2.4.11. We say that two dynamical systems (X, F, µ, T) and (Y, G, ν, S) are isomorphic if there exists a map ψ : (X, F, µ, T) → (Y, G, ν, S) and sets $N_X \subset X$, $N_Y \subset Y$, such that

1. $\mu(N_X) = \nu(N_Y) = 0$ and $\psi : X \setminus N_X \to Y \setminus N_Y$ is a bijection (that is, ψ is one-to-one and onto almost everywhere);

2. $\psi^{-1}(A) \in F$ for any A ∈ G;

3. $\nu = \mu \circ \psi^{-1}$; and

4. $\psi \circ T = S \circ \psi$.

So ψ is an isomorphism between the spaces that nicely preserves the σ-algebra, measure, and transformation.

In general, we say that a fibred system is Bernoulli if it is isomorphic to a Bernoulli shift. (The term for this in the abstract is Bernoullicity.) By construction, in a Bernoulli fibred system, all cylinders are full. All the base-b expansions and all the generalized Luroth series turn out to be Bernoulli. This will be important in the next chapter, as we will see that Bernoullicity implies a host of useful properties.


2.4.6 The Baker’s transformation

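Let $X = [0, 1)^2$, $\mu = \lambda^2$, D = {0, 1}, $X_0 = [0, 1/2) \times [0, 1)$, $X_1 = [1/2, 1) \times [0, 1)$, and

\[ T(x, y) = \begin{cases} \left( 2x, \tfrac{1}{2} y \right), & 0 \le x < \tfrac{1}{2}, \\[1ex] \left( 2x - 1, \tfrac{1}{2}(y + 1) \right), & \tfrac{1}{2} \le x < 1. \end{cases} \]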

This is called the baker’s transformation because if you think about how the transformation acts it looks a bit like how one kneads dough.

This system is very interesting. If we look at the first coordinate, we just see the base-2 expansion. Let’s write out the base-2 expansion of x by $0.a_1 a_2 a_3 \dots$ as we might usually do, and let’s write out the base-2 expansion of y by $0.a_0 a_{-1} a_{-2} \dots$. Then the transformation T acts by

\[ T(0.a_1 a_2 a_3 \dots,\ 0.a_0 a_{-1} a_{-2} \dots) = (0.a_2 a_3 a_4 \dots,\ 0.a_1 a_0 a_{-1} \dots). \]

As it turns out, the baker’s transformation is the two-sided Bernoulli shift corresponding to the base-2 expansion. (It’s also the natural extension of the base-2 expansion, but we haven’t discussed that yet.)
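The digit-shifting description can be checked directly on dyadic rationals. This is a minimal sketch: `baker` and `binary_digits` are illustrative names, and the sample point (5/8, 1/4) is an arbitrary choice.

```python
from fractions import Fraction

def baker(x, y):
    """One step of the baker's transformation on [0, 1)^2."""
    if x < Fraction(1, 2):
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def binary_digits(t, N):
    """First N base-2 digits of t in [0, 1)."""
    digits = []
    for _ in range(N):
        t *= 2
        d = int(t >= 1)
        digits.append(d)
        t -= d
    return digits

# Hypothetical sample point: x = 0.101 and y = 0.01 in base 2.
x, y = Fraction(5, 8), Fraction(1, 4)
xs, ys = binary_digits(x, 4), binary_digits(y, 4)
x2, y2 = baker(x, y)
print(binary_digits(x2, 3) == xs[1:4])           # x loses its leading digit a_1
print(binary_digits(y2, 4) == [xs[0]] + ys[:3])  # a_1 is pushed onto the front of y
```

Reading the digits of y backwards followed by the digits of x forwards gives a single two-sided sequence, and the two checks say that T moves the "decimal point" one place to the right.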

Some peculiarities happen here: there are no full cylinders, and it is not representative.

2.4.7 Engel’s series

Here we take X = (0, 1], µ = λ, D = {2, 3, 4, ...}, $X_d = (1/d, 1/(d-1)]$, and T given by

\[ Tx = dx - 1, \qquad x \in X_d. \]

Beware, this system is not measure-preserving. (There do exist infinitely many distinct σ-finite, but not finite, measures, absolutely continuous with respect to Lebesgue, which are preserved by T. See Thaler, ’79.)

This yields an expansion that looks like

\[ x = \frac{1}{a_1} + \frac{Tx}{a_1} = \frac{1}{a_1} + \frac{1}{a_1 a_2} + \frac{T^2 x}{a_1 a_2} = \sum_{n=1}^{\infty} \frac{1}{a_1 a_2 \cdots a_n}. \]

Here, again, there are some peculiarities of the system. For starters, only $X_2$ is a full cylinder. For another, if you pay close attention to the transformation, we always have $T^n x \le T^{n-1} x$, and therefore we always have that $a_1 \le a_2 \le a_3 \le \dots$. It is however analytically representative.
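The monotonicity of the digits is easy to observe numerically. The following sketch uses exact fractions; the name `engel_digits` and the sample point 3/7 are assumptions for illustration.

```python
from fractions import Fraction
import math

def engel_digits(x, N):
    """First N Engel digits of x in (0, 1], plus T^N x."""
    digits = []
    for _ in range(N):
        d = math.floor(1 / x) + 1   # the d with 1/d < x <= 1/(d-1)
        digits.append(d)
        x = d * x - 1               # T x on X_d
    return digits, x

x = Fraction(3, 7)                  # hypothetical sample point
digits, tail = engel_digits(x, 5)

# Partial sums 1/a_1 + 1/(a_1 a_2) + ... and the running product a_1 ... a_N.
partial, prod = Fraction(0), 1
for a in digits:
    prod *= a
    partial += Fraction(1, prod)

print(digits)
print(all(u <= v for u, v in zip(digits, digits[1:])))   # digits never decrease
print(partial + tail / prod == x)   # x = sum_{n<=N} 1/(a_1...a_n) + T^N x/(a_1...a_N)
```

Since the orbit of a point in (0, 1] never reaches 0 under this map, no early stopping is needed here, unlike for the Luroth map.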


Exercise 2.4.12. Prove that any sequence $2 \le a_1 \le a_2 \le a_3 \le \dots$ of positive integers is an admissible sequence for Engel’s series.

Exercise 2.4.13. Prove that the Engel series of every rational number is eventually periodic with a period of 1. (That is, the sequence of digits satisfies $a_{n+1}(x) = a_n(x)$ for all sufficiently large n.)

2.4.8 Continued fractions

Now we get to a really, really meaty part of the study of fibred systems: continued fractions. The theory of continued fractions is deep, and it extends to many areas of mathematics. I’d be hard pressed to think of a subject which does not have some connection to continued fractions somewhere. Continued fractions also make a great example of a system that is more complex than base-b expansions, yet still workable from many angles, so it will be our go-to example for much of this course.

There are many, many different expansions; we’ll start with regular continued fraction (RCF) expansions.

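Here X = [0, 1), µ is the Gauss measure given by

\[ \mu(A) = \int_A \frac{1}{(\log 2)(1 + x)} \, dx, \]

T is given by

\[ Tx = \begin{cases} \dfrac{1}{x} - \left\lfloor \dfrac{1}{x} \right\rfloor, & x \in (0, 1), \\[1ex] 0, & x = 0. \end{cases} \]

Here we have D = {1, 2, 3, ...} ∪ {∞}, and

\[ X_d = \begin{cases} \left( \dfrac{1}{d+1}, \dfrac{1}{d} \right] \cap [0, 1), & d \in \mathbb{N}, \\[1ex] \{0\}, & d = \infty. \end{cases} \]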

Here, teasing out the effect of T is a touch more complex, but generates a fairly memorable expansion. We have $T_d^{-1} x = 1/(d + x)$, so provided $T^{n-1} x \ne 0$, we have

\[ x = \cfrac{1}{a_1 + Tx} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + T^2 x}} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots + \cfrac{1}{a_n + T^n x}}}. \]

Briefly, let us mention that we can extend the notion of continued fractions to the whole real line. Each irrational real number has an expansion of the form

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots}} = \langle a_0; a_1, a_2, \dots \rangle, \qquad a_0 \in \mathbb{Z},\ a_1, a_2, \dots \in \mathbb{N}, \]

where if $a_0 = 0$ we often just write $\langle a_1, a_2, \dots \rangle$. Each rational number has a finite expansion:

\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots + \cfrac{1}{a_n}}} = \langle a_0; a_1, a_2, \dots, a_n \rangle, \qquad a_0 \in \mathbb{Z},\ a_1, a_2, \dots, a_n \in \mathbb{N}. \]

In fact, each rational number has two such expansions, since if $a_n > 1$, one can always replace it with $(a_n - 1) + \frac{1}{1}$.

Here we get two properties fairly quickly. It’s a basic exercise to prove that the system is measure-preserving with the Gauss measure. One can also see that all cylinders are full. However, representativeness is much harder, because it’s not obvious that the diameter of the cylinder sets goes to 0: our trick of making use of the derivative of $T^{-1}$ being bounded away from 1 no longer works here. (Technically we could show that the derivative of $T^{-2}$ is bounded away from 1, but we’ll show even more powerful things later on.)
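The invariance of the Gauss measure can at least be checked numerically on intervals before proving it. The sketch below is a floating-point sanity check, not a proof; the function names, the truncation level `terms`, and the sample interval [0.2, 0.7) are all assumptions made for the illustration.

```python
import math

def gauss_measure(a, b):
    """Gauss measure of an interval with endpoints a <= b inside [0, 1]."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)

def gauss_measure_of_preimage(a, b, terms=200_000):
    """Gauss measure of T^{-1}[a, b), truncated to the branches d = 1, ..., terms.

    T_d^{-1} x = 1/(d + x) is decreasing, so the d-th branch sends [a, b)
    to the interval with endpoints 1/(d + b) and 1/(d + a).
    """
    return sum(gauss_measure(1 / (d + b), 1 / (d + a)) for d in range(1, terms + 1))

a, b = 0.2, 0.7   # hypothetical sample interval
print(gauss_measure(a, b))
print(gauss_measure_of_preimage(a, b))   # agrees up to the truncation error
```

Writing out the d-th term as $\log\frac{d+a+1}{d+a} - \log\frac{d+b+1}{d+b}$ shows that the sum telescopes, which is essentially the basic exercise mentioned above.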

Matrices, convergents, and convergence

Part of the power of continued fractions is their natural relationship to fractional linear transformations of points, often represented as a matrix action. Here, suppose

\[ M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

is a 2 × 2 matrix with entries in R (or C) and x is a real (or complex) number, or even the point at infinity. Then we let

\[ Mx = \frac{ax + b}{cx + d}. \]

For now, we will restrict M to belong to the set

\[ SL_2(\mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z},\ ad - bc = \pm 1 \right\}. \]

These matrices, it should be noted, are invertible, and the inverses are in $SL_2(\mathbb{Z})$. Moreover, this space is closed under matrix multiplication.

Exercise 2.4.14. Show that if $M_1, M_2 \in SL_2(\mathbb{Z})$ then one has that $(M_1 M_2)x = M_1(M_2 x)$. (Remember it is possible for x or $M_2 x$ to be infinity.)

This is most relevant to continued fractions because we have that

\[ T_d^{-1} x = \frac{1}{d + x} = \begin{pmatrix} 0 & 1 \\ 1 & d \end{pmatrix} x. \]

We denote this latter matrix by $A_d$.

For a given x such that $T^{n-1} x \notin X_\infty$ (so that there are at least n digits in the continued fraction expansion of x), we let $M_n$ denote the matrix $A_{a_1} A_{a_2} \cdots A_{a_n}$. Thus,

\[ x = T_{a_1}^{-1} \circ T_{a_2}^{-1} \circ \dots \circ T_{a_n}^{-1}(T^n x) = A_{a_1} A_{a_2} \cdots A_{a_n}(T^n x) = M_n(T^n x). \]

Note that if we were extending the notion of continued fraction to the whole real line, we would want to define $M_n$ to be

\[ \begin{pmatrix} 1 & a_0 \\ 0 & 1 \end{pmatrix} A_{a_1} A_{a_2} \cdots A_{a_n}. \]

Everything we say from here out will work just as well for extending to the whole real line as to [0, 1).

We also define the convergents $p_n/q_n$ as the finite truncations of the expansion of x, so that

\[ \frac{p_n}{q_n} = \langle a_1, a_2, \dots, a_n \rangle = M_n 0. \]

(We assume the fraction is in lowest terms, with $q_n$ positive.)

Now we can prove a great many facts about continued fractions in very quick succession. In every case where we refer to $p_n/q_n$

1. \[ M_n = \begin{pmatrix} * & p_n \\ * & q_n \end{pmatrix}, \qquad n \ge 0. \]

Why? Because $\begin{pmatrix} a & b \\ c & d \end{pmatrix} 0 = \frac{b}{d}$ and $M_n 0 = p_n/q_n$. (To get that we truly have $p_n = b$ and $q_n = d$, we use the fact that everything is positive, so there are no negative signs to worry about, and the fact that the determinant condition on $M_n$ implies that gcd(b, d) = 1.)

This implicitly defines $p_0/q_0 = 0$ (or $a_0$ in general).

2. \[ M_n = \begin{pmatrix} p_{n-1} & p_n \\ q_{n-1} & q_n \end{pmatrix}, \qquad n \ge 0. \]

Why? Write out $M_n A_{a_{n+1}} = M_{n+1}$ and analyze where the left-most column on the right-hand side comes from.

This implicitly defines $p_{-1}/q_{-1} = \infty\ (= 1/0)$.


3. If we define $p_{-1} = 1$, $p_0 = a_0$, $q_{-1} = 0$ and $q_0 = 1$, then we have the recurrence relations

\[ p_n = a_n p_{n-1} + p_{n-2}, \qquad q_n = a_n q_{n-1} + q_{n-2}, \qquad n \ge 1. \]

Why? Write out $M_n A_{a_{n+1}} = M_{n+1}$ again and compare the right-hand columns of each side.

4. We have $p_{n-1} q_n - p_n q_{n-1} = (-1)^n$.

Why? Write out $M_n = A_{a_1} A_{a_2} \cdots A_{a_n}$, compare determinants on each side, and use the fact that the determinant of a product is the product of the determinants. Alternately, use the previous fact together with induction.

5. The sequence of $q_n$ is increasing, and if we define $F_n$ by $F_0 = 1$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$, then $q_n \ge F_n$ for all n.

Why? Because $q_n = a_n q_{n-1} + q_{n-2}$, and $a_n \ge 1$ for n ≥ 1.

6. We have

\[ x = \frac{p_{n-1} T^n x + p_n}{q_{n-1} T^n x + q_n}. \]

Why? Because $x = M_n(T^n x)$ and we know what all the coordinates of $M_n$ are.

7. We have

\[ x - \frac{p_n}{q_n} = \frac{(-1)^n T^n x}{q_n (q_{n-1} T^n x + q_n)}. \]

Why? Because we have

\[ x - \frac{p_n}{q_n} = \frac{p_{n-1} T^n x + p_n}{q_{n-1} T^n x + q_n} - \frac{p_n}{q_n} = \frac{(p_{n-1} q_n - q_{n-1} p_n) T^n x}{q_n (q_{n-1} T^n x + q_n)} \]

and then we use the earlier fact to simplify the numerator.

8. We have

\[ \left| x - \frac{p_n}{q_n} \right| \le \frac{1}{q_n^2}. \]

Why? Use the last fact together with $T^n x \in [0, 1)$.

Note: this, together with the fact that $q_n$ tends to infinity, gives us that continued fractions are analytically representative (for any point that doesn’t fall into $X_\infty$, although even those points can be taken care of without difficulty).


9. If $s = [a_1, a_2, \dots, a_k]$ and we let $p_k/q_k = \langle a_1, a_2, \dots, a_k \rangle$, then $C_s$ consists of the set of all points between

\[ \frac{p_k}{q_k} \quad\text{and}\quad \frac{p_k + p_{k-1}}{q_k + q_{k-1}}, \]

including the former but not the latter.

Why? Because if $y \in C_s$ then $y = M_k(T^k y)$, but since all cylinders are full, $T^k y$ can take any value in [0, 1). Thus $C_s$ consists of all values of

\[ \frac{p_k + p_{k-1} T^k y}{q_k + q_{k-1} T^k y}. \]

10. The Lebesgue measure of $C_s$ is $1/(q_k(q_k + q_{k-1}))$.

Why? Take the difference of the two fractions from the last fact and then apply the fact that $p_{n-1} q_n - p_n q_{n-1} = (-1)^n$.

11. A cylinder set $C_s$ is closed on the left and open on the right if and only if |s| is even.

Why? Do the difference from the last fact again and this time track the sign.
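Several of the facts in this list can be verified mechanically for a rational x, where the arithmetic is exact. The sketch below is one way to do that; the function names and the sample point 13/47 are illustrative assumptions, and the checks cover facts 3, 4, 6, 8 and 10 only.

```python
from fractions import Fraction

def rcf_digits(x, N):
    """Up to N regular continued fraction digits of x in [0, 1), plus the last T^n x."""
    digits = []
    for _ in range(N):
        if x == 0:                         # orbit has entered X_infinity
            break
        y = 1 / x
        a = y.numerator // y.denominator   # floor(1/x)
        digits.append(a)
        x = y - a                          # Gauss map T x
    return digits, x

def convergents(digits, a0=0):
    """Return [(p_0, q_0), (p_1, q_1), ...] using the recurrences of fact 3."""
    p_prev, q_prev, p, q = 1, 0, a0, 1     # (p_{-1}, q_{-1}) and (p_0, q_0)
    out = [(p, q)]
    for a in digits:
        p_prev, q_prev, p, q = p, q, a * p + p_prev, a * q + q_prev
        out.append((p, q))
    return out

def check_facts(x, N=30):
    digits, _ = rcf_digits(x, N)
    cv = convergents(digits)
    for n in range(1, len(cv)):
        (p0, q0), (p1, q1) = cv[n - 1], cv[n]
        _, t = rcf_digits(x, n)                                      # t = T^n x
        assert p0 * q1 - p1 * q0 == (-1) ** n                        # fact 4
        assert x == (p0 * t + p1) / (q0 * t + q1)                    # fact 6
        assert abs(x - Fraction(p1, q1)) <= Fraction(1, q1 ** 2)     # fact 8
        width = abs(Fraction(p1, q1) - Fraction(p1 + p0, q1 + q0))
        assert width == Fraction(1, q1 * (q1 + q0))                  # fact 10
    return digits, cv

digits, cv = check_facts(Fraction(13, 47))   # hypothetical sample point
print(digits)
print(cv)
```

For an irrational x one would have to replace `Fraction` by floating point or interval arithmetic, and the equality tests by tolerances.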

Special properties of continued fractions

There are a number of amazing properties satisfied by the continued fraction expansion. We could easily spend the entire semester studying them. We’ll only mention a few properties of immediate interest here.

Theorem 2.4.15. For all real numbers x there exist infinitely many solutions to

\[ \left| x - \frac{p}{q} \right| \le \frac{1}{q^2}. \]

Let ε > 0. For almost all x (that is, for all x up to a set of Lebesgue measure 0), there are only finitely many solutions to

\[ \left| x - \frac{p}{q} \right| \le \frac{1}{q^{2+\varepsilon}}. \]

Definition 2.4.16. For a real number x we say that a fraction p/q in lowest terms is a best rational approximation (of the second kind) if

\[ |qx - p| < |q'x - p'| \]

for any pair (p′, q′) with 1 ≤ q′ ≤ q.

Theorem 2.4.17. Unless x is an integer plus one-half, every best rational approximation for x is a convergent for x and vice-versa.

Definition 2.4.18. We call x a quadratic surd if it is an irrational number that is the solution to an integer polynomial $Ax^2 + Bx + C = 0$.

Theorem 2.4.19. A real number x has an eventually periodic continued fraction expansion if and only if x is a quadratic surd.

2.4.9 Even more continued fractions

α-continued fractions

Let α ∈ [0, 1]. We define an α-continued fraction dynamical system by X = [α − 1, α), µ = λ, and Tx = |1/x| (mod 1). Note that this transformation is well defined since there is a unique representative in X for any point taken modulo 1. This measure will not be invariant, needless to say.

These expansions look like

\[ \cfrac{\varepsilon_1}{a_1 + \cfrac{\varepsilon_2}{a_2 + \cfrac{\varepsilon_3}{a_3 + \dots}}}, \]

where $\varepsilon_i \in \{1, -1\}$ and $a_i \in \mathbb{N}$. The digits D will consist of ∞ together with all possible pairs $(\varepsilon_i, a_i)$ allowed by the system.

For α = 1, this is just the usual continued fractions we saw above. For α = 1/2, this is known as the nearest-integer continued fraction map, where the digits consist of all (±1, n), n ≥ 2, together with infinity. A priori, the cylinders are a bit hard to describe, and they need not be full (although they will in the nearest-integer continued fraction case). We don’t even know what the invariant measure is for all α; specifically we’re missing it for $\alpha \in [0, \sqrt{2} - 1]$.

However, most of the other work we did above on convergence holds still, with minor tweaks.

Many, many, many other continued fraction expansions exist; it would take us weeks to catalogue all of them. Some of these continued fractions are studied precisely because the corresponding set of matrices forms an interesting subgroup; such is the case with the continued fractions with even partial quotients and the theta group.

Complex continued fractions

Consider $X = [-1/2, 1/2)^2$ seen as a subset of C. Let $\mu = \lambda^2$. (Again, this will not be invariant.) Let Tx = 1/x (mod 1), again taken to mean the unique representative modulo 1 in the domain X. The digits now are Gaussian integers, and there has been some careful work done studying the admissible sequences.

These satisfy a lot of nice properties. They satisfy a “best approximation” type theorem, although it’s off by a constant, and if you tweak the system a fair bit, you can actually get true best approximations. They still satisfy a similar theorem about how eventually periodic expansions are quadratic surds. The dynamics also satisfy very powerful properties (such as mixing, which we’ll see later). But despite all of this, we don’t have a closed form for the measure. (See Hensley’s book on continued fractions.)

Curiously, we think of complex continued fractions as one-dimensional continued fractions. (It’s one complex dimension.) Many attempts over the years have been made to generalize these properties to higher dimensions, in part either to better study Diophantine approximation in higher dimensions or to extend the result on quadratic surds to higher degree algebraic irrationals. These attempts have met mixed success: usually they are made to satisfy one nice property and the others fall by the wayside. In my own research, I have worked with Anton Lukyanenko to study continued fractions on the Heisenberg group, and we were surprised to get weak, but nonetheless existing, variants of the desired Diophantine properties and the periodicity of algebraic points.

Chapter 3

Ergodicity

Now that we have an interesting collection of dynamical systems, we’re going to mention some interesting properties of dynamical systems, which imply very powerful results.

3.1 What does ergodicity mean?

Definition 3.1.1. Let (X,µ, T ) be a dynamical system (not necessarily measure-preserving).Then T is said to be ergodic if for every measurable set A for which T−1A = A, we havethat either µ(A) or µ(Ac) is 0.

This definition is designed to hold even if the system is not finite. For finite systems,we often just say that µ(A) is 0 or 1. (Some places define ergodicity with T−1A = A up toa set of µ-measure 0.)

What does this mean?

I think of ergodicity as being an indecomposability property for dynamical systems, much like primality is for integers. To see this, suppose that $T^{-1}A = A$ but neither $\mu(A)$ nor $\mu(A^c)$ is 0. What $T^{-1}A = A$ means is that every point in $A$, if $T$ is applied to it, lands in $A$, and these are the only points which do that. Thus, a point in $A^c$, if $T$ is applied to it, lands in $A^c$. In fact, since taking preimages commutes with taking complements, we have $T^{-1}(A^c) = (T^{-1}A)^c = A^c$. Thus, one could decompose the dynamical system $(X,\mu,T)$ into two separate dynamical systems $(A,\mu,T)$ and $(A^c,\mu,T)$, which run independently of one another.

Proposition 3.1.2. Let $(X,\mu,T)$ be a measure-preserving dynamical system on a probability space. Then the following are equivalent:

1. $T$ is ergodic.

2. For every measurable $A$ of positive measure, we have
\[ \mu\left( \bigcup_{n=1}^{\infty} T^{-n}A \right) = 1. \]

3. For every measurable $A, B$ of positive measure, there exists a positive integer $n$ such that $\mu(T^{-n}A \cap B) > 0$.

4. Suppose $\mathcal{A}$ is a semi-algebra generating $\mathcal{F}$. Then for every $A, B \in \mathcal{A}$, we have
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(T^{-n}A \cap B) = \mu(A)\mu(B). \]

Note: the equivalence of the first three statements isn’t too difficult, but the last one is not immediately obvious.

So from the second and third conditions, we see that ergodicity is essentially a transitivity-type relation: almost everything goes almost everywhere. From the fourth condition, we see that ergodicity is a very loose form of independence: given two sets $A, B$, the events that a point is in $B$ now and that it will end up in $A$ in the future are, on average, independent.

There are a variety of stronger conditions that are studied as well. We consider $(X,\mu)$ to be a probability space here.

1. We say $T$ is weakly mixing if for any measurable $A, B$, we have
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \left| \mu(T^{-n}A \cap B) - \mu(A)\mu(B) \right| = 0. \]
Weak mixing implies ergodicity. This easily follows by the third definition of ergodicity.

2. We say $T$ is strong mixing if for any measurable $A, B$, we have
\[ \lim_{n\to\infty} \mu(T^{-n}A \cap B) = \mu(A)\mu(B). \]
Strong mixing clearly implies weak mixing.

3. We say $T$ is Bernoulli if it is isomorphic to a Bernoulli shift.
Bernoullicity implies strong mixing. This is easy to see for cylinder sets (for large enough $n$ we would have $\mu(T^{-n}A \cap B) = \mu(A)\mu(B)$) and then we can use that the cylinder sets form a generating semi-algebra to get it for the rest.
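The claim that $\mu(T^{-n}A \cap B) = \mu(A)\mu(B)$ for large enough $n$ can be checked by hand in the simplest case. The following is a minimal sketch, assuming the doubling map $Tx = 2x \bmod 1$ with Lebesgue measure (the base-2 expansion) and two dyadic intervals chosen purely for illustration; the function names are hypothetical and not part of the notes.

```python
from fractions import Fraction

def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1) and [lo2, hi2)."""
    return max(Fraction(0), min(hi1, hi2) - max(lo1, lo2))

def measure_preimage_cap(A, B, n):
    """Lebesgue measure of T^{-n}A intersected with B for T x = 2x mod 1.

    A and B are half-open intervals (lo, hi) inside [0, 1).
    T^{-n}A is the union of the 2^n intervals [(j + lo)/2^n, (j + hi)/2^n).
    """
    a_lo, a_hi = A
    b_lo, b_hi = B
    scale = 2 ** n
    total = Fraction(0)
    for j in range(scale):
        total += overlap((j + a_lo) / scale, (j + a_hi) / scale, b_lo, b_hi)
    return total

# Illustrative choice: A = [0, 1/2) and B = [1/4, 1/2), so mu(A) mu(B) = 1/8.
A = (Fraction(0), Fraction(1, 2))
B = (Fraction(1, 4), Fraction(1, 2))
values = [measure_preimage_cap(A, B, n) for n in range(6)]
print(values)
```

For small $n$ the two sets are correlated (the values at $n = 0$ and $n = 1$ differ from $1/8$), but once $n$ is at least the rank of the cylinder $B$ the measure is exactly the product.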


Because we already know that base-$b$ expansions, and indeed all GLS expansions, are Bernoulli, we get that they are ergodic for free. This doesn’t help us with β-expansions or continued fraction expansions or anything similar.

There are many other properties of dynamical systems which we haven’t mentioned here. There’s exactness, which means that $\bigcap_{i=0}^{\infty} T^{-i}\mathcal{F} = \{X, \emptyset\}$ (up to sets of measure 0), and which implies strong mixing. There’s weak Bernoulli, which is between Bernoulli and strong mixing. There’s also mixing of higher orders: 3-mixing, for example, means $\mu(T^{-n-m}A \cap T^{-m}B \cap C) \to \mu(A)\mu(B)\mu(C)$ as $n, m \to \infty$.

It may seem like ergodicity is a very weak property: it only gives indecomposability of the system. But actually it implies an incredibly powerful theorem.

3.2 The ergodic theorem

The title is a bit of a misnomer. There isn’t one ergodic theorem. There are lots of them.

There is the ratio ergodic theorem, which is very useful for infinite measures. There is the maximal ergodic theorem, which is used to prove the theorem we’ll study below in Einsiedler and Ward. There is the mean ergodic theorem of von Neumann, which is a statement of convergence for operators.

What we will study is the pointwise ergodic theorem, giving the proof included in Dajani and Kraaikamp, originally due to Katznelson and Weiss. Quickly, we recall that $L^1(X,\mu)$ denotes the set of measurable functions $f$ such that $\int_X |f| \, d\mu < \infty$.

Theorem 3.2.1 (Birkhoff’s pointwise ergodic theorem). Let $(X,\mu,T)$ be a dynamical system. (We need it to be measure-preserving in this case.) Then, for any $f \in L^1(X,\mu)$, we have that the function
\[ f^*(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) \]
exists almost everywhere, satisfies $\int_X f \, d\mu = \int_X f^* \, d\mu$, and is $T$-invariant, that is, $f^* \circ T = f^*$.

Moreover, if $T$ is ergodic and $\mu(X) = 1$, then $f^*$ equals the constant value $\int_X f \, d\mu$ almost everywhere.

We’ll come back to this later, but one way to think of the pointwise ergodic theorem is that it says that the orbits of almost all points don’t just travel almost everywhere in the space $X$, they nicely distribute over the space too.
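To get a feel for the statement, one can compute Birkhoff averages numerically. The sketch below is only an illustration under stated assumptions: it uses the doubling map (binary expansion) with Lebesgue measure, it lets a seeded stream of pseudo-random bits stand in for the binary digits of a "typical" point, and it approximates each $T^i x$ from its first 40 binary digits. None of these choices, nor the function name, come from the notes.

```python
import random

def birkhoff_average_doubling(bits, f, n, precision=40):
    """Average of f(T^i x) for 0 <= i < n, where x = 0.b1 b2 b3 ... in binary
    and T is the doubling map, so that T^i x = 0.b_{i+1} b_{i+2} ...
    Each T^i x is approximated from its first `precision` binary digits."""
    total = 0.0
    for i in range(n):
        window = bits[i:i + precision]
        x_i = sum(b / 2.0 ** (j + 1) for j, b in enumerate(window))
        total += f(x_i)
    return total / n

rng = random.Random(2015)          # fixed seed so the run is reproducible
n = 20000
bits = [rng.getrandbits(1) for _ in range(n + 40)]

# Time average of f(x) = x; the space average is the integral of x over [0, 1), i.e. 1/2.
avg_identity = birkhoff_average_doubling(bits, lambda x: x, n)
# Time average of the indicator of [0, 1/4); the space average is 1/4.
avg_indicator = birkhoff_average_doubling(bits, lambda x: 1.0 if x < 0.25 else 0.0, n)
print(avg_identity, avg_indicator)
```

The two printed averages should sit close to $1/2$ and $1/4$ respectively, in line with the theorem, although a finite simulation of this kind is evidence rather than proof.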

First, we’ll need an intermediate result.

Lemma 3.2.2. For a probability space $(X,\mu)$, a measure-preserving transformation $T$ is ergodic if and only if every measurable function $f$ that is $T$-invariant almost everywhere (that is, $f \circ T = f$ a.e.) is also constant almost everywhere.


Proof. The “if” direction is very quick by making use of indicator functions $1_A$, so we leave it as an exercise for the reader.

In the “only if” direction, suppose $T$ is ergodic and that $f$ is $T$-invariant a.e. Define, for any $r \in \mathbb{R}$,
\[ A_r := \{ x \in X : f(x) > r \}. \]

Since $f$ is $T$-invariant, we also have that $A_r$ is a $T$-invariant set a.e., that is, $T^{-1}A_r = A_r$ up to a set of measure 0. By ergodicity, we must have that $\mu(A_r)$ equals 0 or 1. If $f$ is not constant almost everywhere, then we can find an $r \in \mathbb{R}$ such that $0 < \mu(A_r) < 1$, which is a contradiction.

In general we say that $A = B$ up to a set of $\mu$-measure 0 if $\mu(A \triangle B) = 0$. One can show that for a dynamical system $(X,\mu,T)$ on a probability space, a measure-preserving transformation $T$ is ergodic if and only if for every $A$ such that $\mu(A \triangle T^{-1}A) = 0$ we have that $\mu(A) = 0$ or $\mu(A) = 1$.

Proof of the pointwise ergodic theorem. We will only provide the proof in the case where $f$ is an indicator function $1_A$ for some measurable set $A$. By standard real-analytic techniques, we can then move from indicator functions to simple functions, from simple functions to bounded functions, and from bounded functions to $L^1(X,\mu)$.

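Note that in this case $f(T^i x)$ indicates whether $T^i x$ belongs to the set $A$. We define
\[ \overline{f}(x) = \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) \quad\text{and}\quad \underline{f}(x) = \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x). \]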

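These functions exist, because lim sups and lim infs always exist. These functions are measurable because lim sups and lim infs of measurable functions are measurable. And clearly $\underline{f}(x) \le \overline{f}(x)$ for all $x \in X$. We can also show that both functions are $T$-invariant, because
\begin{align*}
\overline{f}(Tx) &= \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T^{i+1}x) \\
&= \limsup_{n\to\infty} \left( \frac{1}{n} \sum_{i=0}^{n} 1_A(T^i x) - \frac{1_A(x)}{n} \right) \\
&= \limsup_{n\to\infty} \left( \frac{n+1}{n} \cdot \frac{1}{n+1} \sum_{i=0}^{n} 1_A(T^i x) - \frac{1_A(x)}{n} \right) \\
&= \overline{f}(x)
\end{align*}
and likewise for $\underline{f}$.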


If we can show that
\[ \int_X \overline{f} \, d\mu \le \mu(A) \le \int_X \underline{f} \, d\mu, \]
then since $\overline{f} - \underline{f} \ge 0$, we get that $\int_X (\overline{f} - \underline{f}) \, d\mu$ is a non-positive-valued integral whose integrand is non-negative. This is only possible if $\overline{f} = \underline{f}$ almost everywhere, and for those values where we have equality, we also have, by definition, that both functions equal $f^*$. We have that $\int_X f^* \, d\mu = \mu(A) = \int_X f \, d\mu$. Moreover, this function is $T$-invariant almost everywhere, and thus, by the previous lemma, if $T$ is ergodic, then $f^*$ must be constant almost everywhere, and so, for this constant value, we have
\[ \int_X f \, d\mu = \int_X f^* \, d\mu = f^* \int_X d\mu = f^*. \]

We will show that $\int_X \overline{f} \, d\mu \le \mu(A)$. The other inequality holds by a similar method.

Let $\varepsilon > 0$ be fixed; we will let it tend to 0 at the end of the proof. Let $S_n(x,B) = \sum_{i=0}^{n-1} 1_B(T^i x)$, for any measurable set $B$. Let
\[ N(x) := \min\{ n \ge 1 : S_n(x,A) \ge (\overline{f}(x) - \varepsilon) n \}, \]
which must exist due to the nature of the lim sup. By definition, $S_{N(x)}(x,A) \ge (\overline{f}(x) - \varepsilon) N(x)$.

For a real number $M > 0$, let $A_M = \{ x \in X : N(x) > M \}$. Since $N(x)$ is always finite, we can find a fixed $M$ such that $\mu(A_M) < \varepsilon$. For this choice of $M$, let $A' = A \cup A_M$. Further define
\[ N'(x) := \begin{cases} N(x), & \text{if } N(x) \le M, \\ 1, & \text{if } N(x) > M. \end{cases} \]

We have $N'(x) \le M$ for all $x \in X$. We can moreover see that $S_{N'(x)}(x,A') \ge (\overline{f}(x) - \varepsilon) N'(x)$ for all $x$. If $N(x) > M$, then we must have that $x \in A'$, that $N'(x) = 1$, and therefore that $S_{N'(x)}(x,A') = 1$, which is larger than $(\overline{f}(x) - \varepsilon) N'(x)$ since $\overline{f} \le 1$. If, on the other hand, we have that $N(x) \le M$, then $N'(x) = N(x)$, and since $A \subset A'$, we have that
\[ S_{N'(x)}(x,A') = \sum_{i=0}^{N(x)-1} 1_{A'}(T^i x) \ge \sum_{i=0}^{N(x)-1} 1_A(T^i x) \ge (\overline{f}(x) - \varepsilon) N'(x). \]

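Now we invoke some technical definitions, but the gist of the idea will be that we want to replace a long sum $S_n(x,A')$ with sums of length $N'(x)$, which we then estimate using the above. In particular, let $n_0(x) = 0$ and $n_k(x) = n_{k-1}(x) + N'(T^{n_{k-1}(x)}x)$. Choose $n$ larger than $M$ and let $\ell = \ell(n,x) = \max\{ k \ge 1 : n_k(x) \le n - 1 \}$. Then, using the $T$-invariance of $\overline{f}$, we have
\begin{align*}
S_n(x,A') &\ge \sum_{i=0}^{n_\ell(x)-1} 1_{A'}(T^i x) = \sum_{i=0}^{\ell-1} \sum_{j=n_i(x)}^{n_{i+1}(x)-1} 1_{A'}(T^j x) \\
&= \sum_{i=0}^{\ell-1} S_{N'(T^{n_i(x)}x)}(T^{n_i(x)}x, A') \\
&\ge \sum_{i=0}^{\ell-1} \left( \overline{f}(T^{n_i(x)}x) - \varepsilon \right) N'(T^{n_i(x)}x) \\
&= (\overline{f}(x) - \varepsilon) \sum_{i=0}^{\ell-1} \left( n_{i+1}(x) - n_i(x) \right) \\
&\ge (\overline{f}(x) - \varepsilon)(n - M),
\end{align*}
where this last line comes from the fact that $n - n_\ell(x) \le M$.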

Now recall that $T$ is measure-preserving so that for any measurable set $B$ and any integer $i \ge 0$, we have
\[ \mu(B) = \mu(T^{-i}B) = \int_X 1_{T^{-i}B} \, d\mu = \int_X 1_B(T^i x) \, d\mu. \]

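Thus $\mu(A') = \frac{1}{n} \int_X S_n(x,A') \, d\mu$. Combining this with the inequality we proved in the last paragraph, we have
\[ \mu(A') \ge \frac{n-M}{n} \left( \int_X \overline{f} \, d\mu - \varepsilon \right). \]
Taking limits gives $\mu(A') \ge \int_X \overline{f} \, d\mu - \varepsilon$. Recalling that $A' = A \cup A_M$ with $\mu(A_M) < \varepsilon$, so that $\mu(A') \le \mu(A) + \varepsilon$, gives $\mu(A) \ge \int_X \overline{f} \, d\mu - 2\varepsilon$. Since $\varepsilon > 0$ was arbitrary, we get the desired relation.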

3.3 Normality and equidistribution

Okay, so what does the pointwise ergodic theorem mean? The easiest way to understand it is again to consider an indicator function $1_A$ for some measurable set $A$.

In this case, $\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T^i x)$ is the limiting proportion of time the orbit of $x$ spends in $A$. And by the ergodic theorem, this should equal $\mu(A)$ for almost all $x$. In other words, the frequency with which the orbit of $x$ visits $A$ is determined by the size of $A$.

This can be a little misleading. Consider the binary expansion, the set $A = [0, 1/2)$, and the point $x = 1/3$. The orbit of $x$ alternates between $1/3$ and $2/3$ evenly, so it does satisfy the ergodic theorem for the function $1_A$, but clearly the orbit does not nicely fall into every set $A$. In fact, no point $x$ has an orbit that nicely meets every set $A$, because you can always remove the orbit of $x$ from a given set. But maybe we can show the orbits of most points meet a nice large collection of sets.
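The example of $x = 1/3$ is easy to verify exactly. The following is a small sketch using exact rational arithmetic; the helper names are hypothetical, and the second interval $[0, 1/4)$ is an extra illustration not taken from the text.

```python
from fractions import Fraction

def doubling_orbit(x, n):
    """Return [x, Tx, ..., T^{n-1}x] for T x = 2x mod 1, computed exactly."""
    orbit = []
    for _ in range(n):
        orbit.append(x)
        x = (2 * x) % 1
    return orbit

def visit_frequency(orbit, lo, hi):
    """Proportion of orbit points lying in [lo, hi)."""
    hits = sum(1 for y in orbit if lo <= y < hi)
    return Fraction(hits, len(orbit))

orbit = doubling_orbit(Fraction(1, 3), 1000)
freq_half = visit_frequency(orbit, Fraction(0), Fraction(1, 2))      # A = [0, 1/2)
freq_quarter = visit_frequency(orbit, Fraction(0), Fraction(1, 4))   # A = [0, 1/4)
print(sorted(set(orbit)), freq_half, freq_quarter)
```

The orbit only ever takes the values $1/3$ and $2/3$, so the frequency for $[0, 1/2)$ matches its Lebesgue measure while the frequency for $[0, 1/4)$ is 0 rather than $1/4$.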


Definition 3.3.1. Given an ergodic, finite, measure-preserving fibred system $(X,\mu,T,D,\mathcal{X})$, we say a point $x \in X$ is $T$-normal if for every cylinder set $C_s$ we have that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_{C_s}(T^i x) = \mu(C_s). \tag{3.1} \]

Equivalently, a point is $T$-normal if the limiting frequency of any string $s$ in $\varphi(x)$ is equal to $\mu(C_s)$.

We will often refer to a point as being normal to base $b$ if it is $T$-normal with respect to the base-$b$ expansion and likewise refer to a point as being CF-normal if it is $T$-normal with respect to the continued fraction expansion.

Remark 3.3.2. There have been many different definitions of normality over the years, most of which have been equivalent. This definition is nicely related to the symbolic structure of a fibred system. In many places, normality is equivalent to genericity of points for a dynamical system: a point $x \in X$ in a topological dynamical system $(X,\mu,T)$ is said to be generic if
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu \]
for all continuous functions $f$ on $X$. Compare this with the notion of equidistribution given later in this section.

Proposition 3.3.3. Given an ergodic, finite, measure-preserving fibred system $(X,\mu,T,D,\mathcal{X})$, almost all points $x \in X$ are $T$-normal. Here, again, almost all means that the set of points which are not $T$-normal has $\mu$-measure 0.

Proof. For any given $s$, the pointwise ergodic theorem applied to $1_{C_s}$ shows that the set of points for which (3.1) does not hold has $\mu$-measure 0. Since there are countably many strings of finite length (even if $D$ itself is countably infinite) and since the countable union of measure-0 sets has measure 0, we see that the set of points for which (3.1) fails for at least one finite string $s$ has $\mu$-measure 0.

Definition 3.3.4. Let $\mu$ be a probability measure on $[0,1)$. We say a sequence $\{x_i\}_{i=1}^{\infty}$ is equidistributed with respect to $\mu$ if for every $[a,b) \subset [0,1)$ we have
\[ \lim_{n\to\infty} \frac{\#\{ 1 \le i \le n : x_i \in [a,b) \}}{n} = \mu([a,b)). \]

Proposition 3.3.5. A number $x$ is normal to base $b$ if and only if the orbit of $x$ in the base-$b$ expansion is equidistributed with respect to the Lebesgue measure.


It would be tempting to prove this by making use of the fact that the cylinder sets form a generating semi-algebra and our earlier result that says each measurable set can be approximated to arbitrarily good degree by finitely many elements of our semi-algebra, but it turns out we need something a little stronger.

Proof. The “if” direction follows immediately by considering $[a,b)$ that are cylinder sets. So we consider only the “only if” direction.

Let $[a,b) \subset [0,1)$. (Here $b$ has no relation to the base: we abuse notation and swap to $b$ being a real variable.)

For a large integer $k$ let $S_k$ denote the union of all rank-$k$ cylinder sets which have a non-empty intersection with $[a,b)$. Let $S'_k$ denote the union of all rank-$k$ cylinder sets which are strictly contained inside $[a,b)$. So $S'_k \subset [a,b) \subset S_k$.

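Taking the base to be 10 for concreteness, any rank-$k$ cylinder set has Lebesgue measure $10^{-k}$ and is, in fact, an interval, so we have that $\lambda(S'_k) \ge b - a - 2/10^k$ and $\lambda(S_k) \le b - a + 2/10^k$. By noting that (3.1) holds for each rank-$k$ cylinder set $C_s$ contained in $S'_k$, we have that for all sufficiently large $n$
\[ \frac{\#\{ 0 \le i \le n-1 : T^i x \in S'_k \}}{n} \ge b - a - \frac{4}{10^k}. \]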

(Here we implicitly used the fact that there were only finitely many rank-$k$ cylinders contained in $S'_k$. For each such cylinder, we know by the definition of $T$-normality that once $n$ is large enough, the points $x, Tx, T^2x, \dots, T^{n-1}x$ visit the corresponding cylinder set often enough, and since there are only finitely many cylinders, there is an $n$ that works generally.) Likewise, for all sufficiently large $n$ we have that
\[ \frac{\#\{ 0 \le i \le n-1 : T^i x \in S_k \}}{n} \le b - a + \frac{4}{10^k}. \]

Now $S'_k \subset [a,b) \subset S_k$ implies that
\[ \frac{\#\{ 0 \le i \le n-1 : T^i x \in S'_k \}}{n} \le \frac{\#\{ 0 \le i \le n-1 : T^i x \in [a,b) \}}{n} \le \frac{\#\{ 0 \le i \le n-1 : T^i x \in S_k \}}{n}. \]

Comparing these bounds with the ones we proved in the previous paragraph, and then letting $k$ tend to infinity, we see that the desired relation holds.

Note, similar proofs work for all β-expansions as well as the continued fraction expansion (with the Parry measure and the Gauss measure in place of Lebesgue); however, the proof in the continued fraction case requires a touch more care due to there being infinitely many rank-$k$ cylinders.

Theorem 3.3.6 (Weyl’s criterion). Let $\{x_i\}_{i=1}^{\infty}$ be a sequence in $[0,1)$. The following are equivalent:

1. The sequence is equidistributed with respect to Lebesgue measure.

2. For every complex-valued, 1-periodic, continuous function $f$, we have
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_0^1 f(x) \, dx. \]

3. For every non-zero integer $k$, we have
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} e^{2\pi i k x_i} = 0. \]

We recall that $e^{i\theta} = \cos\theta + i \sin\theta$.

This last statement is often rephrased using asymptotic notation as $\sum_{i=1}^{n} e^{2\pi i k x_i} = o(n)$.

Proof sketch. (1) implies (2): We call a function a step function if it is a finite linear combination of indicator functions of intervals. Any real-valued, 1-periodic, continuous function can be approximated from above and below by step functions to an arbitrary degree of precision. From there, we can approximate complex-valued functions by breaking them into their real and imaginary parts.

(2) implies (1): Any indicator function of an interval can be approximated from above and below by real-valued, 1-periodic continuous functions whose integrals converge to the length of the interval.

(2) implies (3): Obvious from taking $f(x) = e^{2\pi i k x}$, whose integral over $[0,1)$ is 0 when $k \ne 0$.

(3) implies (2): This follows from the Stone–Weierstrass theorem, in that finite linear combinations of the functions $e^{2\pi i k x}$ can approximate any complex-valued, 1-periodic, continuous function $f$ to an arbitrary degree of precision in the supremum norm.

We remark that the equivalence of the first two statements in Theorem 3.3.6 still holds if Lebesgue measure is replaced by any measure which is absolutely continuous with respect to Lebesgue and whose density is bounded away from 0 and infinity.
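The third condition of Weyl's criterion is easy to experiment with numerically. The sketch below is an illustration only: the test sequence $x_i = i\sqrt{2} \bmod 1$ (a classical equidistributed sequence) and the constant sequence used for contrast are choices made here, not examples from the notes, and the function name is hypothetical.

```python
import cmath
import math

def weyl_average(seq, k):
    """Return (1/n) * sum over x in seq of exp(2*pi*i*k*x)."""
    n = len(seq)
    return sum(cmath.exp(2j * math.pi * k * x) for x in seq) / n

n = 10000
alpha = math.sqrt(2)
rotation = [(i * alpha) % 1.0 for i in range(1, n + 1)]   # x_i = i*sqrt(2) mod 1
constant = [0.25] * n                                      # clearly not equidistributed

rotation_sums = [abs(weyl_average(rotation, k)) for k in (1, 2, 3)]
constant_sums = [abs(weyl_average(constant, k)) for k in (1, 2, 3)]
print(rotation_sums, constant_sums)
```

For the rotation sequence the averaged exponential sums are already tiny at $n = 10000$, while for the constant sequence they have absolute value 1 for every $k$, so the criterion fails as it should.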

Now we’ve talked a lot about $T$-normal numbers and (provided we’re in a nice ergodic system), we know they are everywhere, with full measure. But how do we know if a number is $T$-normal or not?

That’s actually a ridiculously hard problem.

We can give examples of $T$-normal numbers. We’ll do that in the next chapter. But for now, we still need to prove ergodicity for some of our systems.


3.4 Proving ergodicity

We would still like to prove that both the β-expansion dynamical system and the continued fraction dynamical system are ergodic. To do this we will make use of a particular method which seems to have its roots in Knopp and Rényi.

Lemma 3.4.1. Suppose $(X,\mathcal{F},\mu)$ is a probability space and $\mathcal{A}$ is a semi-algebra that generates $\mathcal{F}$. Let $E$ be a measurable set.

If there exists a $\delta > 0$ such that for all $A \in \mathcal{A}$ we have
\[ \mu(E \cap A) \ge \delta \mu(A), \]
then $\mu(E^c) = 0$.

If there exists an $\eta < 1$ such that for all $A \in \mathcal{A}$ we have
\[ \mu(E \cap A) \le \eta \mu(A), \]
then $\mu(E) = 0$.

Results like this are sometimes referred to as Knopp’s lemma.

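Proof. We will prove the first statement. The second follows by a similar method.

Let $\varepsilon > 0$. Since $\mathcal{A}$ is a semi-algebra that generates $\mathcal{F}$, by our earlier lemma, we can find a finite number of disjoint sets $A_1, A_2, \dots, A_k \in \mathcal{A}$ such that
\[ \mu\left( E^c \,\triangle\, \bigcup_{i=1}^{k} A_i \right) < \varepsilon. \]
But then we have
\begin{align*}
0 = \mu(E \cap E^c) &\ge \mu\left( E \cap \bigcup_{i=1}^{k} A_i \right) - \mu\left( E^c \,\triangle\, \bigcup_{i=1}^{k} A_i \right) \\
&> \mu\left( \bigcup_{i=1}^{k} (E \cap A_i) \right) - \varepsilon \\
&= \sum_{i=1}^{k} \mu(E \cap A_i) - \varepsilon \\
&\ge \sum_{i=1}^{k} \delta \mu(A_i) - \varepsilon \\
&\ge \delta \mu\left( \bigcup_{i=1}^{k} A_i \right) - \varepsilon \\
&\ge \delta \left( \mu(E^c) - \mu\left( E^c \,\triangle\, \bigcup_{i=1}^{k} A_i \right) \right) - \varepsilon \\
&\ge \delta \mu(E^c) - (1 + \delta)\varepsilon.
\end{align*}
Since $\varepsilon > 0$ is arbitrary, we have $0 \ge \delta \mu(E^c)$. But $\delta > 0$ so $\mu(E^c) = 0$.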

Now our general goal will be to assume $E$ is a positive measure, invariant set, so that $T^{-1}E = E$, and then we want to use the above lemma to show that $\mu(E^c) = 0$, proving ergodicity. The semi-algebra $\mathcal{A}$ will generally be taken to be the collection of cylinder sets so that $A = C_s$ for some admissible $s$.

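Suppose that $E$ is a positive measure, invariant set and suppose that $A = C_s$ with $s = [a_1, a_2, \dots, a_k]$. Then we have that
\[ \frac{\mu(E \cap C_s)}{\mu(C_s)} = \frac{\mu(T^{-k}E \cap C_s)}{\mu(C_s)} = \frac{\int_{T^{-k}E \cap C_s} d\mu}{\int_{C_s} d\mu} = \frac{\int_{E \cap T^k C_s} \omega_s(y) \, d\mu}{\int_{T^k C_s} \omega_s(y) \, d\mu}, \]
where $\omega_s(y)$ is the Jacobian of $T_{a_1}^{-1} \circ T_{a_2}^{-1} \circ \cdots \circ T_{a_k}^{-1}$.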

As an example, for base 10 expansions, we have that
\[ x = T_{a_1}^{-1} T_{a_2}^{-1} \cdots T_{a_k}^{-1}(T^k x) = \sum_{i=1}^{k} \frac{a_i}{10^i} + \frac{T^k x}{10^k}, \]
so that $\omega_s = 1/10^k$ always.

Remark 3.4.2. We should note that in general, it is not true that $T(A \cap B) = TA \cap TB$. However, we do still get the desired relation $T^k(T^{-k}E \cap C_s) = E \cap T^k C_s$.

Theorem 3.4.3. Suppose we have a fibred system $(X,\mathcal{F},\mu,T,D,\mathcal{X})$ and that the following hold:

1. $\mu(X) = 1$.

2. There exists a semi-algebra $\mathcal{A}$ which generates $\mathcal{F}$ such that each $A \in \mathcal{A}$ can be expressed as a disjoint union of countably many full cylinders.

3. (Rényi’s condition) There exists a uniform constant $M \ge 1$ such that for all admissible strings $s$, we have
\[ \frac{\sup_{y \in T^{|s|}C_s} \omega_s(y)}{\inf_{y \in T^{|s|}C_s} \omega_s(y)} \le M. \]

Then $T$ is ergodic.


Proof. Suppose $E$ is an invariant set of positive measure. Then we want to show that $\mu(E^c) = 0$.

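Let $C_s$ be any full cylinder and let $k = |s|$, so that $T^k C_s = X$. Then we have that
\[ \frac{\mu(E \cap C_s)}{\mu(C_s)} = \frac{\int_{E \cap T^k C_s} \omega_s(y) \, d\mu}{\int_{T^k C_s} \omega_s(y) \, d\mu} = \frac{\int_E \omega_s(y) \, d\mu}{\int_X \omega_s(y) \, d\mu} \ge \frac{\left( \inf_{y \in X} \omega_s(y) \right) \cdot \int_E d\mu}{\left( \sup_{y \in X} \omega_s(y) \right) \cdot \int_X d\mu} \ge \frac{\mu(E)}{M \mu(X)} = \frac{\mu(E)}{M}. \]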

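Let $A$ be an element of the semi-algebra $\mathcal{A}$. We assumed that $A = \bigcup C_s$ for some disjoint collection of full cylinders $C_s$. Then
\begin{align*}
\mu(E \cap A) &= \mu\left( E \cap \bigcup C_s \right) = \mu\left( \bigcup (E \cap C_s) \right) = \sum \mu(E \cap C_s) \\
&\ge \frac{1}{M} \sum \mu(E)\mu(C_s) = \frac{\mu(E)}{M} \mu\left( \bigcup C_s \right) = \frac{\mu(E)}{M} \mu(A).
\end{align*}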

Now we apply Lemma 3.4.1 with $\delta = \mu(E)/M$ to see that $\mu(E^c) = 0$, completing the proof.

3.4.1 Ergodicity of continued fraction expansions

With Theorem 3.4.3, showing that the continued fraction expansion is ergodic is not too difficult.

First, in this case we have that $\mu(X) = 1$ by definition.

For the second condition of Theorem 3.4.3, we have that each rank-1 cylinder is full and thus all cylinders are full. (Remember, we ignore $X_\infty$ and anything that falls into it.) The collection of cylinder sets does not form a semi-algebra generating the Borel σ-algebra, because there are infinitely many digits. However, there are countably many cylinder sets, which will be important.

Let $a, b$ be any two rational numbers in $[0,1)$. By the fact that cylinder sets $C_s$ are intervals whose length shrinks to 0 as $|s|$ goes to infinity, each point in the interval $(a,b)$ lives in some cylinder set $C_s$ which is strictly contained in $(a,b)$. Thus $(a,b)$ can be expressed as the (countable) union of all these cylinder sets, and this union can be made disjoint by discarding any cylinder set which is a subset of another cylinder set in this union.

Now you might say that these sets $(a,b)$ do not form a semi-algebra, since a complement of such a set cannot be written as a finite disjoint union of such sets. (In particular, you’ll never be able to fit $a$ into such a union.) However, this doesn’t matter because $a$ is rational and thus is in $X_\infty$ eventually, and we don’t care about those points.

We also note that the Borel σ-algebra is generated by intervals with rational endpoints just as it is generated by intervals with real endpoints.

Thus the second condition is satisfied.

For the third condition, we need to calculate $\omega_s(y)$. For this, remember that we have
\[ x = \frac{p_n + p_{n-1} T^n x}{q_n + q_{n-1} T^n x}. \]

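To calculate $\omega_s(y)$ for $s$ corresponding to the fraction $p_n/q_n$, we set $T^n x = y$ and take the derivative with respect to this variable:
\begin{align*}
\left( \frac{p_n + p_{n-1} y}{q_n + q_{n-1} y} \right)' &= \frac{(q_n + q_{n-1} y) p_{n-1} - (p_n + p_{n-1} y) q_{n-1}}{(q_n + q_{n-1} y)^2} \\
&= \frac{q_n p_{n-1} - p_n q_{n-1}}{(q_n + q_{n-1} y)^2} \\
&= \frac{(-1)^n}{(q_n + q_{n-1} y)^2}.
\end{align*}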

Thus $\omega_s(y) = (q_n + q_{n-1} y)^{-2}$. (Remember that we always take absolute values when calculating the Jacobian.) So the supremum of $\omega_s(y)$ is $q_n^{-2}$ and the infimum is $(q_n + q_{n-1})^{-2}$. However, since we also showed that $0 \le q_{n-1} \le q_n$, the ratio of the two is $((q_n + q_{n-1})/q_n)^2 \le 4$, and we see that the third condition holds with $M = 4$.

Therefore the theorem applies and the continued fraction expansion system is ergodic.
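The bound $M = 4$ in Rényi's condition can be checked on specific strings. The following is a minimal sketch, assuming the usual recurrence $q_{-1} = 0$, $q_0 = 1$, $q_j = a_j q_{j-1} + q_{j-2}$ for the denominators of the convergents (the recurrence is not restated in this section, so treat it as an assumption), with hypothetical function names and arbitrarily chosen digit strings.

```python
from fractions import Fraction

def convergent_denominators(digits):
    """Return (q_{n-1}, q_n) for the continued fraction with digits a_1..a_n,
    assuming q_{-1} = 0, q_0 = 1 and q_j = a_j * q_{j-1} + q_{j-2}."""
    q_prev, q = 0, 1
    for a in digits:
        q_prev, q = q, a * q + q_prev
    return q_prev, q

def renyi_ratio(digits):
    """sup / inf of omega_s(y) = (q_n + q_{n-1} y)^{-2} over y in [0, 1)."""
    q_prev, q = convergent_denominators(digits)
    sup = Fraction(1, q * q)                  # value at y = 0
    inf = Fraction(1, (q + q_prev) ** 2)      # limiting value as y -> 1
    return sup / inf

strings = [(1,), (1, 1, 1, 1), (3, 1, 4, 1, 5), (7, 2)]
ratios = {s: renyi_ratio(s) for s in strings}
for s, r in ratios.items():
    print(s, r, float(r))
```

The string $[1]$ gives the extreme value 4 exactly, since there $q_{n-1} = q_n = 1$; every other string tried stays strictly below it.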

Consequences of ergodicity for continued fractions

Now one consequence of ergodicity is that almost all real numbers are CF-normal, so each string appears within their expansion with the expected frequency, such as in the following result.

Proposition 3.4.4. Let $k \in \mathbb{N}$. We have, for almost all $x$,
\[ \lim_{n\to\infty} \frac{\#\{ 1 \le i \le n : a_i(x) = k \}}{n} = \frac{1}{\log 2} \log\left( 1 + \frac{1}{k(k+2)} \right). \]

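Proof. This follows by applying the ergodic theorem to the characteristic function of $C_{[k]} = (1/(k+1), 1/k]$. Since $1_{C_{[k]}}(T^i x) = 1$ if and only if $a_{i+1}(x) = k$, we see that for almost all $x$ we should have
\begin{align*}
\lim_{n\to\infty} \frac{\#\{ 1 \le i \le n : a_i(x) = k \}}{n} &= \int_{[0,1)} 1_{C_{[k]}}(x) \, d\mu(x) \\
&= \int_{1/(k+1)}^{1/k} \frac{1}{\log 2 \, (1 + x)} \, dx \\
&= \frac{1}{\log 2} \left( \log\left( 1 + \frac{1}{k} \right) - \log\left( 1 + \frac{1}{k+1} \right) \right) \\
&= \frac{1}{\log 2} \left( \log\left( \frac{k+1}{k} \right) - \log\left( \frac{k+2}{k+1} \right) \right) \\
&= \frac{1}{\log 2} \log\left( \frac{(k+1)^2}{k(k+2)} \right) \\
&= \frac{1}{\log 2} \log\left( 1 + \frac{1}{k(k+2)} \right).
\end{align*}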
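As a sanity check on Proposition 3.4.4, the frequencies can be evaluated directly. This is a small sketch with hypothetical helper names; it compares the closed form against the Gauss measure of the cylinder $(1/(k+1), 1/k]$ and confirms that the frequencies over all digits add up to (nearly) 1, as they must.

```python
import math

def digit_frequency(k):
    """Limiting frequency of the digit k, as given in Proposition 3.4.4."""
    return math.log(1 + 1 / (k * (k + 2))) / math.log(2)

def gauss_measure(lo, hi):
    """Gauss measure of (lo, hi]: the integral of 1 / (log 2 * (1 + x))."""
    return (math.log(1 + hi) - math.log(1 + lo)) / math.log(2)

freqs = [digit_frequency(k) for k in range(1, 6)]
partial_sum = sum(digit_frequency(k) for k in range(1, 100001))
print(freqs)
print(partial_sum)
```

The digit 1 appears a little over 41% of the time, and the frequencies fall off roughly like $1/k^2$ after that.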

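However, by applying the ergodic theorem to other functions besides the characteristic functions of cylinder sets, we can obtain some other interesting results. All the following hold for almost all $x$.
\begin{align*}
\lim_{n\to\infty} \frac{\#\{ 1 \le i \le n : a_i \equiv 1 \pmod 4 \}}{n} &= \frac{1}{2} \\
\lim_{n\to\infty} \frac{a_1 + a_2 + \cdots + a_n}{n} &= \infty \\
\lim_{n\to\infty} (a_1 a_2 a_3 \cdots a_n)^{1/n} &= \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k(k+2)} \right)^{\log k / \log 2} \\
\lim_{n\to\infty} \frac{\log q_n}{n} &= \frac{\pi^2}{12 \log 2}
\end{align*}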

While the first three are fairly straightforward, the last equality given here takes a bit more work. The key idea is to show that $-\log q_n(x) \approx \log x + \log Tx + \cdots + \log(T^{n-1}x)$ and thus the ergodic theorem can be applied with $f(x) = -\log x$. We should note that the function $-\log x$ is integrable on $[0,1)$, but it is not continuous and 1-periodic, so that $T$-normality does not necessarily imply this relation.
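The right-hand sides of these limits can be evaluated numerically from the digit frequencies of Proposition 3.4.4. The sketch below is an illustration under stated assumptions: the infinite sum and product are truncated at an arbitrary cutoff `K`, and the names are hypothetical. The product is the quantity usually called Khinchin's constant, and $\pi^2/(12 \log 2)$ is usually called Lévy's constant.

```python
import math

LOG2 = math.log(2)

def digit_frequency(k):
    """Frequency of the digit k; log1p keeps precision when k is large."""
    return math.log1p(1 / (k * (k + 2))) / LOG2

K = 200000   # truncation point, an arbitrary choice for this sketch

# Frequency of digits congruent to 1 mod 4: sum the frequencies of 1, 5, 9, ...
freq_1_mod_4 = sum(digit_frequency(k) for k in range(1, K, 4))

# Geometric mean of the digits: the log of the product is
# the sum over k of log(k) * digit_frequency(k).
log_geometric_mean = sum(math.log(k) * digit_frequency(k) for k in range(2, K))
geometric_mean = math.exp(log_geometric_mean)

# Exponential growth rate of the denominators q_n.
levy_exponent = math.pi ** 2 / (12 * LOG2)

print(freq_1_mod_4, geometric_mean, levy_exponent)
```

The truncated sum for digits congruent to 1 mod 4 lands very close to $1/2$, which is a reassuring check on the first displayed limit.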

3.4.2 Ergodicity for β-expansions

We will again apply Theorem 3.4.3. Again, the first condition is easy, and each $\omega_s(y)$ equals $\beta^{-|s|}$ by the same argument as for the base 10 expansion earlier, so the third condition holds trivially.

The problem is the second condition. For this we will need some additional results.


Lemma 3.4.5. Let $\beta > 1$ be non-integer. The set of points with terminating β-expansions (expansions ending on repeating 0’s) is dense.

For terminology, a set is dense in $[0,1)$ if it has a non-empty intersection with every sub-interval of $[0,1)$.

Proof. We note that a number $x$ has a terminating β-expansion if and only if the orbit of $x$ visits 0. (Once it visits 0, it never travels anywhere else.)

Also note that if we identify 0 with 1, then the transformation $T$, given by $Tx = \beta x \pmod 1$, is continuous with positive derivative at all points except at 0 itself.

Consider an interval $[a,b) \subset [0,1)$. If this contains a point $x$ such that $Tx = 0$, then we’re done. Otherwise $T$ maps $[a,b)$ onto an interval $[a_1,b_1)$ of length $\beta(b-a)$ that does not contain 0. We repeat the argument again: either $[a_1,b_1)$ contains a point $x$ such that $Tx = 0$ and we’re done, or else $T$ maps it onto an interval $[a_2,b_2)$ of length $\beta^2(b-a)$. Clearly this cannot go on forever. Eventually $\beta^n(b-a) \ge 1$, and we will have to see some point whose orbit hits 0.

Lemma 3.4.6. Suppose $C_s = [a,b)$. Then $T^{|s|}a = 0$. This is true regardless of whether $C_s$ is full or not.

We only mention that to prove this, you just have to verify that if $T_d^{-1}[a,b)$ contains any points, it must contain $T_d^{-1}a$.

Lemma 3.4.7. The rank-$k$ cylinder sets are the sets of the form $[a,b)$ where $T^k a = T^k b = 0$ and there are no other points $x \in [a,b)$ for which $T^k x = 0$.

This follows immediately from the previous lemma.

Lemma 3.4.8. Let $C_s = [a,b)$ be any cylinder set. Then any set $[a,b') \subset [a,b)$ can be written as the countable, disjoint union of full cylinders.

Proof. We may assume that $[a,b')$ itself is not a full cylinder, as otherwise we are done. Let $n = |s|$.

By the continuity of $T$ we know that $T^n[a,b') = [0, T^n b')$. Provided $T^n b' \le 1/\beta$, applying $T$ again does nothing but multiply the right endpoint by $\beta$ (since there is no integer part to remove). Let $m \ge 0$ be the largest integer such that $\beta^m T^n b' < 1$ so that $T^{n+m}[a,b') = [0, \beta^m T^n b')$. Because we still have not seen another point that gets mapped to 0, we must have that $[a,b')$ isn’t just in a given rank-$n$ cylinder, it is contained within a single rank-$(n+m)$ cylinder.

Consider what happens when $T$ is applied one more time. For each integer $i \ge 0$ with $i + 1 \le \beta^{m+1} T^n b'$, the interval $[i/\beta, (i+1)/\beta)$ is contained in $[0, \beta^m T^n b')$, and this interval gets mapped to $[0,1)$. Thus, for each such $i$, we have that the interval $[a + i/\beta^{n+m+1}, a + (i+1)/\beta^{n+m+1})$ is a full rank-$(n+m+1)$ cylinder.


Thus we can cover $[a,b')$ with finitely many rank-$(n+m+1)$ cylinders leaving, possibly, only a set $[a',b')$ uncovered, where $a'$ is the left endpoint of a rank-$(n+m+1)$ cylinder set that contains $[a',b')$. Moreover, we see that $b' - a' \le (1/\beta)(b-a)$. We can thus repeat the same argument with $[a',b')$ in place of $[a,b')$. Since, at every stage, we shrink the length of the target set by a factor of at least $1/\beta$, we see that eventually every point in this set must be covered by a full cylinder set. This completes the proof.

Lemma 3.4.9. Let $[a,b)$ be a subset of $[0,1)$ with $T^k a = 0$ for some $k \ge 0$. Then $[a,b)$ can be written as a countable disjoint union of full cylinders.

Proof. By our earlier work, we know that $a$ must be the left endpoint of some rank-$k$ cylinder. Since there are finitely many rank-$k$ cylinders, we can write $[a,b)$ as a disjoint union of rank-$k$ cylinders unioned with $[a',b)$ where $[a',b)$ is completely contained in a single rank-$k$ cylinder. By the previous lemma, $[a',b)$ can be written as a countable disjoint union of full cylinders, which completes the proof.

Lemma 3.4.10. The set of intervals $[a,b)$ with $T^k a = 0$ and $T^j b = 0$ for some $j, k \ge 0$ forms a semi-algebra that generates the Borel σ-algebra.

We only remark that showing this is a semi-algebra is easy, and that it generates the Borel σ-algebra because the endpoints of these intervals are dense, for the same reason that the Borel σ-algebra is generated by intervals with rational endpoints.

Together all the above work shows that the second condition of Theorem 3.4.3 holds, and thus all β-expansions are ergodic.

3.4.3 A wrap-up on ergodicity

A few comments to wrap up this section.

First, our definition of ergodicity does not in any way rely on knowing that the system was measure-preserving. There are various ways of showing the existence of an invariant measure with good properties. One possible construction goes as follows:
\[ \rho_n(A) = \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}A) = \int_X \left( \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T^i x) \right) d\mu(x). \]
If the function $\rho = \lim_{n\to\infty} \rho_n$ exists, then it is easy to see that it is invariant. A result of Ryll-Nardzewski can be helpful here.

Second, as we can see from proving ergodicity for certain systems, unless we know good things about full cylinders, it can be extremely hard. There are many variants of Rényi’s theorem which replace the restriction on full cylinders with other conditions, such as the condition that $T^{|s|}C_s$ can be one of only finitely many different sets. However, in many cases of interest, even this is not enough. To my knowledge, there is no Rényi-like theorem that implies the ergodicity of the complex continued fraction expansion: there ergodicity is proved using an operator-analytic method that also proves, among other things, strong mixing. More powerful techniques include the method of Lasota–Yorke.

Third, and finally, the strong mixing of the continued fraction expansions and β-expansions can be proven directly if enough care is taken. This has been done by Philipp, among others.

Chapter 4

Normal numbers

As we’ve seen, ergodicity can be an extremely powerful tool, telling us that the orbits of almost all points distribute nicely through the space, and in the case of continued fractions, that gives us a ton of useful information.

But then we hit a roadbump.

What are the numbers that have good properties? In particular, what are the $T$-normal numbers?

The question of whether $\pi$ is base-10 normal (or CF-normal) has been open for more than a century, and we do not appear to be close to an answer. Nor are we close to an answer to the normality of $\sqrt{2}$, $e$, $\log 2$, or any other commonly used irrational mathematical constant, with rare exceptions like $e$ being known not to be CF-normal due to its continued fraction expansion having a very regular form.

What about a far simpler question: can we even give examples of $T$-normal numbers? Yes! In fact, a lot of big names have worked on giving examples. In this chapter, we’ll see three distinct methods used for giving examples of base-$b$ normal numbers. These proofs are fairly general while being fairly direct and simple to give. One of them will bring us the closest we have ever managed to get to the normality of constants like $\pi$.

The very first base-10 normal number ever discovered was Champernowne’s constant
\[ 0.12345678910111213\ldots \]

formed by concatenating all of the positive integers in succession. This inspired a flurry of related work. Besicovitch proved that concatenating the perfect squares into $0.149162536\ldots$ gives a normal number. Copeland and Erdős proved that concatenating the primes into $0.2357111317\ldots$ gives a normal number.
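These concatenation constants are easy to generate and probe experimentally. The following is a small sketch with hypothetical helper names; it builds a finite prefix of the digit string by concatenating the integers below an arbitrarily chosen cutoff and counts how often a given digit appears. A finite count like this only hints at normality and proves nothing.

```python
def concatenated_digits(values):
    """Concatenate the base-10 digit strings of the given integers."""
    return "".join(str(v) for v in values)

def count_occurrences(text, s):
    """Count occurrences of the string s in text, allowing overlaps."""
    return sum(1 for i in range(len(text) - len(s) + 1) if text.startswith(s, i))

# Prefix of Champernowne's constant: the integers 1 through 99999.
champernowne = concatenated_digits(range(1, 100000))
# Prefix of the squares constant: 1, 4, 9, 16, 25, 36, ...
squares = concatenated_digits(n * n for n in range(1, 11))

count_7 = count_occurrences(champernowne, "7")
freq_7 = count_7 / len(champernowne)
print(champernowne[:17], squares[:9], len(champernowne), count_7, freq_7)
```

The frequency of the digit 7 in this prefix comes out slightly above $1/10$ because leading zeros are never written, so 0 is underrepresented at every finite stage.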

More generally, a common type of normal number construction involves taking a function $f : \mathbb{N} \to \mathbb{N}$ (or, if $f$ is positive and real-valued, we take the floor of such a function), writing each value $f(n)$ as its string of base-10 digits in the natural way, and then concatenating them to form
\[ \theta_f = 0.f(1)f(2)f(3)f(4)\ldots \]
or
\[ \tau_f = 0.f(2)f(3)f(5)f(7)f(11)\ldots. \]

There are dozens more examples of functions $f$ which make either $\theta_f$ or $\tau_f$ normal to base 10. Nakai and Shiokawa showed that both $\theta_f$ and $\tau_f$ are normal when $f$ is a polynomial with real coefficients. Madritsch, Thuswaldner, and Tichy showed that both constants are normal when $f$ is a real-valued entire function of low logarithmic order. De Koninck and Kátai proved that both are normal when $f(n) = P(n+1)$ where $P(n)$ is the largest prime divisor of $n$. Vandehey proved that $\theta_f$ is normal if $f$ is taken to be the last half of the digits of $\omega(n)$, the distinct prime divisor counting function. (By the Erdős–Kac theorem, we know that the first half of the digits of $\omega(n)$ should resemble $\log\log n$ too closely, which prevents normality there.) Vandehey and Pollack proved that $\theta_f$ is normal if $f$ is the Euler totient function, the sum-of-divisors function, the Carmichael function, or any finite composition of those three functions.

4.0.4 A quick refresher on asymptotic notation

We will make frequent use of asymptotic notation in this chapter, so let us briefly describe some of it in more detail.

We say f(x) = O(g(x)) if there exists C > 0 such that |f(x)| ≤ C|g(x)|. Equivalently, we say that f(x) ≪ g(x). We may think of this as f(x) being bounded in size by g(x).

We say f(x) = o(g(x)) if $\lim_{x\to\infty} f(x)/g(x) = 0$. (Sometimes x may go to a value other than ∞, and in those cases we will make such things clear.) We may think of this as f(x) being an order of magnitude smaller than g(x).

We say f(x) ≍ g(x) if g(x) ≪ f(x) ≪ g(x). We may think of this as f(x) and g(x) being of the same order of magnitude. So $x^4 + x^3 \asymp 3x^4$.

We say f(x) ∼ g(x) if f(x) = g(x)(1 + o(1)) or, equivalently, if $\lim_{x\to\infty} f(x)/g(x) = 1$. We may think of this as f(x) and g(x) being asymptotically equivalent.

4.1 The combinatorial method: Copeland-Erdos

We begin by showing the general method of Copeland-Erdos, which was used to prove several of the above-mentioned results.

The key idea of this method relies on the notion of (ε, k)-normality.

Let us fix a base b, and define $\nu_s(a)$ to be the number of times s appears in the base-b expansion of the integer a. We will allow $\nu_s(a)$ to have the same definition if a is a string of digits. If x is a real number in [0, 1), we will let $\nu_s(x, N)$ denote the number of times the string s occurs within the string $[a_1(x), a_2(x), \ldots, a_N(x)]$. In both definitions, we may let s be the empty string, in which case it just counts the number of digits. In such a case, we simply do not write the subscript.

We say that an integer a is (ε, k)-normal if for all strings s with |s| = k, we have that

$$\left|\frac{\nu_s(a)}{\nu(a)} - \frac{1}{b^k}\right| \le \varepsilon.$$

In an (ε, k)-normal number, all strings of length k appear close to the desired frequency for normality, and thus we may think of it as having good “small-scale” normality properties.
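The definition can be tested directly on small examples. The sketch below is one possible reading of it, assuming that occurrences of s are counted with overlaps allowed; the helper names `to_digits`, `count_occurrences` and `is_eps_k_normal` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def to_digits(a, base):
    """Base-`base` digits of a positive integer, most significant first."""
    digits = []
    while a:
        a, r = divmod(a, base)
        digits.append(r)
    return digits[::-1]


def count_occurrences(digits, s):
    """nu_s: number of (possibly overlapping) occurrences of s in digits."""
    k = len(s)
    return sum(1 for i in range(len(digits) - k + 1) if digits[i:i + k] == s)


def is_eps_k_normal(a, eps, k, base=10):
    """Check |nu_s(a)/nu(a) - base**(-k)| <= eps for every string s of length k."""
    digits = to_digits(a, base)
    total = len(digits)  # nu(a), the number of digits of a
    return all(
        abs(count_occurrences(digits, list(s)) / total - base ** (-k)) <= eps
        for s in product(range(base), repeat=k)
    )
```

For instance, 1234567890 uses every digit exactly once and so is (ε, 1)-normal in base 10 for every ε > 0, while 1111111111 fails for any ε < 9/10. The check loops over all b^k strings, so it is only practical for small k.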

The idea of the combinatorial method is fairly simple. Suppose one has a function f : N → N such that for any given ε > 0 and k ∈ N we have that almost all of the values of f(n) are (ε, k)-normal. Then we would expect that when we concatenate these values to form $\theta_f$, the result should be (ε, k)-normal as well, and since ε and k were arbitrary, it should be outright normal. This does require that the f(n) are generally increasing in length, as we want the behavior inside the strings f(n) to dominate the behavior of strings that start in one f(n) and end in another.

Let us be more specific. We say a set S ⊂ N is meager if $\#\{n \in S : n \le m\} \le m^{1-\delta}$ for some fixed δ > 0 and all sufficiently large m. We say a set S ⊂ N has asymptotic density 0 if $\#\{n \in S : n \le m\} = o(m)$. We shall say a function f : N → N is almost bijective if the pre-image of any meager set has asymptotic density 0.

Theorem 4.1.1. Suppose the function f : N → N is almost bijective. If, in addition,

$$m = o\left(\sum_{n=1}^{m} \nu(f(n))\right) \quad\text{and}\quad m \cdot \max_{1\le n\le m} \nu(f(n)) = O\left(\sum_{n=1}^{m} \nu(f(n))\right),$$

then x = 0.f(1)f(2)f(3) . . . is normal to base b.

The two conditions on ν(f(n)) can be interpreted as, first, that the ν(f(n)) are growing on average and, second, that no one term ν(f(n)) dominates in size.

To prove this, we first need the following bound on the number of (ε, k)-normal integers, which will come in the next two results.

Lemma 4.1.2. Let c ∈ (0, 1) be a real number. Then, as n tends to infinity, we have

$$\binom{n}{cn} = \frac{n!}{(cn)!\,((1-c)n)!} = \frac{1}{\sqrt{2\pi c(1-c)n}}\left[c^{-c}(1-c)^{-(1-c)}\right]^n\left(1 + O\left(\frac{1}{n}\right)\right),$$

where we take x! = Γ(x + 1) if x is not integer-valued.

Proof. We make use of Stirling’s formula, which says

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\left(1 + O\left(\frac{1}{n}\right)\right).$$


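Applying this to the binomial coefficient, we obtain the rather horrendous computation

$$\frac{n!}{(cn)!\,((1-c)n)!} = \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}(1 + O(1/n))}{\sqrt{2\pi cn}\left(\frac{cn}{e}\right)^{cn}(1 + O(1/cn)) \cdot \sqrt{2\pi(1-c)n}\left(\frac{(1-c)n}{e}\right)^{(1-c)n}(1 + O(1/(1-c)n))}.$$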

From here, the remainder of the proof is relatively simple. All of the (n/e) terms cancel from the numerator and denominator, and one copy of $\sqrt{2\pi n}$ cancels as well. Finally we note the following three facts which complete the proof:

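• For any positive constant C we have $O\left(C \cdot \frac{1}{n}\right) = O\left(\frac{1}{n}\right)$.

• We have $\dfrac{1}{1 + O(1/n)} = 1 + O\left(\frac{1}{n}\right)$.

• We have $\left(1 + O\left(\frac{1}{n}\right)\right) \times \left(1 + O\left(\frac{1}{n}\right)\right) = 1 + O\left(\frac{1}{n}\right)$.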
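As a sanity check on Lemma 4.1.2, one can compare the exact binomial coefficient (via the Gamma function) against the main term numerically. This is only an illustrative sketch: the function names are hypothetical, and working with logarithms through `math.lgamma` is a choice made here to avoid overflow, not something taken from the proof.

```python
import math


def log_binomial_exact(n, c):
    """log of n! / ((cn)! ((1-c)n)!) with x! = Gamma(x + 1)."""
    return math.lgamma(n + 1) - math.lgamma(c * n + 1) - math.lgamma((1 - c) * n + 1)


def log_binomial_lemma(n, c):
    """log of the main term in Lemma 4.1.2 (without the 1 + O(1/n) factor)."""
    entropy = -c * math.log(c) - (1 - c) * math.log(1 - c)
    return n * entropy - 0.5 * math.log(2 * math.pi * c * (1 - c) * n)


def ratio(n, c):
    """Exact value divided by the main term; the lemma says this is 1 + O(1/n)."""
    return math.exp(log_binomial_exact(n, c) - log_binomial_lemma(n, c))
```

Evaluating `ratio(n, 0.3)` for n = 100, 1000, 10000 should give values approaching 1, with the distance from 1 shrinking roughly in proportion to 1/n.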

Proposition 4.1.3. Let k ∈ N be fixed and let ε > 0 be sufficiently small in terms of k and b.

There exists a δ = δ(ε, k) > 0 such that the number of integers in the interval [1, m] that are not (ε, k)-normal is $O(m^{1-\delta})$.

There also exists a δ′ = δ′(ε, k) > 0 such that the number of base-b strings of length ℓ (including those that start with 0) that are not (ε, k)-normal is at most $O(b^{\ell(1-\delta')})$.

Proof. The first half of the proposition follows quickly from the second half, so we prove only the second half.

Let us prove the second half of the proposition for ε > 0 and k = 1. Let d be a digit in {0, 1, . . . , b − 1} and consider how many strings there are of length ℓ with exactly m appearances of d. There are exactly $\binom{\ell}{m}$ ways to pick where the d’s should appear and there are exactly $(b-1)^{\ell-m}$ ways to pick what the remaining digits should be. So the number of non-(ε, 1)-normal strings is bounded by

$$b\left(\sum_{m < \ell\left(\frac{1}{b}-\varepsilon\right)} \binom{\ell}{m}(b-1)^{\ell-m} + \sum_{m > \ell\left(\frac{1}{b}+\varepsilon\right)} \binom{\ell}{m}(b-1)^{\ell-m}\right)$$

where the extra b in front comes from letting d range through all possible values.

We will bound the first sum and note that the same process holds for the second sum. To go from a term with m to a term with m + 1, we multiply by a factor of

$$\frac{\ell - m}{(m+1)(b-1)}.$$

Thus, we can quickly see that the terms of the first sum are all increasing with m, and since there are at worst ℓ terms, the sum is bounded by

$$b\,\ell\binom{\ell}{m'}(b-1)^{\ell-m'}$$

where m′ is the largest integer such that m′ < ℓ(1/b − ε). However, by the monotonicity of the Gamma function, we can replace m′ with m′ + x for any x ∈ [0, 1) and change by no more than a factor of $\ell^2$. (For example, for the m′! term in the denominator, we know that Γ(m′ + 1) ≤ Γ(m′ + 1 + x) ≤ Γ(m′ + 2), and the first and last terms here are separated by a factor of m′ + 1.) Therefore, the first sum is bounded by

$$b\,\ell^3\binom{\ell}{\left(\frac{1}{b}-\varepsilon\right)\ell}(b-1)^{\ell\left(\frac{b-1}{b}+\varepsilon\right)}.$$

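We now apply our previous lemma to the binomial coefficient here and see that the first sum is bounded by

$$O\left(\ell^{5/2}\left[\left(\frac{b}{1-b\varepsilon}\right)^{\frac{1}{b}-\varepsilon}\left(\frac{b}{b-1+b\varepsilon}\right)^{\frac{b-1}{b}+\varepsilon}(b-1)^{\frac{b-1}{b}+\varepsilon}\right]^{\ell}\right) = O\left(\ell^{5/2}\,b^{\ell}\left[(1-b\varepsilon)^{-\left(\frac{1}{b}-\varepsilon\right)}\left(1+\frac{b}{b-1}\varepsilon\right)^{-\left(\frac{b-1}{b}+\varepsilon\right)}\right]^{\ell}\right).$$

(Note that 1 + O(1/n) = O(1).) I claim that the quantity in brackets is less than 1, provided ε is sufficiently small, and thus can be written as $b^{-2\delta'}$ for some δ′ > 0. From here, it suffices to note that $\ell^{5/2} = O(b^{\delta'\ell})$ and that all these estimates hold for the second sum, and the proof is complete in this case.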


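To complete the claim, we take logarithms and then take the Taylor series expansion in terms of ε:

$$\begin{aligned}
(1-b\varepsilon)^{-\left(\frac{1}{b}-\varepsilon\right)}\left(1+\frac{b}{b-1}\varepsilon\right)^{-\left(\frac{b-1}{b}+\varepsilon\right)}
&= \exp\left(-\left(\frac{1}{b}-\varepsilon\right)\log(1-b\varepsilon) - \left(\frac{b-1}{b}+\varepsilon\right)\log\left(1+\frac{b}{b-1}\varepsilon\right)\right)\\
&= \exp\left(-\frac{b^2}{2(b-1)}\varepsilon^2 + O(\varepsilon^3)\right).
\end{aligned}$$

Thus, if ε is small enough, the above expression is less than 1.

(For students interested in checking the above Taylor series expansion, note that it is easiest to do if you use the Taylor expansions for log(1 + x) and then multiply and add those.)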
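For readers who want to see the intermediate steps, here is one way the expansion can be organized, using only $\log(1+x) = x - x^2/2 + O(x^3)$. This is a sketch of the bookkeeping, not a different argument; the two lines correspond to the two logarithmic terms in the exponent.

```latex
\begin{align*}
-\left(\tfrac{1}{b}-\varepsilon\right)\log(1-b\varepsilon)
  &= \left(\tfrac{1}{b}-\varepsilon\right)\left(b\varepsilon+\tfrac{b^2}{2}\varepsilon^2+O(\varepsilon^3)\right)
   = \varepsilon-\tfrac{b}{2}\varepsilon^2+O(\varepsilon^3),\\
-\left(\tfrac{b-1}{b}+\varepsilon\right)\log\left(1+\tfrac{b}{b-1}\varepsilon\right)
  &= -\left(\tfrac{b-1}{b}+\varepsilon\right)\left(\tfrac{b}{b-1}\varepsilon-\tfrac{b^2}{2(b-1)^2}\varepsilon^2+O(\varepsilon^3)\right)
   = -\varepsilon-\tfrac{b}{2(b-1)}\varepsilon^2+O(\varepsilon^3).
\end{align*}
```

Adding the two lines, the linear terms cancel and the quadratic terms combine to $-\frac{b}{2}\left(1+\frac{1}{b-1}\right)\varepsilon^2 = -\frac{b^2}{2(b-1)}\varepsilon^2$, which is the exponent claimed above.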

From here, we still need to consider (ε, k)-normality. We sketch how to extend the results. A string of k base-b digits can be considered as a single-digit string of base-$b^k$ digits. Thus, if one has, say, the base-10 number 1205287 and wants to study whether this is (ε, 3)-normal, then one could look at (120)(528)∗ and ∗(205)(287) and ∗∗(052)∗∗. Thus, knowing how often a string fails to be (ε, 1)-normal for base $b^k$ and knowing how often such strings can appear in a base-b string gives the desired result.

Proof of the combinatorial method. To show that x = 0.f(1)f(2)f(3) . . . is normal to base b, we must show for any given string s of length k that

$$\lim_{N\to\infty}\frac{\nu_s(x,N)}{N} = \frac{1}{b^k}.$$

Let us consider a fixed ε > 0 that we will allow to tend to 0 at the end of the proof.

For a given integer N, let m = m(N) be such that the Nth digit of x lies in the string given by f(m). In particular, we have

$$\sum_{n=1}^{m-1}\nu(f(n)) < N \le \sum_{n=1}^{m}\nu(f(n)).$$

Our assumptions on ν(f(n)) imply that $\nu(f(m)) = o\left(\sum_{n=1}^{m}\nu(f(n))\right)$, so we have that $N \sim \sum_{n=1}^{m}\nu(f(n))$, m = o(N), and ν(f(m)) = o(N). Therefore,

$$\nu_s(x,N) = \nu_s(f(1)f(2)\ldots f(m)) + O(\nu(f(m))) = \nu_s(f(1)f(2)\ldots f(m)) + o(N).$$

The number of times a string of length k can appear in f(1)f(2) . . . f(m) starting in some f(n) and ending in some f(n′) with n < n′ is at most km = o(N). Therefore,

$$\nu_s(x,N) = \nu_s(f(1)f(2)\ldots f(m)) + o(N) = \sum_{n\le m}\nu_s(f(n)) + o(N).$$


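Let T ⊂ N be the set of integers n such that f(n) is not (ε, k)-normal. By Proposition 4.1.3, the integers that are not (ε, k)-normal form a meager set, and T is its pre-image under f; so by the assumptions of the theorem, we have $\#\{n \le m : n \in T\} = o(m)$. We always have that $\nu_s(f(n)) = O(\nu(f(n)))$, and therefore

$$\sum_{\substack{n\le m\\ n\in T}}\nu_s(f(n)) = O\left(\sum_{\substack{n\le m\\ n\in T}}\nu(f(n))\right) = O\left(\max_{n\le m}\nu(f(n))\cdot\sum_{\substack{n\le m\\ n\in T}}1\right) = o\left(m\cdot\max_{n\le m}\nu(f(n))\right) = o(N).$$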

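Now we let S = N \ T be the set of integers n such that f(n) is (ε, k)-normal. If n ∈ S then $\nu_s(f(n)) = \nu(f(n))b^{-k} + O(\varepsilon\nu(f(n)))$, and thus

$$\begin{aligned}
\sum_{\substack{n\le m\\ n\in S}}\nu_s(f(n)) &= \sum_{\substack{n\le m\\ n\in S}}\nu(f(n))\left(b^{-k}+O(\varepsilon)\right)\\
&= b^{-k}\left(\sum_{n\le m}\nu(f(n)) - \sum_{\substack{n\le m\\ n\in T}}\nu(f(n))\right) + O\left(\varepsilon\cdot\sum_{\substack{n\le m\\ n\in S}}\nu(f(n))\right)\\
&= b^{-k}\left(N(1+o(1)) + o(N)\right) + O\left(\varepsilon\cdot\sum_{n\le m}\nu(f(n))\right)\\
&= b^{-k}N + o(N) + O(\varepsilon N).
\end{aligned}$$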

Since the sum over n ≤ m is equal to the sum over n ≤ m with n ∈ S plus the sum over n ≤ m with n ∈ T, we therefore have that

$$\frac{\nu_s(x,N)}{N} = b^{-k} + o(1) + O(\varepsilon).$$

Since ε > 0 was arbitrary, we therefore have that $\nu_s(x,N)/N$ tends to $b^{-k}$ as N → ∞.

Of course, what is a good theorem without some neat applications?

One nice corollary of the theorem is that if f(n) is a strictly increasing function on N and its image is sufficiently dense, so that $\#\{m \le x : f(n) = m \text{ for some } n\}$ eventually exceeds $x^{1-\varepsilon}$ for large enough x and any ε > 0, then $\theta_f$ is normal. By taking f(n) = n we get the normality of Champernowne’s constant 0.1234567 . . . . By taking $f(n) = p_n$, the nth prime, we get the normality of the Copeland-Erdos constant, 0.23571113 . . . .

Exercise 4.1.4. Prove that taking $f(n) = \lfloor n^{1/2} \rfloor$ will make $\theta_f$ be normal to base b.


4.1.1 The combinatorial method for other systems

The combinatorial method is fairly robust and extends naturally to many other systems. The idea is to find strings that have better and better small-scale normality properties and concatenate those in succession.

Adler, Keane, and Smorodinsky did this most famously by showing that if one concatenates the finite continued fraction expansions of the sequence of rationals

$$\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{2}{4}, \frac{3}{4}, \ldots,$$

then one obtains an infinite continued fraction expansion that is CF-normal. (One can take either expansion of the rationals.)

Madritsch and Mance gave a way of constructing T-normal numbers for very general fibred systems, including β-expansions. However, their process requires far more repetition. Whereas in Champernowne’s constant, each string was only used once in the process, in Madritsch and Mance’s construction, a string of length k might be repeated around $k^{2k}$ times.

Vandehey developed a much simpler method for GLS systems with finitely many digits. Arrange all finite length strings in order of descending length of their cylinder sets. Then concatenate these strings in that order. The result is normal with respect to this GLS system. Although the description of the normal number is rather simple and similar in nature to Champernowne’s construction, the proof is considerably harder. It requires understanding the asymptotics of the sum of all terms of the k-dimensional Pascal’s triangle that lie above a given tilted hyperplane.

4.2 The analytic method: Davenport-Erdos

In this section, we will prove the following result, due to Davenport and Erdos.

Theorem 4.2.1. Let f(x) ∈ Z[x] be positive-valued for positive integers and have degree at least 1. Then $\theta_f$ is normal to base b.

This will take us to studying exponential sums more closely. From here on, we will use e(x) as a shorthand for $e^{2\pi i x}$.

Lemma 4.2.2. We have, for α not an integer, that

$$\left|\sum_{1\le n\le N} e(\alpha n)\right| \le \min\left\{N, \frac{1}{2\|\alpha\|}\right\},$$

where ‖z‖ denotes the distance from z to the nearest integer.


Proof. It is elementary, and left as an exercise, that

$$\sum_{1\le n\le N} e(\alpha n) = \frac{\sin(\pi\alpha N)\cdot e\left(\frac{\alpha}{2}(N+1)\right)}{\sin(\pi\alpha)}.$$

Thus, we have

$$\left|\sum_{1\le n\le N} e(\alpha n)\right| \le \min\left\{N, \frac{1}{|\sin(\pi\alpha)|}\right\},$$

where the first part comes by bounding the sum term-by-term, as |e(x)| = 1.

Finally it suffices to note the elementary inequality that |sin(πα)| ≥ 2‖α‖.
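The bound of Lemma 4.2.2 is easy to probe numerically. The following sketch evaluates the sum directly and compares it with min{N, 1/(2‖α‖)}; the helper names are hypothetical, and floating-point evaluation is of course only a plausibility check, not a proof.

```python
import cmath
import math


def e(x):
    """The shorthand e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def dist_to_nearest_integer(z):
    """||z||, the distance from z to the nearest integer."""
    return abs(z - round(z))


def geometric_sum(alpha, N):
    """Direct evaluation of sum_{1 <= n <= N} e(alpha * n)."""
    return sum(e(alpha * n) for n in range(1, N + 1))


def lemma_bound(alpha, N):
    """The bound min{N, 1 / (2 ||alpha||)} for non-integer alpha."""
    return min(N, 1 / (2 * dist_to_nearest_integer(alpha)))
```

For a fixed non-integer α the sum stays bounded as N grows, while for α very close to an integer the trivial bound N is the better of the two until N is large.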

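Lemma 4.2.3. Let d(m) denote the number of divisors of m. Then for any ε > 0, we have that $d(m) \ll m^{\varepsilon}$. (The implicit constant will depend upon ε.)

Proof. Suppose that m has prime factorization given by $p_1^{e_1}p_2^{e_2}\cdots$. Then we have that

$$\frac{d(m)}{m^{\varepsilon}} = \prod_i \frac{e_i+1}{p_i^{\varepsilon e_i}} \le \prod_{p_i\le 2^{1/\varepsilon}} \frac{e_i+1}{2^{\varepsilon e_i}},$$

since any factor with $p_i > 2^{1/\varepsilon}$ is at most $(e_i+1)/2^{e_i} \le 1$ and may be dropped. This is bounded by a constant since there are finitely many terms in the product and $(x+1)/2^{\varepsilon x}$ is uniformly bounded as well.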

Proposition 4.2.4 (Weyl’s inequality - simple version). Let f(x) be a polynomial of degree k whose leading coefficient is the rational number a/q in lowest terms. Then for any fixed ε > 0 we have

$$\left|\sum_{1\le n\le N} e(f(n))\right| \ll N^{1+\varepsilon}\left(N^{-2^{1-k}} + q^{-2^{1-k}}\right).$$

Note: Weyl’s inequality is often done with a leading term α such that there is a lowest-terms rational a/q with $|\alpha - a/q| \le q^{-2}$.

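Proof. We will denote the sum $\sum_{1\le n\le N} e(f(n))$ by S(f; N). We will also define a difference operator given by

$$\Delta_n f(x) = f(x+n) - f(x) \quad\text{and}\quad \Delta_{n,m_1,m_2,\ldots,m_k} f(x) = \Delta_n\Delta_{m_1,m_2,\ldots,m_k} f(x).$$

Now we note the following elementary string of equalities:

$$\begin{aligned}
|S(f;N)|^2 = S(f;N)\cdot\overline{S(f;N)} &= \sum_{1\le n\le N} e(f(n)) \sum_{1\le m\le N} e(-f(m))\\
&= \sum_{1\le n,m\le N} e(f(n)-f(m)) = N + 2\,\Re\sum_{1\le m<n\le N} e(f(n)-f(m))\\
&= N + 2\sum_{1\le \ell<N} \Re\, S(\Delta_\ell f; N-\ell) \le N + 2\sum_{1\le \ell\le N} |S(\Delta_\ell f; N-\ell)|,
\end{aligned}$$

where ℓ here represents the difference n − m.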

Let us reconsider ℓ as $\ell_1$ and note that the same logic leads us to the inequality

$$|S(\Delta_{\ell_1} f; N-\ell_1)|^2 \le N + 2\sum_{1\le \ell_2\le N} |S(\Delta_{\ell_1,\ell_2} f; N-\ell_1-\ell_2)|,$$

where we technically should have used $N - \ell_1$ in place of N in several places here, but extending it only makes the right-hand side larger.

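Now we make use of Cauchy’s inequality (or Holder’s inequality if you prefer) to insert the second inequality into the first:

$$\begin{aligned}
|S(f;N)|^4 &\le \left(N + 2\sum_{1\le \ell_1\le N} |S(\Delta_{\ell_1} f; N-\ell_1)|\right)^2\\
&\ll N^2 + N\sum_{1\le \ell_1\le N} |S(\Delta_{\ell_1} f; N-\ell_1)|^2\\
&\ll N^3 + N\sum_{1\le \ell_1,\ell_2\le N} |S(\Delta_{\ell_1,\ell_2} f; N-\ell_1-\ell_2)|.
\end{aligned}$$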

Repeating this procedure k − 1 times we obtain

$$|S(f;N)|^{2^{k-1}} \ll N^{2^{k-1}-1} + N^{2^{k-1}-k}\sum_{1\le \ell_1,\ell_2,\ldots,\ell_{k-1}\le N} |S(\Delta_{\ell_1,\ell_2,\ldots,\ell_{k-1}} f; N-\ell_1-\ell_2-\cdots-\ell_{k-1})|.$$

It is easy to see that the leading coefficient of $\Delta_{\ell_1} f(x)$ is equal to $k\ell_1 a/q$ and this polynomial has degree k − 1. Thus, repeating this process, we obtain that $\Delta_{\ell_1,\ldots,\ell_{k-1}} f$ is a linear polynomial with leading coefficient $k!\,\ell_1\ell_2\cdots\ell_{k-1}\,a/q$. Thus, applying our earlier lemma, we see that

$$|S(f;N)|^{2^{k-1}} \ll N^{2^{k-1}-1} + N^{2^{k-1}-k}\sum_{1\le \ell_1,\ell_2,\ldots,\ell_{k-1}\le N} \min\left\{N, \|k!\,\ell_1\ell_2\cdots\ell_{k-1}\,a/q\|^{-1}\right\}.$$

The values of $k!\,\ell_1\ell_2\cdots\ell_{k-1}$ range in value from k! up to $k!N^{k-1}$. No single value m can be taken more than $d(m)^{k-1}$ times, since each $\ell_i$ must be a divisor of m. By our earlier lemma, we have that $d(m)^{k-1} \ll N^{\varepsilon}$ for any $m \in [1, k!N^{k-1}]$. (The implicit constant might depend significantly on k.) Thus

$$|S(f;N)|^{2^{k-1}} \ll N^{2^{k-1}-1} + N^{2^{k-1}-k+\varepsilon}\sum_{1\le m\le k!N^{k-1}} \min\left\{N, \|ma/q\|^{-1}\right\}.$$

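Note that the value of ‖ma/q‖ depends only on the value of m modulo q. We can extend the sum to run up to $q\cdot\lceil k!N^{k-1}/q\rceil$ so that m runs through a full system of residues $\lceil k!N^{k-1}/q\rceil \ll N^{k-1}/q$ times. Also as m runs through a full system of residues modulo q, so does ma, and we have that ‖r/q‖ = ‖(q − r)/q‖. And thus we have

$$\begin{aligned}
|S(f;N)|^{2^{k-1}} &\ll N^{2^{k-1}-1} + N^{2^{k-1}-1+\varepsilon}q^{-1}\sum_{0\le r\le q/2} \min\left\{N, \|r/q\|^{-1}\right\}\\
&\ll N^{2^{k-1}-1} + N^{2^{k-1}-1+\varepsilon}q^{-1}\sum_{0\le r\le q/2} \min\left\{N, \frac{q}{r}\right\}\\
&\ll N^{2^{k-1}-1} + N^{2^{k-1}-1+\varepsilon}q^{-1}\left(N\cdot\left(\frac{q}{N}+1\right) + q\cdot\sum_{q/N\le r\le q/2}\frac{1}{r}\right)\\
&\ll N^{2^{k-1}-1} + N^{2^{k-1}-1+\varepsilon}q^{-1}\left(q + N + q\cdot\log N\right)\\
&\ll N^{2^{k-1}-1+\varepsilon}\log N + N^{2^{k-1}+\varepsilon}q^{-1}.
\end{aligned}$$

The proof is concluded by taking $2^{k-1}$-th roots of both sides, noting that $\log N \ll N^{\varepsilon}$ and that ε was arbitrary.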

Proof of the Davenport-Erdos result. Here we will make use of Weyl’s criterion, so that $\theta_f$ is normal if and only if, for each non-zero integer ℓ, we have

$$\sum_{0\le n\le N-1} e(\ell T^n\theta_f) = o(N).$$

Let the left-hand sum be denoted by $S_N = S_N(\ell)$.

Suppose that f has degree k. Note that $\nu(f(n)) = \lfloor\log_b f(n)\rfloor + 1 \sim k\log_b n$. Let $N_i = \sum_{1\le n\le i}\nu(f(n))$ with $N_0 = 0$. The above calculations can be extended to show that $N_i \sim ki\log_b i$. If we again choose m = m(N) to be the smallest integer such that $N \le \sum_{1\le n\le m}\nu(f(n))$, then as in the combinatorial method, we have that $N_m - N = o(N)$ and m = o(N).

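Thus,

$$S_N = \sum_{0\le n\le N_m-1} e(\ell T^n\theta_f) + o(N).$$

This latter sum we break up into pieces going from $N_i + 1$ to $N_{i+1}$:

$$S_N = \sum_{i=0}^{m-1}\ \sum_{j=0}^{\nu(f(i+1))-1} e\left(\ell T^j(T^{N_i}\theta_f)\right) + o(N).$$

We now take a closer look at this inner sum. We have $T^{N_i}\theta_f = 0.f(i+1)f(i+2)\ldots$. Note that $0.f(i+1) = f(i+1)/b^{\nu(f(i+1))}$. For $0 \le j \le \nu(f(i+1))$, we have that $T^j(T^{N_i}\theta_f) = T^j\left(f(i+1)/b^{\nu(f(i+1))}\right) + O\left(b^{j-\nu(f(i+1))}\right)$, since the two points differ by $O(b^{-\nu(f(i+1))})$ before T is applied. Also, it is a simple consequence of Taylor’s theorem that e(x + y) = e(x)e(y) = e(x)(1 + O(y)) = e(x) + O(y) provided y is bounded. Thus,

$$\begin{aligned}
S_N &= \sum_{i=0}^{m-1}\ \sum_{j=0}^{\nu(f(i+1))-1}\left(e\left(\ell T^j\frac{f(i+1)}{b^{\nu(f(i+1))}}\right) + O\left(b^{j-\nu(f(i+1))}\right)\right) + o(N)\\
&= \sum_{i=0}^{m-1}\ \sum_{j=0}^{\nu(f(i+1))-1} e\left(\ell T^j\frac{f(i+1)}{b^{\nu(f(i+1))}}\right) + \sum_{i=0}^{m-1}\ \sum_{j=0}^{\nu(f(i+1))-1} O\left(b^{j-\nu(f(i+1))}\right) + o(N)\\
&= \sum_{i=0}^{m-1}\ \sum_{j=0}^{\nu(f(i+1))-1} e\left(\ell T^j\frac{f(i+1)}{b^{\nu(f(i+1))}}\right) + \sum_{i=0}^{m-1} O(1) + o(N)\\
&= \sum_{i=0}^{m-1}\ \sum_{j=0}^{\nu(f(i+1))-1} e\left(\ell T^j\frac{f(i+1)}{b^{\nu(f(i+1))}}\right) + O(m) + o(N),
\end{aligned}$$

and since m = o(N), we may remove the O(m) term.

The action of T and multiplication by b differ by addition by an integer at most, but e(x) = e(x + n) for any integer n, so we can replace T by multiplication by b everywhere we like. If we do this and replace j with ν(f(i + 1)) − j and i with i − 1 everywhere, we get

$$S_N = \sum_{i=1}^{m}\ \sum_{j=1}^{\nu(f(i))} e\left(\frac{\ell}{b^j}f(i)\right) + o(N).$$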

There are $N_m \sim km\log_b m$ terms in the last pair of sums. If we extend the inner sums to all go up to ν(f(m)), then there will now be $m\nu(f(m)) \sim km\log_b m$ terms. Thus, adding in all these terms adds only another term of size o(N), and then we may swap the order of summation like so:

$$S_N = \sum_{j=1}^{\nu(f(m))}\ \sum_{i=1}^{m} e\left(\frac{\ell}{b^j}f(i)\right) + o(N).$$

Now at this point it would be tempting to apply Weyl’s inequality directly, but this could give very bad estimates when j is small. Let δ > 0 be fixed for now and let us bound the part of the outer sum where 1 ≤ j ≤ δ · ν(f(m)) trivially (that is, each term is bounded by 1). This gives a total contribution of O(δmν(f(m))) = O(δN). For the remaining terms, where δ · ν(f(m)) < j ≤ ν(f(m)), we note that for these sums, since ℓ and f are fixed, the leading coefficient of $(\ell/b^j)f(x)$ has lowest-terms denominator at least of size $b^{\delta\nu(f(m))/2}$. So applying Weyl’s inequality to these terms with an ε to be chosen momentarily we get

$$\begin{aligned}
S_N &\ll \sum_{j=1}^{\nu(f(m))} m^{1+\varepsilon}\left(m^{-2^{1-k}} + b^{-2^{-k}\delta\nu(f(m))}\right) + O(\delta N) + o(N)\\
&= m^{1+\varepsilon}\nu(f(m))\left(m^{-2^{1-k}} + b^{-2^{-k}\delta\nu(f(m))}\right) + O(\delta N) + o(N).
\end{aligned}$$

Recall that $\nu(f(m)) \sim k\log_b m$. Thus $b^{\nu(f(m))} \approx m^k$. Thus by choosing ε sufficiently small the first term here can be made to be o(N).

So ultimately we have $S_N = O(\delta N)$. Since δ > 0 was arbitrary, we have that $S_N/N$ is eventually arbitrarily small and thus goes to 0. This completes the proof.

We have here only used the fairly weak form of the Weyl inequality to prove the Davenport-Erdos result. With other estimates on exponential sums one can prove the normality of other numbers.

However, this technique has been largely restricted to proving facts about base-b expansions, for several reasons. We importantly made use of Weyl’s criterion, and while Weyl’s criterion works for things like continued fractions or β-expansions, the results are not particularly nice or easy to use. Also, we importantly made use of the fact that e(x + n) = e(x) for any integer n.

4.3 The rational method: Bailey-Crandall

However, if there is a significant problem with the prior results it is that numbers like $\theta_f$ or $\tau_f$ don’t really look like numbers we are familiar with, numbers like π or e or $\sqrt{2}$ even. A slightly different construction comes much closer.

Let b, c be coprime integers both at least 2. Let d ≥ 2 be an integer. Then the Korobov number

$$\sum_{n=1}^{\infty}\frac{1}{c^{d^n}\,b^{c^{d^n}}}$$

is normal to base b. Similarly the Stoneham number

$$\sum_{n=1}^{\infty}\frac{1}{c^{n}\,b^{c^{n}}}$$

is normal to base b.
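Because the terms of the Stoneham series decay so quickly, a few terms already determine many digits. The sketch below computes an exact partial sum with Python's `fractions.Fraction` and reads off its base-b digits; the function names are hypothetical, and the digits of a partial sum agree with those of the full number only up to roughly the position where the first omitted term starts to contribute.

```python
from fractions import Fraction


def stoneham_partial_sum(b, c, terms):
    """Exact partial sum of sum_{n>=1} 1 / (c**n * b**(c**n))."""
    return sum(Fraction(1, c**n * b ** (c**n)) for n in range(1, terms + 1))


def base_b_digits(x, b, num_digits):
    """First num_digits base-b digits of a rational x in [0, 1)."""
    digits = []
    for _ in range(num_digits):
        x *= b
        d = int(x)  # the integer part is the next digit
        digits.append(d)
        x -= d      # keep only the fractional part, i.e. apply T
    return digits


# Binary digits of the b = 2, c = 3 Stoneham number, from five terms.
digits = base_b_digits(stoneham_partial_sum(2, 3, 5), 2, 500)
```

With b = 2 and c = 3 the first term is 1/24, so the expansion opens with the binary digits of 1/24 before later terms begin to perturb it.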


This work was done over many, many papers, probably close to 20. Bailey and Crandall, in 2002, generalized this to prove that numbers

$$\sum_{i=1}^{\infty}\frac{1}{c^{n_i}\,b^{m_i}}$$

are normal to base b where $n_i$ is increasing and $m_i$ is increasing exponentially faster than $n_i$. In fact, it is relatively easy to prove that all of these numbers are not normal to base bc.

The method of proof is not terribly different than the Davenport-Erdos method. The infinite sum is approximated by the rational number

$$\sum_{i=1}^{j}\frac{1}{c^{n_i}\,b^{m_i}} = \frac{a_j}{c^{n_j}\,b^{m_j}}.$$

We use these approximations with Weyl’s criterion, so that we want to understand exponential sums that look like

$$\sum_{n=1}^{N} e\left(\ell b^n\frac{a_j}{c^{n_j}}\right).$$

It’s known due to Korobov that such sums should be bounded by something like $(c^{n_j/2} + Nc^{-n_j/2})\log c^{n_j}$ provided $n_j$ is sufficiently large. (In particular the key ingredient is knowing that the multiplicative order of b modulo $c^n$ is really close to $c^n$ once n is sufficiently large.) So, these sums should be small provided N is much larger than $c^{n_j/2}$. This is why the $m_i$ must grow exponentially fast.

Chapter 5

The set of normal numbers

There is much we do not know about individual normal numbers, but at this point, we know a significant amount of information about the set of normal numbers, especially about the set of normal numbers to base b.

There are two main questions we are interested in in this chapter. When does a function f preserve T-normality (so that f(x) is T-normal whenever x is T-normal)? And when is the set of T-normal numbers equal to (or at least contained within) the set of S-normal numbers?

We start with a fairly simple result:

Lemma 5.0.1. Multiplication by a non-zero integer preserves base-b normality.

Proof. Recall, by Weyl’s criterion, that a number x is base-b normal if and only if

$$\sum_{n=0}^{N} e(\ell b^n x) = o(N)$$

for every ℓ ∈ Z \ {0}.

Suppose x is base-b normal, and let k be a non-zero integer. Then kx is normal if and only if

$$\sum_{n=0}^{N} e(\ell k b^n x) = o(N)$$

for every ℓ ∈ Z \ {0}. But ℓk is a non-zero integer, so the above statement is true by the normality of x. Thus multiplication by a non-zero integer preserves base-b normality.

5.1 The Pyatetskii-Shapiro normality criterion

If we want to prove more complicated relations between sets of normal numbers, we need a more powerful tool. For that we turn to the Pyatetskii-Shapiro normality criterion (which has been redubbed the Hot Spot Lemma in some recent papers).

First, a lemma:

Lemma 5.1.1. Let (X, µ, T) be an ergodic, measure-preserving dynamical system on a probability space. Then, for any function $f \in L^1(X, \mu)$ the set

$$A_\ell(f,\delta) = \left\{x\in X : \left|\frac{1}{\ell}\sum_{i=0}^{\ell-1} f(T^i x) - \int_X f\,d\mu\right| > \delta\right\}$$

has measure converging to 0 as ℓ goes to infinity.

Proof. Let $B_{\ell,\delta}$ denote the set of x such that

$$\left|\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) - \int_X f\,d\mu\right| \le \delta \qquad (5.1)$$

for all n ≥ ℓ. By the Birkhoff pointwise ergodic theorem, we know that almost all x will satisfy (5.1) for sufficiently large ℓ, since the difference on the left-hand side goes to 0. Therefore,

$$\mu\left(\bigcup_{\ell=1}^{\infty} B_{\ell,\delta}\right) = 1.$$

However, we have that $B_{\ell,\delta} \subset B_{\ell+1,\delta}$ for all ℓ ≥ 1 by definition. Therefore, we have that $\lim_{\ell\to\infty}\mu(B_{\ell,\delta}) = 1$. But by construction, we have that $A_\ell(f,\delta) \subset B_{\ell,\delta}^{c}$. This completes the proof.

Now for the normality criterion:

Proposition 5.1.2 (Pyatetskii-Shapiro normality criterion for fibred systems). Let (X, µ, T, D, X ) be an ergodic, measure-preserving fibred system on a probability space with $\lim_{|s|\to\infty}\mu(C_s) = 0$. Let x ∈ X. Suppose there exists a fixed constant L > 0 such that for every string s, we have

$$\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x\in C_s\}}{n} \le L\mu(C_s).$$

Then, in fact, we have

$$\lim_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x\in C_s\}}{n} = \mu(C_s),$$

so that x is T-normal.

This proof is a simplified version of the proof of Moshchevitin and Shkredov.


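Proof. Consider δ > 0 which will be allowed to go to 0 at the end of the proof.

For any ℓ > 0 we have

$$S_n(x,A) = \frac{1}{\ell}\sum_{j=0}^{n-1} S_\ell(T^j x, A) + O(\ell).$$

To see this, if we write the right-hand sum out explicitly we get

$$\frac{1}{\ell}\sum_{j=0}^{n-1} S_\ell(T^j x, A) = \frac{1}{\ell}\sum_{j=0}^{n-1}\sum_{i=0}^{\ell-1} 1_A(T^{i+j}x) = \frac{1}{\ell}\sum_{i=0}^{n+\ell-2} c_{i,\ell}\cdot 1_A(T^i x)$$

where

$$c_{i,\ell} = \begin{cases} i+1, & i < \ell\\ \ell, & \ell\le i\le n-1\\ n+\ell-1-i, & i > n-1.\end{cases}$$

Comparing this with $S_n(x,A)$ gives the desired result.

Let A be a cylinder set $C_s$ for some string s with |s| = m. Thus, for any ℓ > 0 we have that

$$\begin{aligned}
\left|\frac{S_n(x,A)}{n} - \mu(A)\right| &\le \frac{1}{n}\left|\sum_{j=0}^{n-1}\left(\frac{S_\ell(T^j x,A)}{\ell} - \mu(A)\right)\right| + O\left(\frac{\ell}{n}\right)\\
&\le \frac{1}{n}\sum_{j\in N_1}\left|\frac{S_\ell(T^j x,A)}{\ell} - \mu(A)\right| + \frac{1}{n}\sum_{j\in N_2}\left|\frac{S_\ell(T^j x,A)}{\ell} - \mu(A)\right| + O\left(\frac{\ell}{n}\right)\\
&= \frac{1}{n}\Sigma_1 + \frac{1}{n}\Sigma_2 + O\left(\frac{\ell}{n}\right),
\end{aligned}$$

where $j \in N_1$ if 0 ≤ j ≤ n − 1 and $T^j x \notin A_\ell(1_A,\delta)$, and $j \in N_2$ if 0 ≤ j ≤ n − 1 and $T^j x \in A_\ell(1_A,\delta)$.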

By definition of $A_\ell(1_A,\delta)$, we have that the summands of $\Sigma_1$ must be bounded by δ, and since there are at most n possible values for $j \in N_1$, we must have $\Sigma_1/n \le \delta$ as well.

For $\Sigma_2$, note that $S_\ell(T^j x, A)$ is a sum of ℓ indicator functions and thus has size at most ℓ, so the summands of $\Sigma_2$ are bounded. Thus we have that $\Sigma_2/n \ll \#N_2/n$. Therefore we have

$$\left|\frac{S_n(x,A)}{n} - \mu(A)\right| \le \delta + O\left(\frac{\#N_2}{n}\right) + O\left(\frac{\ell}{n}\right).$$


Now, note that $N_2 = \{0\le j\le n-1 : T^j x\in A_\ell(1_A,\delta)\}$, and that $S_\ell(z,A)$ is dependent only on the first m + ℓ − 1 digits of z and thus is fixed on rank-(m + ℓ − 1) cylinders. Recall that m is the rank of the original cylinder $C_s$. So $A_\ell(1_A,\delta)$ can be written as a disjoint union $\bigcup C_t$ of rank-(m + ℓ − 1) cylinder sets. Thus, by the sub-additivity of the limsup and the assumption of the proposition, we have

$$\begin{aligned}
\limsup_{n\to\infty}\frac{\#N_2}{n} &= \limsup_{n\to\infty}\sum_{C_t}\frac{\#\{0\le j\le n-1 : T^j x\in C_t\}}{n}\\
&\le \sum_{C_t}\limsup_{n\to\infty}\frac{\#\{0\le j\le n-1 : T^j x\in C_t\}}{n}\\
&\le \sum_{C_t} L\mu(C_t)\\
&= L\mu(A_\ell(1_A,\delta)).
\end{aligned}$$

Thus, if we choose ℓ to be fixed so that $\mu(A_\ell(1_A,\delta)) \le \delta$ (possible by Lemma 5.1.1), then the O(ℓ/n) term is negligible as n → ∞, and we have that

$$\limsup_{n\to\infty}\left|\frac{S_n(x,A)}{n} - \mu(A)\right| \ll \delta.$$

But δ > 0 was arbitrary, so

$$\limsup_{n\to\infty}\left|\frac{S_n(x,A)}{n} - \mu(A)\right| \le 0.$$

But the left-hand side is always non-negative, so

$$\lim_{n\to\infty}\left|\frac{S_n(x,A)}{n} - \mu(A)\right| = 0,$$

proving the proposition.

5.1.1 Applications of the normality criterion

Proposition 5.1.3. Let (X, µ, T, D, X ) be a measure-preserving fibred system and suppose both T and $T^r$ are ergodic with respect to µ for some r ≥ 2. Then a point x is T-normal if and only if it is $T^r$-normal.

Note: it is not true in general that the ergodicity of T implies the ergodicity of $T^r$, although there are ways of proving this. Usually if the ergodicity of T can be proven using Renyi’s theorem, then the ergodicity of $T^r$ can be proven the same way.


Corollary 5.1.4. A number x ∈ [0, 1) is base-b normal if and only if it is base-$b^r$ normal.

In fact, this corollary is best possible. Suppose there are two integers b, c ≥ 2 such that $b^r \ne c^s$ for any positive integers r, s; then there are, in fact, uncountably many numbers that are base-b normal but not base-c normal. The proof of this would take us too far afield.

Proof of the proposition. To begin with we note that expansions for T and $T^r$ differ by the grouping of digits. The digits for $T^r$ are r-tuples of T-digits. Moreover, if s is a string of $T^r$-digits and s′ is the corresponding string of T-digits (now ungrouped), then $C_s = C_{s'}$.

Suppose x is T-normal and let $C_s$ be any cylinder set for $T^r$, which will also be a cylinder set for T. Then

$$\begin{aligned}
\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : (T^r)^i x\in C_s\}}{n} &\le \limsup_{n\to\infty}\frac{\#\{0\le i\le r(n-1) : T^i x\in C_s\}}{n}\\
&= r\cdot\limsup_{n\to\infty}\frac{\#\{0\le i\le rn-1 : T^i x\in C_s\} + O(r)}{rn}\\
&= r\cdot\limsup_{n\to\infty}\frac{\#\{0\le i\le rn-1 : T^i x\in C_s\}}{rn}\\
&\le r\cdot\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x\in C_s\}}{n}\\
&= r\cdot\mu(C_s).
\end{aligned}$$

So by the Pyatetskii-Shapiro normality criterion, we must have that x is $T^r$-normal.

Now suppose x is $T^r$-normal. Note that in the limit definition of normality for x, we may replace n by rn, since the difference between n and r⌊n/r⌋ is negligible in the limit. Now, let s be a T-string of length k. We have

$$\begin{aligned}
\lim_{n\to\infty}\frac{\#\{0\le i\le rn-1 : T^i x\in C_s\}}{rn} &= \lim_{n\to\infty}\sum_{j=0}^{r-1}\frac{\#\{0\le i\le n-1 : T^j(T^r)^i x\in C_s\}}{rn}\\
&= \frac{1}{r}\sum_{j=0}^{r-1}\lim_{n\to\infty}\frac{\#\{0\le i\le n-1 : (T^r)^i x\in T^{-j}C_s\}}{n}.
\end{aligned}$$

The sets $T^{-j}C_s$ are almost certainly not cylinder sets for $T^r$, so we cannot simply apply the $T^r$-normality of x to complete the proof.

Suppose, by way of an example, that s = [1, 0, 1], r = 2, and j = 1. The set $T^{-1}C_s$ can be expressed as a disjoint union of all cylinder sets corresponding to T-strings of the form [∗, 1, 0, 1], and each of these can be expressed as a $T^r$-string [(∗, 1), (0, 1)].


More generally, we see that each set $T^{-j}C_s$, 0 ≤ j ≤ r − 1, can be expressed as a countable, disjoint union of $T^r$-cylinder sets $C_{s_j}$ where $s_j$ consists of ⌊k/r⌋ + 1 $T^r$-digits. Using the bounded convergence theorem to swap the order of limit and sum, we have

$$\begin{aligned}
\lim_{n\to\infty}\frac{\#\{0\le i\le n-1 : (T^r)^i x\in T^{-j}C_s\}}{n} &= \lim_{n\to\infty}\sum_{C_{s_j}}\frac{\#\{0\le i\le n-1 : (T^r)^i x\in C_{s_j}\}}{n}\\
&= \sum_{C_{s_j}}\lim_{n\to\infty}\frac{\#\{0\le i\le n-1 : (T^r)^i x\in C_{s_j}\}}{n}\\
&= \sum_{C_{s_j}}\mu(C_{s_j}) = \mu(T^{-j}C_s) = \mu(C_s).
\end{aligned}$$

Plugging this into our earlier calculation, we see that

$$\lim_{n\to\infty}\frac{\#\{0\le i\le nr-1 : T^i x\in C_s\}}{rn} = \frac{1}{r}\sum_{j=0}^{r-1}\mu(C_s) = \mu(C_s),$$

proving that x is T-normal.

The following result is similar but not quite the same. We also only prove it for base-b expansions. We’ll see later it does not hold in general.

Proposition 5.1.5. Let $x = 0.a_1a_2a_3\ldots$ be normal to base b. Then for any ℓ, m ≥ 1, the number $y_{\ell,m} = 0.a_{\ell}\,a_{m+\ell}\,a_{2m+\ell}\,a_{3m+\ell}\ldots$ is normal to base b.

In fact this result can be made into an if and only if type of statement. If x is not normal, then there exist an ℓ and m for which the other constant is also not normal.

Proof. We may assume without loss of generality that m > 1 (as otherwise the result is trivial) and that ℓ ∈ {1, 2, . . . , m}.

Consider a string $s = [d_1, d_2, \ldots, d_k]$. We see that $T^i y_{\ell,m} \in C_s$ if and only if $T^{im+\ell-1}x \in C_t$ for some string t of the form $[d_1, *^{m-1}, d_2, *^{m-1}, \ldots, *^{m-1}, d_k]$, where $*^{m-1}$ denotes m − 1 unknown digits. All such cylinders $C_t$ are disjoint, there are $b^{(m-1)(k-1)}$ of them, and each has measure $b^{-k-(m-1)(k-1)}$. Thus, making use of the sub-additivity of the limsup again, we have

$$\begin{aligned}
\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i y_{\ell,m}\in C_s\}}{n} &= \limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^{im+\ell-1}x\in\bigcup C_t\}}{n}\\
&\le \limsup_{n\to\infty}\ m\cdot\frac{\#\{0\le i\le nm+\ell-1 : T^i x\in\bigcup C_t\}}{mn+\ell}\\
&\le m\cdot\limsup_{n\to\infty}\frac{\#\{0\le i\le n : T^i x\in\bigcup C_t\}}{n}\\
&\le m\sum_t\limsup_{n\to\infty}\frac{\#\{0\le i\le n : T^i x\in C_t\}}{n}\\
&= m\sum_t\lambda(C_t) = m\lambda\left(\bigcup C_t\right) = mb^{-k}.
\end{aligned}$$

Applying Pyatetskii-Shapiro proves that $y_{\ell,m}$ is normal to base b.

Proposition 5.1.6. Let x be base-b normal and let r be a rational number. Then x + r is base-b normal.

Proof. The key idea here is that T distributes over addition. So $T^k(x+r) = T^k x + T^k r \pmod 1$. Since r is rational, it has an eventually periodic base-b expansion and hence $T^k r$ takes only finitely many values; let us call them $r_1, r_2, \ldots, r_m$.

Then we have, for any string s,

$$\begin{aligned}
\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i(x+r)\in C_s\}}{n} &= \limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x + T^i r \ (\mathrm{mod}\ 1)\in C_s\}}{n}\\
&\le \sum_{j=1}^{m}\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x + r_j \ (\mathrm{mod}\ 1)\in C_s\}}{n}\\
&= \sum_{j=1}^{m}\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x\in C_s - r_j \ (\mathrm{mod}\ 1)\}}{n}.
\end{aligned}$$

Now the set $C_s - r_j \pmod 1$ can be written as one interval if $C_s - r_j \subset [0,1)$ and as at most two intervals in general, since $C_s - r_j \subset (-1,1)$. Thus, since base-b normality implies equidistribution, we have that each of the limsups in the last line is equal to $\lambda(C_s - r_j) = \lambda(C_s)$. Thus,

$$\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i(x+r)\in C_s\}}{n} \le m\cdot\lambda(C_s).$$

By the Pyatetskii-Shapiro normality criterion, x + r is normal to base b.

Exercise 5.1.7. Prove Proposition 5.1.6 by making use of Weyl’s criterion instead of Pyatetskii-Shapiro.

Proposition 5.1.8. Let x be base-b normal and let q be a positive integer. Then x/q is base-b normal.


Proof. It is not true in general that $T^i(x/q) = (T^i x)/q$, clearly, since this latter term must be in [0, 1/q). Instead, it is true that $T^i(x/q) = (T^i x)/q + j/q$ for some j ∈ {0, 1, 2, . . . , q − 1}. This follows by noting that

$$T^i\left(\frac{x}{q}\right) = \frac{b^i x}{q} \pmod 1 = \frac{\lfloor b^i x\rfloor + T^i x}{q} \pmod 1,$$

and that the only thing that matters when taking $\lfloor b^i x\rfloor/q$ modulo 1 is the value of $\lfloor b^i x\rfloor$ modulo q.

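By an argument similar to the previous proposition, we can show that

$$\begin{aligned}
\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i(x/q)\in C_s\}}{n} &\le \sum_{j=0}^{q-1}\limsup_{n\to\infty}\frac{\#\{0\le i\le n-1 : T^i x\in qC_s - j\}}{n}\\
&= \sum_{j=0}^{q-1}\lambda\left((qC_s - j)\cap[0,1)\right) = \sum_{j=0}^{q-1}\lambda\left((qC_s)\cap[j,j+1)\right)\\
&= \lambda\left((qC_s)\cap[0,q)\right) = \lambda\left(q\cdot(C_s\cap[0,1))\right) = q\lambda(C_s).
\end{aligned}$$

Applying the Pyatetskii-Shapiro normality criterion shows that x/q is base-b normal.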

Combining several results thus far, we obtain:

Corollary 5.1.9. Let $x$ be base-$b$ normal and let $q, r \in \mathbb{Q}$ with $q \neq 0$. Then $qx + r$ is base-$b$ normal.

5.2 Augmented systems

As we’ve seen, the Pyatetskii-Shapiro normality criterion gives us a great way of proving relationships between normal numbers. However, to tackle some bigger problems, we need a more powerful result. The following work is primarily my own, although it could be seen as a generalization of work by Jager and Liardet, which we will be partially referencing later.

To start with, we need a fibred system $(X, \mu, T, D, \mathcal{X})$ that is measure-preserving and ergodic, that lives on a probability space, and that has all cylinders full. Let $\mathcal{M}$ be some finite set and consider a function $f : D \times \mathcal{M} \to \mathcal{M}$. We will say that $f$ is bijective in the second coordinate if, for any $d \in D$, the map $f(d, \cdot)$ is bijective.

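Consider a new fibred system given by the following:

• $\widetilde{X} = X \times \mathcal{M}$.

• $\widetilde{\mu} = \mu \times c$, where $c$ is the normalized counting measure on $\mathcal{M}$.

• $\widetilde{T}(x, M) = (Tx, f(a_1(x), M))$.

• $\widetilde{D} = D \times \mathcal{M}$.

• $\widetilde{\mathcal{X}} = \{(X_i, M) : X_i \in \mathcal{X},\ M \in \mathcal{M}\}$.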

This new fibred system we refer to as an augmented system. The idea of this augmented system is that the new coordinate acts like a finite state automaton, moving from state to state as determined by the digits of $x$. This builds a small amount of memory into the system: the orbit of a point can now remember small bits of data about where it has been.

Here’s an example of this: let $\mathcal{M} = \{0, 1, 2, \dots, m-1\}$ and let $f(a, i) = i + 1 \pmod m$. It’s easy to see that for this system we have $\widetilde{T}^i(x, 0) = (T^i x, i \pmod m)$. Now this can tell us something interesting. With ergodicity we can, for most points $x$, describe how often the orbit of $x$ visits a given cylinder set $C_s$, but we cannot a priori describe when it does. But here, we know that the $\widetilde{T}$-orbit of $(x, 0)$ visits $(C_s, j)$ only when the iterate $i$ is congruent to $j$ modulo $m$.
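The counter example can be sketched in a few lines. This is an illustration only: it assumes the underlying system is the base-$b$ map, and the function names and the starting point $3/7$ are hypothetical choices, not taken from the notes.

```python
from fractions import Fraction

def base_b_step(x, b):
    """Return (first digit of x, T x) for the base-b map T x = b*x (mod 1)."""
    y = x * b
    d = y.numerator // y.denominator
    return d, y - d

def augmented_step(x, state, b, f):
    """One step of the augmented map: (x, M) -> (T x, f(a_1(x), M))."""
    d, tx = base_b_step(x, b)
    return tx, f(d, state)

def counter(m):
    """The example f(a, i) = i + 1 (mod m), which ignores the digit a."""
    return lambda a, i: (i + 1) % m

def augmented_orbit(x, state, b, f, n):
    """Return the first n + 1 points of the orbit of (x, state)."""
    points = [(x, state)]
    for _ in range(n):
        x, state = augmented_step(x, state, b, f)
        points.append((x, state))
    return points

# Hypothetical starting point; any x in [0, 1) would do.
orbit = augmented_orbit(Fraction(3, 7), 0, 10, counter(4), 10)
```

The second coordinate of the $i$th point of `orbit` is $i$ modulo 4, so the state records the time of each visit modulo $m$.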

Now let us consider how strings and cylinders act on this system. From our definitions we see that digits are of the form $(d, M)$ for some $d \in D$ and $M \in \mathcal{M}$. The branch $\widetilde{T}_{(d,M)}$ maps $(C_{[d]}, M)$ to $\widetilde{X}$. However, all elements of $C_{[d]}$ share the same first digit, so this map just acts by
\[
\widetilde{T}_{(d,M)}(C_{[d]}, M) = (T_d C_{[d]}, f(d, M)).
\]
In particular, unless $\mathcal{M}$ consists of a single element, none of the rank-1 cylinders (and indeed none of the cylinders at all) are full. From this we can also see that
\[
\widetilde{T}^{-1}_{(d,M)}(x, M') =
\begin{cases}
\emptyset, & \text{if } M' \neq f(d, M), \\
(T_d^{-1} x, M), & \text{if } M' = f(d, M).
\end{cases}
\]

From this we can see that a $\widetilde{T}$-string $\widetilde{s} = [(d_1, M_1), (d_2, M_2), \dots, (d_k, M_k)]$ is admissible if and only if

• $M_i = f(d_{i-1}, M_{i-1})$ for $i = 2, 3, \dots, k$;

• the $T$-string $s = [d_1, d_2, \dots, d_k]$ is admissible;

• and the $\widetilde{T}$-string $[(d_1, M_1)]$ is admissible.

In fact, we have $C_{\widetilde{s}} = (C_s, M_1)$. Moreover, any set $(C_s, M)$ with $s = [d_1, d_2, \dots, d_k]$ is a cylinder set $C_{\widetilde{s}}$, by taking $M_1 = M$, $M_i = f(d_{i-1}, M_{i-1})$ for $i = 2, 3, \dots, k$, and $\widetilde{s} = [(d_1, M_1), (d_2, M_2), \dots, (d_k, M_k)]$. Thus we will think of cylinder sets for the augmented system as just being $(C_s, M)$, and may think of strings $\widetilde{s}$ as $(s, M_1)$.


Exercise 5.2.1. Show that if $f$ is bijective in the second coordinate, then the augmented system is measure-preserving.

We say that an augmented system is transitive if for any $M, M' \in \mathcal{M}$ there exists an admissible $\widetilde{T}$-string $\widetilde{s}$ of length $k$, called the traversing string from $M$ to $M'$, such that $M_1 = M$ and $f(d_k, M_k) = M'$. Thus, we have that
\[
\widetilde{T}^k C_{\widetilde{s}} = \widetilde{T}^k (C_s, M_1) \subset (X, M').
\]

Recall that a transformation $T$ satisfies Renyi’s condition if there exists a uniform constant $L > 0$ such that for every string $s$ with $|s| = k$ we have
\[
\frac{\sup_{y \in T^k C_s} \omega_s(y)}{\inf_{y \in T^k C_s} \omega_s(y)} \le L,
\]
where $\omega_s(y)$ is the Jacobian of $T^{-k}$. As a consequence of this we showed that if $C_s$ is a full, rank-$k$ cylinder, then for any measurable set $E$ we have $\mu(T^{-k}E \cap C_s) \ge L^{-1}\mu(E)\mu(C_s)$. (In fact, we also have that $\mu(T^{-k}E \cap C_s) \le L\mu(E)\mu(C_s)$, although we will not make use of this fact.)

The main result on augmented systems is the following.

Theorem 5.2.2. Suppose that $(X, \mu, T, D, \mathcal{X})$ is a fibred system such that all cylinder sets for $T$ are full and that $T$ satisfies Renyi’s condition. Suppose this can be extended to an augmented system with transformation $\widetilde{T}$ that is transitive and measure-preserving. Then $\widetilde{T}$ is ergodic.

Moreover, if $x$ is $T$-normal then for any $M \in \mathcal{M}$, $(x, M)$ is $\widetilde{T}$-normal; and if $(x, M)$ is $\widetilde{T}$-normal then $x$ is $T$-normal.

$\widetilde{T}$ need not be measure-preserving if $f$ is not bijective in the second coordinate. If we lose the measure-preserving assumption but instead add in assumptions that $\widetilde{T}$ is conservative (if $\widetilde{\mu}(E) = 0$ then $\widetilde{\mu}(\widetilde{T}^{-1}E) = 0$) and that $T$ satisfies Renyi’s condition, then we can prove the existence of a probability measure $\rho$, absolutely continuous with respect to $\widetilde{\mu}$, such that $\widetilde{T}$ is measure-preserving and ergodic with respect to $\rho$. This more general proof relies on such techniques as time-inhomogeneous Markov chains, and thus we will not give it here.

Proof. Suppose $\widetilde{E}$ is an invariant set for $\widetilde{T}$ with non-zero $\widetilde{\mu}$-measure. We want to show that $\widetilde{E}$ has full $\widetilde{\mu}$-measure.

To begin with, we can write $\widetilde{E}$ as $\bigcup_{M \in \mathcal{M}} (E_M, M)$. Since $\widetilde{E}$ has positive $\widetilde{\mu}$-measure, there must be a set $(E_{M'}, M')$ that has positive $\widetilde{\mu}$-measure as well. ($E_{M'}$ will then have positive $\mu$-measure.)

By the definition of transitivity and the assumption of full cylinders, for any $M \in \mathcal{M}$ there exists a string $s = [d_1, d_2, \dots, d_k]$ such that $\widetilde{T}^k(C_s, M) = (X, M')$. More importantly, $\widetilde{T}^k$ acts as a bijection here: on the second coordinate it just maps $M$ to $M'$, and on the first coordinate it maps $C_s$ bijectively to $X$. Thus
\[
\begin{aligned}
\widetilde{\mu}\bigl(\widetilde{T}^{-k}_{(s,M)}(E_{M'}, M')\bigr)
&= \widetilde{\mu}\bigl((C_s, M) \cap \widetilde{T}^{-k}(E_{M'}, M')\bigr) \\
&= \widetilde{\mu}\bigl((C_s \cap T^{-k}E_{M'}, M)\bigr) = \frac{1}{|\mathcal{M}|}\,\mu(C_s \cap T^{-k}E_{M'}).
\end{aligned}
\]
But by Renyi’s condition, this last quantity is within a positive constant multiple of $\mu(C_s)\mu(E_{M'})$, which is positive. Since $\widetilde{E}$ is an invariant set, this implies that the intersection of $(C_s, M)$ with $\widetilde{E}$ has non-zero measure, and thus that each $(E_M, M)$ has non-zero measure.

Now we want to show that $\widetilde{E}$ has a substantial intersection with each cylinder set. Note that because each set $E_M$ has non-zero $\mu$-measure and there are only finitely many of them, there must exist some $\epsilon > 0$ such that $\mu(E_M) > \epsilon$ for all $M \in \mathcal{M}$.

Now consider an arbitrary cylinder set $(C_s, M)$. Let $k = |s|$ and $\widetilde{T}^k(C_s, M) = (X, M')$. Then, using the invariance of $\widetilde{E}$ and this knowledge of the forward shift of $(C_s, M)$, we have
\[
\widetilde{\mu}\bigl(\widetilde{E} \cap (C_s, M)\bigr) = \widetilde{\mu}\bigl(\widetilde{T}^{-k}\widetilde{E} \cap (C_s, M)\bigr) = \widetilde{\mu}\bigl(\widetilde{T}^{-k}(E_{M'}, M') \cap (C_s, M)\bigr) = \frac{1}{|\mathcal{M}|}\,\mu(T^{-k}E_{M'} \cap C_s).
\]
Now applying Renyi’s condition again, we can see that there exists a constant $L > 0$ (not dependent on $(C_s, M)$) such that
\[
\widetilde{\mu}\bigl(\widetilde{E} \cap (C_s, M)\bigr) \ge \frac{1}{L|\mathcal{M}|}\,\mu(E_{M'})\mu(C_s) \ge \frac{\epsilon}{L|\mathcal{M}|}\,\mu(C_s) = \frac{\epsilon}{L} \cdot \widetilde{\mu}(C_s, M).
\]
Since $\epsilon/L > 0$, we apply Knopp’s Lemma to see that $\widetilde{E}$ must have full measure, and thus $\widetilde{T}$ is ergodic. (Note: as usual, if there are infinitely many digits, we cannot just use the cylinder sets, since they do not form a semi-algebra, but we can use the usual semi-algebra associated to infinitely many digits.)

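Now suppose $(x, M)$ is $\widetilde{T}$-normal. Then for any cylinder set $C_s$ of $T$, we have that
\[
\lim_{n\to\infty} \frac{\#\{0 \le i \le n-1 : T^i x \in C_s\}}{n} = \lim_{n\to\infty} \frac{\#\{0 \le i \le n-1 : \widetilde{T}^i(x, M) \in C_s \times \mathcal{M}\}}{n} = \widetilde{\mu}(C_s \times \mathcal{M}) = \mu(C_s),
\]
so that $x$ is $T$-normal.

Now assume $x$ is $T$-normal and let $M \in \mathcal{M}$. Let $(C_s, M')$ be some cylinder set for $\widetilde{T}$. Then we have that
\[
\begin{aligned}
\limsup_{n\to\infty} \frac{\#\{0 \le i \le n-1 : \widetilde{T}^i(x, M) \in (C_s, M')\}}{n}
&\le \limsup_{n\to\infty} \frac{\#\{0 \le i \le n-1 : \widetilde{T}^i(x, M) \in C_s \times \mathcal{M}\}}{n} \\
&= \limsup_{n\to\infty} \frac{\#\{0 \le i \le n-1 : T^i x \in C_s\}}{n} \\
&= \mu(C_s) = |\mathcal{M}| \cdot \widetilde{\mu}(C_s, M').
\end{aligned}
\]
By the Pyatetskii-Shapiro normality criterion, we see that $(x, M)$ must be $\widetilde{T}$-normal.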

5.2.1 Applications of augmented systems

Theorem 5.2.3. If you delete all the digits equal to $b-1$ from a base-$b$ normal number, you are left with a base-$(b-1)$ normal number.

Proof. Let $x = 0.a_1a_2a_3\dots$ be base-$b$ normal, and let $y = 0.a'_1a'_2a'_3\dots$ be the base-$(b-1)$ number formed by deleting all copies of $b-1$ from $x$.

Fix a $k \ge 1$. We will show that all strings of length $k$ show up in $y$ with the desired frequency.

To do this, consider the augmented system of the base-$b$ expansion given by $\mathcal{M}$ being the collection of all base-$(b-1)$ strings of length $k$ and $f(d, M)$ given by
\[
f(d, [c_1, c_2, \dots, c_k]) =
\begin{cases}
[c_1, c_2, \dots, c_k] & \text{if } d = b-1, \\
[c_2, c_3, \dots, c_k, d] & \text{otherwise.}
\end{cases}
\]
This augmented system keeps track of the most recent $k$ digits that were not equal to $b-1$.

We want to be able to apply the theorem on augmented systems to this. It is quite immediate to show that this system is transitive. Proving that it is measure-preserving is just a touch difficult because $f$ is not bijective in the second coordinate; however, it can be shown by examining how each of the inverse branches $\widetilde{T}^{-1}_{(d,M)}$ acts. I leave this as an exercise to the reader. So we may take that the augmented system is ergodic and the point $(x, [0, 0, \dots, 0])$ is $\widetilde{T}$-normal.

Given an integer $m \ge 1$, let $n_m$ denote the smallest integer such that the string $[a_1, a_2, \dots, a_{n_m}]$ contains exactly $m$ digits in the set $\{0, 1, 2, \dots, b-2\}$. In other words, the $m$th digit of $y$ should come from the $n_m$th digit of $x$. We also have that $m = \#\{0 \le i \le n_m - 1 : \widetilde{T}^i(x, [0, 0, \dots, 0]) \in (A, *)\}$, where $A = [0, (b-1)/b)$ is the set of points which have first digit in the set $\{0, 1, 2, \dots, b-2\}$.

From the way we have constructed this augmented system, we see that $\widetilde{T}^{n_m}(x, [0, 0, \dots, 0]) = (T^{n_m}x, [a'_{m-k+1}, a'_{m-k+2}, \dots, a'_m])$. Thus, for any string $s$ of length $k$, we have
\[
\#\{0 \le i \le m-1 : T^i y \in C_s\} = \#\{0 \le i \le n_m - 1 : \widetilde{T}^i(x, [0, 0, \dots, 0]) \in (A, s)\} + O(1),
\]
where the big-Oh term accounts for not quite accurately counting the first $k$ or last $k$ occurrences. The use of $A$ here is to make sure that only those $i$’s where $i = n_m$ are being counted, preventing us from “reading” the same string in $y$ multiple times at the same location.

Thus, by the $\widetilde{T}$-normality of $(x, [0, 0, \dots, 0])$, we have
\[
\begin{aligned}
\lim_{m\to\infty} \frac{\#\{0 \le i \le m-1 : T^i y \in C_s\}}{m}
&= \lim_{m\to\infty} \frac{\#\{0 \le i \le n_m - 1 : \widetilde{T}^i(x, [0, \dots, 0]) \in (A, s)\} + O(1)}{\#\{0 \le i \le n_m - 1 : \widetilde{T}^i(x, [0, \dots, 0]) \in (A, *)\}} \\
&= \lim_{m\to\infty} \frac{\#\{0 \le i \le n_m - 1 : \widetilde{T}^i(x, [0, \dots, 0]) \in (A, s)\}/n_m}{\#\{0 \le i \le n_m - 1 : \widetilde{T}^i(x, [0, \dots, 0]) \in (A, *)\}/n_m} \\
&= \frac{\widetilde{\mu}(A, s)}{\widetilde{\mu}(A, *)} = \frac{\mu(A) \cdot (b-1)^{-k}}{\mu(A)} = (b-1)^{-k},
\end{aligned}
\]
as desired.
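To make the bookkeeping in this proof concrete, here is a small sketch of the state map $f$ acting on a finite digit string. The function names and the sample digits are assumptions for illustration; the state is stored as a tuple of the $k$ most recent digits not equal to $b-1$.

```python
def delete_step(d, state, b):
    """f(d, [c_1, ..., c_k]): keep the state if d == b - 1, else shift d in on the right."""
    if d == b - 1:
        return state
    return state[1:] + (d,)

def run(digits, b, k):
    """Feed digits of x through f, starting from [0, ..., 0].

    Returns the digits of y (x with every b - 1 deleted) and the final state.
    """
    state = (0,) * k
    kept = []
    for d in digits:
        state = delete_step(d, state, b)
        if d != b - 1:
            kept.append(d)
    return kept, state

# Hypothetical base-10 digit string; the 9s are the digits to delete.
kept, state = run([3, 9, 1, 9, 9, 4, 2, 9, 7], b=10, k=3)
```

After the run, `state` holds the last $k$ surviving digits, matching the description of $\widetilde{T}^{n_m}(x, [0, \dots, 0])$ in the proof.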

Theorem 5.2.4. Let $x = \langle a_1, a_2, a_3, \dots \rangle$ be a CF-normal number. Let $\ell \ge 1$ and $m \ge 2$. Then $\langle a_\ell, a_{\ell+m}, a_{\ell+2m}, a_{\ell+3m}, \dots \rangle$ is never CF-normal.

Although the initial idea of this proof is due to Heersink and Vandehey, the proof here is due to Jakub Konieczny.

Proof. Without loss of generality, we may assume that $\ell \in \{1, 2, 3, \dots, m\}$, since this only affects finitely many digits.

Consider the augmented system formed by $\mathcal{M} = \{0, 1, 2, \dots, m-1\}$ and $f(d, i) = i + 1 \pmod m$. This is bijective in the second coordinate, and so the augmented system is measure-preserving. It is clearly also transitive, and the CF expansion satisfies Renyi’s condition and has all full cylinders. Therefore $\widetilde{T}$ is ergodic here.

Let $x = \langle a_1, a_2, a_3, \dots \rangle$ be CF-normal, and let $x_{\ell,m} = \langle a_\ell, a_{\ell+m}, a_{\ell+2m}, a_{\ell+3m}, \dots \rangle$. Let $s = [d_1, d_2, \dots, d_k]$ be some string. Since $x$ is CF-normal, we must have that

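\[
\lim_{n\to\infty} \frac{\#\{0 \le i \le n-1 : T^i x \in C_s\}}{n} = \mu(C_s).
\]

Let $\bigcup C_t$ be the union of the cylinders $C_t$ over all strings $t$ of the form $t = [d_1, *^{m-1}, d_2, *^{m-1}, d_3, \dots, *^{m-1}, d_k]$. We have that $T^i x_{\ell,m} \in C_s$ if and only if $T^{im+\ell-1}x \in \bigcup C_t$.

By the theorem on augmented systems, we know that $(x, 0)$ is $\widetilde{T}$-normal. Therefore,
\[
\lim_{n\to\infty} \frac{\#\{0 \le i \le n-1 : T^i x_{\ell,m} \in C_s\}}{n}
= \lim_{n\to\infty} \frac{\#\{0 \le i \le mn-1 : \widetilde{T}^i(x, 0) \in (\bigcup C_t, \ell-1)\}}{n}
= m \cdot \widetilde{\mu}\Bigl(\bigcup C_t, \ell-1\Bigr) = \mu\Bigl(\bigcup C_t\Bigr).
\]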


(Here, as $\bigcup C_t$ is a union of infinitely many cylinder sets, we would technically need to use the bounded convergence theorem or something similar to get the value of the limit.)

Now the set $\bigcup C_t$ is not that mysterious. It’s the set of numbers whose first CF digit is $d_1$, whose $(m+1)$th is $d_2$, whose $(2m+1)$st is $d_3$, and so on. Thus
\[
\bigcup C_t = C_{[d_1]} \cap T^{-m}C_{[d_2]} \cap T^{-2m}C_{[d_3]} \cap \dots \cap T^{-(k-1)m}C_{[d_k]}.
\]
Let us call this set $C^m_s$.

Now, if for any string $s$ we have that $\mu(C_s) \neq \mu(C^m_s)$, then the theorem is proved. Suppose, by way of contradiction, that for all strings $s$ we have $\mu(C_s) = \mu(C^m_s)$.

However, $C^m_s$ is a union $\bigcup C_t$, and for each $C_t$ we have $\mu(C_t) = \mu(C^m_t)$. But then $\mu(C_s) = \mu(C^m_s) = \mu(\bigcup C^m_t)$. However, if we examine $\bigcup C^m_t$ closely, we see that it is just $C^{m^2}_s$. This process iterates, and so we see that $\mu(C_s) = \mu(C^{m^k}_s)$ for any integer $k \ge 1$.

Now consider the string $s = [1, 1]$. We can prove directly that
\[
\mu(C_s) = \frac{\log(10/9)}{\log 2} = 0.152003\dots.
\]
On the other hand, since $T$ is strong mixing, we have
\[
\mu(C^{m^k}_s) = \mu(C_{[1]} \cap T^{-m^k}C_{[1]}) \to \mu(C_{[1]})^2 = \left(\frac{\log(4/3)}{\log 2}\right)^2 = 0.172256\dots.
\]
This is a contradiction.
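The two numerical values can be reproduced directly from the Gauss measure of an interval. This is a minimal sketch: the function name is an assumption, and it uses the standard facts that $a_1 = 1$ exactly when $x \in (1/2, 1)$ and $a_1 = a_2 = 1$ exactly when $x \in (1/2, 2/3)$.

```python
import math

def gauss_measure(lo, hi):
    """Gauss measure of the interval [lo, hi): (1/log 2) * log((1 + hi)/(1 + lo))."""
    return math.log((1 + hi) / (1 + lo)) / math.log(2)

mu_11 = gauss_measure(1 / 2, 2 / 3)  # cylinder of the string [1, 1]
mu_1 = gauss_measure(1 / 2, 1)       # cylinder of the string [1]
gap = mu_1 ** 2 - mu_11              # strictly positive, so the two values differ
```

Here `mu_11` is $\log(10/9)/\log 2$ and `mu_1 ** 2` is $(\log(4/3)/\log 2)^2$, the two quantities compared in the proof.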

Theorem 5.2.5 (Moeckel’s Theorem). Let $m$ be an integer at least 2. For any $a, b \in \{1, 2, 3, \dots, m\}$ that are coprime, we have for almost all $x \in [0, 1)$ that
\[
\lim_{n\to\infty} \frac{\#\{0 \le i \le n-1 : (p_i, q_i) \equiv (a, b) \pmod m\}}{n} = c_{a,b},
\]
where $c_{a,b}$ is a non-zero constant not dependent upon $x$ and $p_i/q_i$ is the $i$th continued fraction convergent of $x$.

In fact, we will show a stronger result and show how to calculate $c_{a,b}$ explicitly. For example, if we let $m = 2$, then it is equally likely that $p_n/q_n$ is $0/1$, $1/1$, or $1/0$ modulo 2.

In order to prove this result, we will need a new fact about continued fractions first.

Lemma 5.2.6. Consider a matrix
\[
M = \begin{pmatrix} P & P' \\ Q & Q' \end{pmatrix},
\]
with $0 \le P < Q$, $1 \le P' \le Q'$, $Q \le Q'$, and $PQ' - P'Q = \pm 1$. Then $P/Q$ and $P'/Q'$ are successive convergents (i.e., equal $p_{k-1}/q_{k-1}$ and $p_k/q_k$) for some positive number $x$.

We shall, over the course of the proofs, refer to such matrices as “convergent matrices.”

Proof of lemma. We note that if $Q = Q' = 1$ then this can only correspond to the matrix
\[
\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix},
\]
which corresponds to $p_0/q_0 = 0/1$ and $p_1/q_1 = 1/1$. If $Q' > 1$ then it must be relatively prime to $Q$ by the determinant condition, and so we may assume for the rest of the proof, without loss of generality, that $Q < Q'$.

The fraction $P/Q$ is in lowest terms, so its numerator and denominator equal the numerator and denominator of some convergent $p_{k-1}/q_{k-1}$. In fact, since $P/Q$ is rational, its continued fraction expansion is finite and there are two different choices for it (depending on whether we want it to end on 1 or not), which give different parities for $k$. We choose the expansion so that the matrix
\[
M' = \begin{pmatrix} p_{k-2} & P \\ q_{k-2} & Q \end{pmatrix}
\]
has determinant equal to the negative of the determinant of $M$. (If $P/Q = 0/1$ then $p_{k-2}/q_{k-2} = 1/0$.)

Then, we can see by simple calculations that
\[
M'^{-1}M = \begin{pmatrix} 0 & 1 \\ 1 & \det(M') \cdot (p_{k-2}Q' - P'q_{k-2}) \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & a \end{pmatrix}.
\]
Rearranging this, we see that
\[
M = M' \begin{pmatrix} 0 & 1 \\ 1 & a \end{pmatrix},
\]
and so $Q' = q_{k-2} + aQ$. Since $q_{k-2} \le Q$ and $Q < Q'$, we must have that $a > 0$. But following our work earlier in class, this immediately implies that $P'/Q'$ is precisely $p_k/q_k$, where the continued fraction expansion here is the continued fraction of $p_{k-1}/q_{k-1}$ with the digit $a$ appended.

Proof of Theorem 5.2.5. We will consider an augmented system corresponding to the Gauss map.

Let $\mathcal{M} = SL^{\pm}_2(\mathbb{Z}/m\mathbb{Z})$, where we use the $\pm$ to denote that we allow determinants of $\pm 1$. Since we are dealing with modular arithmetic, this set is in fact the set of matrices whose determinant, modulo $m$, is $\pm 1$. (We leave it as an exercise to the reader that this set is equivalent to the set of matrices in $SL^{\pm}_2(\mathbb{Z})$ taken modulo $m$ in each coordinate.) This set is finite because there are at most $m$ possible residue classes that each of the four coordinates can fit in. We will let $f$ be given by
\[
f(d, M) = M \begin{pmatrix} 0 & 1 \\ 1 & d \end{pmatrix} \pmod m.
\]
This is bijective in the second coordinate because the matrix $\begin{pmatrix} 0 & 1 \\ 1 & d \end{pmatrix}$ is invertible, so multiplication on the right by such a matrix is a bijection. Therefore the system is measure-preserving.

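By our work on continued fractions way back when, we have that
\[
\widetilde{T}^k(x, I) = \left(T^k x, \begin{pmatrix} 0 & 1 \\ 1 & a_1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & a_2 \end{pmatrix} \cdots \begin{pmatrix} 0 & 1 \\ 1 & a_k \end{pmatrix} \pmod m \right) = \left(T^k x, \begin{pmatrix} p_{k-1} & p_k \\ q_{k-1} & q_k \end{pmatrix} \pmod m \right).
\]
Therefore the number of times that $(p_k, q_k) \equiv (a, b) \pmod m$ with $1 \le k \le N$ is equal to the number of times that
\[
\widetilde{T}^k(x, I) \in \left([0, 1), \begin{pmatrix} * & a \\ * & b \end{pmatrix}\right).
\]
Thus the desired result would follow immediately from the main theorem on augmented systems, since the CF-normality of $x$ implies the $\widetilde{T}$-normality of $(x, I)$. In fact, the expected frequency with which $(p_k, q_k) \equiv (a, b) \pmod m$ is the frequency of matrices in $\mathcal{M}$ of the form
\[
\begin{pmatrix} * & a \\ * & b \end{pmatrix} \pmod m.
\]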
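The matrix product formula for convergents is easy to experiment with. The sketch below is illustrative only: the function names and the sample digit string are assumptions, and the second coordinate of the augmented orbit is obtained by reducing the product modulo $m$ at each step.

```python
from fractions import Fraction

def mat_mul(A, B, m=None):
    """Multiply 2x2 matrices given as nested tuples, optionally reducing mod m."""
    C = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    if m is not None:
        C = [[entry % m for entry in row] for row in C]
    return tuple(tuple(row) for row in C)

def convergent_matrix(digits, m=None):
    """Product of the matrices [[0, 1], [1, a_i]], i.e. [[p_{k-1}, p_k], [q_{k-1}, q_k]]."""
    M = ((1, 0), (0, 1))
    for a in digits:
        M = mat_mul(M, ((0, 1), (1, a)), m)
    return M

def cf_value(digits):
    """Value of the finite continued fraction <a_1, ..., a_k> as an exact Fraction."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

digits = [1, 2, 3, 4]  # hypothetical sample digits
M = convergent_matrix(digits)
M_mod5 = convergent_matrix(digits, m=5)
```

The right-hand column of `M` gives $p_k/q_k$, which should equal `cf_value(digits)`, and `M_mod5` is the state reached by the augmented system with $m = 5$ after reading these digits.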

To show that this system is transitive takes a touch more work. It suffices to show that we can get from the identity matrix $I \in \mathcal{M}$ to any other $M \in \mathcal{M}$ and vice versa.

By our earlier remark on $\widetilde{T}^k(x, I)$, we see that we can get from $I$ to $M$ if and only if there exists some convergent matrix which is congruent to $M$ modulo $m$. Let us abuse notation slightly and assume that $M$ is actually a matrix in $SL^{\pm}_2(\mathbb{Z})$, and label its coordinates as
\[
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]
Note that adding an integer multiple of one column to another (or one row to another) does not change the determinant of a matrix, and so keeps it in $SL^{\pm}_2(\mathbb{Z})$. If we add a multiple of $m$ times a column or row to the other, then we not only don’t change the determinant, we also don’t change the matrix when it is taken modulo $m$. We leave it as an exercise to the reader to show that one can use these operations together with the lemma above to find a convergent matrix that is congruent to $M$.

This shows we can get from $I$ to $M$. Note that if it is possible to get from $I$ to $M$, then the same string will take $M$ to $M^2$, and $M^2$ to $M^3$, and so on, allowing us to get from $M$ to any power of $M$. However, $SL^{\pm}_2(\mathbb{Z}/m\mathbb{Z})$ is a finite group and thus all elements have finite order. Thus, there is some power of $M$ that is the identity.


We will remark on one last theorem that can be proved using augmented systems,although the proof is far too complicated to be given in a short time.

Theorem 5.2.7. Let $x$ be CF-normal, and let $M$ be a matrix in $GL_2(\mathbb{Z})$ (so it has non-zero determinant). Then $Mx$ is CF-normal as well. In particular, CF-normality is preserved by rational addition and multiplication.

Chapter 6

Hyperbolic geometry, flows, and continued fractions

The goal of this chapter will be to draw a connection between continued fractions and hyperbolic geometry. The connections between these go very deep, and we will only remark on them enough to establish these connections. A more thorough study of hyperbolic geometry can be found in Einsiedler and Ward.

We will need to begin our study by formalizing several new concepts.

6.1 Flows

All of the dynamical systems we have studied thus far may be considered “discrete time” dynamical systems. Namely, we considered the orbit of a point $x$ to be $T^k x$ as $k$ ranged over the discrete set of the non-negative integers. We can extend a number of our definitions to non-discrete time dynamical systems, which are known as flows.

More precisely, given a measure space $(X, \mu)$, a flow is a family $\{T^t : t \in \mathbb{R}\}$ of transformations of $X$ that satisfy $T^t T^s = T^{t+s}$ and $T^0 = \mathrm{Id}$. We say that the flow $T^t$ is measure-preserving if $T^t$ preserves $\mu$ for each $t$, and we say that the flow is measurable if $(x, t) \mapsto T^t x$ is measurable in the usual sense. A semi-flow is a flow that acts for $t \ge 0$ instead of $t \in \mathbb{R}$.

From the definition, we see that every map $T^t$ in a flow is a bijection, although the maps in a semi-flow need not be.

Many of our definitions and results of ergodicity apply equally well to flows as they didto discrete time systems.

We say a flow $T^t$ on a measure space $(X, \mu)$ is ergodic if for every $A \subset X$ such that $\mu(T^t A \,\triangle\, A) = 0$ for all $t \in \mathbb{R}$, we have that $\mu(A) = 0$ or $\mu(A^c) = 0$. Equivalently, the flow is ergodic if $f \circ T^t$ equalling $f$ almost everywhere for all $t \in \mathbb{R}$ implies that $f$ is constant almost everywhere. (In some places the definition is given not for all $t \in \mathbb{R}$ but for almost all $t \in \mathbb{R}$.)

In fact, the Birkhoff pointwise ergodic theorem manages to carry along quite well. We have the following, which is a simplified version.

Theorem 6.1.1. Let $T^t$ be a measurable, measure-preserving, and ergodic (semi-)flow on a probability space $(X, \mu)$. Then for any $f \in L^1(X, \mu)$, we have that for almost all $x \in X$,
\[
\lim_{s\to\infty} \frac{1}{s} \int_0^s f(T^t x)\, dt = \int_X f\, d\mu.
\]

In fact, the proof of this fact follows fairly directly from the Birkhoff pointwise ergodic theorem by treating $\int_n^{n+1} f(T^t x)\, dt = F(T^n x)$, where $F(x) = \int_0^1 f(T^t x)\, dt$.

There are two kinds of flow we will discuss briefly.

6.1.1 Billiard flow

Billiard flow is closely related to the geodesic flow we will study later on.

In this flow we start with a space that is compact, typically a nice polygon, together with a set of directions. The billiard flow at time $t$ simply moves a point and its direction to where they would be if the point travelled along its initial direction at unit speed for time $t$.

The simplest billiard takes the unit square with corners at $(0, 0)$, $(0, 1)$, $(1, 0)$, and $(1, 1)$, and the flow bounces off the walls the way you might expect in an ideal physics situation. A related billiard doesn’t have any collision with the walls; instead, entering one wall transports you to the corresponding point on the opposite side of the square.

There is a surprising connection between such square billiards and number theory. One big question here is the study of saddle connections, essentially trajectories that take you from one corner of the billiard to another without passing through any other corners. If we unfold the billiard to tile all of $\mathbb{R}^2$, then this is equivalent to starting at the origin, travelling along a straight line, and ending up at an integer lattice point $(n, m) \in \mathbb{Z}^2$ without passing through any other lattice points along the way. Such lattice points are known as visible lattice points, and satisfy $\gcd(n, m) = 1$.
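The visibility condition is simple to test by machine. The following sketch, with hypothetical function names, lists the visible lattice points in a small box of the positive quadrant using the criterion $\gcd(n, m) = 1$ stated above.

```python
from math import gcd

def is_visible(n, m):
    """A lattice point (n, m) is visible from the origin when gcd(n, m) = 1."""
    return gcd(n, m) == 1

def visible_points(N):
    """All visible lattice points (n, m) with 1 <= n, m <= N."""
    return [(n, m) for n in range(1, N + 1) for m in range(1, N + 1) if is_visible(n, m)]

points = visible_points(3)
```

For instance, $(2, 4)$ is not visible because the segment from the origin first passes through $(1, 2)$.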

6.1.2 Special flow under a ceiling function

Let us, in this case, start with a nice dynamical system $(X, \mu, T)$. We define a function $r : X \to [0, \infty)$, and then construct a semi-flow $\overline{T}^t$ on the space
\[
\overline{X} = \{(x, s) : x \in X,\ 0 \le s < r(x)\}
\]
given by
\[
\overline{T}^t(x, s) =
\begin{cases}
(x, s + t), & s + t < r(x), \\
(Tx, 0), & s + t = r(x).
\end{cases}
\]

This only defines the flow for certain values of $t$ (dependent on $x$ and $s$), but we can then extend this to all values of $t$ by using the fact that $\overline{T}^t \overline{T}^{t'} = \overline{T}^{t+t'}$. We naturally endow the space $\overline{X}$ with the measure $\mu \times \lambda$, where $\lambda$ is the Lebesgue measure. Normally we like $r$ to be integrable, so that $\overline{X}$ has finite measure and thus $\mu \times \lambda$ can be normalized to a probability measure.
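The extension to all $t \ge 0$ amounts to repeatedly climbing to the ceiling and applying $T$. Here is a minimal sketch under stated assumptions: the ceiling is strictly positive so the loop terminates, and the base map (a rotation by $1/4$) and ceiling $r(x) = 1 + x$ are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def special_flow(x, s, t, T, r):
    """Advance (x, s) by time t >= 0 under the special flow over T with ceiling r.

    Assumes r(x) > 0 everywhere so that the loop terminates.
    """
    s = s + t
    while s >= r(x):
        s -= r(x)  # we reached the ceiling over x ...
        x = T(x)   # ... so jump to (T x, 0) and keep flowing
    return x, s

# Hypothetical base system: rotation by 1/4 on [0, 1) with ceiling r(x) = 1 + x.
T = lambda x: (x + Fraction(1, 4)) % 1
r = lambda x: 1 + x

start = (Fraction(0), Fraction(0))
direct = special_flow(*start, Fraction(3), T, r)
two_steps = special_flow(*special_flow(*start, Fraction(1), T, r), Fraction(2), T, r)
```

Flowing for time 3 at once and flowing for time 1 followed by time 2 land at the same point, which is the semigroup property used to extend the definition.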

(Note: this special flow can be extended to a true flow, not merely a semi-flow, if and only if $T$ is bijective.)

Lemma 6.1.2. If $T$ is ergodic, then $\overline{T}^t$ is ergodic as well.

We leave the proof of this as an exercise, as it is quite short.

Under fairly general conditions one can also show that $T$ being measure-preserving implies that $\overline{T}^t$ is also measure-preserving.

These flows are also sometimes called “suspension flows.”

6.1.3 Cross-sections

In the previous section, we saw how, starting from a discrete-time dynamical system, one could build a semi-flow. In this section we will show how to reverse this procedure and, starting from a flow, build a discrete-time dynamical system.

The key connection here is a cross-section. Given a semi-flow $T^t$ on a space $(X, \mu)$, we say that a subset $Y \subset X$ is a cross-section if for almost all $x \in X$ the set
\[
\{t \ge 0 : T^t x \in Y\}
\]
is discrete and infinite. If in fact we have a flow, not just a semi-flow, then we will want the same to hold true for $t \le 0$ as well. It’s fairly common to consider cross-sections that are sub-manifolds of $X$ of one dimension smaller.

As a simple example of such a cross-section, consider the billiard flow with a point jumping from side to opposite side. The set $Y$ given by
\[
\{((x, y), \eta) : x = 1/2,\ y \in [0, 1),\ \eta \in \mathbb{T}\}
\]
is a cross-section, since the only points that fail to hit it either infinitely often or discretely are those traveling directly up or down (which form a measure-zero set). Note that $\mathbb{T}$ here represents the circle, and thus the set of directions a point could be traveling in.

Given a point $y$ in a cross-section $Y$, we let $r(y)$ denote the “first return time” to $Y$, that is, the smallest positive $t$ such that $T^t y \in Y$. We can consider the transformation $T_Y$ on $Y$ given by $T_Y y = T^{r(y)} y$. We can then also consider the special flow associated to this transformation with ceiling function $r(y)$. There is a natural bijection between this special flow under the ceiling function and the original flow (up to potentially a set of measure zero). Note that we haven’t said much of anything about the measure. This might in fact be difficult to say things about.


6.2 Natural extensions

We mentioned, way back near the start of these notes, that the Baker’s transformation on $[0, 1) \times [0, 1)$ was the natural extension of the base-2 expansion. Here we will define more clearly what we mean by that.

First, we must define what it means to be a factor.

Definition 6.2.1. Let $(X, \mathcal{F}, \mu, T)$ and $(Y, \mathcal{C}, \nu, S)$ be two dynamical systems. Then $(Y, \mathcal{C}, \nu, S)$ is said to be a factor of $(X, \mathcal{F}, \mu, T)$ (and $(X, \mathcal{F}, \mu, T)$ is said to be an extension of $(Y, \mathcal{C}, \nu, S)$) if there exists a measurable and surjective map $\psi : X \to Y$ such that

1. ψ−1C ⊂ F , so that ψ preserves the measure structure.

2. ψT = Sψ so that ψ preserves the dynamics

3. µ(ψ−1E) = ν(E), for all E ∈ C, so that ψ preserves the measure.

The map ψ is then called a factor map.

This is nearly identical to our definition of isomorphism between dynamical systems given earlier, except that we do not require that $\psi$ be a bijection (almost everywhere). In fact, we’ve seen plenty of factors and extensions before. Augmented systems are a great example of an extension, and here $\psi$ would just be projection onto the first coordinate.

Definition 6.2.2. Let $(Y, \mathcal{C}, \nu, S)$ be a non-invertible dynamical system. An invertible, measure-preserving dynamical system $(X, \mathcal{F}, \mu, T)$ is called the natural extension of $(Y, \mathcal{C}, \nu, S)$ if $Y$ is a factor of $X$ and the factor map $\psi$ satisfies
\[
\bigvee_{k=0}^{\infty} T^k \psi^{-1}\mathcal{C} = \mathcal{F},
\]
where this join is the smallest $\sigma$-algebra to contain all the sets in each $T^k \psi^{-1}\mathcal{C}$.

Dajani and Kraaikamp (from whom these definitions are lifted) state a number of related, important points.

1. Any two natural extensions of the same system are isomorphic, so we may talk about “the” natural extension.

2. The natural extension is, in essence, the smallest invertible extension. All other invertible extensions are also extensions of the natural extension.

3. The natural extension is ergodic if and only if the factor is. (This is easy to see in the only if direction, but the if direction is not trivial.)


In addition to the Baker’s transformation we saw earlier, there’s another natural extension that turns out to be something we’ve already seen. If we have a one-sided Bernoulli shift, the corresponding two-sided Bernoulli shift is its natural extension. As a result, since all generalized Luroth series are isomorphic to a (one-sided) Bernoulli shift, we get that the natural extension of a generalized Luroth series would look like
\[
\overline{T}([a_1, a_2, a_3, \dots], [a_0, a_{-1}, a_{-2}, \dots]) = ([a_2, a_3, a_4, \dots], [a_1, a_0, a_{-1}, \dots])
\]
with the usual GLS expansion used in both places.

There is a natural extension for the $\beta$-expansion (as there must be, as we mentioned before), but it is rather long and technical to describe, so I will encourage you to read Dajani and Kraaikamp if you are interested in seeing it.

6.2.1 The natural extension of the continued fraction map

It turns out that the natural extension of the Gauss map $T$ is likewise quite simple. It is given by
\[
\overline{T}(x, y) = \left(Tx, \frac{1}{a_1(x) + y}\right),
\]
and this again has a form of
\[
\overline{T}(\langle a_1, a_2, a_3, \dots \rangle, \langle a_0, a_{-1}, a_{-2}, \dots \rangle) = (\langle a_2, a_3, a_4, \dots \rangle, \langle a_1, a_0, a_{-1}, \dots \rangle).
\]

The underlying space is all of $[0, 1) \times [0, 1)$, just as with base-$b$ expansions. (This does not hold in general. There are other continued fraction expansions on $[0, 1)$ whose natural extension lives on very strange spaces.)

The associated measure is now given by
\[
\overline{\mu}(E) = \frac{1}{\log 2} \iint_E \frac{dx\, dy}{(xy + 1)^2}.
\]

You can check that if you project this into the first coordinate, then you get the usual Gauss measure.
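As a brief sketch of that check (writing $\overline{\mu}$ for the measure on the natural extension and $A \subset [0, 1)$ for an arbitrary measurable set, both of which are notational assumptions), integrate out the second coordinate for $x > 0$:
\[
\int_0^1 \frac{dy}{(xy + 1)^2} = \left[-\frac{1}{x(xy + 1)}\right]_{y=0}^{y=1} = \frac{1}{x}\left(1 - \frac{1}{x + 1}\right) = \frac{1}{x + 1},
\]
so that
\[
\overline{\mu}(A \times [0, 1)) = \frac{1}{\log 2} \int_A \frac{dx}{1 + x},
\]
which is the Gauss measure of $A$.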

Our goal in the rest of this

6.3 Hyperbolic geometry

(Note: Much of the material for this section comes from the notes of my advisor, Florin Boca.)


6.3.1 Lines, lengths and geodesics

We will be working in the standard upper half-plane $\mathbb{H} = \{x + iy \in \mathbb{C} : y > 0\}$. This space is nicely acted upon by the matrices in $G = SL_2(\mathbb{R})$. In fact, if we let
\[
g = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
act by $gz = \frac{az + b}{cz + d}$, then one can show that
\[
\Im(gz) = \frac{\Im(z)}{|cz + d|^2}.
\]
We leave this calculation as an exercise to the reader.

(We will generally assume that $g$ has the coordinates $a, b, c, d$ as listed here, throughout this section.)

Proposition 6.3.1. The action of $G$ is transitive on $\mathbb{H}$. (That is, for any $z, z' \in \mathbb{H}$, there exists a $g \in G$ such that $gz = z'$.)

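Proof. The matrix $g = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$ for any $a \in \mathbb{R}$ is in $G$ and acts by $gz = z + a$. Thus, without loss of generality, we may assume that $z, z'$ both have real part equal to 0, i.e. $z = i\alpha$, $z' = i\beta$.

The matrix $g = \begin{pmatrix} \sqrt{\beta/\alpha} & 0 \\ 0 & \sqrt{\alpha/\beta} \end{pmatrix}$ is also in $G$ and acts by $gz = \beta z/\alpha$, and thus takes $z$ to $z'$.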
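The proof is constructive, and the construction can be sketched numerically. In the code below the function names and the sample points are assumptions for illustration; the matrix is built exactly as in the proof, by translating to the imaginary axis, scaling, and translating again.

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def mobius(g, z):
    """Action of g = ((a, b), (c, d)) on z by z -> (a z + b)/(c z + d)."""
    (a, b), (c, d) = g
    return (a * z + b) / (c * z + d)

def translation(a):
    """The matrix acting by z -> z + a."""
    return ((1.0, a), (0.0, 1.0))

def dilation(lam):
    """The matrix acting by z -> lam * z, for lam > 0."""
    root = math.sqrt(lam)
    return ((root, 0.0), (0.0, 1.0 / root))

def transport(z, w):
    """A matrix g in SL_2(R) with g z = w, following the two steps of the proof."""
    g = translation(-z.real)                   # move z onto the imaginary axis
    g = mat_mul(dilation(w.imag / z.imag), g)  # rescale to the height of w
    return mat_mul(translation(w.real), g)     # move horizontally onto w

# Hypothetical sample points in the upper half-plane.
z, w = complex(1, 2), complex(-3, 0.5)
g = transport(z, w)
```

The same helpers can be used to spot-check the identity $\Im(gz) = \Im(z)/|cz + d|^2$ at a few sample points.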

Proposition 6.3.2. Given any two points $z, w \in \mathbb{H}$, there exists $g \in G$ such that $gz = i$ and $gw \in i\mathbb{R}^+$.

Proof. Clearly by the previous proposition there is a matrix $g'$ such that $g'z = i$. So it suffices to find a $g$ such that $g$ leaves $i$ fixed and maps $w$ onto $i\mathbb{R}^+$.

We claim that there exists a matrix $g$ of the form
\[
r_t = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}
\]
that satisfies these properties.

Consider the action of $r_t$ on a generic point $z \in \mathbb{H}$:
\[
r_t z = \frac{z\cos t - \sin t}{z\sin t + \cos t} = \frac{\sin t\cos t\,(|z|^2 - 1) + z\cos^2 t - \bar{z}\sin^2 t}{|\cos t + z\sin t|^2}.
\]

If $z = i$ then it is clear that this simplifies down to
\[
r_t i = \frac{i(\cos^2 t + \sin^2 t)}{|e^{it}|^2} = i.
\]

More generally, we just want to show that we can choose $t$ to make $\Re(r_t z) = 0$, but we see that
\[
\Re(r_t z) = \frac{\sin t\cos t\,(|z|^2 - 1) + (\cos^2 t - \sin^2 t)\,\Re(z)}{|\cos t + z\sin t|^2}.
\]

When $t = 0$ we have $\Re(r_t z) = \Re(z)$, and when $t = \pi/2$ we have $\Re(r_t z) = -\Re(z)/|z|^2$. One of these values is non-negative and the other is non-positive, and the value of $\Re(r_t z)$ varies continuously with $t$, so by the intermediate value theorem there is some $t$ for which $\Re(r_t z) = 0$.

A path in $\mathbb{H}$ is a piecewise differentiable map $\gamma : [0, 1] \to \mathbb{H}$, $\gamma(t) = x(t) + iy(t) = z(t)$. We define the hyperbolic length of a path $\gamma$ by
\[
L(\gamma) = \int_0^1 \frac{1}{y(t)} \left|\frac{dz}{dt}\right| dt = \int_0^1 \frac{\sqrt{x'(t)^2 + y'(t)^2}}{y(t)}\, dt.
\]

We choose this normalization for the following reason.

Proposition 6.3.3. Let $\gamma$ be a path and $g$ in $G$. Then $L(\gamma) = L(g\gamma)$.

It is this proposition that gives us a good reason why we picked this strange notion of length. In the usual real space, we don’t expect lengths to change if we translate or rotate, and this proposition gives us the same result for hyperbolic lengths.

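Proof. We write $z = \gamma(t) = x(t) + iy(t)$ and $w = (g\gamma)(t) = u(t) + iv(t) = \frac{az + b}{cz + d}$. We can see that $\frac{dw}{dz} = \frac{1}{(cz + d)^2}$ and $v = \Im(gz) = \frac{y}{|cz + d|^2}$ by our earlier calculation. Thus we have that $\left|\frac{dw}{dz}\right| = \frac{v}{y}$.

Combining these facts together, we get the following:
\[
L(g\gamma) = \int_0^1 \frac{1}{v(t)} \left|\frac{dw}{dt}\right| dt = \int_0^1 \frac{1}{v(t)} \left|\frac{dw}{dz}\right| \left|\frac{dz}{dt}\right| dt = \int_0^1 \frac{1}{y(t)} \left|\frac{dz}{dt}\right| dt = L(\gamma),
\]
as desired.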

We define the hyperbolic distance $d(z, w)$ from $z$ to $w$ as the infimum of $L(\gamma)$ over all paths $\gamma$ such that $\gamma(0) = z$ and $\gamma(1) = w$. A geodesic from $z$ to $w$ is a path that realizes the distance $d(z, w)$, if one exists.

Theorem 6.3.4. Geodesics exist between any two points in $\mathbb{H}$. Moreover, the geodesics are either vertical lines or arcs of semi-circles whose endpoints are on the real axis.


Proof. Let us first consider the case where we have two points $i\alpha$ and $i\beta$ with $\beta > \alpha > 0$. Let $\gamma$ be a path from $i\alpha$ to $i\beta$. Then we have
\[
L(\gamma) = \int_0^1 \frac{\sqrt{x'(t)^2 + y'(t)^2}}{y(t)}\, dt \ge \int_0^1 \frac{y'(t)}{y(t)}\, dt = \int_\alpha^\beta \frac{du}{u} = \ln(\beta/\alpha).
\]

However, if we choose $\gamma$ to be the path that travels from $i\alpha$ to $i\beta$ directly vertically with constant speed, we see that $x'(t) = 0$, and so we have equality, not just an inequality, in the above chain. Thus this path is a geodesic.

Now consider two arbitrary points $z, w \in \mathbb{H}$. By our earlier work we know that there exists a $g \in G$ such that $gz = i$ and $gw = i\alpha$, $\alpha > 0$. By our previous work, applying $g^{-1}$ preserves lengths of paths, and since there is a geodesic $\gamma$ from $gz$ to $gw$, the path $g^{-1}\gamma$ is a geodesic from $z$ to $w$.

It remains to show that all geodesics are as we have described them, either vertical lines or semi-circular arcs. First note that all elements of G act by Möbius transformations and thus map straight lines (such as iR) to other straight lines or to circles.

Consider the axis $i\mathbb{R}^+$. This limits to two points on the boundary of H, 0 and ∞. Note that g∞ = a/c and g0 = b/d. The only way for g∞ = ∞ is for c = 0. This implies (by the determinant condition) that d = 1/a, so that g in this case would just be $\begin{pmatrix} a & b \\ 0 & 1/a \end{pmatrix}$. It is easy to see this just acts by $z \mapsto a^2 z + ab$, and thus maps the vertical line $i\mathbb{R}^+$ to another vertical line. Similarly, we see that g0 = ∞ if and only if d = 0, in which case the determinant condition forces the matrix to equal $\begin{pmatrix} a & b \\ -1/b & 0 \end{pmatrix}$, and this again maps $i\mathbb{R}^+$ to a vertical line.

If neither of these cases happens, so that g∞ and g0 both land on the real axis (in distinct points, since the action of g is a bijection), then we see that $g(i\mathbb{R}^+)$ must be a circular arc that intersects the real line in two distinct places. (It cannot be the whole real axis since g preserves H.) Note that $g(i\mathbb{R}^+)$ is one arc of this circle and thus $g(i\mathbb{R}^-)$ is the remaining arc of this circle, but $g(\bar z) = \overline{gz}$ and thus the two arcs must be reflections of one another about the real axis. Thus, the arc is a semi-circular arc.

In fact, one can show the following, although we will not do it here:

\[ d(z_1, z_2) = \ln \frac{|z_1 - \bar z_2| + |z_1 - z_2|}{|z_1 - \bar z_2| - |z_1 - z_2|}. \]
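As a sanity check on the closed form for the distance, here is a small sketch (the function names and sample points are hypothetical, not from the notes). It compares the formula with the value ln(β/α) found above for two points on the imaginary axis, and checks that the distance is unchanged when both points are moved by the same element of SL2(R).

```python
import math

def hyperbolic_distance(z1, z2):
    """Closed-form distance between two points of the upper half-plane."""
    far = abs(z1 - z2.conjugate())
    near = abs(z1 - z2)
    return math.log((far + near) / (far - near))

def mobius(g, z):
    """Apply g = (a, b, c, d), with ad - bc = 1, to z."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)

d_vertical = hyperbolic_distance(2j, 5j)       # should match ln(5/2)

g = (2.0, 1.0, 3.0, 2.0)                       # hypothetical element, ad - bc = 1
z, w = complex(0.3, 1.2), complex(-1.0, 0.4)   # arbitrary sample points
d_before = hyperbolic_distance(z, w)
d_after = hyperbolic_distance(mobius(g, z), mobius(g, w))
```

The two invariance values agree because both |z − w| and |z − w̄| are scaled by the same factor 1/(|cz + d||cw + d|) under g.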

6.3.2 The unit tangent bundle and geodesic flow

Now we will make a few definitions that seem, at first, a little peculiar.

We will denote the unit tangent bundle $T^1H$ as the set of points (z, ζ) with z ∈ H and ζ ∈ C with |ζ| = ℑ(z). (This is called the unit tangent bundle because there is a Riemannian metric associated to this space where the norm on ζ is given by $\|\zeta\|_z = |\zeta|/\Im(z)$.)


Each element of the unit tangent bundle defines what we will just call a geodesic (as opposed to a geodesic between two points). In particular there is only one semi-circle or vertical line that passes through a given z tangent to direction ζ, and this is the geodesic. We will refer to the part of the geodesic that goes in the direction of ζ from z as the forward geodesic ray and the other part as the backward geodesic ray.

We let any matrix g ∈ SL2(R) act on this unit tangent bundle by

\[ (Dg)(z, \zeta) = \left( gz, \frac{\zeta}{(cz + d)^2} \right). \]

By our earlier calculation that $\Im(gz) = \Im(z)/|cz + d|^2$ we see that this truly does map $T^1H$ to itself. We leave it as a fairly simple exercise to show that this action satisfies (Dg)(Dg′) = D(gg′).

Note that this action for g = ±I does nothing; therefore, for the remainder of this chapter, we will think of G not as SL2(R) but as PSL2(R) = SL2(R)/(±I). In fact, with this G we have the following result.

Proposition 6.3.5. The action of G on $T^1H$ is transitive and free. That is, for any (z, ζ), (z′, ζ′) ∈ $T^1H$ there exists a g ∈ G such that (Dg)(z, ζ) = (z′, ζ′), and the stabilizer of any (z, ζ) is just the identity element.

Proof. First we prove transitivity. By our earlier work, we know that we can always find a g such that Dg(z, ζ) = (i, ζ′), so it suffices to show that for any (i, ζ) ∈ $T^1H$, there exists g ∈ G such that (Dg)(i, ζ) = (i, i). We claim that a matrix of the form

\[ \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} \]

has this property. We already showed before that such matrices map i to itself, but we can also see that

\[ Dg(i, \zeta) = \left( i, \frac{\zeta}{(\cos t + i \sin t)^2} \right) = \left( i, \zeta \cdot e^{-2it} \right). \]

Clearly there exists a t such that $\zeta \cdot e^{-2it} = i$, so we’re done with transitivity.

Since the system is transitive, to show that it is free, we need only show that the stabilizer of (i, i) is the identity element. Suppose gi = i. Then we have

\[ \frac{ai + b}{ci + d} = i \;\Rightarrow\; ai + b = di - c \;\Rightarrow\; a = d \text{ and } b = -c. \]

Since $ad - bc = a^2 + b^2 = 1$, we see that we can let a = cos t and b = −sin t for some t ∈ R, so that g has the form of the rotation matrix seen earlier. Therefore, $Dg(i, i) = (i, i \cdot e^{-2it})$. But $e^{-2it} = 1$ if and only if t is a multiple of π, and if t is a multiple of π then g is ±I, the identity element of G.


Because of this proposition, we can say that there is a natural bijection between G and $T^1H$ given simply by g ↔ (Dg)(i, i). This bijection respects the group action as well.

Corollary 6.3.6. Let

\[ SO_2 = \left\{ \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} : t \in \mathbb{R} \right\} \]

and PSO2 = SO2/±I. Then the action of PSL2(R)/PSO2 is transitive and free on H.

This follows quite quickly from our bijection between G and $T^1H$, noting that PSO2 acts on (i, i) just by altering the second coordinate. We will often denote PSO2 by K, so that the group G/K is transitive and free on H.

Now we will define the geodesic flow. This is a flow denoted by $g_t : T^1H \to T^1H$ that takes a point (z, ζ) to the point $(z_t, \zeta_t)$ along the forward geodesic ray from (z, ζ) so that $d(z, z_t) = t$ and so that $\zeta_t$ is tangent to the geodesic and points in the direction of the forward geodesic ray.

In one case, the action of the geodesic flow is easy to understand, namely at (i, i). We have already calculated that the distance from i to iα is |log α|, and the forward geodesic ray is the ray {ia : a ≥ 1}. Thus $g_t(i, i) = (e^t i, e^t i)$. If we let $a_t = \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix}$, we see that $g_t(i, i) = (Da_t)(i, i)$.

In fact, just knowing this gives us a better understanding of how geodesic flow works in general. We know that for any (z, ζ) there exists some g such that (Dg)(i, i) = (z, ζ). But because the action of g preserves distances, we know that $g_t(z, \zeta) = (Dg)(g_t(i, i))$. But then we have

\[ g_t(z, \zeta) = (Dg)(g_t(i, i)) = (Dg)(Da_t)(i, i) = D(g a_t)(i, i). \]

This raises a point we must be careful about. Although we like to think of geodesic flow $g_t$ as acting on the left, on G it acts by multiplication by $a_t$ on the right. And thus we will denote this action by $R_{a_t}$.
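The identity $g_t(z, \zeta) = D(g a_t)(i, i)$ turns geodesic flow into a matrix computation. The sketch below is an illustration only: matrices are stored as tuples (a, b, c, d), and the helper names and the sample matrix are assumptions, not notation from the notes. It flows a point for time t and checks that the base point moves hyperbolic distance t and that the tangent vector keeps unit length.

```python
import math

def mat_mul(g, h):
    """Multiply 2x2 matrices stored as (a, b, c, d)."""
    a, b, c, d = g
    p, q, r, s = h
    return (a * p + b * r, a * q + b * s, c * p + d * r, c * q + d * s)

def D(g, point):
    """Derivative action (Dg)(z, zeta) = (gz, zeta / (cz + d)^2)."""
    a, b, c, d = g
    z, zeta = point
    return ((a * z + b) / (c * z + d), zeta / (c * z + d) ** 2)

def a_t(t):
    """The diagonal matrix diag(e^{t/2}, e^{-t/2})."""
    return (math.exp(t / 2), 0.0, 0.0, math.exp(-t / 2))

def geodesic_flow(g, t):
    """Flow the point (Dg)(i, i) for time t via right multiplication by a_t."""
    return D(mat_mul(g, a_t(t)), (1j, 1j))

def hyperbolic_distance(z1, z2):
    far = abs(z1 - z2.conjugate())
    near = abs(z1 - z2)
    return math.log((far + near) / (far - near))

g = (2.0, 1.0, 3.0, 2.0)                 # hypothetical element with ad - bc = 1
start = D(g, (1j, 1j))
end = geodesic_flow(g, 0.75)
base_flow = geodesic_flow((1.0, 0.0, 0.0, 1.0), 0.75)   # flow starting at (i, i)
```

Here `base_flow` should reproduce $(e^t i, e^t i)$ with t = 0.75, and the distance between the base points of `start` and `end` should be 0.75.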

Horocycle flow

We won’t remark on it too much here, but we will also care about the matrices

\[ u^-_t = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \qquad u^+_t = \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}. \]

Multiplication on the right by these matrices (modelling the space by G) results in the stable (for +) and unstable (for −) horocycle flows on $T^1H$. The actual action of these flows is a little harder to describe. Given a point (z, ζ) ∈ $T^1H$, there exists a circle that passes through z and the endpoint of the forward (resp. backward) geodesic ray, and is perpendicular to ζ at z. The unstable (resp. stable) horocycle flow then pushes the point along this circle, with the tangent vector always remaining perpendicular to the circle.

We shall be interested in the following subgroups of G:

\[ A = \{a_t : t \in \mathbb{R}\}, \qquad N^+ = \{u^+_t : t \in \mathbb{R}\}, \qquad N^- = \{u^-_t : t \in \mathbb{R}\}. \]

We leave it as an exercise to prove the following facts:

Exercise 6.3.7. We have $a_s u^{\pm}_t a_s^{-1} = u^{\pm}_{t e^{\mp s}}$.
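Before attempting the algebra in Exercise 6.3.7, it can be reassuring to test the relation numerically. This is only a spot check at one arbitrary pair (s, t), not a proof; the tuple representation (a, b, c, d) and the helper names are assumptions made for this sketch.

```python
import math

def mat_mul(g, h):
    """Multiply 2x2 matrices stored as (a, b, c, d)."""
    a, b, c, d = g
    p, q, r, s = h
    return (a * p + b * r, a * q + b * s, c * p + d * r, c * q + d * s)

def a(s):
    """The diagonal matrix a_s = diag(e^{s/2}, e^{-s/2})."""
    return (math.exp(s / 2), 0.0, 0.0, math.exp(-s / 2))

def u_plus(t):
    """Lower unipotent matrix u^+_t."""
    return (1.0, 0.0, t, 1.0)

def u_minus(t):
    """Upper unipotent matrix u^-_t."""
    return (1.0, t, 0.0, 1.0)

def conjugate_by_a(s, u):
    """Return a_s * u * a_s^{-1}, using a_s^{-1} = a_{-s}."""
    return mat_mul(mat_mul(a(s), u), a(-s))

def close(g, h, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(g, h))

s, t = 0.8, 1.7   # arbitrary sample parameters
plus_ok = close(conjugate_by_a(s, u_plus(t)), u_plus(t * math.exp(-s)))
minus_ok = close(conjugate_by_a(s, u_minus(t)), u_minus(t * math.exp(s)))
```

Both flags should come out true: conjugation by $a_s$ contracts the parameter of $u^+_t$ by $e^{-s}$ and expands that of $u^-_t$ by $e^{s}$.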

Exercise 6.3.8. The matrices in N+ and N− generate all of G.

Note that it is not the case that $N^+ N^- = G$.

6.4 Areas and integrals

We will define the hyperbolic area of a Lebesgue measurable set A ⊂ H by

\[ \mu(A) := \int_A \frac{dx \, dy}{y^2}. \]

Lemma 6.4.1. For every g ∈ G we have that µ(gA) = µ(A).

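Proof. We let $w = gz = \frac{az+b}{cz+d} = u + iv$. Since the action of applying g is complex-differentiable, by the Cauchy-Riemann equations, we have that ∂v/∂y = ∂u/∂x and ∂v/∂x = −∂u/∂y. Thus,

\[ \frac{\partial(u, v)}{\partial(x, y)} = \begin{vmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{vmatrix} = \frac{\partial u}{\partial x} \cdot \frac{\partial v}{\partial y} - \frac{\partial v}{\partial x} \cdot \frac{\partial u}{\partial y} = \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial x} \right)^2 = \left| \frac{dw}{dz} \right|^2 = \frac{1}{|cz + d|^4}. \]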

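Thus we also have

\[ \mu(gA) = \int_{gA} \frac{du \, dv}{v^2} = \int_A \frac{\partial(u, v)}{\partial(x, y)} \cdot \frac{dx \, dy}{v^2} = \int_A \frac{1}{|cz + d|^4} \cdot \frac{dx \, dy}{(y/|cz + d|^2)^2} = \int_A \frac{dx \, dy}{y^2} = \mu(A). \]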
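The key step in this proof is the identity $\partial(u,v)/\partial(x,y) = 1/|cz+d|^4$. The sketch below (hypothetical helper names and sample values, not from the notes) estimates the Jacobian by central differences at one point and confirms that the density $1/y^2$ is carried to itself.

```python
def mobius(g, z):
    """Apply g = (a, b, c, d), with ad - bc = 1, to z."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)

def jacobian(g, z, h=1e-6):
    """Central-difference Jacobian determinant of (x, y) -> (u, v), where u + iv = gz."""
    dw_dx = (mobius(g, z + h) - mobius(g, z - h)) / (2 * h)
    dw_dy = (mobius(g, z + 1j * h) - mobius(g, z - 1j * h)) / (2 * h)
    # u_x * v_y - v_x * u_y
    return dw_dx.real * dw_dy.imag - dw_dx.imag * dw_dy.real

g = (2.0, 1.0, 3.0, 2.0)       # hypothetical element with ad - bc = 1
z = complex(0.3, 1.2)          # arbitrary sample point in H
c, d = g[2], g[3]

numeric = jacobian(g, z)
exact = 1.0 / abs(c * z + d) ** 4

# Pull the density 1/v^2 at w = gz back to z and compare with 1/y^2.
density_before = 1.0 / z.imag ** 2
density_after = numeric / mobius(g, z).imag ** 2
```

Agreement of `density_before` and `density_after` is the pointwise form of µ(gA) = µ(A).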

Remark 6.4.2. A hyperbolic triangle is a closed region that is bounded by three geodesics, potentially with vertices on R ∪ {∞}. The Gauss-Bonnet formula states that the area of such a triangle is always π minus the sum of the interior angles.

The above lemma implies that the measure $dx \, dy/y^2$ on H is invariant under the action of G (on the left). Note that each point in $T^1H$ can be represented as $(z, \zeta) = (x + iy, e^{2it} y)$ with t ∈ [0, π).


Lemma 6.4.3. The measure $dx \, dy \, dt/y^2$ on $T^1H$ is invariant under the action of G (on the left), seen here as acting by Dg.

Proof. We already know that $dx \, dy/y^2$ is preserved, so it suffices to show that dt is also preserved. However, the action of g on a point $(x + iy, e^{2it} y)$ is given in the second coordinate by

\[ \frac{e^{2it} y}{(c(x + iy) + d)^2} = e^{2i(t - \theta)} \frac{y}{|c(x + iy) + d|^2} = e^{2i(t - \theta)} \cdot \Im(gz), \]

where θ is the argument of c(x + iy) + d. However, this action t ↦ t − θ very clearly preserves dt, so we’re done.

This allows us to define a way of integrating on G itself. Let f be a function that takes G (or, equivalently, $T^1H$ with coordinates (x, y, t)) to R. We then define the integral $\int_G f(g) \, dm_G(g)$ to be equal to

\[ \int_{T^1H} f(x, y, t) \, \frac{dx \, dy \, dt}{y^2}. \]

Then for a set A ⊂ G we let $m_G(A) = \int_A dm_G(g)$. Moreover, we have shown that this measure is invariant under the action of G, so that $m_G(gA) = m_G(A)$ for all g ∈ G. But note, quite importantly, that this is an action on the left, whereas geodesic flow is an action on the right. The measure $m_G$ is an example of a left Haar measure, which is a measure on a locally compact group that is positive, continuous, and left-invariant. Such measures always exist (as do right Haar measures) and are unique up to a constant multiple. The proof of the uniqueness is at least a page of dense mathematics, so we won’t give it here.

(Now, since G is a group, we can define $A^{-1}$ as the set of all inverses of elements of A. Then we define $m_G^{-1}(A) = m_G(A^{-1})$. But this leads to the equalities:

\[ m_G^{-1}(Ag) = m_G((Ag)^{-1}) = m_G(g^{-1}A^{-1}) = m_G(A^{-1}) = m_G^{-1}(A), \]

so that $m_G^{-1}$ is right-invariant. This gives the right Haar measure.)

Now for g ∈ G consider the new measure $m_{G,g}$ defined by

\[ m_{G,g}(A) = m_G(Ag). \]

Since $m_G$ is left-invariant it is clear that $m_{G,g}$ is left-invariant as well, but that means that it must be a multiple of $m_G$, since Haar measures are unique up to constant multiple. In particular we define ∆ : G → (0, ∞) so that $m_{G,g} = \Delta(g) \cdot m_G$. There are two beautiful facts about ∆:

1. The value of ∆(g) is independent of any set A (provided A has positive measure), so calculating it is comparatively easy.


2. If ∆(g) = 1 for all g, then it is clear, by definition, that $m_G$ is right-invariant as well as left-invariant.

Our next goal is to prove that mG truly is right-invariant as well.

6.5 Fuchsian groups, lattices, and fundamental regions

We will call a subgroup Γ of G = PSL2(R) a Fuchsian subgroup if it is a discrete subgroup, that is, the only sequences $g_n$ in Γ that limit to the identity matrix I must eventually just consist of the matrix I. (We treat limits, topology, and all such things as inherited from seeing G as a submanifold of $\mathbb{R}^4$.) The set PSL2(Z) is clearly a Fuchsian subgroup.

We will say, generally, that a group G of homeomorphisms of a metric space X acts properly discontinuously if for every x ∈ X there is a neighborhood V of x such that gV ∩ V is only nonempty if g = e. There are many, many other equivalent definitions. One we will mention is that a group acts properly discontinuously on X if and only if for every compact set K ⊂ X the number of g ∈ G such that gK ∩ K ≠ ∅ is finite. We leave the equivalence of these two statements as an exercise for the reader. (After this point we return to G being PSL2(R).)

Lemma 6.5.1. For every z ∈ H and every compact set K ⊂ H, the set E := {g ∈ G : gz ∈ K} is compact in PSL2(R).

Proof. Note that PSL2(R) can be considered a submanifold of $\mathbb{R}^4$ and thus compact sets are those that are closed and bounded, by the Heine-Borel theorem.

First note that the map from g to gz is continuous; thus, since K is closed, E must also be closed. It suffices to show that E is bounded in its coordinates.

Since K is compact in H, it is bounded away from infinity and all imaginary parts are bounded away from 0. Since $\Im(gz) = \Im(z)/|cz + d|^2$, it follows that |cz + d| is bounded away from 0 and ∞. By considering the imaginary part of cz + d, we see that c must therefore be bounded, and then by considering the real part we see that d must also be bounded.

We also have that |gz| = |(az + b)/(cz + d)| is bounded away from 0 and ∞, and by applying what we know about cz + d, we have that |az + b| must be bounded away from 0 and ∞. A similar calculation shows that a and b must be bounded.

Lemma 6.5.2. A subgroup Γ of PSL2(R) is Fuchsian if and only if it acts properly discontinuously on H.

Proof. Suppose that Γ is not Fuchsian, in particular, that it is not discrete. Then there is a sequence $\gamma_n$ in Γ such that $\gamma_n \to I$ but $\gamma_n \ne I$. Then, for any z ∈ H, we have that $\gamma_n z \to z$. Thus, Γ does not act properly discontinuously on H.

Now suppose that Γ is Fuchsian. Consider z ∈ H that does not equal i and let V be an open neighborhood of z. The set of points γz for γ ∈ Γ can only intersect V in a compact set, by the previous lemma, but that means this set is a compact, discrete set, and therefore must be finite. By choosing V small enough, we can in fact show that the only such matrices γ take γz to z.

Given a group Γ that acts properly discontinuously on a metric space X, we call a closed set F ⊂ X a fundamental region for Γ when $X = \bigcup_{g \in \Gamma} gF$ and Int(F) ∩ g Int(F) = ∅ for every g ∈ Γ with g ≠ e. The simplest fundamental region one can typically find is a Dirichlet region, which, for a given p ∈ X, is defined as the set of all points x ∈ X such that d(x, p) ≤ d(x, γp) for all γ ∈ Γ.

Exercise 6.5.3. Show that a Dirichlet region is in fact a fundamental region.

Given a discrete subgroup Γ of a locally compact group G with left Haar measure $m_G$, a fundamental domain for Γ\G is a set F in the Borel subsets of G for which G can be written as a disjoint union of γF for γ ∈ Γ. A lattice in G is a discrete subgroup Γ which admits a fundamental domain F such that $m_G(F) = \int_G 1_F(g) \, dm_G(g) < \infty$. One can also say that Γ has finite covolume in G.

In fact, it is relatively easy to see that all fundamental domains have the same Haar measure, since if F, F′ are two fundamental domains, then

\[ m_G(F) = m_G\Big( \bigcup_{\gamma \in \Gamma} (F \cap \gamma F') \Big) = \sum_{\gamma \in \Gamma} m_G(F \cap \gamma F') = \sum_{\gamma \in \Gamma} m_G(\gamma^{-1} F \cap F') = m_G(F'). \]

From here on the Γ we will be interested in is PSL2(Z). If we use the bijection between PSL2(R) and $T^1H$, then the Dirichlet region associated to 2i for this Fuchsian group can be seen to be the set

\[ \left\{ (z, \zeta) \in T^1H : |z| \ge 1, \ |\Re(z)| \le \tfrac{1}{2} \right\}. \]

(We won’t prove this in class, but leave it as an exercise for an interested reader.) It is relatively easy to see that this set has finite measure when using the measure $dx \, dy \, dt/y^2$.
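To see concretely how every point of H has a translate under PSL2(Z) in the region |z| ≥ 1, |ℜ(z)| ≤ 1/2, one can alternate the two standard moves z ↦ z + n and z ↦ −1/z. The sketch below uses this familiar reduction procedure; it is not taken from the notes, the function name and the step cap are hypothetical, and it tracks only the base point z, not the tangent vector ζ.

```python
import math

def reduce_to_fundamental_domain(z, max_steps=1000):
    """Move z in H into {|z| >= 1, |Re z| <= 1/2} using z -> z + n and z -> -1/z."""
    for _ in range(max_steps):
        # Translate by an integer so that the real part lies in [-1/2, 1/2).
        z = complex(z.real - math.floor(z.real + 0.5), z.imag)
        if abs(z) >= 1.0:
            return z
        # Invert through the unit circle; for |z| < 1 this strictly increases Im z.
        z = -1.0 / z
    raise RuntimeError("did not converge; increase max_steps")

w = reduce_to_fundamental_domain(complex(3.7, 0.02))   # arbitrary sample point
```

Each inversion increases the imaginary part, which is why the loop stops after finitely many steps for a point with positive imaginary part.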

Moreover, in this case, since G is in bijection with the metric space it acts on ($T^1H$), we see that a subset F ⊂ G is a fundamental region for Γ if and only if Fg is a fundamental region for all g ∈ G. This follows because $\bigcup_{\gamma \in \Gamma} \gamma F = G = Gg = \bigcup_{\gamma \in \Gamma} \gamma F g$ and Int(F) ∩ γ Int(F) = ∅ if and only if Int(Fg) ∩ γ Int(Fg) = ∅.

Now we incorporate several facts together all at once. First, the measure $m_G$ on a fundamental region F (associated to Γ = PSL2(Z)) is finite and positive. Second, acting on the right by an element g of G turns F into another fundamental region Fg. Third, any two fundamental regions have the same Haar measure, so $m_G(F) = m_G(Fg)$. Finally, this implies that ∆(g) = 1 for all g, so that $m_G$ is invariant by actions on the right as well as by actions on the left.

Therefore we have the following:


Corollary 6.5.4. Geodesic flow preserves mG.

For Γ = PSL2(Z) or any other lattice in G, we may consider the quotient Γ\G and the canonical quotient map π : G → Γ\G. For our usual choice of lattice, we may think of this just as taking an element of G and mapping it to its corresponding representative in the fundamental region. We can further define a measure on this space by $m_{\Gamma\backslash G}(B) := m_G(F \cap \pi^{-1}B)$. This measure will be finite, since $m_G(F) < \infty$ for all lattices, and thus we can normalize it to be a probability measure.

Lemma 6.5.5. For every g ∈ G we have $m_{\Gamma\backslash G}(B) = m_{\Gamma\backslash G}(Bg)$, so that $m_{\Gamma\backslash G}$ is right-invariant and in particular is also preserved by geodesic flow.

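Proof. Since we are modding G by Γ on the left, it is fairly immediate that $\pi^{-1}(Bg) = \pi^{-1}(B)g$. Also if F is a fundamental region for Γ, so is Fg. Thus, by using the fact that $m_G$ is right-invariant, we have

\begin{align*}
m_{\Gamma\backslash G}(Bg) &= m_G(\pi^{-1}(Bg) \cap F) = m_G\Big( \pi^{-1}(B)g \cap \bigcup_{\gamma \in \Gamma} (F \cap \gamma F g) \Big) \\
&= \sum_{\gamma \in \Gamma} m_G\big( \pi^{-1}(B) \cap F g^{-1} \cap \gamma F \big).
\end{align*}

Since γ ∈ Γ, we have that $\gamma^{-1}\pi^{-1}(B) = \pi^{-1}(B)$. And since $m_G$ is left-invariant as well, we have that

\begin{align*}
m_{\Gamma\backslash G}(Bg) &= \sum_{\gamma \in \Gamma} m_G\big( \pi^{-1}(B) \cap \gamma^{-1} F g^{-1} \cap F \big) \\
&= m_G\Big( \pi^{-1}(B) \cap \bigcup_{\gamma \in \Gamma} (F \cap \gamma F g^{-1}) \Big) \\
&= m_G(\pi^{-1}(B) \cap F) = m_{\Gamma\backslash G}(B).
\end{align*}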

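Because of this, we see that geodesic flow preserves not only $m_G$ but also $m_{\Gamma\backslash G}$. If you wish to go back and forth between the two measures, you can prove the following result:

\[ \int_G f(g) \, dm_G(g) = \int_{\Gamma\backslash G} \Big( \sum_{\gamma \in \Gamma} f(\gamma g) \Big) \, dm_{\Gamma\backslash G}(\Gamma g). \]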


6.6 Ergodicity of the geodesic flow

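Now we’re all set to prove that geodesic flow is ergodic.

What we will do is consider the usual Hilbert space $L^2(\Gamma\backslash G, m_{\Gamma\backslash G})$ on Γ\G, with inner product and norm given by

\[ \langle f, g \rangle = \int_{\Gamma\backslash G} f(s) \overline{g(s)} \, dm_{\Gamma\backslash G}(\Gamma s), \qquad \|f\|^2 = \int_{\Gamma\backslash G} |f(s)|^2 \, dm_{\Gamma\backslash G}(\Gamma s). \]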

Let us moreover consider the operator π(g) for g ∈ G given by π(g)f(s) = f(sg). (This is not the same π we used before.) Since $m_{\Gamma\backslash G}$ is right-invariant, it is clear that ‖π(g)f‖ = ‖f‖ for all $f \in L^2(\Gamma\backslash G, m_{\Gamma\backslash G})$. This operator acts fairly nicely with respect to the group since

π(g) (π(h)f(s)) = π(g)f(sh) = f(shg) = π(hg)f(s).

Note that while we consider functions over the moduli space, the operator works for all elements in G.

It can moreover be shown that this action is strongly continuous: as g approaches the identity element (in the topology given earlier), we have for any fixed f that ‖π(g)f − f‖ goes to 0. This is relatively easy to see if f is the indicator function of a relatively simple set, say on $T^1H$ it might be a box in H crossed with an interval in the tangent direction: as g approaches the identity, the amount that π(g)f differs from f should shrink to nothing. If it’s true for sets like these, it should also be true for any Lebesgue-measurable set. And if it’s true for any indicator function, then one can extend it to measurable functions.

(In fact what we are describing is a unitary representation, since π(g) is a unitary operator.)

Lemma 6.6.1 (Mautner). Suppose that g, h ∈ G satisfy $\lim_{n \to \infty} g^n h g^{-n} = e$, where e is the identity element of G. Then for every $f \in L^2(\Gamma\backslash G, m_{\Gamma\backslash G})$ such that π(g)f = f almost everywhere, it follows that π(h)f = f almost everywhere.

Proof. Note that since π(g)f = f and since $\pi(g)\pi(g^k) = \pi(g^{k+1})$, we have that $\pi(g^k)f = f$ a.e. for all integers k ≥ 0.

Now since $\pi(g^n)f = f$ a.e. we have that

\[ \|\pi(h)f - f\| = \|\pi(h)\pi(g^n)f - \pi(g^n)f\|. \]

Since $\pi(g^{-n})$ preserves the norm (and distributes over subtraction) we have

\[ \|\pi(h)f - f\| = \|\pi(g^{-n})\pi(h)\pi(g^n)f - \pi(g^{-n})\pi(g^n)f\| = \|\pi(g^n h g^{-n})f - f\|. \]

By assumption, we know that $\lim_{n \to \infty} g^n h g^{-n} = e$, so by strong continuity the right-hand side tends to 0 as n goes to ∞. Since the left-hand side does not depend on n, we get ‖π(h)f − f‖ = 0, and the lemma holds by the fact that the integral of a non-negative function is 0 if and only if the function is 0 almost everywhere.


Corollary 6.6.2. Let $f \in L^2(\Gamma\backslash G, m_{\Gamma\backslash G})$. Suppose that there exists a matrix $a_s \in A$ other than the identity element, for which we have that $\pi(a_s)f = f$. Then for any g ∈ G we have that π(g)f = f almost everywhere.

Proof. Recall that

\[ a_s u^{\pm}_t a_s^{-1} = u^{\pm}_{t e^{\mp s}}, \qquad t \in \mathbb{R}. \]

We also have that

\[ \pi(a_s^{-1})f = \pi(a_s^{-1})(\pi(a_s)f) = \pi(e)f = f, \]

and that $a_s^{-1} = a_{-s}$. We may assume without loss of generality that s > 0.

Note that $a_s^n u^+_t a_s^{-n} = u^+_{t e^{-ns}} \to I$ as n goes to ∞ for all t ∈ R. Thus, by the previous lemma, we have that $\|\pi(u^+_t)f - f\| = 0$ (and thus $\pi(u^+_t)f = f$ almost everywhere) for all t ∈ R, and thus f is almost everywhere fixed by the action of $N^+$ on the right. We also have $a_{-s}^n u^-_t a_{-s}^{-n} = u^-_{t e^{-ns}} \to I$ as n goes to ∞ for all t ∈ R, and since $\pi(a_s^{-1})f = f$, by the previous lemma we have that $\pi(u^-_t)f = f$ a.e. for all t ∈ R, and thus f is a.e. fixed by the action of $N^-$ on the right. But G is generated by $N^+$ and $N^-$ by a previous exercise, so the result holds.

Theorem 6.6.3. Geodesic flow is ergodic on Γ\G. (Although we have specified Γ to be PSL2(Z), in fact the method of proof works for any lattice.)

Proof. To prove this it suffices to prove that any function $f \in L^2(\Gamma\backslash G, m_{\Gamma\backslash G})$ that is invariant under geodesic flow (that is, under the action of A on the right) is constant almost everywhere. By the previous corollary, we see that if f is such a function, then π(g)f = f a.e. for all g ∈ G.

Now suppose that a real-valued function f is not constant almost everywhere. (We can extend this to complex-valued functions quite easily.) Then there exist disjoint intervals $I_1, I_2 \subset \mathbb{R}$ such that the corresponding subsets of G,

\[ C_i = \{h \in G : f(\Gamma h) \in I_i\}, \qquad i = 1, 2, \]

are neither null nor full sets with respect to $m_G$. Let $B_i$ denote the intersection of $C_i$ with some fundamental domain, and consider the measure

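\[ m_G(B_1 \cap B_2 g) = \int_G 1_{B_1}(h) \, 1_{B_2 g}(h) \, dm_G(h). \]

Note that $h \in B_2 g$ if and only if $g \in B_2^{-1} h$, where $B^{-1} = \{b : b^{-1} \in B\}$. Now we apply Fubini’s theorem (since all corresponding integrals are finite as they can be restricted to a finite volume fundamental domain) and the fact that $m_G$ is right-invariant to see that

\begin{align*}
\int_G m_G(B_1 \cap B_2 g) \, dm_G(g) &= \iint_{G \times G} 1_{B_1}(h) \, 1_{B_2 g}(h) \, dm_G(h) \, dm_G(g) \\
&= \int_G 1_{B_1}(h) \left( \int_G 1_{B_2^{-1} h}(g) \, dm_G(g) \right) dm_G(h) \\
&= \int_{B_1} m_G(B_2^{-1} h) \, dm_G(h) \\
&= \int_{B_1} m_G(B_2^{-1}) \, dm_G(h) \\
&= m_G(B_1) \, m_G(B_2^{-1}).
\end{align*}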

Thus, since this is positive, there must exist some g such that $m_G(B_1 \cap B_2 g) > 0$. But we know that the set of h for which $\pi(g^{-1})f(\Gamma h) = f(\Gamma h)$ has full measure, so there must be such an h in the set $B_1 \cap B_2 g$ for this choice of g. But then for this h and g we have that

\[ f(\Gamma h) = \pi(g^{-1})f(\Gamma h) = f(\Gamma h g^{-1}), \]

but the former value must be in $I_1$ while the latter value must be in $I_2$, a contradiction. Thus f is constant almost everywhere and so geodesic flow is ergodic.

Exercise 6.6.4. Prove that geodesic flow is NOT ergodic on G.

One immediate consequence of ergodicity is that almost all geodesics equidistribute over Γ\G with respect to the measure $m_{\Gamma\backslash G}$. Here we can define equidistribution simply by the following: we can pick any rectangle in our fundamental domain (seen as a subset of H) crossed with an interval of angles (for the allowable directions of tangents) and call this a rectangle on $T^1H$. We then say a geodesic is equidistributed if the frequency with which the forward geodesic ray enters any rectangle R is equal to the measure of the rectangle, $m_{\Gamma\backslash G}(R)$.

Among other things, we will also see that almost all geodesics must travel arbitrarily far into the cusp of Γ\G. (We may think of travelling far into the cusp as travelling arbitrarily high if we are looking at $T^1H$.)

6.7 Continued fractions and geodesics

6.7.1 Building up the cross-section

Now that we have a good understanding of the properties of geodesics and geodesic flow on the modular surface Γ\G, we want to build the connection between continued fractions and geodesics via the first-return map to a cross-section. (The coordinates we will use to do this are not the ones commonly in vogue, although we will remark on that more later.)

We need some definitions. Let G = PSL2(R), Γ = PSL2(Z), $S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, and let $F = \{(z, \zeta) \in T^1H : |z| \ge 1, \ |\Re z| \le 1/2\}$, which is the fundamental region for $\Gamma\backslash T^1H$. We will often switch back and forth between thinking of F as being a subset of $T^1H$ and as being equal to $\Gamma\backslash T^1H$. We shall rely on context to make it clear which we mean.

Now, for any (z, ζ) ∈ F, the geodesic defined by this point has two endpoints on R ∪ {∞}. We will say a geodesic has positive orientation if ℜ(ζ) > 0, so that the endpoint of the forward geodesic ray (labelled by y) is to the right of the endpoint of the backward geodesic ray (labelled by −ỹ). We say it has negative orientation if ℜ(ζ) < 0, so that the endpoint of the forward geodesic ray (labelled now as −y) is to the left of the endpoint of the backward geodesic ray (now labelled ỹ). We will be interested in triplets of the form (y, ỹ, ε) associated to a point (z, ζ) ∈ F. Here ε = ±1 according to whether the orientation is positive or negative. We won’t remark too much on vertical geodesics or their orientation at the moment, except to say that it is definable for most of them. (The vertical geodesic at 0 is the problem.)

Let C be the subset of $T^1H$ defined by points (iα, ζ) ∈ $T^1H$, α > 0, such that in the corresponding coordinates (y, ỹ, ε), we have y ∈ (0, 1], ỹ ≥ 1. (In fact, there is a bijection between elements of C and allowable tuples (y, ỹ, ε).) Note that by our definitions, any point (z, ζ) on the positive imaginary axis that isn’t vertical has y and ỹ both positive. We further note that C is not contained in F; it is contained in $T^1H$, and more precisely in F ∪ SF, where S is the inversion matrix defined before. However, we can consider it as a subset of Γ\G by mapping everything to the fundamental domain. If we do consider C as a subset purely of F, note that all the corresponding geodesics have an endpoint in [−1, 1]. We will come back to this fact later.

We want to show that C is a cross-section for geodesic flow. To start this, we will show that each (z, ζ) ∈ F that does not point vertically will, under geodesic flow on Γ\G, intersect C at least once.

Consider the geodesics corresponding to elements of C. These are all semi-circles with one endpoint in (0, 1] (or [−1, 0)) and the other endpoint in (−∞, −1] (or [1, ∞)). Clearly, any such geodesic intersects C once.

Now consider a point (z, ζ) ∈ F that does not point vertically. We may assume the geodesic is oriented positively, as the proof is identical in the negative case. The forward endpoint of this geodesic lies at y, and this y lies in some interval (n, n + 1] for an integer n. Let us consider $DT^{-n}(z, \zeta)$, where T is the “shift horizontally by 1” matrix defined earlier. This shift alters how the geodesic acts on $T^1H$ but not how it acts in the modular surface. In particular, in $T^1H$, this moves the geodesic so the forward endpoint y is in the interval (0, 1]. Now we just want to show that the corresponding ỹ must be at least 1. Suppose that it isn’t. Then y and −ỹ are in (−1, 1], so the corresponding geodesic is completely contained in the circle of radius 1 around the origin. But this region does not intersect with F (or any T translate of it), so cannot contain (z, ζ), which is a contradiction.

This is still true for vertical geodesics besides the one mentioned before, since it is easy to see that these geodesics (when mapped into F) cannot always be vertical.


So given that all but one geodesic intersects C, we want to understand when the next intersection is (presuming there is one). Consider a point (iα, ζ) ∈ C, here again considered as a subset of F ∪ SF.

First consider (iα, ζ) such that y = 1. Since ỹ ≥ 1, we see that α must be at least 1. But then if we draw the geodesic going from (iα, ζ) to 1, we see it can only pass through three copies of the fundamental domain, first F, then TF, then TSF. It will never intersect with any copy of C again.

Let us suppose on the other hand that y ∈ (0, 1). Applying S to (iα, ζ) does not change how it acts on the modular surface, but takes the coordinates (y, ỹ, ε) to (1/y, 1/ỹ, −ε). Now let n be the positive integer such that ε/y ∈ ε(n, n + 1]. By translating our geodesic by $T^n$ (or $T^{-n}$ as appropriate) we see that there is another intersection with C which has coordinates (1/y − n, 1/ỹ + n, −ε). This is another intersection, but is it the next intersection with C? It’s possible we skipped some intersection in between.

To see that this truly is the next intersection, we just chart the path of the geodesic. It starts in F (or in SF and then F), then works its way through a variety of $T^k F$ (for k either positive or negative depending), before entering $T^n F$ and possibly $T^n SF$. Now we know that C, seen as a subset of F, is a subset of {(iα, ζ) : α > 1}, and $T^k C$ is likewise seen to be a subset of the vertical line (with real part k) passing through the middle of $T^k F$. However, it is clear that the other intersections of the geodesic with these vertical lines do not intersect $T^k C$, because the geodesic does not have the right ζ there: in particular, neither of the endpoints of the geodesic is in [−k − 1, −k + 1].

Thus we see that if a geodesic intersects C with coordinates (y, ỹ, ε), then, provided y ≠ 1, the next intersection will come with coordinates ({1/y}, [1/y] + 1/ỹ, −ε), where [·] and {·} denote the integer and fractional parts. If y = 1 there are no more intersections. Call this transformation from one point to another T.

Consider the map ψ that takes (y, ỹ, ε) to (y, 1/ỹ), which is in (0, 1] × [0, 1]. If we then consider $\psi T \psi^{-1}$, this acts by

\[ \psi T \psi^{-1}(y, z) = \left( \left\{ \frac{1}{y} \right\}, \frac{1}{[1/y] + z} \right). \]

This is just the natural extension of the Gauss map (up to some funny business at the endpoints). Thus, the first return map to C is just an extension of the natural extension of the continued fraction expansion. (The extra extension bit comes from this ε coordinate.) Since the only way for a geodesic to have a finite number of intersections with C along the forward geodesic ray is if there is eventually an intersection with C where y = 1, this can only happen if the forward endpoint of the geodesic is rational. Therefore almost all geodesics in Γ\G intersect C an infinite number of times along the forward geodesic ray, and thus C is a true cross-section.
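The conjugacy between the return map and the natural extension of the Gauss map can be tried on exact rational inputs. The sketch below is illustrative only: the function names, the variable `y_tilde` for the backward-endpoint coordinate, and the sample point are assumptions, and the integer n is chosen with 1/y ∈ (n, n + 1] as in the construction above, which differs from the usual integer part only when 1/y is an integer.

```python
from fractions import Fraction
import math

def first_return(y, y_tilde, eps):
    """One step of the return map on C in coordinates (y, y_tilde, eps), for 0 < y < 1."""
    n = math.ceil(1 / y) - 1            # the integer n with 1/y in (n, n + 1]
    return (1 / y - n, 1 / y_tilde + n, -eps)

def psi(y, y_tilde, eps):
    """Forget the orientation and invert the second coordinate."""
    return (y, 1 / y_tilde)

def natural_extension(y, z):
    """The conjugated map on (0, 1] x [0, 1], with the same endpoint convention."""
    n = math.ceil(1 / y) - 1
    return (1 / y - n, 1 / (n + z))

start = (Fraction(5, 13), Fraction(7, 3), 1)    # hypothetical sample point of C
via_return = psi(*first_return(*start))
via_extension = natural_extension(*psi(*start))
```

For this sample, 1/y = 13/5 gives n = 2, so both routes should land on (3/5, 7/17), and the orientation flips from +1 to −1.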

Because of this we can also draw up a natural bijection between a ceiling function over this extension of the continued fraction expansion and geodesic flow, where the ceiling function is the first return time.


6.7.2 Applications of this connection

I should note that this connection does not rely on the ergodicity of geodesic flow whatsoever. Artin originally used this connection to prove the existence of dense geodesics (geodesics which enter any rectangle in F). We won’t prove that itself since it follows from ergodicity, but will prove some additional things.

We will need a bit of notation first. Consider a point (z, ζ) ∈ F. Let (y, ỹ, ε) be the coordinates of the first time the forward geodesic intersects C, presuming such a point exists. We will refer to these as the standard coordinates for this point on the geodesic. Note that since y ∈ (0, 1) (unless y = 1 and the geodesic zooms off into the cusp), we will typically represent it by its continued fraction expansion $\langle a_1, a_2, a_3, \ldots \rangle$, and likewise we will denote the continued fraction expansion of ỹ (when it is in [1, ∞)) by $\langle a_0; a_{-1}, a_{-2}, \ldots \rangle$. Here, going from one intersection with C to the next involves traveling from

\[ (\langle a_1, a_2, a_3, \ldots \rangle, \langle a_0; a_{-1}, a_{-2}, \ldots \rangle, \varepsilon) \to (\langle a_2, a_3, a_4, \ldots \rangle, \langle a_1; a_0, a_{-1}, \ldots \rangle, -\varepsilon). \]

Note that if y is rational (and hence has two expansions, one with a 1 at the end, one without) then all our definitions will mean that the natural choice of expansion to take is the one with 1 at the end.

Let’s start with a fact that is fairly immediate from our earlier definitions.

Lemma 6.7.1. Let (z, ζ) ∈ F have standard coordinates (y, ỹ, ε). If y is rational, then the forward geodesic ray eventually falls into the cusp (i.e., travels vertically upwards in F). If ỹ is rational, then the backward geodesic ray eventually falls into the cusp.

This has a neat corollary. Note that the geodesic (2i, −2i) is somewhat unique in that it just travels straight down, bounces up and goes straight up again. Any other vertical geodesic will intersect the bottom side of F, transport from z to −z̄, and then move away at a strange angle. Can it ever go vertical again? Sure. In fact, such geodesics are dense in the set of vertical geodesics; they simply correspond to those vertical geodesics with rational real part.

Lemma 6.7.2. Consider (z, ζ), (z′, ζ′) ∈ F. As the distance between these points goes to zero, the distances between the corresponding y’s and ỹ’s in the standard coordinates also go to zero. Furthermore, consider (z, ζ) ∈ F with standard coordinates (y, ỹ, ε). Given y′, ỹ′, as y′ and ỹ′ converge to y, ỹ, there exists a point (z′, ζ′) ∈ F with standard coordinates (y′, ỹ′, ε) converging to (z, ζ).

This fact is fairly obvious geometrically.

Proposition 6.7.3. There exist periodic geodesics in Γ\G. In fact, such geodesics are dense.


Proof. The existence follows quite quickly. Let ai∞i=−∞ be an infinite, periodic sequenceof positive integers with period length k and let ε ∈ +1,−1. Then the gedoesic startingfrom the point (z, ζ) ∈ C with standard coordinates (〈a1, a2, a3, . . . 〉, 〈a0; a−1, a−2, . . . 〉, ε)must also be periodic. This is because the 2kth next time the geodesic from this pointvisits C will have the same coordinates. (We use 2k instead of k because of ε.)

Now consider a point (z, ζ) ∈ F with non-vertical tangent direction and with standard coordinates (y, ȳ, ε) = (〈a1, a2, a3, . . . 〉, 〈a0; a−1, a−2, . . . 〉, ε). We will assume both of these expansions are infinite, although the finite case is also doable. Let K be large and consider the bi-infinite periodic sequence $\{b_i\}_{i=-\infty}^{\infty}$ obtained by repeating the sequence a−K , a−K+1, a−K+2, . . . , aK−1, aK. Then, as K grows, the points 〈b1, b2, b3, . . . 〉 and 〈b0; b−1, b−2, . . . 〉 converge to y and ȳ, so by the previous lemma, there is some geodesic that passes through a sequence of points converging to (z, ζ) whose standard coordinates are defined by this bi-infinite periodic sequence.

Proposition 6.7.4. Let (z, ζ) ∈ F . If in standard coordinates the corresponding y is a quadratic irrational, then the forward geodesic ray limits onto a periodic geodesic. (By limiting onto a periodic geodesic, we mean that there exists a periodic geodesic such that the distance from $g_t(z, ζ)$ to this geodesic tends to zero as t → ∞.)

Proof. Recall that y is a quadratic irrational if and only if y has an eventually periodic continued fraction expansion. By moving forward enough along the geodesic, we can assume that y has a purely periodic continued fraction expansion $y = \langle \overline{a_1, a_2, \dots, a_k} \rangle$. We may also assume, without loss of generality, that k is even, as if it’s not, we can simply double it.

Now consider what happens when we move far enough forward along the geodesic to visit C at the point corresponding to $T^{nk}(y, \bar{y}, ε)$ for n ∈ N. Since nk is even, this will leave ε fixed, and since nk is a multiple of k, it also leaves y invariant. However, the first nk digits in the ȳ coordinate will be ak, ak−1, . . . , a1 repeated n times, and thus as n increases, this will converge to the purely periodic expansion $\langle \overline{a_k; a_{k-1}, \dots, a_1} \rangle$.

If we denote the sequence of intersections of the forward geodesic of (z, ζ) in Γ\G with C by $(z_i, ζ_i)$, then it is clear from the above that $(z_{nk}, ζ_{nk})$ converges to some (z′, ζ′) ∈ F such that the geodesic from (z′, ζ′) is periodic. The question is whether all points on the forward geodesic approach this periodic geodesic. Consider $(z_{nk}, ζ_{nk})$ on F and plot the geodesic to $(z_{(n+1)k}, ζ_{(n+1)k})$ on all of H. Since $(z_{nk}, ζ_{nk})$ converges to (z′, ζ′) as n goes to infinity, we see that $(z_{(n+1)k}, ζ_{(n+1)k})$ must lie on the same translated copy of C for all sufficiently large n, and thus must converge to the translate of (z′, ζ′) that lies on this copy as well. Thus these arcs from $(z_{nk}, ζ_{nk})$ to $(z_{(n+1)k}, ζ_{(n+1)k})$ converge to a semi-circular arc in the Euclidean sense, but since they are bounded away from the real line, they must also converge in the hyperbolic sense.

Proposition 6.7.5. Let (z, ζ) ∈ F . If in standard coordinates, the corresponding y has bounded continued fraction digits, then the forward geodesic ray stays in a compact subset of Γ\G.

Here we may interpret “compact subset” to mean that the geodesic always stays in a bounded region in F . That there are infinitely many such geodesics should come as no surprise, because there are infinitely many periodic geodesics and any periodic geodesic stays in a compact subset. However, this set is a lot bigger. The set of periodic geodesics corresponds to the set of quadratic irrationals, and that is countable. The set of continued fractions with bounded digits is uncountable. In fact, it has full Hausdorff dimension.

We will leave the proof of the above statement as an exercise to the reader.

Proposition 6.7.6. Let (z, ζ) ∈ F . If in standard coordinates, the corresponding y is continued fraction normal, then the forward geodesic ray equidistributes in Γ\G.

Proof. We will only provide a sketch of the overall proof.

The first step is to show that if y is continued fraction normal, then $T^n(y, \bar{y}, ε)$ must equidistribute as n goes to positive infinity. Here the equidistribution is with respect to the invariant measure on the natural extension lifted through the factor map ψ. This equidistribution result follows largely by noting that the digits of ȳ have negligible impact on $T^n(y, \bar{y}, ε)$ as n gets large: all of the digit frequencies are eventually controlled by the digits of y, which, by assumption, is continued fraction normal. Showing that the ε does not cause problems does not take much additional work. In fact, if we replace ±1 with 0 and 1, we note that this is similar to augmented systems that we have seen before.

We may exclude those ȳ that are rational from further consideration.

Now consider the special flow under the ceiling function constructed from T , where the ceiling function corresponds to the length of the geodesic between the points on C with standard coordinates (y, ȳ, ε) and T (y, ȳ, ε). (Note that this space will consist of points (y, ȳ, ε, s), and the measure will be the ψ-lifted measure in the first three coordinates and Lebesgue measure in the last one.)

Each point (z, ζ) (such that the backward geodesic ray intersects C at least once) corresponds naturally to a point on the special flow. Namely, flow backwards from (z, ζ) until you reach C. The distance will correspond to s and the standard coordinates of this intersection will correspond to (y, ȳ, ε). It is clear that the special flow is isomorphic to the geodesic flow.

Now, we must show that if y is continued fraction normal (and hence the forward orbit of (y, ȳ, ε) is equidistributed), then the flow from (y, ȳ, ε, s) (for any allowable choice of s bounded by the ceiling function) equidistributes. This is not too difficult and is left as an exercise.

Now consider (z, ζ) ∈ F so that the corresponding y is continued fraction normal. Consider a rectangle R in F ; we want to consider the frequency with which the forward geodesic enters R. We can move through our isomorphism to consider the geodesic as an equidistributed special flow. However, R may no longer resemble a rectangle when seen through the special flow. Nonetheless, it can be shown that the resulting set is very regular and can be approximated to an arbitrary degree from above and below by a finite number of rectangles. From the equidistribution of this point under the special flow, we see that there is a specific frequency with which the forward geodesic from (z, ζ) ∈ F enters R. The key point: this frequency does not depend on the point we started with; it is only a consequence of equidistribution.

Thus we know that all geodesics whose forward endpoint is continued fraction normal must enter every rectangle R in F with a given frequency $λ_R$. We know also that almost every geodesic equidistributes and thus enters R with the desired frequency $m_{Γ\backslash G}(R)$. However, almost all geodesics have a forward endpoint which is continued fraction normal, so $λ_R$ must be $m_{Γ\backslash G}(R)$, and this completes the proof.

6.7.3 Other lattices, other expansions

We’ve been looking at what happens when we take PSL2(R) and mod out by the particular lattice PSL2(Z). This resulted in a given fundamental domain F and a connection to the regular continued fraction expansion.

What if we use a different lattice? Do we get a connection to a different type of continued fraction?

Yes.

One of the most well studied of these takes Γ to be the theta group:

\[ Γ = \{ M ∈ \mathrm{PSL}_2(\mathbb{Z}) : M ≡ I \text{ or } S \pmod{2} \}. \]

The corresponding fundamental domain is given by $\{(z, ζ) ∈ H : |z| ≥ 1,\ |\Re(z)| ≤ 1\}$. This can be thought of as a copy of F , TF and TSF for the usual fundamental domain corresponding to PSL2(Z), and the fact that we need three such fundamental domains corresponds to the fact that the theta group is a subgroup of PSL2(Z) of index 3. The corresponding continued fraction expansion is the even continued fraction expansion, which for numbers in (0, 1) looks like

\[ \cfrac{1}{a_1 + \cfrac{ε_1}{a_2 + \cfrac{ε_2}{a_3 + \cdots}}}, \]

where each $a_i$ is a positive even integer and each $ε_i$ is ±1.
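To make the shape of this expansion concrete, here is a minimal sketch that computes even continued fraction digits with exact rational arithmetic. The digit rule used (take v = 1/x, choose the even integer a with a − 1 ≤ v < a + 1, record the sign of v − a, and continue with |v − a|) is an assumption about the underlying map that these notes do not spell out, and the names `even_cf_digits` and `evaluate` are purely illustrative.

```python
from fractions import Fraction


def even_cf_digits(x, max_steps=20):
    """Return [(a_1, eps_1), (a_2, eps_2), ...] for a Fraction x in (0, 1).

    Assumed rule: with v = 1/x, pick the even integer a with
    a - 1 <= v < a + 1, set eps = sign(v - a), continue with |v - a|.
    """
    digits = []
    for _ in range(max_steps):
        if x == 0:
            break
        v = 1 / x
        a = 2 * ((v + 1) // 2)      # even integer with a - 1 <= v < a + 1
        rem = v - a
        eps = -1 if rem < 0 else 1  # eps is irrelevant when rem == 0
        digits.append((int(a), eps))
        x = abs(rem)
    return digits


def evaluate(digits):
    """Evaluate 1/(a_1 + eps_1/(a_2 + eps_2/(...))) from the inside out."""
    val = Fraction(0)
    for a, eps in reversed(digits):
        val = 1 / (a + eps * val)
    return val


digits = even_cf_digits(Fraction(3, 8))
print(digits, evaluate(digits))
```

Under this rule some rationals (for example x = 1) never terminate, which is why the sketch caps the number of steps.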

Another well known connection comes with the Hecke groups, which are generated by the matrices

\[ \begin{pmatrix} 1 & λ \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad λ > 0. \]


If λ = 2 cos(π/q) for q = 3, 4, 5, . . . , then this group acts properly discontinuously and gives a lattice. These are connected to the Rosen continued fraction, which for a number in [−λ/2, λ/2) has the form

\[ \cfrac{ε_1}{λ a_1 + \cfrac{ε_2}{λ a_2 + \cdots}}, \]

where $ε_i = ±1$ and the $a_i$ are positive integers at least 2.

What about the standard congruence subgroups? These are the groups that look like

\[ Γ_q = \{ g ∈ \mathrm{PSL}_2(\mathbb{Z}) : g ≡ I \pmod{q} \}. \]

We still get a lattice in these cases, although the fundamental domains are extremely difficult to draw generally. Here, the connection isn’t to a type of continued fraction, but rather to an augmented system, the same one we used to prove Moeckel’s theorem in the last chapter. (This was how Fisher gave a general proof of Moeckel’s theorem.)

Chapter 7

Explicit ergodic estimates, operators, and mixing

The big problem with a lot of the work we have done so far is that it is highly inexplicit. If a system is ergodic, then almost all points equidistribute across the space. Which points? Who knows. How fast? Well... there we might be able to say something, although we haven’t yet. Our goal in this last chapter is to make many of these rougher estimates more precise. Among other things we will finally prove the mixing of the continued fraction expansion, together with an explicit estimate on the rate of mixing.

7.1 Estimates on rates of convergence

The main result we will prove here is the following, which can be found as Proposition 4.0.4 in Iosifescu and Kraaikamp (and they in turn took it from Gal and Koksma). A similar result and further digression can be found in Harman’s book. We will largely follow the details of Harman.

Theorem 7.1.1. Let T be a µ-preserving transformation on a measure space (X,µ). Assume that $f ∈ L^1(X,µ)$ and

\[ \int_X \left( \sum_{k=0}^{n-1} f(T^k x) - n \int_X f \, dµ \right)^2 dµ = O(Φ(n)) \]

as n → ∞, where Φ : N → R is a function such that Φ(n)/n is non-decreasing and Φ(n + 1) = O(Φ(n)). Then for any ε > 0 we have

\[ \sum_{k=0}^{n-1} f(T^k x) = n \int_X f \, dµ + O\left( Φ^{1/2}(n) \log^{\frac{3}{2}+ε} Φ(n) \right), \qquad n → ∞, \]

for almost all x.

It’s interesting to note that this theorem does not make any requirement that T is ergodic, even though the conclusion is one we would expect of an ergodic-type result. The reason comes from Φ(n). If T is not ergodic, it’s possible to conjure up examples of functions f where Φ(n) is at least as big as $n^2$, so the statement is vacuous.

Exercise 7.1.2. Under the same assumptions as the previous theorem, show that the measure of the set

\[ E_{ε,n} := \left\{ x ∈ X : \left| \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) - \int_X f \, dµ \right| ≥ ε \right\} \]

is bounded by $O(Φ(n)/(ε^2 n^2))$.

Before we prove this theorem, first a few lemmas.

Lemma 7.1.3 (Borel-Cantelli). Let X be a measure space with measure µ. Let $\{A_j\}_{j=1}^{\infty}$ be a countable collection of measurable subsets of X. If $\sum_{j=1}^{\infty} µ(A_j)$ converges, then almost all x ∈ X belong to only finitely many of the $A_j$.

The proof is left as an exercise.

Lemma 7.1.4. Let $f ∈ L^1(X,µ)$ and let T be a measure-preserving transformation of (X,µ). Then

\[ \int_X f(x) \, dµ = \int_X f(Tx) \, dµ. \]

Proof. Suppose f is the indicator function of a set A. Then

\[ \int_X 1_A(Tx) \, dµ = \int_X 1_{T^{-1}A}(x) \, dµ = µ(T^{-1}A) = µ(A) = \int_X 1_A(x) \, dµ. \]

So the lemma is true for $f = 1_A$. However, if it is true for indicator functions, then by standard techniques we can show it is true for all $L^1$ functions.

Proof of Theorem. It suffices to show this holds for all non-negative functions in $L^1(X,µ)$.

Let us consider a sequence $n_j$ defined by $n_j = \max\{n : Φ(n) < j\}$. These do not need to be distinct values. However, since Φ(n) → ∞ with n, we see that $n_j → ∞$ with j. In fact, since Φ(n)/n is non-decreasing, there exists a constant c > 0 such that for all sufficiently large n we have that Φ(n) > cn. Thus $n_{j+1} − n_j$ is bounded.

Suppose that the desired relation holds whenever $n = n_j$ for any j. Suppose now that $n_r < n < n_{r+1}$. Then

\[ \sum_{k=0}^{n_r-1} f(T^k x) ≤ \sum_{k=0}^{n-1} f(T^k x) ≤ \sum_{k=0}^{n_{r+1}-1} f(T^k x). \]

By assumption, however, we get that

\[ n_r \int_X f \, dµ + O\left(r^{1/2} \log^{3/2+ε} r\right) ≤ \sum_{k=0}^{n-1} f(T^k x) ≤ n_{r+1} \int_X f \, dµ + O\left((r+1)^{1/2} \log^{3/2+ε}(r+1)\right). \]

Since $(n_{r+1} − n_r)\int_X f \, dµ = O(1)$ and since the two big-Oh terms in the above line are both $O(Φ(n)^{1/2} \log^{3/2+ε} Φ(n))$, we see that

\[ \sum_{k=0}^{n-1} f(T^k x) = n \int_X f \, dµ + O\left( Φ(n)^{1/2} \log^{3/2+ε} Φ(n) \right), \]

as desired.

Therefore it remains to show that the desired relation holds for $n = n_j$ for all sufficiently large j.

Let $α(n,m,x) = \sum_{k=n}^{m-1} f(T^k x)$, $A(n,m) = \int_X α(n,m,x) \, dµ$ and $D(n,m,x) := |A(n,m) − α(n,m,x)|$. Ultimately we want a bound on $D(0, n_j, x)$; however, this will be obtained by breaking up the interval from 0 to $n_j$ into smaller pieces, which have a very nice structure.

Given j, let r ∈ Z be given by $2^{r−1} < j ≤ 2^r$. (Note $r = \log_2 j + O(1)$.) We will consider intervals of the form (u, v] where $u = n_{t2^s}$ and $v = n_{(t+1)2^s}$ for non-negative integers t, s with $(t+1)2^s ≤ 2^r$; call the set of all these intervals $B_r$. The interval $(0, n_j]$ can be expressed as a disjoint union of at most r + 1 such intervals in the following way: if we write j in binary form as $j = 2^{a_1} + 2^{a_2} + \cdots + 2^{a_k}$ where $0 ≤ a_1 < a_2 < \cdots < a_k$, then the intervals take the form

\[ (u, v] = \left( n_{2^{a_i} + 2^{a_{i+1}} + \cdots + 2^{a_k}},\ n_{2^{a_{i-1}} + 2^{a_i} + \cdots + 2^{a_k}} \right], \qquad i ≥ 2, \]

where $s = a_{i−1}$ and $t = (2^{a_i} + 2^{a_{i+1}} + \cdots + 2^{a_k})/2^{a_{i−1}}$. (We also include the interval $(0, n_{2^{a_k}}]$, which has $s = a_k$ and t = 0.) We will call the set of these intervals $B_r(j)$. The importance of this particular way of breaking up the set $(0, n_j]$ into a disjoint union is that it only took us at most r + 1 intervals.
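The binary bookkeeping above can be sketched in a few lines. The function below works with the index range (0, j] only (the subscripts of the $n$'s, not the values $n_j$ themselves) and returns the pairs (s, t) described in the text; the name `dyadic_blocks` is an illustrative choice and not notation from these notes.

```python
def dyadic_blocks(j):
    """Split the index range (0, j] into blocks (t*2**s, (t+1)*2**s].

    Returns a list of (s, t) pairs, largest block first, one pair for
    each binary digit of j that equals 1.
    """
    blocks = []
    start = 0
    for s in reversed(range(j.bit_length())):
        if j & (1 << s):
            blocks.append((s, start >> s))  # start is a multiple of 2**s here
            start += 1 << s
    return blocks


# 13 = 8 + 4 + 1, so (0, 13] = (0, 8] U (8, 12] U (12, 13]
print(dyadic_blocks(13))
```

Each block (s, t) corresponds to the interval $(n_{t2^s}, n_{(t+1)2^s}]$, and the number of blocks is the number of ones in the binary expansion of j, which is at most r + 1.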

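We have that

\[ D(0, n_j, x) ≤ \sum_{(u,v] ∈ B_r(j)} D(u, v, x). \]

Now we apply Cauchy-Schwarz and see

\[ D(0, n_j, x) ≤ \left( \sum_{(u,v] ∈ B_r(j)} 1 \right)^{1/2} \left( \sum_{(u,v] ∈ B_r(j)} D(u, v, x)^2 \right)^{1/2} ≤ (r+1)^{1/2} \left( \sum_{(u,v] ∈ B_r} D(u, v, x)^2 \right)^{1/2}. \]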

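By the previous lemma, we know that the integral is invariant under applying T to x. Since $α(0, m, T^n x) = α(n, m+n, x)$, this gives $\int_X α(n, m+n, x) \, dµ = \int_X α(0, m, T^n x) \, dµ = \int_X α(0, m, x) \, dµ$. Thus A(n,m) = A(0,m − n). Likewise we have that

\[ \int_X \left( α(n,m,x) − A(n,m) \right)^2 dµ = \int_X \left( α(0, m−n, T^n x) − A(0, m−n) \right)^2 dµ = O(Φ(m−n)). \]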


Since Φ(n)/n is non-decreasing we have that

\[ Φ(n) + Φ(m) = \frac{Φ(n)}{n} n + \frac{Φ(m)}{m} m ≤ \frac{Φ(n+m)}{n+m} n + \frac{Φ(n+m)}{n+m} m = Φ(n+m). \]

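Thus, we have that

\begin{align*}
\int_X \sum_{(u,v] ∈ B_r} D(u, v, x)^2 \, dµ
&= O\left( \sum_{(u,v] ∈ B_r} Φ(v − u) \right) \\
&= O\left( \sum_{0 ≤ s ≤ r} \ \sum_{0 ≤ t < 2^{r−s}} Φ\left( n_{(t+1)2^s} − n_{t2^s} \right) \right) \\
&= O\left( \sum_{0 ≤ s ≤ r} Φ\left( \sum_{0 ≤ t < 2^{r−s}} \left( n_{(t+1)2^s} − n_{t2^s} \right) \right) \right) \\
&= O\left( \sum_{0 ≤ s ≤ r} Φ(n_{2^r}) \right) \\
&= O\left( (r+1) Φ(n_{2^r}) \right) \\
&= O(r 2^r).
\end{align*}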

Let $E_r$ denote the set of x ∈ X for which

\[ \sum_{(u,v] ∈ B_r} D(u, v, x)^2 > r^{2+2ε} 2^r. \]

By our bound on the integral of this sum, we see that $µ(E_r) = O(r^{−1−2ε})$. By the Borel-Cantelli lemma, almost all x are in only finitely many of the $E_r$.

Consider x and j with $x \notin E_r$. Then

\begin{align*}
\left| \sum_{k=0}^{n_j−1} f(T^k x) − n_j \int_X f \, dµ \right| = D(0, n_j, x)
&≤ (r+1)^{1/2} \left( \sum_{(u,v] ∈ B_r} D(u, v, x)^2 \right)^{1/2} \\
&= O\left( r^{3/2+ε} 2^{r/2} \right) = O\left( j^{1/2} (\log j)^{3/2+ε} \right) \\
&= O\left( Φ(n_j)^{1/2} (\log Φ(n_j))^{3/2+ε} \right).
\end{align*}

This last equality follows from the fact that $j ≤ Φ(n_j + 1) = O(Φ(n_j))$.

However, for almost all x, we have that $x \notin E_r$ for all sufficiently large j. This completes the proof.

Note that things like the law of the iterated logarithm suggest that this is very close to best possible. In particular it should not be true if one tries to use $Φ(n)^{1/2}(\log\log Φ(n))^{1/2}$ in place of $Φ(n)^{1/2}(\log Φ(n))^{3/2+ε}$.


7.1.1 Applications

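Let’s give a quick example of this theorem in action.

Consider T associated to the base 10 expansion and let $f = 1_A$ be the indicator function for a rank-1 cylinder. Then,

\begin{align*}
\int_0^1 \left( \sum_{k=0}^{n−1} 1_A(T^k x) − n \int_0^1 1_A(x) \, dx \right)^2 dx
&= \int_0^1 \left( \sum_{k=0}^{n−1} 1_{T^{−k}A}(x) − n 10^{−1} \right)^2 dx \\
&= \int_0^1 \left( \sum_{j,k=0}^{n−1} 1_{T^{−j}A}(x) 1_{T^{−k}A}(x) − 2n 10^{−1} \sum_{k=0}^{n−1} 1_{T^{−k}A}(x) + n^2 10^{−2} \right) dx \\
&= \sum_{j,k=0}^{n−1} λ(T^{−j}A ∩ T^{−k}A) − 2n 10^{−1} \sum_{k=0}^{n−1} λ(T^{−k}A) + n^2 10^{−2} \\
&= \sum_{j,k=0}^{n−1} λ(T^{−j}A ∩ T^{−k}A) − n^2 10^{−2}.
\end{align*}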

If j = k, then $λ(T^{−j}A ∩ T^{−k}A) = 10^{−1}$ and there are n such terms. If j ≠ k then $λ(T^{−j}A ∩ T^{−k}A) = 10^{−2}$ and there are $n^2 − n$ such terms. Thus,

\[ \int_0^1 \left( \sum_{k=0}^{n−1} 1_A(T^k x) − n \int_0^1 1_A(x) \, dx \right)^2 dx = n(10^{−1} − 10^{−2}) = 0.09n. \]
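As a sanity check on the value 0.09n, the integral can be computed exactly for small n by summing over all rank-n cylinders, on each of which the digit count is constant. This is only a brute-force sketch; the function name `second_moment` and the choice of digit 7 are arbitrary illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def second_moment(n, digit=7, base=10):
    """Exact integral over [0,1) of (count - n/base)^2 for the base-`base` map,

    where count is the number of times `digit` occurs among the first n digits.
    """
    total = Fraction(0)
    mean = Fraction(n, base)
    for word in product(range(base), repeat=n):  # one word per rank-n cylinder
        count = sum(1 for d in word if d == digit)
        total += (count - mean) ** 2
    return total / base ** n                     # each cylinder has measure base**-n


for n in range(1, 5):
    print(n, second_moment(n), Fraction(9, 100) * n)
```

The cost grows like $10^n$, so this is only meant for very small n.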

Thus, the theorem applies with Φ(n) = n, and we have for any such A and almost all x that

\[ \frac{\#\{0 ≤ i ≤ n−1 : T^i x ∈ A\}}{n} = 10^{−1} + O\left( n^{−1/2} (\log n)^{3/2+ε} \right). \]

Similar results are also possible for other rank-k cylinders.

However, in showing this result, we relied heavily on knowing that the base-10 expansion is Bernoulli, so $λ(T^{−j}A ∩ T^{−k}A)$ was easy to measure. Most systems we are interested in are not Bernoulli. Note that for a more general system, with µ in place of λ and an arbitrary set A, we have, if j ≤ k, that

\[ µ(T^{−j}A ∩ T^{−k}A) = µ(T^{−j}(A ∩ T^{−(k−j)}A)) = µ(A ∩ T^{−(k−j)}A). \]

This looks like a mixing-type result. Recall that a system is mixing if $µ(A ∩ T^{−n}B) → µ(A)µ(B)$ as n → ∞.


So let’s make an assumption. Let’s suppose we know that $µ(A ∩ T^{−n}A) = µ(A)^2 + O(τ^n)$ where 0 < τ < 1. This we might call an exponential rate of mixing. If we apply this to our earlier estimates we get

\begin{align*}
\int_X \left( \sum_{k=0}^{n−1} 1_A(T^k x) − n \int_X 1_A \, dµ \right)^2 dµ
&= \sum_{j,k=0}^{n−1} µ(T^{−j}A ∩ T^{−k}A) − 2nµ(A) \sum_{k=0}^{n−1} µ(T^{−k}A) + n^2 µ(A)^2 \\
&= \sum_{j,k=0}^{n−1} \left( µ(A)^2 + O(τ^{|j−k|}) \right) − n^2 µ(A)^2 \\
&= \sum_{j,k=0}^{n−1} O(τ^{|j−k|}) = O\left( \sum_{\ell=0}^{n−1} (n − \ell) τ^{\ell} \right) = O(n).
\end{align*}
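The final O(n) step can be checked numerically: grouping the double sum by the gap ℓ = |j − k| gives $n + 2\sum_{\ell=1}^{n−1}(n−\ell)τ^{\ell}$, which is at most n(1 + τ)/(1 − τ). The sketch below compares the grouped sum with a brute-force double sum; the function names are illustrative only.

```python
def correlation_sum(n, tau):
    """Sum of tau**|j-k| over 0 <= j, k < n, grouped by the gap l = |j-k|."""
    return n + 2 * sum((n - l) * tau ** l for l in range(1, n))


def correlation_sum_brute(n, tau):
    """The same sum computed directly as a double sum."""
    return sum(tau ** abs(j - k) for j in range(n) for k in range(n))


tau = 0.5
for n in (10, 100, 1000):
    # the ratio to n stays bounded by (1 + tau) / (1 - tau)
    print(n, correlation_sum(n, tau) / n, (1 + tau) / (1 - tau))
```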

This is just as strong as our earlier result using Bernoullicity. Clearly we could make do with weaker estimates than exponential rates of mixing, but as it turns out, exponential mixing is the common behavior.

In the next section, we will introduce the Perron-Frobenius operator, which will give us a powerful technique for proving these exponential rates of mixing.

7.2 The Perron-Frobenius operator

(Some material for this chapter is borrowed from the book Probabilistic Properties of Deterministic Systems by Lasota and Mackey.)

We will want to consider an inner product and $L^1$ norm on functions over a measure space (X,µ) given by

\[ 〈f, g〉 = \int_X f(x) g(x) \, dµ, \qquad ‖f‖ = \int_X |f| \, dµ. \]

We shall refer to a function $f ∈ L^1$ with ‖f‖ = 1 and f ≥ 0 as a density.

We will say a transformation T on a measure space (X,µ) is nonsingular if µ(A) = 0 implies $µ(T^{−1}A) = 0$. Given a nonsingular transformation, we can define an operator $P : L^1(X,µ) → L^1(X,µ)$ so that for any f ≥ 0, Pf is the “unique” solution to

\[ \int_A Pf(x) \, dµ = \int_{T^{−1}A} f(x) \, dµ \qquad \text{for all measurable } A. \]

We say “unique” since one can always change its behavior on a measure-zero set and not have it affect the integral. We can then extend P to act on any $L^1$ function by breaking that function into a positive and negative part. The existence of this operator comes about due to the Radon-Nikodym theorem.

This operator is known as the Perron-Frobenius operator.

Here are some straightforward facts about the Perron-Frobenius operator, whose proofs we leave as exercises.

1. P is a linear operator, so for any a, b ∈ R and $f, g ∈ L^1(X,µ)$, we have P(af + bg) = aPf + bPg.

2. P is a positive operator, so if f ≥ 0 then Pf ≥ 0.

3. If P is the Perron-Frobenius operator corresponding to T and $P_n$ corresponds to $T^n$, then $P_n = P^n$.

4. $P1_X = 1_X$ if and only if T preserves the measure µ. More generally, Pf = f if and only if T preserves the measure $µ_f$ given by $µ_f(A) = \int_A f \, dµ$.

5. If T is measure preserving, then $\int_X Pf \, dµ = \int_X f \, dµ$, and ‖Pf‖ ≤ ‖f‖ with equality if f is non-negative.

So the Perron-Frobenius operator is well-behaved, but what does it mean? What does it do?

Let’s consider the case where T corresponds to a fibred system acting on [0, 1) and the branches $T_d$ are increasing and have continuous derivatives. Then

\[ \int_a^x Pf(t) \, dt = \int_{T^{−1}[a,x]} f(t) \, dt. \]

By taking derivatives we get

\[ Pf(x) = \frac{d}{dx} \int_{T^{−1}[a,x]} f(t) \, dt = \frac{d}{dx} \left( \sum_{d ∈ D} \int_{T_d^{−1}a}^{T_d^{−1}x} f(t) \, dt \right) = \sum_{d ∈ D} \frac{f(T_d^{−1}x)}{|T'(T_d^{−1}x)|}. \]

Note that T′ will likely not be defined at the boundary of the cylinder sets.

In fact, one can show by similar means that one generally has

\[ Pf = (f ∘ T^{−1}) \left| \frac{dT^{−1}(x)}{dx} \right|, \]

where the latter term is a Jacobian determinant.

So we can think of the Perron-Frobenius operator as being a way to take f to $f ∘ T^{−1}$ without destroying good behavior.


7.2.1 Examples

Let us consider some simple examples of the Perron-Frobenius operator. For base-b expansions we have

\[ Pf(x) = \sum_{0 ≤ d ≤ b−1} \frac{f(T_d^{−1}x)}{|T'(T_d^{−1}x)|} = \sum_{0 ≤ d ≤ b−1} \frac{f((x+d)/b)}{b}. \]

For continued fraction expansions we have that $T_d^{−1}x = 1/(x+d)$ and thus

\[ Pf(x) = \sum_{d=1}^{\infty} \frac{f(T_d^{−1}x)}{|T'(T_d^{−1}x)|} = \sum_{d=1}^{\infty} \frac{1}{(x+d)^2} f\left( \frac{1}{x+d} \right). \]

While technically these are only defined on (0, 1) since the derivative is not nicely defined at 0, we can see these functions nicely extend to act on [0, 1).
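Both formulas are easy to try out numerically. The sketch below implements them as higher-order functions, truncating the infinite sum for the continued fraction map after a fixed number of terms (an assumption made purely for computability; the truncation error is of order 1/terms for bounded f). By fact 4 above, the Gauss density $h(x) = 1/((1+x)\log 2)$ should be approximately fixed by the Lebesgue-measure operator for the continued fraction map, which gives a convenient check. The function names are illustrative only.

```python
import math


def pf_base_b(f, b):
    """Perron-Frobenius operator (w.r.t. Lebesgue) for x -> b*x mod 1."""
    return lambda x: sum(f((x + d) / b) for d in range(b)) / b


def pf_cf_lebesgue(f, terms=100000):
    """Truncated Perron-Frobenius operator (w.r.t. Lebesgue) for the
    continued fraction map; only the first `terms` branches are summed."""
    return lambda x: sum(f(1 / (x + d)) / (x + d) ** 2
                         for d in range(1, terms + 1))


def gauss_density(x):
    return 1 / ((1 + x) * math.log(2))


print(pf_base_b(lambda t: t, 2)(0.5))               # (2x + 1)/4 at x = 0.5
print(pf_cf_lebesgue(gauss_density)(0.3), gauss_density(0.3))
```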

However, and we should emphasize this strongly, these derivations of the Perron-Frobenius operator were with respect to Lebesgue measure, and we will often denote this operator by $P_λ$. If we want to use a different measure, say the Gauss measure, or more generally a measure µ such that dµ = h(x)dx for some density function h, then we need a slightly different derivation:

\[ \int_a^x Pf(t) h(t) \, dt = \int_a^x Pf(t) \, dµ(t) = \int_{T^{−1}[a,x]} f(t) \, dµ(t) = \int_{T^{−1}[a,x]} f(t) h(t) \, dt, \]

and thus, by taking derivatives and applying our earlier techniques, we have

\[ Pf(x) = \frac{1}{h(x)} \sum_{d ∈ D} \frac{f(T_d^{−1}x) \, h(T_d^{−1}x)}{|T'(T_d^{−1}x)|}. \]

Thus, since the Gauss measure is given by $dµ = ((\log 2)(1+x))^{−1} dx$, the Perron-Frobenius operator associated to the continued fraction expansion is given by

\[ P_µ f(x) = (1+x) \log 2 \sum_{d=1}^{\infty} \frac{1}{(x+d)^2} f\left( \frac{1}{x+d} \right) \frac{1}{(\log 2)\left( 1 + \frac{1}{x+d} \right)} = \sum_{d=1}^{\infty} \frac{x+1}{(x+d)(x+d+1)} f\left( \frac{1}{x+d} \right). \]
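A small sketch of this operator follows. The weights (x+1)/((x+d)(x+d+1)) telescope, so the weights beyond the first `terms` branches sum to exactly (x+1)/(x+terms+1); the code assigns that leftover weight to f(0), which is an approximation that assumes f is continuous at 0. The name `pf_gauss` and the truncation scheme are illustrative choices, not part of the notes.

```python
import math


def pf_gauss(f, terms=100000):
    """Approximate Perron-Frobenius operator for the continued fraction map
    with respect to the Gauss measure."""
    def Pf(x):
        head = sum((x + 1) / ((x + d) * (x + d + 1)) * f(1 / (x + d))
                   for d in range(1, terms + 1))
        # weights for d > terms sum to (x + 1)/(x + terms + 1); f is frozen at 0 there
        tail = (x + 1) / (x + terms + 1) * f(0.0)
        return head + tail
    return Pf


print(pf_gauss(lambda t: 1.0)(0.25))                  # constants are preserved
print(pf_gauss(lambda t: t)(0.0), math.pi ** 2 / 6 - 1)
```

The second line uses the closed form $\sum_{d ≥ 1} 1/(d^2(d+1)) = π^2/6 − 1$ for the value of P applied to f(x) = x at the point 0.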

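Exercise 7.2.1. Let $f ∈ L^1([0, 1), λ)$, h(x) > 0 for all x ∈ [0, 1), and let g(x) = (x+1)h(x)f(x). Prove that for the continued fraction expansion we have

\[ P_µ^n f(x) = \frac{P_G^n g(x)}{(x+1) h(x)}, \]

where µ is given by dµ = h(x)dx and $P_G$ denotes the Perron-Frobenius operator with respect to the Gauss measure derived above. (With the Lebesgue operator $P_λ$ the corresponding identity is $P_µ^n f = P_λ^n(hf)/h$.)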


Exercise 7.2.2. Give the Perron-Frobenius operator with respect to the Lebesgue measure for the β-expansion.

These examples should hopefully make it clear that even though the Perron-Frobenius operator is defined in a somewhat abstract way, it acts in a very concrete way on functions we are interested in. This will allow us to estimate the long-term behavior of P more readily.

7.2.2 The Koopman operator

There is a closely related operator, referred to as the Koopman operator, that acts on $L^∞(X,µ)$ (the set of essentially bounded functions). It acts very simply by Uf(x) = f(Tx). It’s easily seen to be a linear operator as well.

The Koopman operator is closely related to the Perron-Frobenius operator. Recall the inner product on $L^1 × L^∞$ given by

\[ 〈f, g〉 = \int_X f(x) g(x) \, dµ. \]

Then the Koopman operator is the adjoint of the Perron-Frobenius operator; that is, 〈Pf, g〉 = 〈f, Ug〉.

This can be seen easily for indicator functions $g = 1_A$:

\begin{align*}
〈Pf, 1_A〉 &= \int_X Pf(x) 1_A(x) \, dµ = \int_A Pf(x) \, dµ \\
&= \int_{T^{−1}A} f(x) \, dµ = \int_X f(x) 1_{T^{−1}A}(x) \, dµ \\
&= \int_X f(x) 1_A(Tx) \, dµ = 〈f, U1_A〉.
\end{align*}

Since it holds for indicator functions, it can be extended to simple functions and hence to all $L^∞$ functions.
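The adjoint relation can also be checked numerically in a simple case. The sketch below uses the doubling map T(x) = 2x mod 1 with Lebesgue measure, for which the base-b formula with b = 2 gives Pf(x) = (f(x/2) + f((x+1)/2))/2, and approximates both pairings with a midpoint rule. The test functions f(x) = x² and g(x) = x are arbitrary choices; a direct hand computation gives 5/24 for both sides.

```python
def midpoint_integral(func, n=20000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    return sum(func((i + 0.5) / n) for i in range(n)) / n


def P(f):
    """Perron-Frobenius operator of T(x) = 2x mod 1 w.r.t. Lebesgue measure."""
    return lambda x: 0.5 * (f(x / 2) + f((x + 1) / 2))


def U(g):
    """Koopman operator of T(x) = 2x mod 1."""
    return lambda x: g((2 * x) % 1.0)


f = lambda x: x * x
g = lambda x: x
lhs = midpoint_integral(lambda x: P(f)(x) * g(x))   # <Pf, g>
rhs = midpoint_integral(lambda x: f(x) * U(g)(x))   # <f, Ug>
print(lhs, rhs, 5 / 24)
```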

As a first connection between operators and ergodicity, recall that a system is ergodic if and only if the only functions f that equal f ∘ T almost everywhere are constant almost everywhere. In other words, a system is ergodic if and only if the fixed points of the Koopman operator are almost everywhere constant functions.

However, a more powerful result is the following

Theorem 7.2.3. A transformation T is mixing if and only if

\[ \lim_{n → ∞} 〈P^n f, g〉 = 〈f, 1_X〉〈1_X, g〉 \qquad \text{for all } f ∈ L^1,\ g ∈ L^∞. \]


Proof. As usual, we consider the proof for f, g equal to indicator functions $1_A$, $1_B$. It may be extended from here by standard methods.

We give a chain of equivalent statements:

1. \[ \lim_{n → ∞} 〈P^n 1_A, 1_B〉 = 〈1_A, 1_X〉〈1_X, 1_B〉. \]

2. Since U is the adjoint of P , we have

\[ \lim_{n → ∞} 〈1_A, U^n 1_B〉 = 〈1_A, 1_X〉〈1_X, 1_B〉. \]

3. Applying the definition of these inner products, we obtain

\[ \lim_{n → ∞} \int_X 1_A(x) 1_B(T^n x) \, dµ = \int_X 1_A(x) \, dµ \int_X 1_B(x) \, dµ. \]

4. Applying the fact that $1_B(T^n x) = 1_{T^{−n}B}(x)$ and the connection between integrals and measures, we have

\[ \lim_{n → ∞} µ(A ∩ T^{−n}B) = µ(A)µ(B), \]

i.e., T is mixing.

7.2.3 The spectral decomposition theorem

We give a version of the spectral decomposition theorem as found in Lasota and Mackey. Similar results are sometimes known as the Ionescu-Tulcea and Marinescu theorem.

Here we will make use of a few definitions. We say a set of functions is strongly precompact if every sequence of functions from the set has a strongly convergent subsequence. We define the distance between a function f and a set of functions G to be the infimum of ‖f − g‖ for g ∈ G. We say an operator P is constrictive if there exists a strongly precompact set such that the distance from $P^n f$ to this set tends to zero for all densities f . Finally we call a linear operator $P : L^1 → L^1$ a Markov operator if Pf ≥ 0 for all f ≥ 0 in $L^1$ and if ‖Pf‖ = ‖f‖ for all f ≥ 0 in $L^1$. (Note that the Perron-Frobenius operator is a Markov operator by definition.)

Theorem 7.2.4. Let P be a constrictive Markov operator. Then there is an integer r, densities $g_i$ and $k_i ∈ L^∞(X,µ)$, 1 ≤ i ≤ r, and an operator $Q : L^1 → L^1$ such that for all $f ∈ L^1$, Pf may be written as

\[ Pf(x) = \sum_{i=1}^{r} 〈f, k_i〉 g_i(x) + Qf(x), \]

where

1. $g_i(x) g_j(x) = 0$ if i ≠ j, so that the supports of the $g_i$’s do not overlap;

2. there exists a permutation α on {1, 2, . . . , r} such that $Pg_i = g_{α(i)}$; and,

3. $‖P^n Qf‖ → 0$ as n → ∞ for every $f ∈ L^1$.

The idea behind the spectral decomposition theorem is that operators essentially act like matrices in many ways. In particular, operators come with a point spectrum, which acts like eigenvalues for a matrix, and other spectra, such as the continuous spectrum, which do not act like eigenvalues. From the fact that ‖Pf‖ ≤ ‖f‖, we know that the operator shouldn’t have any eigenvalues of norm greater than 1. The spectral theorem essentially breaks the operator into two pieces: one piece consists of the eigenspaces with eigenvalues of norm 1 (these are the $g_i$’s) and another part with strictly less than 1 in maximum eigenvalue (this is Q).

The importance of this comes in part from Q. Since the largest “eigenvalue” of Q has norm less than 1, we expect $‖P^n Qf‖$ to converge to zero exponentially fast.

Lemma 7.2.5. Suppose under the conditions of the spectral decomposition theorem that (X,µ) is a probability space and that T preserves µ. Then there exists a partition of X into $A_1, A_2, \dots, A_r$ such that $g_i = µ(A_i)^{−1} 1_{A_i}$.

Proof. Note that since T preserves µ we have $P^n 1_X = 1_X$. Suppose that the period of the corresponding α is k; that is, $α^k(i) = i$ for all i. Note that $P^k g_i = g_{α^k(i)} = g_i$. Then we have that

\[ 1_X = P^{nk+1} 1_X = \sum_{i=1}^{r} 〈1_X, k_i〉 P^{nk} g_i + P^{nk} Q 1_X = \sum_{i=1}^{r} 〈1_X, k_i〉 g_i + P^{nk} Q 1_X. \]

Taking limits as n goes to infinity, we see that

\[ 1_X(x) = \sum_{i=1}^{r} 〈1_X, k_i〉 g_i(x), \]

technically for almost all x ∈ X, but we may ignore that for now.

Let the support of $g_i$ be called $A_i$. By the spectral decomposition theorem, these are disjoint, and since $1_X(x) = 1$, we see that these $A_i$’s must form a partition of X. Moreover, if $x ∈ A_i$, then $1_X(x) = 〈1_X, k_i〉 g_i(x)$, and thus $g_i(x) = 1_{A_i}(x)/〈1_X, k_i〉$. Because $g_i$ is a density and thus has norm 1, we have that

\[ 1 = \int_X g_i(x) \, dµ = \frac{µ(A_i)}{〈1_X, k_i〉}, \]

and thus $g_i(x) = 1_{A_i}(x)/µ(A_i)$ as desired.


Proposition 7.2.6. Let P be the Perron-Frobenius operator associated to a measure-preserving transformation T on a probability space, and suppose P is constrictive. Then T is mixing if and only if one can take r = 1 in the spectral decomposition theorem.

Proof. Suppose r = 1. Then in the previous lemma we have that $A_1 = X$ and thus

\[ P^n f(x) = 〈f, k_1〉 1_X(x) + P^{n−1} Q f(x). \]

Thus $P^n f → 〈f, k_1〉 1_X$. Suppose that f is non-negative; then $‖P^n f‖ = ‖f‖$ and so we see that $〈f, k_1〉 = ‖f‖$ and also $‖f‖ = 〈f, 1_X〉$.

Thus for any non-negative $f ∈ L^1$ and any $g ∈ L^∞$ we have

\[ 〈P^n f, g〉 = 〈f, 1_X〉〈1_X, g〉 + 〈P^{n−1} Q f, g〉. \]

As n goes to infinity, though, the norm of $P^{n−1} Q f$ goes to 0 and hence this latter term disappears. We can extend this to all $f ∈ L^1$ by breaking a function into non-negative and non-positive parts. Thus T is mixing.

Now suppose r > 1; we will show T is not mixing. We may assume that r was chosen minimally. Note that we cannot have α(i) = i and α(j) = j for two distinct indices i ≠ j, because then we could reduce the value of r by 1 by combining $g_i$ and $g_j$. Thus there must exist an i, which we may assume is 1, such that α(i) ≠ i.

Now consider $f = g_1 = 1_{A_1}/µ(A_1)$. Then $P^n f = g_{α^n(1)}$, and thus

\[ 〈P^n f, g_1〉 = \begin{cases} µ(A_1)^{−1}, & α^n(1) = 1, \\ 0, & α^n(1) ≠ 1. \end{cases} \]

But there are infinitely many n’s for which $α^n(1)$ equals 1 and infinitely many for which it does not. Therefore $〈P^n f, g_1〉$ cannot converge to a limit and thus T is not mixing.

Remark 7.2.7. In fact when r = 1, we can show that a much stronger property, exactness, holds.

But this tells us something really great. Suppose T is mixing, then

\begin{align*}
µ(A ∩ T^{−n}B) &= 〈1_A, U^n 1_B〉 = 〈P^n 1_A, 1_B〉 \\
&= 〈µ(A) 1_X + P^{n−1} Q 1_A, 1_B〉 \\
&= µ(A)〈1_X, 1_B〉 + 〈P^{n−1} Q 1_A, 1_B〉 \\
&= µ(A)µ(B) + O(‖P^{n−1} Q 1_A‖).
\end{align*}

In other words, if the decay of the big-Oh term is exponential (which we expect it to be, since the largest eigenvalue of Q has norm less than one), then we get an exponential rate of mixing automatically.


The problem, however, is that it’s generally really, really hard to prove the spectral decomposition theorem for the full space of $L^1$ functions. We typically focus on a smaller subset, such as continuous functions, bounded functions, bounded variation functions, or Lipschitz functions (in fact a result of Schweiger, “Kuzmin’s Theorem revisited”, does this for Lipschitz functions under conditions slightly broader than what we used to prove ergodicity using our Renyi-type theorem).

7.3 Mixing estimates for continued fractions

(Material from this section is borrowed from Chapter 2.1 of Iosifescu and Kraaikamp.)

We will here consider T to be the usual continued fraction map and P to be the Perron-Frobenius operator corresponding to the invariant Gauss measure. Recall that

\[ Pf(x) = \sum_{d=1}^{\infty} \frac{x+1}{(x+d)(x+d+1)} f\left( \frac{1}{x+d} \right). \]

We will make a change of notation for the remainder of this section to allow µ to refer to a general measure, not the Gauss measure. When needed, we will let $µ_G$ denote the Gauss measure. We will generally assume that µ ≪ λ, by which we mean that λ(A) = 0 implies µ(A) = 0. By the Radon-Nikodym theorem we have that there exists a function h such that dµ = h dλ. Recall that for the Gauss measure $h(x) = ((1+x) \log 2)^{−1}$.

We will also extend the Perron-Frobenius operator to act in the obvious way on functions defined over [0, 1] instead of [0, 1).

Lemma 7.3.1. Let µ be a probability measure on [0, 1), µ ≪ λ, with dµ = h dλ. Then for any n ∈ N and Lebesgue-measurable set A we have

\[ µ(T^{−n}A) = \int_A \frac{P^n f(x)}{x+1} \, dx, \]

where f(x) = (x+1)h(x).

Note this is a very general statement and in particular does not require that h > 0 almost everywhere.

Proof. We proceed by induction. For n = 0, this becomes $µ(A) = \int_A h \, dx$, which is obviously true. Assume the statement holds for a given n. Then we will show it holds for n + 1:

\begin{align*}
µ(T^{−(n+1)}A) = µ(T^{−n}(T^{−1}A)) &= \int_{T^{−1}A} \frac{P^n f(x)}{x+1} \, dx \\
&= (\log 2) \int_{T^{−1}A} P^n f \, dµ_G = (\log 2) \int_A P^{n+1} f \, dµ_G \\
&= \int_A \frac{P^{n+1} f(x)}{x+1} \, dx.
\end{align*}

The particular functions f we will start with are bounded, monotonic, real-valued functions. We will then move to functions of bounded variation. Note that if E is a finite union of disjoint intervals then the measure $µ(A) = µ_G(A ∩ E)/µ_G(E)$ has a density h with respect to Lebesgue measure such that f(x) = (1 + x)h(x) has bounded variation. This will be our key step towards proving mixing.

Lemma 7.3.2. Suppose f is a bounded, non-decreasing (non-increasing) function. Then Pf is bounded and non-increasing (non-decreasing).

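Proof. First we will show that P preserves boundedness. Suppose f is bounded, then

\begin{align*}
|Pf(x)| &= \left| \sum_{d=1}^{\infty} \frac{x+1}{(x+d)(x+d+1)} f\left( \frac{1}{x+d} \right) \right| ≤ \sup_{x ∈ [0,1]} |f(x)| \cdot \left| \sum_{d=1}^{\infty} \frac{x+1}{(x+d)(x+d+1)} \right| \\
&= \sup_{x ∈ [0,1]} |f(x)| \cdot \left| (x+1) \sum_{d=1}^{\infty} \left( \frac{1}{x+d} − \frac{1}{x+d+1} \right) \right| \\
&= \sup_{x ∈ [0,1]} |f(x)|.
\end{align*}

Thus Pf(x) is bounded.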

Now suppose f is bounded and non-decreasing. (The proof for non-increasing is identical.) Let 0 ≤ x < y ≤ 1, so f(y) ≥ f(x). Let us write $Pf(y) − Pf(x) = S_1 + S_2$ where

\begin{align*}
S_1 &= \sum_{d=1}^{\infty} \frac{y+1}{(y+d)(y+d+1)} \left( f\left( \frac{1}{y+d} \right) − f\left( \frac{1}{x+d} \right) \right), \\
S_2 &= \sum_{d=1}^{\infty} \left( \frac{y+1}{(y+d)(y+d+1)} − \frac{x+1}{(x+d)(x+d+1)} \right) f\left( \frac{1}{x+d} \right).
\end{align*}

Since f(1/(y + d)) ≤ f(1/(x + d)) we have that $S_1$ is non-positive. It suffices to show that $S_2$ is as well.

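Recall, as we’ve seen before, that $\sum_{d=1}^{\infty} (x+1)/((x+d)(x+d+1)) = 1$. Thus,

\[ \sum_{d=1}^{\infty} f\left( \frac{1}{x+1} \right) \left( \frac{y+1}{(y+d)(y+d+1)} − \frac{x+1}{(x+d)(x+d+1)} \right) = 0. \]

Note also that

\[ f\left( \frac{1}{x+1} \right) − f\left( \frac{1}{x+d} \right) ≥ f\left( \frac{1}{x+1} \right) − f\left( \frac{1}{x+2} \right) ≥ 0, \qquad d ≥ 2. \]

Combining this we have that

\begin{align*}
S_2 &= −\sum_{d=2}^{\infty} \left( f\left( \frac{1}{x+1} \right) − f\left( \frac{1}{x+d} \right) \right) \left( \frac{y+1}{(y+d)(y+d+1)} − \frac{x+1}{(x+d)(x+d+1)} \right) \\
&≤ −\left( f\left( \frac{1}{x+1} \right) − f\left( \frac{1}{x+2} \right) \right) \sum_{d=2}^{\infty} \left( \frac{y+1}{(y+d)(y+d+1)} − \frac{x+1}{(x+d)(x+d+1)} \right) \\
&= \left( f\left( \frac{1}{x+1} \right) − f\left( \frac{1}{x+2} \right) \right) \left( \frac{y+1}{(y+1)(y+2)} − \frac{x+1}{(x+1)(x+2)} \right) ≤ 0.
\end{align*}

Thus Pf(y) − Pf(x) ≤ 0 as desired.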

We will make use of the total variation in the remainder of this section. We define $\operatorname{var} f$ to be the supremum of $\sum_{i=1}^{k} |f(x_i) - f(x_{i-1})|$ over all sequences $0 \le x_0 < x_1 < x_2 < \dots < x_k \le 1$.

Proposition 7.3.3. Suppose $f$ is a real-valued function of bounded variation. Then $\operatorname{var} Pf \le \frac{1}{2} \operatorname{var} f$.

Remark 7.3.4. It can be shown that the constant of 1/2 here is sharp.

Proof. First suppose the statement is true under the additional restriction that the function is monotonic. Any function $f$ of bounded variation can be written as $f = f_1 - f_2$, where $f_1, f_2$ are bounded, monotonic functions and $\operatorname{var} f = \operatorname{var} f_1 + \operatorname{var} f_2$. But then

\[
\operatorname{var} Pf = \operatorname{var}(Pf_1 - Pf_2) \le \operatorname{var} Pf_1 + \operatorname{var} Pf_2 \le \frac{1}{2}\operatorname{var} f_1 + \frac{1}{2}\operatorname{var} f_2 = \frac{1}{2}\operatorname{var} f,
\]

so the proposition is proved in this case. (We note that the existence of the functions $f_1$ and $f_2$ comes from the Hahn decomposition theorem for signed measures. In particular, we let $\mu_f([a, b)) = f(b) - f(a)$ and then decompose this into positive and negative parts, which give us $f_1$ and $f_2$.)

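It therefore suffices to prove the statement for bounded monotone functions. Assume that $f$ is non-decreasing. (A similar proof works for non-increasing functions.) By the previous lemma, $Pf$ is non-increasing and bounded, so

\[
\begin{aligned}
\operatorname{var} Pf = Pf(0) - Pf(1) &= \sum_{d=1}^{\infty} \left( \frac{1}{d(d+1)}\, f\!\left(\frac{1}{d}\right) - \frac{2}{(d+1)(d+2)}\, f\!\left(\frac{1}{d+1}\right) \right) \\
&= \frac{1}{1(1+1)}\, f\!\left(\frac{1}{1}\right) - \sum_{d=1}^{\infty} \frac{1}{(d+1)(d+2)}\, f\!\left(\frac{1}{d+1}\right) \\
&\le \frac{1}{2} f(1) - f(0) \cdot \sum_{d=1}^{\infty} \frac{1}{(d+1)(d+2)} \\
&= \frac{1}{2}\left( f(1) - f(0) \right) = \frac{1}{2} \operatorname{var} f.
\end{aligned}
\]

This completes the proof.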
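As a numerical sanity check of Lemma 7.3.2 and Proposition 7.3.3, the following sketch evaluates $Pf$ for the test function $f(t) = t$ using the series for $P$ from the proof above. The function names, the truncation length `terms`, and the tail approximation are illustrative choices, not part of the notes.

```python
def gauss_transfer(f, x, terms=20000):
    """Approximate Pf(x) = sum_{d>=1} (x+1)/((x+d)(x+d+1)) * f(1/(x+d))."""
    total = 0.0
    for d in range(1, terms + 1):
        total += (x + 1) / ((x + d) * (x + d + 1)) * f(1 / (x + d))
    # The weights beyond `terms` telescope to (x+1)/(x+terms+1). On that tail
    # the argument 1/(x+d) is tiny, so f is replaced by f(0) (an approximation).
    total += (x + 1) / (x + terms + 1) * f(0.0)
    return total


def identity(t):
    """Test function f(t) = t, which is bounded and non-decreasing on [0, 1]."""
    return t


grid = [j / 10 for j in range(11)]
values = [gauss_transfer(identity, x) for x in grid]

# Lemma 7.3.2 predicts that Pf is non-increasing when f is non-decreasing.
is_non_increasing = all(a >= b for a, b in zip(values, values[1:]))

# For a monotone function the total variation is the difference of endpoint values.
variation_of_Pf = values[0] - values[-1]
variation_of_f = identity(1.0) - identity(0.0)

print(is_non_increasing, variation_of_Pf, 0.5 * variation_of_f)
```

For this particular $f$ the variation of $Pf$ comes out to roughly 0.355, comfortably below the bound $\frac{1}{2}\operatorname{var} f = 0.5$.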

Proposition 7.3.5. Let $f$ be a non-negative function of bounded variation. Then

\[
P^n f(x) = \|f\| + O(.5^n),
\]

where the big-Oh term is uniform over all $x \in [0, 1)$.

This follows almost immediately from the previous proposition: the difference between the supremum and infimum of $P^n f$ is bounded by the total variation of $P^n f$, and thus by $O(.5^n)$. Thus $P^n f$ must converge to a constant function, but since $P$ preserves norms of non-negative functions, the only constant it can converge to is $\|f\|$.

Theorem 7.3.6. $T$ is mixing.

Proof. Consider a set $E$ that is a finite, disjoint union of subintervals of $[0, 1)$. Consider the measure $\mu(A) = \mu_G(E \cap A)$. This measure is given quite simply by $\mu(A) = \int_A h(x)\,dx$, where $h(x) = 1_E(x)/((1+x)\log 2)$. Thus the corresponding $f(x) = (1+x)h(x)$ is easily seen to be a non-negative function of bounded variation, since $E$ is a finite union of intervals. Moreover, $\|f\| = \frac{1}{\log 2}\mu_G(E)$. (Note that $\|\cdot\|$ is defined with respect to the Gauss measure.) Therefore, for any measurable set $A$, we have

\[
\begin{aligned}
\mu_G(E \cap T^{-n}A) &= \int_A \frac{P^n f(x)}{x+1}\,dx = \int_A \frac{\|f\| + O(.5^n)}{x+1}\,dx \\
&= \frac{1}{\log 2}\,\mu_G(E) \int_A \frac{dx}{1+x} + O(.5^n) = \mu_G(E)\mu_G(A) + O(.5^n).
\end{aligned}
\]

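Now consider an arbitrary measurable set $B \subset [0, 1)$. For any $\varepsilon > 0$, there exists a set $E$, a finite disjoint union of subintervals of $[0, 1)$, such that $\mu_G(B \triangle E) < \varepsilon$. This is because the set of subintervals of $[0, 1)$ forms a semi-algebra that generates the Lebesgue $\sigma$-algebra. Moreover, by our work above, for all sufficiently large $n$ we have $|\mu_G(E \cap T^{-n}A) - \mu_G(E)\mu_G(A)| < \varepsilon$ for any measurable set $A$. Therefore, for such $n$ we have

\[
\begin{aligned}
|\mu_G(B \cap T^{-n}A) - \mu_G(B)\mu_G(A)| &\le \mu_G\big((B \triangle E) \cap T^{-n}A\big) + |\mu_G(E \cap T^{-n}A) - \mu_G(E)\mu_G(A)| \\
&\quad + \mu_G(B \triangle E)\,\mu_G(A) < 3\varepsilon.
\end{aligned}
\]

Since $\varepsilon > 0$ was arbitrary, we see that $\mu_G(B \cap T^{-n}A) \to \mu_G(B)\mu_G(A)$ as $n \to \infty$. Thus $T$ is mixing.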


We note that the middle of this proof gives an exponential mixing result for many sets of interest.

7.4 The quest for ACIMs

(Some material from this chapter is taken from Einsiedler and Ward.)

In one large respect, this book has pulled the wool over your eyes. We’ve constantly been working in dynamical systems with a measure that was invariant. It is not generally obvious whether one should exist or, even in the cases where it exists, what it should look like. So famous are some of the proofs of the forms of these measures that they often take the name of their discoverer: the Gauss measure for continued fractions, the Parry measure for β-expansions, and so on.

Let us consider an arbitrary space $X$ (generally a metric space equipped with a σ-algebra) and a measurable transformation $T$ acting on this space. Recall that measurability is a property of the σ-algebra, not of the measure. A measure $\mu$ is said to be invariant if $\mu(T^{-1}A) = \mu(A)$ for all measurable sets $A$. A measure $\mu$ is said to be ergodic if $T^{-1}A = A$ implies $\mu(A) = 0$ or $\mu(A^c) = 0$. As an example, if $T$ is the continued fraction map, then Lebesgue measure is not an invariant measure, but it is ergodic. (We are assuming here that all measures are non-negative, not signed.)

We will denote by $M(X, T)$ the set of all invariant probability measures and by $E(X, T)$ the set of all ergodic, invariant probability measures. Clearly $E(X, T) \subset M(X, T)$.

Theorem 7.4.1. The set M(X,T ) is a convex set and E(X,T ) is its set of extreme points.

Recall that a set $S$ is convex if, given any $a, b \in S$, all the points $ta + (1 - t)b$ with $t \in (0, 1)$ lie in $S$. A point $c \in S$ is an extreme point if it cannot be written as $ta + (1 - t)b$ for $a, b \in S$, $a \ne b$, and $t \in (0, 1)$.

Proof. Suppose $\nu_1, \nu_2 \in M(X, T)$. Let $t \in [0, 1]$ and $\mu = t\nu_1 + (1 - t)\nu_2$. Then

\[
\mu(T^{-1}A) = t\nu_1(T^{-1}A) + (1 - t)\nu_2(T^{-1}A) = t\nu_1(A) + (1 - t)\nu_2(A) = \mu(A),
\]

so $M(X, T)$ is convex.

Suppose $\mu \in M(X, T)$ is not ergodic. Then, by definition, there exists a set $B$ such that $T^{-1}B = B$ but $0 < \mu(B) < 1$. Consider the two probability measures

\[
\nu_1(A) = \frac{\mu(A \cap B)}{\mu(B)}, \qquad \nu_2(A) = \frac{\mu(A \cap B^c)}{\mu(B^c)}.
\]

It is easy to see that these measures are invariant, hence in $M(X, T)$, and that $\mu = \mu(B) \cdot \nu_1 + \mu(B^c) \cdot \nu_2$. Since $\nu_1 \ne \nu_2$, the measure $\mu$ is not an extreme point.


Suppose by way of contradiction that $\mu \in E(X, T)$ and that there exist distinct $\nu_1, \nu_2 \in M(X, T)$ and $t \in (0, 1)$ such that $\mu = t\nu_1 + (1 - t)\nu_2$. Since $0 < t < 1$, we have that $\nu_1$ is absolutely continuous with respect to $\mu$, and thus, using the Radon-Nikodym theorem, we may write $d\nu_1 = f\,d\mu$. Let $A$ denote the set of $x \in X$ such that $f(x) < 1$.

Note that by the invariance of $\mu$,

\[
\mu(T^{-1}A \cap A) + \mu(T^{-1}A \setminus A) = \mu(T^{-1}A) = \mu(A) = \mu(T^{-1}A \cap A) + \mu(A \setminus T^{-1}A).
\]

Thus $\mu(T^{-1}A \setminus A) = \mu(A \setminus T^{-1}A)$. The only property we used was invariance, which also holds for $\nu_1$, so

\[
\int_{T^{-1}A \setminus A} f\,d\mu = \int_{A \setminus T^{-1}A} f\,d\mu.
\]

However, by the definition of $A$, the integrand on the left is greater than or equal to 1 everywhere it is being integrated, while the integrand on the right is strictly less than 1 where it is being integrated. Since the two sets have equal $\mu$-measure, the $\mu$-measure of both must be 0, and hence $\mu(T^{-1}A \triangle A) = 0$. By ergodicity, $A$ must have $\mu$-measure 0 or 1. If $\mu(A) = 1$ then

\[
\nu_1(X) = \int_X f\,d\mu < \int_X 1\,d\mu = 1,
\]

which is impossible since $\nu_1$ is a probability measure. So $A$ must have measure 0.

A similar argument works if we let $A$ be the set of $x$ where $f(x) > 1$. Thus we have that $f = 1$ almost everywhere, and so $\nu_1 = \mu$. But then $\nu_2 = \frac{1}{1-t}(\mu - t\nu_1)$ must also equal $\mu$. This contradicts our assumption that $\nu_1$ and $\nu_2$ are distinct.

How big are the sets $M(X, T)$ and $E(X, T)$? Well, in general, they could be massive. We saw on a homework problem that the base-10 expansion preserves the measure given by $\frac{1}{6}\sum_{i=1}^{6} \delta_{i/7}$. This is a general phenomenon: each periodic point gives rise to an invariant measure.

In rare, lucky cases, $M(X, T)$ consists of a single measure. In such a case, we say that the system is uniquely ergodic. Here the sole invariant measure is also an ergodic measure. One can show that in such cases, we can remove the “almost all” from the Birkhoff ergodic theorem.
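The periodic-orbit measure mentioned above can be checked by exact arithmetic. The sketch below assumes the base-10 map is $T(x) = 10x \bmod 1$ and tests invariance only on sets of atoms, which is enough for a measure supported on finitely many points; the helper names are illustrative.

```python
from fractions import Fraction


def base10_map(x):
    """T(x) = 10x mod 1, computed exactly on rationals."""
    y = 10 * x
    return y - (y.numerator // y.denominator)


# Support of the measure (1/6) * sum_{i=1}^{6} delta_{i/7}.
support = [Fraction(i, 7) for i in range(1, 7)]


def measure(points):
    """Mass the measure assigns to a set of points."""
    return Fraction(sum(1 for p in support if p in points), 6)


def preimage_mass(points):
    """Mass of T^{-1}(points): count the atoms whose image lands in `points`."""
    return Fraction(sum(1 for p in support if base10_map(p) in points), 6)


# Follow the orbit of 1/7 until it returns to its starting point.
orbit = [Fraction(1, 7)]
while base10_map(orbit[-1]) != orbit[0]:
    orbit.append(base10_map(orbit[-1]))

print([str(p) for p in orbit])
print(all(measure({p}) == preimage_mass({p}) for p in support))
```

The orbit visits all six points $i/7$ before returning to $1/7$, so $T$ permutes the atoms and every set of atoms has the same mass as its preimage.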

The above theorem is interesting in that it relates $E(X, T)$ to $M(X, T)$. Moreover, for many interesting spaces, any convex set can be written as the convex hull of its extreme points: namely, every point can be written as a convex sum (or, for uncountably infinite spaces, as an integral) of its extreme points. Is that true here? Yes. In a statement generally known as the ergodic decomposition theorem, one can show that every invariant measure can be written as a sum (or integral) of ergodic measures. We may or may not have time to prove this in class, but will at least postpone it for the moment.

Let’s give one result which will illustrate how these various ergodic measures interact with one another.


We say two measures $\mu_1, \mu_2$ are mutually singular if there exists a measurable $A \subset X$ such that $\mu_1(A) = \mu_2(A^c) = 0$. We may think of this as breaking up $X$ into two pieces, with each measure supported on only one of the pieces.

Proposition 7.4.2 (Lemma 4.6 from Einsiedler and Ward). If $\mu_1, \mu_2 \in E(X, T)$ and $\mu_1 \ne \mu_2$, then $\mu_1$ and $\mu_2$ are mutually singular.

Proof. Since $\mu_1 \ne \mu_2$, there must exist some measurable set $A$ such that $\mu_1(A) \ne \mu_2(A)$. Then if we let $f = 1_A$, we have $\int_X f\,d\mu_1 \ne \int_X f\,d\mu_2$.

Consider

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x).
\]

By the Birkhoff pointwise ergodic theorem, this converges to $\int_X f\,d\mu_1$ for $\mu_1$-almost all $x \in X$ and converges to $\int_X f\,d\mu_2$ for $\mu_2$-almost all $x \in X$. But since these two constants are not equal, no $x \in X$ can do both of these simultaneously. If we let $B$ denote the set of points $x \in X$ for which the limit equals $\int_X f\,d\mu_2$, then $\mu_2(B^c) = 0$ and $\mu_1(B) = 0$, which is the desired condition for mutual singularity.

Here is one consequence of the proposition. We know that for the base-10 fibred system, Lebesgue measure is invariant and ergodic. Therefore any other ergodic measure (such as the sum of Dirac measures we saw before) must be supported on a set of Lebesgue measure 0.
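To make the role of Birkhoff averages concrete, here is a small sketch for the base-10 map $T(x) = 10x \bmod 1$ with the test function $f = 1_{[0, 1/5)}$. The choice of $f$, the starting point $1/7$, and the number of steps are illustrative assumptions. Along the periodic orbit of $1/7$ the averages settle at the integral of $f$ against the periodic-orbit measure, which differs from the Lebesgue integral $1/5$ that Lebesgue-typical points must produce.

```python
from fractions import Fraction


def base10_map(x):
    """T(x) = 10x mod 1, computed exactly on rationals."""
    y = 10 * x
    return y - (y.numerator // y.denominator)


def in_first_fifth(x):
    """Indicator function of the interval [0, 1/5)."""
    return 1 if 0 <= x < Fraction(1, 5) else 0


def birkhoff_average(indicator, x, n):
    """(1/n) * sum_{k=0}^{n-1} indicator(T^k x), as an exact fraction."""
    total = 0
    for _ in range(n):
        total += indicator(x)
        x = base10_map(x)
    return Fraction(total, n)


# 600 steps is a whole number of periods of the 6-point orbit of 1/7.
average_on_orbit = birkhoff_average(in_first_fifth, Fraction(1, 7), 600)
lebesgue_integral = Fraction(1, 5)

print(average_on_orbit, lebesgue_integral)
```

Only one of the six orbit points, $1/7$ itself, lies in $[0, 1/5)$, so the average along the orbit is $1/6$ rather than $1/5$. The orbit therefore sits inside the Lebesgue-null set of points whose averages do not converge to the Lebesgue integral.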

For many fibred systems and dynamical systems of interest, one is often interested in the Lebesgue measure. However, Lebesgue measure may not be invariant or ergodic. Thus, we are often interested in trying to find absolutely continuous invariant measures (or ACIMs): these are measures $\mu \in M(X, T)$ such that $\mu$ is absolutely continuous with respect to Lebesgue measure $\lambda$. For any such measure, recall that we can write $\mu(A) = \int_A h\,d\lambda$ for some $L^1$ function $h$. It’s possible that there are many, many such measures, especially if there is more than one ergodic ACIM. However, we note that by the previous proposition, any two distinct ergodic ACIMs must be mutually singular, and since they are absolutely continuous with respect to Lebesgue measure, this means that their supports must be disjoint up to a set of Lebesgue measure 0.

The following theorem will make it more clear how the ergodicity of $T$ relates to these possible ACIMs. Here we define a density $f \in L^1$ to be a stationary density if $Pf = f$. Here equality is meant in terms of the $L^1$-norm, since $Pf$ can be changed on a measure-0 set without altering its properties.

Theorem 7.4.3 (Theorem 4.2.2 from Lasota and Mackey). Let $P$ be the Perron-Frobenius operator associated to a nonsingular transformation $T$ on a measure space $(X, \mu)$.

If $T$ is ergodic on $(X, \mu)$, then there is at most one stationary density $f^*$ of $P$.

If there exists a unique stationary density $f^*$ of $P$ which is positive almost everywhere, then $T$ is ergodic.


This result is a little peculiar. It says that if $T$ is ergodic with respect to Lebesgue measure, then there is at most one ergodic ACIM, but there don’t have to be any.

We need a lemma first though.

Lemma 7.4.4. Under the conditions of the theorem, if $A$ is a measurable set and $f \ge 0$, then $Pf(x) = 0$ for all $x \in A$ if and only if $f(x) = 0$ for all $x \in T^{-1}A$.

In particular, $\operatorname{supp} f \subset T^{-1} \operatorname{supp} Pf$, where supp denotes the support.

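Proof. First, note that

\[
\int_A Pf(x)\,d\mu = \int_{T^{-1}A} f(x)\,d\mu
\]

and thus

\[
\int_X 1_A(x)\,Pf(x)\,d\mu = \int_X 1_{T^{-1}A}(x)\,f(x)\,d\mu.
\]

But if $Pf = 0$ identically on $A$, then the left integral is 0, and thus the right integral must be 0. Since $f \ge 0$, this gives $f = 0$ on $T^{-1}A$. This argument works in the other direction as well.

Now let $A = X \setminus \operatorname{supp} Pf$. Then $Pf = 0$ on $A$, so $f = 0$ on $T^{-1}A$. Thus

\[
\operatorname{supp} f \subset X \setminus T^{-1}A = X \setminus T^{-1}(X \setminus \operatorname{supp} Pf) = X \setminus \left( X \setminus T^{-1} \operatorname{supp} Pf \right) = T^{-1} \operatorname{supp} Pf.
\]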

Proof of Theorem 7.4.3. First, suppose $T$ is ergodic but that there are two distinct stationary densities $f_1, f_2$. These correspond to probability measures $\mu_1, \mu_2$ (with, say, $\mu_i(A) = \int_A f_i\,d\mu$) that are invariant under $T$. These measures are absolutely continuous with respect to $\mu$ by construction, and thus $T$ is ergodic with respect to these measures as well. By the previous proposition, $\mu_1$ and $\mu_2$ must be mutually singular, and thus the supports of $f_1$ and $f_2$ must be disjoint (up to $\mu$-measure 0 sets).

Let us call the supports of $f_1$ and $f_2$ $A$ and $B$ respectively, and suppose that these are defined so that they are disjoint, which we can always arrange by changing values on a $\mu$-measure-0 set. By the previous lemma, we have that $A \subset T^{-1}A$, and thus $T^{-n}A \subset T^{-(n+1)}A$ for all $n$. The same holds for $B$. Moreover, since $A$ and $B$ are disjoint, $T^{-n}A$ is disjoint from $T^{-n}B$ for all $n$ as well.

Now let $\tilde{A} = \bigcup_{n=0}^{\infty} T^{-n}A$ and define $\tilde{B}$ likewise. These sets are disjoint, and moreover invariant, since

\[
T^{-1}\tilde{A} = \bigcup_{n=0}^{\infty} T^{-n-1}A = \bigcup_{n=1}^{\infty} T^{-n}A = \bigcup_{n=0}^{\infty} T^{-n}A = \tilde{A},
\]

due to $A \subset T^{-1}A$.

But $A$ and $B$ have positive measure, thus $\tilde{A}$ and $\tilde{B}$ have positive measure, are invariant, and are disjoint. This contradicts the ergodicity of $T$.

On the other hand, suppose that there is a unique stationary density $f^*$ that is positive almost everywhere but $T$ is not ergodic. Then there exists a set $A$ with $0 < \mu(A) < 1$ such that $T^{-1}A = A$. Note that $T^{-1}A^c = A^c$ as well, and that

\[
1_A f^* + 1_{A^c} f^* = f^* = Pf^* = P(1_A f^*) + P(1_{A^c} f^*).
\]

However, we have $1_A f^* = 0$ on $A^c = T^{-1}A^c$, and thus $P(1_A f^*) = 0$ on $A^c$ by the previous lemma. Likewise $P(1_{A^c} f^*) = 0$ on $A$. Thus, we must have that $1_A f^* = P(1_A f^*)$, and likewise with $A^c$.

Since $f^*$ is positive almost everywhere, let $f_A = 1_A f^*/\|1_A f^*\|$ and $f_{A^c} = 1_{A^c} f^*/\|1_{A^c} f^*\|$. We then have $Pf_A = f_A$ and $Pf_{A^c} = f_{A^c}$, but these are both densities and distinct, which contradicts our original assumption.

7.4.1 Using Perron-Frobenius to find ACIMs

Consider a transformation $T$ on a measure space $(X, \mu)$, and let $P$ be the Perron-Frobenius operator associated to $T$ and $\mu$. We will consider $A_n$ to be the Cesàro average:

\[
A_n f = \frac{1}{n} \sum_{k=0}^{n-1} P^k f.
\]

Proposition 7.4.5 (Prop 25.2.2 from Schweiger). Suppose $f \in L^1(X, \mu)$. If the sequence $A_n f$ has a subsequence that converges in norm to $f^*$, then the full sequence $A_n f$ converges in norm to $f^*$ and $Pf^* = f^*$ in norm.

Recall that $Pf^* = f^*$ if and only if $T$ preserves the measure $\mu_{f^*}(A) = \int_A f^*\,d\mu$. This gives us a practical way to prove the existence of ACIMs (by showing that the sequence $A_n f$ for some $f$ is strongly precompact) and even a practical way to estimate such ACIMs (by applying $A_n$ to a simple function $f$).

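Proof. First, let us suppose that $A_n f$ has a subsequence that converges in norm to $f^*$. We will show that $Pf^* = f^*$. Let $\varepsilon > 0$ be arbitrary and choose $n$ from this subsequence such that $\|A_n f - f^*\| < \varepsilon$ and $\frac{2}{n}\|f\| < \varepsilon$. Then we have that

\[
\|PA_n f - Pf^*\| = \|P(A_n f - f^*)\| \le \|A_n f - f^*\| < \varepsilon
\]

and also

\[
\|(A_n - PA_n)f\| = \left\| \frac{f - P^{n} f}{n} \right\| \le \frac{\|f\| + \|P^n f\|}{n} \le \frac{2}{n}\|f\| < \varepsilon.
\]

Thus,

\[
\|f^* - Pf^*\| \le \|f^* - A_n f\| + \|A_n f - PA_n f\| + \|PA_n f - Pf^*\| < 3\varepsilon.
\]

Since $\varepsilon$ is arbitrary, we see that $f^* = Pf^*$ in norm.

Now note that for any $g \in L^1$ we have that

\[
\|A_n(1 - P)g\| = \left\| \frac{1}{n}(1 - P^{n})g \right\| \le \frac{2}{n}\|g\| \to 0
\]

as $n$ goes to $\infty$.

Let $\varepsilon > 0$ and let $m = m(\varepsilon)$ be chosen so that $\|A_m f - f^*\| < \varepsilon$. Since $Pf^* = f^*$, and thus $A_n f^* = f^*$ in norm, we have that

\[
\begin{aligned}
\|A_n f - f^*\| = \|A_n(f - f^*)\| &\le \|A_n(f - A_m f)\| + \|A_n(A_m f - f^*)\| \\
&\le \|A_n(f - A_m f)\| + \|A_m f - f^*\| < \|A_n(f - A_m f)\| + \varepsilon.
\end{aligned}
\]

However, we have that

\[
f - A_m f = \frac{1}{m} \sum_{k=0}^{m-1} (1 - P^k) f = (1 - P) \left( \frac{1}{m} \sum_{k=1}^{m-1} (1 + P + P^2 + \dots + P^{k-1}) f \right),
\]

and thus $\|A_n(f - A_m f)\| \to 0$ as $n$ goes to infinity. Thus $\|A_n f - f^*\| \le 2\varepsilon$ for all sufficiently large $n$. But $\varepsilon$ was arbitrary, so $A_n f$ converges to $f^*$ in norm.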

It can be similarly shown that if f is a density, then f∗ must be a density as well.
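The remark that ACIMs can be estimated by applying $A_n$ to a simple function can be tried out numerically. The sketch below makes several assumptions that are not in the notes: it uses the Gauss map with Lebesgue measure as the reference measure, for which the Perron-Frobenius operator takes the standard form $Pf(x) = \sum_{d \ge 1} f(1/(x+d))/(x+d)^2$; it stores functions by their values on a uniform grid with linear interpolation; and it truncates the series after `D` terms with a crude tail correction. Starting from $f = 1$, the Cesàro averages should approach the Gauss density $1/((\log 2)(1+x))$.

```python
import math

N = 100   # the grid has N + 1 equally spaced points on [0, 1]
D = 100   # number of inverse branches summed explicitly
GRID = [j / N for j in range(N + 1)]


def interp(values, t):
    """Linear interpolation of grid values at a point t in [0, 1]."""
    s = t * N
    j = min(int(s), N - 1)
    return values[j] + (s - j) * (values[j + 1] - values[j])


def transfer(values):
    """One application of P (Lebesgue reference measure) for the Gauss map."""
    out = []
    for x in GRID:
        total = sum(interp(values, 1.0 / (x + d)) / (x + d) ** 2
                    for d in range(1, D + 1))
        # Tail d > D: f(1/(x+d)) is close to f(0), and the sum of 1/(x+d)^2
        # over d > D is approximately 1/(x + D + 1/2).
        total += values[0] / (x + D + 0.5)
        out.append(total)
    return out


def cesaro_average(values, n):
    """A_n f = (1/n) * sum_{k=0}^{n-1} P^k f, computed on the grid."""
    acc = [0.0] * len(values)
    current = list(values)
    for _ in range(n):
        acc = [a + c for a, c in zip(acc, current)]
        current = transfer(current)
    return [a / n for a in acc]


def gauss_density(x):
    """Density of the Gauss measure with respect to Lebesgue measure."""
    return 1.0 / (math.log(2) * (1.0 + x))


approx = cesaro_average([1.0] * (N + 1), 30)
max_error = max(abs(a - gauss_density(x)) for a, x in zip(approx, GRID))
print(max_error)
```

Because the early terms $P^k f$ with small $k$ are still far from the limit, the error of $A_n f$ decays only like $1/n$ even though $P^k f$ itself converges much faster. In practice one might drop the first few terms or simply iterate $P$, but the Cesàro average is the quantity the proposition speaks about.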

7.4.2 The Lasota-Yorke method

Given the previous proposition, the idea for trying to find ACIMs seems fairly direct: attempt to show that as $A_n$ is applied to a nice function $f$, it must have a convergent subsequence. There’s no a priori reason why this must happen. In fact, it’s quite possible for the functions to diverge quite horribly, say to something like $n \cdot 1_{[0,1/n)}$.

Arguably the most famous result about finding ACIMs for fibred systems is due to Lasota and Yorke. Various improvements on this result have been found over the years.

Theorem 7.4.6. Let $([0, 1), \lambda, T, D, \mathcal{X})$ be a fibred system where $D$ is finite, the $X_i$’s are all intervals, and $T$ is piecewise $C^1$ on each interval. Assume further that $\omega(s, \cdot)$ is absolutely continuous for each string $s$, and that there exists a constant $\theta > 1$ so that $|T'(x)| \ge \theta$ for all $x$ in the interior of each $X_i$.

Then $T$ admits an invariant measure $\mu$ that is absolutely continuous with respect to $\lambda$, $h = d\mu/d\lambda$ is of bounded variation, and for any $f \in L^1([0, 1), \lambda)$, $A_n f$ converges to a single function $f^*$ of bounded variation.


The proof is basically a three-step process. First, one shows that if $T$ satisfies the conditions of the theorem, then so does $T^k$ for any $k \in \mathbb{N}$. Moreover, if we can find an ACIM $\mu_k$ for $T^k$, then the measure

\[
\mu(A) = \sum_{i=0}^{k-1} \mu_k(T^{-i}A)
\]

is an ACIM for $T$. In particular, we use this to study $T^k$ where $k$ is sufficiently large that $\theta^k > 3$; replacing $T$ by $T^k$, we may then assume that $\theta > 3$.

Second, we use a contracting lemma to show that if the variation of $f$ is bounded, then

\[
\operatorname{var}(Pf) \le 3\theta^{-1} \operatorname{var}(f) + C\|f\|
\]

for some constant $C > 0$. Using this, one can then show that there are constants $C_1$ and $C_2$ such that

\[
\operatorname{var}(A_n f) \le C_1 \|f\|, \qquad |(A_n f)(x)| \le C_2 \|f\| \quad \text{for all } x \in [0, 1).
\]

Third, one applies Helly’s selection theorem, which tells us that any sequence of functions that is uniformly bounded and has uniformly bounded total variation has a convergent subsequence.
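One way to see how the contraction inequality in the second step could lead to bounds that are uniform in $n$ is to iterate it. The following is only a sketch, assuming $\alpha := 3\theta^{-1} < 1$ (the symbol $\alpha$ is shorthand introduced here) and the contraction property $\|Pf\| \le \|f\|$ already used in the proof of Proposition 7.4.5:

```latex
\begin{align*}
\operatorname{var}(P^n f)
  &\le \alpha \operatorname{var}(P^{n-1} f) + C\|P^{n-1} f\| \\
  &\le \alpha \operatorname{var}(P^{n-1} f) + C\|f\| \\
  &\le \alpha^n \operatorname{var}(f) + C\|f\| \sum_{j=0}^{n-1} \alpha^j \\
  &\le \alpha^n \operatorname{var}(f) + \frac{C}{1-\alpha}\,\|f\|.
\end{align*}
```

Since the variation is subadditive, averaging over $k = 0, \dots, n-1$ then gives $\operatorname{var}(A_n f) \le \frac{1}{n(1-\alpha)}\operatorname{var}(f) + \frac{C}{1-\alpha}\|f\|$, which stays bounded as $n$ grows for any fixed $f$ of bounded variation. How this is turned into the stated bound by $C_1\|f\|$ is not shown in the outline above.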
