Patterns of Structure and Argument: a guide to...

Patterns of Structure and Argument:a guide to mathematical thinking

Charles I. Delman

September 17, 2017

Contents

1 Theory & Proof:Why do I need to know this? 11.1 What is mathematics, and what is it good for? . . . . . . . . . . . . . . . 11.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Back to the bees! Analyzing the honeycomb. . . . . . . . . . . . . . . . . 41.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.5 Examining our assumptions. . . . . . . . . . . . . . . . . . . . . . . . . . 51.6 Extra credit exercises! . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.7 Mathematics is abstract by nature. . . . . . . . . . . . . . . . . . . . . . 61.8 The dance of abstraction and concreteness. . . . . . . . . . . . . . . . . . 91.9 A note on how this course will proceed. . . . . . . . . . . . . . . . . . . . 11

2 Motivation: Area Revisited 132.1 What is Area? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.3 What have we left out? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Motivation: Solution Sets 213.1 What does solving mean exactly? . . . . . . . . . . . . . . . . . . . . . . 213.2 The process of solving an equation . . . . . . . . . . . . . . . . . . . . . 253.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.4 A More Complicated Equation . . . . . . . . . . . . . . . . . . . . . . . . 313.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 Sets & Logic 354.1 Open Propositions Define Sets . . . . . . . . . . . . . . . . . . . . . . . . 354.2 Logical Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.2.1 Contrapositives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

iii

iv CONTENTS

4.2.2 Rephrasing a conditional as a disjunction . . . . . . . . . . . . . . 37

4.2.3 Negation: Don’t just say no! . . . . . . . . . . . . . . . . . . . . . 38

4.3 Quantified Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5 The Real Number System 49

5.1 Addition and Multiplication of Real Numbers . . . . . . . . . . . . . . . 51

5.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.3 The Ordering of the Real Numbers . . . . . . . . . . . . . . . . . . . . . 56

5.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6 Set Theory & Proof Structure 61

6.1 The Axiom of Extensionality . . . . . . . . . . . . . . . . . . . . . . . . . 61

6.2 Operations on sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

6.3 The Axiom of Restricted Comprehension . . . . . . . . . . . . . . . . . . 63

6.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

6.5 More Axioms of Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . 66

6.6 The Structure of a Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.6.1 From Universal to Specific . . . . . . . . . . . . . . . . . . . . . . 68

6.6.2 Generality through Arbitrarity . . . . . . . . . . . . . . . . . . . . 69

6.6.3 Vacuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.6.4 Division into cases . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.6.5 Proof by Contradiction . . . . . . . . . . . . . . . . . . . . . . . . 73

6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

7 The Natural Number System 79

7.1 Definition of the Set of Natural Numbers . . . . . . . . . . . . . . . . . . 79

7.2 Proof by Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

7.3 Recursive Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

7.4 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.5 Proving Recursive Inequalities by Induction . . . . . . . . . . . . . . . . 87

7.6 The Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

7.8 Complete Induction: Two Approaches . . . . . . . . . . . . . . . . . . . 91

7.8.1 Modifying the proposition to enhance the inductive step . . . . . 91

7.8.2 Well-ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

CONTENTS v

8 Relations 998.1 Ordered pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1008.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1008.3 The Cartesian Product Operation . . . . . . . . . . . . . . . . . . . . . . 1018.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1018.5 The Definition of a Relation . . . . . . . . . . . . . . . . . . . . . . . . . 1028.6 Operations on Relations: Inverse and Composition . . . . . . . . . . . . . 1038.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1048.8 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.8.1 The Definition of a Function . . . . . . . . . . . . . . . . . . . . . 1058.8.2 Some special types of functions . . . . . . . . . . . . . . . . . . . 1078.8.3 Surjectivity, Injectivity, and Inverse Functions . . . . . . . . . . . 1088.8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1108.8.5 The Induced Function on Power Sets . . . . . . . . . . . . . . . . 1128.8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

8.9 Properties of Relations: a Compendium . . . . . . . . . . . . . . . . . . . 1138.9.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

8.10 Order Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1158.10.1 Weak Orders Versus Strong Orders . . . . . . . . . . . . . . . . . 1158.10.2 Partial Orders Versus Total Orders . . . . . . . . . . . . . . . . . 1168.10.3 The Dictionary Order on a Cartesian Product . . . . . . . . . . . 1168.10.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1168.10.5 Maxima and Minima; Bounds; Suprema and Infima . . . . . . . . 1188.10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

8.11 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1248.11.1 Equivalence Classes and Partitions . . . . . . . . . . . . . . . . . 1248.11.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

vi CONTENTS

Chapter 1

Theory & Proof:Why do I need to know this?

In which we attempt to answer the question posed by the chapter title and, in doing so,provide some useful background on the nature of mathematics.

You may wonder why it is necessary to carefully study mathematical theory. Whystudy sets, for example? Aren’t they just collections of things? Why learn to provetheorems? Other people have proven the theorems we’ll use, and why bother to provethem anyway? Can’t we tell by experience what works and what doesn’t? We’ve solvedequations, computed derivatives, and worked out applied problems without learning theproofs of our methods, and things have gone just fine!

If you have questions and doubts similar to these, they deserve an answer. Thisintroduction is an attempt to give one. I hope it will also help you see that you haveused logical proof and theory more than you realize; you just weren’t aware of it.

1.1 What is mathematics, and what is it good for?

Presumably, since you are likely a mathematics major or minor, you think mathematicsis worthwhile. But let’s examine more carefully what mathematics is, and why it isvaluable. These are big questions, of course, and this short discussion will barely beginto answer them. It is doubtful they can be fully answered at all. However, I think wecan gain some useful and motivational insights from a short examination.

Mathematics is often described as the study of patterns. What is a pattern? Apattern is something that is regular and predictable, and therefore recognizable. The

1

2 CHAPTER 1. THEORY & PROOF: WHY DO I NEED TO KNOW THIS?

relationships among its parts can be described with precision; it is these relationshipsthat give the pattern its structure.

Consider, for example, a honeycomb made by bees. (If you haven’t seen one, justgoogle “honeycomb” images.) It has a regular pattern in that it fills a flat panel witha grid of hexagonal cylinders, each of which is very close to regular. By hexagonal wemean six-sided; by regular we mean that the sides are the same length (in cross-section)and the corner angles have equal measure. (In other words, the sides and angles are allcongruent.)

In order to fully explain our description of the honeycomb, we’d have to preciselyspecify or define many concepts of plane geometry, such as lines, segments, polygons,angles, and measurement. (I assume for purposes of this discussion that you are atleast generally familiar with these concepts.) Furthermore, to compare the hexagonalgrid to other possibilities the bees might use, in order to discern the reason they usehexagons (from an evolutionary point of view - I’m not suggesting the bees think aboutit), we’d need to be clear about some basic assumptions of flat plane geometry and theconsequences of those assumptions.

One such assumption is that given any line l and any point P not on line l, thereis a line through point P that is parallel to line l. By definition, two lines are parallelif they have no point in common; that is, they do not intersect. (We do not assumeanything else about parallel lines, such as that they are equidistant from each other.)This assumption reasonably captures part of what we mean by “flat.” Observe that itis not true on a sphere. The shortest path between two points on a sphere runs along agreat circle - a circle centered at the center of the sphere, making it as close to straightas a path on a sphere can be. Each great circle is the intersection of the sphere witha plane through its center. Notice as well that the sphere may be reflected onto itselfin a great circle, just as a plane may be reflected onto itself in a line. Therefore, to atwo-dimensional inhabitant of a sphere, the sides of the line are symmetrical: the linedoes not appear to be curving, because all of the curving occurs in the curvature of thesphere itself (and can only be seen from a three-dimensional perspective). One mightsay that a great circle curves with the sphere but not in the sphere. (By “in” we meanin the surface, not the space enclosed by the sphere in three dimensions.) Therefore, itis natural to interpret the “lines” on a sphere to be its great circles. Any two planesthrough the center of the sphere intersect in a line through the center of the sphere, andany such line intersects the sphere in a pair of antipodal points. (The North and SouthPoles give an example of a pair of antipodal points, if we visualize the earth as a sphere.)These antipodal points are on both great circles; thus, any two great circles intersect attwo points. There are no parallel “lines” at all on a sphere! In fact, since lines in a planecan intersect in at most one point, whereas great circles intersect in two, great circles do

1.1. WHAT IS MATHEMATICS, AND WHAT IS IT GOOD FOR? 3

not satisfy even the most fundamental assumptions that lines in a plane do!The existence of a parallel line through any point not on a given line does not entirely

capture what we mean by “flat.” We further assume that if a pair of parallel lines is cutby a transversal, the alternate interior angles are congruent. (It is harder to imagine asurface that is not flat in this sense, and we won’t try to describe one here, but suchsurfaces do exist.) An important consequence of this assumption is that the sum of thedegree measures of the angles of any triangle is 180◦. From this result (theorem) we canfurther deduce that the corner angles of a regular triangle (usually called an equilateraltriangle) must be 60◦, the corner angles of a square (regular quadrilateral) must be 90◦,those of a regular pentagon, 108◦, and those of a regular hexagon, 120◦. In general, wecan prove the formula

(n−2n

)180◦ for the corner angles of a regular n-gon. (Bear with me

for a moment; we’ll see shortly how this relates to our question about the honeycomb.)

Remark. The word “equilateral” means that the sides are equal in length (that is, congru-ent). For polygons other than triangles, having congruent sides is not sufficient to makethe polygon regular. The angles of an equilateral polygon with more than three sidesneed not be congruent. (An equilateral quadrilateral, for example, is called a rhombus.Draw some examples of rhombi that are not squares.) However, an equilateral trianglemust be equiangular as well, a restriction that justifies our use of the word “equilateral”as equivalent to “regular” in this special case.

Example. Show that an equilateral triangle must be equiangular. On what assumptiondoes this result rest? (By show we mean to prove informally. We cannot yet give acompletely rigorous proof because we have not explicitly stated all of our assumptionsand definitions.)

Solution: One of the basic assumptions of plane geometry is the side-angle-side (SAS)criterion for triangle congruence. (By definition, two triangles are congruent if there isa correspondence between their vertices for which the corresponding angles and sidesare congruent.) Two sides of a triangle and the included angle determine the other sideand angles; hence, given a correspondence between two triangles for which two pairs ofcorresponding sides and the included angles are congruent, the other corresponding partsmust also be congruent.

From the SAS criterion, it immediately follows that the base angles of an isoscelestriangle are congruent. (I expect you are familiar with this theorem.) For in 4BAC(here we have labeled the vertices of an arbitrary triangle with capital letters to makediscussion easier), if AB ∼= AC, then 4BAC ∼= 4CAB, by the SAS criterion. (That is,the triangle is congruent to itself with vertices A and B interchanged. We have used thesymbol “∼=” as a shorthand to mean “congruent.”) Thus, ∠A ∼= ∠B. Applying the baseangle theorem twice to an equilateral triangle, we see that all of the angles are congruent.


Example. Consider a plane satisfying our second assumption: if two parallel lines arecut by a transversal, then the alternate interior angles are congruent. Show that the sumof the degree measures of any triangle in this plane must be 180◦.

Solution. Given any triangle, call it 4ABC, consider a line through A that is parallel

to line←→BC, as in the figure below. Let the degree measures of angles ∠A, ∠B, and ∠C

be α, β, and γ, respectively. Since the alternate interior angles must be congruent, andthe angle measure of a straight angle is 180◦, we see that α + β + γ = 180.

A

B C

ab go

go

o

bo

o

1.2 Exercises

1. Show that the sum of the interior (corner) angles of a polygon with n sides is180(n− 2)◦. (Hint: divide the polygon into triangles. An alternative method is toextend one end of each side and consider the sum of the angles supplementary tothe interior angles; the total rotation as you travel around the polygon is 360◦. Itmight not be so obvious that this fact also depends on the plane being flat. Canyou see why?)

2. Use the result of the first exercise to prove that the angle measure of each cornerangle of a regular n-gon is

(n−2n

)180◦.

1.3 Back to the bees! Analyzing the honeycomb.

In order to analyze the honeycomb, we consider the abstract concept of a flat planecovered by a grid of congruent regular polygons. What shape can the polygons have?Since at least three corners must meet at any vertex of the grid, and the angles around avertex must add up to 360◦, we see that the only possible shapes are triangles, squares,or hexagons. The angles of a pentagon are too small to fit three around a vertex, but

1.4. EXERCISES 5

too large to fit four. The angles of any polygon with more than six sides are too largeto fit three.

Now, compare the area to the perimeter for each of these figures. You will see that,for a given area, a hexagon has the smallest perimeter (followed by a square and then atriangle). Therefore, the bees can store the same amount of honey with much less beeswaxin the walls of the honeycomb if they use hexagons! Making beeswax takes energy andmaterial, and energy and material require food. By being more efficient, the bees arebetter able to survive and reproduce using the available food in their environment. It’snot that bees can do geometric theory, but that natural selection has favored bees thatmake hexagonal honeycombs. Geometric theory has enabled us to see why!

1.4 Exercises

1. Calculate the area of an equilateral triangle as a function of its perimeter, p. (Theunits are not important; if some unit is chosen for length, the area will be in squaresof that unit.)

2. Do the same for a square and regular hexagon.

3. Explain why, for a given area, the hexagon has the smallest perimeter, followed bythe square and then the triangle.

4. Solitary insects and worms that build free-standing shelters for themselves typicallybuild round cylinders. Why?

Remark. A complete analysis would have to account for the forces on the walls, whichwould determine how thick they needed to be. These forces might be different for thedifferently shaped grids. But even our crude analysis gives insight into the pattern wesee in a honeycomb.

1.5 Examining our assumptions.

We have seen that our assumptions are not true on a sphere. What if bees built sphericalhoneycombs instead of flat ones? Not surprisingly, the result about the angles of atriangle is not true on a sphere. The measures of the angles of a triangle must add upto at least 180◦, and the sum can be much greater. Using either a model of a sphere oryour imagination, you should readily be able to see that this is true. You should be ableto construct an equilateral triangle whose angles all measure 90◦, for example. (Hint:


Imagining the sphere as planet Earth, start anywhere on the equator, travel directly tothe North Pole, turn 90◦, return directly to the equator, and return along the equatorto where you started.)

Because there is some flexibility in the angle size of a regular polygon on a sphere,we can cover a sphere with a grid of triangles, squares, or pentagons. However, hexagonscannot be used, nor can polygons with more than six sides, because the angles are toolarge. It can be shown that the angles of a polygon determine its size. (The larger thepolygon, the larger the angles, as you can convince yourself by looking at some examples.)Therefore, there is only one way to make each possible type of grid on a given sphere.There must be either four triangles, with three meeting at each vertex, eight smallertriangles, with four meeting at each vertex, 20 even smaller triangles, with five meetingat each vertex, six squares, with three meeting at each vertex, or 12 pentagons, withthree meeting at each vertex. These grids give rise to the five Platonic solids: thetetrahedron, octahedron, icosahedron, cube, and dodecahedron, respectively. These arethe only possible regular polyhedra. (Why can’t four squares or pentagons meet at avertex?)

1.6 Extra credit exercises!

1. Is the theorem about the base angles of an isosceles triangle true on a sphere?Explain your answer. (Hint: Is the SAS criterion true on a sphere? What makesthe SAS criterion work? Think about the motions that move the points of a planeor sphere around without altering the distances between them.)

2. Is the SAS criterion true on a surface that is rounded in some places and flat inothers?

1.7 Mathematics is abstract by nature.

To analyze the honeycomb, we made an abstract model of it that captured its mostimportant properties, a sort of “ideal” honeycomb. Good as the bees are at building –and they are very good – the cells of a real honeycomb are not perfectly regular, nor dothey have exactly the same volume. Furthermore, the walls of the honeycomb are notof exactly uniform thickness. (This variation is not all due to error. The interior wallsof the cells are intentionally rounded; therefore, the cell walls are thicker at the corners.The bees also build cells for raising young, which have different sizes depending on thetype of bee being raised.) Our assumption that the cells are hexagonal is more closely

1.7. MATHEMATICS IS ABSTRACT BY NATURE. 7

realistic; my entomologist colleagues did not know of any instances in which the beesused a different number of sides for any cell. It is not surprising that this assumptionaccords most closely with reality, since it is fairly stable with respect to variation in thesides and angles: if the bee turns somewhat more or less than 120◦ at the corners as itproceeds around or if makes some sides a little shorter or longer, the shape will still closeup into a hexagon.

When we create an abstraction from a collection of physical examples, we assume thatsome properties hold exactly, even if they are only approximately true in the examples. Inaddition, we ignore many details, which may vary. When we abstract from mathematicalexamples, we also ignore many variables while focusing on essential commonalities. Forexample, a collection of triangles may have various shapes and sizes, but they all havethree vertices and three sides. These two simple properties exactly define the concept oftriangle, of which there is a great variety of examples.

Examining our assumptions leads to new possibilities. What if we made differentones? What if conditions were different?

Even the simplest mathematical concepts are inherently abstract. Consider, for ex-ample, a natural number such as three. We know what three oranges are, or three cats,or three dogs, but what is the quality of “having three elements” that these collectionsof objects have in common? A set has three elements if it can be put in one-to-one cor-respondence with the set {1, 2, 3}. (This process is, after all, how we count.) However,this observation does not help much, since the elements of this set are numbers, whichwe are trying to define! What are these numbers? The elements 1, 2, and 3 are not justsymbols, because if we wrote these numbers in Japanese they would still be the samenumbers and serve the same purpose. It took mathematicians a long time to come upwith a satisfactory, universal answer to this question, which we introduce toward the endof this course.

We cannot completely understand the properties of an abstract concept by experi-ment. For one thing, we cannot test all of the infinitely many possibilities. Even if wecould, our results would give us no insight into the relationship between the properties wediscovered and the defining properties we assumed, and it is exactly these relationshipsthat constitute explanation and understanding. Therefore, we must logically deduce theconsequences of our assumptions. It is often surprising, and enlightening, how muchcan be deduced from very simple assumptions. For example, the two simple propertiesthat define a triangle, along with the SAS criterion and some simple assumptions aboutlines and points, have as consequences that the bisectors of the angles of any triangleare concurrent (that is, they all intersect at a single point). Since we do not need anyassumptions that depend on our surface being flat, this result is true on a sphere as well.(However, the sequence of logical steps required is quite long, requiring the development


of much of the theory of neutral geometry. The clarity obtained by reducing our reason-ing to simple assumptions has a price: it takes a lot of work to establish sophisticatedresults.)

In order to make deductions correctly, we must do two things:

• Precisely specify our assumptions and definitions; and

• Establish effective steps that determine whether or not a conclusion has to be trueunder the assumed conditions.

Learning to do these two things is a central purpose of this course! Another centralpurpose is to learn to communicate our reasoning clearly to others and to read math-ematical texts effectively, extracting their essential ideas and the relationship of thoseideas to what we already know. Along with deductive reasoning, metaphor and analogyplay a role in communicating mathematical ideas and nurturing an understanding ofthe mathematical landscape that illuminates formal derivations (which anyone, even themost devoted “geek,” would find insufferably boring by themselves!).

Please note that truth in mathematics is always conditional. When we say a theoremis “true,” we mean that it is true on condition that the assumptions of the theory underdiscussion are true. We make these assumptions explicit initially, but of course it wouldnot be practical to include them as part of the statement of every theorem; instead, theymay be thought of as a sort of definition of the whole system under study. We will notdeal with notions of “universal” truth (except in the context of a particular specified“universe”). Such notions belong to religion or philosophy (if they belong anywhere atall), not mathematics.

The concept of validity is also used in mathematics, and is related to the idea oftruth, but with slightly different connotations. An argument is valid if it is logicallycorrect; a conclusion is valid if it has been derived from the stated assumptions by avalid argument.

Some mathematicians prefer to avoid using the word “true,” in order to emphasizethe conditional nature of mathematics. Indeed, what would it mean for a mathematicalstatement, which refers to purely abstract products of our collective imagination, to betrue? In the final analysis, mathematics is only given life and meaning by the way itsatisfies our aesthetic sensibilities or models aspects of our experience. However, I findit awkward and unnatural to completely avoid the word “true.” It is often convenient torefer to statements as “true” for purposes of discussion, as long as we keep in mind thecontextual nature of this “truth.”

1.8. THE DANCE OF ABSTRACTION AND CONCRETENESS. 9

1.8 The dance of abstraction and concreteness.

Making mathematical models is a crucial way of understanding the world. Without them,we would have only collections of facts and observations without any understanding ofthe relationships among them. Without them, we cannot understand how or why thingswork the way they do, or make any but the most simplistic predictions.

This is not to say that the only purpose of mathematics is its application to scienceand other disciplines. Many people find the ideas of mathematics to be beautiful andinspiring in themselves. It gives mathematicians joy to pursue mathematics for its ownsake, regardless of any application their work might have.

It is also worth noting that often work that was initially thought of as entirely “pure”does turn out to have valuable applications, such as the use of number theory to developcodes for the secure transmission of information. These codes are used everyday, when-ever we make a financial transaction over the internet.

By deepening our fundamental understanding of patterns, structure, and logic, wemake possible applications that we could not have even conceived, much less achieved,before. Practical application is sort of like happiness: you achieve it best if you do notpursue it too directly!

As mathematics develops, there is a drive for increasing theoretical abstraction, whichallows for more general results. This drive operates in tension with the competing driveto find and explore concrete examples that inform our intuition (sometimes by reflectingour real world experience, sometimes by referring to abstractions with which we havebecome familiar and comfortable), giving relevance and meaning to the concepts we havedeveloped. This tension is never-ending and inevitable.

A closely related dynamic tension exists in mathematics between the constructiveand synthetic approaches to its development. The constructive approach strives torepresent mathematical objects as particular sets, which gives them a concrete char-acter, although not exactly in a physical way. The idea is to “construct things fromscratch,” so to speak, deriving their essential properties from the constructed represen-tations. A classic example is the representation of the numbers 0, 1, 2, 3, . . . by the sets∅, {∅}, {∅, {∅}} , {∅, {∅}, {∅, {∅}}} , . . ., where the symbol “∅” denotes the unique set con-taining no elements, known as the empty set. This representation has two important andnoteworthy properties: first, the set representing each number is constructed in a sys-tematic way from the preceding one, by simply including it along with all of its elements(the previous representatives), so there is a natural means of continuing the sequence;second, each representative has the number of elements that it represents. In contrast,the synthetic approach proceeds by simply assuming the existence of concepts satisfyingcertain properties and exploring what must be true about them as a consequence of


these properties. For example, we begin our study of algebra by assuming the existenceof a system of real numbers with specified operations and properties. A classic exampleof the synthetic approach occurs in geometry, in which we begin by assuming there arethings called points and lines, which we cannot define, and some relationships amongthem, which we also cannot define, such as that a line may pass through a point, whichwould then lie on that line, or that one point may lie between two other points on a line.(We do visualize points and lines, perhaps thinking of them as dots and lines on paper,but these representations are physical, not mathematical, and only approximate.) Weassume additionally that these relationships satisfy certain properties, such as that giventwo distinct points, there is exactly one line passing through both of them. (Otherwise,the theory would be a meaningless free-for-all!) The properties we choose are abstractedfrom our visualizations. From our undefined concepts, we can define new ones, such asthat of a segment: given two points, A and B, segment AB is the set of points containingA, B, and all the points between A and B. We can in turn assume the existence of rela-tionships between these defined concepts, such as congruence, explore their properties,and so forth. The rest, as they say, is history!

Mathematics moves fluidly between construction and synthesis, starting at variouspoints and moving both towards greater complexity and towards greater fundamentalunderstanding and unification. New concepts are constructed from synthetic ones, asin the example of segments, constructed from the undefined concepts of point and be-tweenness, and they in turn may motivate new synthetic concepts, such as congruence.After exploring a synthetic system of axioms, we may seek a foundation of more basicconcepts from which a model satisfying these axioms can be constructed. Alternatively,after exploring a construction, we may seek a set of axioms that underly and explainits properties, which might in turn apply to other constructions. To use geometry asan example, we might initially take the synthetic approach pioneered by Euclid andbrought to culmination by Hilbert,1 but eventually construct mathematical represen-tations of points and lines using a coordinate system. Among other things, we mightthen study the transformations of the coordinate plane that take segments and anglesto congruent ones. The properties exhibited by these geometric transformations mightinspire us to consider some of the structures of abstract algebra, such as groups, definedmerely as sets with certain operations, satisfying certain properties. (Now, rather thanabstracting from the physical world, we are abstracting from mathematics, focusing onsome properties while ignoring other, more specific ones.) The foundational concepts of

1David Hilbert, probably the most esteemed mathematician of the late nineteenth and early twentiethcenturies, developed a complete set of axioms for plane geometry in the spirit of Euclid, filling in theimplicit assumptions that Euclid had made but left unstated.

1.9. A NOTE ON HOW THIS COURSE WILL PROCEED. 11

mathematics, codified in the axioms of set theory and logic, are synthetic: you have tostart somewhere. (The set theoretic construction of the natural numbers, for example,is rooted in the assumption that a unique set with no elements exists.)

For some sets of properties, there are many examples satisfying them, but if we assumeenough properties, essentially one representation will remain, in that any other will haveexactly the same structure, just described in different terms. Once you assume enough,there is no room for variation. For example, there is essentially one representationsatisfying all of the properties of Euclidean geometry, which we call the Euclidean plane.You are familiar with its standard construction from high school algebra: the pointsare pairs of real numbers, and the lines are the solution sets of linear equations in twovariables.

1.9 A note on how this course will proceed.

Courses on mathematical methodology and structure tend to falter in one of two ways.At one extreme, one can avoid all ambiguity by adopting a high level of formality fromthe start, insisting for example that we must define the Cartesian product of two sets(don’t worry if you don’t know what that is yet) before we can talk about familiar binaryoperations such as addition. The result is tedium, along with the loss of any appreciationfor how or why anyone came up with these formal notions in the first place. Proceedingin this fashion is akin to trying to learn grammar before you learn to talk (althoughonce you can talk, you most certainly should refine your understanding of language bystudying grammar). I am sure you have no trouble understanding, without referenceto Cartesian products, that addition is called an operation because it takes a pair ofnumbers and associates to them a third number, their sum, or that it is called binarybecause it operates on two numbers at a time. Yet, at the opposite extreme, one canbegin quite imprecisely, assuming that a vague familiarity with the properties of numbersis sufficient to start “proving” things. The result is confusion, as it is not clear to studentswhat may be assumed in a proof and what may not.

We will follow neither of these approaches, nor even a compromise between them.Instead we will begin with familiar concepts, described at a comfortable level of abstrac-tion and formality, but with precision. As we proceed, we will examine these conceptsin increasing depth and also see them within an increasingly general context. We willtemporarily have to spell out some of our assumptions in non-technical language. (Even-tually we will see that they all boil down to the existence of certain sets; however, whereasthis technical reduction provides clarity, insight, and a certain kind of satisfaction, it isnot necessary for unambiguous communication.) We may make some assumptions pro-


visionally, as mathematicians often do when interested in exploring their consequences,informally justifying or motivating these assumptions and proving them later based onmore fundamental ones. The important point is that we will attempt to be both explicitand exact about our current assumptions at all times. These and only these explicitassumptions, along with previously proven results, may be used in proofs.

We begin with some motivation, in which we ask you to examine familiar proceduresto determine what unstated, implicit assumptions and processes of reasoning, so generallyaccepted we may not even notice them, are required to justify what we do. We hopethat this exercise will help you understand and appreciate the mathematical theory it isused to motivate.

Chapter 2

Motivation: Area Revisited

In which we examine the fundamental principles of measure and derive some formulasfrom them.

2.1 What is Area?

Continuing in a geometric vein, let us consider area on a flat surface. Rather than beginwith formulas by which it may be calculated, let us instead consider what we mean byarea: what area is and what it should do. From these assumptions, we will then deduceformulas for calculating it in some instances. In this manner, we will better understandthe meaning behind the formulas. This understanding is a crucial aspect of mathematicalthinking, one with which you may not have as much experience as you should!

Area is a positive numerical quantity associated to a region of the surface. Thus,area is a type of measure; the number associated to a region is its measurement. Whatproperties should such a measure have?

The numbers associated to two regions should allow us to compare the sizes of theregions, but we must be precise about what we mean by “size,” since we can measureand compare sizes in different ways. If we wanted to know how long it would take towalk around the boundary of a region, we would measure its size by the distance wewould have to travel, but that is not what we mean by area; the distance around theboundary of a region is its perimeter. If we want to know how much wheat we couldgrow in the region, the area measurement (along with the yield for wheat) should tell us.A long, skinny region could have the same area as a shorter, fatter region, but a muchlarger perimeter.

As with length, the area of a region should depend only on its size and shape: con-

13

14 CHAPTER 2. MOTIVATION: AREA REVISITED

gruent figures should have the same area. Mathematicians say that area is a geometricinvariant, meaning that it does not vary among figures that are geometrically the same,in the sense of being congruent. Perimeter is also a geometric invariant.

Obviously, the property of being a geometric invariant is not enough in itself todetermine area, since it is true of perimeter as well. For a second property, we can alsotake a clue from the properties of length. Note that if a segment (of either a straight lineor a curve) is divided into parts that do not overlap except at their endpoints, its lengthis the sum of the lengths of its parts. Similarly, if a region is divided into parts which donot overlap except along their boundaries, its area should be the sum of the areas of theparts. If I can grow 10 bushels of wheat in a field and 7 bushels in a neighboring field,then I can grow 17 bushels in the region covered by both fields. (Notice that area differsfrom perimeter in this respect; perimeter does not have this property, because a sharededge would count only once when the regions are combined.)

We are really talking about analogous measures in different dimensions. Length isone-dimensional, so zero-dimensional objects, namely individual points, have no length.If the parts of a segment overlap only at their endpoints, the length of the whole segmentis the sum of the lengths of its parts because no length is shared. Area is two-dimensional,so zero- and one-dimensional objects, namely points and segments, have no area. If theparts of a region overlap only in one-dimensional objects, then its area is the sum of theareas of its parts because no area is shared.

Remark. In higher mathematics, it turns is sometimes better to consider oriented mea-sures. Oriented length is positive in one direction, negative in the other, and this notioncan be extended to higher dimensions. But that is another story.

The two properties discussed above are fundamental, but alone they are not sufficientto determine a unique area measure. This is because any measure with these propertiesis scalable: if we double, triple, halve, or scale all of the measurements by any otherfactor, the two properties clearly still hold, and the comparison of regions remains valid.To determine a particular area measure, we must choose a unit by convention. (This isalso true for perimeter, of course; you get different measurements in meters and feet, forexample.)

Our area unit will depend on the unit chosen for the more basic measure of distance.Once this unit is chosen, say centimeters (cm), we define the area of a rectangle to bethe product of its length and width: if R is a rectangle with length l cm and width wcm, then its area is AR = lw cm2. “Wait,” you say! “You just threw a formula at us;I thought we were going to deduce the formulas”! Well, yes, we are going to deduce allof the other formulas, but in order to establish an area measure we need to assume oneformula.

2.1. WHAT IS AREA? 15

Our assumed formula must be consistent with our first two assumptions. Clearly, itis. Congruent rectangles have the same length and width, hence our formula gives themthe same area. And if a rectangle is divided into two smaller rectangles by a segmentparallel to a pair of its sides, the area given by our formula for the whole rectangle is thesum of the areas given by our formula for the two smaller rectangles. (See Figure 2.1.)This fact depends on the distributive property of addition and multiplication, which onemight view as being assumed in order to make number operations consistent with ourintuitive understanding of area! By seeing that our formula is a natural choice havingthe two desired properties, we understand the meaning behind it!

l = l + l

w

l1

1

l2

2

{Figure 2.1: lw = (l1 + l2)w = l1w + l2w

Remark. This method of establishing a unit does not work on a curved surface, such asa sphere, because rectangles do not exist on a curved surface. We can certainly measurearea on a curved surface, but we need a different method of establishing a unit.

From the area formula for a rectangle, we can easily deduce the area formula fora parallelogram. Observe that by dropping a perpendicular from an obtuse corner tothe base, we divide a parallelogram of base b cm and height h cm into a right triangleand a trapezoid; hence, its area is the sum of the areas of these two figures. Observefurther that, since the height of the parallelogram is measured perpendicular to the base,a rectangle with sides of length b cm and h cm may also be divided into a triangle andtrapezoid that are congruent, respectively, to these; hence, the rectangle’s area is given


by the same sum. Since the area of the rectangle is known to be bh cm2, the area of theparallelogram must also be bh cm2. (See Figure 2.2.)

b

h

Figure 2.2: The area of a parallelogram of base b and height h is bh.

2.2 Exercises

1. Deduce that the area of a triangle of base b units and height h units is 12bh squared

units. (Hint: Construct a parallelogram from two copies of the triangle, one ofwhich is rotated by 180◦ from the other; since the triangles are congruent, each hashalf the area of the parallelogram.)

2. Deduce that the area of a trapazoid with parallel bases of length b1 units andb2 units and height h units is the height times the average of the bases: 1

2(b1 +

b2)h squared units. (Hint: Divide the trapezoid into two triangles; alternatively,construct a parallelogram from two copies of the trapezoid; alternatively, divide thetrapezoid into a parallelogram and a triangle; alternatively, divide the trapezoidinto a rectangle and two right triangles. There is more than one way to skin a cat!)

3. If the length of a rectangle is 3 meters and its height is 51 centimeters, could wesensibly describe its area as 153 meter-centimeters? If so, what region naturallyrepresents one meter-centimeter?

2.2. EXERCISES 17

abc

a

b

ca

bcc

a + b 2 2 = c2

ab

Figure 2.3: The Pythagorean Theorem

4. The Pythagorean Theorem states that if a right triangle has legs of length a unitsand b units and hypotenuse of length c units, then a2 + b2 = c2. Referring toFigure 2.3, deduce the Pythagorean Theorem from the properties of area. Besure to justify, using a fundamental result of Euclidean geometry we have alreadydiscussed, that the angles add up correctly where corners and edges meet.

5. Like most theorems, the Pythagorean Theorem is conditional : it has a hypothesis,which imposes conditions under which its conclusion must hold. As the theoremis stated above, the conditions are all mixed together, conveyed via the words“right,” “legs,” and “hypotenuse,” along with the manner in which the sides arelabeled. The geometrically significant condition is clearly that the triangle hasa right angle; while it is certainly important to label the sides correctly in theformula, this consideration is merely technical.

It is often the case that we can clarify the meaning of a theorem by laying outthe technical conditions in a preliminary exposition in order to isolate the moremeaningful hypotheses in the final statement. Let us restate the PythagoreanTheorem in this manner: Given any triangle, let its vertices be labeled A, B,and C, and let the sides opposite A, B, and C have lengths a, b, and c units,


respectively. If ∠C is a right angle, then a2 + b2 = c2.

The converse of a conditional statement is obtained by exchanging its hypothesisand conclusion. Keeping the same preliminary conditions, a meaningful converseto the Pythagorean Theorem is immediate from the version above: If a2 + b2 = c2,then ∠C is a right angle.

Is the converse of the Pythagorean Theorem true? If so, outline how you wouldprove it, stating any additional basic geometric results (such as congruence criteria)you would need. If it is false, give a counterexample. (A counterexample is a specificexample for which the hypothesis is true but the conclusion is not, thereby provingthat the statement in question is false.)

6. There is also an unstated condition preliminary to the Pythagorean Theorem aswe deduced it: the triangle must lie in a flat surface. Without this condition, theargument using our area formulas would not apply.

Just because a particular argument applies only under certain conditions does notmean a statement is not also true under other conditions; it might simply be thata different argument is required. (Perhaps a more general argument that appliesunder both sets of conditions can even be found.)

In this light, is the Pythagorean Theorem true on a sphere? If so, outline how youwould prove it, stating any properties of spherical geometry you would need. If itis false, give a counterexample. (Hint: consider the discussion of spherical trianglesin Section 1.5.)

Remark. Many treatments of area take triangles rather than rectangles as the fundamen-tal regions, since any polygon can be divided into triangles. This approach is necessaryon the sphere, where rectangles don’t exist.

Remark. Calculating the area of circles and other regions with curved boundaries requiresthe concept of limit, as defined and studied in calculus. This approach goes back at leastto Archimedes in ancient Greece, but its precise logical development occurred muchlater. Calculus is also needed to calculate the areas of regions on surfaces of non-uniformcurvature.

2.3 What have we left out?

I hope the arguments above are clear and convincing; nonetheless, much has been leftout. You probably didn’t notice because the material is so familiar. Can you see, for

2.3. WHAT HAVE WE LEFT OUT? 19

example, how we used the fact that the surface under consideration is flat? (Hint: Ithas to do with parallel lines and angles. As mentioned previously, rectangles, defined asquadrilaterals with four right angles, don’t even exist on a sphere.)

Not only are there unstated details in the arguments, there are unstated assumptionswhose proof requires theoretical development we have not covered. In common parlance,there are logical gaps (sort of leaps of faith, except that perhaps you didn’t notice youwere leaping). For example, the argument for a parallelogram requires knowing that theopposite sides are congruent. This fact is not part of the definition, which only requiresthe opposite sides to be parallel. That they must also be congruent can be deduced bydividing the parallelogram into two triangles; however, these triangles share only oneside, and we know nothing about the other pairs of sides. (Remember, we are trying todeduce logically that they are congruent; we cannot assume they are.) Thus, to provethe two triangles are congruent, we need another criterion, the angle-side-angle (ASA)criterion. (We know the corresponding angles are congruent because they are alternateinterior angles for parallel lines cut by a transversal, namely the diagonal that dividesthe parallelogram into triangles.) Do we need to make ASA a separate assumption,or can it be deduced from more fundamental assumptions, such as SAS? How do weknow this new criterion is even consistent with our other assumptions? In fact, ASAcan be deduced from SAS and more fundamental assumptions. Although doing so is notparticularly difficult, it is somewhat involved and beyond the scope of this course.

As another example, consider the perpendicular dropped from the obtuse corner ofthe parallelogram. How do we know we can drop a perpendicular from any point to anyline, and how do we know that in this case it lands on the base, rather than outsidethe parallelogram? Proving this last point requires a major result of plane geometrycalled the Exterior Angle Theorem. This is not a trivial point, since the Exterior AngleTheorem is not true on a sphere. (Fortunately, there are no parallelograms on a sphere,either!) Again, the theoretical development required is not particularly difficult, butit takes considerable time and is beyond the scope of this course. You will have theopportunity to fill these gaps by taking a course in classical geometry.

Logical gaps are intrinsic to the process of learning and discovery. We must oftenmake conjectural leaps in order to see the framework of a theory, either coming backto prove these conjectures later or trusting the work of others. If they turn out to befalse, we start again with a new approach, informed by the insights gained from seeingour errors. Indeed, if we tried to proceed step by logical step without ever trying to seefurther ahead, we would never get very far. It would be like walking while looking downat one’s feet!

It is neither practical nor necessary to fill in every logical step as we proceed; however,it is very important to recognize when we are leaping over gaps and consider the logical


steps that might bridge them.

Chapter 3

Motivation: Solution Sets

In which we use the example of solving an equation to introduce some fundamental con-cepts, language, and notation used in set theory and logic.

You are familiar with solving equations. Let us look at some simple equations inorder to understand more precisely what it means to solve one and by what process wedo it.

3.1 What does solving mean exactly?

Consider the following simple equation:

x+ 2 = 5.

An equation such as this one is a statement in mathematical language. Translated intoEnglish, this statement says, “When two is added to some number, let’s call it x, theresult is five.”

Because the equation involves an unknown quantity, the number represented by thevariable x, it implicitly asks a question: What number could x be? When x is replacedby a specific number, such as 3 or 4, the resulting equation must be either true or false(but not both). As you well know, the equation 3 + 2 = 5 is true, whereas 4 + 2 = 5 isfalse. (As emphasized in Chapter 1, this notion of truth or falsity is conditional, basedon the assumptions we make about numbers and the operation of addition. Indeed, italso requires the definitions of 2, 3, 4, and 5 in terms of 1: 2 = 1+1, 3 = 2+1, 4 = 3+1,5 = 4 + 1. By definition of 4, 4 + 1 = (3 + 1) + 1 = 3 + (1 + 1). By definition of 2,3+(1+1) = 3+2, so 3+2 = 4+1 = 5, where the final equation follows from the definition

21

22 CHAPTER 3. MOTIVATION: SOLUTION SETS

of 5. Do you remember the name for the property we used to regroup the numbers alongthe way? We have not yet made our assumptions about numbers explicit, but theyare based on our innate intuition about what whole numbers and addition mean. Thisintuition is so fundamental that it is not unique to humans; other animals, such as crowsand dolphins, can also count and add small whole numbers.)

In the study of logic, a statement that must be either true or false is called a propo-sition; formally, we say a proposition has a unique truth value. Not all statements arepropositions. For example, “Mozart wrote beautiful music” is a statement, but we can-not assign it a unique truth value; whether or not it is true is a matter of individualopinion.

Because an equation has a unique truth value for any number that is substitutedfor the variable, it defines a set, meaning a collection of things that can be identified,called its solution set : the numbers for which the equation is true are in the solution set,the others are not. Solving the equation means finding the most transparent and directdescription of this set: instead of merely knowing that it is the set of numbers, x, forwhich x+ 2 = 5, we prefer to also know that this set contains exactly the number 3 andno other numbers.

The things contained in a given set are called its elements. Symbolically, if S is aset, we write x ∈ S to denote that x is an element of (or “belongs to”) S. By the natureof the concept, a set is completely characterized by its elements; thus, if two sets haveexactly the same elements, they are equal (that is, they are the same set). An equationdefines a set in terms of a proposition about its elements. More symbolically, we write:

{x : x+ 2 = 5}.

The brackets tell us we are expressing a set, and the colon translates into Englishas “such that.” Thus, the expression above translates into English (or, more accurately,“Mathlish”) as “the set of all x such that x+ 2 = 5.” Many authors use the symbol “|”instead of the colon; either symbol is equally acceptable.

Our solution defines the same set as an explicit list of elements (in this case, a list oflength one):

{x : x+ 2 = 5} = {x : x = 3} = {3}.

A proposition about a variable is called an open proposition. Two open propositionsare considered to be equivalent if they are true for exactly the same set of things, thatis, if they have the same solution set. Thus, the open propositions x+ 2 = 5 and x = 3are equivalent. The sets {x : x + 2 = 5} and {x : x = 3} that are described by theseequivalent open propositions are equal because they contain the same elements, just

3.1. WHAT DOES SOLVING MEAN EXACTLY? 23

identified in different ways. In summary, equivalent open propositions describe equalsets.

If we can find an open proposition equivalent to the original equation that explicitlylists the elements of the solution set, that is, a proposition of the form “x = this numberor x = this number . . . or x = this number,” then we have solved it. Note that the orderof the list of elements for a set makes no difference; it doesn’t even matter if an elementis repeated. We are only concerned with what is in the set. Note in addition that thelist might be empty, as for the equation x+ 2 = x. In this case the solution is the emptyset - the unique set with no elements at all, denoted by the symbol ∅. Although thesolution to the simple equation x + 2 = 5, which is only one step away from a directdescription, is obvious, many equations are more difficult to solve; consequently, knowingtheir solutions is commensurately more helpful.

A theme of this book is that mathematics, although special, is not that different fromthe rest of life, and this situation is no exception. For example, the proposition, “Someperson visited my friend John yesterday evening,” invites an attempt to figure out, bylogical investigation, who it was. If we find that this proposition is equivalent to, “Someperson is my friend Janet or some person is my friend Todd,” we have solved the mystery(and can move on to wondering what Janet, Todd, and John might have talked about).

Before examining what is involved in finding an explicit list of solutions to an equa-tion, we should pause and be a little more careful about just which numbers we areconsidering as possible values of x. In fact, since we haven’t placed any restriction onwhat x could be in our description of the set {x : x+2 = 5}, x could be any mathematicalobject at all (whatever all the possible mathematical objects might be - another goodand serious question); although the defining equation only makes sense for objects towhich 2 could be added, this sloppiness is clearly unsatisfactory and we will henceforthstudiously avoid it!

I assume you are sufficiently familiar with the systems of natural numbers, integers,rational numbers, real numbers, and complex numbers to do at least simple computa-tions; we will discuss all of these number systems with great care in future chapters.(By system we mean a set of numbers together with the operations, such as additionor multiplication, that apply to them.) Let us review, since we need to use them, thestandard symbols that represent the sets of numbers in these systems: the set of naturalnumbers is denoted by N; the set of integers is denoted by Z, the set of rational numbersis denoted by Q, the set of real numbers is denoted by R, and the set of complex numbersis denoted by C. Each of these systems expands on the previous one. Authors vary asto whether or not zero is included in the set of natural numbers. We will include it:N = {0, 1, 2, 3, . . .}.

For the simple equation x + 2 = 5, it does not make any difference in which system


we are working, since we get exactly the same solution in all of them; however, for manyequations the system does make a difference. For example, the equation 2x = 5 hasno solution in the systems of natural numbers or integers, although its solution set ineach of the other systems is {5

2}. The equation x2 = 2 has no rational solutions, but its

solution set in either the real or complex number system is {−√

2,√

2}. (We will provelater that

√2 is not rational; that is, it cannot be written as a ratio of two integers.

This fact was discovered in ancient Greece by the school of Pythagoras.) In the natural,integer, rational, and real number systems, the equation x3 = 1 has a unique solution,1, but in the complex number system it has three: 1, −1−i

√3

2, and −1+i

√3

2.

We can specify the set of possible values for x in one of two ways. One way is toinclude it as an additional requirement in the proposition that defines the set, as in:

• {x : x ∈ R and x+ 2 = 5} = {3}

• {x : x ∈ R and x3 = 1} = {1}

• {x : x ∈ C and x3 = 1} = {1, −1−i√

32

, −1+i√

32}

This approach, while logically correct, may seem a bit awkward because the tworequirements in our description, x ∈ R and x + 2 = 5, play rather different roles; theformer is far more general than, and provides a context for, the latter. As we learned inour discussion of the Pythagorean Theorem, it often improves clarity to get contextualassumptions out of the way up front, which we can do with the following alternative,and slightly briefer, construction:

• {x ∈ R : x+ 2 = 5} = {3}

• {x ∈ R : x3 = 1} = {1}

• {x ∈ C : x3 = 1} = {1, −1−i√

32

, −1+i√

32}

Note that the symbol “∈” may be used as either a verb, as it is when it appears ina defining proposition (“such that x belongs to R”) or a preposition, as it is when it isused to restrict the values of x to a particular set (“x in R such that ”). Mathematicalusage liberally incorporates grammatical flexibility of this kind.

Now that we have thoroughly discussed both the context and meaning of solving anequation and introduced the language needed to formally describe solutions, let us turnto the process of finding the solution.

3.2. THE PROCESS OF SOLVING AN EQUATION 25

Remark. Ordinary language is often used in preference to formal and symbolic mathe-matical descriptions. We more often write, “The real solution to the equation x+ 2 = 5is 3,” than “{x ∈ R : x + 2 = 5} = {3}.” In many situations, formal mathematicallanguage is just not as natural, and it is limited in its expressive power to convey thefull subtlety of ideas: the analogies, comparisons, and connections that occur behind thescenes in our thinking. (But notice that we do prefer the symbolic language of algebra,which is so clear and concise, for expressing numerical operations and relationships.) Inmathematical formalism, there are no subtexts implied by a particular choice of words!Its virtues include uniformity and precision, which can help in logically tricky situations.

3.2 The process of solving an equation

For the simple equation above, we can summarize our process simply as subtracting 2from both sides; however, this pat description hides several properties of numbers andnumber operations that are being used. Aside from knowing how to add and subtractexplicit numbers correctly, not to mention the definitions of number names alluded toearlier, we also needed the following facts:

• The operative quality of addition: the sum of two numbers is determined by thenumbers that are added. If we start with two equal numbers (albeit expressed indifferent ways) and add the same number to each, we will get the same sum inboth cases.

• The definition of subtraction in terms of additive inverses: subtracting 2 is thesame as adding −2. (Since −2 is not a natural number, we technically need todefine subtraction differently in the natural number system, and of course it is notdefined for all ordered pairs of inputs; it is easiest just to work in a larger systemthat has additive inverses when solving equations.)

• The associative property of addition: (x+ 2) + (−2) = x+ [2 + (−2)] = x+ 0. Weneeded first to describe subtraction in terms of addition because the operation ofsubtraction is not associative; for example, (5−2)−3 6= 5−(2−3). The associativeproperty is relevant because addition is a binary operation: one can only add twonumbers at a time.

• The defining property of 0 as the additive identity: x+ 0 = x.

Let us examine in detail the logical structure, meaning, and significance of each ofthese assertions.


The operative quality of addition is a universal property: it applies to all numbers inthe system. Taking the real numbers as our system, for example, the precise statementof this property is that, for all real numbers x, y, x′, and y′, if x = x′ and y = y′, thenx+ y = x′ + y′.

Words such as “and,” “or,” “not,” and “implies,” and the pair “if ... then ...” acton propositions (open or not) to create new propositions whose truth value in any in-stance follows from that of their components. For example, a proposition composed oftwo propositions joined by “and” is true exactly for the instances in which both of itscomponents are true. The new proposition that results is called the conjunction of itstwo components. As another example, if the phrase “it is not the case that” is insertedbefore a proposition, the resulting proposition is true exactly for the instances in whichthe original proposition is false. The resulting proposition is called the negation of theoriginal. Since they act on propositions, conjunction and negation may be thought ofas logical operations; conjunction is binary, acting on a pair of propositions, whereasnegation is unitary, acting on a single proposition. (Note the similarity in form betweensuch phrases as “the sum of x and y” and “the conjunction of x = x′ and y = y′.”Similarly, note the similarity in form between “the cubed root of x” and “the negation ofx = x′.” Addition is a binary operation on numbers; taking the cubed root is a unitaryoperation.) A proposition that simply asserts a relationship between two things withoutinvolving any logical operations is called atomic. The negation of an atomic propositioncan generally be abbreviated with a slash: for example, the negation of the propositionthat x+ 2 = 5 is the proposition that x+ 2 6= 5.

There are two other logical operations, both of them binary; these are, of course,the operation represented by “or” and the one represented by “implies” (which is alsorepresented by “if ... then ...”). A statement of the form “x > 2 or x2 > 4” is called thedisjunction of its two components. The mathematical convention is that disjunctionsare never exclusive; that is, a disjunction is true if either or both of its componentpropositions are true, false only if both component propositions are false. For example,the proposition that x > 2 or x2 > 4 is true for x = 3 as well as for x = −3. (If aflight attendant asks if you want a blanket or a pillow, he is using disjunction in thenon-exclusive sense, since of course you can have both - and probably want both if youplan to go to sleep. If he asks if you would like coffee or tea, the sense of the disjunctionis exclusive, since the rules allow only one drink at a time.) Note that it is impossible tofind an instance in which x > 2 is true but x2 > 4 is false, a consideration that will beof interest shortly.

The proposition that if x = x′ and y = y′, then x + y = x′ + y′, is called conditionalbecause it sets up a condition, in this case that x = x′ and y′ = y′, under which aconsequence, in this case that x+y = x′+y′, is held to be true. Conditional propositions,


also called implications, are the most important in logic, because they are the way “onething leads to another.” For example, as we recently noted, the condition that x > 2ensures in any instance that x2 > 4, so the proposition that x > 2 implies x2 > 4 is truefor all real numbers x. There are many ways to phrase a conditional proposition; all ofthe following represent the same logical combination of the atomic propositions x > 2and x2 > 4:

• If x > 2, then x2 > 4.

• x > 2 only if x2 > 4.

• x > 2 implies x2 > 4.

• x2 > 4 if x > 2.

• x > 2⇒ x2 > 4.

Which form to use is a matter of style and, most importantly, clarity. Notice how thearrow symbol in the last example so eloquently conveys the condition x > 2 leading tothe consequence x2 > 4. On the other hand, for the more complicated proposition thatif x = x′ and y = y′, then x+ y = x′+ y′, the form “x = x′ and y = y′ ⇒ x+ y = x′+ y′”might be ambiguous, appearing to possibly mean that x = x′ (unconditionally) and, inaddition, if y = y′, then x + y = x′ + y′, an entirely different meaning, and one that iscertainly not universally true for all real numbers x, x′, y, and y′!

The converse of an implication is the proposition obtained by exchanging the con-dition and the consequence. The converse of a true proposition need not be true. Forexample, the converse of the universally true proposition that if x > 2, then x2 > 4, isthe proposition that if x2 > 4, then x > 2, which fails to be universally true. (The latterproposition fails because a number that satisfies the condition x2 > 4 could be less than−2; the condition is not sufficient to ensure the stated consequence in all instances.) Weoften summarize the conjunction of an implication and its converse with the phrase “ifand only if” or an evocative double arrow, as in the following (universally true) logicallyequivalent statements:

• x+ 2 = 5 if and only if x = 3.

• x = 3 if and only if x+ 2 = 5.

• x+ 2 = 5⇔ x = 3.

• x = 3⇔ x+ 2 = 5.


Suppose p(x) and q(x) are open propositions about a real number x for which theassertion p(x)⇔ q(x) holds universally (that is, for all x ∈ R). For any instance x ∈ Rsuch that p(x) is true, q(x) must also be true (since p(x) ⇒ q(x)). Similarly, for anyinstance x ∈ R such that q(x) is true, p(x) must also be true (since q(x)⇒ p(x)). Thus,p(x) and q(x) are true for exactly the same instances and false for exactly the sameinstances: they are equivalent.

Turning now to the remaining assertions, the associative property of addition is alsoa universal property: for all real numbers x, y, and z, (x+y)+z = x+(y+z). (Studentsoccasionally confuse the associative property with the commutative property, which isalso assumed to be true for addition in any number system. The commutative propertyinvolves only two terms, stating that for all real numbers x and y, x + y = y + x.The associative property involves three terms, and the order of the terms is the sameon both sides of the equation; it is just a matter of which pair of adjacent terms issummed first, after which the third term is added in whatever order it appears. Theassociative property simply states that the result of successive additions does not dependon how the terms are successively grouped into pairs.) It is critical to think in terms ofuniversal properties we can rely on, in order to avoid errors. Although it happens that(x+2)−2 = x+(2−2), the validity of this property of subtraction (and, more generally,the validity of (x + y) − y = x + (y − y) in all instances) must be established from thedefinition of subtraction and the associative property of addition.

The defining property of 0 has both universal and existential aspects: First we assertthat there exists a real number y such that, for every real number x, x+ y = y. Noticethat we are asserting the existence of a number with a particular universal property. Forconvenience, we might refer to this property as the identity property under addition: theresult of adding this number to any other given number is the identical given numberyou started with. It is easy to deduce that there is only one number with this property,for suppose x and x′ both have it. Then x = x + x′ = x′ + x = x′. The first equationholds by the identity property assumed for x′, the second follows from the commutativeproperty of addition, and the third from the identity property assumed for x. Since thereis only one such number, we can give it a name: 0. In summary, we define 0 to be theunique number with the property that, for any real number x, x+ 0 = 0. The number 0is called the additive identity. (By the commutative property of addition, we also havethat for any real number x, 0 +x = x. For simplicity, we include this fact in the definingproperty of 0. That way, we don’t have to remember in which order we stated it andapply the commutative property every time 0 appears in the other position!)

Similarly, the existence of additive inverses includes both universal and existentialaspects, but applied in the opposite order: for any real number x, there exists a realnumber y such that x+ y = 0. Here we are asserting the universal existence of numbers


having properties specific to the instance to which they apply. We will prove later onthat each real number x has a unique additive inverse, which we denote by −x. Ourassumption is only that it has an additive inverse, not that it has only one. (You canprobably figure out the argument establishing uniqueness. Try it!)

Now let us return to what seemed to be a trivial equation, knowing much more aboutwhat its statement and solution embody! Although you may remember solving equationsas a sequence of manipulative “bookkeeping” steps, these steps represent a process oflogical deduction in which properties such as those we have just discussed are applied. Ifa number with a particular property, such as 0, −2, or

√2 exists, then we can introduce

this number as appropriate and take advantage of its defining property. (You shouldbe able to state the defining property of

√2. Try it!) If a conditional property applies

universally, we can use it in any instance for which the condition is satisfied to obtainthe conclusion. Unfortunately, in many elementary treatments of algebra the logicalrelationships are omitted from the written manipulations. Let us put them in. In thecontext of the real number system:

• x+ 2 = 5⇔ (x+ 2) + (−2) = 5 + (−2), since addition is a well defined operation(operative quality of addition). The converse implication (⇐) holds because wecould undo what we did by adding 2 to both sides of our new equation, retrievingthe original one. Our introduction of the number (-2) is justified by the assumptionthat a unique additive inverse exists for any real number, along with the definitionof −2 as denoting the additive inverse of 2.

• (x + 2) + (−2) = x + (2 + (−2)), by application of the associative property ofaddition. As with the operative quality of addition, we do not need to know whatx is to apply this property, since it is universal.

• 2 + (−2) = 0 by the defining property of −2, and x + 0 = x, by the definingproperty of 0; therefore, x+ (2 + (−2)) = x.

• On the other hand, 5 + (−2) = 3. This calculation follows from the definitions of5, 4, 2, −2, and 0, along with the associative property of addition: 5 + (−2) =(4 + 1) + (−2) = ((3 + 1) + 1) + (−2) = (3 + (1 + 1)) + (−2) = (3 + 2) + (−2) =3 + (2 + (−2)) = 3 + 0 = 3!

• Thus, x+ 2 = 5⇔ x = 3.

Since writing all of this down is rather cumbersome and buries the actual processamong the justifications, we generally leave commonplace justifications to the reader


when writing the solution to an equation. (Because the associative property is usedso routinely, we usually don’t even bother to group terms when the only operation isaddition.) But we should at least retain the logical relationships between the equations,as in the following streamlined exposition, rather than just writing them in sequence:

x+ 2 = 5⇔ x+ 2 + (−2) = 5 + (−2)⇔ x = 3.

I hope you will do this from now on. The value of this practice will be more apparentin the solution of a more complicated equation, which we address after some exercises.

3.3 Exercises

1. Write each of the following sets in formal notation.

(a) The set of real numbers whose squares are greater than 1 and less than 3.

(b) The set of integers whose squares are greater than 1 and less than 3.

2. (a) Write the converse of the proposition that, if x < 5, then x− 1 < 5.

(b) Is the proposition you just constructed true for all real numbers x? If not,give a counterexample.

3. (a) Write the converse of the proposition that if x = x′ and y = y′, then x+ y =x′ + y′.

(b) Is the proposition you just constructed true for all real numbers x, x′, y, andy′? If not, give a counterexample.

4. For which of the following sets is the proposition that x2 > 4⇒ x > 2 universallytrue?

• N• {x ∈ R : x > 0}• {x ∈ R : x > −2}• {x ∈ R : x > −1}

5. If we think of conjunction, disjunction, and implication as binary logical operations,we can ask whether each is associative or commutative.

(a) Which of these operations are associative?

3.4. A MORE COMPLICATED EQUATION 31

(b) If any are not associative, give an example of an open proposition whose truthvalues change when its components are regrouped.

(c) Which of these operations are commutative?

(d) If any are not commutative, give an example of an open proposition whosetruth values change when its components are exchanged.

6. (a) What property would a multiplicative identity have?

(b) Does a number having this property exist?

(c) Must a number with this property be unique? If so, prove it!

(d) If a unique such number exists, what is it?

7. For each of the following propositions, state if it is true or false, and briefly sketchan argument to justify your answer. Remember that, as we saw in the examples inthe text, the syntactical convention is that anything asserted to exist may dependon those things, and only on those things, that have been introduced earlier in theproposition.

(a) For any real number y, there exists a real number x such that x < y.

(b) There exists a real number x such that, for any real number y, x < y.

3.4 A More Complicated Equation

Let us now find a list of all real solutions to the equation x =√x+ 2, providing logical

justification that our solution is correct. (Recall that the symbol “√

” denotes thenon-negative square root, thereby designating a unique number.) In doing so, we willrefer to some familiar properties assumed for numerical operations that we have notlisted before. We will also need the following theorem, which you will prove rigorouslyin Chapter ??: For all real numbers x and y, xy = 0 if and only if x = 0 or y = 0. Fornow, here is a sketch of the proof. First, to prove the implication from right to left, weshow that, for any real number x, 0x = 0. (Note that the defining property of 0 does notsay this! Since 0 has already be defined to satisfy a different property, we cannot simplyassume it also has this one: we have to prove it does!) 0x = (0 + 0)x by the definingproperty of 0, and (0 + 0)x = 0x+ 0x by the distributive property; hence, 0x = 0x+ 0x.Subtracting 0x from both sides, we obtain the result. Conversely, suppose that xy = 0,but x 6= 0; in this case, we must show that y = 0. Since x 6= 0, we can divide both sidesby x to conclude that indeed y = 0 · 1

x= 0.

Now, on to our solution of the equation:


• For any real number x, x =√x+ 2 ⇒ x2 = x + 2 by the operative quality of

multiplication: on the left, we have multiplied a number by itself; on the right,we have multiplied the same number (albeit expressed differently) by itself; sameinputs, same output.

Observe that the converse implication is not universally true. Why not? Since wedo not know the value of x, we cannot apply any property that is not universal toall real numbers.

• For any real number x, x2 = x + 2 ⇔ x2 − x − 2 = 0. Justification is left to thereader.

• By the distributive property, x2 − x− 2 = 0⇔ (x− 2)(x+ 1) = 0.

• By the theorem cited above, (x−2)(x+1) = 0 if and only if x−2 = 0 or x+1 = 0.

• x − 2 = 0 ⇔ x = 2; similarly, x + 1 = 0 ⇔ x = −1. Justification is left to thereader.

• Thus, putting together our previous steps, we obtain that if x =√x+ 2, then

x = 2 or x = −1. (We do not obtain the converse implication because the first stepis not “reversible.”) This implication shows that 2 and −1 are the only possiblesolutions to the equation x =

√x+ 2, but it does not guarantee that each of these

numbers is a solution. (The converse implication, which fails, would guaranteethey were if it were true.) Therefore we check each solution, finding that indeed 2solves the equation, but that −1 does not (for reasons that should be obvious).

• We conclude that {x ∈ R : x =√x+ 2} = {2}.

3.5 Exercises

1. Explain why it is not universally true for all real numbers that if x2 = x+ 2, thenx =√x+ 2.

2. Describe a useful set of real numbers for which it is universally true that if x2 =x + 2, then x =

√x+ 2. (Don’t use what you know about the solution to this

particular equation, since then the set you come up with will not be useful forsolving it!)

3.5. EXERCISES 33

3. Based on your answer to the previous question, is it necessary from a logical pointof view (as opposed to the possibility of having made an error in calculation) tocheck that 2 is a solution to the equation x =

√x+ 2? (In other words, can we

tell immediately by logical deduction that, as long as we have not made an errorin calculation, 2 must be a solution.)

4. Is the proposition that x 6=√x+ 2 universally true for the set {x ∈ R : x < 0}?

5. Based on your answer to the previous question, can we tell immediately withoutchecking by calculation that −1 is not a solution to the equation x =

√x+ 2?

6. Justify the proposition that, for any real number x, x2 = x+ 2⇔ x2 − x− 2 = 0.

Chapter 4

Sets & Logic

In which we systematically develop the the relationship between sets and logic and con-solidate our understanding of logical relationships

I hope the previous chapters have convinced you that you have defined sets and madelogical deductions more often than you may have thought, even if you did not preciselyformulate your definitions and arguments and were not fully conscious of the extent towhich sets and logic underlie elementary geometric and algebraic procedures! Now webegin a rigorous treatment of these topics.

4.1 Open Propositions Define Sets

Just as an equation defines its solution set within a given number system, any openproposition whose truth value depends on a single variable defines the set of all elementsof a given set that make the propositions true, as in the following examples:

• {x ∈ R : x =√x+ 2} = {2}

• {x ∈ N : 2 < x < 5} = {3, 4}

• {x ∈ Q : 2 < x < 5} = {3, 4, 52, 7

2, 9

2, 7

3, 8

3, 10

3, . . .}

• The elements of {x ∈ R : 2 < x < 5} cannot be listed, even with an infinite list.

• {x ∈ N : For every y ∈ N, x ≤ y} = {0}

• {x ∈ R : For every y ∈ N, x ≤ y} = {x ∈ R : x ≤ 0}

35

36 CHAPTER 4. SETS & LOGIC

• {x ∈ N : For every y ∈ R, x ≤ y} = {x ∈ R : For every y ∈ R, x ≤ y} = ∅

The proposition that defines a set is said to separate the elements that satisfy it fromthe remaining elements of the set initially given. The set defined by the proposition issaid to comprehend it - in more common usage, it is the comprehensive collection ofelements that satisfy the proposition - subject to the restriction that its elements comefrom the given set. To comprehend means to understand, and in some sense, at the mostbasic level, to understand a property means to know all the things that have it, the setdetermined by what it means. (Certainly, there are higher levels of understanding aswell: seeing parallels, analogies, and metaphors, for example.) Of course, as illustratedby the examples above, the initial set, by restricting the “universe” from which elementsare selected, makes a considerable difference to the set that is defined.

4.2 Logical Equivalence

Different combinations of operations may produce the same result for all inputs. As asimple example from algebra, for any real number x, 2x = x+ x. Although multiplyinga number by 2 is operationally different from adding a number to itself, the properties ofthese operations ensure that they always give the same result: by definition, 2 = 1 + 1;the distributive property implies that (1 + 1)x = 1x+ 1x; and the multiplicative identityproperty of 1 implies that 1x + 1x = x + x. Thus, we might say that these differentprocedures are operatively equivalent and that the expressions “2x” and “x + x” (asopposed to the numbers they represent, which are equal) are expressively equivalent.

Just as an algebraic expression represents a number for each instance of its variables,a proposition represents a truth value in each instance. As we discussed in Chapter 3,propositions with the same truth value in each instance are considered equivalent. Sinceequivalent propositions define the same set when one is substituted for the other, theyhave the same categorical value. (They may well have different computational value.A large part of mathematics consists of finding equivalent criteria for that are easier tocompute.) If their equivalence derives solely from the properties of the logical operationsused, rather than the content of the component propositions involved, the propositionsare said to be logically equivalent. Logically equivalent forms yield equivalent propositionsfor all sets of propositions to which they are applied. Just as the expressive equivalenceof “2x” and “x + x” has nothing to do with a particular value of x, but rather derivessolely from the way these expressions are constructed, logical equivalence has nothingto do with the meaning of the particular propositions that are combined, but resultsentirely from the way they are combined, from the form rather than the content. The

4.2. LOGICAL EQUIVALENCE 37

propositions “x 6< y” and “x ≥ y” are equivalent, but not logically equivalent, sincetheir equivalence derives from the properties of the particular relation of order in thereal number system. (Don’t let the terminology confuse you. It is not that “x 6< y” and“x ≥ y” are not equivalent for all logical purposes - that is, for purposes of argument.They are, at least if x and y are real numbers and < denotes the usual order in the realnumber system. It is just that they are not equivalent for reasons that are purely intrinsicto the nature of logical expression. Additional assumptions are required.) Let us nowconsider some very important examples of logically equivalent forms that propositionsmay take.

4.2.1 Contrapositives

Consider the propositions “x > 2 ⇒ x2 > 4” and “x2 6> 4 ⇒ x 6> 2.” These twopropositions represent the general forms “p ⇒ q” and “¬q ⇒ ¬p,” respectively, withthe particular propositions “x > 2” and “x2 > 4” substituted for p and q, respectively.Each of these conditional propositions is called the contrapositive of the other. (Whythe term is not “contranegative” is anyone’s guess, but that’s the way it is!) No matterwhat propositions are substituted for p and q, contrapositives have the same meaning:stating that condition p ensures consequence q is the same as stating that the failure ofsaid consequence q ensures the failure of said condition p (since if p had not failed, qcould not have failed, whereas the satisfaction of q tells us nothing about p).

Different equivalent expressions are useful in different situations. For example, of theequivalent expressions “x2 − 4” and “(x + 2)(x − 2),” the former is more useful whencomputing the derivative of the function f defined by f(x) = x2 − 4, whereas the latteris more useful when finding the roots of f(x) = 0. Similarly, one of two contrapositivelyrelated propositions might be more useful in a particular proof, because these equivalentpropositions provide different hypotheses from which to work.

4.2.2 Rephrasing a conditional as a disjunction

At first glance, it might not seem that one could rephrase a conditional proposition as adisjunction, or vice-versa, because we tend to use these constructions in different senses.We generally use conditional propositions in a predictive sense. Suppose I say, “If youget a C or better on the third exam, you will pass the course.” Then, presuming thatI am not lying, when you get your third exam back having received, say, a B, you canpredict with certainty that you won’t fail the course. If this course was the last obstacleto graduation, you can safely tell your parents to make their hotel reservations! On theother hand, we typically use disjunctive propositions descriptively. When I say, “Either


you got a D or F on the third exam or you passed the course,” I am simply describingall the possibilities that might have occurred.

But notice that if this disjunction is stated in the future tense - “Either you will get aD or F on the third exam or you will pass the course” - it has exactly the same predictivevalue as the previous conditional proposition. If those are the only two possibilities, andat least one must be true, then clearly if you get a C or better (and hence do not get a Dor F), you will pass. It is also possible that both are true. Maybe I am being conservativein my prediction or just pushing you to get at least a C on the exam; perhaps you couldget a D and still pass. (In everyday usage, an unstated converse implication is sometimesimplicitly also intended, but in mathematics we do not recognize any propositions thatare not explicitly stated!) Similarly, if the previous conditional statement is stated in thepast tense - “If you got a C or better on the third exam, then you passed the course” -it takes on a descriptive sense, narrowing the possibilities to having either got less thana C on the third exam or passed the course, one of which must have occurred. It isalso worth noting at this point that both propositions exclude exactly the same singlepossibility: getting a C or better on the third exam and failing the course. (In everydayspeech, the use of a disjunction in the predictive sense seems to be used most often foremphasis when the consequence is negative, as in “You will be home by midnight or youwill be grounded for two weeks”!)

From a logical point of view, sense is irrelevant; only truth value is relevant. Thus theforms “p⇒ q” and “¬p or q” are logically equivalent, as I hope the preceding discussionhas convinced you. The assertion of each of these forms excludes the possibility that pholds and q does not, while allowing the other three (that both hold, that both fail, orthat p fails and q does not). I also hope that by informally considering the meaning ofexamples you are getting a feel for how logical language is used. The cartoons in Figure4.1 may further help make the point in a darkly humorous manner!

As with contrapositives, the disjunctive and conditional forms have different uses.As we will see shortly, a disjunction provides a natural division into cases (see Section6.6.4), whereas the conditional form provides a hypothesis on which to rest subsequentarguments.

4.2.3 Negation: Don’t just say no!

Often we must assert that something is not the case; usually it is more useful to do soin an affirmative manner, stating exactly what is then the case. A note on syntacticalconventions: the phrase “it is not the case” is assumed to apply to all that follows beforethe next punctuation mark. So, “It is not the case that p and q” has the same meaningas “¬(p and q)”, whereas “It is not the case that p, and q” has the same meaning as

4.2. LOGICAL EQUIVALENCE 39

Get me the moneyor you're dead!

If I don't get him the money,then I will be dead.

p or qhas the same meaning as

not p implies q

If you try anything funny, then I will kill you!

I'd better not try anything funnyor he will kill me.

p implies qhas the same meaning as

not p or q

Figure 4.1: Logic on the dark side!


“(¬p) and q.” In spite of these conventions, negative phrasing can be confusing, anotherreason to rephrase the proposition with all the negatives (if any) in the atomic componentpropositions. Our understanding of truth value shows us how to do this.

Considering a real number x, suppose it is not the case that x 6> 2. Then clearlyit is the case that x > 2. For any proposition, the negation of its negation is logicallyequivalent to the original proposition. Symbolically, “¬(¬p)” is logically equivalent to“p.”

Again considering a real number x, suppose it is not the case that x > 2 or x ∈ Z;then it is the case that x 6> 2 and x 6∈ Z. For any real number x, there are exactlyfour mutually exclusive possibilities: x > 2 and x ∈ Z, x > 2 and x 6∈ Z, x 6> 2 andx ∈ Z, or x 6> 2 and x 6∈ Z. The assertion that x > 2 or x ∈ Z includes the first threecases and excludes the fourth; the negation of this assertion, then, includes exactly thefourth: x 6> 2 and x 6∈ Z. In general, given any two propositions p and q, there are, apriori, exactly four mutually exclusive possibilities: p and q, p and (¬q), (¬p) and q, or(¬p) and (¬q). (I say “a priori” because a mathematical relationship between particularpropositions p and q might preclude some of them. For example, the case that x > 2and x 6> 1 does not occur.) The assertion “p or q” includes the first three and excludesthe fourth. So if it is not the case that p or q, then it is the case that (¬p) and (¬q).Symbolically, “¬(p or q)” is logically equivalent to “(¬p) and (¬q).”

Since “¬(¬p)” is logically equivalent to “p,” it is now easy to see, working backwards,that the negation of “p and q” is “(¬p) or (¬q).” Since “p ⇒ q” is logically equivalentto “(¬p) or q,” its negation is logically equivalent to “p and (¬q).” If we are thinking ofthe assertion“p⇒ q” in the predictive sense, then its negation “p and (¬q)” asserts thecase in which this predictive device does not work.

Example. If it is not the case that 2 < x < 5 (shorthand for 2 < x and x < 5), then itis the case that 2 ≮ x or x ≮ 5.

Example. If it is not the case that x2 = x + 2 ⇒ x =√x+ 2, then it is the case

x2 = x+ 2 and x 6=√x+ 2.

For complicated propositions, we just work our way in:

Example. Suppose it is not the case that if 2 < x < 5, then x2 < 17 or x2 > 8. Thenit is the case that 2 < x < 5 and x2 6< 17 and x2 6> 8.

Remark. If x is confined to be an integer, then it is the case that if 2 < x < 5, thenx2 < 17 or x2 > 8.

4.3. QUANTIFIED PROPOSITIONS 41

Let us pause to observe a fundamental difference in the nature of conjunction, on theone hand, and that of disjunction or implication, on the other. A conjunction of closedpropositions whose truth value we know provides a useful summary by asserting twofacts together in a neat package, as in 2 < 5 and 5 < 7. (Informally, 5 lies between 2 and7.) But a disjunction of two propositions whose truth value we know is not really useful,because it asserts less than we really know. Although 2 < 5 or 5 < 7 is certainly true, itwould be better to assert the stronger proposition stated above. And although 2 < 5 or5 < 2 is also certainly true, it would be better simply to assert the stronger propositionthat 2 < 5; throwing 5 < 2, which we know to be false, into the mix is a distraction thatresults in a loss of information. Rephrasing either of these disjunctions as an implicationalso seems silly: in the statement 2 6< 5 ⇒ 5 < 7, the condition is meaningless, sincethe conclusion that 5 < 7 is unconditionally true, and in the statement 2 6< 5 ⇒ 5 < 2,the false conclusion is meaningless, since the condition is patently false. Disjunctionsand implications only come into their own when they apply to propositions whose truthvalue we don’t know, because in this context they define relationships between theseunknowns. In particular, disjunctions and implications of open propositions are valuablewhen they apply universally, either to all mathematical objects or, more commonly, toall the elements of some set. (They are most useful when the collection of objects towhich they apply is not finite, since for a finite collection we can check each instance, atleast in theory.) Thus, we come naturally to the subject of quantification.

4.3 Quantified Propositions

The logical operations indicated by phrases such as “there exists” and “for all” arecalled quantifiers because they indicate the quantity of values for which a proposition isstipulated to hold, not in numerical terms, but in terms of sets. When all of the variableson which the truth value of an open proposition depends are quantified, it becomes aclosed proposition. For example, the truth value of the proposition that x ≥ 0 dependson the value of x, but the proposition that, for all natural numbers x, x ≥ 0, is true, asis the proposition that there is a real number x such that x ≥ 0, whereas the propositionthat for all real numbers x, x ≥ 0, is false. Thus, quantification of a variable is alogical operation that removes the dependence of truth value on the specific value ofthat variable. Instead, the truth value depends on the manner in which the variable isquantified. Quantification creates a proposition with some degree of generality. Since theability to make general claims lies at the heart of mathematics, understanding quantifiersis absolutely crucial for any mathematics student.

Although, as with other logical operations, phrasing may vary somewhat when propo-


sitions are expressed in ordinary language, there are only two types of quantifiers: ex-istential (represented by “there exists”) and universal (represented by “for all”). Theymay be denoted by the symbols ∃ and ∀, respectively. An existential quantifier is alwaysfollowed by a property satisfied by the object that is proposed to exist; this propertyis introduced by a phrase such as “such that,” or “for which.” An existential quantifiermay also be further specified by a set to which the proposed object must belong; thisadditional specification just amounts to conjoining an additional property that the ob-ject must satisfy. A universal quantifier may be unrestricted, but is often followed bya specified set to which the quantifier applies. The application of a universal quantifiermay also be further restricted to elements of that set satisfying a particular property,which just amounts to separating the elements satisfying that property as the set towhich the quantifier applies. Similarly, a condition is imposed in a universally quantifiedproposition amounts to separating elements satisfying the condition as the set to whichthe quantifier applies. Thus, the following (true) propositions are synonymous:

• For all natural numbers x, x ≥ 0.

• ∀x ∈ N, x ≥ 0.Parentheses are sometimes used instead of a comma: (∀x ∈ N)(x ≥ 0).

And they are logically equivalent to:

• ∀x, if x ∈ N, then x ≥ 0.

The following (true) propositions are synonymous:

• For all real numbers x such that x ≥ 0, x2 − x− 2 = 0⇔ x =√x+ 2.

• ∀x ∈ R such that x ≥ 0, x2 − x− 2 = 0⇔ x =√x+ 2.


• (∀x ∈ R)(x ≥ 0 ⇒ (x2 − x − 2 = 0 ⇔ x =√x+ 2)). Note that the former

versions might be preferable, however, since they don’t require delimiters (such asparentheses) to avoid ambiguity. In addition, the former versions are more natural,since the general background condition x ≥ 0 is set forth in what amounts to asort of preamble, where it really belongs.

The following (true) propositions are synonymous:

4.3. QUANTIFIED PROPOSITIONS 43

• There exists a real number x such that x ≥ 0.

• ∃x ∈ R, x ≥ 0. Parentheses or the phrase “such that” may replace the comma.


• (∃x)( x ∈ R and x ≥ 0).

The familiar subset relationship between sets is defined precisely by means of a uni-versally quantified proposition. Consider two arbitrary sets A and B. We say that A isa subset of B, denoted A ⊆ B, if every element of A is also an element of B. Formally:

Definition. A ⊆ B if and only if (∀x)(x ∈ A ⇒ x ∈ B). (Equivalently, (∀x ∈ A)(x ∈B).)

Quantifiers may be combined with other logical operations; in particular, we mustoften consider their negations. For example, the negation of the (false) proposition thatfor all real numbers x, x ≥ 0, is the (true) proposition that there exists a real number xsuch that x < 0. The negation of the (true) proposition that there exists a real numberx such that x ≥ 0 is the (false) proposition that for all real numbers x, x < 0. In general,the proposition that all elements of a certain set have a certain property is negated bythe assertion that some element of that set fails to have it; that is, some element ofthe specified set satisfies the negation of the specified property. The proposition thatsome element of a certain set has a certain property is negated by the assertion that allelements of that set fail to have it; that is, all elements of the specified set satisfy thenegation of the specified property.

Negating a proposition with multiple quantifiers is a bit like peeling an onion: youhave to work your way from the outside in. This procedure may take a bit of practice!

Example. What is the negation of the (true) proposition that for every real number y,there is a real number x such that x < y?

Solution. The original proposition asserts a property universal to all real numbers y;therefore, its negation must assert the existence of a real number y that is an exception.The universal property asserted by the original proposition is that a real number x existswith a particular property relative to the given number y; therefore, if the real numbery is an exception, no number x having this particular property relative to y must exist.To say that no real number x with a particular property exists is to say that all realnumbers x fail to have it. The particular property in question is that x < y; therefore, tosay that x fails to have this property is simply to say that x ≮ y. Thus the negation of


the proposition that for all real numbers y, there is a real number x such that x < y, isthe (false) proposition that there exists a real number y such that for every real numberx, x ≮ y. The pattern is clearly evident when the statements are written symbolically:

The negation of the statement

(∀y ∈ R)(∃x ∈ R)(x < y)

is the statement(∃y ∈ R)(∀x ∈ R)(x ≮ y).

Remark. In Chapter 5 we will discuss the trichotomy property of order in the real numbersystem, from which it follows that x ≮ y ⇔ x ≥ y. However, the assumption oftrichotomy for this collection of relationships among real numbers lies outside the natureof logic: it is by no means logically necessary for a collection of relationships betweenelements of a set to satisfy trichotomy. For example, the divisibility relationships amongnatural numbers do not: 2 does not divide 5, 2 is obviously not equal to 5, and 5 doesnot divide 2, either.

Remark. A relationship between elements of a set, such as the order relationship betweenreal numbers, is called a relation. We will examine relations formally in Chapter 8.Because it is a relationship between sets, rather than between the elements of a specificset, the subset relationship is not considered a relation.

In general, suppose p(x, y) denotes an open proposition whose truth value dependsonly on the values of x and y, and ¬p(x, y) denotes the negation of p(x, y). The negationof the statement

(∀x ∈ A)(∃y ∈ B) (p(x, y))

is the statement(∃x ∈ A)(∀y ∈ B) (¬p(x, y)) .

Remark. This process may be continued indefinitely with more quantifiers, but state-ments with a large number of quantifiers get cumbersome and should be formulated instages, giving names to properties that are defined by intermediate steps.

4.4 Exercises

1. Rephrase each implication as a logically equivalent disjunction. Do not use thephrase “it is not the case”; move any negations into the atomic components ofyour answer.

4.4. EXERCISES 45

(a) If x > 2 then x2 > 4.

(b) x < 2⇒ x2 < 4.

(c) x2 < 4⇒ x < 2

(d) If −2 < x < 2, then x2 < 4.

2. Write the contrapositive of each proposition in the previous problem.

3. For each of the propositions in the previous problem:

• Decide if the proposition is true for all real numbers, true for some, but notall, real numbers, or false for all real numbers.

• For those propositions that are true for some, but not all, real numbers, givean example of a number for which the proposition is true and an example ofa number for which it is false.

• Write the negation of the proposition. Do not use the phrase “it is not thecase”; move any negations into the atomic components of your answer.

4. Rephrase each disjunction as a logically equivalent implication. Do not use thephrase “it is not the case”; move any negations into the atomic components ofyour answer.

(a) x < 2 or x > 3

(b) x < 2 or x2 > 4

(c) x > 2 or x2 > 4

(d) x > 2 or x2 < 4

5. There are two fundamental choices for rephrasing a disjunction as a logically equiv-alent implication, as in the previous problem, depending on which component onenegates. What do you notice about them?





• Write the negation of the proposition. Do not use the phrase “it is not thecase”; move any negations into the atomic components of your answer.

7. Write the negation of each of the following propositions without using the phrase“it is not the case”; move any negations into the atomic components of your answer.

(a) 2 < x < 3

(b) x > 0 and x2 = 4


(a) x > 0, and x < 2 or x > 5.

(b) x > 0 and x < 2, or x > 5.

(c) 2 < x < 3⇒ 4 < x2 < 9.

(d) If x < 2 or x > 4, then x2 6= 9.





(a) x > 0⇒ (x2 = x+ 2⇒ x =√x+ 2).

(b) (x > 0⇒ x2 = x+ 2)⇒ x =√x+ 2.



4.4. EXERCISES 47


12. For each proposition below:

• Decide if it is true or false.

• Write its negation without using the phrase “it is not the case”; move anynegations into the atomic components of your answer.

(a) Every real number is greater than 3 or less than 4.

(b) Every real number is greater than 3 and less than 4.

(c) Some real number is greater than 3 or less than 4.

(d) Some real number is greater than 3 and less than 4.




(a) For every real number x, there is a real number y such that y < x.

(b) For some real number x, there is a real number y such that y < x.

(c) For every positive integer x, there is a positive integer y such that y < x.

(d) For some positive integer x, there is a positive integer y such that y < x.




(a) For every positive real number x, there exists a real number y such that, forany real number z, z < y ⇒ 2z < x.

(b) For every positive real number x, there exists an integer y such that, for anyreal number z, z < y ⇒ 2z < x.


(c) For every positive integer x, there exists a positive integer y such that, forany real number z, z < y ⇒ 2z < x.

(d) There exists a real number y such that, for every real number x and everyreal number z, z < y ⇒ z < x.

Chapter 5

The Real Number System

In which we logically develop the properties of the number systems used to measure con-tinuous and discrete quantities.

Set theory and logic establish the basis for mathematical thought. We will discussthem more formally and systematically in the following chapter. Mathematics beyondthese realms generally concerns the elements of some set or sets with particular proper-ties, for example, real numbers (as in algebra), natural numbers (as in number theory),or points or lines in some geometry (whether planar, spherical, on some other surface,or in a higher dimension). As we will see, the relations among the elements of thesesets, such as the association of a sum and product to each pair of real numbers or theincidence relation between points and lines, which establishes which lines pass throughwhich points, are also given by sets with particular properties. The properties the setsin question are assumed to have, called axioms, are essential to proving results aboutthem. In order to write proper, rigorous proofs, we must know and state precisely andexplicitly what these properties are. Many properties, such as associativity in variousalgebraic systems, recur in different contexts, so we name them in order to systematizeour knowledge and better see parallels and connections. It is also important that thebasis for our axioms be examined, so that it doesn’t seem as if we are just playing gameswith rules made up out of thin air.

We will begin with careful study of the real and natural number systems, which arenot only central to mathematics, but also to science and everyday life. Real numbers areused to measure continuous quantities such as length, area, volume, and mass. Naturalnumbers are used to count discrete things. (A note on English usage: The word “fewer”,not the word“less,” is used in comparing discrete quantities, as in, “I have fewer cookiesthan you have.” The word “less” is reserved for continuous quantities, as in, “My house

49

50 CHAPTER 5. THE REAL NUMBER SYSTEM

has less space than yours.” Using these words properly helps emphasize and clarify whatkind of numbers we are talking about.) You have undoubtedly had much experiencesolving algebra problems involving real numbers, yet you have probably not provenmany of the results you applied. You also regularly use many basic facts about thenatural numbers, such as factorization into primes, when doing arithmetic. (Think aboutreducing fractions, for example) Yet you probably have not thought hard about whatgeneral properties you are using, why they are true, or in what ways they extend or failto extend to other elementary systems with operations, such as polynomials. In yourfuture classes, you will have opportunities to study the theories of geometry, advancedalgebra, and perhaps advanced number theory, but the basic algebra and arithmetic withwhich you are familiar will likely be assumed throughout the remainder of your educationwithout discussion. One day not too long from now, many of you will teach these basicsubjects. This course may be the only opportunity you have to thoroughly study thefoundation on which they are built.

The natural numbers are, as we all know, a subset of the real numbers. On the onehand, this makes them more limited; certain operations, such as division and subtraction,cannot always be carried out. On the other hand, the fact that the natural numbersare a smaller (but still infinite) set with a very special structure makes it possible todefine some interesting concepts, such as primeness, that do not make sense for realnumbers in general (since every real number is divisible by every other real number exceptzero) and to prove some fascinating assertions about them. For historical reasons, thetheory of the natural number system is called, simply, number theory. Many theoremsin number theory are easily stated but surprising. Some simply-stated conjectures haveonly recently been proven, and others remain unproven to this day, centuries after theywere first formulated. The existence of such intricate and elusive patterns in a numbersystem that arises simply from counting is a mystery that has fascinated people sincethe earliest times.

In developing these number systems, we have two options. One is to start with thenatural numbers and build, first the integers, then the rational numbers, and then thereal numbers out of them. The other is to start with the real number system as a wholeand define the natural numbers as a specific subset. The integers and rational numberscan then be defined as subsets of R based on the natural numbers: the integers consist ofthe natural numbers and their additive inverses; the rational numbers consists of thosereal numbers equal to the ratio of two integers. Formally, Z = N ∪ {−n : n ∈ N}, andQ = {m

n: m ∈ Z and n ∈ Z}. We will follow the convention that 0 ∈ N; hence 0

need not be included separately in the set of integers. Humans seem to have a built innatural intuition, much like our language ability, about both continuous quantities, suchas length and area, and discrete quantities, such as numbers of objects, and it is not

5.1. ADDITION AND MULTIPLICATION OF REAL NUMBERS 51

clear that either approach is better. Both approaches have their virtues, and we will infact discuss both of them. A major virtue of starting with the real number system as awhole is that from a logical point of view this approach is faster, simpler and easier. Sowe’ll take it first.

5.1 Addition and Multiplication of Real Numbers

The set of real numbers, R, comes with two binary operations, addition and multiplica-tion. It is these operations that make R into a number system (as opposed to just a set).We will not define these operations; we will simply assume they exist and obey certainaxioms. (Later we will define them in terms of set operations.) These operations areabstractions of our experience working with and thinking about continuous quantities,and it is this experience that provides the basis for the axioms we will assume aboutthem.

Addition reflects our experience with combining like quantities. If we place a 7 footboard end-to-end with a 3 foot board, they extend a length of 10 feet. If we saw a 3foot board from a 10 foot board with a saw that creates a kerf .01 feet wide, we are leftwith a 6.99 foot board. It requires 2 +π meters of string to border a semicircle of radiusone meter. If we pour ten liters of water into a tub and then remove the amount neededto fill a hemispherical bowl of radius 10 centimeters, we have 10 − 4π

3liters left in the

tub. When we first learn addition, we generally think of combining discrete quantities,as when we combine seven chocolate chip cookies with three oatmeal cookies to obtainten cookies. This interpretation makes sense for integers and, more generally, for integermultiples of any common unit, but not for sums such as

√2 +√

3. (The numbers√

2and√

3 are called incommensurable: no matter how each number is divided into equalparts, the parts have to be a different size, so there is no common unit with which they

can be measured. If there were,√

23

would be rational, which it demonstrably is not.

See if you can prove that!) Integers represent whole units of a quantity, but continuousquantities such as distance or mass occur in all amounts in between. Exactly what ismeant by “all amounts in between” will be discussed in Chapter ??. To understand thephrase “in between,” we need to understand order, and to understand “all amounts,”we need to understand a special property of order in the real number system called theleast upper bound property. (The fact that the square roots of any rational number arein the real number system is due to this property.)

Multiplication reflects our experience with several independent dimensions. If we usea 15 kilowatt bulb for three hours, we have consumed 45 kilowatt-hours of power. If wetravel at 100 kilometers per hour for 3 hours, we have traveled 300 kilometers. Near the


surface of the earth, objects under the influence of gravity accelerate at approximately−9.8 meters per second per second, that is, −9.8 meters/second2. (The accelerationis negative on account of the convention that the positive direction is up.) If an objectstarts at rest and falls for 2 seconds, it will be traveling at −19.6 meter/second. The forcerequired to lift an object with a mass of 2 kilograms is 19.6 kilogram-meters/second2;the unit of a kilogram-meter/second2 is called a newton. If you lift the object a distanceof 3 meters, you have done 58.8 newton-meters of work; if you lower it by 3 meters, youhave done −58.8 newton-meters of work. By exerting a negatively directed force on anobject moving in a negative direction, gravity did positive work. If a rectangular field is10 meters long and 2

√3 meters wide, it has an area of 20

√3 meters2. (Note: Insistence

on right-angled corners - that is, square corners - when multiplying linear dimensions ismerely a convenient convention; any angle between the dimensions would do as long aswe did not vary it, but our units of area would no longer be square units.) When wefirst learn to multiply, we generally think of repeated addition of a discrete quantity, aswhen we combine three bags of cookies containing five cookies each to obtain 15 cookies.This interpretation only makes sense for integers, but we can think of the quantities ofbags and cookies as representing independent dimensions, obtaining a perspective that isconsistent with our continuous general interpretation: 3 bags at 5 cookies per bag yields15 cookies.

A binary operation associates to each pair of elements of a set, real numbers in thiscase, an element of the same set, often referred to as the result of the operation. In thecase of addition, this result is called the sum of the pair; in the case of multiplication, itis called their product. Every pair of real numbers has a sum and a product. Given tworeal numbers a and b, their sum is denoted by a+ b and their product by a · b or, whenno confusion will result, simply by ab. Because the sum and product are determineduniquely by the numbers themselves, the variables used to represent the numbers do notaffect the result: if a = b and c = d, then a+ b = c+ d and ab = cd. It is not necessaryto give written justification in a proof when making a substitution of this type.

The qualities above constitute the general definition, in non-technical language, ofa binary operation. Since that term captures these qualities, we won’t list them asseparate axioms. We now list the additional properties of addition and multiplication ofreal numbers.

Axioms of Addition and Multiplication in the Real Number System

There exist binary operations + and · on R, such that:

1. These operations are associative:

5.1. ADDITION AND MULTIPLICATION OF REAL NUMBERS 53

(a) ∀x, y, z ∈ R, (x+ y) + z = x+ (y + z).

(b) ∀x, y, z ∈ R, (x · y) · z = x · (y · z).

2. These operations are commutative:

(a) ∀x, y ∈ R, x+ y = y + x.

(b) ∀x, y ∈ R, x · y = y · x.

3. Each operation has a distinct identity element :

(a) There exists an element 0 ∈ R such that, ∀x ∈ R, x+ 0 = x.

(b) There exists an element 1 ∈ R such that 1 6= 0 and, ∀x ∈ R, x · 1 = x.

4. All possible inverses exist:

(a) For each x in R, there exists a y in R such that x+ y = 0.

(b) For each x in R different from 0, there exists a y in R such that x · y = 1.

5. The operation · distributes over +:∀x, y, z ∈ R, x · (y + z) = (x · y) + (x · z).

These axioms are motivated by our physical experience of numbers. The basis foreach can be readily illustrated. For example, the Figure 5.1 illustrates the basis for thedistributive property (Property 5).

Now that we have precise axioms and definitions, we can prove the fundamentalproperties of algebra in the real number system. That is exactly what you will do,with your instructor’s help, in the following exercises! In the following chapter, withthe benefit of this practical experience, we will systematically examine the approachesintroduced here.

In a genuine, complete, rigorous proof, every assertion must be justified: if it hasnot been assumed to be true, we must show that it is logically forced to be true bythose propositions that have either been assumed to be true (the axioms) or previouslyjustified. We may use the meaning of logical operations to replace propositions withsynonymous propositions, and our definitions allow us to make additional replacementsof propositions with synonymous propositions. That’s it! Period! Do not assert anythingelse in a proof, no matter how obvious it seems! It may be true, and it may be obvious,but you must explicitly demonstrate that it logically follows from assertions that havebeen already established. (Otherwise you have just written a sketch of the proof at best,and worse yet, you may have asserted something false, in which case you don’t even havea correct sketch. Lot’s of seemingly obvious things turn out to be false!)


y

x(y+z) = xy + xz

x

z

xy xz

Figure 5.1: Illustration of the basis for the distributive property.

5.2 Exercises

1. Prove there is the unique element y ∈ R such that ∀x ∈ R, x+y = x. (Property 3aestablishes that such an element exists. Suppose y′ ∈ R also satisfies the propertythat ∀x ∈ R, x+ y′ = x. Consider y + y′ to prove that y = y′.)

The uniqueness of the element with this property justifies our designating it by thespecial symbol 0; there is no ambiguity regarding to which element 0 refers.

2. Similarly, prove there is a unique element of y ∈ R such that ∀x ∈ R, x · y = x.

The uniqueness of the element with this property justifies our designating it by thespecial symbol 1.

3. Let x ∈ R be a fixed number. Prove there is a unique element y ∈ R with theproperty that x+ y = 0. (Suppose x+ y = 0 and x+ z = 0. Show that y = z.)

We define the additive inverse of x to be this unique number associated to x anddenote it by −x. We define subtraction by x− y = x+ (−y).

5.2. EXERCISES 55

4. Similarly, given x 6= 0 ∈ R, prove there is a unique element y ∈ R such that xy = 1.

We define the multiplicative inverse of x to be this unique number associated tox and denote it by 1

xor, extending the properties of exponential notation, by

x−1. (We will defer further discussion of exponential notation until Section 7.3, onrecursive definition.) We define division by

x÷ y = x/y =x

y= x · 1

y.

Notice the several notations commonly used for the result of division. We willgenerally use the second or third notation. The last expression in this chain ofequalities gives the definition, in terms of the multiplicative inverse and the alreadypostulated operation of multiplication. Note also that, for any y 6= 0 ∈ R,

1÷ y = 1 · 1

y,

by definition, where 1y

means the multiplicative inverse of y, and in turn

1 · 1

y=

1

y,

by the multiplicative identity property of 1, so our third notation for division of1 by another number is consistent with the notation we previously chose for themultiplicative inverse. (This consistency is important! If the same symbol couldrepresent two distinct numbers, our meaning would not be clear from the contextin many situations.)

5. Prove that for any x ∈ R, 0 · x = 0. (Hint: use the distributive property.)

6. Conclude from the preceding result that 0 does not have a multiplicative inverse.(The axioms do not assert this. They guarantee that there is a multiplicativeinverse for any number other than zero, but not the converse proposition thatthere is not one for zero. The proof requires the fact that 1 6= 0, asserted as partof Property 3b.)

7. Prove the converse of Exercise 5, concluding that for any real numbers x and y,xy = 0⇔ (x = 0 or y = 0). (To construct the converse of Exercise 5, rephrase it asfollows: if x = 0 or y = 0, then xy = 0. This statement is slightly stronger than theresult of Exercise 5, but is obtained from it simply by applying the commutative


property of multiplication. The proposition to be proved in this exercise, then, isthat if xy = 0, then x = 0 or y = 0. Hint: Rephrase the conclusion as x 6= 0 ⇒y = 0.)

8. Prove that {x ∈ R : (x− a)(x− b) = 0} = {a, b}.

9. Prove that for all real numbers x and y, x = 0⇔ x+ y = y.

10. Prove that for all real numbers x and y, (x = 1 or y = 0)⇔ xy = y.

11. Prove that (−1)x = −x. Here we have omitted the obvious implied quantifier(∀x ∈ R) for the sake of brevity, as is common practice. (The left side of theequation denotes the product of x and the additive inverse of 1; the right sidedenotes the additive inverse of x. These are not a priori the same number. Thatthey are requires proof. )

12. Prove that xy· wz

= xwyz

. (Hint: use the commutative property.)

13. Prove that xy

+ wz

= xz+ywyz

. (Hint: use the distributive property.)

5.3 The Ordering of the Real Numbers

The real numbers also come with an order relation. Any two distinct real numbersare comparable, and the result of that comparison is denoted by placing the symbol <between them in the correct order. As with the operations of addition and multiplication,we will not define how to compare numbers, only assume that the order has certainproperties motivated by the comparison of physical quantities. (We will later definethe order relation in the course of building up the number systems from the naturalnumbers.) By definition, y > x means x < y and x ≤ y means x < y or x = y. Also,x < y < z means x < y and y < z.

The defining properties of an order relation are:

• Anti-reflexivity: ∀x ∈ R, x 6< x (nothing is less than itself); and

• Transitivity: ∀x, y, z ∈ R, x < y and y < z ⇒ x < z.

From anti-reflexivity and transitivity we can deduce a third property satisfied by anyorder:

• Anti-symmetry : ∀x, y ∈ R, x < y ⇒ y 6< x.

5.3. THE ORDERING OF THE REAL NUMBERS 57

Proof. By way of contradiction, suppose there exist numbers x and y such thatx < y and y < x. By transitivity (one of the assumed properties of order) it followsthat x < x. But this contradicts anti-reflexivity (the other assumed property oforder). We therefore conclude that no such x and y exist; hence, ∀x, y ∈ R,x < y ⇒ y 6< x.

Anti-reflexivity and transitivity (and hence anti-symmetry) are properties of anyorder relation by definition (they go with the territory, so to speak), so we won’t listthem as separate axioms. However, one can conceive of order relations in which notevery two elements of the set being ordered can be compared. A standard example isthe ordering of a family of sets by proper inclusion: Let C be any family of sets (such asthe power set of a given set). If A, B ∈ C, define A ⊂ B to mean that A is a subset ofB but not equal to B. A is said to be a proper subset of B. You can readily check thatproper inclusion is anti-reflexive (by definition, since proper means that the sets are notequal) and transitive, so it is an order relation. However, it is certainly possible that,given two distinct sets in our family, neither one is a proper subset of the other. Orderrelations in which some pairs of elements are not comparable are called partial. Those inwhich every element is comparable to every other (comparability holds) are called total.

To avoid having to refer separately to the algebraic and order properties, we willbegin numbering the order properties where we left off.

Axioms of the Order of Real Numbers with Respect to the Operations

There is a total order relation < on the real number system with the following prop-erties:

6. The operations respect the order relation:

(a) If x > y, then x+ z > y + z.

(b) If x > y and z > 0, then x · z > y · z.

7. The order relation < has the least upper bound property.

The least upper bound property is the one that establishes the existence of “everyamount in between.” It is needed to show the existence of irrational numbers such as

√2

and π. All of the integers are obtained from 0 and 1 using only addition and the existenceof additive inverses. All of the rational numbers are obtained from the integers usingonly multiplication and the existence of multiplicative inverses. However, no irrational


numbers can be proven to exist using only the algebraic axioms. The least upper boundproperty is somewhat complicated to define; we will define and discuss it in Chapter??, after we have thoroughly discussed order relations. It is not needed for algebraiccalculations or comparisons, so for now we will not use it.

You are probably familiar with the following property from your high school algebraclasses. Now we see that it can be proven from the properties we have assumed.

Theorem (Trichotomy Theorem). ∀x, y ∈ R, exactly one of the following is true: x = y,x < y, or x > y.

Proof. Since < is a total order, we know one of the statements must be true. We needto show that, in each case, the other two statements cannot hold.

Case 1: x = y. Then, by anti-reflexivity, x 6< y and y 6< x.

Case 2: x < y. By anti-symmetry, x 6> y. By anti-reflexivity (rephrased contra-positively), x 6= y.

Case 3: x > y. Similar to Case 2.

Note that the Trichotomy Theorem holds for any total order that might be definedon any set; we have not used anything special about R. (We did not need axiom (6) or(7).)

We have discussed operations and order relations in non-technical language. We willlearn shortly that relations can be defined in terms of set theory, and that order relationsare just those special relations satisfying anti-reflexivity and anti-transitivity. We willalso learn that functions are special relations satisfying a certain property, and thatbinary operations are special types of functions.

Finally, we can now define what it means for a number to be positive or negative:

Definition. A real number x is positive if x > 0; a real number x is negative if x < 0.

We can summarize the second part of Axiom (6) by saying that multiplication bya positive number preserves order. We will prove (and therefore need not state as anaxiom) that multiplication by a negative number reverses order. (Multiplication by zeroloses the order information by making both sides equal to zero.)

5.4. EXERCISES 59

5.4 Exercises

Assume Axioms (1)-(6) given above and all previously proven results. Prove each of thegiven propositions.

1. ∀x, y ∈ R, x < y ⇔ y − x > 0. (That is, x is less than y if and only if y − x ispositive, where positive means greater than zero, as stated in the definition above.)

2. ∀x ∈ R, x > 0⇔ −x < 0. (That is, a number is positive if and only if its additiveinverse is negative. Since −(−x) = x, it follows that a number is negative if andonly if its additive inverse is positive.)

3. ∀x, y ∈ R, x < y ⇔ −x > −y.

Remark. This result incorporates and generalizes the preceding one, since −0 = 0.Its proof is just a generalization of the previous proof as well.

4. ∀x, y, z ∈ R, (x < y and z < 0)⇒ zx > zy.

5. (a) The product of two positive numbers is positive.

(b) The product of a negative number with a positive number is negative.

(c) The product of two negative numbers is positive.

We can summarize these results as follows: the product of two numbers is positiveif and only if both are positive or both are negative. (We know from a previousresult that if one is negative and the other positive, the resulting product cannotbe zero, so it must be negative.)

6. ∀x ∈ R, x 6= 0 ⇔ x2 > 0. (In English, the forward implication says that thesquare of a nonzero real number is positive. For the converse implication, use thecontrapositive.)

7. −1 < 0 < 1.

8. A number and its multiplicative inverse are either both negative or both positive.

9. (a) ∀x, y ∈ R, 0 < x < y ⇔ 0 < 1y< 1

x.

(b) ∀x, y ∈ R, 0 > y > x⇔ 0 > 1x> 1

y.

(To prove the converse implications, just apply the implication you already proved,using the obvious fact that 1

( 1x)

= x. Make sure this fact really is obvious to you!)


10. ∀w, x, y, z ∈ R, (w < y and x < z)⇒ w + x < y + z.

Is the converse true? If so, prove it. If not, give a counterexample.

11. The sum of two positive numbers is positive and the sum of two negative numbersis negative.

12. Between any two real numbers, there is another real number. Formally, ∀x, z ∈ R,∃y ∈ R such that x < y < z. (Hint: Prove that the obvious candidate, y = x+z

2,

does the job.)

Chapter 6

Set Theory & Proof Structure

In which, now that we have had some practical experience, we investigate some fundamen-tal philosophical issues, introduce the most basic axioms of set theory, and systematicallyconsider the structure of a proof.

6.1 The Axiom of Extensionality

The same set may, as we have seen, be described in different ways. What makes twosets the same is that the same elements belong to them. This basic premise is formallyexpressed as an axiom of set theory:

Axiom (Axiom of Extensionality). For any sets A and B, if (∀x)(x ∈ A ⇔ x ∈ B),then A = B.

The name of this axiom comes from the notion that sets are determined by the“extent” of what they contain. One might reasonably argue that this is a definition.That it is an axiom reflects the philosophical perspective that we cannot define what itmeans for things to be equal, but only propose what properties sufficiently characterizethem to determine if they are the same. (For example, mountain lions maintain exclusiveterritories, keeping all other lions out; therefore, if two lions are seen in the same placeand no lions have died, they must be the same.) The Axiom of Extensionality declaresthat sets are determined by their elements. Note that we need not include the converseimplication, since it is in the nature of logical expression that if A = B, then (∀x)(x ∈A ⇔ x ∈ B): if A = B, then x ∈ A and x ∈ B are simply synonymous. If two thingsare equal, any proposition about one is synonymous with the proposition obtained bysubstituting the other.

61

62 CHAPTER 6. SET THEORY & PROOF STRUCTURE

It is clear from the definition of the subset relationship that the proposition thatA = B is synonymous with the proposition that A ⊆ B and B ⊆ A. It is sometimes easierto prove two sets are equal by proving each of these conjoined component propositionsseparately.

6.2 Operations on sets

For any two sets A and B, it certainly makes sense to define the set of all elements thatbelong either to A or to B (possibly both, as we never use “or” in the exclusive sensewithout specifically saying so). The resulting set, called the union of A and B, dependsonly on the sets A and B:

Definition. A ∪B = {x : x ∈ A or x ∈ B}

Note that it is impossible to specify a “universal” set in which the elements of A∪Bare to be found, since it is impossible to know in advance what elements the sets A andB might contain.

Remark. The proposition that x /∈ A ∪ B means it is not the case that x ∈ A ∪ B,which by definition means it is not the case that x ∈ A or x ∈ B. Rephrasing thisproposition in a more useful, affirmative manner, with all the negations moved into theatomic components, we have that x /∈ A and x /∈ B.

There are two other important operations defined on sets:

Definition. Given sets A and B, A ∩B = {x : x ∈ A and x ∈ B}

The set A ∩B is called the intersection of A and B.

Remark. The proposition that x /∈ A ∩ B means it is not the case that x ∈ A ∩ B,which by definition means it is not the case that x ∈ A and x ∈ B. Rephrasing thisproposition in a more useful, affirmative manner, with all the negations moved into theatomic components, we have that x /∈ A or x /∈ B.

Remark. Equivalently, we may write A ∩ B = {x ∈ A : x ∈ B}, taking set A as ouruniverse from which to choose those elements x that are also in set B. Similarly, we maywrite A ∩B = {x ∈ B : x ∈ A}.

Definition. Given sets A and B, A \B = {x : x ∈ A and x 6∈ B}

The set A \B is called the complement of B in A. The set (A \B)∪ (B \A) is calledthe symmetric difference of A and B, commonly denoted by A4B or AB.

6.3. THE AXIOM OF RESTRICTED COMPREHENSION 63

Example. The proposition that x ∈ A \ (B ∪ C) is equivalent to x ∈ A and x /∈ B andx /∈ C. Note that the parentheses are essential in the first proposition but not needed inthe second, since conjunction is an associative operation.

Example. The proposition that x ∈ A \ (B ∩ C) is equivalent to x ∈ A and (x /∈ B orx /∈ C). Note that the parentheses are essential in the both propositions.

The operations of union and intersection can be defined more generally, which isnecessary in order to consider the union or intersection of infinitely many sets. If wecould only perform these operations on two sets at a time, we could only create theunion of finitely many sets, no matter how many times we iterated this process. (Theterm “finite” can be precisely defined, but we will be satisfied with its intuitive meaningfor purposes of this discussion.) The following definitions generalize these operations toany collection of sets.

Definition. Given a set A (whose elements are assumed also to be sets),⋃A∈A

A = {x : x ∈ A for some A ∈ A}.

Definition. Given a set A (whose elements are assumed also to be sets),⋂A∈A

A = {x : x ∈ A for every A ∈ A}.

6.3 The Axiom of Restricted Comprehension

Until about a century ago, mathematicians did not worry about making general defini-tions of sets, such as the ones we just made, whenever it seemed convenient to do so.But then, in the early twentieth century, the philosopher and mathematician BertrandRussell discovered a danger. Recall that we informally described a set, not merely asa collection of things, but as an identifiable collection of things. What this descriptionmeans to suggest is that, for any mathematical object, the question of whether or not thisobject is in a given set should have an answer. We might not know the answer. We mightnot even know a procedure for determining an answer. But surely, for any reasonableconceptualization of sets and mathematical objects, the question should have an answer,and this answer should be exactly one of two choices: “yes” or “no”! Formally, given anydefined set A and any mathematical object x, x ∈ A should be a closed proposition: itis either true or false, and not both. What Russell discovered was a purported definitionfor a set A, along with an indisputably mathematical object x for which the statementx ∈ A does not have a unique truth value! (We say a “purported” definition because, if


it cannot be determined for all objects x whether or not x belongs to the set, then surelywe cannot say the set has actually been defined at all.)

Before exhibiting Russell’s troubling example, let us make Russell’s conceptualiza-tion of sets and mathematical objects more concrete: mathematical objects and sets are,simply, the same thing. It should come as no surprise that sets are mathematical objects.Indeed, if mathematics is the study of patterns, and if patterns are recognized by dis-tinguishing and classifying their parts, and if classifying things means putting them intocategories, and if defining categories is the same as defining the sets of things in thesecategories, then surely sets are mathematical objects! On the other hand, it may seem abit peculiar at first that the only mathematical objects are sets. After all, you are usedto working with such things as numbers and geometric points. Are these things sets?According to this point of view, yes, they are. That is, we can define sets that representthem, we can define the operations that apply to them as set operations, and we candefine the relationships between them as relationships between sets. In particular, thereis no reason to shy away from letting sets be elements of other sets, and indeed those arethe only kind of elements there are. We develop this perspective further in Chapter 6.

Note that a set of sets in not the same as the union of these sets, as illustrated bythe following examples:

•{{1, 2}, {2, 3}

}has two elements, but {1, 2} ∪ {2, 3} has three elements.

•{{1, 2, 3}, {2, 3, 4, 5, 6}

}has two elements, but {1, 2, 3} ∪ {2, 3, 4, 5, 6} has six ele-

ments.

• {N,Q} has two elements, but N ∪Q = Q has infinitely many elements, which canbe listed systematically.

• {N,Q,R} has three elements, but (N ∪ Q) ∪ R = Q ∪ R = R has infinitely manyelements, which cannot be listed.

Here is Russell’s example. Suppose we attempt to define a set A in the followingmanner:

A = {x : x 6∈ x}At first this proposed definition might seem reasonable: if X is a specific set, then

give sufficient information about the elements of X, such as that given by its definition,we should be able to check whether or not X itself is an element of X. But considerthe set A specified by the proposed definition. If this is a good definition, then A is aset (and hence an object for mathematical consideration). If A is a set, then we must

6.3. THE AXIOM OF RESTRICTED COMPREHENSION 65

be able to tell if A itself is a member of A: the statement that A ∈ A must be a closedproposition that is uniquely either true or false. Suppose this statement is true. Then,by definition of A, it is false! On the other hand, suppose the statement that A ∈ A isfalse. In this case, A 6∈ A is true; hence by definition of A, the statement that A ∈ A istrue! Clearly, the statement that A ∈ A does not have a unique truth value. Thus, it isnot a proposition. So A cannot be a set!

If a proposed definition does not actually ensure that what it purports to define,whether a quality or an object, can be identified (given sufficient information), then itmust be rejected. We say that what it purports to define is not well defined. The set Aof Russell’s example is not well defined.

In response to this problem, mathematicians have devised specific assumptions, oraxioms, about what sets (and hence what “mathematical objects”) exist. Any assertionwe wish to make about sets, including the definition of any specific set (since to defineit is to assert that it exists), must ultimately be justified by the axioms of set theory (ofwhich there are several variations). We may assume, as an axiom of some other theory,that a set with some specific properties exists; we will in fact do so in the followingchapter for the set of real numbers and the operations of addition and multiplication.However, any such assumption is provisional. That such a set actually does exist mustbe justified from the axioms of set theory.

There certainly can be nothing wrong with the open statement “x /∈ x” itself, whichis constructed using only the membership relation and the logical operation of negation.The problem is that it has been applied in an unrestricted manner, beyond the instancesfor which its meaning can be comprehended. The axiom that prevents Russell’s paradox,called the Axiom of Restricted Comprehension, restricts the application of a property tothe elements of an existing set. (The Axiom of Restricted Comprehension is also calledthe Axiom of Separation, among other things, since it separates the elements that satisfyit from those that don’t.)

Axiom (Axiom of Restricted Comprehension). For all sets A, if p(x) is an open propo-sition whose truth value depends only on the value of the variable x, then {x ∈ A : p(x)}is a set.

In reality, the Axiom of Restricted Comprehension is a whole family of axioms, calledan axiom schema, one for each open proposition p(x): since propositions are linguisticconstructions about mathematical objects, not mathematical objects themselves, we can-not refer to “all open propositions” in a proposition! (In other words, we have provideda scheme for creating all of the axioms in this family: just pick any open propositionwhose truth value depends on a single variable and plug it in for p(x). There is an ex-plicit process for constructing any open proposition in the formal language of set theory,


using only variables, the relations = and ∈, and logical operations, but we have no needfor that level of formality at this point.) Note that {x ∈ A : p(x)} is synonymous with{x : x ∈ A and p(x)}. Thus, all of the axioms ultimately involve a variable whose valuesare unrestricted. (The Axiom of Restricted Comprehension says that if this variableis restricted to be a member of a known set in conjunction with satisfying some otherproposition, then the resulting set is well defined.)

6.4 Exercises

1. Let B be an existing set. Show that no contradiction arises from the restricteddefinition A = {x ∈ B : x /∈ x}, and determine whether or not A ∈ A. Thus, theAxiom of Restricted Comprehension avoids Russell’s paradox.

2. For sets A and B as in the previous exercise, determine whether or not B ∈ A.

6.5 More Axioms of Set Theory

The Axiom of Restricted Comprehension creates some new problems for us. For onething, in order to define any sets at all, we must first have a set! For another, even if wedid have some sets, our definition of the union operation is no longer legitimate, sincewe don’t have a set to which the proposition that x ∈ A or x ∈ B, or the more generalproposition that for some A ∈ A, x ∈ A, should be restricted. (The union A ∪ B, ormore generally

⋃A∈AA, would be such a set, but we cannot define it until we already

have defined it, a circular conundrum!) There is only one way out: we need some axiomsensuring the existence of some sets.

Our first axiom ensures the existence of the most basic set of all: the empty set. Thisaxiom is generally not included separately in the most generally used standard axiomsof set theory, as it follows in a rather subtle way from a much stronger, but harder tounderstand, axiom that we will leave for Chapter ??; however, for simplicity and clarity,we include it.

Axiom (Existence of the Empty Set). There is a set with no elements. That is, thereexists a set ∅ for which the following (unrestricted) proposition is true: for every x, x 6∈ ∅.

Technically, we have only assumed that there is a set with no elements. To call it theempty set and designate it by the symbol ∅, we must know it is unique. We will proveshortly that it is.

The next axiom allows us to create sets with up to two elements - still pretty basic.

6.5. MORE AXIOMS OF SET THEORY 67

Axiom (Axiom of Pairing). For all sets A and B, {A,B} is a set.

By {A,B} we mean, of course, the set for which the proposition that x ∈ {A,B} istrue for the instances x = A and x = B, false in all other instances. Note that the setsA and B need not be distinct. So, for example, we can now justify that {∅} is a set.

Axiom (Axiom of Union). For any set A (whose elements are assumed to be sets),⋃A∈A

A = {x : x ∈ A for some A ∈ A} is a set.

Note that together with the Axiom of Pairing, the Axiom of Union provides in par-ticular for the union of two sets, since A∪B =

⋃X∈{A,B}X. Starting with the empty set

and using the Axioms of Pairing and Union, we can create sets with any finite numberof elements, but no infinite set. And, of course, any set separated from a finite set byan open proposition will be finite. Thus, these four axioms are not sufficient to justifythe existence of even the natural numbers. Surely we cannot have mathematics withouteven being able to count indefinitely! Of course not! But we will leave the the axiomthat asserts the existence of the natural numbers - the much stronger but harder tounderstand axiom mentioned above - for Chapter 6.

It seems reasonable that the set of all subsets of a given set A is well defined; it iscalled the power set of A and denoted P(A). The proposition that x ∈ P(A) is, bydefinition, synonymous with the proposition that x ⊆ A. The existence of the power setof any given set is asserted by the following axiom:

Axiom (Existence of Power Sets). Given any set A, P(A) = {x : x ⊆ A} is a set.

Example. If A = {∅, {∅}}, then P(A) = {∅, {∅}, {{∅}}, {∅, {∅}}}. Notice that thepower set of a set always has the empty set and the entire set as elements. Can you seewhy it is called the power set? (If not, try some more examples!)

You might reasonably ask if the axioms of set theory succeed in solving the problemilluminated by Russell, not only avoiding Russell’s specific paradox (which you haveshown in the exercises that they do), but preventing us in general from ever defining aset that does not make logical sense. The answer is that we don’t know, and in fact thereis some controversy among mathematicians about whether or not it is safe to assume oneof them in particular (which we have not yet introduced - the one’s introduced so far areuncontroversial). If someone were to find some set defined in accordance with the axiomsthat leads to a logical contradiction, as Russell’s did, and hence is not well defined, thenwe would know the axioms were not adequate. We would have to discard some of theaxioms and perhaps formulate some new ones and hope that they were better, and we


would have to discard any mathematics that was based on the faulty definition. So far,no one has done this, but we cannot guarantee that no one ever will. Mathematics isnever finished!

Fortunately, people do seem to have good instincts about the nature of mathematics.Mathematicians certainly make mistakes, but so far in the history of mathematics noserious mathematics has ever been discarded because it exposes a fundamental flaw inthe logical foundation of the subject. Although Russell’s reasoning certainly counts asinsightful and serious mathematics, discarding his ill-defined example does not count asa loss to mathematics, because Russell’s only reason in formulating it was to expose thevulnerabilities in the logical foundation being used at that time, much as a computerhacker might expose the vulnerabilities in the security of a network. He showed that theset theory of the time was naive and in need of further development. (Indeed we still calltreatments of set operations without reference to foundational issues of set definition“naive” set theory.) Fortunately, Russell exposed these vulnerabilities before anyonemade a serious breach of logic (at least as far as we know).

6.6 The Structure of a Proof

You have already done quite a few simple proofs of familiar algebraic propositions. Formore difficult proofs, it is helpful to examine how the structure of the proposition to beproved helps guide the structure of the proof itself. In the previous chapter, we introducedmost of the logical methods by which the truth of a proposition may be established fromthat of other propositions, but proving more abstract and unfamiliar propositions helpsto reveal the structure of the arguments more clearly. Here is a systematic summaryof some basic proof structures, with examples from set theory. Of course, most proofscombine several of these structures and are longer than the examples given here.

6.6.1 From Universal to Specific

If a universal proposition has been established, then it is true in every instance. In thefollowing example, the Axiom of Pairing, which applies universally to any sets A and B,is applied to the instance that A = B = ∅.

Proposition. {∅} is a set.

Proof. By the Existence of the Empty Set (our first axiom), ∅ is a set. By the Axiom ofPairing, {∅, ∅} = {∅} is a set.

6.6. THE STRUCTURE OF A PROOF 69

6.6.2 Generality through Arbitrarity

In proving a universal proposition, it is usually clearest to write the proof in terms of anarbitrary instance of each variable. Here are some simple examples.

Proposition. For any sets A and B, A ⊆ A ∪B.

Proof. Since the proposition is universal to all sets A and B, let A and B be sets. Wemust show that every element of set A is also an element of set A ∪ B, so let x ∈ A.Then, more generally, it is certainly the case that x ∈ A or x ∈ B. Thus x ∈ A ∪ B bydefinition.

Proposition. For all sets A and B, A \B ⊆ A.

Proof. Let A and B be arbitrary sets. By definition, to prove that A \B ⊆ B, we mustprove that every element of A \B is an element of A. So let x ∈ A \B. By definition ofA \B, x ∈ A and x /∈ B. In particular, x ∈ A.

6.6.3 Vacuity

Anything is universally true for the elements of the empty set, since there aren’t any. Anequivalent way to look at this is that if the condition of an implication is universally false,then the implication is universally true. There is nothing to prove, since the consequenceis only required to be true for instances that satisfy the condition. We say the implicationis vacuously true.

Here are two examples:

Proposition. There is only one set having no elements.

Proof. Let A and B be sets with no elements. Then the proposition that (∀x)(x ∈ A⇒x ∈ B) is vacuously true. Conversely, the proposition that (∀x)(x ∈ B ⇒ x ∈ A) is alsovacuously true. Thus, by definition, A = B.

This proposition retroactively justifies our assigning a special symbol to the emptyset and using the article “the” when referring to it. Our axiom only asserts the existenceof “a” set with no elements, so technically we should have designated such a set with avariable until we proved that only one set had this property; uniqueness is what turns avariable into a constant. However, sticking rigidly to logical order in situations like thisis often cumbersome and rather pedantic, so retroactive justifications of this kind arecommon.


The proof of this proposition illustrates the general method for showing that an objectwith a given property is unique: consider two objects with this property and prove theyare equal.

Proposition. The empty set is a subset of any set; that is, for any set A, ∅ ⊆ A.

Proof. For any set A, the proposition that (∀x)(x ∈ ∅ ⇒ x ∈ A) is vacuously true. Thus,by definition, ∅ ⊆ A.

Corollary. For any set A, ∅ ∈ P(A).

A corollary is a proposition that follows immediately from an established result. Thisone is an immediate consequence of the definition of the power set; the proof is left toyou.

6.6.4 Division into cases

If we know that one of two assertions must be true, we may devise separate arguments ineach case. In the following example, the definition of union provides a natural divisioninto cases.

As is often the case, the proposition we have to prove is conditional as well as uni-versal: we must show that in every instance satisfying the condition (which may bethe conjunction of several component conditions), the consequence is true. Since thereis nothing to prove in instances that fail to satisfy the condition, we may assume forthe sake of argument that our arbitrary instance does satisfy it. This assumption is,of course, provisional, as indicated by the phrase “for the sake of argument”; there iscertainly no reason to presume that the condition is always true! Our assertion is hypo-thetical ; we call it a hypothesis. The consequence of our hypothesis, which we wish toestablish in the proof, is called the conclusion.

A similar method may be used to write an argument based on an existential propo-sition. If it has been established or assumed that something with a particular propertyexists, we may consider such an object, represented by a variable, and deduce otherpropositions from its existence. An example of this method appears in Section 6.6.5, onproof by contradiction, below.

Proposition. For any sets A, B, and C, if A ⊆ C and B ⊆ C, then A ∪B ⊆ C.

Proof. Let A, B, and C be arbitrary sets satisfying the hypothesis that A ⊆ C andB ⊆ C. Let x ∈ A ∪B. By definition of union, x ∈ A or x ∈ B.


Case 1: x ∈ A. Since A ⊆ C, if x ∈ A then x ∈ C by definition of subset; hence,x ∈ C.

Case 2: x ∈ B. By a similar argument, x ∈ C in this case as well.

In either case, x ∈ C; hence, by definition, A ∪B ⊆ C.

Here is an example in which working through the quantifiers, hypotheses, and casestakes several steps.

Proposition. For all sets A and B, A ∪B = B ⇔ A ⊆ B.

Proof. There are two sub-propositions to prove: A ∪ B = B ⇒ A ⊆ B and A ∪ B =B ⇐ A ⊆ B. Let us address them separately.

First, we claim that A ∪ B = B ⇒ A ⊆ B. Assume as hypothesis that A ∪ B = B.We must show that every element of A is also an element of B, so let x ∈ A. By anearlier proposition, we have A ⊆ A ∪ B, so by definition x ∈ A ∪ B. Since A ∪ B = B,x ∈ B (by the Axiom of Extensionality). Thus A ⊆ B, by definition.

Next we claim that A ⊆ B ⇒ A ∪ B = B. Assume as hypothesis that A ⊆ B.The conclusion we wish to establish may be separated into two parts: B ⊆ A ∪ B andA∪B ⊆ B. The first of these assertions always holds, as proven earlier, so we need onlyprove the second. So let x ∈ A ∪B. By definition, there are two cases:

Case 1: x ∈ A. Then since, by hypothesis, A ⊆ B, x ∈ B by definition.

Case 2: x ∈ B. This is our desired conclusion, so there is nothing more to prove in thiscase.

It is certainly possible to divide into more than two cases. However one divides intocases, the key point is to make sure that at least one of the cases must hold for all objectsunder consideration. A good way to do this is to use a proposition and its negation foreach division, as we explain now.

We may assert propositions that are logically tautological, that is, forced by theirlogical construction to be true; for example, the proposition that, for any x, x = ∅ orx 6= ∅ is tautologically true, as is the proposition that, for any x, it is not the case thatboth x = ∅ and x 6= ∅. Given an open proposition and its negation, one must be true, andthe other false, in any instance. Being universally tautologically true, the disjunction ofany proposition and its negation provides a foolproof division into cases.

The following example demonstrates this method, along with several other notablepractices. In particular, as in the example above, the proposition to be proven is bicon-ditional, but writing every step biconditionally is difficult in this case, so we will prove


each component separately. To organize our proof in a readable fashion, we state eachcomponent proposition as a claim. That is, we claim it holds, and then provide a proofthat it does.

In general, it is good practice to organize a proof strategically into the major asser-tions that, taken together, lead to the result. This provides a framework for the ideasand logic, so that the argument is not just a formless sequence of steps (however logicallycorrect it might be). Each major assertion is proposed as a claim, telling the reader (andthe author) where you are going with the argument that follows, which usually takesmultiple steps. If there are many claims, and especially if there are claims within claims,it helps to number them in outline form. Do not confuse claims with cases: a claimis a proposition you are asserting; a case is a possibility, one of which must be true,although you have no way of knowing which one. You are not asserting or proving thatany particular case holds.

It is also helpful, as illustrated by many of our previous examples, to prove in advancesome preliminary results that can be conveniently cited. One we will need is that everyset is a subset of itself; we leave the proof to you.

Finally, notice that in a long proof the quantifiers are often implied in the expositionrather than repeated over and over.

Proposition. The power set of {∅} is {∅, {∅}}.

Proof. By definition, to prove that P({∅}) = {∅, {∅}}, we must prove that, for all setsx, x ∈ P({∅})⇔ x ∈ {∅, {∅}}.

Claim. x ∈ P({∅})⇐ x ∈ {∅, {∅}}.

Let x be a set, and assume x ∈ {∅, {∅}}; then x = ∅ or x = {∅}.

Case 1: x = ∅. By the corollary to a previous proposition, ∅ ∈ P({∅}).

Case 2: x = {∅}. Since every set is a subset of itself, {∅} ∈ P({∅}).

Thus, in either case, if x ∈ {∅, {∅}}, then x ∈ P({∅}).

Claim. x ∈ P({∅})⇒ x ∈ {∅, {∅}}.

Let x ∈ P({∅}). Since x ∈ P({∅}), x ⊆ {∅}, and hence, by definition, (∀y ∈ x)(y ∈{∅}). Thus, the only possible element of x, if there is one, is ∅. Either x is empty or itis not.

Case 1: x = ∅. In this case, x ∈ {∅, {∅}}.


Case 2: x 6= ∅. Then there is an element y ∈ x; therefore, x = {∅} ∈ {∅, {∅}}.

Thus, in either case, if x ∈ P({∅}), then x ∈ {∅, {∅}}

The fact that a proposition and its negation are mutually exclusive (that is, only onecan be true in any instance) can be helpful in proving that exactly one of several optionsoccurs, as in the following example:

Proposition. Given any sets A and B, every element of A∪B belongs to one and onlyone of the following three sets: A \B, B \ A, or A ∩B.

Proof. Let A and B be sets, and let x ∈ A ∪B. By definition, x ∈ A or x ∈ B.

Case 1: x ∈ A.

Case 1(a): x ∈ B. Then by definition, x ∈ A ∩B.

Case 1(b): x /∈ B. Then by definition x ∈ A \B.

Case 2: x /∈ A. Then, since x ∈ A or x ∈ B, it must be the case that x ∈ B;hence, by definition, x ∈ B \ A.

Thus, we know that x belongs to one of the sets A∩B, A \B, or B \A. Conversely:if x ∈ A ∩ B, then by definition x ∈ A and x ∈ B (Case 1(a)); if x ∈ A \ B, then bydefinition x ∈ A and x /∈ B (Case 1(b)); and if x ∈ B \A, then by definition x ∈ B andx /∈ A (Case 3). Since these cases are mutually exclusive, x belongs to only one of thesesets..

6.6.5 Proof by Contradiction

Our final strategy is to provisionally assume the negation of the proposition we wantto prove and, from this assumption, deduce a contradiction, meaning the conjunction ofsome proposition and its negation, which must be false. (A proposItion and its negationcannot both be true.) It therefore follows that our original proposition must be true, sinceits negation led directly to a falsehood. There are a variety of names for this method ofproof, including reductio ad absurdum (Latin for “reduction to the absurd”), “indirectproof” (since we reach the desired result indirectly, by first supposing it was false), orjust plain “proof by contradiction.”

When beginning a proof by contradiction, I prefer to use the word “suppose,” some-times augmented by the phrase “by way of contradiction” or use of the subjunctive mood,


when assuming the negation of the proposition to be proven, to emphasize for claritythe fact that this assumption is not only provisional, but is going to be rejected as false(unlike our provisional assumption of any conditions in a direct proof, which need not bethe case, but could be). It is always nice to let the reader know (and remind yourself)what is coming down the pike!

A nifty way to look at proof by contradiction is as follows: Let p be the proposition tobe proven. Let F represent a contradiction, say q ∧¬q, and let T represent its negation,¬q ∨ q (which, of course, is true). In proof by contradiction, we show that ¬p ⇒ F ,which is synonymous with its contrapositive, T ⇒ p. Since the condition T is true, theconsequence p must be as well.

As one might expect, proof by contradiction is the method to be used for provingany proposition that is stated in the negative and cannot be rephrased positively. Thus,it is the method to be used for proving a set is empty, as in the following example:

Proposition. For any sets A and B, (A ∩B) ∩ (A \B) = ∅.

Proof. Suppose by way of contradiction that (A ∩ B) ∩ (A \ B) were not empty. Thenthere would be some element x ∈ (A ∩ B) ∩ (A \ B). For this element we would have,by definition of intersection, that x ∈ A ∩ B and x ∈ A \ B; therefore, again followingthe definitions, x ∈ A and x ∈ B and x ∈ A (a redundant assertion we would normallyomit) and x /∈ B. Since x ∈ B and x /∈ B is a contradiction, no such x can exist, so weconclude that (A ∩B) ∩ (A \B) = ∅.

Definition. Sets A and B are disjoint if (and only if) A ∩B = ∅.

Using this definition, we may rephrase the previous proposition as stating that, forall sets A and B, the sets A ∩B and A \B are disjoint.

Remark. All definitions are biconditional; sometimes the phrase “and only if” is omitted,but it is always implicit in a definition. This convention is the one exception to the rulein mathematics that all assertions must be explicitly stated.

Remark. Proof from vacuity may be regarded as a type of proof by contradiction. Forexample, in proving that, for any set A, ∅ ⊆ A, we could suppose by way of contradictionthat there is an element of the empty set that is not an element of A. The very existenceof this element (regardless of any relationship it has to set A) contradicts the definingproperty of the empty set. We conclude that every element of ∅ is an element of A.

Students tend to love proof by contradiction, so I hasten to point out that it should notbe overdone. Often a direct proof is simpler and clearer. Suppose, for example, that youset out to prove a statement of the form (∀x ∈ A)(p(x)⇒ q(x)). You suppose by way of

6.7. EXERCISES 75

contradiction that (∃x ∈ A)(p(x) and ¬q(x)). You consider such an element x and, aftera series of deductions in which, as it happens, you never need the component p(x) of yoursupposition, you deduce ¬p(x), a contradiction. In this situation, it would be easier andclearer to prove directly the contrapositive proposition that (∀x ∈ A)(¬q(x) ⇒ ¬p(x)).(Presumably, if you didn’t use p(x) in your proof by contradiction, you did use ¬q(x),since it is the only other proposition you supposed. If you didn’t, then you can prove -directly - the stronger, unconditional statement that (∀x)p(x).)

As a bad example to demonstrate this, consider a proof by contradiction of our earlierproposition that if A ⊆ C and B ⊆ C, then A ∪ B ⊆ C. Suppose not; that is, supposethat A ⊆ C and B ⊆ C and A ∪ B 6⊆ C. Since A ∪ B 6⊆ C, there is an elementx ∈ A ∪B such that x /∈ C. Since x ∈ A ∪B, x ∈ A or x ∈ B. In the first case, A 6⊆ C,contradicting a component of our hypothesis. Similarly, in the second case, B 6⊆ C, againcontradicting a component of our hypothesis. We conclude that if A ⊆ C and B ⊆ C,then A∪B ⊆ C. Even though we have abbreviated our exposition in comparison to thelaborious style of the earlier proof, in which we strove to make every logical step explicit,the awkwardness of this proof should nonetheless be apparent. Hypotheses that are notused directly are confusing, hence bad style, and too many negatives give anyone overthe age of two a headache!

6.7 Exercises

In the following exercises, you may use any of the propositions proven in the examples.

1. Prove that {∅, {∅}} is a set.

2. Prove that {∅, {∅}, {∅, {∅}}} is a set.

3. Prove that, for any sets A and B, A \ B is indeed a set. (Hint: Use the Axiom ofRestricted Comprehension.)

4. Prove that, for any sets A and B, A ∩B is a set.

5. Prove that for any set A, A ⊆ A. This result explains why we prefer to use thesymbol ⊆ for the subset relationship rather than ⊂.

6. Prove that for any sets A and B, A ∩B ⊆ A.

7. Prove that for any set A, ∅ ∈ P(A).

8. Prove that, for any sets A and B, A ∩B = A \ (A \B).


9. Prove that for any sets A and B, A ∩B = B ⇔ B ⊆ A.

10. Prove that for any sets A and B, A ∪B = B ⇔ A ⊆ B.

11. Prove that for any sets A and B, A \B and B \ A are disjoint.

12. Prove that for any sets A and B, A = (A \B) ∪ (A ∩B).

13. Prove that for any sets A and B, (A ∪B) \B = A \B.

14. Prove that for any sets A and B, A \ (A ∩B) = A \B.

15. Let A, B, and U be any sets such that A ⊆ U and B ⊆ U . (The letter “U” is usedbecause we can think of U as the “universe” to which the elements of sets A andB belong.) The following propositions are called De Morgan’s Laws. Prove them.

(a) U \ (A ∩B) = (U \ A) ∪ (U \B).

(b) U \ (A ∪B) = (U \ A) ∩ (U \B).

16. Let A, B, and U be any sets such that A ⊆ U and B ⊆ U . Prove that A = B ⇔U \ A = U \B.

17. Provide a counterexample showing the proposition of the previous exercise is falseif we drop the condition that A ⊆ U and B ⊆ U .

18. Prove that U \ (U \ A) = A⇔ A ⊆ U .

Remark. If a universal set U , of which all sets being considered are subsets, is understoodin a given context, the shorter notation Ac = U \ A is customary. In this situation, werefer simply to the complement of A, omitting the phrase relative to U .

19. Prove that A ∩ B = (A ∪ B) \ (A B). (Hint: Apply one of De Morgan’s laws,choosing an appropriate set U and using previous results to establish the necessaryhypothesis. This approach results in a much shorter proof than starting fromscratch. Use previous results whenever possible!) The current result shows thatthe intersection operation can be defined in terms of union and complementation.

20. Write A \B using only the operations of union (∪) and symmetric difference ().Prove your expression is correct.

21. Consider the operations ∪, ∩, , and \ as binary operations on sets.

6.7. EXERCISES 77

(a) Which are associative? Prove your answers. (That is, for those that are,prove the appropriate universal proposition, and for those that aren’t, pro-vide a specific counterexample. A counterexample is simply an instance thatdisproves a universal proposition by showing the existence of an exception.)

(b) Which are commutative? Prove your answers.

(c) Which pairs of operations satisfy a distributive relationship. Prove your an-swers.

22. Compute the power set of {∅, {∅}, {∅, {∅}}}. For extra credit, prove your answeris correct.

23. Now that we have added an axiom asserting the universal existence of power setsand the Axiom of Extensionality to our collection, can we prove the existence ofan infinite set? Explain.

Chapter 7

The Natural Number System

In which we define the natural numbers as a subset of the real numbers and use itsdefining property to establish the methods of recursive definition and proof by induction.

7.1 Definition of the Set of Natural Numbers

We all know what the natural numbers are: {0, 1, 2, 3, . . .}. (There are differing opinionson whether or not zero should be included - we include it. 1) But suppose someone askedyou to prove that some proposition was true for all natural numbers. What would youdo? As a subset of the real number system, the natural numbers inherit all the axiomaticproperties that are true for all real numbers, but the proposition to be proven might notapply to all real numbers; rather, its proof might require something special about thenatural numbers.

For example, the following proposition is true: For any natural number n, the sumof all the natural numbers less than n is n(n+1)

2. This proposition does not even make

sense for all real numbers, because the set of real numbers less than a given number isnot finite. (Even an infinite series would not make sense, because the set of real numbersless than a given number cannot be listed, even in an infinite sequence.) How could thisproposition be proven? “Let n be a natural number. Then . . .” what, exactly? We can’tcheck the sum for every number on the list!

1It is true that zero is somewhat different, because we do not use it to count objects, but ratherto indicate that there are no objects to count. Historically, zero arose later than the other naturalnumbers. Nonetheless, logicians often include zero in the natural number system because, in constructingrepresentatives of the natural numbers using set theory, we begin with the empty set.

79

80 CHAPTER 7. THE NATURAL NUMBER SYSTEM

An infinite list suggested by a pattern is not adequately precise for proofs. We needa definition for the set of natural numbers. The properties in this definition can thenbe used in proofs. To formulate a definition, we must identify the essential - that is,defining - properties that the natural numbers, as we conceive them, possess.

We recognize that the system of natural numbers, as the system used for countingdiscrete objects, has three essential properties:

• It has a specified starting point, the first and smallest natural number (0 in ourcase, marking the absence of any object, 1 if we were to take counting to beginwith the first object present).

• Adding 1 to any natural number gives a new natural number. No matter howlarge the crowd, you can always add one more. Once we learn a systematic way ofnaming the numbers, we could explicitly count forever; even though we might notknow an English word for every number, once we decide on a place value system wecan write the numeral for any number. (The really large ones would take a very,very long - but finite - piece of paper. If you use the binary system, for example,you only need two symbols for enumeration, “0” and “1”, but the numerals quicklyget long: 0, 1, 10, 11, 100, 101, 110, 111, 1000, . . ..)

• The only natural numbers are the ones that can reached from 0 (or 1) by counting,adding one at a time. There are no others.

To more succinctly refer to the second property above, we give it a name:

Definition. A set S of real numbers is inductive if, ∀s ∈ S, s+1 ∈ S. (You might preferto think of this property in its conditional form as a statement about all real numbers:∀s ∈ R, s ∈ S ⇒ s+ 1 ∈ S.)

We can now define the set of natural numbers:

Definition. The set of natural numbers, denoted by N, is the intersection of all inductivesubsets of R that contain 0.

This definition captures, with precision, exactly the three properties listed above.The intersection of a family of sets, each of which contains 0, contains 0. (The proofwhich follows directly from the definition of the intersection of a family of sets, is left tothe exercises.) The intersection of a family of inductive sets is inductive (a fact whoseproof is also left to the exercises). By taking the intersection of all sets of real numberswith these properties, we obtain the smallest one, including only those numbers that areabsolutely necessary to obtain an inductive set, starting with 0.

7.2. PROOF BY INDUCTION 81

7.2 Proof by Induction

A direct consequence of the definition of the set of natural numbers is that any inductiveset that contains 0 must contain the entire set of natural numbers. (See Exercise 4.)Therefore, to show that a proposition is true for all natural numbers, we simply need toshow that the set of numbers for which the proposition is true includes 0 and is inductive.This proof strategy is called proof by induction. We now demonstrate it with a simpleproposition.

Proposition. Every natural number is greater than or equal to 0.

Proof. We examine the set [0,∞) = {x ∈ R : x ≥ 0}, showing that it contains 0 and isinductive. (In naming this set, I have used the customary interval notation.)

Claim. 0 ∈ [0,∞).

This is obvious: 0 = 0.

Claim. The set [0,∞) is inductive.

This means to show that if x ∈ [0,∞), then x+ 1 ∈ [0,∞). So let x ∈ [0,∞). Then,by the defining property of [0,∞), x ≥ 0. We proved earlier that 1 > 0; hence, by Axiom6 of the real number system and the defining property of 0, x + 1 > x + 0 = x. So bytransitivity (or substitution in the case x = 0), x+ 1 > 0. Thus x+ 1 ∈ [0,∞).

Since [0,∞) is inductive and contains 0, N ⊆ [0,∞). In other words, every naturalnumber is greater than or equal to 0.

In writing the proof above, I have emphasized by my phrasing that we are provingthat the set specified by a certain property contains the set of natural numbers; again,we do this by proving that the specified set contains 0 and is inductive (so it is one ofthose sets whose intersection creates the natural numbers). It is helpful to rememberthat all of the variants of proof by induction are just modifications or refinements of thisidea.

It is customary to write proofs by induction in an abbreviated form by simply con-sidering the property in question. We may also restrict our attention to natural numberswith that property; thus, we are showing that the set of natural numbers with the spec-ified property contains, and is therefore equal to, the whole set of natural numbers. Forthe proposition above, it was not necessary to assume this restriction, since the entireset of non-negative real numbers is inductive; however, often the restriction to naturalnumbers is needed, as the set of all real numbers with the desired property may not beinductive, or even definable. (For example, as noted above, it makes no sense to define


the set of real numbers x such that the sum of all of those numbers less than or equal tox is x(x+1)

2.) Here is the same proof written in the customary form:

Proof.

Claim. 0 ≥ 0. Obvious.

Claim. n ≥ 0⇒ n+ 1 ≥ 0. (Actually, with no extra work we get the stronger statementthat n ≥ 0⇒ n+ 1 > 0.)

Assume n ≥ 0. We proved previously that 1 > 0. Thus, by Axiom 6 of the realnumber system and the defining property of 0, n + 1 > n ≥ 0. (Note: We could justhave well used the alternative argument that n+ 1 ≥ 1 > 0.)

Remark. The first claim is customarily called the initial claim (or the initial step ofthe proof), and the second claim is customarily called the inductive claim (or inductivestep). The hypothesis assumed to prove the conclusion of the inductive claim is calledthe inductive hypothesis.

Remark. It is important to recognize that we are not making a circular argument byassuming that n ≥ 0 in the inductive claim. The proposition we are proving, (∀n ∈N)(n ≥ 0), is universal and unconditional. The inductive claim, (∀n ∈ N)(n ≥ 0 ⇒n+ 1 ≥ 0) is also universal, but conditional. In assuming as our hypothesis its conditionthat n ≥ 0, we are merely considering an arbitrary particular (and hypothetical) naturalnumber n that would satisfy the condition, not making an assumption about all naturalnumbers.

7.3 Recursive Definition

We would like to now explore the proposition that, for any natural number n, the sumof all the natural numbers less than n is n(n+1)

2. However, we have a technical problem.

We all know what is meant by the sum of all the natural numbers less than n, but thenumber of terms in this sum depends on n and can be arbitrarily large. How do wedefine it precisely? (As you know, nothing can be rigorously proven about a conceptthat is not precisely defined!) What we need is called a recursive definition:

Definition.n∑i=0

i =

{0, if n = 0(∑n−1

i=0 i)

+ n, if n > 0.

7.3. RECURSIVE DEFINITION 83

Remark. In other words, we have defined each multiple sum in terms of a binary sumwith the previously defined result:

∑0i=0 i = 0,

∑1i=0 i = 0 + 1 = 1,

∑2i=0 i = 1 + 2 = 3,∑3

i=0 i = 3+ 3 = 6, . . .. This agrees, of course, with the way we would add up a string ofnumbers in practice. The word “recursive” comes from the Latin root cursare, meaningto run, and prefix re-, meaning back. We “run back” to get the preceding output in orderto use it in obtaining the next one.)

Remark. More informally, we can write∑n

0 i = 0+1+2+3+ · · ·+n. Anytime an ellipsis(“· · · ”) is used to indicate a pattern involving the natural numbers up to some arbitrarypoint, a recursive definition is really being made. Any time an expression with an ellipsisis used in a calculation, a proof by induction is really being informally conveyed.

Now that we have defined∑n

i=0 i, we can prove our proposition:

Proposition. For all natural numbers n,∑n

i=0 i = n(n+1)2

.

Proof.

Claim. The result is true for 0.

0∑i=0

i = 0 =0(1)

2

by definition and basic arithmetic (results proven previously).

Claim. If the result is true for a natural number n− 1, then it is true for n.

Assumen−1∑i=0

i =(n− 1)(n)

2.

By definition,n∑i=0

i =

(n−1∑i=0

i

)+ n,

and by hypothesis,n−1∑i=0

i =(n− 1)(n)

2.

Thus, substituting, we obtainn∑i=0

i =(n− 1)(n)

2+ n =

n2 − n+ 2n

2=n2 + n

2=n(n+ 1)

2.


The two claims in the proof above show that the set of numbers n for which∑n

i=0 i =n(n+1)

2contains 0 and is inductive, respectively. It follows that this set contains all the

natural numbers: the equation∑n

i=0 i = n(n+1)2

holds for all of them.

Remark. It is often convenient to use the expression n − 1 for the arbitrary naturalnumber in our condition in order to match the notation of a recursive definition, inwhich case the next number is n. Since n− 1 is a natural number, n ≥ 1. You can justas well use n and n + 1 if you prefer (in which case n may be any natural number) aslong as you are consistent.

Proof by induction is an incredibly powerful method, so powerful that its power canbe a drawback to understanding. The logic proceeds as if by magic, sometimes givinglittle intuition into why the result is true. It is always a good idea to look for a proof thatilluminates what is going on behind the formula. For this reason, many people preferthe following alternative proof of the proposition:

Proof. Let f(n) = 0 + 1 + 2 + · · ·+n. Then, since addition is commutative, we may alsowrite f(n) = n+(n−1)+(n−2)+ · · ·+1+0. Combining the terms two at a time, we get2f(n) = [0+n]+[1+(n−1)]+[2+(n−2)]+ · · ·+[n+0] = n+n+n+ · · ·+n = n(n+1).

Dividing by two gives the desired result that f(n) = n(n+1)2

.

Although it appears that induction isn’t involved in this second proof, it really is.There are some unproven assertions hidden in the proof, and the proof of these assertionsdoes require proof by induction. Can you find them? You really can’t get aroundinduction for proofs of results like this! But you can write the proof in an informal stylethat enhances understanding. Be prepared to justify your intuitive assertions if asked!

The sum∑n

i=0 i is determined by n alone; in other words,∑n

i=0 i is a function of n.For a moment, let us denote this function by f : f(n) =

∑ni=0 i. The domain of f is the

set of natural numbers, N. Its outputs can be proven by induction to also be naturalnumbers, although that is not necessary: the outputs of an inductively defined functioncan be in any set.

We have defined f(n) for each n ∈ N by applying a formula based on an already knownfunction, in this case simply the binary operation of addition, to n and the output off(n − 1): f(n) = f(n − 1) + n. We refer to this formula as the recursive rule. Therecursive rule may involve any operations that have already been defined. The inputs forthe recursive rule may include, along with the number n itself, any number of previousoutputs of the function being recursively defined. Analogous to a proof by induction,you have to start by defining the number of initial outputs needed. In this example, weonly needed to define the specific value of f(0) initially. Here is two similar examples:

7.3. RECURSIVE DEFINITION 85

Example. The definition of the factorial function uses multiplication in its recursiverule rather than addition.

n! =

{1, if n = 0

n · (n− 1)!, if n > 0

Informally, n! = n(n − 1)(n − 2) · · · (2)(1). If we denote the factorial function by g,then the recursive rule is g(n) = n · g(n− 1).

Example. The exponential function for a given base b is defined for natural numberinputs as follows:

bn =

{1, if n = 0

b · bn−1, if n > 0

Any function f whose domain is the natural numbers is equivalent to the infinitesequence (ai)i∈N = (ai)

∞i=0 = (a0, a1, a2, . . .) defined by ai = f(i). For example, the

function f defined by f(i) = i2, i ∈ N, is equivalently defined by the sequence (i2)i∈N =(i2)∞i=0. A nice aspect of sequential notation is that one does not need to designatea letter such as f to name a function defined by a formula. The set of subscriptsthat indicate the place of each entry in the sequence is called the index set (hence thecustomary use of the letter i, or subsequent letters of the alphabet as needed, to denotean arbitrary element of this set), and its elements are called indices (singular: index ).More generally, any function whose domain is an inductive subset of the natural numbersis equivalent to an infintie sequence. For example, if the domain of a function f is the setN \ {0, 1} = {2, 3, 4, . . .}, then f is equivalently described by the sequence (ai)i∈N\{0,1} =(ai)

∞i=2 = (a2, a3, a4, . . .) defined by ai = f(i). Still more generally, any function at all

may be described as a sequence using its domain as the index set. In this way we obtainfinite sequences, such as (i2)9

i=1, as well as sequences in which the sequential orderingcharacteristic of the natural numbers is lost, such as (x2)x∈R. Sequential notation is justanother way to describe a function.

Example. The famous Fibonacci sequence, which begins (1, 1, 2, 3, 5, 8, 13, . . .), is de-fined by a0 = F (0) = 1, a1 = F (1) = 1, and the recursive rule that an = F (n) =F (n − 2) + F (n − 1) for n ≥ 2. This is an example of a recursive rule that retrievesoutput values further back than the preceding value.

It can be proven that any recursive rule and suitable sequence of initial values definesa unique function. The formal statement and proof of this fact are somewhat involved,and we defer them until we have completed a thorough, rigorous discussion of functionsin Chapter 8.


7.4 Exercises.

1. Prove that if every set in a family contains 0, then the intersection of the sets inthis family contains 0.

2. Prove that the intersection of a family of inductive sets is an inductive set.

3. Prove for arbitrary unions and intersections that:

(a) For any family of sets A, (∀A ∈ A)(A ⊆⋃A∈A

A).

(By a family of sets we simply mean a set of sets; we choose a different word tomake it easier to distinguish verbally the sets that are elements of the familyfrom the family itself. Sometimes other synonyms for set, such as collection,are used as well.)

(b) For any family of sets A, (∀A ∈ A)(⋂A∈A

A ⊆ A).

4. Prove that if S is any inductive set such that 0 ∈ S, then N ⊆ S. (Hint: Usethe definition of N as the intersection of all inductive sets containing 0. From anotational point of view, it is necessary to give a name to the collection of all suchsets, as in “let I0 be the set of all inductive sets containing 0.”)

5. Prove that if m and n are natural numbers, then n+m is a natural number. (Hint:Fix any m ∈ N and use induction on n, starting with n = 0.)

6. Prove that the product of any two natural numbers is a natural number.

7. Prove that there is no natural number between 0 and 1. (Hint: Show that the set{0} ∪ [1,∞) is inductive. The proof naturally divides into two cases.)

8. For every natural number n, prove that there is no natural number between n andn+ 1.

9. Prove that every natural number other than 0 is greater than or equal to 1.

10. Compute the first ten terms of the Fibonacci sequence.

11. Prove the fundamental properties of exponents:

(a) bmbn = bm+n.

7.5. PROVING RECURSIVE INEQUALITIES BY INDUCTION 87

(b) (bm)n = bmn.

Hint: For both exercises, fix a natural number m and use induction on n.

12. Let f : N→ R be a fixed function. Define∑n

i=0 f(i) using a recursive definition.

13. Let f : N → R be a fixed function. Define the general product Πni=0f(i) =

f(0)f(1)f(2) · · · f(n) using a recursive definition.

14. Let r be any fixed real number. Define f : N→ R by f(i) = ri. For this particularfunction, the sum defined recursively in the manner of the previous problem iscalled a geometric sum and may be written informally as 1 + r + r2 + · · · + rn.Prove the geometric sum formula: 1 + r + r2 + · · ·+ rn = 1−rn+1

1−r .

15. For all natural numbers n, show that∑n

i=1 i2 = n(n+1)(2n+1)

6.

16. Let a be a fixed real number, and let f : N→ R be a fixed function. . Prove thata∑n

i=1 f(i) =∑n

i=1 af(i).

17. Let f, g : N → R be fixed functions. Prove that∑n

i=1 f(i) + g(i) =∑n

i=1 f(i) +∑ni=1 g(i).

18. Combining the results of the previous two problems, conclude that for fixed realnumbers a and b and fixed functions f, g : N→ R,

∑ni=1 af(i)+bg(i) = a

∑ni=1 f(i)+

b∑n

i=1 g(i). (For any real numbers a and b, the function that associates to eachordered pair of real numbers (x, y) the expression ax+ by is called a linear combi-nation of x and y; thus, we have proven that “the sum of a linear combination isthe linear combination of the sums.”)

19. Use the result of the previous problem to give an alternate (and more intuitive)proof that 1+r+r2+· · ·+rn = 1−rn+1

1−r . (Hint: Consider (1−r)(1+r+r2+· · ·+rn).)

20. Apply the trick you used in the previous problem or, alternatively, apply the ge-ometric sum formula, to find the fractional expression for the repeating decimal.12345.

7.5 Proving Recursive Inequalities by Induction

Proving inequalities involving recursively-defined functions is sometimes harder thanproving equations. On the one hand, there is more room to maneuver when proving an


inequality, since exactitude is not necessary, but on the other hand, the intermediateinequalities sufficient to prove a desired result may not be entirely obvious. Here is anexample of an inequality involving an exponential function:

Proposition. For all natural numbers n, 2n > n.

Proof.

Claim (Initial Claim). 20 = 1 > 0.

Claim (Inductive Claim). 2n > n⇒ 2n+1 > n+ 1.

Assume 2n > n. Since 2n+1 = 2 · 2n, by definition, and 2 > 0, we have 2n+1 > 2n, byAxiom 6(b) and the inductive hypothesis. (By definition, 2 = 1 + 1, and we previouslyproved that 1 > 0; hence 2 = 1 + 1 > 1 + 0 = 1 > 0. 2) Thus, it now suffices to provethat 2n ≥ n + 1. (Observe that ≥ is sufficient, since we already have strict inequalityin 2n > 2n. There is no sense trying to prove something stronger than we have to!) Bythe distributive law and the defining property of 1, 2n = n + n. If n ≥ 1, it is clear(by Axiom 6(a)) that n + n ≥ n + 1, but it is also possible that n = 0. Therefore, wemust divide into cases, using a different argument in the case n = 0. (We will showindependently in the exercises that there are no natural numbers between 0 and 1, so ifn 6= 0, then n ≥ 1.)

Case 1: n = 0. 2n+1 = 21 = 2 > 1 = n+1. (Observe that the inductive hypothesisis not necessary in this case, since its specificity allows us to deduce the conclusionunconditionally.)

Case 2: n ≥ 1. As shown above, 2n+1 > n+ n ≥ n+ 1.

7.6 The Integers

The set of integers, denoted by Z, consists of the natural numbers and their additiveinverses:

Z = N ∪ {−n : n ∈ N}.

Note that N = {k ∈ Z : k ≥ 0}.2We know from an earlier proposition that 2 ≥ 0. The argument just given rules out the possibility

that 2 = 0. Although this possibility may seem ridiculous, it actually occurs in algebraic systems thatdon’t possess an order satisfying Axiom 6.

7.7. EXERCISES 89

It is easily checked that the integers also form an algebraic system: the sum or productof two integers is an integer. You are asked to do this in the exercises. An algebraicsystem with two associative operations, addition and multiplication, in which additionis commutative, the distributive property holds, each operation has an identity element,and every element has an additive inverse is called a ring. Multiplication in a ring neednot be commutative, although in this case it is, so Z is, in particular, a commutative ring.The set of integer-valued 2 × 2 matrices is an example of a non-commutative ring. Acommutative ring in which every non-zero element has a multiplicative inverse is, as wehave seen, a field. (In some texts a ring is not required to have a multiplicative identity;one that does is then called a ring with unit. However, this terminology is less common.)

Since k ≤ l⇔ l − k ≥ 0, the following result is immediate:

Proposition. For all integers k and l, k ≤ l if and only if there is a natural number msuch that l = k +m.

Thus, proving a proposition p(l) for all integers l such that k ≤ l is the same asproving p(k+m) for all natural numbers m. Since p(k+m) is a statement about naturalnumbers m, it may be proven by induction on m beginning with the initial claim thatp(k + 0) holds. Equivalently, since p(k + m) = p(l) and p(k + 0) = p(k), we write theproof more simply as an induction on l, beginning with l = k for our initial claim. Thefollowing example illustrates this approach.

Proposition. For every n ≥ 5, 2n > n2.

Proof. We first check that 25 = 32 > 25 = 52.Now suppose 2n > n2. We must prove that 2n+1 > (n+1)2. Now, 2n+1 = 2 ·2n > 2n2

by inductive hypothesis. (Here we also use Axiom 6(b), together with the fact that2 > 0, which follows as a particular case from the previously proven proposition thatevery natural number is greater than 0. That 2 is, in fact, a natural number follows fromits definition and the inductive property of the natural number system: 2 = 1 + 1. Allof our familiar numerals for natural numbers are defined inductively using the rules ofthe base ten place value system: 3 = 2 + 1, 4 = 3 + 1, etc.) Since our goal is to prove2n+1 > (n + 1)2, it now suffices to prove 2n2 > n + 12. Since (n + 1)2 = n2 + 2n + 1, itsuffices to prove that n2 > 2n + 1. Given that n ≥ 5 > 1, we have n2 ≥ 5n = 2n + 3n,and 3n > 1.

7.7 Exercises

1. Prove that the sum of any two integers is an integer. (There are three cases toconsider: both summands are natural numbers, in which case the result was proven


previously; one of them is a natural number and the other is the additive inverse ofa natural number – without loss of generality, we may assume the first is a naturalnumber, since addition is commutative; and, finally, both summands are additiveinverses of natural numbers.)

2. Prove that the product of any two integers is an integer.

3. Define negative integer exponents. (A recursive definition is not needed; instead,define them in terms of positive integer exponents.)

4. Prove the fundamental properties of exponents apply to all integer exponents, bothnon-negative (as proven previously) and negative. (Once again there are three casesto consider.)

5. Prove that for all natural numbers n, 3n > n2.

6. Prove, showing all details, that for all natural numbers n, 3n ≥ 2n, and that thisinequality is strict for n ≥ 1. (It follows, of course, that 3n > n, since we provedthat 2n > n.)

7. More generally, prove that if a < b, then for all natural numbers n, an ≤ bn, andthat this inequality is strict for n ≥ 1.

8. Prove that for all natural numbers n ≥ 4, 2n < n!.

9. Prove that for all natural numbers n ≥ 10, 2n > n3.

Remark. It is true in general that, for any natural number m 6= 0 and for anyfixed exponent k ∈ N and for some sufficiently large N ∈ N, n ≥ N ⇒ mn >nk. However, the algebra involved in the method of proof we have used becomesunwieldly for larger values of k. The methods of calculus are more effective forproving these results.

10. Prove that for all integers k ≥ −2, 2k > 13−k .

11. Prove that, for any natural number n ≥ 12, there are natural numbers k and lsuch that n = 3k + 7l. (Hint: You will need complete induction. Check 13 and 14separately; for the inductive step, consider n− 3. Why must you check 13 and 14separately?)

7.8. COMPLETE INDUCTION: TWO APPROACHES 91

7.8 Complete Induction: Two Approaches

7.8.1 Modifying the proposition to enhance the inductive step

We have seen how to extend proof by induction by using an appropriately modified initialstep. It is also possible to greatly increase the power of proof by induction by refiningthe inductive step. These two enhancements may be combined, but for simplicity let usrestrict our attention to proving a proposition p(n) for all natural numbers n.

Informally, we think of induction (starting at 0 for simplicity) as follows, where “it”refers to some proposition: “It is true for 0. If it is true for 0, then it must be truefor 1. If it is true for 1, then it must be true for 2. If it is true for 2, then it must betrue for 3. . . . Therefore, it is true for all natural numbers.” Note that at each step, weactually know what was proven in the previous step, so we have more information thanwe are using. Sometimes we need that additional information. We could instead thinkas follows: “It is true for 0. If it is true for 0, then it must be true for 1. If it is true for0 and 1, then it must be true for 2. If it is true for 0, 1, and 2, then it must be true for3. . . . Therefore, it is true for all natural numbers.”

Let us now develop this idea more formally and rigorously. What we are actuallyproving when we think the second way is the statement q(n) = (∀i ∈ N : i ≤ n)p(i).The statement q(n) is stronger than p(n), meaning that q(n)⇒ p(n) for all n ∈ N. Wethus obtain the desired result that (∀n ∈ N)p(n) by proving the stronger result that(∀n ∈ N)q(n) instead. In general, it is harder to prove a stronger result, but not in thiscase! Remember that in the inductive step we are not proving q(n) directly, but ratherthe conditional statement q(n) ⇒ q(n + 1). Thus, a stronger property q(n) gives us astronger inductive hypothesis to work with. Of course, q(n + 1) is a stronger inductiveconclusion, but in this case everything additional in the conclusion follows immediatelyfrom the stronger hypothesis: q(n)⇒ q(n+ 1) translates as (∀i ∈ N : i ≤ n)p(i)⇒ (∀i ∈N : i ≤ n + 1)p(i). In other words, our new inductive hypothesis is that the propertygiven by p(i) is true for all natural numbers that are less than or equal to n, and thenew inductive conclusion is that it is true for all natural numbers less than or equal ton + 1. Now, since the inductive hypothesis already tells us directly that the propertyp(i) is true for all natural numbers less than or equal to n (and there are no naturalnumbers between n and n + 1), the only new assertion in the inductive conclusion isp(n+ 1). This is just the same conclusion we had before! As we realized in our informalreasoning above, the rest of the proposition just “carries through.” Furthermore, sincethere are no natural numbers less than 0, the initial step is also the same. Although thepart of the conclusion that remains to be proven is the same as before, the hypothesisis much stronger, since you can use the truth of the property for all the numbers up to


and including n, not just n itself!The technique of using the individual assertion p(n) as the inductive hypothesis is

sometimes called simple induction, in contrast to using the more general, and there-fore stronger, assertion that (∀i ∈ N : i ≤ n)p(i), which is sometimes called completeinduction. (You are using the complete power that induction gives you.) Often, sim-ple induction is sufficient, but sometimes complete induction is absolutely necessary. Aclassic example is the following fundamental result of number theory. To understand itsstatement, we must recall the following definitions:

Definition. The natural number m divides n if there is some natural number k suchthat km = n. We also say that m is a factor of n.

Notation. Denote the fact that m divides n by m|n.

Clearly, every natural number has 1 and itself as factors. The numbers that have noother factors are special.

Definition. A natural number p is prime if it is greater than 1 and its only factors are1 and p. A natural number that is not prime is called composite

Note that a factor of a number cannot be larger than the number. (Why?) Therefore,to check that a number is prime, it is only necessary to check that the smaller naturalnumbers other than 1 do not divide it. A quick check demonstrates, for example, that2 and 3 are prime numbers. (In fact, you can be more efficient than that; it is onlynecessary to check numbers less than or equal to the square root of the number. Why?)

Theorem. Any natural number greater than 1 can be factored as the product of primefactors.

Example. 24 = 2 · 2 · 2 · 3.

Proof. The initial step is to show that 2 factors into primes, which is trivial, since 2 itselfis prime.

For the inductive step, let n be any natural number greater than 1. Assume that anynumber less than n factors into primes.

Case 1: n is prime. Then we are done.

Case 2: n is composite. Then, by definition, n = lm for some natural num-bers l and m, neither of which is 1. By inductive hypothesis, l and m fac-tor as the product of prime factors, say l = p1p2 · · · pj and m = pj+1pj+2 · · · pk.(Note that the prime factors pi in these expressions need not be distinct.) Thenn = p1p2 · · · pjpj+1pj+2 · · · pk is the product of prime factors.

7.8. COMPLETE INDUCTION: TWO APPROACHES 93

Remark. Simple induction will not work for this theorem, since neither factor of n willbe n − 1. The simple inductive hypothesis that n − 1 factors into primes is not merelyinsufficient, but useless.

Remark. It is also true that, up to the order of the factors, the prime factorization ofa number is unique. The uniqueness of the factorization is also proved by completeinduction, but the proof is considerably more complicated.

7.8.2 Well-ordering

An approach that is both more elegant and more general in its applications is to introducethe concept of well-ordering. First we define the smallest element of a totally orderedset in the obvious way:

Definition. Let S be an ordered set. An element s0 ∈ S is the smallest element of S iffor every s ∈ S, s0 ≤ s.

It is easy to prove that a set can have at most one smallest element, justifying theanticipatory use of the word “the” in the definition. We leave this proof as an exercise.

A set need not have a smallest element. The following are examples of sets that donot have a smallest element: ∅, Z, the interval (0, 1), the interval (0, 1]. The empty setis a special case: it fails to have a smallest element simply because it has no elements atall.

Definition. An ordered set T is well-ordered if every non-empty subset S ⊆ T has asmallest element.

Remark. Since the empty set is a subset of every set, we must exclude this special casein the definition, or it would be vacuous: no well-ordered set would exist.

Remark. We need not assume in the definition that the ordering of T is total, but infact it must be. Consider any two distinct elements t and t′; since the set {t, t′} has asmallest element, one must be less than the other.

Not every totally ordered set is well-ordered, by any means. The sets Z, (0, 1), and(0, 1] are not well-ordered because they do not have a smallest element. The intervals[0, 1), [0, 1], and [0,∞) are not well-ordered, even though each of them does have asmallest element, because many of their non-empty subsets, such as (0, 1), do not. Theset {0} ∪ { 1

n: n ∈ N} is not well-ordered because every infinite subset that does not

contain 0 fails to have a smallest element.


Well-ordered sets are very special, and of course we have a reason for defining them.The following theorem leads to a powerful method of proving universal propositionsabout their elements:

Theorem (Principle of Complete Induction). Let T be a well-ordered set. Let p(t) bea proposition about the elements of T such that (∀t ∈ T : t < s)p(t) ⇒ p(s). Then(∀t ∈ T )p(t).

Proof. Consider S = {t ∈ T : ¬p(t)}. We will show S = ∅; it follows that p(t) holds forall t ∈ T . Suppose to the contrary that S is not empty. Then since T is well-ordered, Shas a smallest element, s0. Thus for all t ∈ T such that t < s0, t /∈ S, and hence p(t)holds. But then, by our assumption with respect to p(t), p(s0) holds, contradicting theassertion that s0 ∈ S.

Many well-ordered sets do exist, so the Principle of Complete Induction has wideapplication. As we know, the method of complete induction is valid for the set ofnatural numbers, so it should come as no surprise that N, with its standard ordering (asa subset of R), is well-ordered. In fact, a set T is well-ordered if and only if the methodof complete induction is valid for T :

Theorem. A set T is well-ordered if and only if for any proposition p(t) about theelements of t such that (∀t ∈ T : t < s)p(t)⇒ p(s), it follows that (∀t ∈ T )p(t).

Proof. The “only if” part is the Principle of Complete Induction, proven above. For the“if” part we will prove that any subset of T that fails to have a smallest element is empty.Let S ⊆ T fail to have a smallest element, and consider the proposition that t /∈ S. Ift /∈ S for all t < s, then surely s /∈ S, since if s were an element of S, it would be thesmallest! Thus, by hypothesis, t /∈ S for all t ∈ T ; hence S is empty.

Corollary. The set of natural numbers, with its standard ordering, is well-ordered.

Remark. No algebraic operations are used in these definitions, theorems, or proofs. Awell-ordered set need not be a subset of the set of real numbers, nor need it have anyoperations defined on it; the method of complete induction will apply to it nonetheless.Conversely, a set for which the method of complete induction is valid need not be asubset of the set of real numbers or have any operations defined on it; it will be well-ordered nonetheless. (Of course, propositions about the elements of the set can only useoperations and relations that are defined on it.)

As outlined in the exercises, one can prove directly by simple induction that N iswell-ordered, obtaining as a consequence the validity of complete induction.

7.9. EXERCISES 95

7.9 Exercises

1. Prove that a set can have at most one smallest element.

2. This exercise outlines a direct proof using simple induction that N is a well-orderedset. Complete the proofs of each of the following claims, supplying all details andjustifications.

(a) For any natural number n, the set of natural numbers less than or equal to n(informally, {0, 1, 2, . . . , n}) is well-ordered.

Hints: Use proof by induction. The initial claim is that {0} is well-ordered,since its only non-empty subset, namely {0} itself, has a smallest element,namely 0, its sole element. For the inductive claim, assume that {0, 1, 2, . . . , n−1} is well-ordered. To prove that {0, 1, 2, . . . , n} is well-ordered, let S be anon-empty subset of {0, 1, 2, . . . , n}. Consider S ∩ {0, 1, 2, . . . , n − 1}, anddivide into two cases:

Case 1: S ∩ {0, 1, 2, . . . , n − 1} = ∅, in which case S = {n} (recall thatwe proved there are no natural numbers between n− 1 and n), or

Case 2: S ∩{0, 1, 2, . . . , n− 1} 6= ∅, in which case S ∩{0, 1, 2, . . . , n− 1}is a non-empty subset of {0, 1, 2, . . . , n − 1} and hence has a smallestelement by the inductive hypothesis.

(b) N is well-ordered (the desired full result).

Hints: Let S be any non-empty subset of N. Since S 6= ∅, there is a nat-ural number n ∈ S; therefore, S ∩ {0, 1, 2, . . . , n} is a non-empty subset of{0, 1, 2, . . . , n}. Now use the result of part (a), noting that any element of Sthat is not a member of S ∩ {0, 1, 2, . . . , n} is greater than n.

3. (a) Prove that for any natural number n and any natural number m other than0, there is a number l such that lm > n. (Hint: Let l = n + 1. This is asledgehammer approach, n + 1 generally being a much larger value of l thanis necessary, but it works!)

(b) Prove the existence of an algorithm for division with remainder in the naturalnumber system: for any natural numbers m and n such that 0 < m ≤ n,there exist natural numbers q (the quotient) and r (the remainder) such thatn = qm+ r and r < m. (Note that q 6= 0, since r < m.)

Hints: Use the fact that N is well -ordered: Consider the set S = {l ∈ N :lm > n}. By the result of part (a), S 6= ∅; therefore, it has a smallest


element. Let k be the smallest element of S. Clearly 0 /∈ S; hence, k > 0.(Furthermore, since m ≤ n, 1 /∈ S; hence, k > 1, although this fact is notneeded for the proof). Thus k − 1 ∈ N. Let q = k − 1.

Remark. This theorem does not actually provide a practical algorithm: con-sidering all multiples of the divisor until one found the very largest that isless than the dividend could be quite time consuming! The familiar long di-vision algorithm simplifies the procedure by taking advantage of place-valuenumeration.

4. This exercise outlines a proof that any two natural numbers, m and n, have aunique greatest common divisor, GCD(m,n), which is not only the largest butalso a multiple of any other common divisor. Furthermore, as a corollary of theproof 3, we obtain the invaluable result that GCD(m,n) is a linear combinationof m and n with integer coefficients (that is, there are integers k and l such thatGCD(m,n) = km+ ln).

(a) Let S = {km+ ln : k, l ∈ Z and km+ ln > 0}. Prove that S 6= 0.

(b) Therefore, since N is well-ordered, S has a smallest element. Let d be thesmallest element of S. Prove that any common divisor of m and n divides d.

(c) Prove that d ≤ m and d ≤ n.

(d) Prove that d itself divides both m and n. (Hints: By Exercise 3b above,m = qd + r, where 0 ≤ r < d. The remainder r is a linear combination ofm and n, since r = m − qd; however, since r < d, and d is by definition thesmallest element of S, r /∈ S. Thus the only possibility is that r = 0. Theproof that d|n is similar.)

(e) Prove that if d′ is a natural number such that d′|m, d′|n, and any commondivisor of m and n divides d′, then d′ = d. Thus we are justified in definingGCD(m,n) to be the unique natural number with these properties. It alsofollows from the definition of d that there are integers k and l such thatd = km+ ln.

The following exercises lead you through the proof that prime factorizations areunique. Prove each of the propositions.

3A corollary of the proof, meaning an extension of the argument used in the proof rather than anapplication of the statement of the theorem, is sometimes called a scholium; however, this term is notin common use.

7.9. EXERCISES 97

5. If p is prime and p|mn, then p|m or p|n. (Hint: Suppose p 6 |m. Then since p isprime, GCD(p,m) = 1. By the previous result, there exist integers k and l suchthat 1 = kp+ lm. Use this and the fact that n = 1n to show p|n.)

6. More generally, if p is prime and p|m1m2m3 · · ·mn, then p|mi for some i. (Hint:The case n = 2 is the result of the previous exercise. Use induction on n.)

7. If m = p1p2p3 · · · pn is a factorization of m into primes and p|m, then p = pi forsome i.

8. The factorization of any natural number into primes is unique. (Hint: Use completeinduction.)

The result that every natural number greater than 1 has a unique factorization intoprime factors is called the Fundamental Theorem of Arithmetic on account of the depthand breadth of its ramifications. In particular, since rational numbers are ratios ofintegers, the Fundamental Theorem of Arithmetic has profound consequences for therational number system. For one thing, we may assume that the numerator and denomi-nator of the fraction representing a rational number have no common factors: just factorinto primes and cancel. This simple fact provides the basis for proving that many realnumbers are irrational. Prove each of the following propositions.

9.√

2 is irrational. In other words, there exist no natural numbers m and n such

that(mn

)2= 2. (As of yet, since we have not examined the continuity of the real

number system, we do not know there is any real number with this property! Butif there is one (and indeed there is), it cannot be rational.)

10.√

3 is irrational.

11. More generally, if n is not a perfect square, then√n is irrational. (We say a natural

number n is a perfect square if there is a natural number m such that n = m2.Hint: Prove the contrapositive. Using the Fundamental Theorem of Arithmetic,first prove the lemma that q2|p2 ⇔ q|p.)

12.√

2 and√

3 are incommensurable.

Chapter 8

Relations

Mathematics is about relationships. Proofs describe the logical relationship among math-ematical assumptions and propositions. Sets describe the association of mathematicalobjects that play a particular role, such as numbers or points. There are usually addi-tional relationships among the elements of a set or between elements of different sets.Some numbers are larger than other numbers; some points are between pairs of otherpoints. For any pair of points in a geometric object, a certain number, and only thatnumber, is the distance between them. For every pair of numbers, a certain number istheir product and a certain number (generally different, but not always) is their sum.For any number, a certain number is its square, and two specific numbers are its squareroots. (The latter holds even for negative numbers if we are working in the complexnumber system). Some pairs of natural numbers have the same parity, either both evenor both odd, and other pairs of numbers do not.

Relationships of this type are described formally by set-theoretic constructions calledrelations. Relations lie at the very heart - and brain - of mathematics. Without theconcept of a relation, none of the mathematics with which you are familiar would formallyexist, not even counting, since counting requires proceeding from each number to itssuccessor; the association between each number and its successor is a relation (morespecifically, a function)!

We have already considered the relation of order on the real numbers, as well asthose that associate pairs of numbers to their sums and products, and we were able towrite these relations down, describe their properties precisely, and prove propositionsthat involve them. But what exactly are they as mathematical objects?

99

100 CHAPTER 8. RELATIONS

8.1 Ordered pairs

To describe a relation, we must describe which elements of which sets are related, butmerely grouping the them into sets of two is not enough. We must also be able to describehow they are related. For example, 2 is the successor of 1, but 1 is not the successor of 2.Relations are directed. Describing a directed pair requires that we can distinguish oneelement as coming first and the other as coming second. Sets do not do this. Two setsare equal, by definition, if and only if they contain the same elements. {1, 2} = {2, 1}.Thus we need a new type of object called an ordered pair. You are undoubtedly familiarwith ordered pairs, but you are likely non familiar with how to define them using sets.The following exercises guide you to the correct definition. (One could certainly thinkof ordered pairs as primary, undefined objects like sets and simply postulate that theordered pair (a, b) is equal to the ordered pair (c, d) if and only if a = c and b = d;however, defining ordered pairs in terms of sets is a useful exercise that provides insightinto the nature of sets themselves.)

8.2 Exercises

1. Which of the following pairs of sets has the property that the two sets are equal ifand only if a = c and b = d? (There is only one! This is trickier than it looks!)

(a) {a, b}, {c, d}(b) {a, {b}}, {c, {d}}(c) {{a}, b}, {c, {d}}(d) {{a}, {a, b}}, {{c}, {c, d}}(e) {{a}, {b}}, {{c}, {d}}(f) {a, b, {b}}, {c, d, {d}}(g) {a, b, {a, b}}, {c, d, {c, d}}

2. It is obvious for each pair of sets in exercise (1) that if a = c and b = d, then thesets are equal. As noted, the converse is true for only one pair. Prove the conversefor this pair, and give a counterexample to the converse for each of the others.

3. Fill in the blank: Definition. Given elements a and b of sets A and B, respectively,the ordered pair (a, b) is the set .

8.3. THE CARTESIAN PRODUCT OPERATION 101

8.3 The Cartesian Product Operation

We define the Cartesian product of two sets as follows:

Definition. Given two sets A and B, the Cartesian product A × B is the set {(a, b) :a ∈ A ∧ b ∈ B}.

Note that the order in which the two sets appear matters. It is ambiguous to say “theCartesian product of sets A and B,” since the logical connective “and” is symmetric.(Nonetheless, people do say that, with the understanding that A comes first.)

Remark. The Cartesian product operation is named after Rene Descartes, who intro-duced the idea of labeling the points of a Euclidean plane with pairs of real coordinates.(The pairs must be ordered, since going 5 to the right and 2 up, say, is different fromgoing 5 up and 2 to the right.)

Remark. Although we are not yet equipped to formally prove this, you should be able tosee that if A is a finite set with m elements and B is a finite set with n elements, thenA×B is finite and has mn elements.

8.4 Exercises

1. Prove that A × B = {(a, b) : a ∈ A and b ∈ B} is a set in accordance with theaxioms of set theory. (Hints: By the Axiom of Restricted Comprehension, it sufficesto show that A×B is a subset of a known set. Noting that both elements of eachordered pair are subsets of A ∪B, use the Axiom of the Power Set twice.)

2. Prove (A = ∅ or B = ∅) ⇔ A × B = ∅. (Hint: Prove the contrapositive bicon-ditional. Suppose there is an element in A × B, and show how it gives rise tothe existence of an element in A and an element of B. Conversely, show how theexistence of an element of A and an element of B gives rise to an element of A×B.)

3. Prove that if A 6= ∅ and B 6= ∅ and A 6= B, then A×B 6= B × A.

4. Prove that if A ⊆ C and B ⊆ D, then A×B ⊆ C ×D.

5. Is the converse to the previous proposition true? If so, prove it. If not, give acounterexample. In addition, if the converse is false, find an additional conditionon A and B under which it is true, and prove it.


6. Prove that if A ⊆ C, B ⊆ D, and either A 6= C ∧ D 6= ∅, or B 6= D ∧ C 6= ∅,then A × B ( C × D. (The symbol ( means “is a proper subset of,” as shouldbe evident. Sometimes the symbol ⊂ is used to mean this, but usage varies amongauthors; ( leaves not doubt.)

7. Is the converse to the previous proposition true? If so, prove it. If not, give acounterexample. In addition, if the converse is false, find an additional conditionon A and B under which it is true, and prove it.

For each of the following propositions, determine if the equation is true. If it is,prove it. If it is not, determine whether or not inclusion holds in one direction. Proveany inclusion that holds, and give a counterexample to any that doesn’t.

Finally, for extra credit, for any equation that does not hold, if you can find additionalconditions under which it does hold, do so and prove your result!

(Note that “\” means set difference; it is synonymous with “−.”)

8. (A×B)∪ (C ×D) = (A∪C)× (B ∪D). (Hint: Draw a picture in the coordinateplane!)

9. (A×B) ∩ (C ×D) = (A ∩ C)× (B ∩D).

10. A× (B \ C) = (A×B) \ (A× C).

11. (A \B)× (C \D) = (A× C) \ (B ×D).

8.5 The Definition of a Relation

Formally, a relation is simply a subset of the Cartesian product of two sets.

Definition. A relation of set A to set B is a subset of A×B.

(Note the asymmetry of the language “of . . . to . . .,” to indicate that A is the firstfactor of the Cartesian product and B the second.)

This definition is enormously powerful in its simplicity and the scope of its application.The function that associates each real number to its square is just the set of pairs{(x, y) ∈ R×R : x2 = y}. To describe (or postulate) an order relation on a set A, simplydescribe (or postulate) a subset of A × A satisfying the appropriate properties; x < yif and only if (x, y) is in this subset. If S is a set of points in which distances can bemeasured, these distances are completely described by a subset of (S×S)×R satisfying

8.6. OPERATIONS ON RELATIONS: INVERSE AND COMPOSITION 103

appropriate properties, associating a unique positive real number to each pair of elementsof S. In short, to specify a relation, we just specify the set of pairs of elements that arerelated.

Many relations relate elements of the same set. Since these are so common, we havespecial terminology for a relation of a set to itself.

Definition. A relation of on set A is a relation of A to itself, that is, a subset of A×A.

(Note that although we may now use symmetric language to describe the factors ofthe Cartesian product, as they are the same, the relation itself need not be symmetric.For example, an order relation on a set is not symmetric.)

8.6 Operations on Relations: Inverse and Composi-

tion

Given a relation R of A to B, it is natural to consider the relation of B to A givenby reversing the order of the pairs. This latter relation is called the inverse of R anddenoted by R−1.

Definition. Given a relation R of A to B, the inverse relation, denoted by R−1, is therelation of B to A defined by R−1 = {(b, a) ∈ B × A : (a, b) ∈ R}.

For example, the inverse to the < relation on R is the > relation on R. As anotherexample, the inverse of the relation of points lying on lines is the relation of lines passingthrough points. (Note that inverse relations have nothing to do with taking reciprocals.)

There is also a natural method of combining two relations. If R relates set A to setB, and S relates set B to set C, the composition S ◦ R is the relation of A to C thatpairs those elements associated with a common element of B by R and S, respectively.(You might wonder why S is written on the left, when it would seem natural to writeit on the right. The reason is to make this notation compatible with the standardfunctional notation, described below. The operation of composition is typically usedwith functions.)

Definition. Given a relation R of A to B and a relation S of B to C, the com-position S ◦ R is the relation of A to C defined by S ◦ R = {(a, c) ∈ A × C :(∃b ∈ B) [(a, b) ∈ R ∧ (b, c) ∈ S]}.

Some special types of relations are so pervasive in mathematics that they merit sepa-rate, detailed discussion. These are functions, order relations, and equivalence relations,


the topics of the following sections. I suspect you are at least informally familiar withthese notions. However, keep in mind that not all important relations fit into one of thesecategories. The relationship of betweenness for points on a line is not a function, orderrelation, or equivalence relation. (Once coordinates are chosen on a line, betweennesscan be described in terms of the order of these coordinates, but is not itself an orderrelation.) Neither is the relation of incidence between points and lines. (“Incidence”is the technical term. We more commonly say that a point “lies on” a line, or that aline “passes through” a point.) The association of a number to its square roots is theinverse relation to a function, but is not itself a function (or order relation or equivalencerelation).

8.7 Exercises

1. Let R be the relation on R given by {(x, y) : x < y}. Then R−1 is the relation onR given by {(x, y) : y < x}. Describe:

(a) R−1 ◦R(b) R ◦R−1

(c) R ◦R

2. Let S be the relation on R given by {(x, y) : x2 = y}. Describe:

(a) S−1

(b) S−1 ◦ S(c) S ◦ S−1

(d) S ◦ S.

3. Let T be the relation on R given by {(x, y) : x = ±y}. Describe:

(a) T−1

(b) T−1 ◦ T(c) T ◦ T−1

(d) T ◦ T .

4. Let U be the relation on R given by {(x, y) : x = ±y ∧ |x| ≤ 1}. (We define|x| = x, if x ≥ 0, and |x| = −x, if x < 0.) Describe:

8.8. FUNCTIONS 105

(a) U−1

(b) U−1 ◦ U(c) U ◦ U−1

(d) U ◦ U .

5. With R, S, T , and U as defined above, describe:

(a) R ◦ S(b) S ◦R(c) R ◦ T(d) T ◦R(e) R ◦ U(f) U ◦R(g) S ◦ T(h) T ◦ S(i) S ◦ U(j) U ◦ S(k) T ◦ U(l) U ◦ T

6. One may graph any relation on R in the coordinate plane. Graph each of therelations in the preceding exercises.

7. Let R ⊆ A × B be a relation. Prove that (R−1)−1 = R. Thus, inverse relationscome in pairs, just like additive and multiplicative inverses of numbers.

8. Let R be a relation of A to B and S be a relation of B to C. Prove that (S◦R)−1 =R−1 ◦ S−1.

8.8 Functions

8.8.1 The Definition of a Function

A function is a relation that associates a unique second coordinate, which we thinkof as the “output,” to each first coordinate, thought of as the “input.” Sometimes the


term “independent variable” is used for the input, and the term “dependent variable” isused for the output, since it depends on the input. The latter terminology is somewhatmisleading, however, because the second coordinate of any relation depends on the first,it just need not be uniquely determined by it. For example, if I ask which real numbersare greater than, say, 1, my answer depends on the number 1. The set of numberssatisfying this condition is infinite, but not arbitrary, and the set of numbers greaterthan, say, 2 is different from the set of those greater than 1. The key defining propertyof a function is that the first coordinate determines the second coordinate uniquely.Another way to look at this is that each element of the first factor appears in only onepair in the relation.

Definition. A function from A to B is a relation of A to B satisfying the additionalproperty that each element of A occurs as the first coordinate of one and only one elementof the relation. If a ∈ A, then the unique element b ∈ B for which (a, b) is an elementof the function is called the image of a. The first factor, A, is called the domain ofthe function. The second factor, B, is called the range of the function. The subset ofthe range whose elements actually occur as second coordinates of ordered pairs in thefunction (that is, images of elements in the domain) is called the image of the function.

Notation. If f is a function from A to B, we write f : A → B. If (a, b) ∈ f , we writeb = f(a).

The uniqueness of the output corresponding to each input is described by the phrase“only one” in the definition. In addition, each element of the domain, A, is indeedan input and has an image in the range, B; this a second, more technical property offunctions. Relations that are not functions need not have this property. For example,in the order relation on the interval [0, 1] ⊂ R, nothing is greater than 1, so 1 does notoccur as the first coordinate of any pair (x, y) in the relation {(x, y) ∈ [0, 1] : x < y}. Wecan always arrange for the domain of a function to be restricted to elements that haveoutputs, but if we want to talk of an order relation on [0, 1], we cannot leave 1 out!

Warning: as with many concepts, the terminology used about functions varies some-what. Some authors use the word “range” to refer to what we have defined as the image,and use the word “codomain” to refer to what we have defined as the range. Whenreading any source, be sure to take note of how words are used in that context!

A misleading view of functions that is, unfortunately, rather commonplace in sec-ondary schools is that they are “machines” that change inputs into outputs. Functionsdon’t change anything! The squaring function, for instance, associates the output 4 tothe input 2, but it does not change 2 into 4. If you could find any means of changing2 into 4, you would be doing better than the dreams of the alchemists who tried to

8.8. FUNCTIONS 107

change lead into gold! Even the functions we speak of as “transformations,” such aslinear transformations of the plane or of higher-dimensional vector spaces, do not changeany of the positions or vectors, but only associate them to new positions or vectors.It is sometimes helpful to think of representations of the vectors as moving, but thevectors themselves do not change. Much is lost if one thinks of a function as somehow“producing” an output from an input, because it is the correspondence between inputsand outputs that characterizes a function. Seeing this correspondence (such as when onegraphs the function) requires thinking of the input and output together, as an orderedpair.

8.8.2 Some special types of functions

Identity functions

Given a set A, the identity function on A is the function iA : A→ A defined by iA(a) = a.(In set notation, iA = {(x, y) ∈ A × A : x = y}. Note that the identity function is thesame as the equality relation on A, sometimes called the diagonal of A× A.)

Characteristic functions

Let B ⊆ A. The characteristic function of B, denoted χB, is the function from A to{0, 1} defined by

χB(a) =

{1, if a ∈ B0, if a /∈ B

The χB is sometimes called the indicator function for B; it indicates whether or notan element of the domain is in B.

Binary operations

A binary operation on a set A is a function from A×A to A. This definition formalizesthe notion introduced in the previous chapter.

Unitary operations

A unitary operation on a set A is a function from A to itself. For example, the functionf : R → R defined by f(x) = x2 is a unitary operation. (Recall that in set notation,f = {(x, y) ∈ R× R : x2 = y}.)


Taking additive and multiplicative inverses are important examples of unitary oper-ations. Thus, the functions f : R→ R defined by f(x) = −x and g : R \ {0} → R \ {0}defined by g(x) = 1

xare unitary operations.

Observe that f ◦ f = iR and g ◦ g = iR\{0}. Any unitary operation whose compositionwith itself is the identity is called an inversion. Reflection in a line in the plane is ageometric example of an inversion, as is reflection through a point.

Metrics

A very important type of function that occurs in geometry is a metric, or “distancefunction.” A metric on a set A is a function d : A × A → R that satisfies the followingproperties:

• Symmetry : ∀a, b ∈ A, d(a, b) = d(b, a). (Note that this is a different notion ofsymmetry from that of a relation on A being symmetric.)

• Positive definiteness : ∀a, b ∈ A, d(a, b) ≥ 0 ∧ (d(a, b) = 0⇔ a = b).

• Triangle inequality : ∀a, b, c ∈ A, d(a, b) + d(b, c) ≥ d(a, c).

8.8.3 Surjectivity, Injectivity, and Inverse Functions

Definition. A function is surjective if its image is equal to its range. That is, everyelement of the range is the image of some point in the domain. A technical formulationis more useful in proofs; f : A → B is surjective if, ∀b ∈ B, ∃a ∈ A such that f(a) = b.(Note that the statement obtained by interchanging a and b in the quantifiers is partof the definition of a function: ∀a ∈ A, ∃b ∈ B such that f(a) = b.) If f : A → B issurjective, we say it is a function from A onto B.

Notation. If f is a surjective function from A onto B, it is common to write f : A� B.

Definition. A function is injective if each element of its range occurs as the image of atmost one element of the domain. Again, a technical formulation is more useful in proofs;f : A → B is injective if, ∀a, a′ ∈ A, f(a) = f(a′) ⇒ a = a′. (Note that the converseimplication is again part of the definition of a function.) If f : A → B is injective, wesay it is a function from A into B.

Notation. If f is an injective function from A into B, it is common to write f : A ↪→ B.

Definition. A function that is both injective and surjective is called bijective.

8.8. FUNCTIONS 109

Definition. Let g : A→ B. A function f is a left inverse (function) for g if f ◦ g = iA.

Definition. Let g : A→ B. A function h is a right inverse (function) for g if g ◦h = iB.

Theorem (Existence of Left or Right Inverse Functions). A function has a left inverseif and only if it is injective. A function has a right inverse if and only if it is surjective.It follows that a function has both a left and a right inverse if and only if it is bijective.

The proof of this theorem is contained in the exercises.It is important to avoid confusion here. A right or left inverse function of a given

function is not necessarily the same as the inverse relation of that function. If g isinjective but not surjective, a left inverse f for g contains the inverse relation of g as aproper subset, because images under f must be chosen for those elements of the range ofg that are not in the image of g. It does not matter how these images are chosen whendefining f ; therefore, there is not a unique left inverse of such a function. If g is surjectivebut not injective, a right inverse h for g is a proper subset of the inverse relation of g,because an image under h for each element of the range of g must be chosen from amongits pre-images. Again, this choice is not unique, so there is no unique right inverse ofsuch a function. (To see all this more clearly, I suggest drawing diagrams. Working theexercises will also clarify these facts - as usual!) If g is bijective, then the inverse relationof g is a function and is both the unique left inverse and the unique right inverse of g.This function is called simply the inverse (function) of g and denoted by g−1. (Pleasenote again that this has nothing to do with reciprocals!) This result is stated below; itsproof is left as an exercise.

Theorem (Existence of an Inverse Function). Let g : A→ B be a function. The inverserelation of g is a function if and only if g is bijective. In this case, the inverse relationof g is the unique left inverse function for g and also the unique right inverse functionfor g.

Definition. Let g : A→ B be bijective. The inverse (function) to g, denoted by g−1, isthe unique function such that g−1 ◦ g = iA and g ◦ g−1 = iB.

Combining and summarizing the previous two theorems, we obtain:

Theorem. A function g has both a left inverse f and a right inverse h if and only if gis bijective and f = h = g−1.

Definition. Given a function f : A → B, for any b ∈ B we define the pre-image of b,denoted by f−1(b), to be the set {a ∈ A : f(a) = b}. More generally, for D ⊆ B, thepre-image of D, denoted by f−1(D) is defined as the set {a ∈ A : f(a) ∈ D}. Note thatthe pre-image of a point or set may be empty.


When working with functions, it is important to be aware of context, as some symbolshave different meanings in different contexts. For example, f−1 may be used to denotethe inverse function, if it exists, or the pre-image of a point or set (even if the function fis not bijective and has no inverse function). If f is bijective, then f−1 generally meansthe inverse function; hence, if f(a) = b, f−1(b) = a. In this case, the element a is calledthe pre-image of b, even though according to the above definition, the pre-image of b isthe set {a}. Similarly, for any function f : A → B and any b ∈ B, f−1(b) = f−1({b}).For obvious reasons, these multiple usages of words and notations do not generally causeconfusion. That is why we use them! As is often the case, it would be unnecessarilycumbersome to introduce more complicated notation to avoid them. (Mathematicianssometimes refer to these multiple usages as “abuse of notation.” Better to abuse thenotation than the reader!)

8.8.4 Exercises

1. Let f : A→ B and g : B → C be functions. Prove that the composition g ◦ f is afunction from A to C, and that g ◦ f(a) = g(f(a)).

Remark. Some writers prefer to write functions on the right, instead of the left, sothat compositions come out in a more natural order: b = (a)f , if (a, b) ∈ f . Thenthe composition above comes out as f ◦ g, with (a)f ◦ g = ((a)f)g. However nice,this practice has not generally caught on. Tradition dies hard!

2. Let A = {1, 2} and B = {3, 4}. List all possible functions from A to B andsketch their graphs. Is it possible that a function from A to B is injective but notsurjective? Is it possible that a function from A to B is surjective but not injective?Explain!

3. Let A = {1, 2} and B = {3, 4}. Give the inverse function for each bijective functionfrom A to B and sketch its graph. What is the geometric relationship between thegraphs of a function and its inverse function?

4. Let A = {1, 2} and B = {3, 4, 5}. List all possible functions from A to B andsketch their graphs. For each injective function, list all of its left inverses andsketch their graphs. Is it possible to give an example of a surjective function fromA to B? Explain.

5. Let A = {1, 2, 3} and B = {3, 4}. List all possible functions from A to B andsketch their graphs. For each surjective function, list all of its right inverses. Is itpossible to give an example of an injective function from A to B? Explain.

8.8. FUNCTIONS 111

6. Give an example of a set A and two functions, f : A → A and g : A → A, suchthat g ◦ f 6= f ◦ g. (What does it mean for two functions to be equal?)

7. Let c be any real number. Let f : R→ R be defined by f(x) = x + c. Prove thatf is bijective.

8. Let c be any real number other than zero. Let g : R→ R be defined by g(x) = cx.Prove that g is bijective.

9. Let h : R → R be defined by h(x) = x2. Prove that h is neither injective norsurjective.

10. Let j : R → R be defined by j(x) = (x − 1)(x − 2)(x − 3). Sketch the graph of j(that is, the points (x, y) ∈ R × R such that (x, y) ∈ j, in other words, such thaty = j(x)). Use the graph to present an informal argument that j is surjective butnot injective.

11. With f , g, h, and j defined as above, give formulas for the following functions (youneedn’t simplify the complicated ones):

(a) f ◦ g(b) g ◦ f(c) f ◦ h(d) h ◦ f(e) f ◦ j(f) j ◦ f(g) g ◦ h(h) h ◦ g(i) g ◦ j(j) j ◦ g(k) h ◦ j(l) j ◦ h

12. Prove: The composition of two injective functions is injective. The compositionof two surjective functions is surjective. It follows that the composition of twobijective functions is bijective.


13. Let f : A → B and g : B → C be functions. Prove: The composition g ◦ fis injective only if f is injective. The composition g ◦ f is surjective only if g issurjective. It follows that the composition g ◦ f is bijective only if f is injectiveand g is surjective.

14. Show that the conditions given in the previous problem are not (nearly!) sufficientby giving an example of a pair of functions f and g such that f is injective and gis surjective, but g ◦ f is neither injective nor surjective.

15. Prove the theorem on the existence of left or right inverses.

Remark. The proof that a function has a right inverse if it is surjective requiresthat you make a choice from the pre-image of each element of the range, in orderto define a right inverse. That you can make these choices all at once, even ifthe range is infinite, requires an additional set-theoretic assumption, called theAxiom of Choice. In essence, you are assuming the existence of a function, whichyou cannot define, that associates to each element of the range an element of itspre-image set. It is obviously necessary that each pre-image set be non-empty, butnothing in the nature of logic or in our previous axioms of set theory assures usthat this condition is sufficient.

The Axiom of Choice is slightly controversial. Aside from invoking it in this exer-cise, we will not need or discuss the Axiom of Choice further in this book; however,you should develop an awareness of when you require such a choice function.

16. Prove the theorem on the existence of an inverse function.

17. Prove that for any relation R of A to B , iB ◦R = R = R◦ iA. In particular, for anyfunction f : A→ B, iB ◦ f = f = f ◦ iA. Thus, the identity functions constitute asort of identity for the composition operation.

8.8.5 The Induced Function on Power Sets

Any function f : A → B induces a function from the power set of A to the power setof B, which we generally denote by the same letter, since it is clear from context whichfunction we mean. (Abuse of notation again!) This induced function is defined as follows:for each C ⊆ A, the image of C is defined as f(C) = {b ∈ B : (∃c ∈ C)f(c) = b}. Inother words, f(C) = {f(c) : c ∈ C}.

It is important to recognize the relationships between images and pre-images of sets,which the following exercises are intended to illuminate.

8.9. PROPERTIES OF RELATIONS: A COMPENDIUM 113

8.8.6 Exercises

In the following exercises, f : A→ B is a function, C ⊆ A, D ⊆ B, E ⊆ A, and F ⊆ B.

1. Prove that f(f−1(D)) ⊆ D. Give a counterexample to show that equality doesnot necessarily hold. Give a necessary and sufficient condition on f for equality tohold for all subsets D ⊆ B, and prove your condition is necessary and sufficient.

2. Prove that f−1(f(C)) ⊇ C. Give a counterexample to show that equality does notnecessarily hold. Give a necessary and sufficient condition on f for equality to holdfor all subsets C ⊆ A, and prove your condition is necessary and sufficient.

3. Prove that f(C ∪ E) = f(C) ∪ f(E).

4. Prove that f(C ∩ E) ⊆ f(C) ∩ f(E). Give a counterexample that shows equalityneed not hold. Give a necessary and sufficient condition on f for equality to holdfor all subsets C and E of A, and prove your condition is necessary and sufficient.

5. Prove that f−1(D ∪ F ) = f−1(D) ∪ f−1(F ).

6. Prove that f−1(D ∩ F ) = f−1(D) ∩ f−1(F ).

Remark. Observe that pre-images behave nicely under intersection as well as union, asshown in exercises 5 and 6, because each element of the domain has only one image.Images do not behave so nicely under intersection, because an element of the range mayhave multiple pre-images. (See exercise 4); However, this does not present a problemfor unions, which behave nicely under images, too. (See exercise 3). Make a point ofremembering these observations!

8.9 Properties of Relations: a Compendium

In working the examples, you may well have noticed relations that have special properties.Functions are an obvious example; order relations (which we will soon revisit in greaterdepth) are another. Remember that mathematics is about recognizing patterns, andrecognizing patterns is about finding recurrent properties and naming them so we cantalk about them, work with them, and deduce their consequences. Here we collect somedistinctive general properties that a relation may - or may not - have.

• Functionality : A relation R of A to B is functional if, ∀a ∈ A, ∃!b ∈ B such that(a, b) ∈ R.


As we have seen, a relation with this property is called a function. (Relations thatare not functional still work very well in many situations; the mathematical usage of theterm does not imply its colloquial meaning!)

The following properties of relations are generally applied to relations on a single set:

• Comparability : A relation R on A satisfies this property if any two distinct ele-ments of A are comparable, that is, related in some way by R. Thus, R satisfiescomparability if, ∀a1 ∈ A and ∀a2 ∈ A, a1 6= a2 ⇒ (a1, a2) ∈ R ∨ (a2, a1) ∈ R.

• Reflexivity : A relation R on A is reflexive if, ∀a ∈ A, (a, a) ∈ R.

• Anti-reflexivity : A relation R on A is anti-reflexive if, ∀a ∈ A, (a, a) /∈ R.

• Symmetry : A relation R on A is symmetric if, ∀a1 ∈ A and ∀a2 ∈ A, (a1, a2) ∈R ⇒ (a2, a1) ∈ R. (Note that since a1 and a2 are arbitrary, biconditionality isautomatic.)

• Anti-symmetry : A relation R on A is anti-symmetric if, ∀a1 ∈ A and ∀a2 ∈ A,(a1, a2) ∈ R ⇒ (a2, a1) /∈ R. (Here biconditionality need not hold. If R satisfiescomparability, the converse implication is of course true, by definition.)

• Transitivity : A relation R on A is transitive if, ∀a1 ∈ A, ∀a2 ∈ A, and ∀a3 ∈ A,[(a1, a2) ∈ R ∧ (a2, a3) ∈ R]⇒ (a1, a3) ∈ R.

As always, the warning applies that terminology varies somewhat among authors.Some authors use irreflexive instead of anti-reflexive, for example. Worse yet, someauthors use nonreflexive to mean simply not reflexive, whereas others use it the waywe have used anti-reflexive, which describes a much stronger condition. Pay carefulattention to the definitions that pertain in any particular text!

8.9.1 Exercises

1. (Reformulation of transitivity.) Show that a relation R on A is transitive if andonly if R ◦R ⊆ R.

2. Show that if R ⊆ A × B is any relation of A to B, then R−1 ◦ R is a symmetricrelation on A and R ◦R−1 is a symmetric relation on B.

8.10. ORDER RELATIONS 115

8.10 Order Relations

Let R be a relation on a set A. As we know from studying the order of the real numbers,R is an order relation if it is anti-reflextive and transitive. As we have seen for theordering of the real numbers and will prove generally in the exercises, anti-symmetry isan additional property of order relations; it is a consequence of transitivity together withanti-reflexivity. Usually the symbol x < y or something similar such as x ≺ y, if there adesire to emphatically distinguish R from the stand order of numbers, is used to denotethe statement (x, y) ∈ R. (We may also use x > y or x � y if we want to think of theorder in reverse. See Exercise ?? of Section 8.10.4.)

8.10.1 Weak Orders Versus Strong Orders

Some authors define an order relation slightly differently. They take order relations to bereflexive, and denote them using symbols such as ≤ or �. This necessitates modifying theanti-symmetry property, since for each element a of A, (a, a) ∈ R obviously violates anti-symmetry. We require that this is the only situation in which anti-symmetry is violated:∀a1 ∈ A and ∀a2 ∈ A, [(a1, a2) ∈ R ∧ (a2, a1) ∈ R] ⇒ a1 = a2. (Equivalently, using thecontrapositive: a1 6= a2 ⇒ [(a1, a2) ∈ R⇒ (a2, a1) /∈ R]. Let us call this property weakanti-symmetry. An order relation in either conception must be transitive.

To avoid ambiguity, an order relation as we originally defined it (anti-reflexive, tran-sitive, and consequently anti-symmetric) is called a strong order, and an order relationdefined in the second manner (reflexive, weakly anti-symmetric, transitive) is called aweak order. If we use the word “order” unmodified, we mean a strong one. (Some authorsuse the phrase “strict order” for a strong order and, as noted above, take the unmodifiedword “order” to mean a weak one. The meaning is usually clear from notation andcontext in any case.)

This is all a matter of convention, not substance. If < is a strong order on a set A,then defining a1 ≤ a2 by a1 < a2∨a1 = a2 gives a weak order. Conversely, if ≤ is a weakorder on A, then defining a1 < a2 by a1 ≤ a2∧a1 6= a2 gives a strong order. The proof ofthese fairly obvious facts is left to the exercises. Depending on the situation, it may bemore natural to define a weak order first and the corresponding strong order in terms ofit, or it may be more natural to define a strong order first and the corresponding weakorder in terms of it.


8.10.2 Partial Orders Versus Total Orders

An order relation is total if it satisfies comparability. Otherwise, it is called partial.Visually, one can think of a total order on A as arranging the elements of A in a line,whereas a partial order arranges them in a sort of directed web.

Example. Let A{1, 2, 3}. The partial ordering of P(A) by inclusion is illustrated in thefollowing figure:

UIUI

UI UIUIUI UI

UI

UIUI

UI

UI

O

{1}

{1, 2}

{1, 3}

{2, 3}

{1, 2, 3}{2}

{3}

8.10.3 The Dictionary Order on a Cartesian Product

Let <A be a (strong) order on a set A, and let <B be a strong order on a set B. Thenwe define a (strong) order on A×B by:

(a, b) ≺ (a′, b′)⇔ [a <A a′] ∨ [a = a′ ∧ b <B b

′].

The order defined in this manner is called the dictionary order on A×B (correspond-ing to the given orders <A and <B), for reasons that should be obvious.

Example. Consider the standard order on R. In the corresponding dictionary orderon R × R, (0, 2) ≺ (1, 0) ≺ (1, 1). Note: do not confuse the interval (x, y), defined as{z ∈ R : x < z < y}, with the ordered pair (x, y). Although the same notation is used,the meaning should be clear from context.

8.10.4 Exercises

1. Prove that if a relation on a set A is anti-reflexive and transitive, then it is anti-symmetric. (Hint: The proof that worked in for the standard order on the realnumbers works in general. See if you can remember it without looking!)


2. Let ≤ be a weak order relation on a set A. (That is, assume ≤ is reflexive, weaklyanti-symmetric, and transitive.) Define a1 < a2 to mean that a1 ≤ a2 and a1 6= a2.Prove that < is anti-reflexive and transitive, hence a strong order relation. (Anti-reflexivity is immediate from the definition; transitivity takes a bit more care.)

3. Let < be a strong order relation on a set A. (That is, assume < is anti-reflexiveand transitive, hence also anti-symmetric.) Define a1 ≤ a2 to mean that a1 < a2

or a1 = a2. Prove that ≤ is reflexive, weakly anti-symmetric, and transitive, hencea weak order relation. (Reflexivity is immediate from the definition.)

4. Demonstrate a relation on the two-element set {a, b} (a 6= b) that is reflexive andtransitive but not weakly anti-symmetric. This shows that the condition of weakanti-symmetry must be assumed for a weak order relation. (Hint: This is easy;you don’t have a whole lot of choice, here!)

5. If < is a total order on a set A, it follows immediately from comparability that, forany elements a1, a2 ∈ A, one of the following holds:

• a1 = a2,

• a1 < a2, or

• a2 < a1.

(Make sure you understand why!) Complete the proof of trichotomy in this generalcontext by proving that only one of these properties holds. (Hint: The proof thatworked for the standard order on the real numbers works in general. See if you canremember it without looking!)

6. Let < be an order on a set A. Define a1 > a2 in the usual manner to mean a2 < a1.(Thus the relation > is just the inverse of the relation <.) Show that > is also anorder relation on A. (That is, show that > is anti-reflexive and transitive.)

7. Verify that the dictionary order is indeed anti-reflexive and transitive.

8. Define a relation on R × R as follows: (x1, y1) ≺ (x2, y2) ⇔ y21 − x1 < y2

2 − x2 ∨(y2

1 − x1 = y22 − x2 ∧ y1 < y2). Show that ≺ is an order relation.

9. Draw a diagram depicting the partial order by inclusion on P({1, 2, 3, 4}).


8.10.5 Maxima and Minima; Bounds; Suprema and Infima

Given an order relation on a set, we often want to compare some element to all of theelements in a specified subset. A given subset may or may not contain an element that isgreater than all the others. If it does, this element is unique. The reasoning is so simplethat the result takes longer to state than to prove!

Proposition. Let < be a (strong) order on a set A (with corresponding weak order ≤).Let B ⊆ A. Suppose b0 ∈ B has the property that (∀b ∈ B)b ≤ b0. Suppose b′0 ∈ B alsohas this property. Then b0 = b′0.

Proof. By assumption, b0 ≤ b′0 and b′0 ≤ b0. Thus b0 = b′0 (by anti-symmetry).

We are therefore justified in using the word the in the following definition:

Definition. Let < be a (strong) order on a set A (with corresponding weak order ≤).Let B ⊆ A. The element b0 ∈ B is the greatest element of B, also called the maximumof B, if (∀b ∈ B)b ≤ b0.

Similarly (and as we have already seen in the section on well-ordering):

Definition. Let < be a (strong) order on a set A (with corresponding weak order ≤).Let B ⊆ A. The element b0 ∈ B is the smallest element of B, also called the leastelement or minimum of B, if (∀b ∈ B)b0 ≤ b.

Example. Consider the standard order relation on the real numbers. The interval (0, 1)has no greatest or least element. (Recall that the interval (0, 1) is the set {x ∈ R : 0 <x < 1}.)

Example. Again considering the standard order on R, 1 is the greatest element of theinterval (0, 1]. (Recall that (0, 1] = {x ∈ R : 0 < x ≤ 1}.) The interval (0, 1] has no leastelement.

If an element is at least as large as all of the elements in a given subset, it is calledan upper bound for that subset; an upper bound need not be an element of the givensubset. Similarly, an element that is at least as small as every element in a given subsetis called a lower bound for that subset. Formally:

Definition. Let < be a (strong) order on a set A, (with corresponding weak orderdenoted by ≤). Let B ⊆ A. An element a ∈ A is an upper bound for B if, ∀b ∈ B, b ≤ a.

Definition. Let < be a (strong) order on a set A, (with corresponding weak orderdenoted by ≤). Let B ⊆ A. An element a ∈ A is an lower bound for B if, ∀b ∈ B, a ≤ b.


Upper and lower bounds are generally not unique; anything larger than an upperbound is still an upper bound, and anything smaller than a lower bound is still a lowerbound. Note that, for the empty set, any element is both an upper bound and a lowerbound, since the defining conditions are vacuously true. Also note that an upper or lowerbound for a given set need not exist.

A set that has an upper bound is said to be bounded above; as set that has a lowerbound is said to be bounded below. A set that is bounded both above and below is saidto be bounded.

Example. Consider the standard order relation on the real numbers. The number 1 isan upper bound for the interval (0, 1). So is the number 2, or any number larger than 1.Any number less than or equal to 0 is a lower bound for (0, 1).

Example. Again considering the standard order on R, 1 is an upper bound for theinterval (0, 1], as is any number greater than 1. Any number less than or equal to 0 is alower bound for (0, 1].

Example. Again considering the standard order on R, 0 is a lower bound for the interval(0,∞) = {x ∈ R : 0 < x}, as is any number less than 0. The interval (0,∞) has noupper bound.

Example. In the dictionary order on R×R, (1, 0) is an upper bound for the set {0}×R ={(x, y) ∈ R × R : x = 0}. So are (1

2, 0), (1

4, 0), (1

8, 0), (1

8, 1), and any ordered pair (x, y)

such that x > 0. Similarly, any ordered pair (x, y) such that x < 0 is a lower bound for{0} × R.

Example. In the dictionary order on R×R, the set R×{0} = {(x, y) ∈ R×R : y = 0}has neither an upper nor a lower bound.

If a set is bounded above, the set of upper bounds may or may not have a leastelement. If it does, that least element is unique, as we proved above, so we are justifiedin using the word the in the following definition:

Definition. Let < be an order on a set A, and let B ⊆ A. The element a is the leastupper bound, also called the supremum, for B if:

• a is an upper bound for B, and

• if c is any upper bound for B, the a ≤ c.

In other words, a is the least upper bound (supremum) for B if it is the least elementof the set of upper bounds for B.


Similarly:

Definition. Let < be an order on a set A, and let B ⊆ A. An element a is the greatestlower bound, also called the infimum, for B if:

• a is a lower bound for B, and

• if c is any lower bound for B, the a ≥ c.

In other words, a is the greatest lower bound (infimum) for B if it is the greatestelement of the set of lower bounds for B.

Again, note that a given set need have a supremum or infimum, even if it is boundedabove or below. Also note that the supremum or infimum, if it exists, need not be anelement of the set.

Notation. The supremum for a set B ⊆ A is denoted by supAB. The infimum for aset B is denoted by infAB. If the ordered set A is clear from context, it is generallyomitted. If there is more than one ordering of A being considered, the order to whichthe supremum or infemum refers may also be included in the subscript, as in infA,<B.

“Supremum” and “infimum” are Latin words; therefore, the plural of “supremum” is“suprema,” and the plural of “infimum” is “infima.”

Examples. supR(0, 1) = 1; supR(0, 1] = 1; infR(0, 1) = 0; infR(0, 1] = 0; infR(0,∞) = 0.(0,∞) has no supremum, because it is not even bounded above. Similarly, R× {0} hasneither a supremum nor an infimum in the dictionary order on R × R. Even though{0} × R is bounded (above and below) in the dictionary order on R× R, it has neithera supremum nor an infimum. There is no smallest element (x, y) with x > 0; for anyx > 0, there is always a number x′ such that x > x′ > 0. Similarly, there is no largestelement (x, y) with x < 0.

Finally, an element of a set is called maximal if no element of the given set is greater.For a total order, the concepts of maximal and maximum are the same, but for a partialorder they are not. A maximal element need not be greater than every other element ofthe given set; it may simply be incomparable to some of them. There may be more thanone maximal element. However, if a partially ordered set has a maximum, the maximumis the unique maximal element. Similarly, an element of a set is called minimal if noelement of the given set is smaller.

Example. Consider the partial order on P ({1, 2, 3}) given by inclusion. The set {1, 2, 3}is both maximal and maximum. The empty set is both minimal and minimum.

Example. If we consider the family P ({1, 2, 3})\{{1, 2, 3}} of proper subsets of {1, 2, 3},then each of the sets {1, 2}, {1, 3}, and {2, 3} is maximal. There is no maximum.


8.10.6 Exercises

1. Verify that the least element of a set is unique.

2. Prove that if a set has a maximum, the maximum is also the supremum for thatset.

3. Prove that if a set has a minimum, the minimum is also the infimum for that set.

4. Prove that a set has a maximum if and only if an element of the set is an upperbound for the set.

5. Prove that a set has a minimum if and only if an element of the set is a lowerbound for the set.

6. Prove that if a set has a maximum, the maximum is the unique maximal elementof that set.

7. Prove that if a set has a minimum, the minimum is the unique minimal element ofthat set.

In the following exercises, R is assumed to have the standard order, and R × R isassumed to have the corresponding dictionary order.

8. For each of the following sets, state if it has a maximum and, if so, state what themaximum is.

(a) The interval (−∞, 2).

(b) The interval (−∞, 2].

(c) {4, 5, 6, 7}.(d) {1

2, 2

3, 3

4, . . .}.

(e) {12, 1

3, 1

4, . . .}.

(f) N.

(g) N× {2}.(h) {2} × N.

(i) (0, 1)× N.

(j) [0, 1]× N.


9. For each of the following sets, state if it is bounded above. If it is, state if it has aleast upper bound, and if so, state what the least upper bound is.



(c) {4, 5, 6, 7}.(d) {1

2, 2

3, 3

4, . . .}.

(e) {12, 1

3, 1

4, . . .}.

(f) N.

(g) N× {2}.(h) {2} × N.

(i) (0, 1)× N.

(j) [0, 1]× N.

10. For each of the following sets, state if it has a minimum and, if so, state what theminimum is.



(c) {4, 5, 6, 7}.(d) {1

2, 2

3, 3

4, . . .}.

(e) {12, 1

3, 1

4, . . .}.

(f) N.

(g) N× {2}.(h) {2} × N.

(i) (0, 1)× N.

(j) [0, 1]× N.

11. For each of the following sets, state if it is bounded below. If it is, state if it has agreatest lower bound, and if so, state what the greatest lower bound is.




(c) {4, 5, 6, 7}.(d) {1

2, 2

3, 3

4, . . .}.

(e) {12, 1

3, 1

4, . . .}.

(f) N.

(g) N× {2}.(h) {2} × N.

(i) (0, 1)× N.

(j) [0, 1]× N.

For the following exercises, consider R × R with the order given in Exercise 8 ofSection 8.10.4.

12. For each of the following sets, state if it has a maximum and, if so, state what themaximum is.

(a) R× {2}.(b) {2} × R.

13. For each of the following sets, state if it is bounded above. If it is, state if it has aleast upper bound, and if so, state what the least upper bound is.

(a) R× {2}.(b) {2} × R.

14. For each of the following sets, state if it has a minimum and, if so, state what theminimum is.

(a) R× {2}.(b) {2} × R.

15. For each of the following sets, state if it is bounded below. If it is, state if it has agreatest lower bound, and if so, state what the greatest lower bound is.

(a) R× {2}.


(b) {2} × R.

16. Let A = {1, 2, 3, 4}. What are the maximal and minimal elements of P(A), orderedby inclusion?

17. Let A = {1, 2, 3, 4}, and let B = P(A) \ {A, ∅}, ordered by inclusion. What arethe maximal and minimal elements of B?

8.11 Equivalence Relations

As the name suggests, equivalence relations are used to group elements that have somecommon quality, ignoring the differences among them in other respects. As an example,suppose we have a metric that measures distances in the plane. Then we can define twosegments to be congruent if they have the same length. (That is, if the distances betweentheir respective endpoints are the same.) In doing so, we ignore the various positionsof the segments and focus only on length as the quality of interest. Congruence is anequivalence relation.

In general, an equivalence relation is defined to be any relation that is reflexive,symmetric, and transitive. We can readily check these properties for congruence ofsegments in the plane: a segment obviously has the same length as itself; if segmentAB has the same length as CD, then obviously CD has the same length as AB; if ABhas the same length as CD, and CD has the same length as EF , then clearly AB hasthe same length as EF . “Sameness” is, by its very nature, reflexive, symmetric, andtransitive! Conversely, any relation with these three properties, even if it is defined in amanner that does not make them quite so obvious, defines a way in which related objectsare the same. If R is an equivalence relation on a set A, we generally use notation suchas a1 ∼ a2, a1 ' a2, a1 ≈ a2, a1

∼= a2, or a1 ≡ a2 to denote that (a1, a2) ∈ R.Congruence is a special case of a very general method of defining an equivalence

relation. Let f : A → B be any function. Define a1 'f a2 to mean f(a1) = f(a2).Then 'f is an equivalence relation. (For the congruence relation on segments, segmentscorrespond to pairs of points. If d denotes the metric, define f(AB) = d(A,B). Then'f is the congruence relation on segments.)

8.11.1 Equivalence Classes and Partitions

Let ' be an equivalence relation on a set A. Let a be an element of A. We define theequivalence class of a, denoted by [a], to be the set of all elements of A that are equivalentto a: [a] = {a′ ∈ A : a′ ' a}. (If there might be ambiguity about which particular

8.11. EQUIVALENCE RELATIONS 125

equivalence relation on A is being considered, it may be specified as a subscript, asin [a]'. Understanding and describing the equivalence classes provides a good way to“picture” an equivalence relation.

Example. Let f : R → R be defined by f(x) = |x|. Then (∀x ∈ R)[x]'f= {±x}.

Thus, the equivalence class of 0 contains a single element (0 itself), and every otherequivalence class contains two elements. Note that the equivalence classes of a numberand its additive inverse are the same: [x] = [−x]. If we graph 'f in the Cartesian planein the usual manner, we can see that each vertical line intersects the graph at two points,except for the y-axis.

It should be intuitively clear that the equivalence classes divide A into disjoint subsets.Each subset is a “largest possible” set of elements that are equivalent, in the sense thatnothing else could be added, because nothing else is equivalent to those elements alreadyin the equivalence class. We now make this notion precise and prove it.

Definition. Let A be a set. A partition of A is a family B ⊂ P(A) of non-empty subsetsof A such that:

• Every element of A is in some subset in the family: (∀a ∈ A)(∃B ∈ B)a ∈ B.

Equivalently (by definition),⋃B∈B

B = A.

• The distinct subsets of the family are pairwise disjoint: (∀B1, B2 ∈ B)(B1 6= B2 ⇒B1 ∩B2 = ∅).

In other words, together the criteria for a partition specify that every element of A isin exactly one subset of the partition, as the term suggests. (The first criterion specifiesthat every element is in at least one, and the second specifies that every element is in atmost one.

Theorem. Let ' be an equivalence relation on a set A. Let B = {[a] : a ∈ A}. Thefamily B is a partition of A.

Proof.

Claim. The sets in B are non-empty, and every element of A is in one of them.Let a ∈ A. By reflexivity, a ∈ [a]. (That is, every element is in its own equivalence

class. Since each set in B is, by definition, an equivalence class of some element of A, itis not empty.)

Claim. The sets in B are pairwise disjoint: (∀B1, B2 ∈ B)(B1 6= B2 ⇒ B1 ∩B2 = ∅).


Equivalently, B1 ∩ B2 6= ∅ ⇒ B1 = B2. Suppose [a1] ∩ [a2] 6= ∅. Then there is someelement a′ ∈ [a1] ∩ [a2]; by definition of [a1], [a2], and intersection, a′ ' a1 and a′ ' a2.By transitivity and symmetry, we have that a ∈ [a1]⇔ a ' a1 ' a′ ' a2 ⇔ a ∈ [a2]. So[a1] = [a2].

8.11.2 Exercises

1. Let f : A→ B. Verify that 'f is an equivalence relation on A.

2. Consider f : R×R→ R given by f(x, y) = x+ y. What do the equivalence classesof 'f look like?

3. Consider g : R × R → R given by g(x, y) = x2 + y2. What do the equivalenceclasses of 'g look like?

Patterns of Structure and Argument: a guide to...

Documents

Transcript of Patterns of Structure and Argument: a guide to...