Contents

1 Introduction: Sets, Functions and Relations
1.1 Introduction
1.2 Functions
1.3 Equivalence Relations
1.4 Finite and Infinite Sets
1.5 Cardinal Numbers of Sets
1.6 Power set of a set
1.7 Exercises
1.8 Partially Ordered Sets
1.9 Lattices
1.10 Boolean Algebras
1.10.1 Introduction
1.10.2 Examples of Boolean algebras
1.11 Atoms in a Lattice
1.12 Exercises

2 Combinatorics
2.1 Introduction
2.2 Elementary Counting Ideas
2.2.1 Sum Rule
2.2.2 Product Rule
2.3 Combinations and Permutations
2.4 Stirling's Formula
2.5 Examples in simple combinatorial reasoning
2.6 The Pigeon-Hole Principle
2.7 More Enumerations
2.7.1 Enumerating permutations with constrained repetitions
2.8 Ordered and Unordered Partitions
2.8.1 Enumerating the ordered partitions of a set
2.9 Combinatorial Identities
2.10 The Binomial and the Multinomial Theorems
2.11 Principle of Inclusion-Exclusion
2.12 Euler's φ-function
2.13 Inclusion-Exclusion Principle and the Sieve of Eratosthenes
2.14 Derangements
2.15 Partition Problems
2.15.1 Recurrence relations p(n,m)
2.16 Ferrer Diagrams
2.16.1 Proposition
2.16.2 Proposition
2.16.3 Proposition
2.17 Solution of Recurrence Relations
2.18 Homogeneous Recurrences
2.19 Inhomogeneous Equations
2.20 Repertoire Method
2.21 Perturbation Method
2.22 Solving Recurrences using Generating Functions
2.22.1 Convolution
2.23 Some simple manipulations
2.23.1 Solution of recurrence relations
2.23.2 Some common tricks
2.24 Illustrative Problems

3 Basics of Number Theory
3.1 Introduction
3.2 Divisibility
3.3 gcd and lcm of two integers
3.4 Primes
3.5 Exercises
3.6 Congruences
3.7 Complete System of Residues
3.8 Linear Congruences and Chinese Remainder Theorem
3.9 Lattice Points Visible from the Origin
3.10 Exercises
3.11 Some Arithmetical Functions
3.12 Exercises
3.13 The big O notation

4 Mathematical Logic
4.1 Preliminaries
4.2 Fully Parenthesized Propositions and their Truth Values
4.3 Validity, Satisfiability and Related Concepts
4.4 Normal forms
4.4.1 Conjunctive and Disjunctive Normal Forms
4.5 Compactness
4.6 The Resolution Principle in Propositional Calculus
4.7 Predicate Calculus – Basic Ideas
4.8 Formulas in Predicate Calculus
4.8.1 Free and bound Variables
4.9 Interpretation of Formulas of Predicate Calculus
4.9.1 Structures
4.9.2 Truth Values of formulas in Predicate Calculus
4.10 Equivalence of Formulas in Predicate Calculus
4.11 Prenex Normal Form
4.12 The Expansion Theorem

5 Algebraic Structures
5.1 Introduction
5.2 Matrices
5.3 Addition, Scalar Multiplication and Multiplication of Matrices
5.3.1 Transpose of a Matrix
5.3.2 Inverse of a Matrix
5.3.3 Symmetric and Skew-symmetric matrices
5.3.4 Hermitian and Skew-Hermitian matrices
5.3.5 Orthogonal and Unitary matrices
5.4 Groups
5.4.1 Group Tables
5.5 A Group of Congruent Transformations (Also called Symmetries)
5.6 Another Group of Congruent Transformations
5.7 Subgroups
5.8 Cyclic Groups
5.9 Lagrange's Theorem for Finite Groups
5.10 Homomorphisms and Isomorphisms of Groups
5.11 Properties of Homomorphisms of Groups
5.12 Automorphism of Groups
5.13 Normal Subgroups
5.14 Quotient Groups (or Factor Groups)
5.15 Basic Isomorphism Theorem for Groups
5.16 Exercises
5.17 Rings
5.17.1 Rings, Definitions and Examples
5.17.2 Units of a ring
5.18 Integral Domains
5.19 Exercises
5.20 Fields
5.21 Characteristic of a Field
5.22 Vector Spaces
5.22.1 Examples of Vector Spaces
5.23 Subspaces
5.24 Spanning Sets
5.25 Linear Independence and Base
5.26 Bases of a Vector Space
5.27 Dimension of a Vector Space
5.28 Exercises
5.29 Solutions of Linear Equations and Rank of a Matrix
5.30 Solutions of Linear Equations
5.31 Solutions of Nonhomogeneous Linear Equations
5.32 LUP Decomposition
5.32.1 Computing an LU Decomposition
5.33 Exercises
5.34 Finite Fields
5.35 Factorization of Polynomials over Finite Fields
5.35.1 Exercises
5.36 Mutually Orthogonal Latin Squares [MOLS]

6 Graph Theory
6.1 Introduction
6.2 Basic definitions and ideas
6.2.1 Types of Graphs
6.2.2 Two Interesting Applications
6.2.3 First Theorem of Graph Theory
6.3 Representations of Graphs
6.4 Basic Ideas in Connectivity of Graphs
6.4.1 Some Graph Operations
6.4.2 Vertex Cuts, Edge Cuts and Connectivity
6.4.3 Vertex Connectivity and Edge-Connectivity
6.5 Trees and their properties
6.5.1 Basic Definition
6.5.2 Sum of distances from a leaf of a tree
6.6 Spanning Tree
6.6.1 Definition and Basic Results
6.6.2 Minimum Spanning Tree
6.6.3 Algorithm PRIM
6.6.4 Algorithm KRUSKAL
6.7 Independent Sets and Vertex Coverings
6.7.1 Basic Definitions
6.8 Vertex Colorings of Graphs
6.8.1 Basic Ideas
6.8.2 Bounds for χ(G)

7 Coding Theory
7.1 Introduction
7.2 Binary Symmetric Channels
7.3 Linear Codes
7.4 The Minimum Weight (Hamming Weight) of a Code
7.5 Hamming Codes
7.6 Standard Array Decoding
7.7 Sphere Packings
7.8 Extended Codes
7.9 Syndrome Decoding
7.10 Exercises

8 Cryptography
8.1 Introduction
8.2 Some Classical Cryptosystems
8.2.1 Caesar Cryptosystem
8.2.2 Affine Cryptosystem
8.2.3 Private Key Cryptosystems
8.2.4 Hacking an affine cryptosystem
8.3 Encryption Using Matrices
8.4 Exercises
8.5 Other Private Key Cryptosystems
8.5.1 Vigenere Cipher
8.5.2 The One-Time Pad
8.6 Public Key Cryptography
8.6.1 Working of Public Key Cryptosystems
8.6.2 RSA Public Key Cryptosystem
8.6.3 The ElGamal Public Key Cryptosystem
8.6.4 Description of ElGamal System
8.7 Primality Testing
8.7.1 Nontrivial Square Roots (mod n)
8.7.2 Prime Number Theorem
8.7.3 Pseudoprimality Testing
8.7.4 The Miller-Rabin Primality Testing Algorithm
8.7.5 Miller-Rabin Algorithm (a, n)
8.8 The Agrawal-Kayal-Saxena (AKS) Primality Testing Algorithm
8.8.1 Introduction
8.8.2 The Basis of AKS Algorithm
8.8.3 Notation and Preliminaries
8.8.4 The AKS Algorithm

9 Finite Automata
9.1 Introduction
9.2 Languages
9.3 Regular Expressions and Regular Languages
9.4 Finite Automata — Definition
9.4.1 The Product Automaton and Closure Properties
9.5 Nondeterministic Finite Automata
9.5.1 Nondeterminism
9.5.2 Definition of NFA
9.6 Subset Construction—Equivalence of DFA and NFA
9.7 Closure of Regular Languages Under Concatenation and Kleene Star
9.8 Regular Expressions and Finite Automata
9.9 DFA State Minimization
9.10 Myhill-Nerode Relations
9.10.1 Isomorphism of DFAs
9.10.2 Myhill-Nerode Relation
9.10.3 Construction of the DFA from a given Myhill-Nerode Relation
9.10.4 Myhill-Nerode Relation and the Corresponding DFA
9.11 Myhill-Nerode Theorem
9.11.1 Notion of Refinement
9.11.2 Myhill-Nerode Theorem and Its Proof
9.12 Non-regular Languages
9.12.1 An Example
9.12.2 The Pumping Lemma
Chapter 1
Introduction: Sets, Functions
and Relations
1.1 Introduction
In this chapter, we recall some of the basic facts about sets, functions,
relations and lattices. We expect that the reader is already familiar with
most of these topics, which are usually taught in high school algebra, with
the exception of lattices. We also assume that the reader is familiar with
the basics of real and complex numbers.
If A and B are sets and A ⊆ B (that is, A is a subset of B, and A may
be equal to B), then the complement of A in B is the set B \ A consisting of
all elements of B not belonging to A. The sets A and B are equal if A ⊆ B
and B ⊆ A.
Definition 1.1.1:
By a family of sets we mean an indexed collection of sets.
For instance, F = {Aα}α∈I is a family of sets. Here for each α ∈ I, there
exists a set Aα of F. Assume that each Aα is a subset of a set X. Such a set
X certainly exists, since we can take X = ∪α∈I Aα. For each α ∈ I, denote by
A′α the complement X \ Aα of Aα in X. We then have the celebrated laws of
De Morgan.
Theorem 1.1.2 (De Morgan's laws):
Let {Aα}α∈I be a family of subsets of a set X. Then
(i) (∪α∈I Aα)′ = ∩α∈I A′α, and
(ii) (∩α∈I Aα)′ = ∪α∈I A′α.
Proof. We prove (i); the proof of (ii) is similar.
Let x ∈ (∪α∈I Aα)′. Then x ∉ ∪α∈I Aα, and therefore, x ∉ Aα for each
α ∈ I. Hence x ∈ A′α for each α, and consequently, x ∈ ∩α∈I A′α. Thus
(∪α∈I Aα)′ ⊆ ∩α∈I A′α. Conversely, assume that x ∈ ∩α∈I A′α. Then x ∈ A′α for
each α ∈ I, and therefore, x ∉ Aα for each α ∈ I. Thus x ∉ ∪α∈I Aα, and
hence x ∈ (∪α∈I Aα)′. Consequently, ∩α∈I A′α ⊆ (∪α∈I Aα)′. This proves (i).
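Since the laws are proved abstractly, it may help to check them mechanically on a small finite family. The following Python sketch is our own illustration; the family of sets is an arbitrary choice, and X is taken to be ∪α∈I Aα as in the text above.

```python
# Illustrative family of subsets; X is the union of the A_alpha, as in the text.
family = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
X = set().union(*family)

# (i) the complement of the union equals the intersection of the complements
lhs_i = X - set().union(*family)
rhs_i = set.intersection(*(X - A for A in family))
assert lhs_i == rhs_i  # both are the empty set here, since the union is X

# (ii) the complement of the intersection equals the union of the complements
lhs_ii = X - set.intersection(*family)
rhs_ii = set().union(*(X - A for A in family))
assert lhs_ii == rhs_ii  # both equal X here, since the triple intersection is empty
print("De Morgan's laws hold for this family")
```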
Definition 1.1.3:
A family {Aα}α∈I is called a disjoint-family of sets if whenever α ∈ I, β ∈ I
and α ≠ β, we have Aα ∩ Aβ = φ.
For instance, if A1 = {1, 2}, A2 = {3, 4} and A3 = {5, 6, 7}, then {Aα}α∈I ,
I = {1, 2, 3}, is a disjoint-family of sets.
1.2 Functions
Definition 1.2.1:
A function (also called a map or mapping or a single-valued function) f :
A → B from a set A to a set B is a rule by which to each a ∈ A, there is
assigned a unique element f(a) ∈ B. f(a) is called the image of a under f .
For example, if A is the set of students of a particular class and, for
a ∈ A, f(a) denotes the height of a, then f : A → R+, where R+ is the set of
positive real numbers, is a function.
Definition 1.2.2:
Two functions f : A→ B and g : A→ B are called equal if f(a) = g(a) for
each a ∈ A.
Definition 1.2.3:
If E is a subset of A, then the image of E under f : A → B is the set
{f(a) : a ∈ E}. It is denoted by f(E).
Definition 1.2.4:
A function f : A → B is one-to-one (or 1–1 or injective) if for a1 and a2 in
A, f(a1) = f(a2) implies that a1 = a2.
Equivalently, this means that a1 ≠ a2 implies that f(a1) ≠ f(a2). Hence
f is 1–1 iff distinct elements of A have distinct images in B under f .
As an example, let A denote the set of 1,000 students of a college,
and B the set of positive integers. For a ∈ A, let f(a) denote the exam
registration number of a. Then f(a) ∈ B. Clearly, f is 1–1. On the other
hand, if for the same sets A and B, f(a) denotes the age of the student a,
then f is not 1–1.
Definition 1.2.5:
A function f : A → B is called onto (or surjective) if for each b ∈ B, there
exists at least one a ∈ A with f(a) = b (that is, the image f(A) = B).
For example, let A denote the set of integers Z and B the set of even
integers. If f : A → B is defined by setting f(a) = 2a, then f : A → B is
onto. Again, if f : R → (set of non-negative reals) is defined by f(x) = x²,
then f is onto but not 1–1.
Definition 1.2.6:
A function f : A → B is bijective (or is a bijection) if it is both 1–1 and
onto.
The function f : Z → 2Z defined by f(a) = 2a is bijective.
An injective (respectively surjective, bijective) mapping is referred to as
an injection (respectively surjection, bijection).
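When A and B are finite, Definitions 1.2.4–1.2.6 can be tested mechanically. The following Python sketch is our own illustration; the sets and the doubling map f(a) = 2a echo the examples above.

```python
def is_one_to_one(f, A):
    # f is 1-1 iff distinct elements of A have distinct images (Definition 1.2.4)
    images = [f(a) for a in A]
    return len(images) == len(set(images))

def is_onto(f, A, B):
    # f is onto B iff the image f(A) equals B (Definition 1.2.5)
    return {f(a) for a in A} == set(B)

def double(a):
    return 2 * a

A, B = range(5), range(10)
print(is_one_to_one(double, A))             # True: 2a = 2a' forces a = a'
print(is_onto(double, A, B))                # False: odd numbers have no preimage
print(is_onto(double, A, range(0, 10, 2)))  # True: onto the even numbers 0..8
```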
Definition 1.2.7:
Let f : A → B and g : B → C be functions. The composition of g with f,
denoted by g ∘ f, is the function
g ∘ f : A → C
defined by (g ∘ f)(a) = g(f(a)) for a ∈ A.
As an example, let A = Z denote the set of integers, B = N ∪ {0}, where
N is the set of natural numbers {1, 2, . . .}, and C = N. If f : A → B is given
by f(a) = a², a ∈ A, and g : B → C is defined by g(b) = b + 1, b ∈ B, then
h = g ∘ f : A → C is given by h(a) = g(f(a)) = g(a²) = a² + 1, a ∈ Z = A.
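The composition of Definition 1.2.7 translates directly into code. The sketch below is our own rendering of the example above, with f(a) = a² and g(b) = b + 1:

```python
def compose(g, f):
    # (g ∘ f)(a) = g(f(a)), as in Definition 1.2.7
    return lambda a: g(f(a))

def f(a):
    return a * a   # f : Z -> N ∪ {0}

def g(b):
    return b + 1   # g : N ∪ {0} -> N

h = compose(g, f)  # h(a) = a² + 1
print([h(a) for a in (-2, -1, 0, 1, 2)])  # [5, 2, 1, 2, 5]
```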
Definition 1.2.8:
Let f : A → B be a function. For F ⊆ B, the inverse image of F under f ,
denoted by f−1(F ), is the set of all a ∈ A with f(a) ∈ F . In symbols:
f−1(F ) = {a ∈ A : f(a) ∈ F}.
For example, if f assigns to each student of a school the standard (class)
to which the student belongs, and F = {1, 2}, then f−1(F ) is the set of
students who are either in the 1st standard or in the 2nd standard.
Theorem 1.2.9:
Let f : A → B, and X1, X2 ⊆ A and Y1, Y2 ⊆ B. Then the following
statements are true:
(i) f(X1 ∪X2) = f(X1) ∪ f(X2),
(ii) f(X1 ∩X2) ⊆ f(X1) ∩ f(X2),
(iii) f−1(Y1 ∪ Y2) = f−1(Y1) ∪ f−1(Y2), and
(iv) f−1(Y1 ∩ Y2) = f−1(Y1) ∩ f−1(Y2).
Proof. We prove (iv). The proofs of the other statements are similar.
So assume that a ∈ f−1(Y1 ∩ Y2), where a ∈ A. Then f(a) ∈ Y1 ∩ Y2, and
therefore, f(a) ∈ Y1 and f(a) ∈ Y2. Hence a ∈ f−1(Y1) and a ∈ f−1(Y2), and
therefore, a ∈ f−1(Y1) ∩ f−1(Y2). The reverse inclusion is proved just by
retracing the steps.
Note that, in general, we may not have equality in (ii). Here is an example
where equality does not hold. Let A = {1, 2, 3, 4, 5} and B = {6, 7, 8}.
Let f(1) = f(2) = 6, f(3) = f(4) = 7, and f(5) = 8. Let X1 = {1, 2, 4} and
X2 = {2, 3, 5}. Then X1 ∩ X2 = {2}, and so f(X1 ∩ X2) = {f(2)} = {6}.
However, f(X1) = {6, 7} and f(X2) = {6, 7, 8}. Therefore f(X1) ∩ f(X2) =
{6, 7} ≠ f(X1 ∩ X2).
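The counterexample can be replayed mechanically. In this sketch (our own), the function f is stored as a dictionary:

```python
# f from the counterexample above, stored as a dictionary.
f = {1: 6, 2: 6, 3: 7, 4: 7, 5: 8}

def image(f, E):
    # f(E) = {f(x) : x in E}, as in Definition 1.2.3
    return {f[x] for x in E}

X1, X2 = {1, 2, 4}, {2, 3, 5}
lhs = image(f, X1 & X2)            # f(X1 ∩ X2) = f({2}) = {6}
rhs = image(f, X1) & image(f, X2)  # {6, 7} ∩ {6, 7, 8} = {6, 7}
assert lhs < rhs                   # strict inclusion: equality fails in (ii)
```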
We next define a family of elements and a sequence of elements in a set
X.
Definition 1.2.10:
A family {xi}i∈I of elements xi in a set X is a map x : I → X, where for
i ∈ I, x(i) = xi ∈ X. I is the indexing set of the family (in other words, for
each i ∈ I, there is an element xi ∈ X of the family).
Definition 1.2.11:
A sequence {xn}n∈N of elements of X is a map x : N → X. In other words, a
sequence in X is a family in X where the indexing set is the set N of natural
numbers. For example, 2, 4, 6, . . . is the sequence of even positive integers.
1.3 Equivalence Relations
Definition 1.3.1:
The Cartesian product X × Y of two (not necessarily distinct) sets X and
Y is the set of all ordered pairs (x, y), where x ∈ X and y ∈ Y . In symbols:
X × Y = {(x, y) : x ∈ X, y ∈ Y }.
In the ordered pair (x, y), the order of x and y is important, whereas the
unordered pairs {x, y} and {y, x} are always equal. As ordered pairs, (x, y)
and (y, x) are equal if and only if x = y. For instance, the pairs (1, 2) and
(2, 1) are not equal as ordered pairs, while the unordered pairs {1, 2} and
{2, 1} are equal.
Definition 1.3.2:
A relation R on a set X is a subset of the Cartesian product X ×X.
If (a, b) ∈ R, then we say that a is related to b under R, and we also denote
this fact by aRb. For example, if X = {1, 2, 3}, the set R = {(1, 1), (1, 2), (2, 2)}
is a relation on X.
One of the important concepts in the realm of relations is the equivalence
relation.
Definition 1.3.3:
A relation R on a set X is an equivalence relation on X if
(i) R is reflexive, that is, (a, a) ∈ R for each a ∈ X,
(ii) R is symmetric, that is, if (a, b) ∈ R then (b, a) ∈ R, and
(iii) R is transitive, that is, if (a, b) ∈ R and (b, c) ∈ R, then (a, c) ∈ R.
We denote by [a] the set of elements of X which are related to a under
R. In other words, [a] = {x ∈ X : (x, a) ∈ R}. [a] is called the equivalence
class defined by a in the relation R.
Example 1.3.4:
On the set N of positive integers, let aRb mean that a | b (a
is a divisor of b). Then R is reflexive and transitive but not symmetric.
It is clear that similar examples can be constructed.
Example 1.3.5 (Example of an equivalence relation):
On the set Z of integers (positive integers, negative integers and zero), set
aRb iff a− b is divisible by 5. Clearly R is an equivalence relation on Z.
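The three conditions of Definition 1.3.3 are easy to check by brute force on a finite relation. The sketch below is our own; it tests the divisibility relation of Example 1.3.4 and the congruence relation of Example 1.3.5, both restricted to the finite sample {1, . . . , 12} (the restriction is ours — the relations themselves live on all of N and Z):

```python
def is_reflexive(R, X):
    return all((a, a) in R for a in X)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    # O(|R|^2) scan over pairs of related pairs; fine for a small sample
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

X = range(1, 13)
divides = {(a, b) for a in X for b in X if b % a == 0}
mod5 = {(a, b) for a in X for b in X if (a - b) % 5 == 0}

print(is_reflexive(divides, X), is_symmetric(divides), is_transitive(divides))
# True False True  -> divisibility is not an equivalence relation
print(is_reflexive(mod5, X), is_symmetric(mod5), is_transitive(mod5))
# True True True   -> congruence mod 5 is an equivalence relation
```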
Definition 1.3.6:
A partition P of a set X is a collection P of nonvoid subsets of X whose
union is X such that the intersection of any two distinct members of P is
empty.
Theorem 1.3.7:
Any equivalence relation R on a set X induces a partition on X in a natural
way.
Proof. As above, let [x] denote the class defined by x. We show that the
classes [x], x ∈ X, define a partition of X. First of all, each x of X belongs
to the class [x] since (x, x) ∈ R. Hence
X = ∪x∈X [x].
We now show that if (x, y) ∉ R, then [x] ∩ [y] = φ. Suppose, on the contrary,
[x] ∩ [y] ≠ φ. Let z ∈ [x] ∩ [y]. This means that z ∈ [x] and z ∈ [y];
hence (z, x) ∈ R and (z, y) ∈ R. By symmetry, this means that (x, z) ∈ R
and (z, y) ∈ R, and hence by transitivity (x, y) ∈ R, a contradiction.
Moreover, if (x, y) ∈ R, then transitivity gives [x] = [y], so any two classes
are either identical or disjoint. Thus {[x] : x ∈ X} forms a partition of X.
Example 1.3.8:
Let X = Z, the set of integers, and let (a, b) ∈ R iff a − b is a multiple of 5.
Then clearly, R is an equivalence relation on Z. The equivalence classes are:
[0] = {. . . , −10, −5, 0, 5, 10, . . .},
[1] = {. . . , −9, −4, 1, 6, 11, . . .},
[2] = {. . . , −8, −3, 2, 7, 12, . . .},
[3] = {. . . , −7, −2, 3, 8, 13, . . .},
[4] = {. . . , −6, −1, 4, 9, 14, . . .}.
Note that [5] = [0] and so on. Then the collection {[0], [1], [2], [3], [4]} of
equivalence classes forms a partition of Z.
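The five classes of Example 1.3.8 can be computed directly. The following sketch (ours) restricts Z to a finite window, since the classes themselves extend infinitely in both directions:

```python
# The classes [0]..[4] of Example 1.3.8, computed for a finite window of Z.
# The window range(-10, 15) is our choice; Python's % gives nonnegative
# residues for negative integers, matching the classes listed in the text.
window = range(-10, 15)
classes = {r: sorted(a for a in window if a % 5 == r) for r in range(5)}

for r, cls in classes.items():
    print(f"[{r}] = {cls}")

# The classes partition the window: pairwise disjoint and covering it.
assert sorted(sum(classes.values(), [])) == list(window)
```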
1.4 Finite and Infinite Sets
Definition 1.4.1:
Two sets are called equipotent if there exists a bijection between them.
Equivalently, if A and B are two sets, then A is equipotent to B if there
exists a bijection φ : A → B from A onto B.
If φ : A → B is a bijection from A to B, then φ−1 : B → A is also
a bijection. Again, if φ : A → B and ψ : B → C are bijections, then
ψ φ : A → C is also a bijection. Trivially, the identity map i : A → A
defined by i(a) = a for each a ∈ A, is a bijection on A. Hence if X is a
nonvoid set, and P(X) is the power set of X, that is, the collection of all
subsets of X, then for A, B ∈P(X), if we set ARB iff there exists a bijection
φ from A onto B, then R is an equivalence relation on P(X). Equipotent
sets have the same “cardinal number” or “cardinality”.
Let Nn denote the set {1, 2, . . . , n}. Nn is called the initial segment
defined with respect to n.
Definition 1.4.2:
A set S is finite if S is equipotent to Nn for some positive integer n; otherwise
S is called an infinite set.
If S is equipotent to Nn, then the number of elements in it is n. Hence
if n ≠ m, Nn is not equipotent to Nm. Consequently, no finite set can be
equipotent to a proper subset of itself.
For instance, the set of trees in a garden is finite whereas the set Z of
integers is infinite. Any subset of a finite set is finite and therefore any
superset of an infinite set is infinite. (If S is an infinite set and S ⊆ T , then
T must be infinite; otherwise, S being a subset of a finite set T , must be
finite).
Theorem 1.4.3:
Let S be a finite set and f : S → S. Then f is 1–1 iff f is onto.
Proof. Suppose f : S → S is 1–1, and let T = f(S). If T ≠ S, then f is a
bijection from S onto the proper subset T of S. Hence S and T have the
same number of elements, a contradiction to the fact that T is a proper
subset of the finite set S. Hence f must be onto.
Conversely, assume that f is onto but suppose that f is not 1–1. For
each s ∈ S choose a preimage s′ ∈ S under f . Let S ′ be the set of all such
s′. Clearly, if s and t are distinct elements of S, then s′ ≠ t′. Since f is not
1–1, some element of S has at least two preimages, only one of which is
chosen; hence S ′ is a proper subset of S. Moreover, the function φ : S → S ′
defined by φ(s) = s′ is a bijection. Thus S is bijective with the proper
subset S ′ of S, a contradiction to the fact that S is a finite set.
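Theorem 1.4.3 can also be verified exhaustively for a small set S, since there are only |S|^|S| functions from S to itself. The brute-force sketch below is our own illustration:

```python
# Brute-force check of Theorem 1.4.3 on a small finite set S:
# for every function f : S -> S, f is 1-1 exactly when f is onto.
from itertools import product

S = range(4)
for images in product(S, repeat=len(S)):   # each tuple lists f(0), ..., f(3)
    one_to_one = len(set(images)) == len(S)
    onto = set(images) == set(S)
    assert one_to_one == onto
print("checked", len(S) ** len(S), "functions")  # checked 256 functions
```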
Example 1.4.4:
We show by means of examples that the conclusions in Theorem 1.4.3 may
not be true if S is an infinite set.
First, take S = Z, the set of integers and f : Z→ Z defined by f(a) = 2a.
Clearly f is 1–1 but not onto (the image of f being the set of even integers).
Next, let R be the set of real numbers, and let f : R → R be defined by
f(x) = x − 1 if x > 0,
f(x) = 0 if x = 0,
f(x) = x + 1 if x < 0.
Clearly f is onto; however, f is not 1–1 since
f(1) = f(0) = f(−1) = 0.
Theorem 1.4.5:
The union of any two finite sets is finite.
Proof. First we show that the union of any two disjoint finite sets is finite.
Let S and T be any two disjoint finite sets of cardinalities n and m respectively.
Then S is equipotent to Nn and T is equipotent to Nm = {1, 2, . . . , m}. Clearly
T is also equipotent to the set {n + 1, n + 2, . . . , n + m}. Hence S ∪ T is
equipotent to {1, . . . , n} ∪ {n + 1, . . . , n + m} = Nn+m. Hence S ∪ T is also
a finite set.
By induction it follows that the union of a disjoint-family of a finite
number of finite sets is finite.
We now show that the union of any two finite sets is finite. Let S and T
be any two finite sets. Then S ∪ T is the union of the three pairwise disjoint
sets S \ T , S ∩ T and T \ S, and hence is finite.
Corollary 1.4.6:
The union of any finite number of finite sets is finite.
Proof. By induction on the number of finite sets.
1.5 Cardinal Numbers of Sets
In this section, we briefly discuss the cardinal numbers of sets. Recall Sec-
tion 1.4.
Definition 1.5.1:
A set A is equipotent to a set B if there exists a bijection f from A onto
B. Recall that equipotence between members of a collection of sets S is an
equivalence relation on S .
As mentioned before, the sets in the same equivalence class are said to
have the same cardinality or the same cardinal number. Intuitively it must
be clear that equipotent sets have the same “number” of elements. The
cardinal number of any finite set is a positive integer, while the cardinal
numbers of infinite sets are denoted by certain symbols. The cardinal
number of the infinite set N (the set of positive integers) is denoted by ℵ0
(aleph nought). ℵ is the first letter of the Hebrew alphabet.
Definition 1.5.2:
A set is called denumerable if it is equipotent to N (equivalently, if it has
cardinal number ℵ0). A set is countable if it is finite or denumerable. It
is uncountable if it is not countable (clearly, any uncountable set must be
infinite).
Lemma 1.5.3:
Every infinite set contains a denumerable subset.
Proof. Let X be an infinite set and let x1 ∈ X. Then X1 = X \ {x1} is an
infinite subset of X (if not, X = X1 ∪ {x1} is a union of two finite subsets of X
and therefore finite by Corollary 1.4.6). As X1 is infinite, X1 has an element
x2, and X1 \ {x2} = X \ {x1, x2} is infinite. Suppose we have found
distinct elements x1, x2, . . . , xn in X with Xn = X \ {x1, . . . , xn} infinite.
Then there exists xn+1 in Xn so that Xn \ {xn+1} is infinite. By induction,
there exists a denumerable subset {x1, x2, . . . , xn, xn+1, . . .} of X.
Theorem 1.5.4:
A set is infinite iff it is equipotent to a proper subset of itself.
Proof. That no finite set can be equipotent to a proper subset of it has
already been observed (Section 1.4).
Assume that X is infinite. Then, by Lemma 1.5.3, X contains a denu-
merable subset X0 = {x1, x2, . . .}. Let
Y = (X \ X0) ∪ {x2, x3, . . .} = X \ {x1}.
Then the mapping φ : X → Y defined by
φ(x) = x if x ∈ X \ X0,
φ(x) = xn+1 if x = xn, n ≥ 1,
(so that φ(x1) = x2, φ(x2) = x3 and so on) is a 1–1 map of X onto Y , and
therefore an equipotence (that is, a bijection). Thus X is equipotent to the
proper subset Y = X \ {x1} of X.
Notation
We denote the cardinality of a set X by |X|.
If X is a set of, say, 17 elements and Y is a set of 20 elements, then
|X| < |Y | and there exists a 1–1 mapping of X to Y . Conversely, if X and
Y are finite sets and |X| < |Y |, then there exists a 1–1 map from X to Y .
These ideas can be generalized to any two arbitrary sets.
Definition 1.5.5:
Let X and Y be any two sets. Then |X| ≤ |Y | iff there exists a 1–1 mapping
from X to Y .
Suppose we have |X| ≤ |Y |, and |Y | ≤ |X|. If X and Y are finite sets,
it is clear that X and Y have the same number of elements, that is, |X| = |Y |.
The same result holds good even if X and Y are infinite sets. This result is
known as the Schröder–Bernstein theorem.
Theorem 1.5.6 (Schröder–Bernstein):
If X and Y are sets such that |X| ≤ |Y | and |Y | ≤ |X|, then |X| = |Y |.
For the proof of Theorem 1.5.6, we need a lemma.
Lemma 1.5.7:
Let A be a set and A1 and A2 be subsets of A such that A ⊇ A1 ⊇ A2. If
|A| = |A2|, then |A| = |A1|.
Proof. If A is a finite set, then |A| = |A2| gives that A = A2, as A2 is a
subset of A. Hence A = A1 = A2, and therefore |A| = |A1|.
So assume that A is an infinite set. |A| = |A2| means that there exists a
bijection φ : A→ A2. Let φ(A1) = A3 ⊆ A2. We then have,
A1 ⊇ A2 ⊇ A3, and |A1| = |A3| (1.1)
So starting with A ⊇ A1 ⊇ A2 and |A| = |A2|, we get (1.1). Starting with
(1.1) and using the same argument, we get
A2 ⊇ A3 ⊇ A4 and |A2| = |A4|. (1.2)
Note that the bijection from A2 to A4 is given by the same map φ. In this
way, we get a sequence of sets
A ⊇ A1 ⊇ A2 ⊇ A3 ⊇ . . . (1.3)
with |Ai| = |Ai+2| for each i ≥ 0 (where A0 = A). Moreover,
|A \ A1| = |A2 \ A3|,
|A1 \ A2| = |A3 \ A4|,
|A2 \ A3| = |A4 \ A5|,
and so on. (see Figure 1.1). Once again, the bijections are under the same
map φ. Let P = A ∩ A1 ∩ A2 ∩ . . .
Then A = (A \ A1) ∪ (A1 \ A2) ∪ · · · ∪ P, and
Figure 1.1: The chain A ⊇ A1 ⊇ A2 ⊇ A3 and the map φ carrying A \ A1 onto A2 \ A3
A1 = (A1 \ A2) ∪ (A2 \ A3) ∪ · · · ∪ P,
where the sets on the right are pairwise disjoint. Moreover, |A \ A1| =
|A2 \ A3|, |A2 \ A3| = |A4 \ A5| and so on, while the remaining pieces
A1 \ A2, A3 \ A4, . . . and P are common to both decompositions. Hence |A| = |A1|.
Proof of Schroder–Bernstein theorem. By hypothesis |X| ≤ |Y |. Hence there
exists a 1–1 map φ : X → Y . Let φ(X) = Y ∗ ⊆ Y . As |Y | ≤ |X|, there
exists a 1–1 map ψ : Y → X. Let ψ(Y ) = X∗, and ψ(Y ∗) = X∗∗ ⊆ X.
Then as Y ∗ ⊆ Y , and ψ is 1–1, ψ(Y ∗) ⊆ ψ(Y ), that is, X∗∗ ⊆ X∗. Thus
|X| = |Y ∗| = |X∗∗|, and X ⊇ X∗ ⊇ X∗∗. By Lemma 1.5.7, |X| = |X∗|, and
since |Y | = |X∗|, we have |X| = |Y |.
For a different proof of Theorem 1.5.6, see Chapter 6.
1.6 Power set of a set
We recall the definition of the power set of a given set from Section 1.4.
Definition 1.6.1:
The power set P(X) of a set X is the set of all subsets of X.
For instance, if X = {1, 2, 3}, then
P(X) = {φ, {1}, {2}, {3}, {1, 2}, {2, 3}, {3, 1}, {1, 2, 3}}.
The empty set φ and the whole set X, being subsets of X, are elements of
P(X). Now each subset S of X is uniquely defined by its characteristic
function χS : X → {0, 1} defined by
χS(x) = 1 if x ∈ S, and χS(x) = 0 if x /∈ S.
Conversely, every function f : X → {0, 1} is the characteristic function of a
unique subset S of X. Indeed, if
S = {x ∈ X : f(x) = 1},
then f = χS.
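The correspondence between subsets and {0, 1}-valued functions can be checked directly for a small set. The following Python sketch (the helper names `chi` and `subset_of` are ours, not the text's) realizes both directions of the bijection:

```python
from itertools import product

X = [1, 2, 3]

# Each subset S of X determines its characteristic function chi_S,
# represented here as a tuple of 0/1 values indexed by the elements of X.
def chi(S):
    return tuple(1 if x in S else 0 for x in X)

# Conversely, each 0/1 tuple f determines the subset {x : f(x) = 1}.
def subset_of(f):
    return {x for x, bit in zip(X, f) if bit == 1}

# The correspondence is a bijection: one function for each of the
# 2^|X| subsets, and chi inverts subset_of.
functions = list(product([0, 1], repeat=len(X)))
assert len(functions) == 2 ** len(X)
assert all(chi(subset_of(f)) == f for f in functions)
```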
Definition 1.6.2:
For sets X and Y , denote by Y X , the set of all functions f : X → Y .
Theorem 1.6.3:
|X| < |P(X)| for each nonvoid set X.
Proof. We prove that there exists a 1–1 function from X to
P(X) but none from P(X) to X.
First of all, the mapping f : X → P(X) defined by f(x) = {x} ∈ P(X)
is clearly 1–1. Hence |X| ≤ |P(X)|. Next, suppose there exists a 1–1
map from P(X) to X. Then by Schroder–Bernstein theorem, there exists a
bijection g : P(X)→ X. This means that for each element S of P(X), the
mapping S → g(S) = x ∈ X is a bijection. Now the element x may or
may not belong to S. Call x ∈ X ordinary if x ∈ S, that is, if x is a member
of the subset S of X whose image under the map g is x. Otherwise, call x
extraordinary. Let A be the subset of X consisting of all the extraordinary
elements of X. Then A ∈ P(X). (Note: A may be the empty set; still,
A ∈ P(X)). Let g(A) = a ∈ X. Is a an ordinary element or an extraordinary
element of X? Well, if we assume that a is ordinary, then a ∈ A; but then a
is extraordinary as A is the set of extraordinary elements. Suppose we now
assume that a is extraordinary; then a /∈ A and so a is an ordinary element
of X, again a contradiction. These contradictions show that there exists no
1–1 mapping from P(X) to X (X ≠ φ), and so |P(X)| > |X|.
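For a finite X the diagonal argument can be verified exhaustively: every map g from X to P(X) misses the set A of extraordinary elements, so no such g is onto. A brute-force Python check (illustrative only, for a 3-element set):

```python
from itertools import product

# Brute-force check of |X| < |P(X)| for a small finite set: no map
# g : X -> P(X) is onto, because the "extraordinary" set
# A = {x : x not in g(x)} is never in the image of g.
X = [0, 1, 2]
power_set = [frozenset(x for x, b in zip(X, bits) if b)
             for bits in product([0, 1], repeat=len(X))]

for g_values in product(power_set, repeat=len(X)):  # every map g : X -> P(X)
    g = dict(zip(X, g_values))
    A = frozenset(x for x in X if x not in g[x])    # the diagonal set
    assert A not in g.values()                      # so g is not onto
```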
1.7 Exercises
1. State true or false (with reason):
(i) Parallelism is an equivalence relation on the set of all lines in the
plane.
(ii) Perpendicularity is an equivalence relation on the set of all lines in
the plane.
(iii) A finite set can be equipotent to a proper subset of itself.
2. Prove the statements (i), (ii) and (iii) in Theorem 1.2.9.
3. Prove that the set Z of integers is denumerable.
4. (a) Prove that the denumerable union of denumerable sets is denumer-
able.
(b) Prove that a countable union of (nonvoid) denumerable sets is denu-
merable.
5. Prove that the set of all rational numbers is denumerable.
6. Let Mn = {m ∈ N : m is a multiple of n}. Find
(i) ∪n∈N Mn.
(ii) Mn ∩ Mm.
(iii) ∩n∈N Mn.
(iv) ∪p a prime Mp.
7. Let f : A→ B, and g : B → C be functions.
Prove:
(i) If f and g are 1–1, then so is g ◦ f .
(ii) If f and g are onto, then so is g ◦ f .
(iii) If g ◦ f is 1–1, then f is 1–1.
(iv) If g ◦ f is onto, then g is onto.
8. Give an example of a relation which is
(i) reflexive and symmetric but not transitive
(ii) reflexive and transitive but not symmetric.
9. Does there exist a relation which is not reflexive but both symmetric and
transitive?
10. Let X be the set of all ordered pairs (a, b) of integers with b ≠ 0. Set
(a, b) ∼ (c, d) in X iff ad = bc. Prove that ∼ is an equivalence relation on
X. What is the class to which (1,2) belongs?
11. Give a detailed proof of Corollary 1.4.6.
1.8 Partially Ordered Sets
Definition 1.8.1:
A relation R on a set X is called antisymmetric if, for a, b ∈ X, (a, b) ∈ R
and (b, a) ∈ R together imply that a = b.
For instance, the relation R defined on N, the set of natural numbers,
by setting that “(a, b) ∈ R iff a | b (a divides b)” is an antisymmetric
relation. However, the same relation defined on Z* = Z \ {0}, the set of nonzero
integers, is not antisymmetric. For instance, 5 | (−5) and (−5) | 5 but 5 ≠ −5.
Definition 1.8.2:
A relation R on a set X is called a partial order on X if it is (i) Reflexive,
(ii) Antisymmetric and (iii) Transitive.
A partially ordered set is a set with a partial order defined on it.
Examples
1. Let R be defined on the set N by setting aRb iff a|b. Then R is a partial
order on N.
2. Let X be a nonempty set. Define a relation R on P(X) by setting
ARB in P(X) iff A ⊆ B. Then P(X) is a partially ordered set with
respect to the above partial order.
It is customary to denote a general partial order on a set X by “≤”. We
then write that (X, ≤) is a partially ordered set or a poset in short.
Figure 1.2: The Hasse diagram of P({1, 2, 3})
Every poset (X, ≤) can be represented pictorially by means of its Hasse
diagram. This diagram is drawn in the plane by taking the elements of X as
points of the plane and representing the fact that a ≤ b in X by placing b
above a and joining a and b by a line segment.
As an example, take S = {1, 2, 3} and X = P(S), the power set of S,
and ⊆ to stand for “≤”. Figure 1.2 gives the Hasse diagram of (X, ≤). Note
that φ ≤ {1, 2} since φ is a subset of {1, 2}. However, we have not drawn a
line between φ and {1, 2}; rather, a (broken) line exists from φ to {1, 2} via {1}
or {2}. When there is no relation between a and b of X, both a and b can
appear in the same horizontal level.
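The Hasse diagram records only the covering pairs: b is drawn immediately above a when a < b and no element lies strictly between them. A Python sketch computing these pairs for (P({1, 2, 3}), ⊆) follows (the helper names are ours):

```python
from itertools import combinations

S = {1, 2, 3}
# All subsets of S, the elements of the poset (P(S), under inclusion).
elements = [frozenset(c) for r in range(len(S) + 1)
            for c in combinations(S, r)]

def less(a, b):          # strict order: a is a proper subset of b
    return a < b

# b covers a if a < b and no element lies strictly between them;
# the Hasse diagram draws exactly these covering pairs.
covers = [(a, b) for a in elements for b in elements
          if less(a, b) and not any(less(a, c) and less(c, b)
                                    for c in elements)]

# In P(S) the covers are the pairs differing by exactly one element,
# matching the 12 edges of the cube-shaped diagram in Figure 1.2.
assert all(len(b - a) == 1 for a, b in covers)
assert len(covers) == 12
```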
Definition 1.8.3:
A partial order “≤” on X is a total order (or linear order) if for any two
elements a and b of X, either a ≤ b or b ≤ a holds.
For instance, if X = {1, 2, 3, 4} and “≤” is the usual “less than or equal
to”, then (X, ≤) is a totally ordered set since any two elements of X are
comparable. The Hasse diagram of (X, ≤) is given in Figure 1.3.
Figure 1.3: Hasse diagram of (X, ≤), X = {1, 2, 3, 4}
The Hasse diagrams of all lattices with five elements are given in Fig-
ure 1.4.
Figure 1.4: Hasse diagrams of lattices with at most five elements
If S has at least two elements, then (P(S), ⊆) is not a totally ordered set.
Indeed, if a, b ∈ S, a ≠ b, then {a} and {b} are incomparable (under ⊆) elements
of P(S).
Definition 1.8.4 (Converse Relation):
If f : A −→ B is a relation, the relation f−1 : B −→ A is called the converse
of f provided that (b, a) ∈ f−1 iff (a, b) ∈ f .
We note that if (X, ≤) is a poset, then (X, ≥) is also a poset. Here a ≥ b
in (X, ≥) is defined by b ≤ a in (X, ≤).
Definition 1.8.5:
Let (X, ≤) be a poset.
(i) a is called a greatest element of the poset if x ≤ a for each x ∈ X. An
element b ∈ X is called a smallest element if b ≤ x for each x ∈ X.
If a and a′ are greatest elements of X, by definition, a ≤ a′ and a′ ≤ a,
and so by antisymmetry a = a′. Thus a greatest (smallest) element, if it
exists, is unique. The greatest and least elements of a poset, whenever
they exist, are denoted by 1 and 0 respectively. They are called the
universal elements of the poset.
(ii) An element a ∈ X is called a minimal element of X if there exists no
element c ∈ X such that c < a (that is, c ≤ a and c ≠ a). b ∈ X is a
maximal element of X if there exists no element c of X such that c > b
(that is, b ≤ c, b ≠ c).
Clearly, the greatest element of a poset is a maximal element and the least
element a minimal element.
Example 1.8.6:
Let (X, ⊆) be the poset where X = {{1}, {2}, {1, 2}, {2, 3}, {1, 2, 3}}. In
X, {1} and {2} are minimal elements, {1, 2, 3} is the greatest element (and the
only maximal element) but there is no smallest element.
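Example 1.8.6 can be verified mechanically by testing each element of the poset against the definitions of minimal, maximal, greatest and smallest; the sketch below encodes the poset with Python frozensets, whose comparison operators implement inclusion:

```python
# The poset of Example 1.8.6 under set inclusion.
X = [frozenset(s) for s in [{1}, {2}, {1, 2}, {2, 3}, {1, 2, 3}]]

minimal  = [a for a in X if not any(b < a for b in X)]   # nothing strictly below
maximal  = [a for a in X if not any(b > a for b in X)]   # nothing strictly above
greatest = [a for a in X if all(b <= a for b in X)]      # above everything
smallest = [a for a in X if all(a <= b for b in X)]      # below everything

assert set(minimal) == {frozenset({1}), frozenset({2})}
assert maximal == greatest == [frozenset({1, 2, 3})]
assert smallest == []   # no smallest element: {1} and {2} are incomparable
```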
Definition 1.8.7:
Let (X, ≤) be a poset and Y ⊆ X.
(i) x ∈ X is an upper bound for Y if y ≤ x for all y ∈ Y .
(ii) x ∈ X is a lower bound for Y if x ≤ y for all y ∈ Y .
(iii) The infimum of Y is the greatest lower bound of Y , if it exists. It is
denoted by inf Y .
(iv) The supremum of Y is the least upper bound of Y , if it exists. It is
denoted by sup Y .
Example 1.8.8:
If X = [0, 1] and ≤ stands for the usual ordering in the reals, then 1 is the
supremum of X and 0 is the infimum of X. Instead, if we take X = (0, 1), X
has neither an infimum nor a supremum in X. Here we have taken Y = X.
However, if X = R and Y = (0, 1), then 1 and 0 are the supremum and
infimum of Y respectively. Note that the supremum and infimum of Y ,
namely, 1 and 0, do not belong to Y .
1.9 Lattices
We now define a lattice.
Definition 1.9.1:
A lattice L = (L, ∧, ∨) is a nonempty set L together with two binary oper-
ations ∧ (called meet or intersection or product) and ∨ (called join or union
or sum) that satisfy the following axioms:
For all a, b, c ∈ L,
(L1) a ∧ b = b ∧ a; a ∨ b = b ∨ a, (Commutative law)
(L2) a ∧ (b ∧ c) = (a ∧ b) ∧ c; a ∨ (b ∨ c) = (a ∨ b) ∨ c,
(Associative law)
and (L3) a ∧ (a ∨ b) = a; a ∨ (a ∧ b) = a. (Absorption law)
Now, by (L3), a ∨ (a ∧ a) = a,
and hence again by (L3), a ∧ a = a ∧ (a ∨ (a ∧ a)) = a.
Similarly, a ∨ a = a for each a ∈ L.
Theorem 1.9.2:
The relation “a ≤ b iff a∧ b = a” in a lattice (L,∧,∨), defines a partial order
on L.
Proof. (i) Trivially a ≤ a since a∧ a = a in L. Thus “≤” is reflexive on L.
(ii) If a ≤ b and b ≤ a, we have a ∧ b = a and b ∧ a = b. Hence a = b since
by (L1), a ∧ b = b ∧ a. This proves that “≤” is antisymmetric.
(iii) Finally we prove that “≤” is transitive. Let a ≤ b and b ≤ c so that
a ∧ b = a and b ∧ c = b.
Now
a ∧ c = (a ∧ b) ∧ c = a ∧ (b ∧ c) (by (L2))
= a ∧ b = a and hence a ≤ c.
Thus (L, ≤) is a poset.
The converse of Theorem 1.9.2 is as follows:
Theorem 1.9.3:
Any partially ordered set (L,≤) in which any two elements have an infimum
and a supremum in L is a lattice under the operations,
a ∧ b = inf(a, b), and a ∨ b = sup(a, b).
Proof. Follows from the definitions of supremum and infimum.
Examples of Lattices
For a nonvoid set S, (P(S), ∩, ∪) is a lattice. Again, for a positive integer
n, define Dn to be the set of divisors of n, and let a ≤ b in Dn mean that
a | b, that is, a is a divisor of b. Then a ∧ b = (a, b), the gcd of a and b, and
a ∨ b = [a, b], the lcm of a and b, and (Dn, ∧, ∨) is a lattice (see Chapter 2
for the definitions of gcd and lcm). For example, if n = 20, Fig. 1.5 gives
the Hasse diagram of the lattice D20 = {1, 2, 4, 5, 10, 20}. It has the least
element 1 and the greatest element 20.
Figure 1.5: The lattice D20
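The lattice structure of D20 is easy to verify computationally: meet is gcd, join is lcm, and the rule "a ≤ b iff a ∧ b = a" recovers divisibility. A Python sketch (a check, not part of the text's development):

```python
from math import gcd

n = 20
D = [d for d in range(1, n + 1) if n % d == 0]   # divisors of 20
assert D == [1, 2, 4, 5, 10, 20]

def meet(a, b):               # a ∧ b = gcd of a and b
    return gcd(a, b)

def join(a, b):               # a ∨ b = lcm of a and b
    return a * b // gcd(a, b)

# "a ≤ b iff a ∧ b = a" recovers divisibility, and meet/join stay in D:
assert all((meet(a, b) == a) == (b % a == 0) for a in D for b in D)
assert all(meet(a, b) in D and join(a, b) in D for a in D for b in D)
# 1 is the least element and 20 the greatest:
assert all(b % 1 == 0 and 20 % b == 0 for b in D)
```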
We next give the Duality Principle valid in lattices.
Duality Principle
In any lattice (L,∧,∨), any formula or statement involving the operations ∧
and ∨ remains valid if we replace ∧ by ∨ and ∨ by ∧.
The statement got by the replacement is called “the dual statement” of
the original statement.
The validity of the duality principle lies in the fact that in the set of
axioms for a lattice, any axiom obtained by such a replacement is also an
axiom. Consequently, whenever we want to establish a statement and its
dual, it is enough to establish one of them. Note that the dual of the dual
statement is the original statement. For instance, the statement:
a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)
implies the statement
a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).
Definition 1.9.4:
A subset L′ of a lattice L = (L, ∧, ∨) is a sublattice of L if (L′, ∧, ∨) is a
lattice.
A subset S of a lattice (L, ∧, ∨) need not be a sublattice even if it is a
poset with respect to the operation “ ≤ ” defined by
a ≤ b iff a ∧ b = a
For example, let (L, ∩, ∪) be the lattice of all subsets of a vector space L
and S be the collection of all subspaces of L. Then S is, in general, not a
sublattice of L since the union of two subspaces of L need not be a subspace
of L.
Lemma 1.9.5:
In any lattice L = (L, ∧, ∨), the operations ∧ and ∨ are isotone, that is, for
a, b, c in L,
if b ≤ c, then a ∧ b ≤ a ∧ c and a ∨ b ≤ a ∨ c.
Proof. We have (see Exercise 5 of 1.12)
a ∧ b = a ∧ (b ∧ c) = (a ∧ b) ∧ c (by L2)
≤ a ∧ c (as a ∧ b ≤ a).
Similarly (or by duality), a ∨ b ≤ a ∨ c.
Lemma 1.9.6:
Any lattice satisfies the two distributive inequalities:
(i) x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z), and
(ii) x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z).
Proof. We have x ∧ y ≤ x, and x ∧ y ≤ y ≤ y ∨ z. Hence x ∧ y ≤ inf(x, y ∨ z) =
x ∧ (y ∨ z). Also x ∧ z ≤ x, and x ∧ z ≤ z ≤ y ∨ z. Thus x ∧ z ≤ x ∧ (y ∨ z).
Therefore, x ∧ (y ∨ z) is an upper bound for both x ∧ y and x ∧ z and hence
greater than or equal to their least upper bound, namely, (x ∧ y) ∨ (x ∧ z).
The second statement follows by duality.
Lemma 1.9.7:
The elements of a lattice satisfy the modular inequality:
x ≤ z implies x ∨ (y ∧ z) ≤ (x ∨ y) ∧ z.
Proof. We have x ≤ x ∨ y and x ≤ z. Hence x ≤ (x ∨ y) ∧ z. Also,
y ∧ z ≤ y ≤ x ∨ y and y ∧ z ≤ z, whence y ∧ z ≤ (x ∨ y) ∧ z. These together
imply that
x ∨ (y ∧ z) ≤ (x ∨ y) ∧ z.
Aliter.
By Lemma 1.9.6 x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z)
= (x ∨ y) ∧ z, as x ≤ z.
Distributive and Modular Lattices
Two important classes of lattices are the distributive lattices and modular
lattices. We now define them.
Definition 1.9.8:
A lattice (L, ∧, ∨) is called distributive if the two distributive laws:
a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c),
and a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)
hold for all a, b, c ∈ L.
Note that in view of the duality that is valid for lattices, if one of the
two distributive laws holds in L then the other would automatically remain
valid.
Example 1.9.9 (Examples of Distributive Lattices): (i) (P(S), ∩, ∪).
(ii) (N, gcd, lcm). (Here a ∧ b = (a, b), the gcd of a and b, and a ∨ b = [a, b],
the lcm of a and b.)
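That (N, gcd, lcm) satisfies both distributive laws can be spot-checked by brute force over a finite range; the following sketch is evidence over that range, not a proof:

```python
from math import gcd
from itertools import product

def lcm(a, b):
    return a * b // gcd(a, b)

# Spot-check both distributive laws in the lattice (N, gcd, lcm),
# where a ∧ b = gcd(a, b) and a ∨ b = lcm(a, b):
for a, b, c in product(range(1, 25), repeat=3):
    assert gcd(a, lcm(b, c)) == lcm(gcd(a, b), gcd(a, c))
    assert lcm(a, gcd(b, c)) == gcd(lcm(a, b), lcm(a, c))
```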
Example 1.9.10 (Examples of Nondistributive Lattices): (i) The ‘diamond lat-
tice’ of Figure 1.6 (a)
(ii) The ‘pentagonal lattice’ of Figure 1.6 (b)
Figure 1.6: Hasse diagrams of the diamond lattice (a) and the pentagonal lattice (b)
In the diamond lattice,
a ∧ (b ∨ c) = a ∧ 1 = a, while (a ∧ b) ∨ (a ∧ c) = 0 ∨ 0 = 0 (≠ a).
In the case of the pentagonal lattice,
a ∨ (b ∧ c) = a ∨ 0 = a, while
(a ∨ b) ∧ (a ∨ c) = 1 ∧ c = c (≠ a).
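The failure in the diamond lattice can be confirmed by modelling it explicitly; the encoding below (with ad hoc meet/join helpers computed from the order relation) is our own brute-force model, not the text's:

```python
# The diamond lattice: 0 < a, b, c < 1, with a, b, c pairwise incomparable.
E = ["0", "a", "b", "c", "1"]
below = {x: {"0", x} if x in "abc" else ({"0"} if x == "0" else set(E))
         for x in E}

def leq(x, y):                 # x ≤ y iff x lies in the lower set of y
    return x in below[y]

def meet(x, y):                # greatest common lower bound
    return max((z for z in E if leq(z, x) and leq(z, y)),
               key=lambda z: sum(leq(w, z) for w in E))

def join(x, y):                # least common upper bound
    return min((z for z in E if leq(x, z) and leq(y, z)),
               key=lambda z: sum(leq(w, z) for w in E))

# Distributivity fails: a ∧ (b ∨ c) = a ∧ 1 = a, but
# (a ∧ b) ∨ (a ∧ c) = 0 ∨ 0 = 0.
assert meet("a", join("b", "c")) == "a"
assert join(meet("a", "b"), meet("a", "c")) == "0"
```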
Complemented Lattice
Definition 1.9.11:
A lattice L with 0 and 1 is complemented if for each element a ∈ L, there
exists at least one element b ∈ L such that
a ∧ b = 0 and a ∨ b = 1.
Example 1.9.12: (1) Let L = P(S), the power set of a nonvoid set S. If
A ∈ L, then A has a unique complement B = S \ A. Here 0 = the empty
set and 1 = the whole set S.
(2) The complement of an element need not be unique. For example, in
the lattice of Figure 1.7 (a), both a and c are complements of b, since
b ∧ a = b ∧ c = 0, and b ∨ a = b ∨ c = 1.
(3) Not every lattice with 0 and 1 is complemented. In the lattice of Fig-
ure 1.7 (b), a has no complement.
That the diamond lattice and the pentagonal lattice (of Figure 1.6) are
crucial in the study of distributive lattices is the content of Theorem 1.9.13.
Figure 1.7: (a) a lattice in which both a and c are complements of b; (b) a lattice in which a has no complement
Theorem 1.9.13:
A lattice is distributive iff it does not contain a sublattice isomorphic to the
diamond lattice or the pentagonal lattice.
The necessity of the condition in Theorem 1.9.13 is trivial but the proof
of sufficiency is more involved. (For a proof, see ????)
However, a much simpler result is the following:
Theorem 1.9.14:
If a lattice L is distributive then for a, b, c ∈ L, the equations a ∧ b = a ∧ c
and a ∨ b = a ∨ c together imply that b = c.
Proof. Assume that L is distributive. Suppose that a∧ b = a∧ c and a∨ b =
a ∨ c. We have to show that b = c.
Now b = b ∧ (a ∨ b) (by absorption law)
= b ∧ (a ∨ c) = (b ∧ a) ∨ (b ∧ c) (by distributivity)
= (a ∧ b) ∨ (b ∧ c) = (a ∧ c) ∨ (b ∧ c) = (a ∨ (b ∧ c)) ∧ (c ∨ (b ∧ c))
= (a ∨ (b ∧ c)) ∧ c (by absorption law)
= ((a ∨ b) ∧ (a ∨ c)) ∧ c = ((a ∨ c) ∧ (a ∨ c)) ∧ c = (a ∨ c) ∧ c
= c.
We now consider another important class of lattices called modular lattices.
Modular Lattices
Definition 1.9.15:
A lattice is modular if it satisfies the following modular identity:
x ≤ z ⇒ x ∨ (y ∧ z) = (x ∨ y) ∧ z.
Hence the modular lattices are those lattices for which equality holds in
Lemma 1.9.7. We prove in Chapter 5 that the normal subgroups of any
group form a modular lattice.
The pentagonal lattice of Figure 1.7 (a) is nonmodular since in it a ≤ c, a ∨
(b ∧ c) = a ∨ 0 = a while (a ∨ b) ∧ c = 1 ∧ c = c (≠ a). In fact, the following
result is true.
Theorem 1.9.16:
Any nonmodular lattice L contains the pentagonal lattice as a sublattice.
Proof. As L is nonmodular, there exist elements a, b, c in L such that
a < c and a ∨ (b ∧ c) ≠ (a ∨ b) ∧ c.
But the modular inequality (Lemma 1.9.7) holds in any lattice. Hence
a < c and a ∨ (b ∧ c) < (a ∨ b) ∧ c.
Set x = a ∨ (b ∧ c), and y = (a ∨ b) ∧ c, so that x < y. Now x ∨ b =
(a ∨ (b ∧ c)) ∨ b = a ∨ ((b ∧ c) ∨ b) = a ∨ b (by L3). Since x ≤ y ≤ a ∨ b,
also y ∨ b = a ∨ b, and hence x ∨ b = y ∨ b.
Figure 1.8: The pentagonal sublattice of L formed by b ∧ y, x, y, b, and a ∨ b
By duality, x ∧ b = y ∧ b. Now since y ≤ c, b ∧ y ≤ b ∧ c ≤ x, the lattice
of Figure 1.8 is the pentagonal lattice contained in L.
A consequence of Theorems 1.9.13 and 1.9.16 is that every distributive
lattice is modular (since L is distributive ⇒ L does not contain the pen-
tagonal lattice as a sublattice ⇒ L is modular). The diamond lattice is an
example of a modular lattice that is not distributive. Another example is the
lattice of all vector subspaces of a vector space V . (See Exercise 1.12.)
Theorem 1.9.17:
In a distributive lattice L, an element can have at most one complement.
Proof. Suppose x has two complements y1, y2. Then
x ∧ y1 = 0 = x ∧ y2, and x ∨ y1 = 1 = x ∨ y2.
As L is distributive, by Theorem 1.9.14, y1 = y2.
A consequence of Theorem 1.9.17 is that any complemented lattice having
an element with more than one complement is not distributive. For example,
the lattice of Figure 1.7 (a) is not distributive since b has two complements,
namely, a and c.
1.10 Boolean Algebras
1.10.1 Introduction
A Boolean algebra is an abstract mathematical system primarily used in
computer science and in expressing the relationships between sets. This sys-
tem was developed by the English mathematician George Boole in 1850 to
permit an algebraic manipulation of logical statements. Such manipulation
can demonstrate whether or not a statement is true and show how a
complicated statement can be rephrased in an equivalent, more convenient form without
losing its meaning.
Definition 1.10.1:
A complemented distributive lattice is a Boolean algebra. Hence a Boolean
algebra B has the universal elements 0 and 1, and every element x
of B has a complement x′; since B is a distributive lattice, by
Theorem 1.9.17, x′ is unique. The Boolean algebra B is symbolically represented
as (B, ∧, ∨, 0, 1, ′).
1.10.2 Examples of Boolean algebras
1. Let S be a nonvoid set. Then (P(S), ∩, ∪, φ, S, ′) is a Boolean alge-
bra. Here if A ∈P(S), A′ = S \ A is the complement of A in S.
2. Let Bn denote the set of all binary sequences of length n. For (a1, . . . , an)
and (b1, . . . , bn) ∈ Bn, set
(a1, . . . , an) ∧ (b1, . . . , bn) = (min(a1, b1), . . . ,min(an, bn)),
(a1, . . . , an) ∨ (b1, . . . , bn) = (max(a1, b1), . . . ,max(an, bn)),
and (a1, . . . , an)′ = (a′1, . . . , a′n), where 0′ = 1 and 1′ = 0.
Note that the zero element is the n-vector (0, 0, . . . , 0), and, the unit
element is (1, 1, . . . , 1). For instance, if n = 3, x = (1, 1, 0) and
y = (0, 1, 0), then x ∧ y = (0, 1, 0), x ∨ y = (1, 1, 0), and x′ = (0, 0, 1).
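Example 2 is easy to realize in code. The sketch below builds B3 as Python tuples, checks the sample computations, and also verifies De Morgan's laws, which Theorem 1.10.2 proves in general:

```python
from itertools import product

n = 3
B = list(product([0, 1], repeat=n))   # all binary sequences of length n

def meet(x, y):                       # componentwise minimum
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):                       # componentwise maximum
    return tuple(max(a, b) for a, b in zip(x, y))

def comp(x):                          # componentwise complement, 0' = 1, 1' = 0
    return tuple(1 - a for a in x)

zero, one = (0,) * n, (1,) * n
assert comp(zero) == one

# The sample computations from the text:
x, y = (1, 1, 0), (0, 1, 0)
assert meet(x, y) == (0, 1, 0) and join(x, y) == (1, 1, 0)
assert comp(x) == (0, 0, 1)

# De Morgan's laws hold for every pair:
assert all(comp(meet(a, b)) == join(comp(a), comp(b)) for a in B for b in B)
assert all(comp(join(a, b)) == meet(comp(a), comp(b)) for a in B for b in B)
```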
Theorem 1.10.2 (DeMorgan’s Laws):
Any Boolean algebra B satisfies DeMorgan’s laws: For any two elements a,
b ∈ B,
(a ∧ b)′ = a′ ∨ b′, and (a ∨ b)′ = a′ ∧ b′.
Proof. We have by distributivity,
(a ∧ b) ∧ (a′ ∨ b′) = (a ∧ (a′ ∨ b′)) ∧ (b ∧ (a′ ∨ b′))
= ((a ∧ a′) ∨ (a ∧ b′)) ∧ ((b ∧ a′) ∨ (b ∧ b′))
= (0 ∨ (a ∧ b′)) ∧ ((b ∧ a′) ∨ 0)
= (a ∧ b′) ∧ (b ∧ a′)
= (a ∧ b ∧ a′) ∧ (b′ ∧ b ∧ a′) (see Exercise 1.12 #4)
= (a ∧ a′ ∧ b) ∧ (b′ ∧ b ∧ a′)
= (0 ∧ b) ∧ (0 ∧ a′)
= 0 ∧ 0
= 0 (since a ∧ a′ = 0 = b ∧ b′).
Similarly,
(a ∧ b) ∨ (a′ ∨ b′) = (a ∨ (a′ ∨ b′)) ∧ (b ∨ (a′ ∨ b′)) (by distributivity)
= ((a ∨ a′) ∨ b′) ∧ ((b ∨ b′) ∨ a′) (since a′ ∨ b′ = b′ ∨ a′)
= (1 ∨ b′) ∧ (1 ∨ a′) = 1 ∧ 1 = 1.
Hence the complement of a ∧ b is a′ ∨ b′. In a similar manner, we can show
that (a ∨ b)′ = a′ ∧ b′.
Corollary 1.10.3:
In a Boolean algebra B, for a, b ∈ B, a ≤ b iff a′ ≥ b′.
Proof. a ≤ b⇔ a ∨ b = b⇔ (a ∨ b)′ = a′ ∧ b′ = b′ ⇔ a′ ≥ b′.
Theorem 1.10.4:
In a Boolean algebra B, we have for all a, b ∈ B,
a ≤ b iff a ∧ b′ = 0 iff a′ ∨ b = 1.
Proof. Since (a ∧ b′)′ = a′ ∨ b and 0′ = 1, it is enough to prove the first part
of the theorem. Now
a ≤ b ⇒ (by the isotone property) a ∧ b′ ≤ b ∧ b′ = 0 ⇒ a ∧ b′ = 0.
Conversely, let a ∧ b′ = 0. Then
a = a ∧ 1 = a ∧ (b ∨ b′) = (a ∧ b) ∨ (a ∧ b′)
= a ∧ b⇒ a ≤ b.
Next we briefly discuss Boolean subalgebras and Boolean isomorphisms.
These are notions similar to subgroups and group-isomorphisms.
Boolean Subalgebras
Definition 1.10.5:
A Boolean subalgebra of a Boolean algebra B = (B, ∧, ∨, 0, 1, ′) is a subset
B1 of B such that (B1, ∧, ∨, 0, 1, ′) is itself a Boolean algebra with the same
elements 0 and 1 of B.
Boolean Isomorphisms
Definition 1.10.6:
A Boolean homomorphism from a Boolean algebra B1 to a Boolean algebra
B2 is a map f : B1 → B2 such that for all a, b in B1,
(i) f(a ∧ b) = f(a) ∧ f(b),
(ii) f(a ∨ b) = f(a) ∨ f(b), and
(iii) f(a′) = (f(a))′.
Conditions (i) and (ii) imply that f is a lattice homomorphism from B1 to B2
while condition (iii) tells that f takes the complement of an element (which
is unique in a Boolean algebra) in B1 to the complement in B2 of the image
of that element.
Theorem 1.10.7:
Let f : B1 → B2 be a Boolean homomorphism. Then
(i) f(0) = 0, and f(1) = 1;
(ii) f is isotone.
(iii) The image f(B1) is a Boolean subalgebra of B2.
Proof. Straightforward.
Example 1.10.8:
Let S = {1, 2, . . . , n}, let A be the Boolean algebra (P(S), ∩, ∪, ′),
and let B be the Boolean algebra of all functions from S
to the set {0, 1}. Any such function is a sequence (x1, . . . , xn) where xi = 0
or 1. Let ∧, ∨ and ′ be as in Example 2 of Section 1.10.2. Now consider the
map f : A = P(S) → B = {0, 1}S defined as follows: For X ⊆ S (that is,
X ∈ P(S) = A), f(X) = (x1, x2, . . . , xn), where xi = 1 or 0 according to
whether i ∈ X or not. For X, Y ∈ P(S), f(X ∩ Y ) = the binary sequence
having 1 only in the places common to X and Y = f(X) ∧ f(Y ) as per the
definitions in Example 2 of Section 1.10.2. Similarly, f(X ∪ Y ) = the binary
sequence having 1 in all the places corresponding to the 1’s in the set X ∪ Y
= f(X) ∨ f(Y ).
Further, f(X ′) = f(S \ X) = the binary sequence having 1’s in the places
where X had zeros, and zeros in the places where X had 1’s = (f(X))′. f
is 1–1 since distinct binary sequences in B arise out of distinct subsets of
S. Finally, f is onto, since any binary sequence in B is the image of the
corresponding subset of S (namely, the subset of places where the
sequence has a 1). Thus f is a Boolean isomorphism.
Example 1.10.9:
Let A be a proper nonvoid subset of a set S, and let A = {φ, A}, a sublattice
of B = P(S) that is a Boolean algebra in its own right (with 0 = φ and 1 = A). If f : A → B is
the identity function, f is a lattice homomorphism since,
f(A1 ∧ A2) = A1 ∧ A2 = f(A1) ∧ f(A2),
and f(A1 ∨ A2) = A1 ∨ A2 = f(A1) ∨ f(A2).
However f(A′) = f(complement of A in A) = f(φ) = φ = 0B,
while (f(A))′ = S \ A ≠ φ.
Hence f(A′) ≠ (f(A))′, and f is not a Boolean homomorphism.
1.11 Atoms in a Lattice
Definition 1.11.1:
An element a of a lattice L with zero is called an atom of L if a 6= 0 and for
all b ∈ L, 0 < b ≤ a ⇒ b = a. That is to say, a is an atom if there is no
nonzero b strictly less than a.
Definition 1.11.2:
An element a of a lattice L is called join-irreducible if a = b ∨ c implies a = b
or a = c; otherwise, a is join-reducible.
Lemma 1.11.3:
Every atom of a lattice with zero is join-irreducible.
Proof. Let a be an atom of a lattice L, and let a = b ∨ c, a ≠ b. Then
b ≤ b ∨ c = a, but since b ≠ a and a is an atom of L, b = 0, and hence
a = c.
Lemma 1.11.4:
Let L be a distributive lattice and c ∈ L be join-irreducible. If c ≤ a ∨ b,
then c ≤ a or c ≤ b. In particular, the same result is true if c is an atom of
L.
Proof. As c ≤ a ∨ b, and L is distributive, c = c ∧ (a ∨ b) = (c ∧ a) ∨ (c ∧ b).
As c is join-irreducible, this means that c = c ∧ a or c = c ∧ b, that is, c ≤ a
or c ≤ b. The second statement follows immediately from Lemma 1.11.3.
Definition 1.11.5: (i) Let L be a lattice and a and b, a ≤ b, be any two
elements of L. Then the closed interval [a, b] is defined as:
[a, b] = {x ∈ L : a ≤ x ≤ b}.
(ii) Let x ∈ [a, b]. x is said to be relatively complemented in [a, b], if x has
a complement y in [a, b], that is, x∧y = a and x∨y = b. If all intervals
[a, b] of L are complemented, then the lattice L is said to be relatively
complemented.
(iii) If L has a zero element and every interval [0, b], for nonzero b in L, is
complemented, then L is said to be sectionally complemented.
Our next theorem is crucial for the proof of the representation theorem
for finite Boolean algebras.
Theorem 1.11.6:
The following statements are true:
(i) Every Boolean algebra is relatively complemented.
(ii) Every relatively complemented lattice is sectionally complemented.
(iii) In any finite sectionally complemented lattice, each nonzero element is
a join of finitely many atoms.
Proof. (i) Let [a, b] be an interval in a Boolean algebra B, and x ∈ [a, b].
We have to prove that [a, b] is complemented. Now, as B is a Boolean
algebra, it is a complemented lattice and hence there exists x′ in B such
that x ∧ x′ = 0, and x ∨ x′ = 1. Set y = b ∧ (a ∨ x′). Then y ∈ [a, b].
Also, y is a complement of x in [a, b] since
x ∧ y = x ∧ (b ∧ (a ∨ x′))
= (x ∧ b) ∧ (a ∨ x′) = x ∧ (a ∨ x′) (as x ∈ [a, b], x ≤ b)
= (x ∧ a) ∨ (x ∧ x′) (as B is distributive)
= (x ∧ a) ∨ 0 = a (as a ≤ x)
and, x ∨ y = x ∨ (b ∧ (a ∨ x′)) = (x ∨ b) ∧ (x ∨ (a ∨ x′))
= b ∧ ((x ∨ x′) ∨ a) (again by distributivity, and x ∨ b = b as x ≤ b)
= b ∧ 1 = b (since x ∨ x′ = 1, and 1 ∨ a = 1).
Hence the interval [a, b] is complemented.
(ii) If L is relatively complemented, then the interval [0, b] is complemented
for each b ∈ L (take a = 0). Hence L is sectionally complemented.
(iii) Let a be a nonzero element of a finite sectionally complemented lattice
L. As L is finite, there are only finitely many atoms p1, . . . , pn in L such
that pi ≤ a, 1 ≤ i ≤ n, and let b = p1∨· · ·∨pn. Now, b ≤ a, since b is the
least upper bound of p1, . . . , pn while a is an upper bound of p1, . . . , pn.
Suppose b ≠ a; then b has a nonzero complement, say, c, in the section
[0, a] since we have assumed that L is sectionally complemented. Let
p be an atom such that p ≤ c (≤ a). Then p ∈ {p1, . . . , pn}, as
by assumption p1, . . . , pn are the only atoms with pi ≤ a, and hence,
p = p ∧ b ≤ c ∧ b = 0 (as c is the complement of b), a contradiction.
Hence b = a = p1 ∨ · · · ∨ pn.
An immediate consequence of Theorem 1.11.6 is the following result.
Corollary 1.11.7:
In any finite Boolean algebra, every nonzero element is a join of atoms.
We end this section with the representation theorem for finite Boolean
algebras which says that any finite Boolean algebra may be thought of as the
Boolean algebra P(S) defined on a finite set S.
Theorem 1.11.8 (Representation theorem for finite Boolean algebras):
Let B be a finite Boolean algebra and A, the set of its atoms. Then there
exists a Boolean isomorphism B ≃P(A).
Proof. For b ∈ B, define A(b) = {a ∈ A : a ≤ b}, so that A(b) is the set of
the atoms of B that are less than or equal to b. Then A(b) ∈ P(A). Now
define
φ : B →P(A)
by setting φ(b) = A(b). We now prove that φ is a Boolean isomorphism.
We first show that φ is a lattice homomorphism, that is, for b1, b2 ∈ B,
φ(b1 ∧ b2) = φ(b1) ∧ φ(b2) = φ(b1) ∩ φ(b2), and φ(b1 ∨ b2) = φ(b1) ∨ φ(b2) =
φ(b1) ∪ φ(b2).
Equivalently, we show that
A(b1 ∧ b2) = A(b1) ∩ A(b2),
and A(b1 ∨ b2) = A(b1) ∪ A(b2).
Let a be an atom of B. Then a ∈ A(b1∧ b2)⇔ a ≤ b1∧ b2 ⇔ a ≤ b1 and a ≤
b2 ⇔ a ∈ A(b1) ∩ A(b2). Similarly, a ∈ A(b1 ∨ b2)⇔ a ≤ b1 ∨ b2 ⇔ a ≤ b1 or
a ≤ b2 (As a is an atom, a is join-irreducible by Lemma 1.11.3 and B being a
Boolean algebra, it is a distributive lattice. Now apply Lemma 1.11.4)⇔ a ∈
A(b1) or a ∈ A(b2) ⇔ a ∈ A(b1) ∪A(b2). Next, as regards complementation,
a ∈ φ(b′) ⇔ a ∈ A(b′) ⇔ a ≤ b′ ⇔ a ∧ b = 0 (by Theorem 1.10.4)
⇔ a ≰ b ⇔ a /∈ A(b) ⇔ a ∈ A \ A(b) = (A(b))′. Thus A(b′) = (A(b))′.
Finally, φ(0) = set of atoms in B that are ≤ 0
= φ (as there are none), the zero element of P(A),
and φ(1) = set of atoms in B that are ≤ 1
= set of all atoms in B = A, the unit element of P(A).
All that remains to show is that φ is a bijection. By Corollary 1.11.7, any
b ∈ B is a join, say, b = a1∨· · ·∨an (of a finite number n) of atoms a1, . . . , an
of B. Hence ai ≤ b, 1 ≤ i ≤ n. Suppose φ(b) = φ(c), that is, A(b) = A(c).
Then each ai ∈ A(b) = A(c) and so ai ≤ c for each i, and hence b ≤ c. In a
similar manner, we can show that c ≤ b, and hence b = c. In other words, φ
is injective.
Finally we show that φ is surjective. Let C = {c1, . . . , ck} ∈ P(A), so
that C is a set of atoms of B. Set b = c1 ∨ · · · ∨ ck. We show that φ(b) = C
and this would prove that φ is onto. Now ci ≤ b for each i, and so by the
definition of φ, φ(b) = {c ∈ A : c ≤ b} ⊇ C. Conversely,
if a ∈ φ(b), then a is an atom with a ≤ b = c1 ∨ · · · ∨ ck. Therefore a ≤ ci
for some i by Lemma 1.11.4. As ci is an atom and a ≠ 0, this means that
a = ci ∈ C. Thus φ(b) = C.
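As a concrete illustration (our choice, not the book's), the eight divisors of 30 form a finite Boolean algebra under gcd (meet) and lcm (join), with atoms 2, 3 and 5; the Python sketch below checks that the map φ of the proof is a bijective lattice homomorphism onto P({2, 3, 5}):

```python
from math import gcd

# Divisors of 30 form a Boolean algebra: meet = gcd, join = lcm,
# complement of d is 30/d. Its atoms are the primes 2, 3 and 5.
B = [d for d in range(1, 31) if 30 % d == 0]       # 1,2,3,5,6,10,15,30
atoms = [a for a in B if a != 1 and
         all(a % d != 0 for d in B if d not in (1, a))]

def phi(b):
    """Map b to the set of atoms below b (here: primes dividing b)."""
    return frozenset(a for a in atoms if b % a == 0)

def lcm(x, y):
    return x * y // gcd(x, y)

# phi is a bijection onto P(atoms) ...
assert len({phi(b) for b in B}) == len(B) == 2 ** len(atoms)
# ... and a lattice homomorphism: meet -> intersection, join -> union.
for b1 in B:
    for b2 in B:
        assert phi(gcd(b1, b2)) == phi(b1) & phi(b2)
        assert phi(lcm(b1, b2)) == phi(b1) | phi(b2)
print(sorted(atoms))  # -> [2, 3, 5]
```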
1.12 Exercises
1. Draw the Hasse diagrams of all the 15 essentially distinct lattices with
six elements.
2. Show that the closed interval [a, b] is a sublattice of the lattice (R, inf, sup).
3. Give an example of a lattice with no zero element and with no unit
element.
4. In a lattice, show that
(i) a ∧ b ∧ c = c ∧ b ∧ a,
(ii) (a ∧ b) ∧ c = (a ∧ c) ∧ (b ∧ c).
5. Prove that in a lattice, a ≤ b ⇔ a ∧ b = a ⇔ a ∨ b = b.
6. Show that every chain is a distributive lattice.
7. Show that the three lattices of Fig. 1.9 are not distributive.
Figure 1.9: [Hasse diagrams of three lattices, (a), (b) and (c); panel (a) has elements 0, a, b, c, d, e, f, 1, and panels (b) and (c) have elements 0, a, b, c, d, 1.]
8. Show that the lattice of Fig. 1.9 (c) is not modular.
9. Show that the lattice of all subspaces of a vector space is not distribu-
tive.
10. Which of the following lattices are (i) distributive (ii) modular (iii) mod-
ular, but not distributive? (a) D160 (b) D20 (c) D36 (d) D40.
11. Give a detailed proof of Theorem 1.10.7.
Chapter 2
Combinatorics
2.1 Introduction
Combinatorics is the science (and to some extent, the art) of counting and
enumeration of configurations (it is understood that a configuration arises
every time objects are distributed according to certain predetermined con-
straints). Just as arithmetic deals with integers (with the standard opera-
tions), algebra deals with operations in general, analysis deals with functions,
geometry deals with rigid shapes and topology deals with continuity, so
combinatorics deals with configurations. The word combinatorial was first
used in the modern mathematical sense by Gottfried Wilhelm Leibniz (1646–
1716) in his Dissertatio de Arte Combinatoria (Dissertation Concerning the
Combinatorial Arts). Reference to “combinatorial analysis” is found in En-
glish in 1818 in the title Essays on the Combinatorial Analysis by P. Nichol-
son (see Jeff Miller’s Earliest known uses of the words of Mathematics, Society
for Industrial and Applied Mathematics, U.S.A.). In his book [4], C. Berge
points out the following interesting aspects of combinatorics:
Chapter 2 Combinatorics 48
1. study of the intrinsic properties of a known configuration
2. investigation into the existence/non-existence of a configuration with
specified properties
3. counting and obtaining exact formulas for the number of configurations
satisfying certain specified properties
4. approximate counting of configurations with a given property
5. enumeration and classification of configurations
6. optimization in the sense of obtaining a configuration with specified
properties so as to minimize an associated cost function.
Basic combinatorics is now regarded as an important topic in Discrete
Mathematics. Principles of counting appear in various forms in Computer
Science and Mathematics, especially in the analysis of algorithms and in
probability theory.
2.2 Elementary Counting Ideas
We begin with some simple ideas of counting using the Sum Rule, the Product
Rule and obtaining permutations and combinations of finite sets of objects.
Sum Rule and Product Rule
2.2.1 Sum Rule
This is also known as the principle of disjunctive counting. If a finite set
X is the union of pairwise disjoint nonempty subsets S1, S2, . . . , Sn, then
|X| = |S1| + |S2| + · · · + |Sn|, where |X| denotes the number of elements in
the set X.
2.2.2 Product Rule
This is also known as the principle of sequential counting. If S1, S2, . . . ,
Sn are nonempty finite sets, then the number of elements in the cartesian
product S1 × S2 × · · · × Sn is |S1| × |S2| × · · · × |Sn|.
For instance, consider the set A = {a, b, c, d, e} and the set B = {f, g, h}.
The cardinality |A × B| of the set A × B is |A| · |B| = 5 · 3 = 15. The proof
of the Product Rule is straightforward. One way is to prove the rule
for two sets and then use induction on the number of sets.
The Sum Rule and the Product Rule are basic and they are applied
mechanically in many situations.
Example 2.2.1:
Assume that a car registration system allows a registration plate to consist of
one, two or three English letters followed by a number (not zero and not starting
with zero) having as many digits as there are letters. How
many possible registrations are there?
There are 26 possible letters and 10 possible digits, including 0. By the
Product Rule there are 26, 26 × 26 and 26 × 26 × 26 possible letter
combinations of lengths 1, 2 and 3 respectively, and 9, 9 × 10 and 9 × 10 × 10
permissible numbers. Each occurrence of a single letter can be combined
with any of the nine single-digit numbers 1 to 9. Similarly, each occurrence
of a double (respectively, triple) letter combination can be combined with any of the
ninety allowed two-digit numbers from 10 to 99 (respectively, the nine hundred
three-digit numbers from 100 to 999). Hence the numbers of registrations of
lengths 2, 4 and 6 characters are 26 × 9 (= 234), 676 × 90 (= 60 840) and
17 576 × 900 (= 15 818 400) respectively. Since these possibilities are mutually
exclusive and together they exhaust all the possibilities, by the Sum Rule the
total number of possible registrations is 234 + 60 840 + 15 818 400 = 15 879 474.
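The arithmetic of this example can be checked with a few lines of Python (a sketch; the name per_length is ours):

```python
# For k = 1, 2, 3 there are 26**k letter strings and 9 * 10**(k-1)
# k-digit numbers with no leading zero; the Product Rule combines a
# letter string with a number, and the Sum Rule adds the three cases.
per_length = {k: (26 ** k) * (9 * 10 ** (k - 1)) for k in (1, 2, 3)}
total = sum(per_length.values())
print(per_length, total)  # total -> 15879474
```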
Example 2.2.2:
Consider tossing 100 distinguishable dice. By the Product Rule, it follows
that there are 6^100 ways in which they can fall.
Example 2.2.3:
A self-dual 2-valued Boolean function is one whose definition remains unchanged
if we change all the 0's to 1's and all the 1's to 0's simultaneously.
How many such functions of n variables exist?
Values of the function
Variables   Value
0 0 0         1
0 0 1         1
0 1 0         0
0 1 1         0
1 0 0         1
1 0 1         1
1 1 0         0
1 1 1         0
As an example, we consider a Boolean function in 3 variables. The function
value is what is to be assigned in defining the function. As the function is
required to be self-dual, it is evident that an arbitrary assignment (of 0’s and
1's) of values cannot be made. Self-duality requires that changing all
0's to 1's and all 1's to 0's does not change the function definition. Thus, if
the values assigned to the variables are 0 0 0 and the function value is
1, then for the assignment (of variables) 1 1 1 the function value must be the
complement of 1, as shown in the table. It is thus easy to see that only 2^(n−1)
independent assignments of the function value can be made. Thus, the total
number of self-dual 2-valued Boolean functions is 2^(2^(n−1)).
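For small n, the count 2^(2^(n−1)) can be confirmed by brute force over all truth tables (a sketch; count_self_dual is our name):

```python
from itertools import product

def count_self_dual(n):
    """Count 2-valued Boolean functions of n variables satisfying the
    self-duality condition f(x1', ..., xn') = f(x1, ..., xn)',
    by brute force over all 2**(2**n) truth tables."""
    rows = list(product((0, 1), repeat=n))
    count = 0
    for table in product((0, 1), repeat=2 ** n):
        f = dict(zip(rows, table))
        if all(f[tuple(1 - v for v in r)] == 1 - f[r] for r in rows):
            count += 1
    return count

for n in (1, 2, 3):
    assert count_self_dual(n) == 2 ** (2 ** (n - 1))
```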
2.3 Combinations and Permutations
In obtaining combinations of various objects the general idea is one of selec-
tion. However, in obtaining permutations of objects the idea is to arrange
the objects in some order.
Consider the five given objects a, a, a, b, c, abbreviated as {3 · a, 1 · b, 1 · c}.
The numbers 3, 1, 1, denoting the multiplicities of the objects, are called the
repetition numbers of the objects. The 3-combinations are a a a, a a b, a a c
and a b c. This ignores the ordering of objects. On the other hand, the 3-
permutations are a a a, a a b, a b a, b a a, a a c, a c a, c a a, a b c, a c b, b a c, b c a,
c a b and c b a.
When we allow unlimited repetitions of objects, we denote the repetition
number by α. Consider the 3-combinations possible from {α · a, α · b, α · c, α · d}.
There are 20 of them. On the other hand, there are 4^3 = 64 3-permutations.
We use the following notations.
P (n, r) = the number of r permutations of n distinct elements without rep-
etitions.
C(n, r) = the number of r combinations of n distinct elements without rep-
etitions.
The following are basic results: P(n, r) = n!/(n − r)! and C(n, r) = n!/(r!(n − r)!),
where n! (read as "n factorial") is the product 1 · 2 · 3 · · · n.
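In Python (3.8+), math.perm and math.comb compute P(n, r) and C(n, r) directly; a quick consistency check against the factorial formulas:

```python
from math import comb, factorial, perm

n, r = 5, 3
# P(n, r) = n!/(n-r)!  and  C(n, r) = n!/(r!(n-r)!)
p = factorial(n) // factorial(n - r)
c = factorial(n) // (factorial(r) * factorial(n - r))
assert p == perm(n, r) == 60
assert c == comb(n, r) == 10
```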
2.4 Stirling’s Formula
A useful approximation to the value of n! for large values of n was given by
James Stirling in 1730. This formula says that when n is "large", n! can be
approximated by S_n = √(2πn) · (n/e)^n; that is, lim_{n→∞} n!/S_n = 1.
The following table gives the values of n! and the corresponding S_n for the
indicated n, and also indicates the percentage error in S_n.
For standard generalizations of n! see [39].
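A short script (ours) reproduces the rows of Table 2.1 from the formula:

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation S_n = sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / e) ** n

# Print n, n!, S_n (rounded) and the percentage error, as in Table 2.1.
for n in range(8, 14):
    err = 100 * (factorial(n) - stirling(n)) / factorial(n)
    print(f"{n:2d} {factorial(n):>13d} {round(stirling(n)):>13d} {err:.4f}%")
```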
The following two examples are variations of permutations:
Example 2.4.1:
Table 2.1: S_n for the indicated n and the percentage error in S_n

 n           n!           S_n     Percentage error
 8         40 320        39 902        1.0357
 9        362 880       359 537        0.9213
10      3 628 800     3 598 696        0.8296
11     39 916 800    39 615 625        0.7545
12    479 001 600   475 687 486        0.6919
13  6 227 020 800  6 187 239 475       0.6389
Given n objects it is required to arrange them in a circle. In how many ways
is this possible?
Number the objects as 1, 2, . . . , n. Keeping object 1 fixed, it is possible
to obtain (n − 1)! different arrangements of the remaining objects. Note
that any arrangement can be rotated (shifting the position of object 1
while simultaneously shifting all the other objects) without changing the
circular arrangement, so fixing object 1 loses nothing. Thus, the total
number of arrangements is just (n − 1)!.
Example 2.4.2:
Enumerating r-permutations of n objects with unlimited repetitions is easy.
We consider r boxes, each to be filled by one of n possibilities. Thus the
answer is U(n, r) = nr.
It is easy to see that C(n, r) can also be expressed as
n(n − 1)(n − 2) · · · (n − r + 1)/r!.
The numerator is often denoted by [n]_r, which is a polynomial in n of
degree r. Thus we can write [n]_r = s^0_r + s^1_r n + s^2_r n^2 + · · · + s^r_r n^r.
By definition, the coefficients s^k_r are the Stirling numbers of the first kind.
These numbers can be calculated from the following formulas:
s^0_r = 0,  s^r_r = 1,
s^k_{r+1} = s^{k−1}_r − r s^k_r.   (2.1)
Proof. By definition, [x]_{r+1} = [x]_r (x − r). Again by definition, we have from
the above equality
· · · + s^k_{r+1} x^k + · · · = (· · · + s^{k−1}_r x^{k−1} + s^k_r x^k + · · · )(x − r).
Equating the coefficients of x^k on both sides gives the required
recurrence. From the above relations we can build the following table:
s^k_r   k = 0    1    2    3    4
r = 1       0    1    0    0    0
    2       0   −1    1    0    0
    3       0    2   −3    1    0
    4       0   −6   11   −6    1
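Recurrence (2.1) translates directly into code; the sketch below (function names are ours) rebuilds the table and also checks that row r = 4 really gives the coefficients of [n]_4 = n(n−1)(n−2)(n−3):

```python
def stirling1(rmax):
    """Signed Stirling numbers of the first kind: s[r] lists the
    coefficients of [n]_r in increasing powers of n, built from
    s[r+1][k] = s[r][k-1] - r*s[r][k], starting from s[1] = [0, 1]."""
    s = {1: [0, 1]}
    for r in range(1, rmax):
        prev = s[r] + [0]                    # pad so prev[r+1] exists
        s[r + 1] = [(prev[k - 1] if k > 0 else 0) - r * prev[k]
                    for k in range(r + 2)]
    return s

def falling(n, r):
    """[n]_r = n(n-1)...(n-r+1) computed directly as a product."""
    out = 1
    for i in range(r):
        out *= n - i
    return out

s = stirling1(4)
assert s[4] == [0, -6, 11, -6, 1]            # row r = 4 of the table
for n in range(10):                          # polynomial identity check
    assert sum(c * n ** k for k, c in enumerate(s[4])) == falling(n, 4)
```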
2.5 Examples in simple combinatorial reasoning
Clever reasoning forms a key part of most combinatorial problems.
We illustrate this with some typical examples.
Example 2.5.1:
Show that C(n, r) = C(n, n− r).
The left hand side of the equality denotes the number of ways of choosing
r objects from n objects. Each such choice leaves out (n − r) objects. This
is exactly equivalent to choosing (n− r) objects, leaving out r objects, which
is the right hand side.
Example 2.5.2:
Show that C(n, r) = C(n− 1, r − 1) + C(n− 1, r).
The left-hand side is the number of ways of selecting r objects out
of n objects. We now count this in a different manner. We mark one of
the n objects as X. In the selected r objects, (a) either X is included or (b)
X is excluded. The two cases (a) and (b) are mutually exclusive and totally
exhaustive. Case (a) is equivalent to selecting (r − 1) objects from (n − 1)
objects while case (b) is equivalent to selecting r objects from (n−1) objects.
Example 2.5.3:
There are a roads from city A to city B, b roads from city B to city C, c
roads from city C to city D, e roads from city A to city C, d roads from
city B to city D and f roads from city A to city D. In how many ways
can one travel from city A to city D and come back to city A while
visiting city B and/or city C at least once? Starting from
city A, the different routes leading to city D are shown in the following "tree
diagram" (Fig. 2.1).
It follows that the total number of ways of going to city D from city A is
(abc + ad + ec + f). The tree diagram also suggests (from the leaves to the
root) the number of ways of going from city D to city A is (abc+ad+ec+f).
Therefore, the total number of ways of going from city A to city D and back
is (abc + ad + ec + f)^2. The number of ways of going directly from city A to
city D and back is f^2. Hence the number of ways of going
from city A to city D and back while visiting city B and/or city C at least
once is (abc + ad + ec + f)^2 − f^2.
Figure 2.1: [Tree diagram of the routes from city A to city D through cities B and C, with branches labelled a, b, c, d, e, f.]
Example 2.5.4:
Show that P (n, r) = r · P (n− 1, r − 1) + P (n− 1, r).
The left-hand side is the number of ways of arranging r objects out
of n objects. This can be done in the following way. Among the r objects, we
mark one object as X. The selected arrangement either includes X or does
not include X. In the former case, we first select and arrange (r − 1) objects
from among (n − 1) objects and then introduce X in any of the r positions.
This gives the first term on the right-hand side. In the latter case, we simply
arrange r objects out of the (n − 1) objects excluding X. This gives the
second term on the right-hand side.
Example 2.5.5:
Count the number of simple undirected graphs with a given set V of n ver-
tices.
Obviously, V contains C(n, 2) = n(n − 1)/2 unordered pairs of vertices.
We may include or exclude each pair as an edge in forming a graph with
vertex set V. Therefore, there are 2^C(n,2) simple graphs with vertex set V.
In the above counting, we have not considered isomorphism (see Chap-
ter 6). Thus, although there are 64 simple graphs having four vertices, there
are only eleven of them distinct up to isomorphism.
Example 2.5.6:
Let S be a set of 2n distinct objects. A pairing of S is a partition of S into
2-element subsets; that is, a collection of pairwise disjoint 2-element subsets
whose union is S. How many different pairings of S are there?
Method 1: We pick an element, say x, from S. The number of ways to
select x's partner, say y, is (2n − 1). (Now {x, y} forms a 2-element subset.)
Consider now the (2n − 2) elements in S \ {x, y}. We pick any element, say u,
from S \ {x, y}. The number of ways to select u's partner, say v, is (2n − 3).
Thus, the total number of ways of picking {x, y} and {u, v} is (2n − 1)(2n − 3).
Extending the argument in a similar way and applying the product rule, the
total number of ways of partitioning S into 2-element subsets is given by,
(2n− 1) · (2n− 3) · · · 5 · 3 · 1.
Method 2: Consider the n 2-element subsets (boxes shown as braces) shown
below:
{−, −}, {−, −}, . . . , {−, −}, numbered respectively as 1, 2, . . . , n.
We form a 2-element subset of S and assign it to box 1. There are C(2n, 2)
ways of doing this. Next, we form a 2-element subset from the remaining
(2n − 2) elements and assign it to box 2. This can be done in C(2n − 2, 2)
ways. It is easy to see that the total number of ways to form the different
2-element subsets of S is,
C(2n, 2) · C(2n− 2, 2) · C(2n− 4, 2) · · ·C(4, 2) · C(2, 2).
However, in the above, we have considered the ordering of the various 2-
element subsets also. Since the ordering is immaterial, the number of ways
of partitioning S into 2-element subsets is
(1/n!) [C(2n, 2) · C(2n − 2, 2) · C(2n − 4, 2) · · · C(4, 2) · C(2, 2)].
This expression is the same as the one obtained in Method 1 above.
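Both methods can be cross-checked numerically (a sketch; the function names are ours):

```python
from math import comb, factorial, prod

def pairings_method1(n):
    """Method 1: (2n-1)(2n-3)...5.3.1, picking partners one at a time."""
    return prod(range(2 * n - 1, 0, -2))

def pairings_method2(n):
    """Method 2: fill n ordered boxes with 2-element subsets, then
    divide by n! because the order of the boxes is immaterial."""
    boxes = prod(comb(2 * n - 2 * i, 2) for i in range(n))
    return boxes // factorial(n)

for n in range(1, 8):
    assert pairings_method1(n) == pairings_method2(n)
```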
Example 2.5.7:
Let S = {1, 2, . . . , n + 1}, where n ≥ 2, and let T = {(x, y, z) ∈ S³ : x < z
and y < z}. Show, by counting |T| in two different ways, that
∑_{1≤k≤n} k² = C(n + 1, 2) + 2C(n + 1, 3).   (2.2)
We first show that the number on the left-hand side of (2.2) is |T|: In
selecting (x, y, z), we first fix z = 2. We then have ordered 3-tuples of the
form (−, −, 2), and the number of ways of filling the blanks is 1 × 1, as z
must be greater than both x and y. Next we fix z = 3 to get ordered 3-tuples
of the form (−, −, 3). As both 1 and 2 are less than the fixed z, the number of
ways of filling the blanks is 2 × 2 (whence we get the four ordered 3-tuples
(1, 1, 3), (1, 2, 3), (2, 1, 3) and (2, 2, 3)). The argument can be extended by
fixing z = 4, 5, . . . , n + 1. Thus the total number of different ordered 3-tuples of
the required type is simply 1 · 1 + 2 · 2 + · · · + n · n = |T|. We next show that
the number on the right-hand side of (2.2) also represents |T|: We consider
two mutually exclusive and totally exhaustive cases, namely x = y and x 6= y
in the 3-tuples of interest. For the first case, we select only 2 integers from S,
take the larger to be z and the smaller to be both x as well as y. This can be
done in C(n+ 1, 2) ways. For the second case, we select 3 integers, take the
largest to be z and assign the remaining two to be x and y in two possible
ways (note that each selection will produce two ordered 3-tuples). The total
number of ordered 3-tuples that are possible in this way is 2C(n + 1, 3).
Hence the number of elements of T is C(n + 1, 2) + 2C(n + 1, 3).
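The identity can be verified by brute force for small n (a sketch; count_T is our name):

```python
from itertools import product
from math import comb

def count_T(n):
    """|T| for S = {1, ..., n+1}: triples (x, y, z) with x < z and y < z."""
    S = range(1, n + 2)
    return sum(1 for x, y, z in product(S, repeat=3) if x < z and y < z)

for n in range(2, 9):
    assert count_T(n) == sum(k * k for k in range(1, n + 1))
    assert count_T(n) == comb(n + 1, 2) + 2 * comb(n + 1, 3)
```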
Example 2.5.8:
A sequence of (mn + 1) distinct integers u1, u2, . . . , umn+1 is given. Show
that the sequence contains either a decreasing subsequence of length greater
than m or an increasing subsequence of length greater than n (this result is
due to P. Erdős and G. Szekeres (1935)).
We present the proof as in [4]. Let li(−) be the length of the longest
decreasing subsequence with first term ui, and let li(+) be the length
of the longest increasing subsequence with first term ui.
Assume that the result is false. Then ui → (li(−), li(+)) defines a
mapping of {u1, u2, . . . , umn+1} into the Cartesian product {1, 2, . . . , m} ×
{1, 2, . . . , n}. This mapping is injective, since if i < j,
ui > uj ⇒ li(−) > lj(−) ⇒ (li(−), li(+)) ≠ (lj(−), lj(+)),
ui < uj ⇒ li(+) > lj(+) ⇒ (li(−), li(+)) ≠ (lj(−), lj(+)).
Hence |{u1, u2, . . . , umn+1}| ≤ |{1, 2, . . . , m} × {1, 2, . . . , n}|, and therefore
mn + 1 ≤ mn, which is impossible.
2.6 The Pigeon-Hole Principle
One deceptively simple counting principle that is useful in many situations
is the Pigeon-Hole Principle. This principle is attributed to Johann Peter
Gustav Lejeune Dirichlet in the year 1834, although he apparently used the
German term Schubfachprinzip. The French term is le principe des tiroirs de
Dirichlet, which can be translated as "the principle of the drawers of Dirichlet".
The Pigeon-Hole Principle: If n objects are put into m boxes and n > m
(m and n are positive integers) then at least one box contains two or more
objects.
A stronger form: If n objects are put into m boxes and n > m, then some
box must contain at least ⌈n/m⌉ objects.
Another form: Let k and n be positive integers. If at least kn + 1
objects are distributed among n boxes then one of the boxes must contain
at least k + 1 objects.
We now illustrate this principle with some examples.
Example 2.6.1:
Show that among a group of 7 people there must be at least four of the same
sex.
We treat the 7 people as 7 objects. We create 2 boxes—one (say, box1) for
the objects corresponding to the females and one (say, box2) for the objects
corresponding to the males. Thus, the 7 (= 3 · 2 + 1) objects are put into two
boxes. Hence, by the Pigeon-Hole principle there must be at least 4 objects
in one box. In other words, there must be at least four people of the same
sex.
Example 2.6.2:
Given any five points chosen within a square of side length 2 units, prove
that there must be two points which are at most √2 units apart.
Subdivide the square into four small squares each with side of length 1
unit. By the pigeon-hole principle, at least two of the chosen points must be
in (or on the boundary of) one small square. But then the distance between
these two points cannot exceed the diagonal length √2 of the small square.
Example 2.6.3:
Let A be a set of m positive integers. Show that there exists a nonempty
subset B of A such that the sum ∑_{x∈B} x is divisible by m.
We make use of the congruence relation. By a ≡ b (mod m), we mean that m
divides (a − b). A basic property we will use is that if a ≡ r (mod m) and
b ≡ r (mod m), then a ≡ b (mod m).
Let A = {a1, a2, . . . , am}. Consider the following m subsets of A and the
sums of their respective elements:
Set A1 = {a1}; the sum of the elements is a1
Set A2 = {a1, a2}; the sum of the elements is a1 + a2
Set A3 = {a1, a2, a3}; the sum of the elements is a1 + a2 + a3
. . .
Set Am = {a1, a2, . . . , am}; the sum of the elements is a1 + a2 + · · · + am.
If any of the sums is exactly divisible by m, then the corresponding set is the
required subset B. Therefore, we will assume that none of the above sums is
divisible by m. We thus have,
a1 ≡ r1 ( mod m)
a1 + a2 ≡ r2 ( mod m)
a1 + a2 + a3 ≡ r3 ( mod m)
. . .
a1 + a2 + · · · + am ≡ rm ( mod m),
where each ri (1 ≤ i ≤ m) is in {1, 2, . . . , m − 1}. Now, we consider (m − 1)
boxes numbered 1 through (m − 1), and we distribute the integers r1 through
rm into these boxes so that each ri goes into the box numbered ri. By the Pigeon-Hole
principle, there must be one box containing both an ri and an rj, which must
then be equal to the same value, say r. That is, we must have
a1 + a2 + · · ·+ ai ≡ r ( mod m)
and a1 + a2 + · · ·+ aj ≡ r ( mod m),
where j > i without loss of generality.
Therefore, m divides the difference (a1 + a2 + · · · + aj) − (a1 + a2 + · · · + ai).
Accordingly, Aj \ Ai is the required subset B.
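The argument is constructive: scanning prefix sums modulo m finds the subset B directly (a sketch, with names of our choosing; the slice returned corresponds to the set Aj \ Ai of the proof):

```python
def divisible_subset(a, m):
    """Find a nonempty sublist of a whose sum is divisible by m, by the
    pigeonhole argument on prefix sums mod m. Guaranteed to succeed
    when len(a) >= m."""
    seen = {0: 0}                      # residue -> prefix length
    total = 0
    for j, x in enumerate(a, 1):
        total += x
        r = total % m
        if r in seen:
            return a[seen[r]:j]        # this slice sums to 0 (mod m)
        seen[r] = j

for a in ([1, 2, 3], [5, 5, 5], [2, 9, 4, 14]):
    m = len(a)
    sub = divisible_subset(a, m)
    assert sub and sum(sub) % m == 0
```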
2.7 More Enumerations
We now consider the interesting case of enumerating combinations with un-
limited repetitions:
Let the distinct objects be a1, a2, a3, . . . , an. We are interested
in selecting r-combinations from {∞ · a1, ∞ · a2, ∞ · a3, . . . , ∞ · an}.
Any r-combination will be of the form {x1 · a1, x2 · a2, . . . , xn · an}, where the
x1, x2, . . . , xn are the repetition numbers, each being a non-negative integer,
and they add up to r. Thus the number of r-combinations of {∞ · a1, ∞ · a2,
∞ · a3, . . . , ∞ · an} is equal to the number of solutions of x1 + x2 + x3 + · · · + xn = r.
For each xi we put a bin and assign xi balls to that bin; the number of
balls will add up to r. Thus the number of solutions to the above equation
is equal to the number of ways of placing r indistinguishable balls in n bins
numbered 1 through n. We now make the following observation.
Consider 10 objects and consider the 7-combinations from them. One
solution is: (3 0 0 2 0 0 0 2 0 0).
Corresponding to this distribution of balls in bins, we can form a unique
binary number as follows: replace each integer by that many 0's (one 0 per
ball in the bin) and separate consecutive bins by a 1 (imagine this as a
vertical bar). For the above example, we get
0 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1.
Generalizing this we can say that the number of ways of placing r in-
distinguishable balls into n numbered bins is equal to the number of binary
numbers with (n − 1) 1’s and r 0’s. Counting such binary numbers is easy:
we have (n − 1 + r) positions and we have to choose r positions to be occupied
by the 0's (the remaining (n − 1) positions get filled by 1's). This can be
done in C(n − 1 + r, r) ways. We can now state the result:
Let V(n, r) be the number of r-combinations of n distinct objects with
unlimited repetitions. We have V(n, r) = C(n − 1 + r, r) = C(n − 1 + r, n − 1).
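The formula V(n, r) = C(n − 1 + r, r) can be cross-checked by listing the multisets directly (a sketch; V is our name):

```python
from itertools import combinations_with_replacement
from math import comb

def V(n, r):
    """Number of r-combinations of n distinct objects with
    unlimited repetitions (stars and bars)."""
    return comb(n - 1 + r, r)

# Brute-force cross-check: enumerate the multisets themselves.
for n, r in [(4, 3), (3, 5), (5, 2)]:
    count = sum(1 for _ in combinations_with_replacement(range(n), r))
    assert count == V(n, r)

assert V(4, 3) == 20   # the 20 3-combinations from four objects
```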
Example 2.7.1:
The number of integral solutions of, x1 + x2 + · · · + xn = r, xi > 0, for all
admissible values of i is equal to the number of ways of distributing r similar
balls into n numbered bins with at least one ball in each bin. This is equal
to C(n− 1 + (r − n), r − n) = C(r − 1, r − n).
2.7.1 Enumerating permutations with constrained repetitions
We begin with an illustrative example. Consider the problem of obtaining
the 10-permutations of {3 · a, 4 · b, 2 · c, 1 · d}. Let x be the number of such
permutations. If the 3 a's are replaced by a1, a2, a3, it is easy to see that
we will get 3!x permutations. Further, if the 4 b's are replaced by b1, b2, b3,
b4, then we will get (4!)(3!)x permutations. In addition, if we replace the
2 c's by c1, c2, we will get (2!)(4!)(3!)x permutations. In the process we have
generated the set {a1, a2, a3, b1, b2, b3, b4, c1, c2, d}, the elements of which
can be permuted in 10! ways. Therefore,
(2!)(4!)(3!)x = 10!, and hence x = 10!/((2!)(4!)(3!)).
We now generalize the above example.
Let q1, q2, . . . , qt be nonnegative integers such that n = q1 + q2 + · · · + qt.
Also let a1, a2, . . . , at be t distinct objects. Let P(n; q1, q2, . . . , qt) denote the
number of n-permutations of the n-combination {q1 · a1, q2 · a2, . . . , qt · at}.
By an argument similar to the above example, we have
P(n; q1, q2, . . . , qt) = n!/(q1! · q2! · · · qt!)
= C(n, q1) C(n − q1, q2) C(n − q1 − q2, q3) · · · C(n − q1 − q2 − · · · − qt−1, qt).
By substituting the formula for each term in the product, the last expression
can be simplified to the previous expression.
2.8 Ordered and Unordered Partitions
Let S be a set with n distinct elements and let t be a positive integer. A
t-part partition of the set S is a set {A1, A2, . . . , At} of t subsets of S such that
S = A1 ∪ A2 ∪ · · · ∪ At,
and Ai ∩ Aj = ∅ for i ≠ j.
This is an unordered partition.
An ordered partition of S is firstly, a partition of S; secondly there is
a specified order on the subsets. For example, the ordered partitions of
S = {a, b, c, d} of the type (1, 1, 2) are given below:
({a}, {b}, {c, d})  ({b}, {a}, {c, d})  ({a}, {c}, {b, d})  ({c}, {a}, {b, d})
({a}, {d}, {b, c})  ({d}, {a}, {b, c})  ({b}, {c}, {a, d})  ({c}, {b}, {a, d})
({b}, {d}, {a, c})  ({d}, {b}, {a, c})  ({c}, {d}, {a, b})  ({d}, {c}, {a, b})
Here, our concern is with the number of such partitions rather than the actual
list itself.
2.8.1 Enumerating the ordered partitions of a set
The number of ordered partitions of a set S of the type (q1, q2, . . . , qt), where
|S| = n, is
P(n; q1, q2, . . . , qt) = n!/(q1! · q2! · · · qt!).
We see this by choosing the q1 elements to occupy the first subset in C(n, q1)
ways; the q2 elements for the second subset in C(n− q1, q2) ways etc. Thus,
the number of ordered partitions of the type (q1, q2, . . . , qt) is
C(n, q1)C(n− q1, q2)C(n− q1 − q2, q3) . . . C(n− q1 − q2 − . . .− qt−1, qt)
which is equal to n!/(q1! · q2! · · · qt!).
Example 2.8.1:
In the game of bridge, the four players N, E, S and W are seated in a
specified order, and each is dealt a hand of 13 cards. In how many
ways can the 52 cards be dealt to the four players?
We see that the order counts. Therefore, the number of ways is 52!/(13!)^4.
Example 2.8.2:
To show that (n²)!/(n!)^n is an integer.
Consider a set of n² elements. Partition it into ordered n-part partitions
of the type (n, n, . . . , n). The number of such ordered partitions is (n²)!/(n!)^n,
which must therefore be an integer.
We can also consider partitioning a set of objects into different classes.
For example, consider the four objects a, b, c, d. We can partition the
set {a, b, c, d} into two classes: one class containing the partitions into two
subsets with one and three elements, and another class containing the
partitions into two subsets each with two elements (these are the only
possible classes with two parts). The partitioning gives the following:
{{a}, {b, c, d}}, {{b}, {a, c, d}}, {{c}, {a, b, d}}, {{d}, {a, b, c}},
{{a, b}, {c, d}}, {{a, c}, {b, d}}, {{a, d}, {b, c}}.
The number of partitions of a set of n objects into m classes, where n ≥ m,
is denoted by S^m_n, which is called the Stirling number of the second kind. It
is also the number of distinct ways of arranging a set of n distinct objects
into a collection of m identical boxes, not allowing any box to be empty (if
empty boxes were permitted, then the number would be S^1_n + S^2_n + · · · + S^m_n).
It easily follows that
S^1_n = S^n_n = 1.
Also, S^k_{n+1} = S^{k−1}_n + k S^k_n, for 1 < k ≤ n.
Proof. Consider the partitions of (n+ 1) objects into k classes. We have the
following two mutually exclusive and totally exhaustive cases:
(i) The (n + 1)th object is the sole member of a class: In this case, we simply
form the partitions of the remaining n objects into k − 1 classes and
attach the class containing the sole member. The number of partitions
thus formed is S^{k−1}_n.
(ii) The (n + 1)th object is not the sole member of any class: In this case, we
first form the partitions of the remaining n objects into k classes. This
gives S^k_n partitions. In each such partition we then add the (n + 1)th
object to one of the k classes. We thus get k S^k_n partitions of the required
type.
From (i) and (ii) above, the result follows.
The interpretation of S^m_n as the number of partitions of {1, . . . , n} into
exactly m classes also yields another recurrence. If we remove the class (say
c) containing n, and if there are r elements in the class c, then we get a
partition of (n − r) elements into (m − 1) classes. The class c can be chosen
in C(n − 1, r − 1) possible ways. Hence,
S^m_n = ∑_{r=1}^{n} C(n − 1, r − 1) S^{m−1}_{n−r}.
We can use the above relations to build the following table:
S^m_n   m = 1    2    3    4    5    6    7
n = 1       1    0    0    0    0    0    0
    2       1    1    0    0    0    0    0
    3       1    3    1    0    0    0    0
    4       1    7    6    1    0    0    0
    5       1   15   25   10    1    0    0
    6       1   31   90   65   15    1    0
    7       1   63  301  350  140   21    1
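The recurrence for S^m_n is easy to code; the sketch below (stirling2 is our name) rebuilds the last row of the table:

```python
def stirling2(nmax):
    """Stirling numbers of the second kind via the recurrence
    S[n+1][k] = S[n][k-1] + k*S[n][k], starting from S[1] = {1: 1}."""
    S = {1: {1: 1}}
    for n in range(1, nmax):
        S[n + 1] = {k: S[n].get(k - 1, 0) + k * S[n].get(k, 0)
                    for k in range(1, n + 2)}
    return S

S = stirling2(7)
# Row n = 7 of the table: 1, 63, 301, 350, 140, 21, 1.
assert [S[7].get(m, 0) for m in range(1, 8)] == [1, 63, 301, 350, 140, 21, 1]
```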
2.9 Combinatorial Identities
There are many interesting combinatorial identities. Some of these identities
are suggested by Pascal's triangle, shown in Fig. 2.2. Pascal's triangle
is constructed by first writing down three 1's in the form of a triangle (this
corresponds to the first two rows in the figure). Any number (other than the
1's at the ends) in any other row is obtained by summing the two numbers
in the previous row that are positioned immediately before and after it. The
1's at the ends of any row are simply carried over. We consider below some
well-known combinatorial identities; several of them can be seen to appear
in Pascal's triangle.
Newton’s Identity
C(n, r)C(r, k) = C(n, k)C(n− k, r − k), for integers n ≥ r ≥ k ≥ 0.
The left-hand side consists of selecting two sets: first a set A of r objects
and then a set B of k objects from the set A. For example, it is the number
C(0, 0) = 1
C(1, 0) = 1 C(1, 1) = 1
C(2, 0) = 1 C(2, 1) = 2 C(2, 2) = 1
C(3, 0) = 1 C(3, 1) = 3 C(3, 2) = 3 C(3, 3) = 1
C(4, 0) = 1 C(4, 1) = 4 C(4, 2) = 6 C(4, 3) = 4 C(4, 4) = 1
Figure 2.2:
of ways of selecting a committee of r people out of a set of n people and then
choosing a subcommittee of k people. The right-hand side can be viewed
as selecting the k subcommittee members in the first place and then adding
(r − k) people, to form the committee, from the remaining (n− k) people.
A special case (k = 1): C(n, r) · r = n · C(n − 1, r − 1).
Pascal’s Identity: C(n, r) = C(n− 1, r) + C(n− 1, r − 1).
This is attributed to M. Stifel (1486–1567) and to Blaise Pascal (1623–1662).
The proof of the identity is easy.
Diagonal Summation:
C(n, 0) + C(n+ 1, 1) + C(n+ 2, 2) + · · ·+ C(n+ r, r) = C(n+ r + 1, r).
The right-hand side is equal to the number of ways to distribute r indistin-
guishable balls into (n + 2) numbered boxes. But the balls may also be distributed
as follows: fix a value of k, where 0 ≤ k ≤ r; distribute k balls
in the first (n + 1) boxes and then put the remaining (r − k) balls in the last
box. This can be done in ∑_{k=0}^{r} C(n + k, k) ways.
Row Summation
C(n, 0) + C(n, 1) + · · · + C(n, r) + · · · + C(n, n) = 2^n.
Consider finding the number of subsets of a set with n elements. We
take n bins (one for each element). We indicate the picking (respectively, not
picking) of an element by “putting” a 1 (respectively a 0) in the corresponding
bin. It is thus easy to see that the total number of possible subsets is equal to the total number of ways of filling the bins (each with a 1 or a 0). This is equal to 2^n, the right-hand side of the above equality. Now, the various possible subsets can also be counted as the number of subsets with 0 elements, plus the number of subsets with 1 element, and so on. This way of enumeration leads to the expression on the left-hand side (note: the proof does not use the Binomial Theorem).
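The bin argument can be checked directly for small n by enumerating subsets; the following Python sketch (ours, not part of the text) counts subsets by size:

```python
from itertools import combinations
from math import comb

def subset_counts(n):
    """Return [number of r-subsets of an n-set for r = 0..n]."""
    return [sum(1 for _ in combinations(range(n), r)) for r in range(n + 1)]

for n in range(8):
    counts = subset_counts(n)
    # counting by size gives the binomial coefficients (left-hand side) ...
    assert counts == [comb(n, r) for r in range(n + 1)]
    # ... while filling n bins with a 1 or a 0 gives 2^n (right-hand side)
    assert sum(counts) == 2 ** n
```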
Row Square Summation
[C(n, 0)]^2 + [C(n, 1)]^2 + · · · + [C(n, r)]^2 + · · · + [C(n, n)]^2 = C(2n, n)
Let S be a set with 2n elements. The right-hand side above counts the number of n-combinations of S. Now, partition S into two subsets A and B, each with n elements. Then, an n-combination of S is a union of an r-combination of A and an (n − r)-combination of B, for some r = 0, 1, . . . , n. For any r, there are C(n, r) r-combinations of A and C(n, n − r) (n − r)-combinations of B. Thus, by the Product Rule there are C(n, r)C(n, n − r) n-combinations obtained by taking r elements from A and (n − r) elements from B. Since C(n, n − r) = C(n, r), we have C(n, r)C(n, n − r) = [C(n, r)]^2 for each r. Then the Sum Rule gives the left-hand side.
2.10 The Binomial and the Multinomial Theorems
Theorem 2.10.1:
Let n be a positive integer. Then for all elements x and y belonging to a commutative ring with unit element (with the usual operations + and ·),

(x + y)^n = C(n, 0)x^n + C(n, 1)x^{n−1}y + C(n, 2)x^{n−2}y^2 + · · · + C(n, r)x^{n−r}y^r + · · · + C(n, n)y^n
Note: For the definition of a commutative ring, see Chapter 5. For the
present, it is enough to think of x and y as real numbers.
Proof. The inductive proof is well-known. Below, we present the combinato-
rial proof. We write the left-hand side explicitly as consisting of n factors:
(x+ y)(x+ y) · · · (x+ y)
We select an x or a y from each factor (x + y). This gives terms of the
form xn−ryr for each r = 0, 1, . . . , n. We collect all such terms with similar
exponents on x and y and sum them up. This sum is then the coefficient of
the term of the form xn−ryr in the expansion of (x+ y)n. For any given r, to
get the term xn−ryr we select r of the y’s from the n factors (x gets chosen
from the remaining n − r factors). This can be done in C(n, r) ways. This
then is the coefficient of xn−ryr as required by the theorem.
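The factor-by-factor selection in the proof can be mimicked numerically: multiplying out the n factors (x + y) one at a time reproduces the coefficients C(n, r). A small Python sketch (ours, illustrative):

```python
from math import comb

def binomial_coeffs(n):
    """Coefficient of x^(n-r) y^r in (x + y)^n, for r = 0..n,
    obtained by multiplying in one factor (x + y) at a time."""
    coeffs = [1]  # (x + y)^0
    for _ in range(n):
        # multiplying by (x + y) shifts and adds the coefficient list
        coeffs = [a + b for a, b in zip([0] + coeffs, coeffs + [0])]
    return coeffs

# the coefficient of x^(n-r) y^r is C(n, r), as the combinatorial proof asserts
for n in range(12):
    assert binomial_coeffs(n) == [comb(n, r) for r in range(n + 1)]
```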
The binomial coefficients C(n, r) appearing above occur in Pascal's triangle. For a fixed n, the ratio of the (k + 1)st binomial coefficient of order n to the kth is C(n, k + 1)/C(n, k) = (n − k)/(k + 1).
This ratio is larger than 1 if k < (n−1)/2 and is less than 1 if k > (n−1)/2.
Therefore, we can infer that the biggest binomial coefficient must occur in the
“middle”. We use Stirling’s approximation to estimate how big the binomial
coefficients are:
C(n, n/2) = n!/[(n/2)!]^2 ≈ [(n/e)^n √(2nπ)] / [(n/2e)^{n/2} √(nπ)]^2 = 2^n √(2/(nπ))
Corollary 2.10.2:
Using the Binomial Theorem we can get expansions for (1+x)n and (1−x)n.
If we set x = 1 and y = −1 in the Binomial Theorem, we get,
C(n, 0) − C(n, 1) + C(n, 2) − · · · + (−1)^n C(n, n) = 0.

We can write this as:

C(n, 0) + C(n, 2) + C(n, 4) + · · · = C(n, 1) + C(n, 3) + C(n, 5) + · · ·

Let S be the common sum. Then, by a previous identity (see Row Summation), adding the two series gives 2S = 2^n, or S = 2^{n−1}. The combinatorial interpretation is easy: if A is a set with n elements, then the number of subsets of A with an even number of elements is equal to the number of subsets of A with an odd number of elements, and each of these is equal to 2^{n−1}.
Example 2.10.3:
To show that: 1 · C(n, 1) + 2 · C(n, 2) + 3 · C(n, 3) + · · · + n · C(n, n) = n · 2^{n−1}, for each positive integer n.

We use Newton's Identity, rC(n, r) = nC(n − 1, r − 1), to replace each term on the left-hand side; then the expression on the left reduces to

n [C(n − 1, 0) + · · · + C(n − 1, n − 1)] = n · 2^{n−1}
giving the expression on the right.
The Multinomial Theorem: This concerns the expansion of multinomials of the form

(x1 + x2 + · · · + xt)^n.
Here, the role of the binomial coefficients gets replaced by the “multinomial
coefficients”
P(n; q1, q2, . . . , qt) = n!/(q1! q2! · · · qt!)

where the qi's are non-negative integers and ∑ qi = n. (Recall that the multinomial coefficients enumerate the ordered partitions of a set of n elements of the type (q1, q2, . . . , qt).)
Example 2.10.4:
By long multiplication we can get,

(x1 + x2 + x3)^3 = x1^3 + x2^3 + x3^3 + 3x1^2 x2 + 3x1^2 x3 + 3x1 x2^2 + 3x1 x3^2 + 3x2 x3^2 + 3x2^2 x3 + 6x1 x2 x3.

To get the coefficient of, say, x2 x3^2, we choose x2 from one of the three factors and x3 from the remaining two. This can be done in C(3, 1) · C(2, 2) = 3 ways; therefore the required coefficient should be 3.
Example 2.10.5:
Find the coefficient of x1^4 x2^5 x3^6 x4^3 in (x1 + x2 + x3 + x4)^18.

The product will occur as often as x1 can be chosen from 4 out of the 18 factors, x2 from 5 out of the remaining 14 factors, x3 from 6 out of the remaining 9 factors, and x4 from the last 3 factors. Therefore the
coefficient of x1^4 x2^5 x3^6 x4^3 must be

C(18, 4) · C(14, 5) · C(9, 6) · C(3, 3) = 18!/(4! 5! 6! 3!).
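The two ways of counting in this example agree, as a quick Python check (ours; the numeric value is computed, not from the text) confirms:

```python
from math import comb, factorial

# coefficient of x1^4 x2^5 x3^6 x4^3 in (x1 + x2 + x3 + x4)^18
by_selection = comb(18, 4) * comb(14, 5) * comb(9, 6) * comb(3, 3)
by_formula = factorial(18) // (factorial(4) * factorial(5)
                               * factorial(6) * factorial(3))
assert by_selection == by_formula == 514594080
```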
Generalization of the above yields the following theorem:
Theorem 2.10.6 (The Multinomial Theorem):
Let n be a positive integer. Then for all x1, x2, . . . , xt we have,

(x1 + x2 + · · · + xt)^n = ∑ P(n; q1, q2, . . . , qt) x1^{q1} x2^{q2} · · · xt^{qt}
where the summation extends over all sets of non-negative integers q1, q2,
. . . , qt with q1 + q2 + · · ·+ qt = n.
To count the number of terms in the above expansion, we note that each term of the form x1^{q1} x2^{q2} · · · xt^{qt} corresponds to a selection of n objects with repetitions from t distinct types. There are C(n + t − 1, n) ways of doing this. This then is the number of terms in the above expansion.
Example 2.10.7:
In (x1 + x2 + x3 + x4 + x5)^10, the coefficient of x1^2 x3 x4^3 x5^4 is

P(10; 2, 0, 1, 3, 4) = 10!/(2! 0! 1! 3! 4!) = 12,600.
There are C(10+5−1, 10) = C(14, 10) = 1001 terms in the above multinomial
expansion.
Corollary 2.10.8:
In the multinomial theorem, if we let x1 = x2 = · · · = xt = 1, then for any positive integer t, we have t^n = ∑ P(n; q1, q2, . . . , qt), where the summation extends over all sets of non-negative integers q1, q2, . . . , qt with ∑ qi = n.
2.11 Principle of Inclusion-Exclusion
The Sum Rule stated earlier (see Section 2.2) applies only to disjoint sets. A generalization is the Inclusion-Exclusion Principle, which applies to non-disjoint sets as well.
We first consider the case of two sets. If A and B are finite subsets of
some universe U , then
|A ∪B| = |A|+ |B| − |A ∩B|
By the Sum Rule
|A ∪B| = |A ∩B′|+ |A ∩B|+ |A′ ∩B| (2.3)
Also, we have
|A| = |A ∩B′|+ |A ∩B|
|B| = |A′ ∩B|+ |A ∩B|, and
|A| + |B| = |A ∩ B′| + |A′ ∩ B| + 2|A ∩ B| (2.4)
From (2.3) and (2.4), we get
|A ∪B| = |A|+ |B| − |A ∩B|.
Example 2.11.1:
From a group of ten professors, in how many ways can a committee of five members be formed so that at least one of professor A or professor B is included?
Let A1 and A2 be the sets of committees that include professor A and
professor B respectively. Then the required number is |A1 ∪ A2| . Now,
|A1| = C(9, 4) = 126 = |A2| and |A1 ∩ A2| = C(8, 3) = 56.
Therefore it follows that |A1 ∪ A2| = 126 + 126− 56 = 196.
The Inclusion-Exclusion Principle for the case of three sets can be stated
as given below:
If A, B and C are three finite sets then
|A ∪B ∪ C| = |A|+ |B|+ |C| − |A ∩B| − |A ∩ C| − |B ∩ C|+ |A ∩B ∩ C|
The result can be obtained easily using a Venn diagram.
We now state a more general form of the Inclusion-Exclusion Principle.
Theorem 2.11.2:
If A1, A2, . . . , An are finite subsets of a universal set, then
|A1 ∪ A2 ∪ · · · ∪ An| = ∑|Ai| − ∑|Ai ∩ Aj| + ∑|Ai ∩ Aj ∩ Ak| − · · · + (−1)^{n−1}|A1 ∩ A2 ∩ · · · ∩ An| (2.6)
– the second summation on the right-hand side is taken over all the 2-
combinations (i, j) of the integers 1, 2, ..., n; the third summation is taken
over all the 3-combinations (i, j, k) of the integers 1, 2, ..., n and so on.
Thus, for n = 4, there are 4 + C(4, 2) + C(4, 3) + 1 = 2^4 − 1 = 15 terms on the right-hand side. In general there are,

C(n, 1) + C(n, 2) + C(n, 3) + · · · + C(n, n) = 2^n − 1
terms on the right-hand side.
(Note: the term C(n, 0) is missing; hence the −1 on the right-hand side.)
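Formula (2.6) translates directly into code; a small Python sketch (ours), checked against a direct union:

```python
from itertools import combinations

def inclusion_exclusion(sets):
    """Size of the union of the given sets, computed term by term as in (2.6)."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k - 1)
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total

A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {1, 5, 6}
assert inclusion_exclusion([A, B, C]) == len(A | B | C)
```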
Proof. The proof by induction is boring! Here we give the proof based on
combinatorial arguments.
We must show that every element of A1∪A2∪· · ·∪An is counted exactly
once in the right-hand side of (2.6). Suppose that an element x ∈ A1 ∪ A2 ∪ · · · ∪ An is in exactly m (m ≥ 1) of the sets A1, A2, . . . , An; for definiteness, say
x ∈ A1, x ∈ A2, . . . , x ∈ Am and x /∈ Am+1, . . . , x /∈ An.
Then x will be counted in each of the terms |Ai|, for i = 1, . . . ,m; in other
words x will be counted C(m, 1) times in the “∑ |Ai| term” on right-hand
side of (2.6). Also, x will be counted C(m, 2) times in the “∑ |Ai ∩ Aj|
term” on right-hand side of (2.6) since there are C(m, 2) pairs of sets Ai, Aj
where x is in both Ai and Aj. Likewise x is counted C(m, 3) times in the
“∑ |Ai ∩ Aj ∩ Ak| term” since there are C(m, 3) 3-combinations of Ai , Aj
and Ak such that x ∈ Ai, x ∈ Aj and x ∈ Ak. Continuing in this manner, we
see that on the right hand side of (2.6) x is counted,
C(m, 1) − C(m, 2) + C(m, 3) − · · · + (−1)^{m−1}C(m, m)
number of times. Now we must show that this last expression is 1. Expanding (1 − 1)^m by the Binomial Theorem we get,

0 = C(m, 0) − C(m, 1) + C(m, 2) − · · · + (−1)^m C(m, m).

Using the fact that C(m, 0) = 1 and transposing all other terms to the left-hand side of the above equation, we get the required relation.
2.12 Euler’s φ-function
If n is a positive integer, by definition, φ(n) is the number of integers x such
that 1 ≤ x ≤ n and n and x are relatively prime (note: two positive integers
are relatively prime if their gcd is 1).
For example, φ(30) = 8 because the eight integers 1, 7, 11, 13, 17, 19, 23
and 29 are the only positive integers less than 30 and relatively prime to 30.
Let p1, p2, . . . , pk be the distinct primes dividing n, and let U = {1, 2, . . . , n}. Let Ai be the subset of U consisting of those integers divisible by pi. The integers in U relatively prime to n are those in none of the subsets A1, A2, . . . , Ak. So,
φ(n) = |A′1 ∩ A′2 ∩ · · · ∩ A′k| = |U| − |A1 ∪ A2 ∪ · · · ∪ Ak|
If d divides n, then there are n/d multiples of d in U . Hence,
|Ai| = n/pi, |Ai ∩ Aj| = n/(pi pj), . . . , |A1 ∩ A2 ∩ · · · ∩ Ak| = n/(p1 p2 · · · pk).
Thus by Inclusion-Exclusion Principle,
φ(n) = n − ∑_i n/pi + ∑_{1≤i<j≤k} n/(pi pj) − · · · + (−1)^k n/(p1 p2 · · · pk).

This is equal to the product n(1 − 1/p1)(1 − 1/p2) · · · (1 − 1/pk).
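The product formula gives a direct way to compute φ(n) once the distinct prime factors are found; a Python sketch (ours), cross-checked against the gcd definition:

```python
from math import gcd

def phi(n):
    """Euler's φ(n) via n · (1 - 1/p1) ··· (1 - 1/pk), in exact arithmetic."""
    result, m = n, n
    p = 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p      # multiply result by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                          # one prime factor may remain
        result -= result // m
    return result

assert phi(30) == 8                    # the example in the text
assert all(phi(n) == sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)
           for n in range(1, 200))
```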
It turns out that computing φ(n) is as hard as factoring n. The following
is a beautiful identity involving the Euler φ-function, due to Smith (1875): the determinant of the n × n matrix whose entry in row a and column b is (a, b) satisfies

det[(a, b)]_{1≤a,b≤n} = φ(1)φ(2) · · · φ(n)
where (a, b) denotes the gcd of the integers a and b.
2.13 Inclusion-Exclusion Principle and the Sieve
of Eratosthenes
The Greek mathematician Eratosthenes who lived in Alexandria in the 3rd
century B.C. devised the sieve technique to get all primes between 2 and n.
The method starts by writing down all the integers from 2 to n in the natural
order. Then, starting with the smallest, that is 2, every second number (these are the multiples of 2) is crossed out. Next, starting with the smallest uncrossed number, that is 3, every third number (these are the multiples of 3) is crossed out.
This procedure is repeated until a stage is reached when no more numbers
could be crossed out. The surviving uncrossed numbers are the required
primes.
We now compute how many integers between 1 and 1000 are not divisible
by 2, 3, 5 and 7; that is how many integers remain after the first 4 steps of
the sieve method.
Let U be the set of integers x such that 1 ≤ x ≤ 1000. Let A1, A2, A3, A4
be the sets of elements (of U) divisible by 2, 3, 5 and 7 respectively. Then,
A1 ∪ A2 ∪ A3 ∪ A4 is the set of numbers in U that are divisible by at least
one of 2, 3, 5, and 7. Hence the required number is |(A1 ∪ A2 ∪ A3 ∪ A4)′| .
We know that,
|A1| = 1000/2 = 500; |A2| = 1000/3 = 333;
|A3| = 1000/5 = 200; |A4| = 1000/7 = 142;
|A1 ∩ A2| = 1000/6 = 166; |A1 ∩ A3| = 1000/10 = 100;
|A1 ∩ A4| = 1000/14 = 71; |A2 ∩ A3| = 1000/15 = 66;
|A2 ∩ A4| = 1000/21 = 47; |A3 ∩ A4| = 1000/35 = 28;
|A1 ∩ A2 ∩ A3| = 1000/30 = 33; |A1 ∩ A2 ∩ A4| = 1000/42 = 23;
|A1 ∩ A3 ∩ A4| = 1000/70 = 14; |A2 ∩ A3 ∩ A4| = 1000/105 = 9;
|A1 ∩ A2 ∩ A3 ∩ A4| = 1000/210 = 4
Then, |A1 ∪ A2 ∪ A3 ∪ A4| =
(500+333+200+142)−(166+100+71+66+47+28)+(33+23+14+9)−4 = 772.
Therefore, |(A1 ∪ A2 ∪ A3 ∪ A4)′| = 1000− 772 = 228
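The count can be confirmed by brute force over the thousand integers (a quick check, ours):

```python
# integers in 1..1000 divisible by none of 2, 3, 5, 7
count = sum(1 for x in range(1, 1001)
            if all(x % p != 0 for p in (2, 3, 5, 7)))
assert count == 228   # agrees with 1000 - 772 above
```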
2.14 Derangements
Derangements are permutations of 1, . . . , n in which none of the n elements
appears in its ‘natural’ place. Thus, (i1, i2, . . . , in) is a derangement if i1 ≠ 1, i2 ≠ 2, . . . , and in ≠ n.
Let Dn denote the number of derangements of (1, 2, . . . , n). It follows that
D1 = 0; D2 = 1, because (2, 1) is a derangement; D3 = 2 because (2, 3, 1)
and (3, 1, 2) are the only derangements of (1, 2, 3). We will determine Dn
using the Inclusion-Exclusion principle.
Let U be the set of all n! permutations of 1, 2, . . . , n. For each i, let
Ai be the set of permutations such that element i is in its correct place; that
is, all permutations (b1, b2, . . . , bn) such that bi = i. Evidently, the set of derangements is precisely the set A′1 ∩ A′2 ∩ · · · ∩ A′n; the number of elements in it is Dn.
Now, the permutations in A1 are all of the form (1, b2, . . . , bn) where (b2, . . . , bn) is a permutation of 2, . . . , n. Thus |A1| = (n − 1)!. Similarly it follows that |Ai| = (n − 1)! for every i.
Likewise, A1 ∩A2 is the set of permutations of the form (1, 2, b3, . . . , bn),
so that |A1∩A2| = (n−2)!. We can similarly argue, |Ai∩Aj| = (n−2)! for all
admissible pairs of values of i, j, i 6= j. For any integer k, where 1 ≤ k ≤ n,
the permutations in A1 ∩ A2 ∩ . . . ∩ Ak are of the form
(1, 2, . . . , k, bk+1, . . . , bn)
where (bk+1, . . . , bn) is a permutation of (k + 1, . . . , n). Thus, |A1 ∩ A2 ∩ · · · ∩ Ak| = (n − k)!.
More generally, we have |Ai1∩Ai2∩ . . .∩Aik| = (n−k)! for i1, i2, . . . , ik,
a k-combination of 1, 2, . . . , n. Therefore,
|A′1 ∩ A′2 ∩ · · · ∩ A′n| = |U| − |A1 ∪ A2 ∪ · · · ∪ An|
= n! − C(n, 1)(n − 1)! + C(n, 2)(n − 2)! − · · · + (−1)^n C(n, n)
= n! − n!/1! + n!/2! − n!/3! + · · · + (−1)^n n!/n!.

Thus, Dn = n![1 − 1/1! + 1/2! − 1/3! + · · · + (−1)^n (1/n!)].
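The formula for Dn is easy to compute with, and it can be cross-checked against a brute-force count over all permutations for small n (Python sketch, ours):

```python
from itertools import permutations
from math import factorial

def D(n):
    """D_n = n! [1 - 1/1! + 1/2! - ... + (-1)^n / n!], in exact arithmetic."""
    return sum((-1) ** r * factorial(n) // factorial(r) for r in range(n + 1))

def D_brute(n):
    """Count permutations of 0..n-1 with no fixed point."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

assert all(D(n) == D_brute(n) for n in range(1, 8))
assert D(8) == 14833          # the value quoted in the text
```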
We can get a quick approximation to Dn in terms of the exponential e.
We know that,
e^{−1} = 1 − (1/1!) + (1/2!) − (1/3!) + · · · + (−1)^n(1/n!) + (−1)^{n+1}(1/(n + 1)!) + · · ·

or e^{−1} = (Dn/n!) + (−1)^{n+1}(1/(n + 1)!) + (−1)^{n+2}(1/(n + 2)!) + · · ·

or |e^{−1} − (Dn/n!)| ≤ (1/(n + 1)!) + (1/(n + 2)!) + · · ·

≤ (1/(n + 1)!)[1 + (1/(n + 2)) + (1/(n + 2))^2 + · · ·]

= (1/(n + 1)!) · 1/(1 − 1/(n + 2))

= (1/(n + 1)!)[1 + 1/(n + 1)]
As n → ∞, (n + 1)! → ∞ very fast; thus for large values of n, we regard n!/e as a good approximation to Dn. For example, D8 = 14833 and the value of 8!/e is about 14832.89906. A different approach leads to a recurrence relation for Dn. In considering the derangements of 1, . . . , n, n ≥ 2, we can form two mutually exclusive and exhaustive cases as follows:

(i) The integer 1 is displaced to the kth position (1 < k ≤ n) and k is displaced to the 1st position: in this case, the total number of derangements is equal to the number of derangements of the set of n − 2 numbers 2, 3, . . . , k − 1, k + 1, . . . , n. The required number is thus Dn−2.
(ii) The integer 1 is displaced to the kth position (1 < k ≤ n) but k is dis-
placed to a position different from the 1st position: These are precisely
the derangements of the set of n − 1 numbers k, 2, 3, . . . , k − 1, k +
1, ..., n. Clearly, any derangement will displace the integer k from the
1st position. The required number is thus Dn−1.
We note that in the above argument, k can be any one of 2, 3, ..., n;
that is, it can take (n− 1) possible values. Thus, we have,
Dn = (n− 1)(Dn−1 +Dn−2)
By simple algebraic manipulations, it can be shown that Dn − nDn−1 = (−1)^n, for n ≥ 2. This recurrence relation, solved with the initial condition D1 = 0, gives,

Dn = n! ∑_{i=1}^{n−1} (−1)^{i+1}/(i + 1)!.
2.15 Partition Problems
Several interesting combinatorial problems arise in connection with the "partitioning of integers". These are collectively referred to as partition problems. We denote by p(n,m) the number of partitions of an integer n into m parts. We write,

n = α1 + α2 + · · · + αm and specify α1 ≥ α2 ≥ · · · ≥ αm ≥ 1

For example, the partitions of 2 are 2 and 1 + 1; so p(2, 1) = p(2, 2) = 1. The partitions of 3 are 3, 2 + 1 and 1 + 1 + 1, and so p(3, 1) = p(3, 2) = p(3, 3) = 1. Similarly the partitions of 4 are 4, 3 + 1, 2 + 2, 2 + 1 + 1 and 1 + 1 + 1 + 1; so p(4, 2) = 2 and p(4, 1) = p(4, 3) = p(4, 4) = 1.
2.15.1 Recurrence relations for p(n,m)
The following recurrence relations can be used to compute p(n,m):
p(n, 1) + p(n, 2) + · · ·+ p(n, k) = p(n+ k, k) (2.7)
and p(n, 1) = p(n, n) = 1 (2.8)
Proof. The second formula is obvious by definition. We proceed to prove the
first formula.
Let A be the set of partitions of n having m parts, m ≤ k; each partition in A can be considered as a k-tuple, padded with zeros. Let B be the set of partitions of n + k into k parts. Define a mapping φ : A → B by setting,

φ(α1, α2, . . . , αm) = (α1 + 1, α2 + 1, . . . , αm + 1, 1, 1, . . . , 1)

Since α1 + α2 + · · · + αm = n, the sequence on the right-hand side above gives a partition of n + k into k parts. (Note that if m = k, the 1's in the partition of n + k will be absent.)

Clearly the mapping is bijective. Hence,

|A| = p(n, 1) + p(n, 2) + · · · + p(n, k) = |B| = p(n + k, k)
The equations (2.7) and (2.8) allow us to compute the p(n,m)'s recursively. For example, the values of p(n,m) for n ≤ 6 and m ≤ 6 are given by the following array:

p(n,m)   m = 1    2    3    4    5    6
n = 1        1    0    0    0    0    0
    2        1    1    0    0    0    0
    3        1    1    1    0    0    0
    4        1    2    1    1    0    0
    5        1    2    2    1    1    0
    6        1    3    3    2    1    1
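The recurrence (2.7), together with (2.8) and the convention p(n,m) = 0 for m > n, turns into a short memoized function (Python sketch, ours) that reproduces the array above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(n, m):
    """Partitions of n into exactly m parts, via (2.7):
    p(n + k, k) = p(n, 1) + ... + p(n, k)."""
    if m < 1 or m > n:
        return 0
    if m == 1 or m == n:
        return 1
    # read (2.7) with n replaced by n - m and k = m
    return sum(p(n - m, j) for j in range(1, m + 1))

table = [[p(n, m) for m in range(1, 7)] for n in range(1, 7)]
assert table[5] == [1, 3, 3, 2, 1, 1]   # the n = 6 row of the array
```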
2.16 Ferrer Diagrams
We next illustrate the idea of Ferrer diagram to represent a partition. Con-
sider a partition such as 5 + 3 + 2 + 2. We represent it diagrammatically as
shown in Fig. 2.3. The representation, as seen in the diagram, has one row for each part; the number of squares in a row is equal to the size of the part it represents; an upper row has at least as many squares as a lower row; and the rows are aligned to the left.
The partition obtained by rendering the columns of a given partition as rows is known as the conjugate partition. For example, from the above Ferrer diagram it follows (by turning the figure by 90° clockwise and taking a mirror reflection) that the conjugate partition of 5 + 3 + 2 + 2 is 4 + 4 + 2 + 1 + 1.
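Conjugation is easy to compute: column j of the Ferrer diagram has one square for every part larger than j. A Python sketch (ours):

```python
def conjugate(parts):
    """Conjugate of a partition given as a non-increasing list of parts."""
    if not parts:
        return []
    # column j (0-indexed) has as many squares as there are parts > j
    return [sum(1 for a in parts if a > j) for j in range(parts[0])]

assert conjugate([5, 3, 2, 2]) == [4, 4, 2, 1, 1]          # the example above
assert conjugate(conjugate([5, 3, 2, 2])) == [5, 3, 2, 2]  # an involution
```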
Figure 2.3: Ferrer diagram of the partition 5 + 3 + 2 + 2.
Figure 2.4: The partition 9 + 3 + 1 with unequal odd parts corresponds to the self-conjugate partition 5 + 3 + 3 + 1 + 1: each odd part is bent into a nested right-angled “set-square” of squares.
2.16.1 Proposition
The number of partitions of n into k parts is equal to the number of partitions of n whose largest part is k. It is easy to establish a bijection between the set of partitions of n into k parts and the set of partitions of n having k as the largest part: conjugation, as seen via the Ferrer diagram, provides one.
2.16.2 Proposition
The number of self-conjugate partitions of n is the same as the number of
partitions of n with all parts unequal and odd.
Consider the Ferrer diagram associated with a partition of n with all
parts unequal and odd. We can obtain a new Ferrer diagram by placing the
squares of each row in a “set-square arrangement” as shown in Fig. 2.4.
The new Ferrer diagram defines a self-conjugate partition. Similarly,
reversing the argument, each self-conjugate partition corresponds to a unique
partition with all parts unequal and odd.
2.16.3 Proposition
The number of partitions of n is equal to the number of partitions of 2n
which have exactly n parts.
Any partition of n can have at most n parts. Treat each partition as an n-tuple (α1, α2, . . . , αn), padded with zeros, so that in general αi+1 to αn will be 0 from some i on. Now add 1 to each αj, 1 ≤ j ≤ n; this defines a partition of 2n (as we are adding n 1's) in which there are exactly n parts. It is easy to see that we have a bijection from the set of the original n-tuples to the set of n-tuples formed as above. Hence the result follows.
2.17 Solution of Recurrence Relations
Different types of recurrence relations (also known as difference equations)
arise in many enumeration problems and in the analysis of algorithms. We il-
lustrate many types of recurrence relations and the commonly adopted tech-
niques and tricks used to solve them. We also give the general technique
based on generating functions to solve recurrence relations. [20], [9] and [56]
provide good material on solving recurrence relations.
Example 2.17.1:
The number Dn of derangements of the integers (1, 2, . . . , n), as we have seen in Section 2.14, satisfies the recurrence relation Dn − nDn−1 = (−1)^n, for n ≥ 2, with D1 = 0. The easiest way to solve this recurrence relation is to rewrite it as,

Dn/n! − Dn−1/(n − 1)! = (−1)^n/n!

which telescopes when summed over n and is thus easy to solve.
Example 2.17.2:
The sorting problem asks for an algorithm that takes as input a list or an array of n integers and sorts them, that is, arranges them in nondecreasing (or nonincreasing) order. One algorithm, call it procedure Mergesort(n), does this by splitting the given list of n integers into two sublists of ⌊n/2⌋ and ⌈n/2⌉ integers, applying the procedure recursively to sort the sublists, and merging the sorted sublists (note the recursive formulation). If the “time taken” by procedure Mergesort(n), in terms of the number of comparisons, is denoted by T(n), then T(n) is known to satisfy the following recurrence relation:

T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + n − 1, with T(2) = 1
The above method is characteristic of “divide-and-conquer” algorithms, where a main problem of size n is split into b (b > 1) subproblems of size, say, n/c (c > 1); the subproblems are solved by further splitting; the splitting stops when the size of the subproblems is small enough for them to be solved by a simple technique. A divide-and-conquer algorithm combines the solutions to the subproblems to yield a solution to the original problem. Thus there is a non-recursive cost (denoted by the function f(·)) associated with splitting and/or combining (the solutions). Thus we generally get a recurrence relation of the type,
T (n) = bT (n/c) + f(n)
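A mergesort instrumented to count comparisons stays within the bound T(n) above; the sketch below (ours, illustrative) counts one comparison per merge step:

```python
def merge_sort_count(arr):
    """Return (sorted copy of arr, number of element comparisons made)."""
    n = len(arr)
    if n <= 1:
        return list(arr), 0
    left, cl = merge_sort_count(arr[:n // 2])
    right, cr = merge_sort_count(arr[n // 2:])
    merged, comps, i, j = [], cl + cr, 0, 0
    while i < len(left) and j < len(right):
        comps += 1                      # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, comps

def T(n):
    """T(n) = T(floor(n/2)) + T(ceil(n/2)) + n - 1, with T(1) = 0, so T(2) = 1."""
    return 0 if n <= 1 else T(n // 2) + T(n - n // 2) + n - 1

data = list(range(50, 0, -1))
out, comps = merge_sort_count(data)
assert out == sorted(data) and comps <= T(len(data))
```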
Example 2.17.3:
Consider the problem of finding the number of binary sequences of length n
which do not contain two consecutive 1’s.
Let wn be the number of such sequences. Let un be the number of such
sequences whose last digit is a 1. Also, let vn be the number of such sequences
whose last digit is a 0. Obviously, wn = un + vn.
Consider extending any sequence of the required type from length n− 1
to length n. We have the following two possibilities:
(i) If a sequence of length n−1 ends with a 1 then, we can append a 0 but
not a 1.
(ii) If a sequence of length n− 1 ends with a 0 then, we can append either
a 0 or a 1.
It is not difficult to reason that un and vn can be counted as given by the
following recurrence relations:
vn = vn−1 + un−1 and un = vn−1.
These lead to the equations,
vn = vn−1 + vn−2 and un = un−1 + un−2
which when added give the recurrence relation,
wn = wn−1 + wn−2.
This equation is the same as that for the Fibonacci numbers; it can be solved with the initial conditions w1 = 2 and w2 = 3.
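The recurrence, with w1 = 2 and w2 = 3, can be checked against a brute-force enumeration (Python sketch, ours):

```python
from itertools import product

def count_no_11(n):
    """w_n via w_n = w_{n-1} + w_{n-2}, with w_1 = 2, w_2 = 3."""
    if n == 1:
        return 2
    if n == 2:
        return 3
    prev2, prev1 = 2, 3
    for _ in range(3, n + 1):
        prev2, prev1 = prev1, prev1 + prev2
    return prev1

for n in range(1, 12):
    # count all length-n binary strings with no two consecutive 1s
    brute = sum(1 for bits in product("01", repeat=n)
                if "11" not in "".join(bits))
    assert count_no_11(n) == brute
```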
Example 2.17.4:
A circular disk is divided into n sectors. There are p different colors (of
paints) to color the sectors so that no two adjacent sectors get the same
color. We are interested in the number of ways of coloring the sectors.
(Figure: a circular disk divided into n sectors, numbered 1, 2, 3, . . . , n.)
Let un be the number of ways to color the disk in the required manner.
This number clearly depends upon both n and p. We form a recurrence
relation in n using the following reasoning as given in [56]. We construct two
mutually exclusive and exhaustive cases:
(i) The sectors 1 and 3 are colored differently. In this case, removing sector 2 gives a disk of n − 1 sectors, which can be colored in un−1 ways; for each such coloring, sector 2 can then be colored in p − 2 ways (excluding the two colors of sectors 1 and 3).
(ii) The sectors 1 and 3 are of the same color. In this case, removing sector
2 gives a disk of n−2 sectors as sectors 1 and 3 being of the same color
can be fused as one. Sector 2 can be colored using any of the p−1 colors
(i.e., excluding the color of sector 1 or sector 3). For each coloring of
sector 2, we can color the disk of n− 2 sectors in un−2 ways. Thus, we
have the following recurrence relation:
un = (p− 2)un−1 + (p− 1)un−2
with the initial conditions, u2 = p(p − 1) and u3 = p(p − 1)(p − 2).
The solution can be obtained as, un = (p − 1)[(p − 1)^{n−1} + (−1)^n].
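Both the recurrence and the closed form are easy to check numerically (Python sketch, ours):

```python
def colorings(n, p):
    """u_n via u_n = (p - 2) u_{n-1} + (p - 1) u_{n-2},
    with u_2 = p(p - 1) and u_3 = p(p - 1)(p - 2)."""
    u = {2: p * (p - 1), 3: p * (p - 1) * (p - 2)}
    for k in range(4, n + 1):
        u[k] = (p - 2) * u[k - 1] + (p - 1) * u[k - 2]
    return u[n]

# closed form: u_n = (p - 1)[(p - 1)^(n-1) + (-1)^n]
for p in range(3, 7):
    for n in range(2, 10):
        assert colorings(n, p) == (p - 1) * ((p - 1) ** (n - 1) + (-1) ** n)
```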
2.18 Homogeneous Recurrences
The recurrence relations in the above examples can be solved using the tech-
niques described in this section. In general, we can consider equations of the
form,
an = f(an−1, an−2, . . . , an−i), where n ≥ i
Here we have a recurrence with “finite history” as an depends on a fixed
number of earlier values. If an depends on all the previous values then the
recurrence is said to have a “full history”. In this section, we consider equations of the form,

c0 an + c1 an−1 + · · · + ck an−k = 0. (2.9)
The above equation is linear as it does not contain terms like an−i · an−j, (an−i)^2 and so on; it is homogeneous as the linear combination of the an−i is zero; it is with constant coefficients because the ci's are constants. For example, the
well-known Fibonacci Sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . is defined by
the linear homogeneous recurrence, fn = fn−1 + fn−2 , when n ≥ 2 with the
initial conditions, f0 = 0 and f1 = 1.
It is easy to see that if fn and gn are solutions to (2.9), then so is any linear combination pfn + qgn, where p and q are constants.
To solve (2.9), we try an = xn where x is an unknown constant. If an = xn
is used in (2.9) we should have,
c0 x^n + c1 x^{n−1} + · · · + ck x^{n−k} = 0.

The solution x = 0 is trivial; otherwise, we must have

p(x) ≡ c0 x^k + c1 x^{k−1} + · · · + ck = 0.
The equation p(x) = 0 is known as the characteristic equation which can be
solved for x.
Example 2.18.1:
Consider the Fibonacci Sequence as above. We can write the recurrence
relation as
fn − fn−1 − fn−2 = 0. (2.10)
The characteristic equation is x^2 − x − 1 = 0. The roots r1, r2 of this equation are given by r1 = (1 + √5)/2 and r2 = (1 − √5)/2.

So, the general solution of (2.10) is, fn = c1 r1^n + c2 r2^n.

Using the initial conditions f0 = 0 and f1 = 1 we get c1 + c2 = 0 and c1 r1 + c2 r2 = 1, which give c1 = 1/√5 and c2 = −1/√5.

Thus the nth Fibonacci number fn−1 is given by,

fn−1 = (1/√5)[(1 + √5)/2]^{n−1} − (1/√5)[(1 − √5)/2]^{n−1}
It is interesting to observe that the solution of the above recurrence, which has integer coefficients and integer initial values, involves irrational numbers.
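Despite the irrational numbers, the closed form returns integers; a floating-point check for moderate n (Python sketch, ours):

```python
def fib(n):
    """f_n from the recurrence f_n = f_{n-1} + f_{n-2}, f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def binet(n):
    """f_n from the closed form, rounded to the nearest integer."""
    sqrt5 = 5 ** 0.5
    r1, r2 = (1 + sqrt5) / 2, (1 - sqrt5) / 2
    return round((r1 ** n - r2 ** n) / sqrt5)

assert all(fib(n) == binet(n) for n in range(40))
```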
Example 2.18.2:
Consider the relation,
an − 6an−1 + 11an−2 − 6an−3 = 0, n > 2
with a0 = 1, a1 = 3 and a2 = 5.
From the given equation we directly write its characteristic equation as,
x3 − 6x2 + 11x− 6 = 0,
that is, (x− 1)(x− 2)(x− 3) = 0.
Thus the roots of the characteristic equation are 1, 2 and 3, and the solution should be of the form, an = A · 1^n + B · 2^n + C · 3^n.

Applying the initial conditions, we get,

a0 = 1 = A + B + C; a1 = 3 = A + 2B + 3C; and a2 = 5 = A + 4B + 9C

which yield, A = −2, B = 4 and C = −1.

Hence the solution is given by an = −2 + 4 · 2^n − 3^n.
We next deal with the case when the characteristic equation has multiple
roots. Let the characteristic polynomial p(x) be,
p(x) = c0 x^k + c1 x^{k−1} + · · · + ck.

Let r be a multiple root occurring two times; that is, (x − r)^2 is a factor of p(x). We can write,

p(x) = (x − r)^2 q(x), where q(x) is of degree (k − 2).

For every n ≥ k consider the nth degree polynomials,

un(x) = c0 x^n + c1 x^{n−1} + · · · + ck x^{n−k}

and vn(x) = c0 n x^n + c1 (n − 1) x^{n−1} + · · · + ck (n − k) x^{n−k}.

We note that, vn(x) = x·un′(x).

Now, un(x) = x^{n−k} p(x) = x^{n−k}(x − r)^2 q(x) = (x − r)^2 [x^{n−k} q(x)].

So, un′(r) = 0, which implies that vn(r) = r·un′(r) = 0, for all n ≥ k.

That is, c0 n r^n + c1 (n − 1) r^{n−1} + · · · + ck (n − k) r^{n−k} = 0.
From (2.9), we conclude that an = n r^n is also a solution to the given recurrence.

More generally, if the root r has multiplicity m, then an = r^n, an = n r^n, an = n^2 r^n, . . . , an = n^{m−1} r^n are all distinct solutions to the recurrence.
Example 2.18.3:
Consider the recurrence, an − 11an−1 + 39an−2 − 45an−3 = 0, when n > 3
with a0 = 0, a1 = 1 and a2 = 2.
We can write the characteristic equation as
x3 − 11x2 + 39x− 45 = 0,
the roots of which are 3, 3 and 5. Hence, the general solution can be written
as
an = (A+Bn)3n + C5n.
Using the initial conditions, we get a0 = A+C = 0; a1 = 3(A+B)+5C = 1;
and a2 = 9(A+ 2B) + 25C = 2. This gives, A = 1, B = 1 and C = −1.
Hence the required solution is given by,

an = (1 + n)3^n − 5^n.
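A quick numerical check (ours): iterating the recurrence with the given initial values reproduces the general solution with A = 1, B = 1, C = −1 (so the 5^n term carries a minus sign):

```python
def a_rec(n):
    """a_n from a_n = 11 a_{n-1} - 39 a_{n-2} + 45 a_{n-3}, a0=0, a1=1, a2=2."""
    vals = [0, 1, 2]
    for _ in range(3, n + 1):
        vals.append(11 * vals[-1] - 39 * vals[-2] + 45 * vals[-3])
    return vals[n]

# (A + Bn) 3^n + C 5^n with A = 1, B = 1, C = -1
assert all(a_rec(n) == (1 + n) * 3 ** n - 5 ** n for n in range(12))
```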
2.19 Inhomogeneous Equations
Inhomogeneous equations are more difficult to handle, and linear combinations of different solutions need not be a solution. We begin with a simple case:

c0 an + c1 an−1 + · · · + ck an−k = b^n p(n).

Here the left-hand side is the same as before; on the right-hand side, b is a constant and p(n) is a polynomial in n of degree d. The following example is illustrative.
Example 2.19.1:
Consider the recurrence relation,
an − 3an−1 = 5^n (2.11)

Multiplying by 5, we get 5an − 15an−1 = 5^{n+1}.

Replacing n by n − 1, we get 5an−1 − 15an−2 = 5^n (2.12)

Subtracting (2.12) from (2.11), an − 8an−1 + 15an−2 = 0 (2.13)

The corresponding characteristic polynomial is x^2 − 8x + 15 = (x − 3)(x − 5), and so we can write the general solution as,

an = A·3^n + B·5^n (2.14)

We can observe that 3^n and 5^n are not solutions to (2.11)! The reason is that (2.11) implies (2.13), but (2.13) does not imply (2.11), and hence they are not equivalent.

From the original equation we can write a1 = 3a0 + 5, where a0 is the initial condition.

From (2.14) we get, A + B = a0 and 3A + 5B = a1 = 3a0 + 5.

Therefore, we should have, A = a0 − 5/2 and B = 5/2.

Hence, an = [(2a0 − 5)3^n + 5^{n+1}]/2.
2.20 Repertoire Method
The repertoire method (illustrated in [60]) is one technique that works by “trial and error” and may be useful in certain cases. Consider for example,

an − n·an−1 + (n − 1)an−2 = (2 − n),

with the initial conditions a1 = 0 and a2 = 1. (The coefficient (n − 1) on an−2 makes the table below and the final solution consistent.) We “relax” the above recurrence by writing a general function, say f(n), on the right-hand side of the above equation. Thus, we consider the equation,

an − n·an−1 + (n − 1)an−2 = f(n).
We now try various possible candidate solutions for an, evaluate the left-hand side above, and check to see if it yields the required value for f(n); the required f(n) should be (2 − n). We tabulate the work as follows:

Row No. | Suggested an | Resulting f(n) | Resulting initial conditions
   1    |      1       |       0        | a1 = 1 and a2 = 1
   2    |     −1       |       0        | a1 = −1 and a2 = −1
   3    |      n       |    (2 − n)     | a1 = 1 and a2 = 2
We note that row 1 and row 2 are not linearly independent. We also note
that in row 3, we have an = n which gives the correct f(n) but the initial
conditions are not correct. We can subtract row 1 from row 3 which gives
an = (n − 1) with the correct initial conditions. Thus, an = (n − 1) is the
required solution.
Next we consider the following example.
an − (n − 1)an−1 + n·an−2 = (n − 1) for n > 1; a0 = a1 = 1.

We generalize this as,

an − (n − 1)an−1 + n·an−2 = f(n) for n > 1; a0 = a1 = 1.

As before, we try various possibilities for an and look for the resulting f(n) to get a “repertoire” of recurrences. We summarize the work in the following table:

Row No. | Suggested an | Resulting f(n) | Resulting initial conditions
   1    |      1       |       2        | a0 = a1 = 1
   2    |      n       |     n − 1      | a0 = 0 and a1 = 1
   3    |      n^2     |     n + 1      | a0 = 0 and a1 = 1
From row 1 and row 3, by subtraction we find that an = (n2 − 1) is a
solution because it results in f(n) = (n − 1); however it gives the initial
conditions a0 = −1 and a1 = 0. This solution and the solution in the second
row are linearly independent. We combine these to give an = (n − n2 + 1)
which gives the right initial conditions.
2.21 Perturbation Method
The Perturbation Method is another technique to approximate the solution
to a recurrence. The method is nicely illustrated in the following example
given in [60]. Consider the following recurrence:
    a_0 = 1; a_1 = 2; and a_{n+1} = 2 a_n + a_{n−1}/n^2, n ≥ 1.        (2.15)
In the perturbation step, we note that the last term carries the factor 1/n^2; we therefore reason that it contributes little to the recurrence. Dropping it gives, approximately,

    a_{n+1} = 2 a_n.
This yields a_n = 2^n, which is an approximate solution. To correct the error involved, we consider the exact recurrence

    b_{n+1} = 2 b_n, n ≥ 0, with b_0 = 1.
Then b_n = 2^n, which is exact. We now compare the two quantities a_n and b_n. Let

    P_n = a_n / b_n = a_n / 2^n, so that a_n = 2^n P_n.

Using the last relation in (2.15) we get

    2^{n+1} P_{n+1} = 2 · 2^n P_n + 2^{n−1} P_{n−1}/n^2, n ≥ 1.
Therefore

    P_{n+1} = P_n + (1/(4n^2)) P_{n−1}, n ≥ 1, with P_0 = P_1 = 1.

Clearly the P_n are increasing. We see that

    P_{n+1} ≤ P_n (1 + 1/(4n^2)), n ≥ 1,
so that

    P_{n+1} ≤ ∏_{k=1}^{n} (1 + 1/(4k^2)).
The infinite product α_0 corresponding to the right-hand side above converges monotonically to

    α_0 = ∏_{k=1}^{∞} (1 + 1/(4k^2)) = 1.46505 . . .
Thus P_n is bounded above by α_0 and, as it is increasing, it must converge to a constant. We have thus proved that a_n ∼ α · 2^n for some constant α ≤ α_0 = 1.46505 . . . .
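The limiting constant can be estimated numerically. The sketch below is our own illustration of the recurrence for P_n (helper name is ours):

```python
# Numerical sketch (not from [60]): iterate P_{n+1} = P_n + P_{n-1}/(4 n^2)
# with P_0 = P_1 = 1 and observe that the values increase yet stay
# below the infinite-product bound 1.46506.
def P_sequence(N):
    P = [1.0, 1.0]                       # P_0 = P_1 = 1
    for n in range(1, N):
        P.append(P[n] + P[n - 1] / (4.0 * n * n))
    return P

P = P_sequence(1000)
```

The iterates settle quickly, consistent with a_n ∼ α · 2^n for a constant α below the bound.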
2.22 Solving Recurrences using Generating
Functions
We next consider a general technique to solve linear recurrence relations. This
is based on the idea of generating functions. We begin with an introduction to
generating functions. An infinite sequence a0, a1, a2, . . . can be represented
as a power series in an auxiliary variable z. We write,
A(z) = a0 + a1z + a2z2 + . . . =
∑
k≥0
akzk . . . (2.16)
When it exists, A(z) is called the generating function of the sequence
a0, a1, a2, . . ..
It may be noted that the theory of infinite series says that:
(i) If the series converges for a particular value z_0 of z, then it converges for all values of z with |z| < |z_0|.

(ii) The series converges for some z ≠ 0 if and only if the sequence |a_n|^{1/n} is bounded. (If this condition is not satisfied, it may be that the sequence a_n/n! satisfies it.)
In practice, the convergence of the series may simply be assumed. When
a solution is discovered by any means, however sloppy, it may be justified
independently say, by using induction.
We use the notation [zn]A(z) to denote the coefficient of zn in A(z); that
is, an = [zn]A(z). The following are easy to check:
1. If the given infinite sequence is 1, 1, 1, . . . then it follows that the corre-
sponding generating function A(z) is given by,
A(z) = 1 + z + z2 + z3 + . . . = 1/(1− z).
2. If the given infinite sequence is 1, 1/1!, 1/2!, 1/3!, 1/4!, . . . then
A(z) = 1 + z/1! + z2/2! + z3/3! + z4/4! + . . . = ez.
3. If the given infinite sequence is 0, 1, 1/2, 1/3, 1/4, . . . then,
A(z) = z + z2/2 + z3/3 + z4/4 + . . . = ln(1/(1− z)).
2.22.1 Convolution
Let A(z) and B(z) be the generating functions of the sequences a_0, a_1, a_2, . . . and b_0, b_1, b_2, . . . respectively. The product A(z)B(z) is the series

    (a_0 + a_1 z + a_2 z^2 + . . .) × (b_0 + b_1 z + b_2 z^2 + . . .)

    = a_0 b_0 + (a_0 b_1 + a_1 b_0) z + (a_0 b_2 + a_1 b_1 + a_2 b_0) z^2 + . . .
It is easily seen that [z^n] A(z)B(z) = ∑_{k=0}^{n} a_k b_{n−k}. Therefore, if we wish to evaluate any sum of the general form

    c_n = ∑_{k=0}^{n} a_k b_{n−k}        (2.17)

and if the generating functions A(z) and B(z) are known, then we have c_n = [z^n] A(z)B(z).
The sequence cn is called the convolution of the sequences an and
bn. In short, we say that the convolution of two sequences corresponds to
the product of the respective generating functions. We illustrate this with an
example.
From the Binomial Theorem, we know that (1 + z)^r is the generating function of the sequence C(r, 0), C(r, 1), C(r, 2), . . . . Thus we have

    (1 + z)^r = ∑_{k≥0} C(r, k) z^k   and   (1 + z)^s = ∑_{k≥0} C(s, k) z^k.
By multiplication, we get (1 + z)^r (1 + z)^s = (1 + z)^{r+s}.
Equating the coefficients of z^n on both sides, we get

    ∑_{k=0}^{n} C(r, k) C(s, n−k) = C(r + s, n),

which is the well-known Vandermonde convolution.
When b_k = 1 for all k = 0, 1, 2, . . . , then from (2.17) we get c_n = ∑_{k=0}^{n} a_k. Thus, convolution of a given sequence with the sequence 1, 1, 1, . . . gives "a sequence of sums". The following examples are illustrative of this fact.
Example 2.22.1:
By taking the convolution of the sequence 1, 1, 1, . . . (whose generating function is 1/(1 − z)) with itself, we can immediately deduce that 1/(1 − z)^2 is the generating function of the sequence 1, 2, 3, 4, 5, . . . .
Example 2.22.2:
We can easily see that 1/(1 + z) is the generating function for the sequence 1, −1, 1, −1, . . . . Therefore 1/[(1 + z)(1 − z)], that is 1/(1 − z^2), is the generating function of the sequence 1, 0, 1, 0, . . . , which is the convolution of the sequences 1, 1, 1, . . . and 1, −1, 1, −1, . . . .
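These coefficient manipulations are easy to check mechanically. The sketch below is ours (helper names are assumptions): it convolves two coefficient lists and reproduces the Vandermonde convolution for small r and s, and the "sequence of sums" effect of convolving with 1, 1, 1, . . . :

```python
from math import comb

# Convolution of coefficient sequences corresponds to multiplying the
# generating functions (a sketch, not from the text).
def convolve(a, b):
    """Return c with c[n] = sum_{k=0}^{n} a[k] * b[n-k]."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[i - k] for k in range(i + 1)) for i in range(n)]

r, s, N = 5, 7, 10
A = [comb(r, k) for k in range(N)]       # coefficients of (1+z)^r
B = [comb(s, k) for k in range(N)]       # coefficients of (1+z)^s
C = convolve(A, B)                       # should match (1+z)^(r+s)

# Convolving with 1, 1, 1, ... yields partial sums.
sums = convolve([1, 2, 3, 4, 5], [1] * 5)
```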
2.23 Some simple manipulations
In the following, we assume F (z) and G(z) to be the respective generating
functions of the infinite sequences fn and gn.
1. For constants u and v we have

    u F(z) + v G(z) = u ∑_{n≥0} f_n z^n + v ∑_{n≥0} g_n z^n = ∑_{n≥0} (u f_n + v g_n) z^n,

which is therefore the generating function of the sequence u f_n + v g_n.
2. Replacing z by cz, where c is a constant, we get

    G(cz) = ∑_{n≥0} g_n (cz)^n = ∑_{n≥0} c^n g_n z^n,

which is the generating function for the sequence c^n g_n. Thus 1/(1 − cz) is the generating function of the sequence 1, c, c^2, c^3, . . . .
3. Given G(z) in a closed form, we can compute G′(z). Term-by-term differentiation (when possible) of the infinite sum of G(z) yields

    G′(z) = g_1 + 2 g_2 z + 3 g_3 z^2 + 4 g_4 z^3 + . . .

Thus G′(z) represents the infinite sequence g_1, 2g_2, 3g_3, 4g_4, . . . , i.e., (n+1) g_{n+1}. Thus, with a shift, we have "brought down a factor of n" into the terms of the original sequence g_n. Equivalently, z G′(z) is the generating function for n g_n.
4. By term-by-term integration (when possible) of the infinite sum of G(z), we have

    ∫_0^z G(t) dt = g_0 z + (1/2) g_1 z^2 + (1/3) g_2 z^3 + (1/4) g_3 z^4 + . . . = ∑_{n≥1} (1/n) g_{n−1} z^n.

Thus by integrating G(z) we get the generating function for the sequence g_{n−1}/n.
Example 2.23.1:
It is required to find the generating function of the sequence 1^2, 2^2, 3^2, . . . .

We have seen above that 1/(1 − x)^2 is the generating function of the sequence 1, 2, 3, . . . .

By differentiation, we can see that 2/(1 − x)^3 is the generating function of the sequence 2·1, 3·2, 4·3, . . . . In this sequence, the term with index k is (k+2)(k+1), which can be written as (k+1)^2 + (k+1). We want the sequence a_k where a_k = (k+1)^2. By subtracting the generating function of the sequence 1, 2, 3, . . . from that of the sequence 2·1, 3·2, 4·3, . . . , we get the required answer as [2/(1 − x)^3] − [1/(1 − x)^2].
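One can confirm the coefficient claim directly, using [x^k] 1/(1−x)^m = C(k+m−1, m−1). The short check below is our own sketch (helper name is ours):

```python
from math import comb

# [x^k] of 2/(1-x)^3 - 1/(1-x)^2 should be (k+1)^2, i.e. 1, 4, 9, 16, ...
# Here we use [x^k] 1/(1-x)^3 = C(k+2, 2) and [x^k] 1/(1-x)^2 = k+1.
def coeff(k):
    return 2 * comb(k + 2, 2) - (k + 1)

squares = [coeff(k) for k in range(8)]
```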
2.23.1 Solution of recurrence relations
Consider the following simultaneous recurrence relations,
    a_n + 2 a_{n−1} + 4 b_{n−1} = 0        (2.18)

and

    b_n − 4 a_{n−1} − 6 b_{n−1} = 0        (2.19)
with a_0 = 1 and b_0 = 0. We multiply both sides of equation (2.18) by z^n and sum over n ≥ 1. This gives

    ∑_{n≥1} a_n z^n + 2z ∑_{n≥1} a_{n−1} z^{n−1} + 4z ∑_{n≥1} b_{n−1} z^{n−1} = 0,

or

    ∑_{n≥1} a_n z^n + 2z ∑_{n≥0} a_n z^n + 4z ∑_{n≥0} b_n z^n = 0.
The last equation can be written as,
G(z)− 1 + 2zG(z) + 4zH(z) = 0 (2.20)
where the initial condition a0 = 1 has been used and it is assumed that G(z)
and H(z) are the generating functions of the sequences a0, a1, a2, . . . and
b0, b1, b2, . . . respectively. In a similar manner, from (2.19) we can obtain,
H(z)− 4zG(z)− 6zH(z) = 0 (2.21)
Equations (2.20) and (2.21) can be solved to yield,
    G(z) = (1 − 6z)/(1 − 2z)^2   and   H(z) = 4z/(1 − 2z)^2.
From these closed forms it is easy to obtain

    [z^n] G(z) = a_n = 2^n (1 − 2n)   and   [z^n] H(z) = b_n = n · 2^{n+1}.
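A quick cross-check (our own sketch, not from the text) iterates the coupled recurrences (2.18)-(2.19) and compares with these closed forms:

```python
# Iterate a_n = -2 a_{n-1} - 4 b_{n-1} and b_n = 4 a_{n-1} + 6 b_{n-1},
# starting from a_0 = 1, b_0 = 0, and compare with the closed forms
# a_n = 2^n (1 - 2n) and b_n = n * 2^(n+1).
a, b = [1], [0]                          # a_0 = 1, b_0 = 0
for n in range(1, 15):
    a.append(-2 * a[n - 1] - 4 * b[n - 1])
    b.append(4 * a[n - 1] + 6 * b[n - 1])
```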
Next we consider the following recurrence relation.
    a_{n+2} − 3 a_{n+1} + 2 a_n = n, with a_0 = a_1 = 1.
We multiply both sides of the above equation by z^{n+2} and sum over all n, obtaining

    ∑_{n≥0} a_{n+2} z^{n+2} − 3z ∑_{n≥0} a_{n+1} z^{n+1} + 2z^2 ∑_{n≥0} a_n z^n = ∑_{n≥0} n z^{n+2}.
If we define G(z) = ∑_{n≥0} a_n z^n, then the above equation can be written as

    (G(z) − z − 1) − 3z (G(z) − 1) + 2z^2 G(z) = z^3/(1 − z)^2.
Note that in obtaining the above we have used the initial conditions a_0 = a_1 = 1 and the fact that the infinite sequence 0, 0, 0, 1, 2, 3, . . . has the generating function z^3/(1 − z)^2. From the above, we get G(z) as

    G(z) = z^3/[(1 − z)^2 (1 − 3z + 2z^2)] + (1 − 2z)/(1 − 3z + 2z^2).
To get [z^n] G(z), we first express the right-hand side of G(z) above in partial fractions. We get

    G(z) = 1/(1 − 2z) + 1/(1 − z)^2 − 1/(1 − z)^3.

Using the infinite series expansions of the terms on the right, it is easy to check that

    [z^n] G(z) = a_n = 2^n − (n^2 + n)/2.
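The closed form is easy to verify against the recurrence itself; the check below is our own sketch:

```python
# Check (a sketch, not from the text) that a_n = 2^n - (n^2 + n)/2
# satisfies a_{n+2} - 3 a_{n+1} + 2 a_n = n with a_0 = a_1 = 1.
def a(n):
    return 2 ** n - (n * n + n) // 2     # (n^2 + n)/2 is always an integer

assert a(0) == 1 and a(1) == 1           # initial conditions
for n in range(40):
    assert a(n + 2) - 3 * a(n + 1) + 2 * a(n) == n
```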
We note that we can describe a sequence by a recurrence relation and then
try to use the recurrence relation (as in the above examples) to obtain an
equation in the associated generating function. The following exercise illustrates that, in general, we may end up with a differential equation involving the generating function.
Example 2.23.2:
Consider the following sequence:
    a_0 = 1, a_1 = 1/2, a_2 = (1 · 3)/(2 · 4), a_3 = (1 · 3 · 5)/(2 · 4 · 6), a_4 = (1 · 3 · 5 · 7)/(2 · 4 · 6 · 8), . . .
Obtain the generating function for the sequence an.
2.23.2 Some common tricks
We now illustrate some common manipulations in dealing with certain re-
currences.
1. Consider the recurrence a_n = n a_{n−1} + n(n−1) a_{n−2}, where n > 1, with a_1 = 1 and a_0 = 0. Dividing both sides by n! gives the Fibonacci relation in a_n/n!, which can be readily solved with the given initial conditions.
2. Consider the non-linear recurrence a_n = √(a_{n−1} a_{n−2}), n > 1, with a_1 = 2 and a_0 = 1. We take logarithms to base 2 on both sides and set b_n = log_2 a_n. This gives

    b_n = (b_{n−1} + b_{n−2})/2, n > 1, with b_1 = 1 and b_0 = 0,

which is a linear recurrence with constant coefficients.
3. A slightly more tricky case is the following recurrence:
    a_n = 1/(1 + a_{n−1}), n > 0, with a_0 = 1.

The first few iterations give a_0 = 1, a_1 = 1/2, a_2 = 2/3, a_3 = 3/5, a_4 = 5/8, etc. We can easily recognize the Fibonacci numbers in these ratios. This suggests the substitution a_n = b_{n−1}/b_n, which yields the equation

    b_{n−1}/b_n = 1/(1 + b_{n−2}/b_{n−1}), n > 0, with b_1 = 1 and b_0 = 1,

which reduces to b_n = b_{n−1} + b_{n−2} and can be easily solved.
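The Fibonacci pattern in the iterates can be checked with exact rational arithmetic. The sketch below is ours, using Python's Fraction type:

```python
from fractions import Fraction

# Iterating a_n = 1/(1 + a_{n-1}) from a_0 = 1 yields ratios of
# consecutive Fibonacci numbers: 1/1, 1/2, 2/3, 3/5, 5/8, ...
fib = [1, 1]                             # F_1, F_2
for _ in range(12):
    fib.append(fib[-1] + fib[-2])

a = Fraction(1)                          # a_0 = 1
ratios = [a]
for _ in range(10):
    a = 1 / (1 + a)
    ratios.append(a)
```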
2.24 Illustrative Problems
We end this chapter with two illustrative problems.
Problem 2.24.1 (Counting binary trees):
We consider this problem as given in [49].
We recursively define a binary tree as: a binary tree is empty (having no
vertex) or it consists of a distinguished vertex, called the root, together with
an ordered pair of binary trees called the left subtree and the right subtree.
It can be noted that this definition admits a binary tree to have only a left
subtree or only a right subtree. The problem is to count the number bn of
binary trees with exactly n vertices. By exhaustive listing we find b0 = 1,
b1 = 1, b2 = 2 and b3 = 5. Let B(z) be the generating function for the
sequence b0, b1, b2, b3, . . ..
By definition, B(z) = b_0 + b_1 z + b_2 z^2 + b_3 z^3 + . . . .
For n ≥ 1 the number of binary trees with n vertices can be enumerated
as the number of ordered pairs of the form (B1, B2) where B1 and B2 are
binary trees that together have exactly (n − 1) vertices i.e., if B1 has k
vertices then B2 will have (n − k − 1) vertices where k can take the values
0, 1, 2, 3, . . . , (n− 1). Therefore the number bn of such ordered pairs is given
by

    b_n = b_0 b_{n−1} + b_1 b_{n−2} + · · · + b_{n−1} b_0, n ≥ 1.
The right side above can be recognized as the coefficient of z^{n−1} in the product B(z) · B(z) = B(z)^2. Hence b_n should be the coefficient of z^n in z B(z)^2. We note that z B(z)^2 is the generating function for the sequence b_1, b_2, b_3, . . . , which is the same as the sequence generated by B(z) except that B(z) also generates b_0 = 1. We thus have
    B(z) = 1 + z B(z)^2.
If z is a real number for which the power series B(z) converges, then B(z) is also a real number; the above quadratic can then be solved to give

    B(z) = [1 + √(1 − 4z)] / (2z)   or   [1 − √(1 − 4z)] / (2z),

yielding two possibilities. We thus have

    z B(z) = [1 + √(1 − 4z)] / 2   or   [1 − √(1 − 4z)] / 2.
Differentiating the right-side of the above equation with respect to z and
letting z → 0, we see that the first expression tends to −1 whereas the
second expression tends to 1. Now,
    z B(z) = b_0 z + b_1 z^2 + b_2 z^3 + · · · ,
and its derivative with respect to z is b_0 + 2 b_1 z + 3 b_2 z^2 + · · · , which tends to b_0 = 1 as z → 0. Therefore we must have
zB(z) =1− (1− 4z)1/2
2or B(z) =
1− (1− 4z)1/2
2z
By expanding (1 − 4z)^{1/2} as an infinite series we can get the expansion for B(z), and we obtain

    b_n = [z^n] B(z) = (1/(n+1)) C(2n, n).
The numbers bn are known as Catalan numbers.
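The convolution recurrence and the closed form can be compared directly; the sketch below is our own illustration:

```python
from math import comb

# Compute b_n two ways: via the recurrence b_n = sum_k b_k b_{n-1-k}
# (counting left/right subtree splits) and via the Catalan closed form
# C(2n, n)/(n + 1).
N = 12
b = [1]                                  # b_0 = 1 (the empty tree)
for n in range(1, N):
    b.append(sum(b[k] * b[n - 1 - k] for k in range(n)))

catalan = [comb(2 * n, n) // (n + 1) for n in range(N)]
```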
Problem 2.24.2:
Average-case analysis of a simple algorithm to find the maximum in a list.
This problem is to do an average-case analysis of a trivial algorithm.
Let X[1 . . . n] be an array of n distinct positive real numbers. The following pseudo-code FindMax returns in the variable "max" the largest element in the array X[1 . . . n]:

    max := -1;
    for i := 1 to n do
        if max < X[i] then max := X[i];

The problem is to find the average-case time complexity of the above code segment. This essentially involves finding the average number of times the
assignment "max := X[i]" is executed. FindMax can be written as the following assembly-language-like code using a reduced set of pseudo-code statements:

    max := -1;
    i := 0;
    1: i := i + 1;
       if i > n then goto 2;
       if max ≥ X[i] then goto 1;
       max := X[i];
       goto 1;
    2: - - -

Thus, on a sequential computer, a compiled form of FindMax will execute:
(i) a fixed number of assignments to initialize the variables max and i
(ii) (n+1) comparisons of the form “ i > n? ”
(iii) (n+1) increments of the index i
(iv) n comparisons of the form “max ≥ X[i]?”
(v) a variable number (between 1 and n ) of assignments “max := X[i]”.
We can thus conclude that the running time t_maxf(n) of FindMax has to be of the form

    t_maxf(n) = c_0 + c_1 n + c_2 · EXCH[X],
where EXCH[X] is the number of times the instruction "max := X[i]" is executed (i.e., the number of "exchanges" that take place) and c_0, c_1 and c_2 are implementation constants dependent on the machine where the code runs. We note that EXCH[X] is 1 if X[1] is the largest element. Also, EXCH[X] takes its maximum value n when the array X is already sorted in increasing order.
To get an estimate of the expected value of EXCH[X] we introduce the
“permutation model”. In this model we will assume that the array X is a
permutation of the integers (1, . . . , n). Then each permutation can occur (be
an input to FindMax) with equal probability 1/n!. Let sn,k be the num-
ber of those permutations wherein EXCH[X] is k. Then, if pn,k denotes the
probability that EXCH[X] = k, we have

    p_{n,k} = s_{n,k}/n!.
The expected value exch[X] of EXCH[X] is given by

    exch[X] = ∑_{k=1}^{n} k s_{n,k}/n! = (1/n!) ∑_{k=1}^{n} k s_{n,k}.        (2.22)
To get the sum on the right-side above, we consider all those permutations
σ1σ2σ3 . . . σn of (1, . . . , n) wherein the value EXCH[X] is exactly k (by defini-
tion, there are exactly sn,k of these). With respect to these types of permu-
tations, we reason that the following two cases can occur:
(a) the last element σn is equal to n: in this case σ1σ2 . . . σn−1 should have
produced exactly k − 1 exchanges because the last element being the
largest will surely force one more exchange. Thus the number of permu-
tations in this case is sn−1,k−1.
(b) the last element σ_n is not equal to n: in this case σ_n is one of 1, 2, 3, . . . , (n−1), and n occurs among σ_1 σ_2 . . . σ_{n−1}. Then σ_1 σ_2 . . . σ_{n−1} should have produced exactly k exchanges, because the last element, being less than the maximum already seen, cannot force an exchange. In this case the number of permutations is (n−1) s_{n−1,k}.
Thus we have,
    s_{n,k} = s_{n−1,k−1} + (n−1) s_{n−1,k}.        (2.23)
We introduce the following generating function S_n(x) for each n:

    S_n(x) = ∑_{k=1}^{n} s_{n,k} x^k.        (2.24)
Multiplying both sides of (2.23) by x^k, summing over k from 1 through n, and using the definition (2.24), we get

    S_n(x) = x S_{n−1}(x) + (n−1) S_{n−1}(x) = (x + n − 1) S_{n−1}(x).        (2.25)
From the definition (2.24) we find S1(x) = x. Then from (2.25) we get
S2(x) = x(x + 1) etc. In general we find that the explicit form of Sn(x) is
given by
    S_n(x) = ∏_{j=0}^{n−1} (x + j).        (2.26)
From (2.24) we get

    S′_n(x) = ∑_{k=1}^{n} k s_{n,k} x^{k−1}, which gives S′_n(1) = ∑_{k=1}^{n} k s_{n,k}.        (2.27)
Also from (2.26) we get

    S_n(1) = ∏_{j=0}^{n−1} (1 + j) = n!.        (2.28)
From (2.22) we have

    exch[X] = (1/n!) ∑_{k=1}^{n} k s_{n,k} = S′_n(1)/S_n(1).        (2.29)

The second equality above uses (2.27) and (2.28). Taking the natural logarithm on both sides of (2.26) and differentiating gives
    S′_n(x)/S_n(x) = 1/x + 1/(x + 1) + · · · + 1/(x + n − 1).
Substituting x = 1 yields S′_n(1)/S_n(1) = H_n, the nth Harmonic number, which is the value of exch[X]. Hence the average time FindMax takes under the permutation model is

    t_maxf^{AVG}(n) = c_0 + c_1 n + c_2 H_n.
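The permutation model can also be simulated exhaustively for small n. The code below is our own sketch (function names are ours); it confirms that the average number of exchanges equals H_n:

```python
from itertools import permutations
from fractions import Fraction

def exchanges(perm):
    """Count how often 'max := X[i]' fires when scanning perm."""
    mx, count = 0, 0
    for x in perm:
        if mx < x:
            mx, count = x, count + 1
    return count

def average_exchanges(n):
    """Exact average of EXCH over all n! permutations of (1, ..., n)."""
    total = sum(exchanges(p) for p in permutations(range(1, n + 1)))
    fact = 1
    for k in range(2, n + 1):
        fact *= k
    return Fraction(total, fact)

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))
```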
Exercises
1. Let A = {a_1, a_2, a_3, a_4, a_5} be a set of five integers. Show that for any permutation a_{π1}, a_{π2}, a_{π3}, a_{π4}, a_{π5} of A, the product

    (a_1 − a_{π1})(a_2 − a_{π2}) · · · (a_5 − a_{π5})
is always divisible by 2.
2. This problem concerns one instance of what is known as Langford’s
Problem. A 27-digit sequence includes the digits 1 through 9, three
times each. There is just one digit between the first two 1’s and between
the last two 1’s. There are just two digits between the first two 2’s and
between the last two 2’s and so on. The problem asks to find all such
sequences. The naïve method is:
Step 1. Generate the first/next sequence of 27 digits using the digits 1 through 9, each digit occurring exactly three times.
Step 2. Check if the current sequence satisfies the given constraint.
Step 3. Output the current sequence as a solution if it satisfies the
constraint.
Step 4. If all sequences have been generated then stop; else go to Step
1.
How many sequences are examined in the above method?
Bonus: How is it possible to do better?
3. How many ways are there to choose three or more people from a set of
eleven people?
4. Three boys and four girls are to sit on a bench. The boys must sit
together and the girls must sit together. In how many ways can this
be done?
5. An urn contains 5 red marbles, 2 blue marbles and 5 green marbles.
Assume that marbles of the same color are indistinguishable. How
many different sequences of marbles of length four can be chosen?
6. Let X ⊆ {1, 2, 3, 4, . . . , 2n − 1} and let |X| ≥ n + 1. Prove the following result (due to P. Erdős):

"There are two numbers a, b ∈ X, with a < b, such that a divides b."

In the above problem, if we prescribe |X| = n, will the result still be true?
7. Find the approximate value (correct to three decimal places) of (1.02)10.
(Hint: Use the Binomial Theorem.)
8. Without using the general formula for φ(n) above, reason that when n = p^α (α a natural number) is a prime power, φ(n) can be expressed as p^α (1 − 1/p).
9. If m and n are relatively prime, then argue that φ(mn) = φ(m)φ(n).
10. For an arbitrary natural number n, prove that ∑_{d|n} φ(d) = n (the sum is over all natural numbers d dividing n).
11. If p is a prime and n is a natural number, then argue that, φ(pn) =
pφ(n) if p divides n and φ(pn) = (p− 1) · φ(n), otherwise.
12. Argue that the number of partitions of a number n into exactly m
terms is equal to the number of partitions of n−m into no more than
m terms.
13. Find the number of ways in which eight rooks may be placed on a
conventional 8× 8 chessboard so that no rook can attack another and
the white diagonal is free of rooks.
14. What is the number An of ways of going up a staircase with n steps if
we are allowed to take one or two steps at a time?
15. Consider the evaluation of an n × n determinant by the usual method of expansion by cofactors. If f_n is the number of multiplications required, then argue that f_n satisfies the recurrence

    f_n = n(f_{n−1} + 1), n > 1, f_1 = 0.

Solve the above recurrence and hence show that f_n ≤ e · n!, for all n > 1.
16. Without using induction, prove that

    ∑_{i=1}^{n} i(i+1)(i+2) = (1/4) n(n+1)(n+2)(n+3).
17. This problem is due to H. Larson (1977). Consider the infinite sequence a_n with a_1 = 1, a_5 = 5, a_{12} = 144 and a_n + a_{n+3} = 2 a_{n+2}. Prove that a_n is the nth Fibonacci number.
18. Solve the recurrence relation
    a_n = 7 a_{n−1} − 13 a_{n−2} − 3 a_{n−3} + 18 a_{n−4},

where a_0 = 5, a_1 = 3, a_2 = 6, a_3 = −21.

[Solution: The characteristic equation is (x + 1)(x − 2)(x − 3)^2 = 0; a_n = 2(−1)^n + 2^n + 2 · 3^n − n · 3^n, n ≥ 0.]
19. Consider the following recurrence:
    a_n − 6 a_{n−1} − 7 a_{n−2} = 0, n ≥ 5,

with a_3 = 344, a_4 = 2400. Show that a_n = 7^n + (−1)^{n+1}, n ≥ 3.
20. Consider the recurrence relation

    a_n = 6 a_{n−1} − 9 a_{n−2},

with a_0 = 2 and a_1 = 3. Show that the associated generating function g(x) is given by

    g(x) = (2 − 9x)/(1 − 6x + 9x^2).

Hence obtain a_n.

Solution: a_n = (2 − n) · 3^n, n ≥ 0.
21. Let N be the number of strings of length n made up of the letters x, y and z, where z occurs an even number of times. Show that N = (1/2)(3^n + 1).
22. How many sequences of length n can be composed from a, b, c, d in
such a way that a and b are never neighboring elements?
(Hint: Let xn = number of such sequences that start from a or b;
let yn = number of such sequences that start from c or d; form two
recurrences involving xn and yn.)
Chapter 3
Basics of Number Theory
3.1 Introduction
In this chapter, we present some basics of number theory. These include
divisibility, primes, congruences, some number-theoretic functions and the
Euclidean algorithm for finding the gcd of two numbers. We also explain
the big O notation and polynomial–time algorithms. We show, as examples,
that the Euclidean algorithm and the modular exponentiation algorithm are
polynomial–time algorithms. We denote by Z the set of integers and by N the set of positive integers.
3.2 Divisibility
Definition 3.2.1:
Let a and b be any two integers and a 6= 0. Then b is divisible by a (equiva-
lently, a is a divisor of b) if there exists an integer c such that b = ac.
If a divides b, it is denoted by a | b. If a does not divide b, we denote it
by a ∤ b.
Theorem 3.2.2: (i) a | b implies that a | bc for any integer c.
(ii) If a | b and b | c, then a | c.
(iii) If a divides b1, b2, . . . , bn, then a divides b1x1 + · · · + bnxn for integers
x1, . . . , xn.
(iv) a | b and b | a imply that a = ±b.
The proofs of these results are trivial and are left as exercises. (For
instance, to prove (iii), we note that if a divides b1, . . . , bn, there exist integers
c1, c2, . . . , cn such that b1 = ac1, b2 = ac2, . . . , bn = acn. Hence b1x1 + · · ·+
b_n x_n = a(c_1 x_1 + · · · + c_n x_n), and so a divides b_1 x_1 + · · · + b_n x_n.)
Theorem 3.2.3 (The Division Algorithm):
Given any integers a and b with a 6= 0, there exist unique integers q and r
such that
b = qa+ r, 0 ≤ r < |a|
Proof. Consider the arithmetic progression
. . . b− 3|a|, b− 2|a|, b− |a|, b, b+ |a|, b+ 2|a|, b+ 3|a|, . . .
with common difference |a| and extended infinitely in both the directions.
Certainly, this sequence contains a least non-negative integer r. Let this
term be b+ q|a|, q ∈ Z. Thus
b+ q|a| = r (3.1)
and its previous term, namely r − |a|, is negative, so that r < |a|. If a > 0, then (3.1) gives

    b = −qa + r, 0 ≤ r < |a| = a,

while if a < 0, |a| = −a, so that b = qa + r, 0 ≤ r < |a|. It is clear that the numbers q and r are unique.
Theorem 3.2.3 gives the division algorithm, that is, the process by means
of which division of one integer by a nonzero integer is made. An algorithm is
a step by step procedure to solve a given mathematical problem in finite time.
We next present Euclid's algorithm to determine the gcd of two numbers a and b. Euclid's algorithm (c. 300 B.C.) is the first known algorithm in the mathematical literature. It is just the usual algorithm taught in high-school algebra.
3.3 The Greatest Common Divisor (gcd) and
the Least Common Multiple (lcm) of two
integers
Definition 3.3.1:
Let a and b be two integers, at least one of which is not zero. A common
divisor of a and b is an integer c (≠ 0) such that c | a and c | b. The greatest
common divisor of a and b is the greatest of the common divisors of a and b.
It is denoted by (a, b).
If c divides a and b, then so does −c. Hence (a, b) > 0 and is uniquely
defined. Moreover, if c is a common divisor of a and b, that is, if c | a and
c | b, then
a = a′c and b = b′c
for integers a′ and b′. Hence
(a, b) = c(a′, b′)
so that c | (a, b). Thus any common divisor of a and b divides the gcd of a and b; that is, (a, b) is the common divisor of a and b which is divisible by every common divisor of a and b. Moreover, (a, b) = (±a, ±b).
Proposition 3.3.2:
If c | ab and (c, b) = 1, then c | a.
Proof. By hypothesis, c | ab. Trivially, c | ac. Hence c is a common divisor of ab and ac, and so c divides (ab, ac) = a(b, c) = a, as (b, c) = 1.
Definition 3.3.3:
If a, b and c are nonzero integers and if a | c and b | c, then c is called a
common multiple of a and b.
The least common multiple (lcm) of a and b is the smallest of the positive
common multiples of a and b and is denoted by [a, b]. As in the case of gcd,
[a, b] = [±a, ±b].
Euclid’s Algorithm
Since (±a, ±b) = (a, b), we may assume without loss of generality that a > 0
and b > 0 and that a > b (If a = b, then (a, b) = (a, a) = a). By Division
Algorithm (Theorem 3.2.3), there exist integers q1 and r1 such that
a = q1b+ r1, 0 ≤ r1 < b. (3.2)
Next divide b by r1 and get
b = q2r1 + r2, 0 ≤ r2 < r1. (3.3)
Next divide r1 by r2 and get
r1 = q3r2 + r3, 0 ≤ r3 < r2. (3.4)
At the (i+ 2)-th stage, we get the equation
ri = qi+2ri+1 + ri+2, 0 ≤ ri+2 < ri+1. (3.5)
Since the sequence of remainders r1, r2, . . . is strictly decreasing, this proce-
dure must stop at some stage, say,
rj−1 = qj+1rj. (3.6)
Then (a, b) = rj.
Proof. First we show that rj is a common divisor of a and b. To see this
we observe from equation (3.6) that rj | rj−1. Now the equation (3.5) for
i = j − 2 is
rj−2 = qjrj−1 + rj. (3.7)
Since r_j | r_{j−1}, r_j divides the expression on the right side of (3.7), and so r_j | r_{j−2}. Going backward, we get successively that r_j divides r_{j−3}, . . . , r_1, b and a. Thus r_j | a and r_j | b.
Next, let c | a and c | b. Then from equation (3.2), c | r1, and this when
substituted in equation (3.3), gives c | r2. Thus the successive equations of
the algorithm yield c | a, c | b, c | r1, c | r2, . . . , c | rj. Thus any common
divisor of a and b is a divisor of rj.
Consequently, rj = (a, b).
Extended Euclidean Algorithm
Theorem 3.3.4:
If rj = (a, b), then it is possible to find integers x and y such that
ax+ by = rj (3.8)
Proof. The equations preceding (3.6) are
rj−3 = qj−1rj−2 + rj−1, and
rj−2 = qjrj−1 + rj. (3.9)
Equation (3.9) expresses rj in terms of rj−1 and rj−2 while the equation
preceding it expresses rj−1 in terms of rj−2 and rj−3. Thus
rj = rj−2 − qjrj−1
= rj−2 − qj (rj−3 − qj−1rj−2)
= (1 + qjqj−1) rj−2 − qjrj−3.
Thus we have expressed rj as a linear combination of rj−2 and rj−3, the coef-
ficients being integers. Working backward, we get rj as a linear combination
of a, b with the coefficients being integers.
The process given in the proof of Theorem 3.3.4 is known as the Extended Euclidean Algorithm.
Corollary 3.3.5:
If (a, m) = 1, then there exists an integer u such that au ≡ 1(mod m) and
any two such integers are congruent modulo m.
Proof. By the Extended Euclidean Algorithm, there exist integers u and v
such that au + mv = 1. This however means that au ≡ 1(mod m). The
second part is trivial as (a, m) = 1.
Example 3.3.6:
Find integers x and y so that 120x+ 70y = 1.
We apply Euclid’s algorithm to a = 120 and b = 70. We have
120 = 1 · 70 + 50
70 = 1 · 50 + 20
50 = 2 · 20 + 10
20 = 2 · 10
Hence gcd(120, 70) = 10.
Now starting from the last but one equation and going backward, we get
10 = 50− 2 · 20
= 50− 2(70− 1 · 50)
= 3 · 50− 2 · 70
= 3 · (120− 1 · 70)− 2 · 70
= 3 · 120− 5 · 70.
Therefore x = 3 and y = −5 fulfil the requirement.
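The back-substitution in this example can be written as a short recursive routine. The following is a sketch of the Extended Euclidean Algorithm (the function name is ours):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # g = b*x + (a mod b)*y  =>  g = a*y + b*(x - (a // b)*y)
    return g, y, x - (a // b) * y

g, x, y = extended_gcd(120, 70)          # reproduces g = 10, x = 3, y = -5
```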
3.4 Primes
Definition 3.4.1:
An integer n > 1 is a prime if its only positive divisors are 1 and n. A natural
number greater than 1 which is not a prime is a composite number.
Naturally, 2 is the only even prime. 3, 5, 7, 11, 13, 17, . . . are all odd
primes. The composite numbers are 4, 6, 8, 9, 10, . . .
Theorem 3.4.2:
Every integer n > 1 is either a prime or can be expressed as a product of primes.
Proof. The result is obvious for n = 2, 3 and 4. So assume that n > 4 and
apply induction. If n is a prime there is nothing to prove. If n is not a prime,
then n = n1n2, where 1 < n1 < n and 1 < n2 < n. By induction hypothesis,
both n1 and n2 are products of primes. Hence n itself is a product of primes.
(Note that the prime factors of n need not all be distinct.)
Suppose in a prime factorization of n, the distinct prime factors are
p_1, p_2, . . . , p_r, and that p_i is repeated α_i times, 1 ≤ i ≤ r. Then

    n = p_1^{α_1} p_2^{α_2} · · · p_r^{α_r}.        (3.10)
We now show that this factorization is unique in the sense that in any prime
factorization, the prime factors that occur are the same and that the prime
powers are also the same, except for the order of the prime factors. For instance, 200 = 2^3 × 5^2, and the only other way to write it in the form (3.10) is 5^2 × 2^3.
Theorem 3.4.3 (Unique factorization theorem):
Every positive integer n > 1 can be expressed uniquely in the form

    n = p_1^{α_1} p_2^{α_2} · · · p_r^{α_r},

where p_i, 1 ≤ i ≤ r, are distinct primes; the above factorization is unique except for the order of the primes.
To prove Theorem 3.4.3, we need a property of primes.
Lemma 3.4.4:
If p is a prime such that p | (ab), but p ∤ a, then p | b.
Proof. As p ∤ a, (p, a) = 1. Now apply Proposition 3.3.2.
Note 3.4.5:
Lemma 3.4.4 implies that if p is a prime and p ∤ a and p ∤ b, then p ∤ (ab). More generally, p ∤ a_1, p ∤ a_2, . . . , p ∤ a_n imply that p ∤ (a_1 a_2 · · · a_n). Consequently, if p | (a_1 a_2 · · · a_n), then p must divide at least one a_i, 1 ≤ i ≤ n.
Proof. (of Theorem 3.4.3) Suppose

    n = p_1^{α_1} p_2^{α_2} · · · p_r^{α_r} = q_1^{β_1} q_2^{β_2} · · · q_j^{β_j} · · · q_s^{β_s}        (3.11)

are two prime factorizations of n, where the p_i's and q_i's are all primes. As p_1 | (p_1^{α_1} · · · p_r^{α_r}), we have p_1 | (q_1^{β_1} · · · q_s^{β_s}). Hence by Note 3.4.5, p_1 must divide some q_j. As p_1 and q_j are primes and p_1 | q_j, p_1 = q_j. Cancelling p_1 on both sides, we get

    p_1^{α_1 − 1} p_2^{α_2} · · · p_r^{α_r} = q_1^{β_1} q_2^{β_2} · · · q_j^{β_j − 1} · · · q_s^{β_s}.        (3.12)
Now argue as before with p_1 if α_1 − 1 ≥ 1. If α_1 < β_j, this procedure will result in the relation

    p_2^{α_2} · · · p_r^{α_r} = q_1^{β_1} q_2^{β_2} · · · q_j^{β_j − α_1} · · · q_s^{β_s}.        (3.13)
Now p_1 divides the right-hand expression of (3.13) and so must divide the left-hand expression of (3.13). But this is impossible, as the p_i's are distinct primes. Hence α_1 = β_j. Cancellation of p_1^{α_1} on both sides of (3.11) yields

    p_2^{α_2} · · · p_r^{α_r} = q_1^{β_1} q_2^{β_2} · · · q_{j−1}^{β_{j−1}} q_{j+1}^{β_{j+1}} · · · q_s^{β_s}.
Repetition of our earlier argument gives p_2 = one of the q_i's, 1 ≤ i ≤ s, say q_k, k ≠ j. Hence α_2 = β_k, and so on. This shows that each p_i equals some q_t and that α_i = β_t, so that p_i^{α_i} = q_t^{β_t}. Cancellation of p_1^{α_1} followed by p_2^{α_2}, . . . , p_r^{α_r} on both sides will leave 1 on the left-side expression of (3.11), and so the right-side expression of (3.11) must also reduce to 1.
The unique factorization of numbers enables us to compute the gcd and
lcm of two numbers. Let a and b be any two integers ≥ 2. Let p1, . . . , pr be
the primes which divide at least one of a and b. Then a and b can be written
uniquely in the form

    a = p_1^{α_1} p_2^{α_2} · · · p_r^{α_r},
    b = p_1^{β_1} p_2^{β_2} · · · p_r^{β_r},

where the α_i's and β_i's are nonnegative integers. (Taking the α_i's and β_i's to be nonnegative integers in the prime factorizations of a and b, instead of taking them to be positive, enables us to use the same prime factors for both a and b. For instance, if a = 72 and b = 45, we can write a and b as

    a = 2^3 · 3^2 · 5^0 and b = 2^0 · 3^2 · 5^1.)
Then clearly,
(a, b) =r∏
i=1
pmin(αi, βi)i , and
[a, b] =r∏
i=1
pmax(αi, βi)i
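The min/max formulas translate directly into a computation. The sketch below (the helper names `factorize` and `gcd_lcm` are ours, not the text's) computes (a, b) and [a, b] from prime factorizations by trial division:

```python
from collections import Counter

def factorize(n):
    """Return the prime factorization of n as a Counter {prime: exponent}."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_lcm(a, b):
    """Compute (a, b) and [a, b] via min/max of prime exponents."""
    fa, fb = factorize(a), factorize(b)
    primes = set(fa) | set(fb)          # primes dividing at least one of a, b
    g = l = 1
    for p in primes:
        g *= p ** min(fa[p], fb[p])     # gcd takes the smaller exponent
        l *= p ** max(fa[p], fb[p])     # lcm takes the larger exponent
    return g, l

print(gcd_lcm(72, 45))  # → (9, 360)
```

For a = 72 and b = 45 (the example above) this gives (a, b) = 9 and [a, b] = 360; note also that 9 · 360 = 72 · 45, an identity taken up in the exercises.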
We next establish two important properties of prime numbers.
Theorem 3.4.6 (Euclid):
The number of primes is infinite.
Proof. The proof is by contradiction. Suppose there are only finitely many
primes, say, p1, p2, . . . , pr. Then the number n = 1 + p1p2 · · · pr is larger
than each pi, 1 ≤ i ≤ r; as n is not one of the primes, it must be composite.
Now any composite number is divisible by some prime. But none of the
primes pi, 1 ≤ i ≤ r, divides n. (For, if pi divides n, then pi | 1, an
impossibility.) Hence the number of primes is infinite.
Theorem 3.4.7 (Nagell):
There are arbitrarily large gaps in the sequence of primes. In other words,
for any positive integer k ≥ 2, there exist k consecutive composite numbers.
Proof. The k numbers
(k + 1)! + 2, (k + 1)! + 3, . . . , (k + 1)! + k, (k + 1)! + (k + 1)
are consecutive. They are all composite since (k + 1)! + 2 is divisible by 2,
(k + 1)! + 3 is divisible by 3 and so on. In general (k + 1)! + j is divisible by
j for each j, 2 ≤ j ≤ k + 1.
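The construction in this proof is easy to check numerically. A small sketch (the helper names are ours):

```python
from math import factorial

def composite_run(k):
    """Return the k consecutive numbers (k+1)! + 2, ..., (k+1)! + (k+1)."""
    base = factorial(k + 1)
    return [base + j for j in range(2, k + 2)]

def is_composite(n):
    """Trial-division compositeness test, adequate for small n."""
    return n > 1 and any(n % d == 0 for d in range(2, int(n**0.5) + 1))

run = composite_run(5)                    # five consecutive composites from 6! + 2
print(run)                                # → [722, 723, 724, 725, 726]
print(all(is_composite(n) for n in run))  # → True
```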
Chapter 3 Basics of Number Theory 126
Definition 3.4.8:
Two numbers a and b are said to be coprime (or relatively prime) if
(a, b) = 1.
3.5 Exercises
1. For any integer n, show that n2 − n is divisible by 2, n3 − n by 6 and
n5 − n by 30.
2. Show that (n, n+ 1) = 1 and that [n, n+ 1] = n(n+ 1).
3. Use the unique factorization theorem to prove that for any two positive
integers a and b, (a, b)[a, b] = ab, and that (a, b) | [a, b]. (Remark:
This shows that if (a, b) = 1, then [a, b] = ab. More generally, if
a1, a2, . . . , ar is any set of positive integers, then (a1, a2, . . . , ar)
divides [a1, a2, . . . , ar]. Here (a1, a2, . . . , ar) and [a1, a2, . . . , ar] denote
respectively the gcd and lcm of the numbers a1, a2, . . . , ar.)
4. Prove that no integers x, y exist satisfying x+ y = 100 and (x, y) = 3.
Do x, y exist if x+ y = 99, (x, y) = 3 ?
5. Show that there exist infinitely many pairs (x, y) such that x+ y = 72
and (x, y) = 9.
Hint: One choice for (x, y) is (63, 9). Take x′ prime to 8, that is,
(x′, 8) = 1 and take y′ = 8− x′. Now use the pairs (x′, y′).
6. If a + b = c, show that (a, c) = 1 iff (b, c) = 1. Hence show that any two
consecutive Fibonacci numbers are coprime. (The Fibonacci numbers
Fn are defined by the recursion Fn = Fn−1 + Fn−2, where F0 =
1 = F1. Hence the Fibonacci sequence is 1, 1, 2, 3, 5, 8, . . ..)
7. Find (i) gcd (2700, 15120). (ii) lcm [2700, 15120].
8. Determine integers x and y so that
(i) 180x+ 72y = 36.
(ii) 605x+ 96y = 67.
9. For a positive integer n, show that there exist integers a and b such
that (a, b) = d and ab = n iff d^2 | n.
(Hint: By Exercise 3 above, (a, b)[a, b] = ab = n and (a, b) | [a, b].
Hence d^2 | n. Conversely, if d^2 | n, then n = d^2 c. Now take a = d and b = dc.)
10. Prove that a^{mn} − 1 is divisible by a^m − 1, when m and n are positive
integers. Hence show that if a^n − 1 is prime, a ≥ 2, then n must be
prime.
11. Prove that ∑_{j=1}^{n} 1/(2j − 1) is not an integer for n > 1.
12. Let pn denote the n-th prime (p1 = 2, p2 = 3 and so on). Prove that
pn < 2^{2^n}.
13. Prove that if an = 2^{2^n} + 1, then (an, a_{n+1}) = 1 for each n ≥ 1. (Hint:
Set 2^{2^n} = x.)
14. If n = p1^a1 · · · pr^ar is the prime factorization of n, show that d(n), the
number of distinct divisors of n, is (a1 + 1) · · · (ar + 1).
3.6 Congruences
A congruence expresses divisibility with reference to a number or a function.
The congruence notation is a convenience that allows addition, subtraction,
multiplication by constants and, in some special cases, division to be carried
out much as with ordinary equations.
Definition 3.6.1:
Given integers a, b and n (≠ 0), a is said to be congruent to b modulo n if
a − b is divisible by n, that is, a − b is a multiple of n. In symbols, this is
denoted by a ≡ b (mod n), and is read as "a is congruent to b modulo n". The
number n is the modulus of the congruence.
Definition 3.6.2:
If f(x), g(x) and h(x) (≠ 0) are any three polynomials with real coefficients,
then by f(x) ≡ g(x) (mod h(x)), we mean that f(x) − g(x) is divisible by
h(x) over R, that is to say, there exists a polynomial q(x) with real coefficients
such that
f(x)− g(x) = q(x)h(x).
The congruence given in Definition 3.6.1 is numerical congruence while
that given in Definition 3.6.2 is polynomial congruence. We now concentrate
on numerical congruence. Trivially, a ≡ b (mod m) iff a ≡ b (mod (−m)).
Hence we assume without loss of generality that the modulus of any numerical
congruence is a positive integer.
Proposition 3.6.3:
1. a ≡ b (mod m) iff b ≡ a (mod m).
2. If a ≡ b(mod m), and b ≡ c(mod m), then a ≡ c(mod m).
3. If a ≡ b (mod m), then for any integer k, ka ≡ kb (mod m). In particular,
taking k = −1, we have −a ≡ −b (mod m) whenever a ≡ b (mod m).
4. If a ≡ b (mod m), and c ≡ d (mod m), then
(a) a± c ≡ b± d (mod m), and
(b) ac ≡ bd (mod m).
Proof. We prove only 3 and 4; the rest follow immediately from the definition.
3. If a ≡ b (mod m), then a−b is divisible by m, and hence so is k(a−b) =
ka− kb. Thus ka ≡ kb (mod m).
4. If a− b and c− d are multiples of m, say, a− b = km and c− d = k′m
for integers k and k′, then (a + c) − (b + d) = (a − b) + (c − d) =
km + k′m = (k + k′)m, a multiple of m. This means that a + c ≡ b + d
(mod m). Similarly, a − c ≡ b − d (mod m). Next, if a ≡ b (mod m),
then by (3), ac ≡ bc (mod m). But then c ≡ d (mod m) gives bc ≡ bd
(mod m). Hence ac ≡ bd (mod m).
Proposition 3.6.4:
If ab ≡ ac (mod m), and (a, m) = 1, then b ≡ c (mod m).
Proof. ab ≡ ac (mod m) gives that a(b − c) = km for some integer k. As
a | a(b− c), a | km. But (a, m) = 1. Hence a | k so that k = ak′, k′ ∈ Z, and
a(b − c) = km = ak′m. This however gives that b − c = k′m, and therefore
b ≡ c (mod m).
Corollary 3.6.5:
If ab ≡ ac (mod m), then b ≡ c (mod m/d), where d = (a, m).
Proof. As d = (a, m), d | a and d | m. Therefore, a = da′ and m = dm′,
where (a′, m′) = 1. Then ab ≡ ac (mod m) gives that
da′b ≡ da′c (mod m),    (3.14)
that is, da′b ≡ da′c (mod dm′),    (3.15)
and therefore a′b ≡ a′c (mod m′). But (a′, m′) = 1. Hence b ≡ c
(mod m′).
Proposition 3.6.6:
If (a,m) = (b,m) = 1, then (ab,m) = 1.
Proof. Suppose (ab, m) = d > 1 and p is a prime divisor of d. Then p | m
and p | ab. But p | ab means that p | a or p | b, as p is prime. If p | a,
then (a, m) ≥ p > 1, while if p | b, then (b, m) ≥ p > 1. Either case
contradicts the hypothesis, and hence (ab, m) = 1.
Proposition 3.6.7:
If ax ≡ 1 (mod m) and (a, m) = 1, then (x, m) = 1.
Proof. Suppose (x, m) = d > 1. Let p be a prime divisor of d. Then p | x
and p | m. This however means, since ax − 1 = km for some k ∈ Z, that
p | 1, a contradiction.
Proposition 3.6.8:
If a ≡ b (mod mi), 1 ≤ i ≤ r, then a ≡ b (mod [m1, . . . ,mr]), where
[m1, . . . ,mr] stands for the lcm of m1, . . . ,mr.
Proof. The hypothesis implies that a − b is a common multiple of m1, . . . , mr
and hence a multiple of the least common multiple of m1, . . . , mr. (Because
if a − b = αi mi, 1 ≤ i ≤ r, and mi has the prime factorization
mi = p1^{αi1} p2^{αi2} · · · pt^{αit}, 1 ≤ i ≤ r, then pj^{αij} | mi for each j in 1 ≤ j ≤ t and
for each i, 1 ≤ i ≤ r. Here the exponents αij are nonnegative integers; this
enables us to take the same prime factors p1, . . . , pt for all the mi's. See
Section 3.4.) Hence a − b is divisible by pj^{max_i αij} for each j in 1 ≤ j ≤ t.
Thus each of these t numbers divides a − b and they are pairwise coprime.
Hence a − b is divisible by their product. But their product is precisely the
lcm of m1, . . . , mr. (See Section 3.4.)
3.7 Complete System of Residues
A number b is called a residue of a number a modulo m if a ≡ b (mod m).
Obviously, b is a residue of b modulo m.
Definition 3.7.1:
Given a positive integer m, a set S = {x1, . . . , xm} of m numbers is called
a complete system of residues modulo m if for any integer x, there exists a
unique xi ∈ S such that x ≡ xi (mod m).
We note that no two numbers xi and xj, i 6= j, in S are congruent modulo
m. For if xi ≡ xj (mod m), then since xi ≡ xi (mod m) trivially, we have
a contradiction to the fact that S is a complete residue system modulo m.
Conversely, it is easy to show that any set of m numbers, no two of which
are congruent modulo m, forms a complete residue system modulo m. In
particular, the set {0, 1, 2, . . . , m − 1} is a complete residue system modulo
m.
Next, suppose that (x, m) = 1 and x ≡ xi (mod m). Then xi is also prime
to m. (If xi and m have a common factor p > 1 then p | x as x − xi = km,
k ∈ Z. This however means that (x, m) ≥ p > 1, a contradiction to our
assumption.) Thus if S = {x1, . . . , xm} is a complete system of residues
modulo m and x ≡ xi (mod m), then (x, m) = 1 iff (xi, m) = 1. Then
deleting the numbers xj of S that are not coprime to m, we get a subset S′
of S consisting of a set of residues modulo m each of which is relatively prime
to m. Such a system is called a reduced system of residues modulo m. For
instance, taking m = 10, {0, 1, 2, . . . , 9} is a complete system of residues
modulo 10, while S′ = {1, 3, 7, 9} is a reduced system of residues modulo 10.
The numbers in S′ are all the numbers that are less than 10 and prime to 10.
Euler’s φ-Function
Definition 3.7.2:
The Euler function φ(n) (also called the totient function) is defined to be
the number of positive integers less than n and prime to n. It is also the
cardinality of a reduced residue system modulo n.
We have seen earlier that φ(10) = 4. We note that φ(12) is also equal to
4, since 1, 5, 7, 11 are all the numbers less than 12 and prime to 12. If p is a
prime, then all the numbers in {1, 2, . . . , p − 1} are less than p and prime to
p, and so φ(p) = p − 1. We now present Euler's theorem on the φ-function.
Theorem 3.7.3 (Euler):
If (a, n) = 1, then
a^φ(n) ≡ 1 (mod n).
Proof. Let {r1, . . . , r_φ(n)} be a reduced residue system modulo n. Now
(ri, n) = 1 for each i, 1 ≤ i ≤ φ(n). Further, as (a, n) = 1, by Proposition
3.6.6, (ari, n) = 1. Moreover, if i ≠ j, ari ≢ arj (mod n). For, ari ≡ arj
(mod n) implies (as (a, n) = 1), by virtue of Proposition 3.6.4, that ri ≡ rj
(mod n), a contradiction to the fact that {r1, . . . , r_φ(n)} is a reduced residue
system modulo n. Hence {ar1, . . . , ar_φ(n)} is also a reduced residue system
modulo n and
∏_{i=1}^{φ(n)} (ari) ≡ ∏_{j=1}^{φ(n)} rj (mod n).
This gives that a^φ(n) ∏_{i=1}^{φ(n)} ri ≡ ∏_{j=1}^{φ(n)} rj (mod n). Further, (ri, n) = 1 for
each i = 1, 2, . . . , φ(n) gives that (∏_{i=1}^{φ(n)} ri, n) = 1 by Proposition 3.6.6.
Consequently, by Proposition 3.6.4,
a^φ(n) ≡ 1 (mod n).
Corollary 3.7.4 (Fermat’s Little Theorem):
If n is a prime and (a, n) = 1, then an−1 ≡ 1 (mod n).
Proof. If n is a prime, then φ(n) = n − 1. Now apply Euler’s theorem
(Theorem 3.7.3).
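Both theorems are easy to confirm computationally. The sketch below (a brute-force `phi`; the naming is ours) checks a^φ(n) ≡ 1 (mod n) for a few coprime pairs, using Python's built-in three-argument `pow` for modular exponentiation:

```python
from math import gcd

def phi(n):
    """Euler's totient: count of 1 <= k < n with (k, n) = 1 (here phi(1) = 0 is unused)."""
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

# Euler's theorem: a^phi(n) ≡ 1 (mod n) whenever (a, n) = 1
for a, n in [(3, 10), (7, 12), (2, 9)]:
    assert gcd(a, n) == 1
    print(pow(a, phi(n), n))  # → 1 each time

# Fermat's little theorem is the special case of a prime modulus
p = 719
print(pow(5, p - 1, p))  # → 1
```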
We see more properties of the Euler function φ(n) in Section 3.11. An-
other interesting theorem in elementary number theory is Wilson’s theorem.
Theorem 3.7.5:
If u ∈ [1, m − 1] is a solution of the congruence ax ≡ 1 (mod m), then all
the solutions of the congruence are given by u + km, k ∈ Z. In particular,
there exists a unique u ∈ [1, m− 1] such that au ≡ 1(mod m).
Proof. Clearly, u + km, k ∈ Z, is a solution of the congruence ax ≡ 1 (mod m)
because a(u + km) = au + akm ≡ au ≡ 1 (mod m). Conversely, let ax0 ≡ 1
(mod m). Then a(x0 − u) ≡ 0 (mod m), and therefore, by Proposition 3.6.4,
x0 − u ≡ 0 (mod m), as (a, m) = 1 in view of au ≡ 1 (mod m). Hence
x0 = u + km for some k ∈ Z. The proof of the latter part is trivial.
Theorem 3.7.6 (Wilson’s theorem):
If p is a prime, then
(p− 1)! ≡ −1 ( mod p).
Proof. The result is trivially true if p = 2 or 3. So let us assume that prime
p ≥ 5. We look at (p− 1)! = 1 · 2 · · · (p− 1). Now
1 ≡ 1 ( mod p), and
p− 1 ≡ −1 ( mod p).
Hence it is enough if we prove that
2 · 3 · · · (p− 2) ≡ 1 ( mod p),
since the multiplication of the three congruences (See Proposition 3.6.3) will
yield the required result.
Now, as p (≥ 5) is an odd prime, the cardinality of L is even, where
L = {2, 3, . . . , p − 2}. For each i ∈ L, by virtue of Corollary 3.3.5, there
exists a unique j, 1 ≤ j ≤ p − 1, such that ij ≡ 1 (mod p). Now j ≠ 1, and
j ≠ p − 1. (If j = p − 1, then ij = i(p − 1) ≡ −i (mod p) and therefore
−i ≡ 1 (mod p). This means that p | (i + 1). This is not possible as i ∈ L.)
Also ij = ji. Moreover, j ≠ i, since j = i implies that i^2 ≡ 1 (mod p) and
therefore p | (i − 1) or p | (i + 1). However, this is not possible as it would
imply that i = 1 or i = p − 1. Thus each i ∈ L can be paired off with a unique
j ∈ L, j ≠ i, such that ij ≡ 1 (mod p). In this way we get (p − 3)/2 congruences.
Multiplying these (p − 3)/2 congruences, we get
2 · 3 · · · (p − 2) ≡ 1 · · · 1 ≡ 1 (mod p).
Example 3.7.7:
As an application of Wilson’s theorem, we prove that 712!+1 ≡ 0 (mod 719).
Proof. Since 719 is a prime, Wilson's theorem implies that 718! + 1 ≡ 0
(mod 719). We now rewrite 718! in terms of 712! as
718! = 712! × 713 × 714 × 715 × 716 × 717 × 718
     = 712! (719 − 6)(719 − 5) · · · (719 − 1)
     = 712! (M(719) + 6!)   (M(719) stands for a multiple of 719)
     ≡ 712! × 6! (mod 719)
     ≡ 712! × 720 (mod 719)
     ≡ 712! × (719 + 1) (mod 719)
     ≡ (712! × 719) + 712! (mod 719)
     ≡ 712! (mod 719).
Thus 712! + 1 ≡ 718! + 1 ≡ 0 (mod 719).
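This computation can be confirmed directly by reducing the factorial modulo 719 at every step, so the huge integer 712! is never formed. A sketch (the helper name `factorial_mod` is ours):

```python
def factorial_mod(n, m):
    """Compute n! modulo m without ever forming the huge integer n!."""
    result = 1
    for i in range(2, n + 1):
        result = (result * i) % m
    return result

p = 719                                   # 719 is prime
print((factorial_mod(p - 1, p) + 1) % p)  # Wilson: (p-1)! + 1 ≡ 0 (mod p) → 0
print((factorial_mod(712, p) + 1) % p)    # the example: 712! + 1 ≡ 0 (mod 719) → 0
```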
If a ≡ b (mod m), then by Proposition 3.6.3, a^2 ≡ b^2 (mod m),
a^3 ≡ b^3 (mod m), and so on. In general, a^r ≡ b^r (mod m) for every positive
integer r and hence, again by Proposition 3.6.3, ta^r ≡ tb^r (mod m) for every
integer t. In particular, if f(x) = a0 + a1x + · · · + an x^n is any polynomial in x
with integer coefficients, then f(a) = a0 + a1 · a + a2 · a^2 + · · · + an · a^n ≡
a0 + a1 · b + a2 · b^2 + · · · + an · b^n = f(b) (mod m). We state this result as a
theorem.
Theorem 3.7.8:
If f(x) is a polynomial with integer coefficients, and a ≡ b(mod m), then
f(a) ≡ f(b) (mod m).
3.8 Linear Congruences and Chinese Remainder Theorem
Let f(x) be a polynomial with integer coefficients. By a solution of the
polynomial congruence
f(x) ≡ 0 ( mod m), (3.16)
we mean an integer x0 with f(x0) ≡ 0 (mod m). If x0 ≡ y0 (mod m), by
Theorem 3.7.8, f(x0) ≡ f(y0) (mod m), and hence y0 is also a solution of
the congruence (3.16). Hence, when we speak of all the solutions of (3.16),
we consider congruent solutions as forming a class. Hence by the number of
solutions of a polynomial congruence, we mean the number of distinct congruence
classes of solutions. Equivalently, it is the number of incongruent
solutions modulo m of the congruence. Since any set of incongruent numbers
modulo m has cardinality at most m, the number of solutions of any
polynomial congruence modulo m is at most m.
The congruence (3.16) is linear if f(x) is a linear polynomial. Hence a
linear congruence is of the form
ax ≡ b ( mod m) (3.17)
A congruence modulo m need not have m solutions. In fact, a congruence
modulo m may have no solution, or fewer than m solutions. For instance,
the congruence 2x ≡ 1 (mod 6) has no solution, since 2x − 1 is an odd
integer and hence cannot be a multiple of 6. The congruence x^3 ≡ 1
(mod 7) has exactly 3 solutions, given by x ≡ 1, 2, 4 (mod 7).
Theorem 3.8.1:
Let (a, m) = 1 and b an integer. Then the linear congruence
ax ≡ b ( mod m) (3.18)
has exactly one solution.
Proof. (See also Corollary 3.3.5.) The numbers 1, 2, . . . , m form a complete
residue system modulo m. Hence, as (a, m) = 1, the numbers a · 1, a · 2,
. . . , a · m also form a complete residue system modulo m. Now any integer is
congruent modulo m to a unique integer in a complete residue system modulo
m (by definition of a complete residue system). Hence b is congruent modulo
m to a unique a · i, 1 ≤ i ≤ m. Thus there exists a unique x ∈ {1, 2, . . . , m}
such that
ax ≡ b (mod m).
If (a, m) = 1, taking b = 1 in Theorem 3.8.1, we see that there exists a
unique x in {1, 2, . . . , m} such that
ax ≡ 1 (mod m).
This unique x is called the reciprocal of a modulo m.
We have seen in Theorem 3.8.1 that if (a, m) = 1, the congruence has
exactly one solution. What happens if (a, m) = d?
Theorem 3.8.2:
Let (a, m) = d. Then the congruence
ax ≡ b ( mod m) (3.19)
has a solution iff d | b. If d | b, the congruence has exactly d solutions. The
d solutions are given by
x0, x0 +m/d, x0 + 2m/d, . . . , x0 + (d− 1)m/d, (3.20)
where x0 is the unique solution in {1, 2, . . . , m/d} of the congruence
(a/d) x ≡ (b/d) (mod m/d).
Proof. Suppose x0 is a solution of the congruence (3.19). Then ax0 = b + km,
k ∈ Z. As d | a and d | m, d | b. Conversely, if d | b, let b = db0, and further
let a = da0 and m = dm0. Then the congruence (3.19) becomes
a0 dx ≡ db0 (mod dm0)
and therefore
a0 x ≡ b0 (mod m0),    (3.21)
where (a0, m0) = 1. But the latter has a unique solution x0 ∈ {1, 2, . . . , m0 =
m/d}. Hence a0 x0 ≡ b0 (mod m0), and so x0 is also a solution of (3.19).
Assume now that d | b. Let y be any solution of (3.19). Then ay ≡
b (mod m). Also ax0 ≡ b (mod m). Hence ay ≡ ax0 (mod m), so that
a0 dy ≡ a0 dx0 (mod (m/d)d). Hence a0 y ≡ a0 x0 (mod m/d). As d =
(a, m), (a0, m/d) = 1. So by Proposition 3.6.4, y ≡ x0 (mod m/d), and
so y = x0 + k(m/d) for some integer k. But k ≡ r (mod d) for some r,
0 ≤ r < d. This gives k(m/d) ≡ r(m/d) (mod m). Thus x0 + k(m/d) ≡ x0 + r(m/d)
(mod m), 0 ≤ r < d, and so y is congruent modulo m to one of the numbers
in (3.20).
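Theorem 3.8.2 amounts to a complete algorithm for linear congruences. A sketch (our naming; the modular inverse uses Python 3.8+'s `pow(a, -1, m)`):

```python
from math import gcd

def solve_linear_congruence(a, b, m):
    """Return all solutions of ax ≡ b (mod m) in {0, 1, ..., m-1}.

    Solutions exist iff d = (a, m) divides b, in which case there are
    exactly d of them: x0, x0 + m/d, ..., x0 + (d-1)m/d.
    """
    d = gcd(a, m)
    if b % d != 0:
        return []                      # no solution when d does not divide b
    a0, b0, m0 = a // d, b // d, m // d
    x0 = (b0 * pow(a0, -1, m0)) % m0   # unique solution of a0 x ≡ b0 (mod m0)
    return [x0 + k * m0 for k in range(d)]

print(solve_linear_congruence(2, 1, 6))   # → []  (no solution, as noted above)
print(solve_linear_congruence(3, 3, 6))   # → [1, 3, 5]
```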
Chinese Remainder Theorem
Suppose we are given more than one linear congruence. In general, they need
not possess a common solution. (In fact, as seen earlier, even a single linear
congruence may not have a solution.) The Chinese Remainder Theorem
ensures that if the moduli of the linear congruences are pairwise coprime,
then the simultaneous congruences have a common solution. To start
with, consider congruences of the form x ≡ bi (mod mi).
Theorem 3.8.3:
Let m1, . . . , mr be positive integers that are pairwise coprime, that is, (mi, mj) =
1 whenever i ≠ j. Let b1, . . . , br be arbitrary integers. Then the system of
congruences
x ≡ b1 (mod m1)
...
x ≡ br (mod mr)
has exactly one solution modulo M = m1 · · · mr.
Proof. Let Mi = M/mi, 1 ≤ i ≤ r. Then, by hypothesis,
(Mi, mi) = (∏_{k=1, k≠i}^{r} mk, mi) = 1.
Hence each Mi has a unique reciprocal M′i modulo mi, 1 ≤ i ≤ r. Let
x = b1 M1 M′1 + · · · + br Mr M′r.    (3.22)
Now mi divides each Mj, j ≠ i. Hence, taking modulo mi on both sides of
(3.22), we get
x ≡ bi Mi M′i (mod mi)
  ≡ bi (mod mi),   as Mi M′i ≡ 1 (mod mi).
Hence x is a common solution of all the r congruences.
We now show that x is unique modulo M . In fact, if y is another common
solution, we have
y ≡ bi ( mod mi), 1 ≤ i ≤ r,
and, therefore,
x ≡ y ( mod mi), 1 ≤ i ≤ r.
This means, as the mi’s are pairwise coprime, that
x ≡ y ( mod M).
We now present the general form of the Chinese Remainder Theorem.
Theorem 3.8.4 (Chinese Remainder Theorem):
Let m1, . . . ,mr be positive integers that are pairwise coprime. Let b1, . . . , br
be arbitrary integers and let integers a1, . . . , ar satisfy (ai, mi) = 1, 1 ≤ i ≤ r.
Then the system of congruences
a1x ≡ b1 ( mod m1)
...
arx ≡ br ( mod mr)
has exactly one solution modulo M = m1m2 · · ·mr.
Proof. As (ai, mi) = 1, ai has a unique reciprocal a′i modulo mi so that
aia′i ≡ 1 (mod mi). Then the congruence aix ≡ bi (mod mi) is equivalent
to a′iaix ≡ a′ibi (mod mi), that is, to x ≡ a′ibi (mod mi), 1 ≤ i ≤ r. By
Theorem 3.8.3, these congruences have a common unique solution x modulo
M = m1 · · ·mr. Because of the equivalence of the two sets of congruences, x
is a common solution to the given set of r congruences as well.
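The constructive proof of Theorem 3.8.3 is itself an algorithm. A sketch of it (our naming; `pow(Mi, -1, m)` computes the reciprocal, Python 3.8+):

```python
from math import prod

def crt(congruences):
    """Solve x ≡ b_i (mod m_i) for pairwise coprime moduli m_i.

    Follows the proof: x = sum of b_i * M_i * M_i', where M_i = M/m_i
    and M_i' is the reciprocal of M_i modulo m_i.
    """
    M = prod(m for _, m in congruences)
    x = 0
    for b, m in congruences:
        Mi = M // m
        Mi_inv = pow(Mi, -1, m)        # reciprocal of M_i modulo m_i
        x += b * Mi * Mi_inv
    return x % M

# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7): the classical answer is 23
print(crt([(2, 3), (3, 5), (2, 7)]))  # → 23
```

For the general form (Theorem 3.8.4), one would first multiply each congruence a_i x ≡ b_i (mod m_i) by the reciprocal of a_i, reducing it to this case.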
3.9 Lattice Points Visible from the Origin
A lattice point of the plane is a point both of whose cartesian coordinates
(with reference to a pair of rectangular axes) are integers. For example,
(2, 3) is a lattice point while (2.5, 3) is not. A lattice point (a, b) is said to
be visible from another lattice point (a′, b′) if the line segment joining (a′, b′)
with (a, b) contains no other lattice point. In other words, it means that
there is no lattice point that obstructs the view of (a, b) from (a′, b′). It is
clear that (±1, 0) and (0, ±1) are the only lattice points on the coordinate
axes visible from the origin. Further, the point (2, 3) is visible from the
origin, but (2, 2) is not (See Figure 3.1). Hence we consider here lattice
points (a, b) not on the coordinate axes but visible from the origin. Without
loss of generality, we may assume that a ≥ 1 and b ≥ 1.
Figure 3.1: Lattice points visible from the origin (showing (1, 1), (2, 2) and (2, 3)).
Figure 3.2: The lattice points (a′, b′) and (a, b).
Lemma 3.9.1:
The lattice point (a, b) (not belonging to any of the coordinate axes) is visible
from the origin iff (a, b) = 1.
Proof. As mentioned earlier, assume without loss of generality, that a ≥ 1
and b ≥ 1. Similar argument applies in the other cases.
Suppose (a, b) = 1. Then (a, b) must be visible from the origin. If not,
there exists a lattice point (a′, b′), with a′ < a and b′ < b, on the segment
joining (0, 0) with (a, b). (See Figure 3.2.) Then b/a = b′/a′ (= the slope
of the line joining (0, 0) with (a, b)), so that ba′ = b′a. Now a | b′a and so
a | ba′. But (a, b) = 1 and hence, by Proposition 3.3.2, a | a′. This is a
contradiction, since a′ < a.
Next assume that (a, b) = d > 1. Then a = da′, b = db′ for positive
integers a′, b′. Then the lattice point (a′, b′) lies on the segment joining
(0, 0) with (a, b), and since a′ < a and b′ < b, (a, b) is not visible from the
origin.
Corollary 3.9.2:
The lattice point (a, b) is visible from the lattice point (c, d) iff (a−c, b−d) =
1.
Proof. Shift the origin to (c, d) through parallel axes. Then, the new origin
is (c, d) and the new coordinates of the original point (a, b) with respect to
the new axes are (a− c, b− d). Now apply Lemma 3.9.1.
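Lemma 3.9.1 and Corollary 3.9.2 reduce visibility to a gcd computation. A minimal sketch (our naming):

```python
from math import gcd

def visible_from(a, b, c=0, d=0):
    """A lattice point (a, b) off the axes is visible from (c, d)
    iff gcd(a - c, b - d) = 1 (Lemma 3.9.1 / Corollary 3.9.2)."""
    return gcd(abs(a - c), abs(b - d)) == 1

print(visible_from(2, 3))        # → True   (gcd(2, 3) = 1)
print(visible_from(2, 2))        # → False  (gcd(2, 2) = 2)
print(visible_from(4, 6, 1, 2))  # → True   (gcd(3, 4) = 1)
```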
We now give an application of the Chinese remainder theorem to the set
of lattice points visible from the origin.
Theorem 3.9.3:
The set of lattice points visible from the origin contains arbitrarily large
square gaps. That is, given any positive integer k, there exists a lattice point
(a, b) such that none of the lattice points
(a+ r, b+ s), 1 ≤ r ≤ k, 1 ≤ s ≤ k,
is visible from the origin.
Proof. Let p1, p2, . . . be the sequence of primes. Given the positive integer
k, construct a k by k matrix M whose first row is the sequence of first
k primes p1, p2, . . . , pk, the second row is the sequence of next k primes,
namely pk+1, . . . , p2k, and so on.
Figure 3.3: Lattice points visible from (a, b).
M :
p1      p2      . . .   ps      . . .   pk
pk+1    pk+2    . . .   pk+s    . . .   p2k
...     ...     . . .   ...     . . .   ...
Let mi (resp. Mi) be the product of the k primes in the i-th row (resp.
column) of M . Then for i 6= j, (mi, mj) = 1 and (Mi, Mj) = 1 because in
the products mi and mj (resp. Mi and Mj), there is no repetition of any
prime. Now by Chinese Remainder Theorem, the set of congruences
x ≡ −1 ( mod m1)
x ≡ −2 ( mod m2)
...
x ≡ −k ( mod mk)
has a unique common solution a modulo m1 · · ·mk. Similarly, the system
y ≡ −1 ( mod M1)
y ≡ −2 ( mod M2)
...
y ≡ −k ( mod Mk)
has a unique common solution b modulo M1 · · ·Mk. Then a ≡ −r (mod mr),
and b ≡ −s (mod Ms), 1 ≤ r, s ≤ k. Hence a+ r is divisible by the product
of all the primes in the r-th row of M , and similarly b+ s is divisible by the
product of all the primes in the s-th column of M . Hence the prime common
to the r-th row and s-th column of M divides both a+ r and b+ s. In other
words, (a + r, b + s) ≠ 1. So by Lemma 3.9.1, the lattice point (a + r, b + s) is
not visible from the origin. Now any lattice point inside the square is of the
form (a + r, b + s), 0 < r < k, 0 < s < k. (For 1 ≤ r ≤ k − 1, 1 ≤ s ≤ k − 1,
we get lattice points inside the square, while for r = k or s = k, we get lattice
points on the boundary of the square.)
3.10 Exercises
1. If a is prime to m, show that 0 · a, 1 · a, 2 · a, . . . , (m − 1) · a form a
complete system of residues mod m. Hence show that for any integer
b and for (a, m) = 1, the set {b, b + a, b + 2a, . . . , b + (m − 1)a} forms
a complete system of residues modulo m.
2. Show that the sum of the numbers less than n and prime to n is nφ(n)/2.
3. Prove that 18! + 1 is divisible by 437.
4. If p and q are distinct primes, show that pq−1 + qp−1 − 1 is divisible by
pq.
5. If p is a prime, show that 2(p − 3)! + 1 ≡ 0 (mod p). [Hint: (p − 1)! =
(p − 3)!(p − 2)(p − 1) ≡ 2(p − 3)! (mod p)].
6. Solve: 5x ≡ 2 (mod 6). [Hint: See Theorem 3.8.1].
7. Solve: 3x ≡ 2 (mod 6). [Hint: Apply Theorem 3.8.2].
8. Solve: 5x ≡ 10 (mod 715). [Hint: Apply Chinese Remainder Theorem].
9. Solve the simultaneous congruences:
(i) x ≡ 1 (mod 3), x ≡ 2 (mod 4); 2x ≡ 1 (mod 5).
(ii) 2x ≡ 1 (mod 3), 3x ≡ 1 (mod 4); x ≡ 2 (mod 5).
10. Prove the converse of Wilson’s theorem, namely, if (n − 1)! + 1 ≡ 0 (
mod n), then n is prime. [Hint: Prove by contradiction].
3.11 Some Arithmetical Functions
Arithmetical functions are real or complex valued functions defined on N,
the set of positive integers. We have already come across the arithmetical
function φ(n), Euler's totient function. In this section, we look at the
basic properties of the arithmetical functions φ(n), the Mobius function µ(n)
and the divisor function d(n).
Definition 3.11.1:
An arithmetical function or a number-theoretical function is a function whose
domain is the set of natural numbers and codomain is the set of real or
complex numbers.
The Mobius Function µ(n)
Definition 3.11.2:
The Mobius function µ(n) is defined as follows:
µ(1) = 1;
if n > 1 and n = p1^a1 · · · pr^ar is the prime factorization of n, then
µ(n) = (−1)^r if a1 = a2 = · · · = ar = 1 (that is, if n is a product of r distinct primes), and
µ(n) = 0 otherwise (that is, if n has a square factor > 1).
For instance, if n = 5 · 11 · 13 = 715, a product of three distinct primes,
µ(n) = (−1)^3 = −1, while if n = 5^2 · 11 · 13 or n = 7^3 · 13, µ(n) = 0. Most of
these arithmetical functions have nice relations connecting n and the divisors
of n.
Theorem 3.11.3:
If n ≥ 1, we have
∑_{d|n} µ(d) = ⌊1/n⌋ = 1 if n = 1, and 0 if n > 1.    (3.23)
(Recall that for any real number x, ⌊x⌋ stands for the floor of x, that is,
the greatest integer not greater than x. For example, ⌊15/2⌋ = 7.)
Proof. If n = 1, µ(1) is, by definition, equal to 1 and hence the relation
(3.23) is valid. Now assume that n > 1 and that p1, . . . , pr are the distinct
prime factors of n. Then any divisor of n is of the form p1^a1 · · · pr^ar, where
each ai ≥ 0, and hence the divisors of n for which the µ-function has nonzero
values are the numbers in the set {p1^σ1 · · · pr^σr : σi = 0 or 1, 1 ≤ i ≤ r} =
{1; p1, . . . , pr; p1p2, p1p3, . . . , pr−1pr; . . . ; p1p2 · · · pr}. Now µ(1) = 1; µ(pi) =
(−1)^1 = −1; µ(pipj) = (−1)^2 = 1; µ(pipjpk) = (−1)^3 = −1, and so on.
Further, the number of terms of the form pi is C(r, 1), of the form pipj is C(r, 2),
and so on, where C(r, k) denotes the binomial coefficient "r choose k". Hence if n > 1,
∑_{d|n} µ(d) = 1 − C(r, 1) + C(r, 2) − · · · + (−1)^r C(r, r) = (1 − 1)^r = 0.
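The µ-function and the divisor sum (3.23) are easy to check by brute force. A sketch (our naming, trial-division factorization):

```python
def mobius(n):
    """Mobius function: (-1)^r if n is a product of r distinct primes, else 0."""
    if n == 1:
        return 1
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # square factor → mu(n) = 0
                return 0
            r += 1
        d += 1
    if n > 1:                   # one remaining prime factor
        r += 1
    return (-1) ** r

def divisor_mu_sum(n):
    """Sum of mu(d) over all divisors d of n, as in (3.23)."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

print(mobius(715))  # → -1  (715 = 5 · 11 · 13)
print([divisor_mu_sum(n) for n in range(1, 9)])  # → [1, 0, 0, 0, 0, 0, 0, 0]
```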
A relation connecting φ and µ
Theorem 3.11.4:
If n ≥ 1, we have
φ(n) = ∑_{d|n} µ(d) · (n/d).    (3.24)
Proof. If (n, k) = 1, then ⌊1/(n, k)⌋ = ⌊1/1⌋ = 1, while if (n, k) > 1,
⌊1/(n, k)⌋ = ⌊a positive number less than 1⌋ = 0. Hence
φ(n) = ∑_{k=1}^{n} ⌊1/(n, k)⌋.
Now replacing n by (n, k) in Theorem 3.11.3, we get
∑_{d|(n, k)} µ(d) = ⌊1/(n, k)⌋.
Hence
φ(n) = ∑_{k=1}^{n} ∑_{d|(n, k)} µ(d) = ∑_{k=1}^{n} ∑_{d|n, d|k} µ(d).    (3.25)
For a fixed divisor d of n, we must sum over all those k in the range 1 ≤ k ≤ n
which are multiples of d. Hence if we take k = qd, then 1 ≤ q ≤ n/d.
Therefore (3.25) reduces to
φ(n) = ∑_{d|n} ∑_{q=1}^{n/d} µ(d) = ∑_{d|n} µ(d) ∑_{q=1}^{n/d} 1 = ∑_{d|n} µ(d) · (n/d).
Theorem 3.11.5:
If n ≥ 1, we have ∑_{d|n} φ(d) = n.
Proof. For each divisor d of n, let A(d) denote the set of those numbers k,
1 ≤ k ≤ n, such that (k, n) = d. Clearly, the sets A(d) are pairwise disjoint
and their union is the set {1, 2, . . . , n}. (For example, if n = 6, then d = 1, 2, 3
and 6. Moreover A(1) = {k : (k, n) = 1} = the set of numbers ≤ n and prime to
n = {1, 5}. Similarly, A(2) = {2, 4}, A(3) = {3}, A(4) = ∅ = A(5), and
A(6) = {6}. Clearly, the sets A(1) to A(6) are pairwise disjoint and their
union is the set {1, 2, 3, 4, 5, 6}.) Then if |A(d)| denotes the cardinality of
the set A(d),
∑_{d|n} |A(d)| = n.    (3.26)
But (k, n) = d iff (k/d, n/d) = 1 and 0 < k ≤ n, that is, iff 0 < k/d ≤ n/d. Hence
if we set q = k/d, there is a 1–1 correspondence between the elements of A(d)
and those integers q satisfying 0 < q ≤ n/d with (q, n/d) = 1. The number of
such q's is φ(n/d). [Note: If q = n/d, then (q, n/d) = (n/d, n/d) = n/d = 1
iff d = n. In this case, q = 1 and φ(n/d) = φ(1) = 1.] Thus |A(d)| = φ(n/d) and
(3.26) becomes
∑_{d|n} φ(n/d) = n.
But this is equivalent to ∑_{d|n} φ(d) = n, since as d runs through all the divisors
of n, so does n/d.
As an example, take n = 12. Then d runs through 1, 2, 3, 4, 6 and 12.
Now φ(1) = φ(2) = 1, φ(3) = φ(4) = φ(6) = 2, and φ(12) = 4, and ∑_{d|n} φ(d) =
φ(1) + φ(2) + φ(3) + φ(4) + φ(6) + φ(12) = (1 + 1) + (2 + 2 + 2) + 4 = 12 = n.
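Theorem 3.11.5 can be verified numerically for small n. A brute-force sketch (our naming):

```python
from math import gcd

def phi(n):
    """Euler's totient by direct count (with phi(1) = 1)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def divisor_phi_sum(n):
    """Sum of phi(d) over all divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)

print(divisor_phi_sum(12))  # → 12, as in the example above
print(all(divisor_phi_sum(n) == n for n in range(1, 200)))  # → True
```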
A Product Formula for φ(n)
We now present the well-known product formula for φ(n) expressing it as a
product extended over all the distinct prime factors of n.
Theorem 3.11.6:
For n ≥ 2, we have
φ(n) = n ∏_{p|n, p a prime} (1 − 1/p).    (3.27)
Proof. We use the formula φ(n) = ∑_{d|n} µ(d) · (n/d) of Theorem 3.11.4 for the
proof. Let p1, . . . , pr be the distinct prime factors of n. Then
∏_{p|n, p a prime} (1 − 1/p) = ∏_{i=1}^{r} (1 − 1/pi)
= 1 − ∑_i 1/pi + ∑_{i≠j} 1/(pi pj) − ∑ 1/(pi pj pk) + · · · ,    (3.28)
where, for example, the sum ∑ 1/(pi pj pk) is formed by taking distinct prime
divisors pi, pj and pk of n. Now, by definition of the µ-function, µ(pi) = −1,
µ(pi pj) = 1, µ(pi pj pk) = −1, and so on. Hence the sum on the right side of
(3.28) is equal to
1 + ∑_{pi} µ(pi)/pi + ∑_{pi, pj} µ(pi pj)/(pi pj) + · · · = ∑_{d|n} µ(d)/d,
since all the other divisors of n, that is, divisors which are not products of
distinct primes, contain a square and hence their µ-values are zero. Thus
n ∏_{p|n, p a prime} (1 − 1/p) = ∑_{d|n} µ(d) · (n/d) = φ(n)   (by Theorem 3.11.4).
The Euler φ-function has the following properties.
Theorem 3.11.7: (i) φ(p^r) = p^r − p^{r−1} for prime p and r ≥ 1.
(ii) φ(mn) = φ(m)φ(n)·(d/φ(d)), where d = (m, n).
(iii) φ(mn) = φ(m)φ(n), if m and n are relatively prime.
(iv) a | b implies that φ(a) | φ(b).
(v) φ(n) is even for n ≥ 3. Moreover, if n has k distinct odd prime factors,
then 2^k | φ(n).
Proof. (i) By the product formula,
φ(p^r) = p^r (1 − 1/p) = p^r − p^{r−1}.
(ii) We have φ(mn) = mn ∏_{p|mn, p a prime} (1 − 1/p). If p is a prime that divides
mn, then p divides either m or n. But then there may be primes which
divide both m and n, and these are precisely the prime factors of (m, n).
Hence if we take the primes dividing m and the primes dividing n separately,
the primes p that divide (m, n) = d occur twice. Therefore
∏_{p|mn} (1 − 1/p) = [∏_{p|m} (1 − 1/p) · ∏_{p|n} (1 − 1/p)] / ∏_{p|(m, n)} (1 − 1/p)
= [(φ(m)/m) · (φ(n)/n)] / (φ(d)/d)   (by the product formula)
= (1/mn) · φ(m)φ(n) · d/φ(d).
This gives the required result, since the term on the left is (1/mn)·φ(mn).
(iii) If (m, n) = 1, then d in (ii) is 1. Now apply (ii).
(iv) If a | b, then every prime divisor of a is a prime divisor of b. Hence
a ∏_{p|a} (1 − 1/p) divides b ∏_{p|b} (1 − 1/p). This however means that φ(a) | φ(b).
(v) If $n \ge 3$, either $n$ is a power of 2, say $2^r$, $r \ge 2$, or else $n = 2^k m$, where $m \ge 3$ is odd. If $n = 2^r$, $\phi(n) = 2^{r-1}$, an even number. In the other case, by (iii), $\phi(n) = \phi(2^k)\phi(m)$. Now if $m = p_1^{k_1} \cdots p_s^{k_s}$ is the prime factorisation of $m$, by (iii) $\phi(m) = \prod_{i=1}^{s}\phi(p_i^{k_i}) =$ (by (i)) $\prod_{i=1}^{s} p_i^{k_i-1}(p_i - 1)$. Now each $p_i - 1$ is even and hence $2^s$ is a factor of $\phi(n)$.
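Properties (iii), (iv) and (v) can likewise be verified numerically for small $n$. A small Python check (all names are ours, purely illustrative):

```python
from math import gcd

def phi(n):
    # phi(n) by definition: count of 1 <= k <= n with gcd(k, n) = 1.
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def odd_prime_divisors(n):
    # Distinct odd prime divisors of n.
    while n % 2 == 0:
        n //= 2
    ps, p = set(), 3
    while p * p <= n:
        while n % p == 0:
            ps.add(p)
            n //= p
        p += 2
    if n > 1:
        ps.add(n)
    return ps

# (iii) phi is multiplicative: phi(mn) = phi(m)phi(n) when (m, n) = 1.
for m in range(1, 25):
    for n in range(1, 25):
        if gcd(m, n) == 1:
            assert phi(m * n) == phi(m) * phi(n)

# (iv) a | b implies phi(a) | phi(b).
for a in range(1, 50):
    for b in range(a, 150, a):
        assert phi(b) % phi(a) == 0

# (v) phi(n) is even for n >= 3, and 2^k | phi(n) where k is the
# number of distinct odd prime factors of n.
for n in range(3, 200):
    assert phi(n) % 2 == 0
    assert phi(n) % (2 ** len(odd_prime_divisors(n))) == 0
```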
There are other properties of the three arithmetical functions described above, as well as other arithmetical functions not described here. The interested reader can consult [1, 2, 54].
We now present an application of Theorem 3.11.5.
An Application
We prove that
\[
\begin{vmatrix}
(1,1) & (1,2) & \dots & (1,n) \\
(2,1) & (2,2) & \dots & (2,n) \\
\vdots & \vdots & \ddots & \vdots \\
(n,1) & (n,2) & \dots & (n,n)
\end{vmatrix}
=\phi(1)\phi(2)\cdots\phi(n).
\]
Proof. Let $D$ be the diagonal matrix
\[
D=\begin{pmatrix}
\phi(1) & 0 & \dots & 0 \\
0 & \phi(2) & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \phi(n)
\end{pmatrix}.
\]
Then $\det D = \phi(1)\phi(2)\cdots\phi(n)$. Define the $n$ by $n$ matrix $A = (a_{ij})$ by
\[
a_{ij}=\begin{cases}1 & \text{if } i \mid j,\\ 0 & \text{otherwise.}\end{cases}
\]
Then $A$ is an upper triangular matrix (see Definition ??) with all diagonal entries equal to 1. Hence $\det A = 1 = \det A^t$. Set $S = A^t D A$. Then
\[
\det S = \det A^t \cdot \det D \cdot \det A = 1 \cdot \bigl(\phi(1)\phi(2)\cdots\phi(n)\bigr) \cdot 1 = \phi(1)\phi(2)\cdots\phi(n).
\]
We now show that S = (sij), where sij = (i, j). This would prove our
statement.
Now $A^t = (b_{ij})$, where $b_{ij} = a_{ji}$. Hence if $D$ is the matrix $(d_{\alpha\beta})$, then the $(i, j)$-th entry of $S$ is given by:
\[
s_{ij}=\sum_{\alpha=1}^{n}\sum_{\beta=1}^{n} b_{i\alpha}\,d_{\alpha\beta}\,a_{\beta j}.
\]
Now $d_{\alpha\beta} = 0$ if $\alpha \ne \beta$, and $d_{\alpha\alpha} = \phi(\alpha)$ for each $\alpha$. Therefore
\[
s_{ij}=\sum_{\alpha=1}^{n} b_{i\alpha}\,d_{\alpha\alpha}\,a_{\alpha j}
=\sum_{\alpha=1}^{n} b_{i\alpha}\,a_{\alpha j}\,\phi(\alpha)
=\sum_{\alpha=1}^{n} a_{\alpha i}\,a_{\alpha j}\,\phi(\alpha).
\]
Now by definition $a_{\alpha i} = 0$ iff $\alpha \nmid i$, and $a_{\alpha i} = 1$ if $\alpha \mid i$. Hence the nonzero terms of the last sum are given by those $\alpha$ that divide $i$ as well as $j$. Now $\alpha \mid i$ and $\alpha \mid j$ iff $\alpha \mid (i, j)$. Thus $s_{ij} = \sum_{\alpha \mid (i,j)}\phi(\alpha)$. But the sum on the right is, by Theorem 3.11.5, $(i, j)$.
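The construction $S = A^t D A$ can be checked numerically. The following Python sketch (all names are ours) builds $A$, $D$ and $S$ for $n = 8$ and verifies entrywise that $s_{ij} = (i, j)$:

```python
from math import gcd

def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][a] * Y[a][j] for a in range(n)) for j in range(n)]
            for i in range(n)]

n = 8
# A[i][j] = 1 iff (i+1) divides (j+1): upper triangular, unit diagonal.
A = [[1 if (j + 1) % (i + 1) == 0 else 0 for j in range(n)]
     for i in range(n)]
D = [[phi(i + 1) if i == j else 0 for j in range(n)] for i in range(n)]
At = [[A[j][i] for j in range(n)] for i in range(n)]

S = matmul(matmul(At, D), A)

# Entrywise check: s_ij = gcd(i, j), by Theorem 3.11.5.
for i in range(n):
    for j in range(n):
        assert S[i][j] == gcd(i + 1, j + 1)

# det S = det A^t * det D * det A = phi(1) * ... * phi(n), since the
# determinant of a triangular matrix is the product of its diagonal.
prod_phi = 1
for k in range(1, n + 1):
    prod_phi *= phi(k)
```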
3.12 Exercises
1. Find those n for which φ(n) | n.
2. An arithmetical function f is called multiplicative if it is not identically
zero and f(mn) = f(m)f(n) whenever (m, n) = 1. It is completely
multiplicative if f(mn) = f(m)f(n) for all positive integers m and n.
Prove the following:
(i) φ is multiplicative but not completely multiplicative.
(ii) µ is multiplicative but not completely multiplicative.
(iii) d(n), the number of positive divisors of n, is multiplicative but not completely multiplicative.
(iv) If $f$ is multiplicative, then prove that
\[
\sum_{d \mid n}\mu(d)f(d)=\prod_{\substack{p \mid n \\ p \text{ prime}}}\bigl(1-f(p)\bigr).
\]
3. Prove that the sum of the positive integers less than $n$ and prime to $n$ is $\frac{1}{2}n\phi(n)$.
4. Let $\sigma(n)$ denote the sum of the divisors of $n$. Prove that $\sigma(n)$ is multiplicative. Hence prove that if $n = p_1^{a_1} \cdots p_r^{a_r}$ is the prime factorization of $n$, then
\[
\sigma(n)=\prod_{i=1}^{r}\frac{p_i^{a_i+1}-1}{p_i-1}.
\]
3.13 The big O notation
The big O notation is used mainly to express an upper bound for a given arithmetical function in terms of another, simpler arithmetical function.
Definition 3.13.1:
Let f : N→ C be an arithmetical function. Then f(n) is O(g(n)) (read big
O of g(n)), where g(n) is another arithmetical function provided that there
exists a constant K > 0 such that
|f(n)| ≤ K|g(n)| for all n ∈ N.
More generally, we have the following definition for any real-valued func-
tion.
Definition 3.13.2:
Let $f : \mathbb{R} \to \mathbb{C}$ be a function. Then $f(x) = O(g(x))$, where $g : \mathbb{R} \to \mathbb{C}$ is another function, if there exists a constant $K > 0$ such that
\[
|f(x)| \le K|g(x)| \quad \text{for each } x \in \mathbb{R}.
\]
An equivalent formulation of Definition 3.13.1 is the following.

Definition 3.13.3:
Let $f : \mathbb{N} \to \mathbb{C}$ be an arithmetical function. Then $f(n)$ is $O(g(n))$, where $g(n)$ is another arithmetical function, if there exists a constant $K > 0$ such that
\[
|f(n)| \le K|g(n)| \quad \text{for all } n \ge n_0, \text{ for some positive integer } n_0. \tag{3.29}
\]
Clearly, Definition 3.13.1 implies Definition 3.13.3. To prove the converse, assume that (3.29) holds. Choose positive numbers $c_1, c_2, \ldots, c_{n_0-1}$ such that $|f(1)| < c_1|g(1)|$, $|f(2)| < c_2|g(2)|, \ldots, |f(n_0-1)| < c_{n_0-1}|g(n_0-1)|$. Let $K_0 = \max(c_1, c_2, \ldots, c_{n_0-1}, K)$. Then
\[
|f(n)| \le K_0|g(n)| \quad \text{for each } n \in \mathbb{N}.
\]
This is precisely Definition 3.13.1.
The time complexity of an algorithm is the number of bit operations required to execute the algorithm. If there is just one input, and $n$ is the size of the input, the time complexity is a function $T(n)$ of $n$. Since $T(n)$ itself may be unwieldy, we usually try to express $T(n) = O(g(n))$, where $g(n)$ is a known, less complicated function of $n$. The ideal situation is where $g(n)$ is a polynomial in $n$. Such an algorithm is known as a polynomial-time algorithm. We now give the definition of a polynomial-time algorithm where there are not just one, but $r$ inputs.
Definition 3.13.4:
Let $n_1, \ldots, n_r$ be positive integers and let $n_i$ be a $k_i$-bit integer (so that the size of $n_i$ is $k_i$), $1 \le i \le r$. An algorithm to perform a computation involving $n_1, \ldots, n_r$ is said to be a polynomial-time algorithm if there exist nonnegative integers $m_1, \ldots, m_r$ such that the number of bit operations required to perform the algorithm is $O(k_1^{m_1} \cdots k_r^{m_r})$.
Recall that the size of a positive integer is the number of bits in it. For instance, $8 = (1000)_2$ and $9 = (1001)_2$, so both 8 and 9 are of size 4. In fact, all numbers $n$ such that $2^{k-1} \le n < 2^k$ are $k$-bit numbers. Taking logarithms with respect to base 2, we get $k-1 \le \log_2 n < k$ and hence $k-1 \le \lfloor\log_2 n\rfloor < k$, so that $\lfloor\log_2 n\rfloor = k-1$. Thus $k = 1 + \lfloor\log_2 n\rfloor$, and hence $k$ is $O(\log_2 n)$. Thus we have proved the following result.
Theorem 3.13.5:
The size of n is O(log2 n).
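This is easy to confirm in Python, whose built-in `int.bit_length()` returns exactly the size of $n$:

```python
from math import floor, log2

# The size (number of bits) of n equals 1 + floor(log2 n), hence O(log n).
for n in range(1, 4096):
    assert n.bit_length() == 1 + floor(log2(n))

# For instance, 8 = (1000)_2 and 9 = (1001)_2 both have size 4.
assert (8).bit_length() == (9).bit_length() == 4
```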
Note that in writing $O(\log n)$, the base of the logarithm is immaterial. For, if the base is $b$, then any number that is $O(\log_2 n)$ is $O(\log_b n)$ and vice versa. This is because $\log_2 n = \log_b n \cdot \log_2 b$, and $\log_2 b$ can be absorbed in the constant $K$ of Definition 3.13.1.
Example 3.13.6:
Let g(n) be a polynomial of degree t. Then g(n) is O(nt).
Proof. Let $g(n) = a_0 n^t + a_1 n^{t-1} + \cdots + a_t$, $a_i \in \mathbb{R}$. Then
\[
|g(n)| \le |a_0|n^t + |a_1|n^{t-1} + \cdots + |a_t| \le n^t\bigl(|a_0| + |a_1| + \cdots + |a_t|\bigr) = K n^t, \quad \text{where } K = \sum_{i=0}^{t}|a_i|.
\]
Thus $g(n)$ is $O(n^t)$.
We now present two examples of polynomial-time algorithms.
We have already described (see Section 3.3) Euclid’s method of computing
the gcd of two positive integers a and b.
Theorem 3.13.7:
Euclid’s algorithm is a polynomial time algorithm.
Proof. We show that Euclid's algorithm for computing the gcd $(a, b)$, $a > b$, can be performed in time $O(\log^3 a)$.
Adopting the same notation as in (3.5), we have
\[
r_j = q_{j+2}\,r_{j+1} + r_{j+2}, \qquad 0 \le r_{j+2} < r_{j+1}.
\]
Now the fact that $q_{j+2} \ge 1$ gives
\[
r_j \ge r_{j+1} + r_{j+2} > 2r_{j+2}.
\]
Hence $r_{j+2} < \frac{1}{2}r_j$ for each $j$. This means that the remainder in every other step of the Euclidean algorithm is less than half of the original remainder. Hence if $a = O(2^k)$, there are at most $O(k)$ steps in the Euclidean algorithm. Now, how about the number of arithmetic operations in each step? In equation (3.5), the number $r_j$ is divided by $r_{j+1}$ and the remainder $r_{j+2}$ is computed. Since both $r_j$ and $r_{j+1}$ are numbers less than $a$, they are $O(\log a)$-bit numbers. Hence equation (3.5) involves $O(\log^2 a)$ bit operations. Since there are $k = O(\log a)$ such steps, the total number of bit operations is $O(\log^3 a)$.
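The step bound can be checked empirically; a short Python sketch (the function name `gcd_steps` is ours, purely illustrative):

```python
import math

def gcd_steps(a, b):
    # Euclid's algorithm, also counting the number of division steps.
    steps = 0
    while b:
        a, b = b, a % b        # one division step of equation (3.5)
        steps += 1
    return a, steps

# The remainder at least halves every other step, so the number of
# division steps is O(log a).
for a in range(2, 500):
    for b in range(1, a):
        g, s = gcd_steps(a, b)
        assert g == math.gcd(a, b)
        assert s <= 2 * math.log2(a) + 2
```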
Next we show that the modular exponentiation $a^c \pmod{m}$ for positive integers $a$, $c$ and $m$ can be performed in polynomial time. Here we can assume without loss of generality that $a < m$ (because if $a \equiv a' \pmod{m}$, then $a^c \equiv (a')^c \pmod{m}$, where $a'$ can be taken to be $< m$). We now show that $a^c \pmod{m}$ can be computed in $O(\log c\,\log^2 m)$ bit operations. Note that $O(\log a)$ is $O(\log m)$. Write $c$ in binary. Let
\[
c = (b_{k-1}b_{k-2}\cdots b_1 b_0) \quad \text{in the binary scale.}
\]
Then $c = b_{k-1}2^{k-1} + b_{k-2}2^{k-2} + \cdots + b_0 2^0$, and therefore
\[
a^c = a^{b_{k-1}2^{k-1}} \cdot a^{b_{k-2}2^{k-2}} \cdots a^{b_1 2} \cdot a^{b_0},
\]
where each $b_i = 0$ or 1. We now compute $a^c \pmod{m}$ recursively, reducing the number computed at each step mod $m$. Set
\begin{align*}
y_0 &= a^{b_{k-1}} = a,\\
y_1 &= y_0^2\,a^{b_{k-2}} = a^{b_{k-1}\cdot 2}\,a^{b_{k-2}},\\
y_2 &= y_1^2\,a^{b_{k-3}} = a^{b_{k-1}2^2}\,a^{b_{k-2}2}\,a^{b_{k-3}},\\
&\;\;\vdots\\
y_{i+1} &= y_i^2\,a^{b_{k-i-2}} = a^{b_{k-1}2^{i+1}}\,a^{b_{k-2}2^{i}}\cdots a^{b_{k-i-2}},\\
&\;\;\vdots\\
y_{k-1} &= y_{k-2}^2\,a^{b_0} = a^{b_{k-1}2^{k-1}}\,a^{b_{k-2}2^{k-2}}\cdots a^{b_0}\\
&= a^{b_{k-1}2^{k-1}+b_{k-2}2^{k-2}+\cdots+b_0} = a^c \pmod{m}.
\end{align*}
There are $k-1$ steps in the algorithm. Note that $y_{i+1}$ is computed by squaring $y_i$ and multiplying the resulting number by 1 if $b_{k-i-2} = 0$, or else by $a$ if $b_{k-i-2} = 1$. Now $y_i \pmod{m}$ is a $t$-bit number, where $t = O(\log m)$; hence computing $y_i^2$ takes $O(t^2)$ bit operations. Since $y_i$ is $t$-bit, $y_i^2$ is a $2t$- or $(2t+1)$-bit number, and so it is also an $O(t)$-bit number. Now we reduce $y_i^2$ modulo $m$, that is, we divide the $O(t)$-bit number $y_i^2$ by the $O(t)$-bit number $m$; this requires an additional $O(t^2)$ bit operations. Thus in all we have performed until now $O(t^2) + O(t^2)$ bit operations, that is, $O(t^2)$ bit operations. Having computed $y_i^2 \pmod{m}$, we next multiply it by $a^0$ or $a^1$. As $a$ is an $O(t)$-bit number, this requires $O(t^2)$ bit operations. Thus in all, computation of $y_{i+1}$ from $y_i$ requires $O(t^2)$ bit operations. But then there are $k-1 = O(\log_2 c)$ steps in the algorithm. Thus the number of bit operations in the computation of $a^c \pmod{m}$ is $O(\log_2 c\,\log_2^2 m) = O(kt^2)$. Thus the algorithm is a polynomial-time algorithm.
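The $y_i$ recursion above is the classical left-to-right square-and-multiply method; a minimal Python sketch (the name `mod_exp` is ours):

```python
def mod_exp(a, c, m):
    """Scan the bits b_{k-1} ... b_0 of c from the most significant end,
    squaring mod m at each step and multiplying by a when the bit is 1,
    exactly as in the y_i scheme above."""
    a %= m
    y = 1
    for bit in bin(c)[2:]:     # bits of c, most significant first
        y = (y * y) % m        # squaring step
        if bit == '1':
            y = (y * a) % m    # multiply step when b_i = 1
    return y

# Cross-check against Python's built-in three-argument pow.
for a, c, m in [(7, 560, 561), (3, 218, 1000), (123, 456, 789)]:
    assert mod_exp(a, c, m) == pow(a, c, m)
```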
Next, we give an algorithm that is not a polynomial-time algorithm.
(Sieve of Eratosthenes)
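The text's full treatment of the sieve is not reproduced here; the following is a standard Python sketch of the algorithm (names ours). Note that its running time is polynomial in $n$ itself but exponential in the size (bit length) of $n$, which is why it is not polynomial-time in the sense of Definition 3.13.4.

```python
def sieve(n):
    """Sieve of Eratosthenes: return the list of primes up to n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]      # 0 and 1 are not prime
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out the multiples of p, starting from p*p.
            for q in range(p * p, n + 1, p):
                is_prime[q] = False
        p += 1
    return [k for k in range(2, n + 1) if is_prime[k]]
```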
Chapter 4
Mathematical Logic
The study of logic can be traced back to the ancient Greek philosopher Aristotle (384–322 B.C.). Modern logic started seriously in the mid-19th century, mainly due to the British mathematicians George Boole and Augustus de Morgan. The German mathematician and philosopher Gottlob Frege (1848–1925) is widely regarded as the founder of modern mathematical logic. Logic
is implicit in every form of common reasoning. It is concerned with the rela-
tionships between language (syntax), reasoning (deduction and computation)
and meaning (semantics). A simple and popular definition of logic is that it
is the analysis of the methods of reasoning. In the study of these methods,
logic is concerned with the form of arguments rather than the contents or the
meanings associated with the statements. To illustrate this point, consider
the following arguments:
(a) All men are mortal. Socrates is a man. Therefore Socrates is mortal.
(b) All cyclones are devastating. Bettie is a cyclone. Therefore Bettie is
devastating.
Both (a) and (b) have the same form: All A are B; x is an A; therefore x
is a B. The truth or the falsity of the premise and the conclusion is not the
primary concern. Again, consider the following pattern of argument:
The program gave incorrect output because of incorrect input or a bug.
The input has been validated to be correct.
Therefore, the incorrect output is due to a bug.
We can extract the pattern of reasoning: A occurs due to B or due to C; B is shown to be absent; therefore, A has occurred due to C. In general, whether a given set of premises logically leads to a conclusion is of interest.
Logic provides the formal basis for the theory of Computer Science (e.g., Computability, Decidability, Complexity and Automata Theory) and Mathematics (e.g., Set Theory). The classical theory of computation has its origins in the works of logicians such as Gödel, Turing, Church, Kleene, Post and others in the 1930s. In Computer Science, the methods employed (primarily, programming activity) in its study are themselves rooted in logic. Digital logic is at the heart of the operational functions of all modern digital computers. Logical techniques have been successful in understanding and formalizing algorithms, program development, program specification, and program verification. More practical examples can be cited. Predicate Calculus has been
successfully employed in the design of relational database query languages.
The programming language PROLOG is a practical realization of programming
with logic. Also, relational databases can be extended with logical rules to
give deductive databases.
The focus of the present chapter is on the first principles of propositional
calculus and predicate calculus (together, they comprise first-order logic).
We start with informal ideas in the next section.
4.1 Preliminaries
In simple terms, an assertion can be defined as a statement. Consider the following examples:
“This topic is very interesting”
“I am writing this sentence using the English alphabet”
“x + y < z”
A proposition is an assertion which is either true or false but not both at
any time. Thus the following statements are (mathematical) propositions:
“ 2 is an even number”
“ 52! is always less than 100!”
“If x, y and z are the sides of a triangle, then x+ y = z”
An assertion need not always be a proposition. This can be illustrated by
the following two examples:
(i) Let C be any input-free computational procedure. Assume that there
is a decision procedure HALT( ) such that HALT(C) returns true if C
halts and returns false otherwise. Now consider the procedure given
below:

procedure ABSURD;
begin
  if HALT(ABSURD) then
    while true do print ‘‘Running’’
end
Next, consider the assertion, “ procedure ABSURD never halts”—it is not
too difficult to reason that this assertion cannot be assigned a truth
value consistent with the behaviour of procedure ABSURD. In this case
the absurdity of procedure ABSURD arises because it is assumed that
HALT( ) exists with the stated behaviour.
(ii) In English, call a word homological if the word’s meaning applies to
itself; otherwise, call it heterological. For instance, the words English,
erudite, polysyllabic are all homological because English is an English
word, erudite is in the learned people’s vocabulary, and polysyllabic is
made of more than one syllable. By a similar reasoning it follows that
German, monosyllabic and indecipherable are all heterological. Now
consider the following statement:
“Heterological is heterological”.
It is not difficult to reason that a truth-value true or false cannot be
assigned to the above assertion (consistent with the defined meanings)
and hence it is not a proposition.
In the subsequent discussions we are mostly concerned with different propositions and their truth values (the values can be true, denoted by ⊤, or false, denoted by ⊥) rather than their actual meanings. In such cases it is convenient to denote the propositions simply by letters of the English alphabet. Thus we can speak of a propositional variable p whose truth value is true, i.e., ⊤. Using different types of logical connectives, we form compound statements from simple statements. We say that atomic statements are those that have no connectives. Logical connectives (or operators) are precisely defined by specifying appropriate truth tables which define the value of a compound statement, given the values of the propositional variables. The basic connectives ¬ (read not, indicating negation), ∧ (read and, indicating conjunction) and ∨ (read or, indicating disjunction) are defined by the following truth tables:
p ¬p        p q p∧q        p q p∨q
⊤  ⊥        ⊥ ⊥  ⊥         ⊥ ⊥  ⊥
⊥  ⊤        ⊥ ⊤  ⊥         ⊥ ⊤  ⊤
            ⊤ ⊥  ⊥         ⊤ ⊥  ⊤
            ⊤ ⊤  ⊤         ⊤ ⊤  ⊤
The logical connectives ⊕ (called exclusive-or), ⇒ (called implication) and
⇔ (called biconditional) are defined by the following truth tables:
p q p⊕ q p q p⇒ q p q p⇔ q
⊥ ⊥ ⊥ ⊥ ⊥ ⊤ ⊥ ⊥ ⊤
⊥ ⊤ ⊤ ⊥ ⊤ ⊤ ⊥ ⊤ ⊥
⊤ ⊥ ⊤ ⊤ ⊥ ⊥ ⊤ ⊥ ⊥
⊤ ⊤ ⊥ ⊤ ⊤ ⊤ ⊤ ⊤ ⊤
The truth value of p⊕ q is ⊤ if and only if exactly one of p or q has a truth
value ⊤.
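These truth tables can be reproduced mechanically. A small Python sketch, modelling ⊤ as `True` and ⊥ as `False` (the dictionary layout and names are ours):

```python
import itertools

# Each connective is a Boolean function of its arguments.
ops = {
    'p ∧ q': lambda p, q: p and q,
    'p ∨ q': lambda p, q: p or q,
    'p ⊕ q': lambda p, q: p != q,          # exclusive-or
    'p ⇒ q': lambda p, q: (not p) or q,    # implication
    'p ⇔ q': lambda p, q: p == q,          # biconditional
}

def sym(v):
    return '⊤' if v else '⊥'

# Print the truth table of each connective.
for name, f in ops.items():
    print(name)
    for p, q in itertools.product([False, True], repeat=2):
        print(' ', sym(p), sym(q), sym(f(p, q)))
```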
In the statement p ⇒ q, we say that p is the premise, hypothesis or antecedent, and we say q is the conclusion or consequence. To understand the meaning of p ⇒ q, we first remark that no relationship is to be assumed between p and q, unlike in ordinary reasoning using the verb “implies”. Let p denote “I have money” and let q denote “I own a car”. Then, it appears that the implication “I have money” implies “I don’t own a car” is false, if we realize that money is sufficient to buy, and therefore own, a car. By a similar reasoning, the implication “I have money” implies “I own a car” appears to be true. The implication “I don’t have money” implies “I don’t own a car” appears not to be a false statement. The implication “I don’t have money” implies “I own a car” is not clear. If money is a necessary prerequisite to buy, and therefore own, a car, then this last implication is false. But since p and q are unrelated, we can, in a relaxed manner, reason that in the absence of money also, owning a car may be possible. We make this allowance and take the implication to be true.
If p ⇒ q is true then p is said to be a stronger assertion than q. For
example, consider the implications:
(x is a positive integer) ⇒ (x is an integer)
(△ABC is equilateral) ⇒ (△ABC is isosceles).
It certainly follows that the premise is a stronger assertion. The assertion
(△ABC is equilateral) ⇒ (△ABC has two equal angles)
can be made stronger in two different ways as given below:
(△ABC is isosceles) ⇒ (△ABC has two equal angles)
(△ABC is equilateral) ⇒ (△ABC has three equal angles).
This means that we can strengthen p ⇒ q by making q stronger or p weaker.
We note that the expression p ⇒ q is exactly equivalent to the expression ¬p ∨ q, in the sense that both have identical truth tables. Given the statement p ⇒ q, we say that the converse statement is q ⇒ p, while the contrapositive statement is ¬q ⇒ ¬p. We can verify using the truth table that p ⇒ q is equivalent to ¬q ⇒ ¬p. This can be illustrated by the following examples:
Example 4.1.1:
$f(x) = x^2 \Rightarrow f'(x) = 2x$ has the contrapositive form $f'(x) \ne 2x \Rightarrow f(x) \ne x^2$.
Example 4.1.2:
$a \ge 0 \Rightarrow \sqrt{a}$ is real has the contrapositive form $\sqrt{a}$ is not real $\Rightarrow a < 0$.
The biconditional of p and q has the truth-value ⊤ if p and q both have the
truth-value ⊤ or both have the truth-value ⊥; otherwise, the biconditional
has truth-value ⊥.
4.2 Fully Parenthesized Propositions and their
Truth Values
Using the different logical connectives introduced above, given any set of propositional variables, we can form meaningful fully parenthesized propositions or formulas. Below, we first define the rules which describe the syntactic structure of such formulas:
(i) T and F are constant propositions.
(ii) Assertions denoted by small letters such as p, q etc., are atomic propo-
sitions or formulas (which are defined to take values ⊥ or ⊤).
(iii) If P is any (atomic or not) proposition then (¬P ) is also a proposition.
(iv) If P and Q are any propositions then so are (P ∧Q), (P ∨Q), (P ⊕Q),
(P ⇒ Q) and (P ⇔ Q).
Note:
(i) An atomic formula or the negation of an atomic formula is called a
literal.
(ii) When there is no confusion, we relax the above rules and we will denote
(P ) simply by P , (R ∧ S) simply by R ∧ S etc.
(iii) If one formula is a substring of another, we say that it is a subformula of the other—an atomic subformula if it is atomic. E.g., if R = ((A ∨ B) ∧ (¬B ∨ C)), then trivially R itself is a subformula. Also, (A ∨ B), (¬B ∨ C), A, B, ¬B and C are subformulas; A, B and C are atomic subformulas.
The rules given above specify the syntax for propositions formed using
the different logical connectives. The meaning or semantics of such formulas
is given by specifying how to systematically evaluate any fully parenthesized
proposition. The following cases arise in evaluating a fully parenthesized
proposition:
Case 1. The value of the proposition T is ⊤; the value of the proposition F
is ⊥.
Case 2. The values of (¬P ), (P ∧ Q), (P ∨ Q), (P ⊕ Q), (P ⇒ Q) and
(P ⇔ Q), when the values of P and Q are specified, are determined by
using the corresponding operator’s truth table (described above). In
the truth tables above, we allow p to be any formula P and q to be any
formula Q.
Case 3. The value of a formula with more than one operator is found by repeatedly applying Case 2 to the subformulas and replacing every subformula by its value, until the given proposition is reduced to ⊥ or ⊤.
The following examples illustrate the systematic evaluation of propositions.
Example 4.2.1:
Consider evaluating ((T ∧ T) ⇒ F). We first substitute the values for T and F and get the proposition ((⊤ ∧ ⊤) ⇒ ⊥). This is first reduced to (⊤ ⇒ ⊥) by evaluating (⊤ ∧ ⊤), and then is reduced to ⊥.
Example 4.2.2:
To construct the truth table for the proposition ((p ⇒ q) ∧ (q ⇒ p)), we proceed in stages by assigning all possible combinations of truth values to the propositions p and q. We then evaluate the “inner” propositions and finally determine the truth-value of the given proposition. This is summarized in the following truth table, which contains the intermediate results also.

p q p⇒q q⇒p ((p⇒q) ∧ (q⇒p))
⊥ ⊥  ⊤   ⊤   ⊤
⊥ ⊤  ⊤   ⊥   ⊥
⊤ ⊥  ⊥   ⊤   ⊥
⊤ ⊤  ⊤   ⊤   ⊤
We note that the given formula has truth value ⊤ whenever p and q have
identical truth values.
Example 4.2.3:
To construct the truth table for the proposition (((p ∧ q) ∨ (¬r)) ⇔ p), we have to consider all combinations of truth values to p, q and r. This results in the following table.

p q r (p∧q) (¬r) ((p∧q) ∨ (¬r)) (((p∧q) ∨ (¬r)) ⇔ p)
⊥ ⊥ ⊥   ⊥    ⊤        ⊤               ⊥
⊥ ⊥ ⊤   ⊥    ⊥        ⊥               ⊤
⊥ ⊤ ⊥   ⊥    ⊤        ⊤               ⊥
⊥ ⊤ ⊤   ⊥    ⊥        ⊥               ⊤
⊤ ⊥ ⊥   ⊥    ⊤        ⊤               ⊤
⊤ ⊥ ⊤   ⊥    ⊥        ⊥               ⊥
⊤ ⊤ ⊥   ⊤    ⊤        ⊤               ⊤
⊤ ⊤ ⊤   ⊤    ⊥        ⊤               ⊤
The last two examples above illustrate that, given a propositional formula, we can determine its truth value by considering all possible combinations of truth values of its constituent atomic variables. On the other hand, given the propositional variables (such as p, q, r, etc.), for each possible combination of truth values of these variables we can define a functional value by assigning a corresponding truth value (conventionally denoted in Computer Science by 0 or 1) to the function. This corresponds to the definition of a Boolean function.
Definition 4.2.4:
A Boolean function is a function f : {0, 1}ⁿ → {0, 1}, where n ≥ 1.
Thus a truth table can be considered as a tabular representation of a
Boolean function and every propositional formula defines a Boolean function
by its truth table. The notion of assigning a combination of truth values ⊤ and ⊥ to atomic variables is captured by a truth-assignment A, defined as follows:

Definition 4.2.5:
A truth-assignment A is a function from the set of atomic variables to the set {⊥, ⊤} of truth values.
A given proposition is said to be well-defined with respect to a truth-
assignment if each atomic variable in the proposition is associated with either
the value ⊥ or the value ⊤. Thus any well-defined proposition can be evalu-
ated by the technique described in the above examples. On the other hand,
if A is a truth-assignment and P is a formula, we say A is appropriate to P
if each atomic variable of P is in the domain of A. Note that we can extend
the idea of A—we have already done this in the evaluation of propositions
considered above under cases 1, 2 and 3 above. We repeat this in a more
formal way in the following definition.
Definition 4.2.6:
Let A be any function from the set of atomic variables to {⊥, ⊤}. Then, we extend the notion of A as follows:
1. A(R ∨ S) = ⊤ if A(R) = ⊤ or A(S) = ⊤
   = ⊥ otherwise.
2. A(R ∧ S) = ⊤ if A(R) = ⊤ and A(S) = ⊤
   = ⊥ otherwise.
3. A(¬R) = ⊤ if A(R) = ⊥
   = ⊥ otherwise.
4. A(R ⇒ S) = ⊤ if A(R) = ⊥ or A(S) = ⊤
   = ⊥ otherwise.
5. A(R ⇔ S) = ⊤ if A(R) = A(S)
   = ⊥ otherwise.
6. A(R ⊕ S) = ⊤ if A(R) ≠ A(S)
   = ⊥ otherwise.
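This extension of A translates directly into a recursive evaluator. A Python sketch (the tuple encoding of formulas and the name `evaluate` are ours, purely illustrative):

```python
# Formulas as nested tuples, e.g. ('imp', ('var', 'p'), ('var', 'q')).
def evaluate(f, A):
    """Extend a truth-assignment A (a dict from atomic variables to
    booleans) to arbitrary formulas, clause by clause as in
    Definition 4.2.6."""
    op = f[0]
    if op == 'var':
        return A[f[1]]
    if op == 'not':
        return not evaluate(f[1], A)
    p, q = evaluate(f[1], A), evaluate(f[2], A)
    return {'or':  p or q,
            'and': p and q,
            'imp': (not p) or q,
            'iff': p == q,
            'xor': p != q}[op]

# ((p ⇒ q) ∧ (q ⇒ p)) from Example 4.2.2:
F = ('and', ('imp', ('var', 'p'), ('var', 'q')),
            ('imp', ('var', 'q'), ('var', 'p')))
assert evaluate(F, {'p': True, 'q': False}) is False
assert evaluate(F, {'p': True, 'q': True}) is True
```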
4.3 Validity, Satisfiability and Related Concepts
Let P be a given formula and let A be a truth-assignment appropriate to P. If A(P) = ⊤ then we say that A verifies P; if A(P) = ⊥ then we say that A falsifies P. Also, if we have a set S = {P1, P2, . . . , Pk} of formulas and if A is appropriate to each Pi (1 ≤ i ≤ k), then we say that A verifies S if A(Pi) = ⊤ for all i. If A verifies a formula or a set of formulas, then A is said to be a model for that formula or set of formulas. Note that any truth-assignment is a model for the empty set.
A formula or a set of formulas is satisfiable if it has at least one model; otherwise the formula or set of formulas is unsatisfiable. We remark that the general problem of determining whether formulas in propositional calculus are satisfiable is a hard problem—no one has found an efficient algorithm for this problem, but no one has proved that an efficient solution does not exist. In classical complexity theory, the problem is termed NP-complete.
A formula P is valid if every truth-assignment appropriate to P verifies P. We then call P a tautology or a universally valid formula.
Trivially T is a tautology and F is not. It easily follows that the proposi-
tion (P ∨¬P ) is a tautology. It is easy to see that if S and T are equivalent,
then (S ⇔ T ) is a tautology.
A tautology can be established by constructing a truth table as the fol-
lowing example shows.
Example 4.3.1:
To show that ((¬p) ⇒ (p ⇒ q)) is a tautology.
We do this by constructing the following truth-table:

p q (¬p) (p⇒q) ((¬p) ⇒ (p⇒q))
⊥ ⊥  ⊤    ⊤        ⊤
⊥ ⊤  ⊤    ⊤        ⊤
⊤ ⊥  ⊥    ⊥        ⊤
⊤ ⊤  ⊥    ⊤        ⊤
If P is a formula and A(P) = ⊥ for all truth-assignments A appropriate to P, then P is called a contradiction. Obviously the negation of a contradiction is a tautology.
We also say that a statement P (not atomic, in general) tautologically implies a statement Q (not atomic, in general) if and only if (P ⇒ Q) is a tautology. For example, it is easy to verify that ((p ∧ q) ⇒ p) and ((p ∧ (p ⇒ q)) ⇒ q) are tautological implications. The following tautological implications are well-known:
(i) Addition: p ⇒ (p ∨ q)
(ii) Simplification: (p ∧ q) ⇒ p
(iii) Modus Ponens: (p ∧ (p ⇒ q)) ⇒ q
(iv) Modus Tollens: ((p ⇒ q) ∧ ¬q) ⇒ ¬p
(v) Disjunctive Syllogism: (¬p ∧ (p ∨ q)) ⇒ q
(vi) Hypothetical Syllogism: ((p ⇒ q) ∧ (q ⇒ r)) ⇒ (p ⇒ r)
Consider any formula P with n distinct atomic subformulas. There are only 2^n different truth-assignments possible to the n atomic subformulas. It is quite possible that there may be a syntactically different (in particular, shorter) formula Q such that under any of the 2^n possible truth-assignments, P and Q both evaluate to ⊤ or both evaluate to ⊥. In such a case P and Q are equivalent. As an extreme case, consider the formula ((¬p) ⇒ (p ⇒ q)) of Example 4.3.1 above. This formula was shown to be a tautology and so it is equivalent to T. In general, if we show that P is equivalent to a syntactically shorter formula Q, we can construct the truth-table for Q (and hence for P) with less effort. The laws of equivalence, stated below, will be useful in transforming a given P to an equivalent Q.
Definition 4.3.2:
Two propositional formulas P and Q are said to be equivalent (and we write
P ≡ Q), if and only if they are assigned the same truth-value by every truth
assignment appropriate to both.
We note that P ≡ Q is a shorthand for the assertion that P and Q are
equivalent formulas. On the other hand, the formula (P ⇔ Q) is not an
assertion. It is easy to see that (logical) equivalence between two formulas
is an equivalence relation on the set of formulas. It is also easy to see that
any two tautologies are equivalent and any two unsatisfiable formulas are
equivalent. This means that two formulas with different atomic subformulas
can be equivalent.
The following laws of equivalence are well-known:
i) Idempotence of ∧ and ∨: (a) P ≡ (P ∧ P) and (b) P ≡ (P ∨ P)
ii) Commutativity Laws: (a) (P ∧ Q) ≡ (Q ∧ P), (b) (P ∨ Q) ≡ (Q ∨ P) and (c) (P ⇔ Q) ≡ (Q ⇔ P)
iii) Associativity Laws: (a) ((P ∨ Q) ∨ R) ≡ (P ∨ (Q ∨ R)) and (b) ((P ∧ Q) ∧ R) ≡ (P ∧ (Q ∧ R))
iv) Distributive Laws: (a) P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R) and (b) P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)
v) de Morgan’s Laws: (a) ¬(P ∨ Q) ≡ (¬P ∧ ¬Q) and (b) ¬(P ∧ Q) ≡ (¬P ∨ ¬Q)
vi) Law of Negation: ¬(¬P) ≡ P
vii) Law of the Excluded Middle: P ∨ ¬P ≡ T
viii) Law of Contradiction: P ∧ ¬P ≡ F
ix) Law of Implication: (P ⇒ Q) ≡ (¬P ∨ Q)
x) Equivalence or Law of Equality: (P ⇔ Q) ≡ (P ⇒ Q) ∧ (Q ⇒ P)
xi) Laws of OR-Simplification: (a) P ∨ P ≡ P, (b) P ∨ T ≡ T, (c) P ∨ F ≡ P and (d) P ∨ (P ∧ Q) ≡ P
xii) Laws of AND-Simplification: (a) P ∧ P ≡ P, (b) P ∧ T ≡ P, (c) P ∧ F ≡ F and (d) P ∧ (P ∨ Q) ≡ P
xiii) Law of Exportation: P ⇒ (Q ⇒ R) ≡ (P ∧ Q) ⇒ R
xiv) Law of Contrapositivity: (P ⇒ Q) ≡ (¬Q ⇒ ¬P)
Obviously, each of the above laws, of the form P ≡ Q, can be proved by constructing the truth-tables for P and for Q separately and by checking that they are identical.
Example 4.3.3:
We show that it is possible to use the laws of equivalence to check if a formula is a tautology. Consider the formula P where
P = (¬(p ⇒ q) ∧ ¬(¬p ⇒ (q ∨ r))) ⇒ (¬q ⇒ r)
Eliminating the main implication (using the law of implication), we get
¬(¬(p ⇒ q) ∧ ¬(¬p ⇒ (q ∨ r))) ∨ (¬q ⇒ r)
Next, using de Morgan’s law, we get
¬¬(p ⇒ q) ∨ ¬¬(¬p ⇒ (q ∨ r)) ∨ (¬q ⇒ r)
Application of the law of negation gives
(p ⇒ q) ∨ (¬p ⇒ (q ∨ r)) ∨ (¬q ⇒ r)
Again, using the laws of implication and negation, we get
(¬p ∨ q) ∨ (p ∨ (q ∨ r)) ∨ (q ∨ r)
Using the laws of Associativity, Commutativity and OR-simplification, we get
¬p ∨ p ∨ q ∨ r
This last formula, by the laws of Excluded Middle and OR-simplification, evaluates to T. We therefore conclude that the original formula P is a tautology.
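The same conclusion can be reached by brute force over all eight truth-assignments, as in this Python sketch (the encoding is ours):

```python
import itertools

imp = lambda p, q: (not p) or q

def P(p, q, r):
    # (¬(p ⇒ q) ∧ ¬(¬p ⇒ (q ∨ r))) ⇒ (¬q ⇒ r)
    left = (not imp(p, q)) and (not imp(not p, q or r))
    return imp(left, imp(not q, r))

# P evaluates to True under every truth-assignment: a tautology.
assert all(P(p, q, r)
           for p, q, r in itertools.product([False, True], repeat=3))
```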
Let P ≡ Q and let R be any formula involving P; say, R = ((A ∨ P) ∧ (¬(C ∧ P))). In the process of the evaluation of R, every occurrence of P will result in a truth-value. In each such occurrence of P in R, the same truth-value will result if any (or all) occurrences of P are replaced by Q in R. Thus, in general, if R′ is the formula that results by replacing some occurrences of P by Q within R, then we will have R ≡ R′.
4.4 Normal forms
Given the formulas F1, F2, . . . , Fn, we write
(a) $\bigl(\bigvee_{i=1}^{n} F_i\bigr)$ for (F1 ∨ F2 ∨ . . . ∨ Fn)
(b) $\bigl(\bigwedge_{i=1}^{n} F_i\bigr)$ for (F1 ∧ F2 ∧ . . . ∧ Fn)
—this is possible because the associativity laws imply that in nested conjunctions or disjunctions the grouping of the subformulas is immaterial. We use the above notations even when n = 1, in which case both representations simply denote F1. We call (a) and (b) respectively the disjunction and conjunction of the formulas F1, F2, . . . , Fn.
4.4.1 Conjunctive and Disjunctive Normal Forms
Definition 4.4.1:
A formula is in conjunctive normal form (CNF) if it is a conjunction of
disjunctions of literals. Similarly, a formula is in disjunctive normal form
(DNF) if it is a disjunction of conjunctions of literals.
For example, the formula ((a ∧ b) ∨ (¬a ∧ c)) is in DNF and the formula ((a ∨ b) ∧ (b ∨ c ∨ ¬a) ∧ (a ∨ c)) is in CNF. The formulas (a ∨ b) and (a ∧ ¬c) are both in CNF as well as in DNF.
(Notation: if F = ¬G, then let F′ = G; if F is not a negation of any formula, let F′ = ¬F. This fits well with the atomic case: if F is atomic, say A, then A′ = ¬A and (¬A)′ = A.)
Theorem 4.4.2:
Every formula has at least one CNF and at least one DNF representation.
Furthermore, there is an algorithm that transforms any given formula into a
CNF or a DNF as desired.
Proof. The proof is by induction on the number of occurrences of the logical
connectives in the given formula. A formula with no logical connectives is
atomic and is already in CNF or DNF. We now assume that the theorem is
true for formulas with k or fewer occurrences of logical connectives. Let F
be a formula with (k + 1) occurrences of logical connectives. The following
cases arise:
(a) F = ¬G for some formula G with k occurrences of logical connectives. By the induction hypothesis, we can find both a CNF and a DNF of G. If
\[
\bigwedge_{i=1}^{n}\Bigl(\bigvee_{j=1}^{m_i} G_{ij}\Bigr)
\]
is a CNF of G, then by de Morgan’s Law,
\[
\bigvee_{i=1}^{n}\Bigl(\bigwedge_{j=1}^{m_i} G'_{ij}\Bigr)
\]
is a DNF of F. Similarly a CNF of F can be obtained from a DNF of G.
(b) F = (G1 ∨ G2) for some formulas G1 and G2, each of which has k or fewer occurrences of logical connectives. To obtain a DNF of F we simply take the disjunction of a DNF of G1 and a DNF of G2. To obtain a CNF of F, find CNFs of G1 and G2, say $\bigwedge_{i=1}^{m} H_i$ and $\bigwedge_{j=1}^{n} J_j$, where H1, . . . , Hm and J1, . . . , Jn are disjunctions of literals. Then, by the Distributive Law,
\[
\bigwedge_{i=1}^{m}\Bigl(\bigwedge_{j=1}^{n}(H_i \vee J_j)\Bigr)
\]
is equivalent to (G1 ∨ G2). Each formula $(H_i \vee J_j)$ is a disjunction of literals, so this formula is a CNF of F.
(c) F = (G1 ∧ G2) for some formulas G1 and G2. This case is entirely
analogous to case (b) above.
(d) F = (G1 ⇒ G2). As (G1 ⇒ G2) can be written in the equivalent form
(¬G1 ∨ G2), we can apply the reasoning of case (b).
(e) F = (G1 ⇔ G2). The proposition can be written as (G1 ⇒ G2) ∧
(G2 ⇒ G1). We can apply case (c) now.
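The inductive proof is effectively an algorithm. The following Python sketch (the formula encoding and function names are ours, not the text's) mirrors the cases of the proof: negation swaps a CNF and a DNF via de Morgan's Law, disjunction distributes clause pairs, and ⇒/⇔ are rewritten first.

```python
# A formula is a nested tuple: ('atom', name), ('not', F), ('or', F, G),
# ('and', F, G), ('imp', F, G) or ('iff', F, G).  A literal is a pair
# (name, polarity); a CNF is a list of clauses, a DNF a list of terms.

def neg(lit):
    name, pos = lit
    return (name, not pos)

def to_cnf(F):
    """Return a CNF of F as a list of clauses (lists of literals)."""
    op = F[0]
    if op == 'atom':
        return [[(F[1], True)]]
    if op == 'not':                 # case (a): de Morgan on a DNF of G
        return [[neg(l) for l in term] for term in to_dnf(F[1])]
    if op == 'and':                 # case (c): concatenate the CNFs
        return to_cnf(F[1]) + to_cnf(F[2])
    if op == 'or':                  # case (b): distribute clause pairs
        return [h + j for h in to_cnf(F[1]) for j in to_cnf(F[2])]
    if op == 'imp':                 # case (d): G1 => G2 is (not G1) or G2
        return to_cnf(('or', ('not', F[1]), F[2]))
    if op == 'iff':                 # case (e): two implications
        return to_cnf(('and', ('imp', F[1], F[2]), ('imp', F[2], F[1])))

def to_dnf(F):
    """Return a DNF of F as a list of terms (lists of literals)."""
    op = F[0]
    if op == 'atom':
        return [[(F[1], True)]]
    if op == 'not':                 # case (a), dually
        return [[neg(l) for l in clause] for clause in to_cnf(F[1])]
    if op == 'or':
        return to_dnf(F[1]) + to_dnf(F[2])
    if op == 'and':
        return [h + j for h in to_dnf(F[1]) for j in to_dnf(F[2])]
    if op == 'imp':
        return to_dnf(('or', ('not', F[1]), F[2]))
    if op == 'iff':
        return to_dnf(('and', ('imp', F[1], F[2]), ('imp', F[2], F[1])))

# ((a ∧ b) ∨ c) has the CNF (a ∨ c) ∧ (b ∨ c):
F = ('or', ('and', ('atom', 'a'), ('atom', 'b')), ('atom', 'c'))
```

Note that the naive distribution in case (b) can square the number of clauses at each step, so the output may be exponentially larger than the input; this blowup is inherent to equivalence-preserving CNF conversion.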
4.5 Compactness
In Mathematics, the phenomenon whereby a property of a countably infinite
set can be inferred from finite “approximations” to that set is called
compactness. Compactness is a fundamental property of formulas in
propositional calculus, stated in the following theorem:
Theorem 4.5.1:
A set of formulas is satisfiable if and only if each of its finite subsets
is satisfiable. (Equivalently: if a set of formulas is unsatisfiable, some
finite subset of it must be unsatisfiable.)
Proof. Given an infinite set of formulas which is satisfiable, it easily follows
that any of its finite subsets is satisfiable. Proving the converse is harder.
Now, let S be an infinite set of formulas such that each finite subset
of S is satisfiable. Let A1, A2, A3, . . . be a listing of all the atomic
subformulas occurring in S. For each n ≥ 0, denote by sn the set of all
formulas whose atomic subformulas are among A1, A2, . . . , An. Obviously
sn contains infinitely many formulas. However, it is not too difficult to
see that there are only 2^(2^n) equivalence classes in sn, where all the
formulas in any one class are equivalent to one another (a formula in sn
is determined, up to equivalence, by its truth-value under each of the
2^n truth-assignments to A1, . . . , An). Since each finite subset of S is
satisfiable, S ∩ sn is also satisfiable for each n. To see this, choose
from S ∩ sn one representative of each equivalence class that meets it;
this gives a finite subset of S, which is satisfiable, and any
truth-assignment verifying this representative set verifies S ∩ sn also.
As S ∩ sn is satisfiable for each n = 0, 1, 2, . . ., there is a
truth-assignment 𝒜n appropriate to sn such that 𝒜n verifies S ∩ sn. But
this does not mean that there is a k such that 𝒜k verifies S; in fact,
none of the 𝒜n need even be defined on all the atomic subformulas of S.
We claim, however, that from 𝒜0, 𝒜1, 𝒜2, . . . we can construct a
truth-assignment 𝒜 that does verify S. For each n ≥ 0, the construction
builds a set Un of truth-assignments such that:

(a) For all n > 0, Un is a subset of Un−1.

(b) Un is an infinite subset of {𝒜n, 𝒜n+1, 𝒜n+2, . . .}.

(c) For 1 ≤ n ≤ m, any two truth-assignments in Um agree on the
truth-value assigned to the atomic formula An.

We set U0 = {𝒜0, 𝒜1, 𝒜2, . . .}; with this choice, (a) through (c) hold
for n = 0. Once Un has been defined, we define Un+1 as follows. By (b),
Un contains 𝒜i for infinitely many i ≥ n + 1; that is, the set
{i | i ≥ n + 1 and 𝒜i ∈ Un} is infinite. This set can be partitioned into
two disjoint subsets

    J1 = {i | i ≥ n + 1, 𝒜i ∈ Un and 𝒜i verifies An+1} and
    J2 = {i | i ≥ n + 1, 𝒜i ∈ Un and 𝒜i does not verify An+1}.

At least one of J1 and J2 must be infinite, because the union of two
finite sets is finite. Let J be an infinite one of them (either one, if
both are infinite) and let Un+1 = {𝒜i | i ∈ J}. Then Un+1 is certainly a
subset of Un; also, Un+1 contains 𝒜i for infinitely many i ≥ n + 1.
Further, all the truth-assignments in Un+1 agree on A1, . . . , An and on
An+1.

We now define a truth-assignment 𝒜 : {A1, A2, A3, . . .} → {⊤, ⊥} as
follows: 𝒜(Am) is the common value of B(Am) for all B ∈ Un with n ≥ m
(such a common value exists by (a) and (c)). The truth-assignment 𝒜
verifies S: if F ∈ S, then F ∈ S ∩ sm for some m, and so 𝒜n verifies F
for all n ≥ m. Hence B verifies F for all B ∈ Un with n ≥ m; and hence 𝒜,
which agrees with all these B on the atomic formulas of sm, also verifies
F.
4.6 The Resolution Principle in Propositional
Calculus
We first introduce the idea of a clause and a clause set. Consider a CNF
formula F = ((p ∨ q) ∧ (¬r ∨ s ∨ t)). There are other equivalent ways of
writing F, e.g., ((q ∨ p) ∧ (¬r ∨ t ∨ s)). All such syntactic variants of
F, and F itself, can therefore be regarded as a single set of sets of
literals. For example, F is captured by the set {{p, q}, {¬r, s, t}}.
Formally, a clause is a finite set of literals. Each disjunction of
literals corresponds to a clause and each nonempty clause corresponds to
one or more disjunctions of literals. We also allow the empty set as a
clause, in which case it does not correspond to any formula. We write □
for the empty clause.
A clause set is a set of clauses (the empty clause is allowed as a
member); a clause set may itself be empty, and may be infinite. Every CNF
formula naturally corresponds to a clause set, and every finite clause set
that neither contains the empty clause nor is itself empty corresponds to
one or more formulas in CNF. The empty clause set is not the same as the
empty clause, although the two are identical when considered as sets. We
write ∅ to denote the empty clause set.
We can carry forward the notion of a truth-assignment A as appropriate
to a clause set. If S is a clause set and every atomic formula in S is in the
domain of A then we say that A is appropriate to S. The truth-value of any
clause in S or the truth-value of S as a whole can be analogously determined
by taking the truth-value of a corresponding formula for the clause or the
clause set. We also say that A verifies a clause if and only if A verifies at
least one of its members and we say A verifies a clause set if and only if A
verifies each of its members.
Example 4.6.1:
Let S be the clause set {{p}, {¬q}, {¬r}}. If a truth-assignment A
verifies p, verifies ¬q and verifies ¬r, then A also verifies S. On the
other hand, if A verifies all of p, q and r, then A does not verify S.
No truth-assignment A verifies □, because □ has no members for A to
verify. But every truth-assignment A verifies ∅, because A verifies each
clause in ∅ (of which there are none). Thus the difference between □ and
∅ is apparent. We also say that two given clause sets are equivalent if
every truth-assignment appropriate to both assigns them the same
truth-value.

It follows that a clause or a clause set may be satisfiable, unsatisfiable
or a tautology. The empty clause set is a tautology, but the empty clause
is unsatisfiable. Any clause set containing the empty clause is
unsatisfiable. The clause set {{p}, {¬p}}, although not empty, is also
unsatisfiable.
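These semantic definitions can be checked mechanically. A minimal sketch (the representation is ours: a truth-assignment is a dict from atomic-formula names to booleans, a literal a (name, polarity) pair, a clause a frozenset of literals):

```python
# A truth-assignment verifies a clause iff it verifies at least one member,
# and verifies a clause set iff it verifies every clause in it.

def verifies_clause(A, clause):
    return any(A[name] == pos for (name, pos) in clause)

def verifies_clause_set(A, S):
    return all(verifies_clause(A, c) for c in S)

# The boundary cases match the text: the empty clause frozenset() (□) is
# verified by no assignment, while the empty clause set set() (∅) is
# verified by every assignment.
A = {'p': True, 'q': True, 'r': True}
```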
The resolution rule can be illustrated with a simple example. Consider the
clause set S = {{p, q, ¬r}, {q, r, s}}. To verify S, any appropriate
truth-assignment must verify {p, q, ¬r} and must also verify {q, r, s}.
It is not difficult to reason that the same truth-assignment must then
also verify the clause {p, q, s}. In fact, the clause set
{{p, q, ¬r}, {q, r, s}, {p, q, s}} can be shown to be equivalent to S. We
say that the clause {p, q, s} is a resolvent of the clauses {p, q, ¬r}
and {q, r, s}.
Definition 4.6.2:
Let C1 and C2 be two clauses. Then the clause D is a resolvent of C1 and
C2 if and only if for some literal l we have l ∈ C1 and ¬l ∈ C2 and

    D = (C1 \ {l}) ∪ (C2 \ {¬l}).
Thus a resolvent of two clauses is any clause obtained by striking out a
complementary pair of literals, one from each clause, and merging the
remaining literals into a single clause. Evidently, from C1 and C2 it is
in general possible to produce more than one resolvent D.
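Definition 4.6.2 translates directly into code. In this sketch (representation ours) a literal is a (name, polarity) pair, its complement flips the polarity, and a clause is a frozenset of literals:

```python
def complement(lit):
    """The complementary literal: ¬l for l, and l for ¬l."""
    name, pos = lit
    return (name, not pos)

def resolvents(c1, c2):
    """All resolvents of clauses c1 and c2: for each literal l with l in c1
    and its complement in c2, form (c1 \\ {l}) | (c2 \\ {complement(l)})."""
    return {(c1 - {l}) | (c2 - {complement(l)})
            for l in c1 if complement(l) in c2}

# The example above: {p, q, ¬r} and {q, r, s} resolve on r to {p, q, s}.
c1 = frozenset({('p', True), ('q', True), ('r', False)})
c2 = frozenset({('q', True), ('r', True), ('s', True)})
```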
The Resolution Rule
Let S be a clause set with two or more elements and let D be a resolvent
of any two clauses in S. Then S ≡ S ∪ {D}.
Proof. Let A be a truth-assignment appropriate to S. Clearly, if A
verifies S ∪ {D} then A verifies S as well. Now, suppose that A verifies
S, and let D = (C1 \ {l}) ∪ (C2 \ {¬l}), where C1, C2 ∈ S, l ∈ C1 and
¬l ∈ C2. As A cannot verify both l and ¬l, in order that A verify both C1
and C2, it must verify some literal in (C1 \ {l}) ∪ (C2 \ {¬l}), i.e.,
some literal in D.
Starting with a clause set S, the resolution rule allows us to build a new
clause set S′ by adding all resolvents to S (in the sense that S and S′
are equivalent). We can then replace S by S′ and add new resolvents again,
repeating this procedure as long as possible. We formalize this precisely
as follows.

Let S be a finite clause set with two or more elements. We define

    AddRes(S) = S ∪ {D | D is a resolvent of two clauses in S}.

Evidently, AddRes(S) ≡ S. Now, we let AddRes^0(S) be S itself, and define

    AddRes^{i+1}(S) = AddRes(AddRes^i(S)) for each i ≥ 0.

Finally, we let AddRes⋆(S) = ∪{AddRes^i(S) | i ≥ 0}.

In other words, AddRes⋆(S) is the closure of S under the operation of
adding all resolvents of clauses already present. Since each clause in S
is finite, only a finite number of clauses can be formed from the atomic
formulas appearing in S. Hence only a finite number of resolvents can
ever be formed starting from a finite S, so there exists an i ≥ 0 such
that AddRes^{i+1}(S) = AddRes^i(S). Then AddRes⋆(S) = AddRes^i(S). By
induction, it follows that S ≡ AddRes⋆(S).
Determining AddRes⋆(S) by repeated application of the resolution rule
can be used to find out whether a clause set is satisfiable. In turn, this
implies that we can have a computational procedure to determine whether a
formula is satisfiable. This is possible because of the following theorem:
Theorem 4.6.3 (The Resolution Theorem):
A clause set S is unsatisfiable if and only if □ ∈ AddRes⋆(S).
Proof. Clearly, if □ ∈ AddRes⋆(S) then AddRes⋆(S) is unsatisfiable, and so S is unsatisfiable.
To prove the converse, assume that S is unsatisfiable. By the Compactness
Theorem, some finite subset of S is unsatisfiable.
Let An (n ≥ 1) be a listing of the atomic formulas appearing in S. Let Cn
be the set of all clauses that can be constructed using only A1, A2,
. . . , An; in particular, C0 = {□}. Then there is an n > 0 such that
S ∩ Cn, and hence AddRes⋆(S) ∩ Cn, is unsatisfiable.
We now claim that for each k = n, n − 1, . . . , 0, and for each
truth-assignment 𝒜 appropriate to A1, A2, . . . , Ak, there exists some
clause C in AddRes⋆(S) ∩ Ck such that 𝒜 does not verify C. That is, for
every truth-assignment that assigns truth-values to the first k atomic
formulas, there is some clause in AddRes⋆(S) that contains only these
atomic formulas and is falsified by the truth-assignment. Since the only
clause that can be falsified when k = 0 is □, it follows that
□ ∈ AddRes⋆(S).
We apply induction to prove the above claim. For k = n, the claim is
immediate, since AddRes⋆(S) ∩ Cn is unsatisfiable.
Now suppose that the claim holds for some k + 1 but fails for k. Then
there is a truth-assignment 𝒜 appropriate to A1, A2, . . . , Ak such that
𝒜 verifies every clause in AddRes⋆(S) ∩ Ck. Let 𝒜1 and 𝒜2 be the two
truth-assignments that assign the same truth-values as 𝒜 to A1, A2,
. . . , Ak and are such that 𝒜1 verifies Ak+1 while 𝒜2 falsifies Ak+1. By
the induction hypothesis there are clauses C1, C2 ∈ AddRes⋆(S) ∩ Ck+1
such that 𝒜1 does not verify C1 and 𝒜2 does not verify C2. Now each of C1
and C2 must contain Ak+1 or ¬Ak+1, as otherwise one of them would lie in
AddRes⋆(S) ∩ Ck and would be falsified by 𝒜 (which agrees with 𝒜1 and 𝒜2
on A1, . . . , Ak). It is easy to reason that C1 must contain ¬Ak+1 but
not Ak+1 (𝒜1 verifies Ak+1 yet falsifies C1), while C2 must contain Ak+1
but not ¬Ak+1. We can therefore produce a resolvent D of C1 and C2:

    D = (C1 \ {¬Ak+1}) ∪ (C2 \ {Ak+1}).

Now D is in Ck, because all occurrences of Ak+1 and ¬Ak+1 are discarded
from C1 and C2 in forming D. Also, D is in AddRes⋆(S), because it is a
resolvent of clauses in AddRes⋆(S). Therefore D is in AddRes⋆(S) ∩ Ck.
Further, 𝒜 does not verify D; for if it did, then 𝒜 would verify some
literal in C1 \ {¬Ak+1} or in C2 \ {Ak+1}, and then 𝒜1 would verify C1 or
𝒜2 would verify C2. This is a contradiction: we assumed that 𝒜 verifies
every clause in AddRes⋆(S) ∩ Ck, yet D ∈ AddRes⋆(S) ∩ Ck and 𝒜 does not
verify D. This completes the induction and therefore the proof.
The resolution theorem can be used to determine the satisfiability of a
given formula. The following brute-force procedure does precisely this:

Step 1: Convert the given formula into a CNF.
Step 2: From the CNF formula, get the corresponding clause set, say S.
Step 3: Compute AddRes^1(S), AddRes^2(S), . . . , until
        AddRes^{i+1}(S) = AddRes^i(S) for some i ≥ 1.
Step 4: If □ ∈ AddRes^i(S), then conclude that S is unsatisfiable;
        otherwise S is satisfiable.
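Steps 3 and 4 amount to saturating S under resolution and watching for the empty clause. A self-contained sketch (names and helper are ours; frozenset() stands for the empty clause □):

```python
def resolvents(c1, c2):
    """All resolvents of two clauses (literals are (name, polarity) pairs)."""
    return {(c1 - {l}) | (c2 - {(l[0], not l[1])})
            for l in c1 if (l[0], not l[1]) in c2}

def unsatisfiable(S):
    """Saturate S under resolution; report whether □ is ever derived."""
    S = set(S)
    while True:
        new = {d for c1 in S for c2 in S for d in resolvents(c1, c2)} - S
        if frozenset() in S or frozenset() in new:
            return True                 # □ ∈ AddRes*(S): unsatisfiable
        if not new:                     # saturated without deriving □
            return False
        S |= new

def clause(*lits):
    """Helper: clause('p', '-q') -> frozenset({('p', True), ('q', False)})."""
    return frozenset((l.lstrip('-'), not l.startswith('-')) for l in lits)
```

Usage: `unsatisfiable({clause('p'), clause('-p')})` returns True, since {p} and {¬p} resolve to □, while a satisfiable set saturates without producing □.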
Strictly speaking, not all the intermediate resolvents in AddRes⋆(S) are
really needed to check whether □ ∈ AddRes⋆(S). The following example is
illustrative.
Example 4.6.4:
Show that the following formula is unsatisfiable:

    ((A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C)) ∧ ((¬A ∧ ¬B) ∨ (¬A ∧ ¬C) ∨ (¬B ∧ ¬C))

The given formula is equivalent to the CNF whose clause set is

    {{A, B}, {A, C}, {B, C}, {¬A, ¬B}, {¬A, ¬C}, {¬B, ¬C}}.

Starting with this clause set, the following derivation records just the
intermediate resolvents needed to show that □ belongs to AddRes⋆(S):

    {A, C} and {¬A, ¬B} give {C, ¬B}
    {C, ¬B} and {B, C} give {C}
    {A, B} and {¬A, ¬C} give {B, ¬C}
    {B, ¬C} and {¬B, ¬C} give {¬C}
    {C} and {¬C} give □

The following exercise gives an important result where the resolution
technique holds the key.
4.7 Predicate Calculus – Basic Ideas
Formulas in propositional calculus are finite and are limited in expressive
power. There is no way of making an assertion in a single formula that
covers infinite similar cases. For example, even certain simple assertions in
Mathematics do not fit into the language of propositional calculus. Thus
assertions such as, “x = 3” , “y ≥ z” and “x + y > z” are not propositions
as truth-values ⊤ or ⊥ cannot be assigned to them. However, if integer
values are assigned to the variables x, y, z in the above assertions then each
assertion becomes a proposition. Similarly it is possible to consider sentences
in English where pronouns and improper nouns act as variables, as in the
assertions, “He is tall and blonde” (x is tall and blonde) and “She does not
smoke” (y does not smoke). These assertions can be regarded as “templates”
or “patterns”, expressing relationships between objects; such templates
are technically called “predicates”. Using a uniform notation, we can write the
above assertions as,
EQ(x, 3), GTE(y, z), SGT(x, y, z), TNB(x), DOSNSMOKE(y).
We further note that in the above, we implicitly assume that x, y, z are
integers in one case whereas in another, we assume x, y to be the persons (or
names of persons). In Predicate Calculus, we make general statements about
objects in a fixed set, called the Universe. In an assertion like P (x, y), P is
called the Predicate sign and x and y are called variables. More technically,
P (x, y) is simply written as Pxy. An assertion such as P (x1, . . . , xn), where
x1, . . . , xn are variables is said to be an n-place predicate. Note that if P is
an n-place predicate constant and values C1, . . . , Cn from the Universe are
assigned to each of the individual variables, the result is a proposition.
Predicates are commonly used in control statements in programming lan-
guages. For example, consider the statement in a Pascal-like language,
if ( x > 3 ) then goto L ;
During program execution, when the if- statement is encountered, the cur-
rent value of x is substituted to determine whether (x > 3) evaluates to true
or false (and then the program control is transferred conditionally).
Restricting the Universe of discourse to positive integers, now consider
the assertions,
“For all even integers x, we have x > 1”
“If x is an even integer, there exist integers y, z such that x = y + z”.
Both the above assertions are true and hence these are propositions. To
succinctly express the above, we need two special symbols ∀ (read for-all)
and ∃ (read there-exists) respectively known as universal quantifier and ex-
istential quantifier . We can now write the above propositions as,
    ∀x((x mod 2 = 0) ⇒ (x > 1))

    ∀x∃y∃z((x mod 2 = 0) ⇒ (y + z = x))
Taking mankind as the universe of discourse, if Lxy is used to denote “x
loves y” then the formula, ∀x(Lxx ⇒ ∃yLxy) can be interpreted to mean
the statement, “Anyone who loves himself loves someone”.
Let M(x) stand for “x is a man”; let T(y) stand for “y is a truck”; and
let D(x, y) stand for “x drives y”. Consider the following formula in
predicate calculus:

    ∀x(M(x) ⇒ ∃y(T(y) ∧ D(x, y)))
—this says, “for all x, if x is a man then there exists a y such that y is a
truck and x drives y.” In other words, it says “every man drives at least one
truck”. The following formula,
    ∀y(T(y) ⇒ ∃x(M(x) ∧ D(x, y)))
says that every truck is driven by at least one man. We remark that
parentheses add clarity. For example, the formula ¬T(y) ∧ D(x, y) can be
interpreted in two ways. It can denote (¬T(y)) ∧ D(x, y), which means “y
is not a truck and x drives y”. Alternately, it can denote
¬(T(y) ∧ D(x, y)), which means “y is not a truck that x drives”.
More generally, we can make statements about relations that are not
explicitly specified. For example, ∀x∀y(P(x, y) ⇔ P(y, x)) simply states
that P is a symmetric relation; the formula ∃x¬P(x, x) can be interpreted
to mean that P is not reflexive.
Apart from predicate signs and variables, predicate calculus also allows
function signs. Thus, if f is a function sign corresponding to a binary
function f(x, y), where x and y are variables (denoting objects in a
fixed universe), then ∀x∀yPfxyfyfyx, which may be informally written as

    ∀x∀yP(f(x, y), f(y, f(y, x))),

is a legal formula.
Formulas in Predicate Calculus turn out to be true or false depending on
the interpretation of predicates and function signs. Thus ∃x¬Pxx is true if
and only if P is interpreted as a binary relation that is not reflexive. Also,
the formula ∀x∀y∀z((Pxy ∧ Pyz) ⇒ Pxz) is true if and only if P is
interpreted as a transitive relation. In the universe of discourse U, for
any predicate P and for any element m ∈ U, the formula (∀xP(x)) ⇒ P(m)
is always true.
In the sequel, we will prefer to use the informal style of writing the formulas
to provide some clarity.
4.8 Formulas in Predicate Calculus
We now define the syntax of formulas in predicate calculus.
1. We use P,Q,R, . . . to denote predicate signs; f, g, h . . . to denote func-
tion signs; and x, y, z, . . . to denote variables. A 0-place function sign
stands for a specific element in the Universe; we call these constant
signs and denote them by a, b, c, . . .
2. We next define terms inductively as:
(a) Every variable is a term.
(b) If f is an n-place function sign and t1, . . . , tn are terms, then
f(t1, . . . , tn) is also a term.
3. Atomic formulas are defined as:
If P is an n-place predicate sign and t1, . . . , tn are terms, then P (t1, . . . , tn)
is an atomic formula.
4. Formulas are defined as follows:
(a) Atomic formulas are formulas.
(b) If F and G are any formulas, then so are (F ∨G), (F ∧G) and ¬F .
(c) If F is any formula and x is a variable then ∀xF and ∃xF are
formulas.
A subformula of a formula is a substring of that formula which is itself a
formula. The matrix of a formula is the result of deleting all occurrences
of quantifiers and the variables immediately following those occurrences of
quantifiers. It is easy to see that the result of the process is a formula.
As in propositional calculus, we call (F ∨G) the disjunction of F and G;
we call (F ∧G), the conjunction of F and G; we call ¬F , the negation of F .
Also, we introduce the notions of the conditional and biconditional of two
formulas: (F ⇒ G) is used as an abbreviation for (¬F ∨ G), and (F ⇔ G)
is used as an abbreviation for ((F ⇒ G) ∧ (G ⇒ F)).
The scope of an occurrence of the negation sign or a quantifier in a
formula is another syntactic notion. Any occurrence of ¬ or ∀ or ∃ in a
formula F refers to, or embraces, a particular subformula of F. The scope
of an occurrence of ¬ in F is that uniquely determined subformula G such
that the occurrence of ¬ is the leftmost symbol of ¬G. Similarly, the
scope of an occurrence of ∀ or ∃ is that unique formula G such that for
some variable x, the occurrence of ∀ or ∃ is the leftmost symbol of ∀xG
or ∃xG. Consider the formula (∀xP(x, y)) ∨ (∃xQ(x) ∧ ∃x¬Q(x)). The scope
of ∀ is P(x, y). The scope of the first occurrence of ∃ is Q(x) and that
of the second occurrence of ∃ is ¬Q(x). The scope of the occurrence of ¬
is Q(x).
4.8.1 Free and bound Variables
An occurrence of a variable in a formula is said to be free if it is governed
by no quantifier (in that formula) containing that variable. Otherwise, the
variable is said to be bound. We illustrate this idea in an informal way.
Consider the predicate ∀i(x ∗ i > 0), where i is specified to be an integer
between m and n (m < n), and x is an integer. The predicate is true if both x
and m are greater than 0 or if x and n are less than 0. Hence it is equivalent
to the predicate,
    (x > 0 ∧ m > 0) ∨ (x < 0 ∧ n < 0)

Thus the truth-value of the predicate depends on the values of m, n and
x, but not on the value of i. It is obvious that the truth-value of the
predicate does not change if all occurrences of i are replaced by a new
variable, say j. The variable i is bound to the quantifier ∀ in the
predicate. The variables m, n and x are free in the predicate. A
predicate such as (i > 0) ∧ (∀i(x ∗ i > 0)) can be confusing. In such
cases we rewrite the predicate to remove the ambiguity. For example, we
can rewrite this last predicate as (i > 0) ∧ (∀j(x ∗ j > 0)).
More precisely, the free variables of a formula are defined inductively as follows:
(a) The free variables of an atomic formula are all the variables occurring in
it.
(b) The free variables of the formulas (F ∨ G) and (F ∧ G) are the free
    variables of F together with the free variables of G; the free
    variables of ¬F are the free variables of F.
(c) The free variables of ∀xF and ∃xF are the free variables of F , except for
x (if x happens to be a free variable in F ).
When there are no free occurrences of any variable in a formula, the formula
is called a closed formula or sentence.
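Clauses (a)–(c) are a structural recursion and translate directly into code. A sketch with our own formula encoding (not the text's): terms are ('var', x) or ('fun', f, t1, …, tn); formulas are ('pred', P, t1, …), ('or'/'and', F, G), ('not', F), and ('all'/'ex', x, F).

```python
def term_vars(t):
    """Variables occurring in a term."""
    if t[0] == 'var':
        return {t[1]}
    out = set()            # ('fun', f, t1, ..., tn); 0-place: a constant sign
    for s in t[2:]:
        out |= term_vars(s)
    return out

def free_vars(F):
    """Free variables of a formula, per clauses (a)-(c)."""
    op = F[0]
    if op == 'pred':                        # (a) all variables occurring
        out = set()
        for t in F[2:]:
            out |= term_vars(t)
        return out
    if op in ('or', 'and'):                 # (b)
        return free_vars(F[1]) | free_vars(F[2])
    if op == 'not':                         # (b)
        return free_vars(F[1])
    if op in ('all', 'ex'):                 # (c) quantified variable removed
        return free_vars(F[2]) - {F[1]}

# (∀x P(x, y)) ∨ (∃x Q(x) ∧ ∃x ¬Q(x)) has y as its only free variable.
F = ('or',
     ('all', 'x', ('pred', 'P', ('var', 'x'), ('var', 'y'))),
     ('and', ('ex', 'x', ('pred', 'Q', ('var', 'x'))),
             ('ex', 'x', ('not', ('pred', 'Q', ('var', 'x'))))))
```

A formula for which this function returns the empty set is a closed formula (sentence).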
4.9 Interpretation of Formulas of Predicate
Calculus
A formula in predicate calculus can be considered to be true or false depend-
ing on the interpretation. Any interpretation for a formula must contain
sufficient information to determine if the formula is true or not. This point
can be understood by considering a statement such as “all students have
passed” (may be written as ∀sP (s), where s is a variable denoting a stu-
dent, P is the predicate “have passed”). To decide if this statement is true
we need to know who the students are, i.e., we need to know a universe of
discourse. Also, we need to know who has passed and who has failed; that
is, we need some interpretation of the predicate “have passed”. More
generally, we must have interpretations for predicate and function signs
as relations and functions respectively. Also, we may have to interpret
some of the variables as
particular objects. To do this, we first define the notion of a structure.
4.9.1 Structures
A structure A is a pair ([A], I_A), where [A] is any nonempty set, called
the universe of A, and I_A is a function whose domain is a set of
predicate and function signs. Specifically:

- if P is an n-place predicate sign in the domain of I_A, then I_A(P) is
  an n-ary relation on [A];

- if f is an n-place function sign in the domain of I_A, then I_A(f) is a
  function from [A]^n to [A].

We also write P^A for I_A(P) and f^A for I_A(f). In the case the domain
of I_A is finite, say {P1, . . . , Pm, f1, . . . , fn}, we also write A as
the (m + n + 1)-tuple

    ([A], P1^A, . . . , Pm^A, f1^A, . . . , fn^A).
If A is a structure and F is a formula such that each predicate sign and
function sign of F is assigned a value by I_A, then A is said to be
appropriate to F.
Example 4.9.1:
Let P be a 2-place predicate and f be a 1-place function sign and let F be
the formula ∀xP (x, f(x)).
The structure A = ([A], P^A, f^A) defined as follows is appropriate to F:

    [A] = {0, 1, 2, . . .}, the set of natural numbers N
    P^A = {(m, n) | m, n ∈ N and m < n}
    f^A is the successor function, i.e., f^A(n) = n + 1 for each n ∈ N.

We regard F as true in the structure A since every number is less than
its successor. If we define a new structure B which is the same as A
except that P^B = {(m, n) | m, n ∈ N and m > n}, then F is false in B.
The formula P (f(x), y) cannot be regarded as true or false in A or in B
without knowing what x and y are.
The next subsection formalizes these ideas.
4.9.2 Truth Values of formulas in Predicate Calculus
Let F be a given formula and let A be a structure appropriate to F . Let ξ
be a function with a domain that includes all variables of F and with a range
that is a subset of [A]. We then say ξ is appropriate to F and A.
Let t be a term and G a formula that can be constructed using the
variables, predicate signs and function signs of F. Now, given F, A and
ξ, for each such t or G, we define A(t)ξ in [A] or A(G)ξ in {⊤, ⊥} as
follows:
1. (a) If x is a variable of F, then A(x)ξ = ξ(x).

   (b) If t1, . . . , tn are terms and f is an n-place function sign of F,
       then

           A(f(t1, . . . , tn))ξ = f^A(A(t1)ξ, . . . , A(tn)ξ).

       In other words, A(f(t1, . . . , tn))ξ is calculated by first
       finding A(t1)ξ, . . . , A(tn)ξ, which are members of [A], and then
       applying to these values the function f^A : [A]^n → [A].

We require one more auxiliary definition. If ξ is as described above, x
is a variable of F and a is a member of [A], then we take ξ[x/a] to be
the function ξ′ which is identical to ξ except that ξ′(x) = a (whatever
ξ(x) might be). We now define the value of A(G)ξ by cases. We let G and H
be formulas that can be constructed using the variables, predicate signs
and function signs of F.

2. (a) If t1, . . . , tn are terms and P is an n-place predicate sign of
       F, then A(P(t1, . . . , tn))ξ = ⊤ if
       (A(t1)ξ, . . . , A(tn)ξ) ∈ P^A, and ⊥ otherwise.

   (b) A((G ∨ H))ξ = ⊤ if A(G)ξ = ⊤ or A(H)ξ = ⊤, and ⊥ otherwise.

   (c) A((G ∧ H))ξ = ⊤ if A(G)ξ = ⊤ and A(H)ξ = ⊤, and ⊥ otherwise.

   (d) A(¬G)ξ = ⊤ if A(G)ξ = ⊥, and ⊥ otherwise.

   (e) A(∀xG)ξ = ⊤ if A(G)ξ[x/a] = ⊤ for each a ∈ [A], and ⊥ otherwise.

   (f) A(∃xG)ξ = ⊤ if A(G)ξ[x/a] = ⊤ for some a ∈ [A], and ⊥ otherwise.
This completes the task of evaluating the truth-values of formulas in
predicate calculus. We write A ⊨ξ G if and only if A(G)ξ = ⊤.
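For a finite universe, clauses 1 and 2 above can be executed directly; over an infinite universe such as N, as in the examples that follow, the ∀ and ∃ clauses cannot be checked exhaustively. A sketch (representation and names are ours): a structure is a dict holding the universe, relations P^A as sets of tuples, and functions f^A as Python functions; ξ is a dict, and ξ[x/a] is the updated dict {**xi, x: a}.

```python
def eval_term(A, xi, t):
    """A(t)ξ: clauses 1(a) and 1(b)."""
    if t[0] == 'var':
        return xi[t[1]]
    f = A['fun'][t[1]]
    return f(*(eval_term(A, xi, s) for s in t[2:]))

def eval_formula(A, xi, G):
    """A(G)ξ: clauses 2(a)-(f), with True for ⊤ and False for ⊥."""
    op = G[0]
    if op == 'pred':
        return tuple(eval_term(A, xi, t) for t in G[2:]) in A['pred'][G[1]]
    if op == 'or':
        return eval_formula(A, xi, G[1]) or eval_formula(A, xi, G[2])
    if op == 'and':
        return eval_formula(A, xi, G[1]) and eval_formula(A, xi, G[2])
    if op == 'not':
        return not eval_formula(A, xi, G[1])
    if op == 'all':      # ξ[x/a] for each a in the universe
        return all(eval_formula(A, {**xi, G[1]: a}, G[2]) for a in A['univ'])
    if op == 'ex':
        return any(eval_formula(A, {**xi, G[1]: a}, G[2]) for a in A['univ'])

# A finite cousin of Example 4.9.1: universe {0,...,9}, P^A the relation <,
# f^A the successor function (so ∀x P(x, f(x)) now fails at x = 9).
A = {'univ': range(10),
     'pred': {'P': {(m, n) for m in range(10) for n in range(10) if m < n}},
     'fun': {'f': lambda n: n + 1}}
```

With ξ = {'x': 1, 'y': 2}, evaluating P(x, f(y)) in this structure reproduces the computation of Example 4.9.2: A(f(y))ξ = 3 and (1, 3) ∈ P^A.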
Example 4.9.2:
Let us consider the structure A of Example 4.9.1 and the formula P (x, f(y)).
Let ξ be the function from {x, y} to [A] = N such that

    ξ(x) = 1,  ξ(y) = 2.

Now A(y)ξ = ξ(y) = 2 and A(f(y))ξ = f^A(A(y)ξ) = f^A(2) = 2 + 1 = 3.
Next, A(P(x, f(y)))ξ = ⊤ if and only if (A(x)ξ, A(f(y))ξ) ∈ P^A, that is,
if and only if (1, 3) ∈ P^A, which is indeed true. Hence
A ⊨ξ P(x, f(y)).
Example 4.9.3:
For the structure A and the function ξ as in Example 4.9.2, let us
consider the formula ∀xP(x, f(x)). By definition, A(∀xP(x, f(x)))ξ = ⊤
if and only if A(P(x, f(x)))ξ[x/a] = ⊤ for each a ∈ [A] = N.

Now A(P(x, f(x)))ξ[x/a] = ⊤ if and only if
(A(x)ξ[x/a], A(f(x))ξ[x/a]) ∈ P^A. We see that A(x)ξ[x/a] = a and
A(f(x))ξ[x/a] = f^A(A(x)ξ[x/a]) = f^A(a) = a + 1. Since (a, a + 1) ∈ P^A
for each a ∈ N, we conclude A ⊨ξ ∀xP(x, f(x)).
Example 4.9.4:
Let L be a 2-place predicate sign and f a 2-place function sign. Let x
and y be variables. Consider the structure A, where

    [A] = N
    L^A = {(m, n) | m < n, m, n ∈ N}
    f^A(m, n) = m + n.

Let ξ be the function from {x, y} to N such that ξ(x) = 5, ξ(y) = 2.
Consider the formula ∀x∀yL(x, f(x, y)). Now

    A(∀x∀yL(x, f(x, y)))ξ = ⊤

if and only if

    A(∀yL(x, f(x, y)))ξ[x/a] = ⊤ for each a ∈ [A],

i.e., if and only if

    A(L(x, f(x, y)))ξ[x/a][y/b] = ⊤ for each a, b ∈ [A] = N,

i.e., if and only if

    (A(x)ξ[x/a][y/b], A(f(x, y))ξ[x/a][y/b]) ∈ L^A for each a, b ∈ N,

i.e., if and only if

    (a, f^A(A(x)ξ[x/a][y/b], A(y)ξ[x/a][y/b])) ∈ L^A for each a, b ∈ N,

i.e., if and only if

    (a, f^A(a, b)) ∈ L^A for each a, b ∈ N,

i.e., if and only if

    (a, a + b) ∈ L^A for each a, b ∈ N.

But for a = b = 0 we have (a, a + b) = (0, 0) ∉ L^A. Therefore
A ⊭ξ ∀x∀yL(x, f(x, y)).
It is easy to reason that if x has no free occurrences in F, then the
value of A(F)ξ is independent of the value of ξ(x).

If F is a closed formula, then A ⊨ξ F for some appropriate ξ if and only
if A ⊨ξ F for every appropriate ξ. In this case we simply write A ⊨ F
and we say A is a model for F. If A ⊨ F for every A appropriate to F,
then F is said to be valid. Valid formulas in predicate calculus play the
same role as tautologies in propositional calculus. A closed formula is
said to be satisfiable if it has at least one model; otherwise it is
unsatisfiable.
We remark that there is no method similar to the “truth-table” approach
for checking whether a formula in predicate calculus is satisfiable.
4.10 Equivalence of Formulas in Predicate Cal-
culus
Two given formulas F and G are equivalent if and only if, for every
structure A and function ξ appropriate to both F and G, A(F)ξ = A(G)ξ.
As in propositional calculus, we write F ≡ G if F and G are equivalent.
It should be obvious that ≡ is an equivalence relation on the set of sen-
tences. All the laws of equivalence in propositional calculus (seen earlier)
such as the associativity and commutativity laws for ∧ and ∨, de Morgan’s laws,
the law of double negation etc. continue to hold good in predicate calculus
also. In addition, we also have the following important equivalences.
Lemma 4.10.1:

(a) For any formula F and variable x,

    ¬∀xF ≡ ∃x¬F
    ¬∃xF ≡ ∀x¬F

(b) For any formulas F and G and variable x such that x has no free
occurrence in G, we have

    (∀xF ∨ G) ≡ ∀x(F ∨ G)
    (∀xF ∧ G) ≡ ∀x(F ∧ G)
    (∃xF ∨ G) ≡ ∃x(F ∨ G)
    (∃xF ∧ G) ≡ ∃x(F ∧ G)

(c) For any formulas F and G and any variable x,

    ∀x(F ∧ G) ≡ (∀xF ∧ ∀xG)
    ∃x(F ∨ G) ≡ (∃xF ∨ ∃xG)
Proof. (a) We prove that ¬∀xF ≡ ∃x¬F . The other equivalence can be
proved in a similar way.
    A(¬∀xF)ξ = ⊤
    if and only if A(∀xF)ξ = ⊥,
    i.e., if and only if A(F)ξ[x/a] = ⊥ for some a ∈ [A],
    i.e., if and only if A(¬F)ξ[x/a] = ⊤ for some a ∈ [A],
    i.e., if and only if A(∃x¬F)ξ = ⊤.
(b) Consider the equivalence (∀xF ∨ G) ≡ ∀x(F ∨ G). Now

    A((∀xF ∨ G))ξ = ⊤
    if and only if A(∀xF)ξ = ⊤ or A(G)ξ = ⊤,
    i.e., if and only if A(F)ξ[x/a] = ⊤ for each a ∈ [A], or A(G)ξ = ⊤,
    i.e., if and only if, for each a ∈ [A],
        either A(F)ξ[x/a] = ⊤ or A(G)ξ[x/a] = ⊤
        (since x has no free occurrence in G, A(G)ξ[x/a] = A(G)ξ),
    i.e., if and only if A((F ∨ G))ξ[x/a] = ⊤ for each a ∈ [A],
    i.e., if and only if A(∀x(F ∨ G))ξ = ⊤.

The other equivalences may be proved along similar lines.
(c) We consider the first equivalence ∀x(F ∧ G) ≡ (∀xF ∧ ∀xG). Now

    A(∀x(F ∧ G))ξ = ⊤
    if and only if A((F ∧ G))ξ[x/a] = ⊤ for each a ∈ [A],
    i.e., if and only if A(F)ξ[x/a] = ⊤ and A(G)ξ[x/a] = ⊤,
        for each a ∈ [A],
    i.e., if and only if A(F)ξ[x/a] = ⊤ for each a ∈ [A], and
        A(G)ξ[x/a] = ⊤ for each a ∈ [A],
    i.e., if and only if A(∀xF)ξ = ⊤ and A(∀xG)ξ = ⊤,
    i.e., if and only if A((∀xF ∧ ∀xG))ξ = ⊤.

The other equivalence may be proved similarly.
The other equivalence may be proved similarly.
4.11 Prenex Normal Form
A formula is in prenex form (or prenex normal form) if and only if all its
quantifiers (if any) occur at the extreme left, without intervening
parentheses. A prenex form thus looks like

    Q1v1 . . . QnvnG

where G is a quantifier-free formula and each Qi is either ∀ or ∃.
The formula F1 = (∃xPx ⇒ Py) is not in prenex form. The equivalent
formula F2 = ∀x(Px⇒ Py) is a prenex form. Given any formula that is not
in prenex form, we can systematically use the previous lemma to successively
move quantifiers to the left (in a sequence of logically equivalent formulas)
until finally a prenex formula is obtained. In this process, it may be necessary
to rename variables so that no variable is both free and bound in the same
formula. Thus, to transform any formula into an equivalent prenex form, we
carry out the following steps:
Step 1 Rename the variables, if necessary, so that no variable is both free and
bound and so that there is at most one occurrence of a quantifier with
any particular variable (we get what is called a rectified formula).
Step 2 Apply the equivalences of Lemma 4.10.1 to move the quantifiers to
the left end.
It is easy to see that a given formula may have different prenex forms.
Example 4.11.1:
We wish to get the prenex form of the formula
(¬∀xP (x, y) ∨ ∀xR(x, y))
The given formula is equivalent to the rectified formula (¬∀xP (x, y)∨∀zR(z, y)).
This formula is equivalent to (∃x¬P (x, y) ∨ ∀zR(z, y)) which is of the form
(∃xF ∨ G) with x having no free occurrence in G, which is equivalent to
∃x(F ∨G). So the formula is equivalent to ∃x(¬P (x, y) ∨ ∀zR(z, y)) ≡
∃x(∀zR(z, y) ∨ ¬P (x, y)) ≡ ∃x∀z(R(z, y) ∨ ¬P (x, y)), which is in prenex
form.
4.12 The Expansion Theorem
Given a formula F in predicate calculus, it can be reduced in a systematic
way to a countable set of formulas without quantifiers or variables. The set
of formulas can therefore be regarded as formulas in propositional calculus.
That is, for each F we can generate E(F ), the collection of quantifier-free
formulas. E(F ) is known as the Herbrand expansion of F . The way to obtain
E(F ) from F is best illustrated by considering an example.
Let F = (∀ y ∃xP (y, x) ∧ ∃ z ∀w¬P (w, z)). Let A be a structure ap-
propriate to F . In the Herbrand expansion, for A to be a model for F we
prescribe values of existentially quantified variables corresponding to various
ways of substituting values for the universally quantified variables so that the
matrix turns out to be true in each case. For the above case, we first select a
1-place function sign f and a 0-place function sign a. We next replace F by
its functional form,
(∀y P (y, f(y)) ∧ ∀w ¬P (w, a))
Here f(y) denotes a choice for the object x corresponding to y such that
P (y, x) holds. For each y, there can be many possible x—we simply say
that there must be at least one choice for x. Similarly a stands for some
fixed object such that ¬P (w, a) holds, whatever be the value of w. To make
P (y, x) true we can also take y to be the object f(a) itself and take x to be
f(f(a)), and so on.
The Herbrand Universe of F is the set of terms that can be formed from
a and f , namely,
a, f(a), f(f(a)), . . .
It turns out that in order to test whether F is satisfiable, it is sufficient to
consider the functional form and to consider values for y and w drawn from
the Herbrand universe. Moreover, it is not necessary to consider arbitrary
interpretations for the function signs f and a. It suffices to interpret f
syntactically; that is, f is simply considered as a function from the Herbrand
Universe to itself.
To get the Herbrand expansion of F , we first obtain the matrix F ∗ of the
functional form of F . For the above example,
F ∗ = P (y, f(y)) ∧ ¬P (w, a)
By substituting different values for y, we get the Herbrand expansion as
follows:
P (a, f(a)) ∧ ¬P (a, a) = F ∗[y/a, w/a]
P (f(a), f(f(a))) ∧ ¬P (a, a) = F ∗[y/f(a), w/a]
P (a, f(a)) ∧ ¬P (f(a), a) = F ∗[y/a, w/f(a)]
P (f(a), f(f(a))) ∧ ¬P (f(a), a) = F ∗[y/f(a), w/f(a)]
...
Let us take the formula G = (∀y ∃xP (y, x) ∧ ∃z ∀w¬P (z, w)). The cor-
responding functional form is ∀y P (y, f(y)) ∧ ∀w¬P (a, w). We have the
Herbrand expansion as,
P (a, f(a)) ∧ ¬P (a, a)
P (f(a), f(f(a))) ∧ ¬P (a, a)
P (a, f(a)) ∧ ¬P (a, f(a))
...
We can see that G is unsatisfiable because in the Herbrand expansion both
P (a, f(a)) and ¬P (a, f(a)) must be true.
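The clash exhibited above can be found mechanically. The sketch below (our own ad hoc string encoding of Herbrand terms and atoms, not a construction from the text) generates the expansion of the functional form of G over terms of bounded nesting depth and reports whether some atom is forced both true and false.

```python
# Sketch: generate the Herbrand expansion of the functional form of G over
# terms of bounded nesting depth and look for an atom forced both true and
# false (the string encoding of terms and atoms is our own).
def herbrand_terms(depth):
    # Herbrand universe up to depth `depth`: a, f(a), f(f(a)), ...
    terms = ["a"]
    for _ in range(depth):
        terms.append("f(" + terms[-1] + ")")
    return terms

def expansion(terms):
    # matrix of the functional form of G:  P(y, f(y)) ∧ ¬P(a, w)
    lits = [(("P", y, "f(" + y + ")"), True) for y in terms]   # P(y, f(y))
    lits += [(("P", "a", w), False) for w in terms]            # ¬P(a, w)
    return lits

forced = {}
contradiction = False
for atom, value in expansion(herbrand_terms(2)):
    if forced.get(atom, value) != value:
        contradiction = True
    forced[atom] = value

print("G unsatisfiable (clash found):", contradiction)
```

Bounding the depth matters: the expansion is infinite, so an implementation can only ever inspect finite portions of it, exactly as in the semi-decision procedure described below.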
We state the main result of this section without proof.
Theorem 4.12.1 (The Expansion Theorem):
A closed formula is satisfiable if and only if its Herbrand expansion is satis-
fiable.
The expansion theorem gives a procedure to detect unsatisfiable formulas
in predicate calculus. A formula is unsatisfiable if and only if its Herbrand
expansion is unsatisfiable. By the compactness theorem for propositional
calculus, the expansion is unsatisfiable if and only if some finite subset of it is
unsatisfiable. These two facts together suggest the following computational
procedure for testing unsatisfiability.
Generate the Herbrand expansion in small portions in some systematic
way. At periodic intervals stop and test (using truth-tables for example)
whether the portion generated is unsatisfiable. If the original formula is
unsatisfiable, then this fact will be discovered at some point. However, if the
original formula is satisfiable, this procedure may not halt.
Exercises
1. The Sheffer stroke ↑ (called the nand operator in Computer Science)
and the Peirce arrow ↓ (called the nor operator in Computer Science)
are defined by the following truth tables:
p q p ↑ q p q p ↓ q
⊥ ⊥ ⊤ ⊥ ⊥ ⊤
⊥ ⊤ ⊤ ⊥ ⊤ ⊥
⊤ ⊥ ⊤ ⊤ ⊥ ⊥
⊤ ⊤ ⊥ ⊤ ⊤ ⊥
Express ¬ p, p ∧ q and p ∨ q in terms of ↑ and ↓ operators.
2. If P = ((A ∧B) ∨ (B ∧ C) ∨ (A ∧ C)), then show that a DNF of ¬P
is ((¬A ∧ ¬B) ∨ (¬B ∧ ¬C) ∨ (¬A ∧ ¬C)).
3. Show that:
(a) (P ∨ (Q ∧R)) ≡ ((P ∨Q) ∧ (P ∨R))
(b) ¬ (P ∨Q) ≡ (¬P ∧ ¬Q)
(c) (P ∨Q) ≡ Q, if P is unsatisfiable.
4. Show that the formula ((a ∨ b ∨ c) ∧ (c ∨ ¬a)) is equivalent to the
formula (c ∨ (b ∧ ¬a)).
5. Let S be a finite clause set such that |C| ≤ 2 for each C ∈ S. Show that
the resolution technique provides a polynomial-time decision procedure
for determining the satisfiability of S.
6. Prove the following equivalences:
(i) ∃x(Px⇒ ∀xPx) ≡ ∃x∀y(Px⇒ Py)
(ii) ∃x(∃xPx⇒ Px) ≡ ∃x∀y(Py ⇒ Px)
7. Obtain the prenex normal form of
∀x((∃yR(x, y) ∧ ∀y ¬S(x, y)) ⇒ ¬(∃yR(x, y) ∧ P ))
8. Show that parts 4, 5 and 6 of Definition 4.2.6 are direct consequences
of the other parts.
Chapter 5
Algebraic Structures
5.1 Introduction
In this chapter, the basic properties of the fundamental algebraic structures,
namely, matrices, groups, rings, vector spaces and fields, are presented. In
addition, the basic properties of finite fields, which are fundamental to finite
geometry, coding theory and cryptography, are also discussed.
5.2 Matrices
A complex matrix A of type (m,n) or an m by n complex matrix is an
arrangement of mn complex numbers in m rows and n columns in the form:
A =
a11 a12 . . . a1n
a21 a22 . . . a2n
. . . . . . . . . . .
am1 am2 . . . amn
.
A is usually written in the shortened form A = (aij), where 1 ≤ i ≤ m and
1 ≤ j ≤ n. If m = n, A is a square matrix of order n. aij is the (i, j)-th entry
of A. All the matrices that we consider in this section are complex matrices.
5.3 Addition, Scalar Multiplication and Mul-
tiplication of Matrices
If A = (aij) and B = (bij) are two m by n matrices, then A + B is the
m by n matrix (aij + bij), and for a scalar (that is, a complex number) α,
αA = (αaij). Further, if A = (aij) is an m by n matrix and B = (bij) is an
n by p matrix, then the product AB is defined to be the m by p matrix (cij), where,
cij = ai1b1j + ai2b2j + · · ·+ ainbnj
= the scalar product of the i-th row vector Ri of A
and the j-th column vector C ′j of B.
Thus cij = Ri · C ′j. Both Ri and C ′j are vectors of length n. It is well-
known that the matrix product satisfies both the distributive laws and the
associative laws, namely, for matrices A, B and C,
A(B + C) = AB + AC,
(A+B)C = AC +BC, and
(AB)C = A(BC)
whenever these sums and products are defined.
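These laws are easy to confirm numerically. A minimal sketch with plain Python lists (the sample matrices are arbitrary choices of ours):

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(alpha, A):
    return [[alpha * a for a in row] for row in A]

def mat_mul(A, B):
    # c_ij is the scalar product of the i-th row of A and the j-th column of B
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]

assert scalar_mul(2, A) == mat_add(A, A)
# distributive laws and associativity
assert mat_mul(A, mat_add(B, C)) == mat_add(mat_mul(A, B), mat_mul(A, C))
assert mat_mul(mat_add(A, B), C) == mat_add(mat_mul(A, C), mat_mul(B, C))
assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))
```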
5.3.1 Transpose of a Matrix
If A = (aij) is an m by n matrix, then the n by m matrix (bij), where bij = aji,
is called the transpose of A. It is denoted by At. Thus At is obtained from
A by interchanging the row and column vectors of A. For instance, if
A =
1 2 3
4 5 6
7 8 9
, then At =
1 4 7
2 5 8
3 6 9
.
It is easy to check that
(i) (At)t = A, and
(ii) (AB)t = BtAt, whenever the product AB is defined.
5.3.2 Inverse of a Matrix
Let A = (aij) be an n by n matrix, that is, a matrix of order n. Let Aij be
the cofactor of aij in detA (= determinant of A). Then the matrix (Aij)t of
order n is called the adjoint (or adjugate) of A, and is denoted by adjA.
Theorem 5.3.1:
For any square matrix A of order n,
A(adjA) = (adjA)A = (detA)In,
where In is the identity matrix of order n. (In is a matrix of order n in which
the n diagonal entries are 1 and the remaining entries are 0).
Proof. By a property of determinants, we have
ai1Aj1 + ai2Aj2 + · · ·+ ainAjn = A1ia1j + A2ia2j + · · ·+ Anianj = detA or 0
according to whether i = j or i ≠ j. We note that in adjA, (Aj1, . . . , Ajn)
is the j-th column vector and (A1j, . . . , Anj) is the j-th row vector. Hence,
actual multiplication yields
A(adjA) = (adjA)A =
detA 0 . . . 0
0 detA . . . 0
. . . . . . . . . . . . . . .
0 0 . . . detA
= (detA)In.
Corollary 5.3.2:
Let A be a nonsingular matrix, that is, detA ≠ 0. Set A−1 = (1/ detA) adjA.
Then AA−1 = A−1A = In, where n is the order of A.
The matrix A−1, as defined in Corollary 5.3.2, is called the inverse of the
(nonsingular) matrix A. If A,B are square matrices of the same order with
AB = I, then B = A−1 and A = B−1. This is seen by premultiplying the
equation AB = I by A−1 and postmultiplying it by B−1. Note that A−1 and
B−1 exist since taking determinants of both sides of AB = I, we get
det(AB) = detA · detB = det I = 1
and hence detA ≠ 0 as well as detB ≠ 0.
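Theorem 5.3.1 and the corollary translate directly into code. The sketch below computes det and adj by cofactor expansion along the first row (plain Python; fine for small matrices, far too slow for large ones) and checks A(adjA) = (detA)I3 on the 3 by 3 matrix that appears in Exercise 3 of the next exercise set.

```python
# Sketch: det and adj via cofactor expansion, checked against Theorem 5.3.1.
def minor(A, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adj(A):
    n = len(A)
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]  # transpose of cofactors

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, -1, 0], [0, 1, -1], [1, 0, 1]]
d = det(A)                       # here det A = 2, so A is nonsingular
assert mat_mul(A, adj(A)) == [[d, 0, 0], [0, d, 0], [0, 0, d]]  # = (det A) I3
```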
5.3.3 Symmetric and Skew-symmetric matrices
A matrix A is said to be symmetric iff A = At. A is skew-symmetric iff
A = −At. Hence if A = (aij), then A is symmetric if aij = aji for all i and j; it
is skew-symmetric if aij = −aji for all i and j. Clearly, symmetric and skew-
symmetric matrices are square matrices. If A = (aij) is skew-symmetric,
then aii = −aii, and hence aii = 0 for each i. Thus in a skew-symmetric
matrix, all the diagonal entries are zero.
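A related fact (cf. Exercise 9 of Exercises 5.2 below): every real square matrix is uniquely the sum of a symmetric and a skew-symmetric matrix, via A = (A+At)/2 + (A−At)/2. A quick numeric sketch with an arbitrary sample matrix:

```python
# Sketch: split A into its symmetric part S and skew-symmetric part K.
def transpose(A):
    return [list(col) for col in zip(*A)]

def combine(A, B, sign, scale):
    return [[(a + sign * b) * scale for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]

A = [[1.0, 2.0], [4.0, 3.0]]
S = combine(A, transpose(A), +1, 0.5)   # symmetric part (A + At)/2
K = combine(A, transpose(A), -1, 0.5)   # skew-symmetric part (A - At)/2

assert S == transpose(S)                                   # S = St
assert K == [[-x for x in row] for row in transpose(K)]    # K = -Kt
assert combine(S, K, +1, 1.0) == A                         # S + K = A
assert all(K[i][i] == 0 for i in range(len(K)))            # zero diagonal
```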
5.3.4 Hermitian and Skew-Hermitian matrices
Let H = (hij) denote a complex matrix. The conjugate H̄ of H is the matrix
(h̄ij), where h̄ij is the complex conjugate of hij. The conjugate-transpose of H
is the matrix H⋆ = (H̄)t = (H̄t) = (h̄ji). H is Hermitian iff H⋆ = H; H is
skew-Hermitian iff H⋆ = −H.
For example, the matrix
H =
1 2 + 3i
2− 3i √5
is Hermitian, while the matrix
S =
−i 1 + 2i
−1 + 2i 5i
is skew-Hermitian. Note that the diagonal entries of a
skew-Hermitian matrix are purely imaginary.
5.3.5 Orthogonal and Unitary matrices
A real matrix (that is, a matrix whose entries are real numbers) P of order
n is called orthogonal if PP t = In. If PP t = In, then P t = P−1. Thus the
inverse of an orthogonal matrix is its transpose. Further as P−1P = In, we
also have P tP = In. If R1, . . . , Rn are the row vectors of P , the relation
PP t = In implies that Ri · Rj = δij, where δij = 1 if i = j, and δij = 0 if
i ≠ j. A similar statement also applies to the column vectors of P . As an
example, the matrix
cosα sinα
− sinα cosα
is orthogonal. Indeed, if (x, y) are cartesian
coordinates of a point P referred to a pair of rectangular axes and if (x′, y′) are
the coordinates of the same point P with reference to a new set of rectangular
axes got by rotating the original axes through an angle α about the origin,
then
x = x′ cosα + y′ sinα
y = −x′ sinα + y′ cosα,
that is,
x
y
=
cosα sinα
− sinα cosα
x′
y′
so that rotation is effected by an orthogonal matrix.
Again, if (l1,m1, n1), (l2,m2, n2) and (l3,m3, n3) are the direction cosines
of three mutually orthogonal directions referred to an orthogonal coordinate
system in the Euclidean 3-space, then the matrix
l1 m1 n1
l2 m2 n2
l3 m3 n3
is orthogonal.
In passing, we mention that rotation in higher dimensional Euclidean spaces
is defined by means of an orthogonal matrix.
A complex matrix U of order n is called unitary if UU⋆ = In. Again, this
means that U⋆U = In. Also a real unitary matrix is simply an orthogonal
matrix. The unit matrix is both orthogonal as well as unitary. For example,
the matrix
U = (1/5)
−1 + 2i −4− 2i
2− 4i −2− i
is unitary.
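The defining relation PPᵗ = In for the rotation matrix above can be confirmed numerically. A sketch (the angle 0.7 is an arbitrary choice, and floating-point equality is tested up to a tolerance):

```python
import math

# Sketch: check P Pt = I2 for the 2-by-2 rotation matrix.
def transpose(A):
    return [list(r) for r in zip(*A)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = 0.7
P = [[math.cos(a), math.sin(a)],
     [-math.sin(a), math.cos(a)]]

PPt = mat_mul(P, transpose(P))
assert all(abs(PPt[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
# hence P's inverse is its transpose, and the rows satisfy Ri . Rj = delta_ij
```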
Exercises 5.2
1. If A =
3 −4
1 −1
, prove by induction that Ak =
1 + 2k −4k
k 1− 2k
for any positive integer k.
2. If M =
cosα sinα
− sinα cosα
, prove that Mn =
cosnα sinnα
− sinnα cosnα
, n ∈ N.
3. Compute the transpose, adjoint and inverse of the matrix
1 −1 0
0 1 −1
1 0 1
.
4. If A =
1 3
−2 2
, show that A2 − 3A+ 8I = 0. Hence compute A−1.
5. Give two matrices A and B of order 2, so that
(i) AB ≠ BA
(ii) (AB)t ≠ AB
6. Prove: (i) (AB)t = BtAt; (ii) If A and B are nonsingular, (AB)−1 =
B−1A−1.
7. Prove that the product of two symmetric matrices is symmetric iff the
two matrices commute.
8. Prove: (i) (iA)⋆ = −iA⋆ (ii) H is Hermitian iff iH is skew-Hermitian.
9. Show that every real matrix is the unique sum of a symmetric matrix
and a skew-symmetric matrix.
10. Show that every complex matrix is the unique sum of a Hermitian and
a skew-Hermitian matrix.
5.4 Groups
Groups constitute an important basic algebraic structure that occurs very nat-
urally not only in mathematics but also in many other fields such as physics
and chemistry. In this section, we present the basic properties of groups. In
particular, we discuss abelian and nonabelian groups, cyclic groups, permuta-
tion groups and homomorphisms and isomorphisms of groups. We establish
Lagrange’s theorem for finite groups and the basic isomorphism theorem for
groups.
Abelian and Nonabelian Groups
Definition 5.4.1:
A binary operation · on a nonempty set S is a map
· : S × S → S, that is, for every ordered pair (a, b) of elements of S, there is
associated a unique element a · b of S. A binary system is a pair (S, ·), where
S is a nonempty set and · is a binary operation on S. The binary system
(S, ·) is associative if · is an associative operation on S, that is, for all a, b, c
in S, (a · b) · c = a · (b · c).
Definition 5.4.2:
A semigroup is an associative binary system. An element e of a binary
system (S, ·) is an identity element of S if a · e = e · a = a for all a ∈ S.
We use the following standard notations:
Z = the set of integers (positive integers, negative integers and zero)
Z+ = the set of positive integers
= the set N of natural numbers 1, 2, . . . , n, . . .
Q = the set of rational numbers
Q+ = the set of positive rational numbers
Q∗ = the set of nonzero rational numbers
R = the set of real numbers
R∗ = the set of nonzero real numbers
C = the set of complex numbers
C∗ = the set of nonzero complex numbers
Examples
1. (N, ·) is a semigroup, where · denotes the usual multiplication.
2. The operation subtraction is not a binary operation on N (for example,
3− 5 /∈ N).
3. (Z,−) is a binary system which is not a semigroup since the associative
law is not valid in (Z,−); for instance, 10− (5− 8) ≠ (10− 5)− 8.
We now give the definition of a group.
Definition 5.4.3:
A group is a binary system (G, ·) such that the following axioms are satisfied:
(G1): The operation · is associative on G, that is, for all a, b, c ∈ G, (a ·
b) · c = a · (b · c).
(G2): (Existence of identity) There exists an element e ∈ G (called an identity
element of G with respect to the operation ·) such that a · e = e · a = a for
all a ∈ G.
(G3): (Existence of inverse) To each element a ∈ G, there exists an element
a−1 ∈ G (called an inverse of a with respect to the operation ·) such that
a · a−1 = a−1 · a = e.
Before proceeding to examples of groups, we show that the identity element
e and the inverse element a−1 of a, given in Definition 5.4.3, are unique.
Suppose G has two identities e and f with respect to the operation ·.
Then
e = e · f (as f is an identity of (G, ·))
= f (as e is an identity of (G, ·)).
Next, let b and c be two inverses of a in (G, ·). Then
b = b · e = b · (a · c)
= (b · a) · c by the associativity of ·
= e · c
= c.
Thus henceforth we can talk of “The identity element e” of the group (G, ·),
and “The inverse element a−1 of a” in (G, ·).
If a ∈ G, then a ·a ∈ G; also, a ·a · · · (n times)∈ G. We denote a ·a · · · (n
times) by an. Further, if a, b ∈ G, a · b ∈ G, and (a · b)−1 = b−1 · a−1. (Check
that (a · b)(a · b)−1 = (a · b)−1(a · b) = e). More generally, if a1, a2, . . . , an ∈ G,
then (a1 · a2 · · · an)−1 = a−1n · a−1n−1 · · · a−11 , and hence (an)−1 = (a−1)n =
(written as) a−n. The relation am+n = am · an then holds for all integers m
and n, with a0 = e. In what follows, we drop the group operation · in (G, ·),
and simply write group G, unless the operation is explicitly needed.
Lemma 5.4.4:
In a group, both the cancellation laws are valid, that is, if a, b, c are elements
of a group G with ab = ac, then b = c (left cancellation law), and if ba = ca,
then b = c (right cancellation law).
Proof. If ab = ac, premultiplication by a−1 gives a−1(ab) = a−1(ac). So by
the associative law, (a−1a)b = (a−1a)c, and hence eb = ec. This implies that
b = c. The other cancellation law is proved similarly.
Definition 5.4.5:
The order of a group G is the cardinality of G. The order of an element a of
a group G is the least positive integer n such that an = e, the identity element
of G.
If no such n exists, the order of a is taken to be infinity.
Definition 5.4.6 (Abelian Group):
A group G is called abelian (after Abel) if the group operation of G is com-
mutative, that is, ab = ba for all a, b ∈ G.
A group G is nonabelian if it is not abelian, that is, there exists a pair of
elements x, y in G with xy ≠ yx.
Examples of Abelian Groups
1. (Z,+) is an abelian group, that is, the set Z of integers is an abelian group
under the usual addition operation. The identity element of this group is
0, and the inverse of a is −a. (Z,+) is often referred to as the additive
group of integers. Similarly, (Q,+), (R,+), (C,+) are all additive abelian
groups.
2. The sets Q∗, R∗ and C∗ are groups under the usual multiplication opera-
tion.
3. Let G = C[0, 1], the set of complex-valued continuous functions defined
on [0, 1]. G is an abelian group under addition. Here, if f, g ∈ C[0, 1],
then f + g is defined by (f + g)(x) = f(x) + g(x). The zero function
is the identity element of the group, while the inverse of f is −f .
4. For any positive integer n, let Zn = {0, 1, . . . , n−1}. Define addition + in
Zn as “congruent modulo n” addition, that is, if a, b ∈ Zn, then a+ b = c,
where c ∈ Zn and a+ b ≡ c ( mod n). Then (Zn,+) is an abelian group.
For instance, if n = 5, then in Z5, 2 + 2 = 4, 2 + 3 = 0, 3 + 3 = 1 etc.
5. Let G = {rα : rα = rotation of the plane about the origin through an
angle α in the anticlockwise sense}. Then if we set rβ · rα = rα+β (that
is, rotation through α followed by rotation through β = rotation through
α + β), then (G, ·) is a group. The identity element of (G, ·) is r0, while
(rα)−1 = r−α, the rotation of the plane about the origin through an angle
α in the clockwise sense.
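Example 4 is easy to verify exhaustively for a particular n. The sketch below checks the group axioms for (Z5, +) by brute force (plain Python; `add` is our own name for addition modulo n):

```python
from itertools import product

# Sketch: exhaustive check of the group axioms for (Z5, +) from Example 4.
n = 5
Zn = list(range(n))
add = lambda a, b: (a + b) % n

assert all(add(a, b) in Zn for a, b in product(Zn, repeat=2))          # closure
assert all(add(add(a, b), c) == add(a, add(b, c))
           for a, b, c in product(Zn, repeat=3))                       # associativity
assert all(add(a, 0) == a == add(0, a) for a in Zn)                    # identity
assert all(add(a, (n - a) % n) == 0 for a in Zn)                       # inverses
assert all(add(a, b) == add(b, a) for a, b in product(Zn, repeat=2))   # abelian
assert add(2, 2) == 4 and add(2, 3) == 0 and add(3, 3) == 1  # instances in the text
```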
Examples of Nonabelian Groups
1. Let G = GL(n,R), the set of all n by n nonsingular matrices with real
entries. Then G is an infinite nonabelian group under multiplication.
2. Let G = SL(n,Z) be the set of matrices of order n with integer entries
having determinant 1. G is again an infinite nonabelian multiplicative
group. (Note that if A ∈ SL(n,Z), then A−1 = (1/ detA) adjA ∈ SL(n,Z)
since detA = 1 and all the cofactors of the entries of A are integers. See
Section 5.3.2.)
3. Let S4 denote the set of all 1-1 maps f : N4 → N4, where N4 = {1, 2, 3, 4}.
If · denotes composition of maps, then (S4, ·) is a nonabelian group of order
4! = 24. (See Section*** for more about such groups). For instance, let
f =
1 2 3 4
4 1 2 3
.
Here the parentheses notation signifies the fact that the image under f of
a number in the top row is the corresponding number in the bottom row.
For instance, f(1) = 4, f(2) = 1 and so on. Let
g =
1 2 3 4
3 1 2 4
Then g · f =
1 2 3 4
4 3 1 2
.
Note that (g · f)(1) = g(f(1)) = g(4) = 4, while (f · g)(1) = f(g(1)) =
f(3) = 2, and hence f · g ≠ g · f . In other words, S4 is a nonabelian group.
The identity element of S4 is the map
I =
1 2 3 4
1 2 3 4
and f−1 =
1 2 3 4
2 3 4 1
.
5.4.1 Group Tables
The structure of a finite group G can be completely specified by means of its
group table (sometimes called multiplication table). This is formed by listing
the elements of G in some order as g1, . . . , gn, and forming an n by n double
array (gij), where gij = gigj, 1 ≤ i, j ≤ n. It is customary to take g1 = e,
the identity element of G.
Examples Continued
9. [Klein’s 4-group K4] This is a group of order 4. If its elements are
e, a, b, c, the group table of K4 is given by Table 5.1.
It is observed from the Table that
ab = ba = c, and a(ba)b = a(ab)b
This gives
c2 = (ab)(ab) = a(ba)b = a(ab)b = a2b2 = ee = e.
Thus every element of K4 other than e is of order 2.
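Table 5.1 can be fed to a short script that confirms K4 is an abelian group in which every non-identity element has order 2 (the dictionary encoding of the table is our own):

```python
from itertools import product

# Sketch: verify the group properties of K4 directly from Table 5.1.
table = {
    ('e', 'e'): 'e', ('e', 'a'): 'a', ('e', 'b'): 'b', ('e', 'c'): 'c',
    ('a', 'e'): 'a', ('a', 'a'): 'e', ('a', 'b'): 'c', ('a', 'c'): 'b',
    ('b', 'e'): 'b', ('b', 'a'): 'c', ('b', 'b'): 'e', ('b', 'c'): 'a',
    ('c', 'e'): 'c', ('c', 'a'): 'b', ('c', 'b'): 'a', ('c', 'c'): 'e',
}
K4 = ['e', 'a', 'b', 'c']
mul = lambda x, y: table[(x, y)]

assert all(mul(mul(x, y), z) == mul(x, mul(y, z))
           for x, y, z in product(K4, repeat=3))                      # associativity
assert all(mul(x, y) == mul(y, x) for x, y in product(K4, repeat=2))  # abelian
assert all(mul(x, x) == 'e' for x in K4)   # a2 = b2 = c2 = e, as derived above
```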
5.5 A Group of Congruent Transformations
(Also called Symmetries)
We now look at the congruent transformations of an equilateral triangle
ABC. Assume without loss of generality that the side BC of the triangle is
horizontal so that A is in the vertical through the middle point D of BC.
Let r denote the rotation of the triangle about its centre through an angle
of 120◦ in the anticlockwise sense, and let f denote the flipping of the triangle
about the vertical through the middle point of the base. f interchanges the
base vertices and leaves all the points of the vertical through the third vertex
unchanged. Then fr denotes the transformation r followed by f and so on.
e a b c
e e a b c
a a e c b
b b c e a
c c b a e
Table 5.1: Group Table of Klein’s 4-group.
[Figure: the equilateral triangle ABC with D the midpoint of BC, showing the rotation r and the flip f .]
Thus fr leaves B fixed and flips A and C in ABC. There are six congruent
transformations of an equilateral triangle and they form a group as per the
following group table.
r3 = e r r2 f rf r2f
r3 = e e r r2 f rf r2f
r r r2 e rf r2f f
r2 r2 e r r2f f rf
f f fr fr2 e r2 r
rf rf f r2f r e r2
r2f r2f rf f r2 r e
Group Table of the Dihedral group D3
For instance, r2fr and rf are obtained as follows:
[Figure: applying r, then f , then r2 to the labelled triangle, showing that r2fr and rf produce the same labelling.]
Thus r2fr = rf , and similarly the other products can be verified. The
resulting group is known as the dihedral group D3.
5.6 Another Group of Congruent Transfor-
mations
Let D4 denote the transformations that leave a square invariant. If r denotes
a rotation of 90 about the centre of the square in the anticlockwise sense,
and f denotes the flipping of the square about one of its diagonals, then the
defining relations for D4 are given by:
r4 = e = f 2 = (rf)2.
D4 is the dihedral group of order 2× 4 = 8. Both D3 and D4 are nonabelian
groups. The dihedral group Dn is defined in a similar fashion as the group
of congruent transformations of a regular polygon of n sides. The groups Dn,
n ≥ 3, are all nonabelian, and Dn is of order 2n for each n.
5.7 Subgroups
Definition 5.7.1:
A subset H of a group (G, ·) is a subgroup of (G, ·) if (H, ·) is a group under
the operation · of G.
Definition 5.7.1 shows that the group operation of a subgroup H of G is
the same as that of G.
Examples of Subgroups
1. Z is a subgroup of (Q,+).
2. 2Z is a subgroup of (Z,+). (Here 2Z denotes the set of even integers).
3. Q∗ is a subgroup of (R∗, ·). (Here · denotes multiplication.)
4. Let H be the subset of maps of S4 (see Example 3 of Section 5.4) that fix
1, that is, H = {f ∈ S4 : f(1) = 1}. Then H is a subgroup of S4.
Note that the set N of natural numbers does not form a subgroup of (Z,+).
Subgroup Generated by an Element
Definition 5.7.2:
Let S be a nonempty subset of a group G. The subgroup generated by S in
G, denoted by < S >, is the intersection of all subgroups of G containing S.
Before we proceed to the properties of < S >, we need a result.
Proposition 5.7.3:
The intersection of any family of subgroups of G is a subgroup of G.
Proof. Consider a family {Gα}α∈I of subgroups of G, and let H = ∩Gα. If
a, b ∈ H, then a, b ∈ Gα for each α ∈ I, and since Gα is a subgroup of
G, a · b ∈ Gα for each α ∈ I. Therefore ab ∈ H, and similarly a−1 ∈ H and
e ∈ H. The associative law holds in H, a subset of G, as it holds in G. Thus
H is a subgroup of G.
Corollary 5.7.4:
Let S be a nonempty subset of a group G. Then < S > is the smallest
subgroup of G containing S.
Proof. By definition, < S > is a subgroup of G containing S. If < S > is
not the smallest subgroup of G containing S, then there exists a subgroup H
of G such that S ⊂ H ⊂ < S >. But, by definition of < S >, any subgroup
that contains S must contain < S >. Hence < S >⊂ H. Thus H =< S >,
and so < S > is the smallest subgroup of G containing S.
5.8 Cyclic Groups
Definition 5.8.1:
Let G be a group and a, an element of G. Then the subgroup generated
by a in G is 〈{a}〉, that is, the subgroup generated by the singleton subset
{a}. It is also denoted simply by 〈a〉.
By Corollary 5.7.4, 〈a〉 is the smallest subgroup of G containing a. As
a ∈ 〈a〉, all the powers an, n ∈ Z, also belong to 〈a〉. But then,
as may be checked easily, the set {an : n ∈ Z} of powers of a is already a
subgroup of G. Hence 〈a〉 = {an : n ∈ Z}. Note that a0 = e, the identity
element of G, and a−n = (a−1)n, the inverse of an. This makes am · an = am+n
for all integers m and n. The subgroup A = 〈a〉 of G is called the subgroup
generated by a, and a is called a generator of A.
Now since {an : n ∈ Z} = {(a−1)n : n ∈ Z}, a−1 is also a generator of
〈a〉. Suppose a is of finite order n in 〈a〉. This means that n is the
least positive integer with the property that an = e. Then the elements
a1 = a, a2, . . . , an−1, an = e are all distinct. Moreover, for any integer m,
by the division algorithm (see....), there are integers q and r such that
m = qn+ r, 0 ≤ r < n.
Then
am = a(qn+r) = (an)qar = eqar = ear = ar, 0 ≤ r < n,
and hence am ∈ 〈a〉. Thus in this case,
〈a〉 = {a, a2, . . . , an−1, an = e}.
In the contrary case, there exists no positive integer m such that am = e.
Then all the powers ar, r ∈ N, are distinct. If not, there exist integers r and
s, r ≠ s, such that ar = as. Suppose r > s. Then r − s > 0 and the equation
ar = as gives a−sar = a−sas, and therefore ar−s = a0 = e, a contradiction. In
the first case, the cyclic group 〈a〉 is of order n, while in the latter case,
it is of infinite order.
Examples of Cyclic Groups
1. The additive group of integers (Z,+) is an infinite cyclic group. It is
generated by 1 as well as −1.
2. The group of n-th roots of unity, n ≥ 1. Let G be the set of n-th roots of
unity so that
G = {ω, ω2, . . . , ωn = 1}, where ω = cos(2π/n) + i sin(2π/n).
Then G is a cyclic group of order n generated by ω, that is, G = 〈ω〉.
In fact, ωk, 1 ≤ k ≤ n, also generates G iff (k, n) = 1. (See****). Hence
the number of generators of G is φ(n). As G is a cyclic group of order n,
it follows that for each positive integer n, there exists a cyclic group of
order n.
If G = 〈a〉 = {an : n ∈ Z}, then since for any two integers n and m,
anam = an+m = aman, G is abelian. In other words, every cyclic group is
abelian. However, the converse is not true. K4, Klein’s 4-group (see
Table 5.1 of Section 5.4), is abelian but not cyclic since K4 has no element of
order 4.
Theorem 5.8.2:
Any subgroup of a cyclic group is cyclic.
Proof. Let G = 〈a〉 be a cyclic group, and H, a subgroup of G. If H = {e},
then H is trivially cyclic. So assume that H ≠ {e}. As the elements of G
are powers of a, an ∈ H for some nonzero integer n. Then its inverse a−n
also belongs to H, and of n and −n, at least one is a positive integer.
Let s be the least positive integer such that as ∈ H (recall that H ≠ {e} as
per our assumption). We claim that H = 〈as〉, the cyclic subgroup of G
generated by as. To prove this we have to show that each element of H is a
power of as. Let g be any element of H. As g ∈ G, g = am for some integer
m. By division algorithm,
m = qs+ r, 0 ≤ r < s.
Hence ar = am−qs = am(as)−q ∈ H as am ∈ H and as ∈ H. Thus ar ∈ H.
This however implies, by the choice of s, r = 0 (otherwise ar ∈ H with
0 < r < s). Hence ar = a0 = e = am(as)−q, and therefore, g = am =
(as)q, q ∈ Z. Thus every element of H is a power of as and so H ⊂ 〈as〉.
Now since as ∈ H, all powers of as also belong to H, and so 〈as〉 ⊂ H. Thus
H = 〈as〉, and therefore H is cyclic.
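Theorem 5.8.2 can be spot-checked by brute force on one concrete cyclic group. The sketch below enumerates all subsets of Z12 under addition mod 12, keeps those that are subgroups, and confirms each is generated by a single element (Z12 is our illustrative stand-in; the subset enumeration is exponential and only feasible for tiny groups):

```python
from itertools import combinations

# Sketch: every subgroup of the cyclic group (Z12, +) is cyclic.
n = 12
G = set(range(n))

def is_subgroup(H):
    return 0 in H and all((a + b) % n in H for a in H for b in H) \
               and all((n - a) % n in H for a in H)

def generated(a):
    # cyclic subgroup <a> of Z12: multiples of a, modulo 12
    H, x = {0}, a % n
    while x not in H:
        H.add(x)
        x = (x + a) % n
    return H

subgroups = [set(c) for r in range(1, n + 1)
             for c in combinations(sorted(G), r) if is_subgroup(set(c))]

# every subgroup found is generated by one of its elements, hence cyclic
assert all(any(H == generated(a) for a in H) for H in subgroups)
print(len(subgroups), "subgroups of Z12, all cyclic")
```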
Definition 5.8.3:
Let S be any nonempty set. A permutation on S is a bijective mapping from
S to S.
Lemma 5.8.4:
If σ1 and σ2 are permutations on S, then the map σ = σ1σ2 defined on S by
σ1σ2(s) = σ(s) = σ1 (σ2(s)) , s ∈ S
is also a permutation on S. σ is one-to-one as both σ1 and σ2 are one-to-one;
it is onto as both σ1 and σ2 are onto.
[Figure: composition of permutations, taking s1 and s2 through σ2 and then σ1 to σ(s1) and σ(s2).]
Proof. Indeed, we have, for s1, s2 in S, σ(s1) = σ(s2) gives that σ1(σ2s1) =
σ1(σ2s2). This implies, as σ1 is 1− 1, σ2s1 = σ2s2. Again, as σ2 is 1− 1, this
gives that s1 = s2. Thus σ is 1− 1. For a similar reason, σ is onto.
Let B denote the set of all bijections on S. Then it is easy to verify that
(B, ·), where · is the composition map, is a group. The identity element of
this group is the identity function e on S.
The case when S is a finite set is of special significance. So let S =
{1, 2, . . . , n}. The set P of permutations of S forms a group under the com-
position operation. Any σ ∈ P can be conveniently represented as:
σ :
1 2 . . . n
σ(1) σ(2) . . . σ(n)
.
Then σ−1 is just the permutation
σ−1 :
σ(1) σ(2) . . . σ(n)
1 2 . . . n
What is the order of the group P? Clearly σ(1) has n choices, namely,
any one of 1, 2, . . . , n. Having chosen σ(1), σ(2) has n − 1 choices (as σ is
1− 1, σ(1) ≠ σ(2)). For a similar reason, σ(3) has n − 2 choices and so
on, and finally σ(n) has just one left out choice. Thus the total number of
permutations on S is
n · (n− 1) · (n− 2) · · · 2 · 1 = n!
In other words, the group P of permutations on a set of n elements is of order
n!. It is denoted by Sn and is called the symmetric group of degree n. Any
subgroup of Sn is called a permutation group of degree n.
Example
Let S = {1, 2, 3, 4, 5}, and let σ and τ ∈ S5 be given by
σ =
1 2 3 4 5
2 3 5 4 1
, τ =
1 2 3 4 5
5 2 4 1 3
.
Then
σ · τ =
1 2 3 4 5
1 3 4 2 5
, and σ2 = σ · σ =
1 2 3 4 5
3 5 1 4 2
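The worked example can be replayed in a few lines, with each permutation stored as a dict mapping i to its image (our encoding, not the text's two-row notation):

```python
# Sketch of the S5 example: compose sigma and tau, and compute sigma squared.
sigma = {1: 2, 2: 3, 3: 5, 4: 4, 5: 1}
tau   = {1: 5, 2: 2, 3: 4, 4: 1, 5: 3}

def compose(s, t):
    # (s . t)(i) = s(t(i)): t acts first, then s
    return {i: s[t[i]] for i in s}

def inverse(s):
    return {v: k for k, v in s.items()}

assert compose(sigma, tau) == {1: 1, 2: 3, 3: 4, 4: 2, 5: 5}    # sigma . tau
assert compose(sigma, sigma) == {1: 3, 2: 5, 3: 1, 4: 4, 5: 2}  # sigma squared
assert compose(sigma, inverse(sigma)) == {i: i for i in sigma}  # identity
```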
Definition 5.8.5:
A cycle in Sn is a permutation σ ∈ Sn that can be represented in the form
(a1, a2, . . . , ar), where the ai, 1 ≤ i ≤ r, r ≤ n, are all in S, and σ(ai) = ai+1,
1 ≤ i ≤ r − 1, and σ(ar) = a1, that is, each ai is mapped cyclically to the
next element ai+1, and σ fixes the remaining elements of S. For example,
if σ =
1 3 2 4
3 2 1 4
∈ S4, then σ can be represented by (1 3 2). Here σ leaves 4
fixed.
Now consider the permutation p =
1 2 3 4 5 6 7
3 4 2 1 6 5 7
. Clearly, p is the
product of the cycles
(1324)(56)(7) = (1324)(56).
Since 7 is left fixed by p, 7 is not written explicitly. In this way, every
permutation on n symbols is a product of disjoint cycles. The number of
symbols in a cycle is called the length of the cycle. For example, the cycle
(1 3 2 4) is a cycle of length 4. A cycle of length 2 is called a transposition.
For example, the permutation (1 2) is a transposition. It maps 1 to 2, and
2 to 1. Now consider the product of transpositions t1 = (13), t2 = (12) and
t3 = (14). We have
$$t_1 t_2 t_3 = (13)(12)(14) = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 4 \\ 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix} = (1423).$$
To see this, note that
(t1t2t3)(4) = (t1t2)(t3(4)) = (t1t2)(1) = t1(t2(1)) = t1(2) = 2, and so on.
In the same way, any cycle (a1a2 . . . ar) = (a1ar)(a1ar−1) · · · (a1a2), a product of transpositions. Since any permutation is a product of disjoint cycles and any cycle is a product of transpositions, it is clear that any permutation is
a product of transpositions. Now in the expression of a cycle as a product
of transpositions, the number of transpositions need not be unique. For
instance, (12)(12) = identity permutation, and
(1324) = (14)(12)(13) = (12)(12)(14)(12)(13).
However, this number is always odd or always even.
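Both facts, the disjoint-cycle decomposition and the invariance of parity, can be illustrated in Python (the helper names `cycles` and `sign` are ours):

```python
def cycles(p):
    """Disjoint-cycle decomposition of a permutation given as a dict of images."""
    seen, result = set(), []
    for start in p:
        if start not in seen:
            cyc, x = [], start
            while x not in seen:
                seen.add(x)
                cyc.append(x)
                x = p[x]
            if len(cyc) > 1:          # fixed points are not written explicitly
                result.append(tuple(cyc))
    return result

def sign(p):
    """A cycle of length r is a product of r - 1 transpositions."""
    return (-1) ** sum(len(c) - 1 for c in cycles(p))

p = {1: 3, 2: 4, 3: 2, 4: 1, 5: 6, 6: 5, 7: 7}
assert cycles(p) == [(1, 3, 2, 4), (5, 6)]   # p = (1324)(56), as in the text
assert sign(p) == 1                          # 3 + 1 = 4 transpositions: even
```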
Theorem 5.8.6:
Let σ be any permutation on n symbols. Then in whatever way σ is expressed
as a product of transpositions, the number of transpositions is always odd or
always even.
Proof. Assume that σ is a permutation on {1, 2, . . . , n}. Consider the product
$$P = (a_1 - a_2)(a_1 - a_3)\cdots(a_1 - a_n)\,(a_2 - a_3)\cdots(a_2 - a_n)\cdots(a_{n-1} - a_n) = \prod_{1 \le i < j \le n} (a_i - a_j),$$
which, up to sign, is the Vandermonde determinant
$$\det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_n \\ a_1^2 & a_2^2 & \cdots & a_n^2 \\ \vdots & \vdots & & \vdots \\ a_1^{n-1} & a_2^{n-1} & \cdots & a_n^{n-1} \end{pmatrix}.$$
Any transposition (ai aj) applied to the product P changes P to −P, as this amounts to the interchange of the i-th and j-th columns of the above determinant. Now σ, when applied to P, has a definite effect: either it changes P to −P or it leaves P unchanged. In case σ changes P to −P, σ must always be a product of an odd number of transpositions; otherwise it must be the product of an even number of transpositions.
Definition 5.8.7:
A permutation is odd or even according to whether it is expressible as the
product of an odd number or even number of transpositions.
Example 5.8.8:
Let
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 4 & 5 & 1 & 2 & 3 & 7 & 9 & 8 & 6 \end{pmatrix}.$$
Then σ = (14253)(679)(8) = (13)(15)(12)(14)(69)(67), a product of an even number of transpositions. Hence σ is an even permutation.
5.9 Lagrange’s Theorem for Finite Groups
We now establish the most famous basic theorem on finite groups, namely,
Lagrange’s theorem. For this, we need the notion of left and right cosets of
a subgroup.
Definition 5.9.1:
Let G be a group and H a subgroup of G. For a ∈ G, the left coset aH of a in G is the subset {ah : h ∈ H} of G. The right coset Ha of a is defined in an analogous manner.
Lemma 5.9.2:
Any two left cosets of a subgroup H of a group G are equipotent (that is,
have the same cardinality). Moreover, they are equipotent to H.
Proof. Let aH and bH be two left cosets of H in G. Consider the map
φ : aH → bH
defined by φ(ah) = bh, h ∈ H. φ is 1 − 1 since φ(ah1) = φ(ah2), for
h1, h2 ∈ H, implies that bh1 = bh2 and therefore h1 = h2. Clearly, φ is
onto. Thus φ is a bijection of aH onto bH. In other words, aH and bH are
equipotent. Since eH = H, H itself is a left coset of H and so all left cosets
of H are equipotent to H.
Lemma 5.9.3:
The left coset aH is equal to H iff a ∈ H.
Proof. If a ∈ H, then aH = {ah : h ∈ H} ⊆ H, as each ah ∈ H. Further, if b ∈ H, then a−1b ∈ H, and so a(a−1b) ∈ aH. But a(a−1b) = b. Hence b ∈ aH, and therefore aH = H. (In particular, if H is a group, then multiplication of the elements of H by any element a ∈ H just gives a permutation of H.)
Conversely, if aH = H, then a = ae ∈ aH = H, as e ∈ H.
Example 5.9.4:
It is not necessary that aH = Ha for all a ∈ G. For example, consider S3, the symmetric group of degree 3. The 3! = 6 permutations of S3 are given by
$$S_3 = \left\{ e = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix},\ \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix} = (23),\ \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} = (13),\ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix} = (12),\ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} = (123),\ \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix} = (132) \right\}.$$
Let H be the subgroup {e, (12)}. For a = (123), we have
aH = {(123)e, (123)(12)} = {(123), (13)}, and
Ha = {e(123), (12)(123)} = {(123), (23)},
so that aH ≠ Ha.
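The cosets in this example can be recomputed mechanically. A Python sketch, writing an element of S3 as the tuple (σ(1), σ(2), σ(3)) and letting `mult(a, b)` apply b first, as in the text's convention:

```python
# Elements of S3 as tuples of images; (a*b)(i) = a(b(i)), i.e. b acts first.
def mult(a, b):
    return tuple(a[b[i] - 1] for i in range(len(a)))

e, t12 = (1, 2, 3), (2, 1, 3)        # e and the transposition (12)
H = {e, t12}                         # the subgroup {e, (12)}
a = (2, 3, 1)                        # the 3-cycle (123)

aH = {mult(a, h) for h in H}
Ha = {mult(h, a) for h in H}
assert aH == {(2, 3, 1), (3, 2, 1)}  # {(123), (13)}
assert Ha == {(2, 3, 1), (1, 3, 2)}  # {(123), (23)}
assert aH != Ha
```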
Proposition 5.9.5:
The left cosets aH and bH are equal iff a−1b ∈ H.
Proof. If aH = bH, then there exist h1, h2 ∈ H such that ah1 = bh2, and therefore $a^{-1}b = h_1h_2^{-1} \in H$, as $h_2^{-1} \in H$. Conversely, if a−1b ∈ H, let a−1b = h ∈ H, so that b = ah and bH = (ah)H = a(hH) = aH by Lemma 5.9.3.
Lemma 5.9.6:
Any two left cosets of the same subgroup of a group are either identical or
disjoint.
Proof. Suppose aH and bH are two left cosets of the subgroup H of a group G, where a, b ∈ G. If aH and bH are disjoint there is nothing to prove. Otherwise, aH ∩ bH ≠ ∅, and therefore there exist h1, h2 ∈ H with ah1 = bh2. This however means that $a^{-1}b = h_1h_2^{-1} \in H$. So by Proposition 5.9.5, aH = bH.
Example 5.9.7:
For the subgroup H of Example 5.9.4, we have seen that (123)H = {(123), (13)}. Now (12)H = {(12)e, (12)(12)} = {(12), e} = H, and hence (123)H ∩ (12)H = ∅. Also (23)H = {(23)e, (23)(12)} = {(23), (132)}, and (13)H = {(13)e, (13)(12)} = {(13), (123)} = (123)H. The last equation holds since (13)−1(123) = (13)(123) = (12) ∈ H (refer to Proposition 5.9.5).
Theorem 5.9.8:
[Lagrange's Theorem, after the French mathematician J. L. Lagrange]
The order of any subgroup of a finite group G divides the order of G.
Proof. Let H be a subgroup of the finite group G. We want to show that o(H) | o(G). We show this by proving that the left cosets of H in G form a partition of G. First of all, if g is any element of G, g = ge ∈ gH. Hence every element of G is in some left coset of H. Now by Lemma 5.9.6, the distinct cosets of H are pairwise disjoint and hence form a partition of G. Again, by Lemma 5.9.2, all the left cosets of H have the same cardinality as H, namely, o(H). Thus if there are l left cosets of H in G, we have
l · o(H) = o(G) (5.1)
Consequently, o(H) divides o(G).
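The partition in the proof can be exhibited concretely for G = S3 and H = {e, (12)}; a sketch (the helper `mult` is our notation for the product in S3):

```python
from itertools import permutations

def mult(a, b):
    # Product on tuples of images: apply b first, then a.
    return tuple(a[b[i] - 1] for i in range(len(a)))

G = list(permutations((1, 2, 3)))        # S3, of order 6
H = [(1, 2, 3), (2, 1, 3)]               # the subgroup {e, (12)}, of order 2

cosets = {frozenset(mult(g, h) for h in H) for g in G}
# The distinct left cosets are pairwise disjoint, cover G,
# and each has cardinality o(H); hence l * o(H) = o(G).
assert sum(len(c) for c in cosets) == len(G)
assert all(len(c) == len(H) for c in cosets)
assert len(cosets) * len(H) == len(G)
```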
Definition 5.9.9:
Let H be a subgroup of a group G. Then the number (which may be infinite) of left cosets of H in G is called the index of H in G and is denoted by iG(H).
If G is a finite group, then equation (5.1) in the proof of Theorem 5.9.8 shows that
o(G) = o(H) · iG(H).
Example 5.9.10:
[An application of Lagrange's theorem] If p is a prime, and n any positive integer, then
n | φ(pn − 1),
where φ is Euler's totient function.
First we prove a lemma.
Lemma 5.9.11:
If m ≥ 2 is a positive integer, and S is the set of positive integers less than m and prime to m, then S is a multiplicative group modulo m.
Proof. If (a, m) = 1 and (b, m) = 1, then (ab, m) = 1. For if p is a prime factor of ab and m, then as p divides ab, p must divide either a or b, say p | a. Then (a, m) ≥ p, a contradiction. Moreover, if ab ≡ c (mod m), 1 ≤ c < m, then (c, m) = 1. Thus ab (mod m) = c ∈ S. Also, as (1, m) = 1, 1 ∈ S. Now for any a ∈ S, by the Euclidean algorithm, there exists b ∈ N such that ab ≡ 1 (mod m). Then (b, m) = 1 (if not, there exists a prime p with p | b and p | m; then p | 1, a contradiction). Thus a has an inverse b (mod m) in S. Hence S is a multiplicative group modulo m, and o(S) = φ(m).
Proof. [of 5.9.10] We apply Lemma 5.9.11 by taking m = pn − 1. Let H = {1, p, p2, . . . , pn−1}. All the numbers in H are prime to m and hence H ⊆ S (as defined in Lemma 5.9.11). Further, pj · pn−j = pn ≡ 1 (mod m). Hence every element of H has an inverse modulo m. Therefore (as the other group axioms are trivially satisfied by H), H is a subgroup of order n of S. By Lagrange's theorem, o(H) | o(S), and so n | φ(pn − 1).
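The divisibility n | φ(pn − 1) can be spot-checked numerically with a naive totient (brute-force φ, adequate only for small inputs):

```python
from math import gcd

def phi(m):
    # Naive Euler totient: count 1 <= k <= m with gcd(k, m) = 1.
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

# Check n | phi(p**n - 1) for small primes p and exponents n.
for p in (2, 3, 5, 7):
    for n in (1, 2, 3, 4):
        assert phi(p ** n - 1) % n == 0
```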
As an application of Lagrange’s theorem, we have the following result.
Theorem 5.9.12:
Any group of prime order is cyclic.
5.10 Homomorphisms and Isomorphisms of
Groups
Consider the two groups:
G1 = the multiplicative group of the sixth roots of unity = {1, ω, ω2, . . . , ω5}, where ω is a primitive sixth root of unity,
and
G2 = the additive group of Z6 = {0, 1, 2, . . . , 5}.
G1 is a multiplicative group while G2 is an additive group. However, structurewise, they are just the same. By this we mean that if we can make a suitable identification of the elements of the two groups, then they behave in the same manner. If we make the correspondence
ωi ←→ i,
we see that ωiωj ←→ i + j, as ωiωj = ωi+j when i + j is taken modulo 6. For instance, in G1, ω3ω4 = ω7 = ω, while in G2, 3 + 4 = 1 as 7 ≡ 1 (mod 6). The order of ω in G1 = 6 = the (additive) order of 1 in G2. G1 has {1, ω2, ω4} as a subgroup while G2 has {0, 2, 4} as a subgroup, and so on. It is clear that we can replace 6 by any positive integer n and a similar result holds good.
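The correspondence ωi ←→ i can be verified numerically; the sketch below uses floating-point roots of unity, so equalities are checked only up to a small tolerance:

```python
import cmath

# omega = a primitive sixth root of unity; the map i -> omega**i carries
# addition modulo 6 in Z6 to multiplication in G1, as described above.
w = cmath.exp(2j * cmath.pi / 6)
for i in range(6):
    for j in range(6):
        assert abs(w ** ((i + j) % 6) - (w ** i) * (w ** j)) < 1e-9
assert abs(w ** 6 - 1) < 1e-9    # omega has order 6
```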
In the above situation, we say G1 and G2 are isomorphic groups. We now
formalize the above concept.
Definition 5.10.1:
Let G and G′ be groups (distinct or not). A homomorphism from G to G′ is a map f : G → G′ such that
f(ab) = f(a)f(b). (5.2)
In Definition 5.10.1, multiplicative notation has been used to denote the group operations in both G and G′. If, for instance, G is an additive group and G′ is a multiplicative group, then equation (5.2) should be changed to
f(a + b) = f(a)f(b),
and so on.
Definition 5.10.2:
An isomorphism from a group G to a group G′ is a bijective homomorphism
from G to G′, that is, it is a map f : G→ G′ which is both a bijection and a
group homomorphism.
It is clear that if f : G → G′ is an isomorphism from G to G′, then f−1 : G′ → G is an isomorphism from G′ to G. Hence if there exists a group isomorphism from G to G′, we can say without any ambiguity that G and G′ are isomorphic groups. A similar statement cannot be made for group homomorphisms. If G is isomorphic to G′, we write G ≃ G′.
Examples
1. Let G = (Z, +), and G′ = (nZ, +) (nZ is the set obtained by multiplying all integers by n). The map f : G → G′ defined by f(m) = mn, m ∈ G, is a group homomorphism from G onto G′.
2. Let G = (R, +) and G′ = (R+, ·). The map f : G → G′ defined by f(x) = $e^x$, x ∈ G, is a group homomorphism from G onto G′.
3. Let G = (Z, +), and G′ = (Z × Z, +). The map f : G → G′ defined by f(n) = (0, n), n ∈ Z, is a homomorphism from G to G′.
4. Let G = R2 = R × R, the real plane with addition + as group operation (that is, (x, y) + (x′, y′) = (x + x′, y + y′)), and let PX : R2 → R be defined by PX(x, y) = x, the projection of R2 on the X-axis. PX is a homomorphism from (R2, +) to (R, +).
We remark that the homomorphism in Example 3 above is not onto while
those in Examples 1, 2 and 4 are onto. The homomorphisms in Examples 1
and 2 are isomorphisms. The isomorphism in Example 1 is an isomorphism of
G onto a proper subgroup of G. We now check that the map f of Example 2
is an isomorphism. First, it is a group homomorphism since
$$f(x + y) = e^{x+y} = e^x e^y = f(x)f(y).$$
(Note that the group operation in G is addition while that in G′ is multiplication.) Next we check that f is 1 − 1. In fact, f(x) = f(y) gives $e^x = e^y$, and therefore $e^{x-y} = 1$. This means, as the domain of f is R, that x − y = 0, and hence x = y. Finally, f is onto: if y ∈ R+, then there exists x such that $e^x = y$; in fact, $x = \log_e y$ is the unique preimage of y. Thus f is a 1 − 1, onto group homomorphism and hence it is a group isomorphism.
5.11 Properties of Homomorphisms of Groups
Let f : G → G′ be a group homomorphism. Then f satisfies the following
properties:
Property 1 f(e) = e′, that is, the image of the identity element e of G
under f is the identity element e′ of G′.
Proof. For x ∈ G, the equation xe = x in G gives, as f is a group
homomorphism, f(xe) = f(x)f(e) = f(x) = f(x)e′ in G′. As G′ is a
group, both the cancellation laws are valid in G′. Hence cancellation of
f(x) gives f(e) = e′.
Property 2 The image f(a−1) of the inverse of an element a of G is the
inverse of f(a) in G′, that is, f(a−1) = (f(a))−1.
Proof. The relation aa−1 = e in G gives f(aa−1) = f(e). But by Prop-
erty 1, f(e) = e′, and as f is a homomorphism, f(aa−1) = f(a)f(a−1).
Thus f(a)f(a−1) = e′ in G′. This implies that f(a−1) = (f(a))−1.
Property 3 The image f(G) ⊂ G′ is a subgroup of G′. In other words, the
homomorphic image of a group is a group.
Proof. (i) Let f(a), f(b) ∈ f(G), where a, b ∈ G. Then f(a)f(b) =
f(ab) ∈ f(G), as ab ∈ G.
(ii) The associative law is valid in f(G), as f(G) ⊆ G′ and G′, being a group, satisfies the associative law.
(iii) By Property 1, the element f(e) ∈ f(G) acts as the identity ele-
ment of f(G).
(iv) Let f(a) ∈ f(G), a ∈ G. By Property 2, (f(a))−1 = f(a−1) ∈
f(G), as a−1 ∈ G.
Thus f(G) is a subgroup of G′.
Theorem 5.11.1:
Let f : G → G′ be a group homomorphism and K = {a ∈ G : f(a) = e′}, that is, K is the set of all those elements of G that are mapped by f to the identity element e′ of G′. Then K is a subgroup of G.
Proof. By Exercise***, it is enough to check that if a, b ∈ K, then ab−1 ∈ K. Now, as f is a group homomorphism,
f(ab−1) = f(a)f(b−1) = f(a)(f(b))−1 = e′(e′)−1 = e′e′ = e′,
and hence ab−1 ∈ K. Thus K is a subgroup of G.
Definition 5.11.2:
The subgroup K defined in the statement of Theorem 5.11.1 is called the
kernel of the group homomorphism f .
As before, let f : G→ G′ be a group homomorphism.
Property 4 For a, b ∈ G, f(a) = f(b) iff ab−1 ∈ K, the kernel of f .
Proof.
f(a) = f(b)⇔ f(a) (f(b))−1 = e′
⇔ f(a)f(b−1) = e′ (By Property 2)
⇔ f(ab−1) = e′
⇔ ab−1 ∈ K.
Property 5 f is a 1 − 1 map iff K = {e}.
Proof. Let f be 1 − 1, a ∈ K. Then by the definition of the kernel,
f(a) = e′. But e′ = f(e), by Property 1. Thus f(a) = f(e), and this
implies, as f is 1− 1, that a = e.
Conversely, assume that K = {e}, and let f(a) = f(b). Then, by Property 4, ab−1 ∈ K. Thus ab−1 = e and so a = b. Hence f is 1 − 1.
Property 6 A group homomorphism f : G → G′ is an isomorphism iff f(G) = G′ and K (= the kernel of f) = {e}.
Proof. f is an isomorphism iff f is a 1−1, onto homomorphism. Now f
is 1−1 iff K = e, by Property 5. Further, f is onto iff f(G) = G′.
Property 7 (Composition of homomorphisms) Let f : G → G′ and g : G′ → G′′ be group homomorphisms. Then the composition map h = g ∘ f : G → G′′ is also a group homomorphism.
Proof. h is a group homomorphism iff h(ab) = h(a)h(b) for all a, b ∈ G. Now
h(ab) = (g ∘ f)(ab) = g(f(ab))
= g(f(a)f(b)), as f is a group homomorphism
= g(f(a))g(f(b)), as g is a group homomorphism
= (g ∘ f)(a) · (g ∘ f)(b)
= h(a)h(b).
5.12 Automorphism of Groups
Definition 5.12.1:
An automorphism of a group G is an isomorphism of G onto itself.
Example 5.12.2:
Let G = {1, ω, ω2} be the group of cube roots of unity, where ω = cos(2π/3) + i sin(2π/3). Let f : G → G be defined by f(ω) = ω2. To make f a group homomorphism, we have to set f(ω2) = f(ω · ω) = f(ω)f(ω) = ω2 · ω2 = ω4 = ω, and f(1) = f(ω3) = (f(ω))3 = (ω2)3 = (ω3)2 = 12 = 1. In other words, the homomorphism f : G → G is uniquely defined on G once we set f(ω) = ω2. Clearly, f is onto. Further, only 1 is mapped to 1 by f, while the other two elements ω and ω2 are moved by f. Thus Ker f = {1}. So by Property 6, f is an isomorphism of G onto G, that is, an automorphism of G.
Our next theorem shows that there is a natural way of generating at least
one set of automorphisms of a group.
Theorem 5.12.3:
Let G be a group and a ∈ G. The map fa : G→ G defined by fa(x) = axa−1
is an automorphism of G.
Proof. First we show that fa is a homomorphism. In fact, for x, y ∈ G,
fa(xy) = a(xy)a−1 by the definition of fa
= a(xa−1ay)a−1
= (axa−1)(aya−1)
= fa(x)fa(y).
Thus fa is a group homomorphism. Next we show that fa is 1− 1. Suppose
for x, y ∈ G, fa(x) = fa(y). This gives axa−1 = aya−1, and so by the two
cancellation laws that are valid in a group, x = y. Finally, if y ∈ G, then a−1ya ∈ G, and fa(a−1ya) = a(a−1ya)a−1 = (aa−1)y(aa−1) = eye = y, and so fa is onto. Thus fa is an automorphism of the group G.
Definition 5.12.4:
An automorphism of a group G that is a map of the form fa for some a ∈ G
is called an inner automorphism of G.
5.13 Normal Subgroups
Definition 5.13.1:
A subgroup N of a group G is called a normal subgroup of G (equivalently,
N is normal in G) if
aNa−1 ⊆ N for each a ∈ G. (5.3)
In other words, N is normal in G if N is left invariant by the inner
automorphisms fa for each a ∈ G. We state this observation as a proposition.
Proposition 5.13.2:
The normal subgroups of a group G are those subgroups of G that are left invariant by all the inner automorphisms of G.
Now the condition aNa−1 ⊆ N for each a ∈ G shows, on replacing a by a−1, that a−1N(a−1)−1 = a−1Na ⊆ N. The latter condition is equivalent to
N ⊆ aNa−1 for each a ∈ G. (5.4)
The conditions (5.3) and (5.4) give the following equivalent definition of a
normal subgroup.
Definition 5.13.3:
A subgroup N of a group G is normal in G iff aNa−1 = N (equivalently,
aN = Na) for every a ∈ G.
Examples
1. Let G = S3, the group of 3! = 6 permutations on {1, 2, 3}. Let N = {e, (123), (132)}. Then N is a normal subgroup of S3. First of all, note that N is a subgroup of G. In fact, we have (123)2 = (132), (132)2 = (123), and (123)(132) = e. Hence (123)−1 = (132) and (132)−1 = (123).
Let a ∈ S3. If a ∈ N, then aN = N = Na (see Lemma 5.9.3). So let a ∈ S3 \ N. Hence a = (12), (23) or (13). If a = (12), then aNa−1 = {(12)e(12), (12)(123)(12), (12)(132)(12)} = {e, (132), (123)} = N. In a similar manner, we have (23)N(23) = N and (13)N(13) = N. Thus N is a normal subgroup of S3.
2. Let H = {e, (12)} ⊂ S3. Then H is a subgroup of S3 that is not normal in S3. In fact, if a = (23), we have
aHa−1 = (23){e, (12)}(23) = {(23)e(23), (23)(12)(23)} = {e, (13)} ≠ H.
Hence H is not a normal subgroup of S3.
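Both examples can be checked by conjugating every element of the subgroup by every element of S3; a sketch (the helper names `mult`, `inv` and `is_normal` are ours):

```python
from itertools import permutations

def mult(a, b):
    # Product on tuples of images: apply b first, then a.
    return tuple(a[b[i] - 1] for i in range(len(a)))

def inv(a):
    r = [0] * len(a)
    for i, v in enumerate(a):
        r[v - 1] = i + 1
    return tuple(r)

def is_normal(N, G):
    # N is normal iff a N a^{-1} is contained in N for every a in G.
    return all(mult(mult(a, x), inv(a)) in N for a in G for x in N)

G = list(permutations((1, 2, 3)))
N = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}    # {e, (123), (132)}: normal
H = {(1, 2, 3), (2, 1, 3)}               # {e, (12)}: not normal
assert is_normal(N, G)
assert not is_normal(H, G)
```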
Definition 5.13.4:
The centre of a group G consists of those elements of G each of which
commutes with all the elements of G. It is denoted by C(G). Thus
C(G) = {x ∈ G : xa = ax for each a ∈ G}.
For example, C(S3) = {e}, that is, the centre of S3 is trivial. Also, it is easy to see that the centre of an abelian group G is G itself. Clearly the trivial subgroup {e} is normal in G, and G is normal in G. (Recall that aG = G for each a ∈ G.)
Proposition 5.13.5:
The centre C(G) of a group G is a normal subgroup of G.
Proof. We have, for a ∈ G,
aC(G)a−1 = {aga−1 : g ∈ C(G)} = {(ga)a−1 : g ∈ C(G)} (as ag = ga for g ∈ C(G)) = {g(aa−1) : g ∈ C(G)} = {g : g ∈ C(G)} = C(G).
Theorem 5.13.6:
Let f : G → G′ be a group homomorphism. Then K = Ker f is a normal subgroup of G.
Proof. We have, for a ∈ G, aKa−1 = {aka−1 : k ∈ K}. Now f(aka−1) = f(a)f(k)f(a−1) = f(a)e′f(a−1) = f(a)f(a−1) = f(a)(f(a))−1 = e′. Hence aka−1 ∈ K for each k ∈ K, and so aKa−1 ⊆ Ker f = K for every a ∈ G. This implies that K is a normal subgroup of G.
5.14 Quotient Groups (or Factor Groups)
Let G be a group, and H a normal subgroup of G. Let G/H (read G
modulo H) be the set of all left cosets of H. Recall that when H is a
normal subgroup of G, there is no distinction between the left coset aH
and the right coset Ha of H. The fact that H is a normal subgroup of G
enables us to define a group operation in G/H. We set, for any two cosets
aH and bH of H in G, aH · bH = (ab)H. This definition is well-defined.
By this we mean that if we take different representative elements instead
of a and b to define the cosets aH and bH, still we end up with the same
product. To be precise, let
aH = a1H, and bH = b1H. (5.5)
Then (ab)H = (aH)(bH) = (a1H)(b1H) = (a1b1)H, (5.6)
because (5.5) implies that
a−1a1 ∈ H and b−1b1 ∈ H,
and so
$(ab)^{-1}(a_1b_1) = b^{-1}(a^{-1}a_1)b_1 = b^{-1}hb_1$, where $h = a^{-1}a_1 \in H$. (5.7)
Since H is normal, $b^{-1}hb \in H$, and hence $b^{-1}hb_1 = (b^{-1}hb)(b^{-1}b_1) \in H$. So by Proposition 5.9.5, (ab)H = (a1b1)H. Thus the product of two (left) cosets of H in G is itself a left coset of H. Further, for a, b, c ∈ G,
(aH)((bH)(cH)) = (aH)((bc)H) = (a(bc))H = ((ab)c)H = ((ab)H)(cH) = ((aH)(bH))(cH).
Thus the binary operation defined in G/H satisfies the associative law.
Further eH = H acts as the identity element of G/H as
(aH)(eH) = (ae)H = aH = (ea)H = eH · aH
Finally, the inverse of aH is a−1H since
(aH)(a−1H) = (aa−1)H = eH = H,
and for a similar reason (a−1H)(aH) = H. Thus G/H is a group under this binary operation. G/H is called the quotient group or factor group of G modulo H.
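The well-definedness of the coset product can be tested exhaustively in a small case, say G = S3 and the normal subgroup N = {e, (123), (132)} (helper `mult` as before is our notation):

```python
from itertools import permutations

def mult(a, b):
    # Product in S3 on tuples of images: apply b first, then a.
    return tuple(a[b[i] - 1] for i in range(len(a)))

G = list(permutations((1, 2, 3)))
N = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]    # the normal subgroup {e, (123), (132)}

def coset(a):
    return frozenset(mult(a, x) for x in N)

quotient = {coset(g) for g in G}
assert len(quotient) == 2                # o(G/N) = 6/3 = 2
# Well-definedness: the setwise product of aN and bN is exactly (ab)N.
for a in G:
    for b in G:
        prod = frozenset(mult(x, y) for x in coset(a) for y in coset(b))
        assert prod == coset(mult(a, b))
```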
Example 5.14.1:
We now present an example of a quotient group. Let G = (R2, +), the additive group of points of the plane R2. (If (x1, y1) and (x2, y2) are two points of R2, their sum (x1, y1) + (x2, y2) is defined as (x1 + x2, y1 + y2). The identity element of this group is (0, 0) and the inverse of (x, y) is (−x, −y).)
Let H be the subgroup {(x, 0) : x ∈ R} = the X-axis. If (a, b) is any point of R2, then
(a, b) + H = {(a, b) + (x, 0) : x ∈ R} = {(a + x, b) : x ∈ R}
= the line through (a, b) parallel to the X-axis. Clearly, if (a, b) + H = (a′, b′) + H, then (a′ − a, b′ − b) ∈ H = the X-axis, and therefore the Y-coordinate b′ − b = 0 and so b′ = b. In other words, the line through (a, b) and the line through (a′, b′), both parallel to the X-axis, are the same iff b = b′, as is expected (see Figure 5.15). For this reason, this line may be taken as (0, b) + H. Thus the cosets of H in R2 are the lines parallel to the X-axis, and therefore the elements of the quotient group R2/H are the lines parallel to the X-axis. If (a, b) + H and (a′, b′) + H are two elements of R2/H, we define their sum to be (a + a′, b + b′) + H = (0, b + b′) + H, the line through (0, b + b′) parallel to the X-axis. Note that (R2, +) is an abelian group and so H is a normal subgroup of R2. Hence the above sum is well-defined. The above addition defines a group structure on the set of lines parallel to the X-axis, that is, on the elements of R2/H. The identity element of the quotient group is the X-axis = H, and the inverse of (0, b) + H is (0, −b) + H.
Our next result exhibits the importance of factor groups.
5.15 Basic Isomorphism Theorem for Groups
If there exists a homomorphism f from a group G onto a group G′ with
kernel K then G/K ≃ G′.
[Figure 5.15: cosets of H = the X-axis in (R2, +); each coset (0, b) + H is the horizontal line through (0, b), and the sum of the cosets through (0, b) and (0, b′) is the line through (0, b + b′).]
Proof. We have to prove that there exists an isomorphism φ from the factor group G/K onto G′ (observe that the factor group G/K is defined, as K is a normal subgroup of G; see Theorem 5.13.6). Define φ : G/K → G′ by φ(gK) = f(g). The mapping φ is pictorially depicted in Figure 5.15.
First we need to establish that φ is a well-defined map. This is because it is possible that gK = g′K with g ≠ g′ in G. Then, as per our definition of φ, φ(gK) = f(g), while φ(g′K) = f(g′). Hence our definition of φ will be valid only if f(g) = f(g′). Now gK = g′K implies that g′−1g ∈ K. Let g′−1g = k ∈ K. Then f(k) = e′, the identity element of G′. Moreover, e′ = f(g′−1g) = f(g′−1)f(g) = f(g′)−1f(g) (as f is a group homomorphism, and by Property 2). Thus f(g′) = f(g), and φ is well defined. We next show that φ is a group isomorphism.
show that φ is a group isomorphism.
(i) φ is a group homomorphism: We have for g1K, g2K in G/K,
φ(g1Kg2K) = φ((g1g2)K)
= f(g1g2), by the definition of φ
= f(g1)f(g2), as f is a group homomorphism
= φ(g1K)φ(g2K)
Thus φ is a group homomorphism.
(ii) φ is 1 − 1: Suppose φ(g1K) = φ(g2K), where g1K, g2K ∈ G/K. This gives f(g1) = f(g2), and therefore $f(g_1)(f(g_2))^{-1} = e'$, the identity element of G′. But $f(g_1)(f(g_2))^{-1} = f(g_1)f(g_2^{-1}) = f(g_1g_2^{-1})$. Hence $f(g_1g_2^{-1}) = e'$, and so $g_1g_2^{-1} \in K$, and consequently, g1K = g2K (by Property 4). Thus φ is 1 − 1.
(iii) φ is onto: Let g′ ∈ G′. As f is onto, there exists g ∈ G with f(g) = g′.
Now gK ∈ G/K, and φ(gK) = f(g) = g′. Thus φ is onto and hence
φ is an isomorphism.
Let us see what this isomorphism means with regard to the factor group R2/H given in Example 5.14.1. Define f : R2 → R by f(a, b) = (0, b), the projection of the point (a, b) ∈ R2 on the Y-axis (identified with R). The identity element of the image group is the origin (0, 0). Clearly K is the set of all points (a, b) ∈ R2 that are mapped to (0, 0), that is, the set of those points of R2 whose projections on the Y-axis coincide with the origin. Thus K is the X-axis (= H). Now φ : G/K = R2/K → G′ is defined by φ((a, b) + K) = f(a, b) = (0, b). This means that all points of the line through (a, b) parallel to the X-axis are mapped to their common projection on the Y-axis, namely, the point (0, b). Thus the isomorphism between G/K and G′ is obtained by mapping each line parallel to the X-axis to the point where the line meets the Y-axis.
We now consider another example. Let G = Sn, the symmetric group of degree n, and G′ = {−1, 1}, the multiplicative group of two elements (with multiplication defined in the usual way); 1 is the identity element of G′ and the inverse of −1 is −1. Define f : G → G′ by setting
f(σ) = 1 if σ ∈ Sn is an even permutation, that is, if σ ∈ An, and f(σ) = −1 if σ ∈ Sn is an odd permutation.
Recall that An is a subgroup of Sn. Now |Sn| = n! and |An| = n!/2. Further, if α is an odd permutation in Sn, and σ ∈ An, then ασ is an odd permutation, and hence |αAn| = n!/2. Let Bn denote the set of odd permutations in Sn. Then Sn = An ∪ Bn, An ∩ Bn = ∅, and αAn = Bn for each α ∈ Bn. Thus Sn/An has exactly two distinct cosets of An, namely, An and Sn \ An = Bn.
The mapping f : Sn → {1, −1} defined by f(α) = 1 or −1 according to whether the permutation α is even or odd clearly defines a group homomorphism. The kernel K of this homomorphism is An, and we have Sn/An ≃ {1, −1}. The isomorphism is obtained by mapping the coset σAn to 1 or −1 according to whether σ is an even or an odd permutation.
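The sign map and its kernel can be verified in S4. Here parity is computed by counting inversions, which agrees with the transposition count modulo 2 (the helper names are ours):

```python
from itertools import permutations

def sign(p):
    # Parity via inversion count; agrees with the transposition
    # count modulo 2 (Theorem 5.8.6).
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if p[i] > p[j])
    return 1 if inversions % 2 == 0 else -1

def mult(a, b):
    return tuple(a[b[i] - 1] for i in range(len(a)))

G = list(permutations((1, 2, 3, 4)))     # S4
# f(sigma) = sign(sigma) is a homomorphism onto {1, -1}.
assert all(sign(mult(a, b)) == sign(a) * sign(b) for a in G for b in G)
kernel = [g for g in G if sign(g) == 1]  # the alternating group A4
assert len(kernel) == len(G) // 2        # |A_n| = n!/2
```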
5.16 Exercises
1. Let G = GL(n, C) be the set of all invertible complex matrices A of order n. If the operation · denotes matrix multiplication, show that G is a group under ·.
2. Let G denote the set of all real matrices of the form $\begin{pmatrix} a & 0 \\ b & 1 \end{pmatrix}$ with a ≠ 0. Show that G is a group under matrix multiplication.
3. Which of the following semigroups are groups?
(i) (Q, ·)
(ii) (R∗, ·)
(iii) (Q, +)
(iv) (R∗, ·)
(v) The set of all 2 by 2 real matrices under matrix multiplication.
(vi) The set of all 2 by 2 real matrices of the form $\begin{pmatrix} a & 0 \\ b & 1 \end{pmatrix}$.
4. Prove that a finite cancellative semigroup is a group; that is, if H is a finite semigroup in which both the cancellation laws are valid (that is, ax = ay implies that x = y, and xa = ya implies that x = y, for a, x, y ∈ H), then H is a group. (Hint: If H = {x1, x2, . . . , xn}, consider aH and Ha and apply Ex. 7 above.)
5. Prove that a semigroup G in which the equations ax = b and yc = d are solvable in G, for all a, b, c, d ∈ G, is a group.
6. In the group GL(2, C) of 2 × 2 complex nonsingular matrices, find the order of the following elements:
(i) $\begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$ (ii) $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ (iii) $\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$ (iv) $\begin{pmatrix} -2+3i & -1+2i \\ 1-i & 3-2i \end{pmatrix}$
7. Let G be a group, and φ : G → G be defined by φ(g) = g−1, g ∈ G. Show
that φ is an automorphism of G iff G is abelian.
8. Prove that any group of even order has an element of order 2. (Hint: for a ≠ e, o(a) ≠ 2 iff a ≠ a−1; pair off such elements as (a, a−1).)
9. Give an example of a noncyclic group each of whose proper subgroup is
cyclic.
10. Show that no group can be the set union of two of its proper subgroups.
11. Show that S = {3, 5} generates the group (Z, +).
12. Give an example of an infinite nonabelian group.
13. Show that the permutations T = (12345) and S = (25)(34) generate a subgroup of order 10 in S5.
14. Find whether the following permutations are odd:
(i) (123)(456) (ii) (1546)(2)(3) (iii) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 5 & 4 & 3 & 1 & 7 & 6 & 9 & 8 \end{pmatrix}$.
15. If G = {a1, . . . , an} is a finite abelian group of order n, show that (a1a2 . . . an)2 = e.
16. Let G = {a ∈ R : a ≠ −1}. If the binary operation ∗ is defined in G by a ∗ b = a + b + ab for all a, b ∈ G, show that (G, ∗) is a group.
17. Let G = {a ∈ R : a ≠ −1}. If the binary operation ∗ is defined on G by a ∗ b = a + b − ab for all a, b ∈ G, show that (G, ∗) is not a group.
18. Let σ = (i1i2 . . . ir) be a cycle in Sn of length r. Show that the order of σ
(= the order of the group generated by σ) is r.
19. Prove that the centre of a group G is a normal subgroup of G.
20. Let α, β, γ be permutations in S4 defined by
$$\alpha = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}, \quad \beta = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}, \quad \gamma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix}.$$
Find (i) α−1, (ii) α−1βγ, (iii) βγ−1.
21. Show that any infinite cyclic group is isomorphic to (Z,+).
22. Show that the set $\{e^{in} : n \in \mathbb{Z}\}$ forms a multiplicative group. Show that this group is isomorphic to (Z, +). Is this group cyclic?
23. Find a homomorphism of the additive group of integers to itself that is
not onto.
24. Give an example of a group that is isomorphic to one of its proper sub-
groups.
25. Prove that (Z,+) is not isomorphic to (Q,+). (Hint: Suppose ∃ an iso-
morphism φ : Z→ Q. Let φ(5) = a ∈ Q. Then ∃ b ∈ Q with 2b = a. Let
x ∈ Z be the preimage of b. Then 2x = 5 in Z, which is not true.)
26. Prove that the multiplicative groups R∗ and C∗ are not isomorphic.
27. Give an example of an infinite group in which every element is of finite order.
28. Give the group table of the group S3. From the table, find the centre of
S3.
29. Show that if a subgroup H of a group G is generated by a subset S of G, then H is a normal subgroup iff aSa−1 ⊆ ⟨S⟩ for each a ∈ G.
30. Show that a group G is abelian iff the centre C(G) of G is G.
31. Let G be a group. Let [G, G] denote the subgroup of G generated by all elements of G of the form aba−1b−1 (called the commutator of a and b) for all pairs of elements a, b ∈ G. Show that [G, G] is a normal subgroup of G. [Hint: For c ∈ G, we have c(aba−1b−1)c−1 = (cac−1)(cbc−1)(cac−1)−1(cbc−1)−1 ∈ [G, G]. Now apply Exercise 29.]
32. Show that a group G is abelian ⇔ [G, G] = {e}.
Remark: [G, G] is called the commutator subgroup of G. The set of commutators of G need not itself be a subgroup; it is for this reason that we take the subgroup generated by the commutators of G. There is no elementary counterexample known; for a counterexample, see, for instance, Rotman, The Theory of Groups.
33. Let G be the set of all roots of unity, that is, G = {ω ∈ C : ωn = 1 for some n ∈ N}. Prove that G is an abelian group that is not cyclic.
34. If A and B are normal subgroups of a group G such that A ∩ B = {e}, then show that ab = ba for all a ∈ A and b ∈ B.
35. If H is the only subgroup of a given finite order in a group G, show that
H is normal in G.
36. Show that any subgroup of a group G of index 2 is normal in G.
37. Prove that the subgroup e, (13) of S3 is not normal in S3.
38. Prove that the subgroup e, (123), (132) is a normal subgroup of S3.
5.17 Rings
The study of commutative rings arose as a natural abstraction of the algebraic
properties of the set of integers while that of the fields arose out of the sets
of rational, real and complex numbers.
We begin with the definition of a ring and then proceed on to establish
some of its basic properties.
5.17.1 Rings, Definitions and Examples
Definition 5.17.1:
A ring is a set A with two binary operations, denoted by + and · (called
addition and multiplication respectively) satisfying the following axioms:
R1: (A,+) is an abelian group. (The identity element of (A,+) is denoted
by 0).
R2: · is associative, that is, a · (b · c) = (a · b) · c for all a, b, c ∈ A.
R3: For all a, b, c ∈ A,
a · (b + c) = a · b + a · c (left distributive law)
(a + b) · c = a · c + b · c (right distributive law).
It is customary to write ab instead of a · b.
Examples of Rings
1. A = Z, the set of all integers with the usual addition + and the usual
multiplication taken as ·.
2. A = 2Z, the set of even integers with the usual addition and multipli-
cation.
3. A = Q, R or C with the usual addition and multiplication.
4. A = Zn = {0, 1, 2, . . . , n − 1}, the set of integers modulo n, where + and · denote addition and multiplication taken modulo n. (For instance, if A = Z5, then in Z5, 3 + 4 = 7 = 2, and 3 · 4 = 12 = 2, as both 7 and 12 are congruent to 2 modulo 5.)
5. A = Z[X], the set of polynomials in the indeterminate X with integer
coefficients with addition + and multiplication · defined in the usual
way.
6. A = Z + i√3 Z = {a + ib√3 : a, b ∈ Z} ⊂ C. Then A is a ring with the usual + and · in C.
7. (Ring of Gaussian integers). Let A = Z + iZ = {a + ib : a, b ∈ Z} ⊂ C. Then with the usual addition + and multiplication · in C, A is a ring.
8. (A ring of functions). Let A = C[0, 1], the set of all complex-valued
continuous functions on [0, 1]. For t ∈ [0, 1], and f, g ∈ A, set (f +
g)(t) = f(t) + g(t), and (f · g)(t) = f(t)g(t). Then it is clear that both
f + g and f · g are in A. It is easy to check that A is a ring.
Definition 5.17.2:
A ring A is called commutative if for all a, b ∈ A, ab = ba.
Hence if A is a noncommutative ring, there exists a pair of elements x, y ∈ A with xy ≠ yx.
All the rings given above in Examples 1 to 8 are commutative rings. We
now present an example of a noncommutative ring.
Example 9: Let A = M2(Z), the set of all 2 by 2 matrices with integer entries. A is a ring with the usual matrix addition as + and the usual matrix multiplication as ·. It is a noncommutative ring: if M = (1 1; 0 0) and N = (0 0; 1 1) (rows separated by semicolons), then M and N are in A, but MN ≠ NM.
Unity element of a ring
Definition 5.17.3:
An element e of a ring A is called a unity element of A if ea = ae = a for all
a ∈ A.
A unity element of A, if it exists, must be unique. For, if e and f are
unity elements of A, then,
ef = e as f is a unity or an identity element of A,
ef = f as e is a unity element of A.
Therefore e = f . Hence if a ring A has a unity element e, we can refer to
it as the unity element e of A.
For the rings in Examples 1, 3 and 7 above, the number 1 is the unity
element. For the ring C[0, 1] of Example 8 above, the function 1 ∈ C[0, 1]
defined by 1(t) = 1 for all t ∈ [0, 1] acts as the unity element. For the ring
M2(Z) of Example 9, the identity matrix (1 0; 0 1) acts as the unity element. A ring need not have a unity element. For instance, the ring 2Z of Example 2 above has no unity element.
5.17.2 Units of a ring
An element a of a ring A with unity element e is called a unit in A if there exist elements b and c in A such that ab = e = ca.
Proposition 5.17.4:
If a is a unit in a ring A with unity element e, and if ab = ca = e, then b = c.
Proof. We have, b = eb = (ca)b = c(ab) = ce = c.
We denote the element b(= c) described in Proposition 5.17.4 as the
inverse of a and denote it by a−1. Thus if a is a unit in A, then there exists
an element a−1 ∈ A such that aa−1 = a−1a = e. Clearly, a−1 is unique.
Proposition 5.17.5:
The units of a ring A (with identity element) form a group under multipli-
cation.
Proof. Exercise.
Units of the ring Zn
Let a be a unit in the ring Zn (see Example 4 above). Then there exists an x ∈ Zn such that ax = 1 in Zn, or equivalently, ax ≡ 1 (mod n). But this implies that ax − 1 = bn for some integer b. Hence (a, n) = the gcd of a and n = 1. (Because if an integer c > 1 divides both a and n, then it should divide 1.) Conversely, if (a, n) = 1, then by the Euclidean algorithm (see Section ??), there exist integers x and y with ax + ny = 1, and therefore ax ≡ 1 (mod n). This however means that a is a unit in Zn. Thus the set U of units of Zn consists precisely of those integers in Zn that are relatively prime to n. By Definition 3.7.2, |U| is φ(n), where φ is the Euler function.
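This gcd characterization of the units of Zn is easy to check computationally. The following sketch (not from the text; the names `units` and `phi` are ours) enumerates the units, confirms each has an inverse, and counts them to recover φ(n):

```python
from math import gcd

def units(n):
    """Units of Z_n: the residues a with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def phi(n):
    """Euler's totient function, counted directly as |U|."""
    return len(units(n))

# Every unit of Z_12 has a multiplicative inverse mod 12, and |U| = phi(12).
U = units(12)
assert U == [1, 5, 7, 11]
assert all(any((a * x) % 12 == 1 for x in U) for a in U)
```

Note that the inverse of each unit is again a unit, in line with Proposition 5.17.5.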
Zero Divisors
In the ring Z of integers, a is a divisor of c if there exists an integer b such
that ab = c. As Z is a commutative ring, we simply say that a is a divisor
of c and not a left divisor or right divisor of c. Taking c = 0, we have the
following more general definition.
Definition 5.17.6:
A left zero divisor in a ring A is a non-zero element a of A such that there
exists a non-zero element b of A with ab = 0 in A. A non-zero element a ∈ A is a right zero divisor in A if ca = 0 for some c ∈ A, c ≠ 0.
If A is a commutative ring, a left zero divisor a in A is automatically a right zero divisor in A and vice versa. In this case, we simply call a a zero divisor in A.
Examples
1. If a = 2 in Z4, then a is a zero divisor in Z4, as 2 · 2 = 4 = 0 in Z4.
2. In the ring M2(Z), the matrix (1 1; 0 0) is a right zero divisor, as
(0 0; 0 1)(1 1; 0 0) = (0 0; 0 0),
and (0 0; 0 1) is not the zero matrix of M2(Z).
3. If p is a prime, then every non-zero element of Zp is a unit. This follows from the fact that if 1 ≤ a < p, then (a, p) = 1. Hence no a ∈ Zp, a ≠ 0, is a zero divisor in Zp.
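As a quick check on Examples 1 and 3, the zero divisors of Zn can be enumerated by brute force (a sketch; the name `zero_divisors` is ours, not from the text):

```python
def zero_divisors(n):
    """Non-zero a in Z_n such that a*b = 0 (mod n) for some non-zero b."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]

assert zero_divisors(4) == [2]     # 2 * 2 = 4 = 0 in Z_4
assert zero_divisors(7) == []      # p prime: Z_p has no zero divisors
```

For composite n every non-zero non-unit turns up in this list, which is consistent with the description of the units of Zn above.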
Theorem 5.17.7:
The following statements are true for any ring A.
(i) a0 = 0a = 0 for any a ∈ A.
(ii) a(−b) = (−a)b = −(ab) for all a, b ∈ A.
(iii) (−a)(−b) = ab for all a, b ∈ A.
Proof. Exercise.
5.18 Integral Domains
An integral domain is an abstraction of the algebraic structure of the ring of
integers.
Definition 5.18.1:
An integral domain A is a commutative ring with unity element having no
divisors of zero.
Examples
1. The rings Z and Z + iZ are both integral domains.
2. The ring 2Z of even integers is not an integral domain (Why?) even
though it has no zero divisor.
3. Let Z × Z = {(a, b) : a ∈ Z, b ∈ Z}. For (a, b), (c, d) in Z × Z, define
(a, b) ± (c, d) = (a ± c, b ± d), and (a, b) · (c, d) = (ac, bd).
Clearly Z × Z is a commutative ring with zero element (0, 0) and unity element (1, 1), but it is not an integral domain, as it has zero divisors. For instance, (1, 0) · (0, 1) = (0, 0).
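The componentwise operations on Z × Z are easy to model directly; this sketch (the tuple helpers `add` and `mul` are ours) exhibits the zero divisors (1, 0) and (0, 1):

```python
def add(u, v):
    """Componentwise addition in Z x Z."""
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    """Componentwise multiplication in Z x Z."""
    return (u[0] * v[0], u[1] * v[1])

# (1, 0) and (0, 1) are non-zero, yet their product is the zero element.
assert mul((1, 0), (0, 1)) == (0, 0)
# (1, 1) acts as the unity element.
assert mul((1, 1), (5, 7)) == (5, 7)
```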
5.19 Exercises
1. Prove that Zn, n ≥ 2, is an integral domain iff n is a prime.
2. Give the proof of Proposition 5.17.5.
3. Determine the group of units of the ring
(i) Z, (ii) M2(Z), (iii) Z + iZ, (iv) Z + i√3 Z.
4. Let A be a ring, and a, b1, b2, . . . , bn ∈ A. Then show that a(b1 + b2 + · · · + bn) = ab1 + ab2 + · · · + abn. (Hint: Apply induction on n.)
5. Let A be a ring, and a, b ∈ A. Then show that for any positive integer
n,
n(ab) = (na)b = a(nb).
(na stands for the element a + a + · · · + a (n times) of A).
6. Show that no unit of a ring A can be a zero divisor in A.
7. (Definition: A subset B of a ring A is a subring of A if B is a ring with
respect to the binary operations + and · of A). Prove:
(i) Z is a subring of Q.
(ii) Q is a subring of R.
(iii) R is a subring of C.
8. Prove that any ring A with identity element and cardinality p, where
p is a prime, is commutative. (Hint: Verify that the elements 1, 1 +
1, . . . , 1 + 1 + · · ·+ 1 (p times) are all distinct elements of A).
5.20 Fields
We now discuss the fundamental properties of fields and then go on to develop
the properties of finite fields that are basic to coding theory and cryptography.
If rings are algebraic abstractions of the set of integers, fields are algebraic
abstractions of the sets Q, R and C (as mentioned already).
Definition 5.20.1:
A field is a commutative ring with unity element in which every non-zero
element is a unit.
Hence if F is a field, and F ∗ = F \ {0}, the set of non-zero elements of
F , then every element of F ∗ is a unit in F . Hence F ∗ is a group under the
multiplication operation of F . Conversely, if F is a commutative ring with
unit element and if F ∗ is a group under the multiplication operation of F ,
then every element of F ∗ is a unit under the multiplication operation of F ,
and hence F is a field. This observation enables one to give an equivalent
definition of a field.
Definition 5.20.2 (Equivalent definition):
A field is a commutative ring F with unit element in which the set F ∗ of
non-zero elements is a group under the multiplication operation of F .
Every field is an integral domain. To see this all that we have to verify
is that F has no zero divisors. Indeed, if ab = 0, a 6= 0, then a−1 exists in F
and so we have 0 = a−1(ab) = (a−1a)b = b in F . However, not every integral
domain is a field. For instance, the ring Z of integers is an integral domain
but not a field. (Recall that the only non-zero integers which are units are 1
and −1.)
5.21 Characteristic of a Field
Definition 5.21.1:
A field F is called finite if |F |, the cardinality of F , is finite; otherwise, F is
an infinite field.
Let F be a field whose zero and unity elements are denoted by 0F and
1F respectively. A subfield of F is a subset F ′ of F such that F ′ is also
a field with the same addition and multiplication operations of F . This of
course means that the zero and unity elements of F ′ are the same as those
of F . It is clear that the intersection of any family of subfields of F is again
a subfield of F . Let P denote the intersection of the family of all subfields
of F. Naturally, the subfield P is the smallest subfield of F: if P′ were a subfield of F properly contained in P, then, P being the intersection of all subfields of F, we would have P ⊆ P′ ⊊ P, a contradiction. This smallest subfield P of F is called the prime field of F.
Necessarily, 0F ∈ P and 1F ∈ P .
As 1F ∈ P , the elements 1F , 1F +1F = 2 ·1F , 1F +1F +1F = 3 ·1F and, in
general, n · 1F , n ∈ N, all belong to P . There are then two cases to consider:
Case 1: The elements n ·1F , n ∈ N, are all distinct. In this case, the subfield
P itself is an infinite field and therefore F is an infinite field.
Case 2: The elements n · 1F , n ∈ N, are not all distinct. In this case,
there exist r, s ∈ N with r > s such that r · 1F = s · 1F , and therefore,
(r − s) · 1F = 0, where r − s is a positive integer. Hence there exists a least
positive integer p such that p · 1F = 0. We claim that p is a prime number.
If not, p = p1p2, where p1 and p2 are positive integers less than p. Then
0 = p · 1F = (p1p2) · 1F = (p1 · 1F )(p2 · 1F ) gives, as F is a field, either
p1 · 1F = 0 or p2 · 1F = 0. But this contradicts the choice of p. Thus p is
prime.
Definition 5.21.2:
The characteristic of a field F is the least positive integer p such that p·1F = 0
if such a p exists; otherwise, F is said to be of characteristic zero.
A field of characteristic zero is necessarily infinite (as its prime field al-
ready is). A finite field is necessarily of prime characteristic. However, there
are infinite fields with prime characteristic. Note that if a field F has char-
acteristic p, then px = 0 for each x ∈ F .
Examples
(i) The fields Q, R and C are all of characteristic zero.
(ii) The field Zp of integers modulo a prime p is of characteristic p.
(iii) For a field F, denote by F[X] the set of all polynomials in X over F, that is, polynomials whose coefficients are in F. F[X] is an integral domain, and the group of units of F[X] is F ∗.
(iv) The field Zp(X) of rational functions of the form a(X)/b(X), where a(X) and b(X) are polynomials in X over Zp and b(X) ≠ 0, is an infinite field of (finite) characteristic p.
Theorem 5.21.3:
Let F be a field of (prime) characteristic p. Then for all x, y ∈ F and every positive integer n, (x ± y)^(p^n) = x^(p^n) ± y^(p^n), and (xy)^(p^n) = x^(p^n) y^(p^n).
Proof. We apply induction on n. If n = 1, then by the binomial theorem (which is valid in any commutative ring with unit element),

(x + y)^p = x^p + (p choose 1) x^(p−1) y + · · · + (p choose p−1) x y^(p−1) + y^p
          = x^p + y^p, since p | (p choose i) for 1 ≤ i ≤ p − 1.   (5.1)

So assume that (x + y)^(p^n) = x^(p^n) + y^(p^n). Then

(x + y)^(p^(n+1)) = ((x + y)^(p^n))^p
                  = (x^(p^n) + y^(p^n))^p   (by the induction assumption)
                  = (x^(p^n))^p + (y^(p^n))^p   (by (5.1))
                  = x^(p^(n+1)) + y^(p^(n+1)).   (5.2)

Next we consider (x − y)^(p^n). If p = 2, then −y = y and so the result is valid. If p is an odd prime, change y to −y in (5.2). This gives

(x − y)^(p^n) = x^(p^n) + (−y)^(p^n) = x^(p^n) + (−1)^(p^n) y^(p^n) = x^(p^n) − y^(p^n),

since (−1)^(p^n) = −1.
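Theorem 5.21.3 can be checked exhaustively in the field Zp for small p, since Zp has characteristic p. This sketch uses Python's three-argument `pow` for modular exponentiation:

```python
p = 5  # any prime; Z_p is a field of characteristic p

# Check (x + y)^(p^n) = x^(p^n) + y^(p^n) and (xy)^(p^n) = x^(p^n) y^(p^n)
# for all x, y in Z_p and n = 1, 2, working modulo p throughout.
for n in (1, 2):
    e = p ** n
    for x in range(p):
        for y in range(p):
            assert pow(x + y, e, p) == (pow(x, e, p) + pow(y, e, p)) % p
            assert pow(x * y, e, p) == (pow(x, e, p) * pow(y, e, p)) % p
```

No such identity holds in characteristic zero: in Q, (1 + 1)^5 = 32 ≠ 1^5 + 1^5.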
5.22 Vector Spaces
In Section 5.20, we discussed some of the basic properties of fields. In the
present section, we look at the fundamental properties of vector spaces. We
follow up this discussion with a section on finite fields.
While the three algebraic structures—groups, rings and fields—are natu-
ral generalizations of integers and real numbers, the algebraic structure vector
space is a natural generalization of the 3-dimensional Euclidean space.
We start with the formal definition of a vector space. To define a vector
space, we need two objects: (i) a set V of vectors, and (ii) a field F of scalars.
In the case of Euclidean 3-space, V is the set of vectors, each vector being
an ordered triple (x1, x2, x3) of real numbers and F = R, the field of real
numbers. The axioms for a vector space that are given in Definition 5.22.1
below are easily seen to be generalizations of the properties of R3.
Definition 5.22.1:
A vector space (or linear space) V over a field F is a nonvoid set V whose
elements satisfy the following axioms:
(A) V has the structure of an additive abelian group.
(B) For every pair of elements α and v, where α ∈ F and v ∈ V , there exists
an element αv ∈ V called the product of v by α such that
(i) α(βv) = (αβ)v for all α, β ∈ F and v ∈ V , and
(ii) 1v = v for each v ∈ V (Here 1 is the unity element of the field F ).
(C) (i) For α ∈ F , and u, v in V , α(u+v) = αu+αv, that is, multiplication
by elements of F is distributive over addition in V .
(ii) For α, β ∈ F and v ∈ V, (α + β)v = αv + βv, that is, multiplication of elements of V by elements of F is distributive over addition in F.
If F = R, V is called a real vector space; if F = C, then V is called
a complex vector space. When an explicit reference to the field F
is not required, we simply say that V is a vector space (omitting
the words “over the field F”). The product αv, α ∈ F, v ∈ V , is
often referred to as scalar multiplication, α being the scalar.
5.22.1 Examples of Vector Spaces
1. Let V = R3, the set of ordered triples (x1, x2, x3) of real numbers. Then
R3 is a vector space over R (as mentioned earlier) and hence R3 is
a real vector space. More generally, if V = Rn, n ≥ 1, the set of
ordered n-tuples (x1, . . . , xn) of real numbers, then Rn is a real vector
space. If x = (x1, . . . , xn) and y = (y1, . . . , yn) ∈ Rn, then x + y =
(x1 + y1, . . . , xn + yn), and αx = (αx1, . . . , αxn). The zero vector of Rn
is (0, . . . , 0). Rn is known as the real affine space of dimension n.
2. Let V = Cn, the set of ordered n-tuples (z1, . . . , zn) of complex num-
bers. Then V is a vector space over R as well as over C. We note that
the real vector space Cn and the complex vector space Cn are essen-
tially different spaces despite the fact that the underlying set of vectors
is the same in both the cases.
3. R is a vector space over Q.
4. F [X], the ring of polynomials in X over the field F , is a vector space
over F .
5. The set of solutions of a homogeneous linear ordinary differential equation with real coefficients forms a real vector space. The reason is that any such differential equation has the form

d^n y/dx^n + C1 d^(n−1)y/dx^(n−1) + · · · + C_(n−1) y = 0.   (1)
Clearly, if y1(x) and y2(x) are two solutions of the differential equa-
tion (1), then so is y(x) = α1y1(x) +α2y2(x), α1, α2 ∈ R. It is now easy
to verify that the axioms of a vector space are satisfied.
5.23 Subspaces
The notion of a subspace of a vector space is something very similar to the
notions of a subgroup, subring and a subfield.
Definition 5.23.1:
A subspace W of a vector space V over F is a subset W of V such that W is
also a vector space over F with addition and scalar multiplication as defined
for V .
Proposition 5.23.2:
A non-void subset W of a vector space V is a subspace of V iff for all u, v ∈ W and α, β ∈ F,
αu + βv ∈ W.
Proof. If W is a subspace of V , then, as W is a vector space over F (with
the same addition and scalar multiplication as in V ), αu ∈ W and βv ∈ W ,
and therefore αu+ βv ∈W .
Conversely, if the condition holds, then W is an additive subgroup of V (taking α = 1, β = −1 gives u − v ∈ W for all u, v ∈ W, which is the subgroup criterion). Moreover, taking β = 0, we see that for each α ∈ F and u ∈ W, αu ∈ W. As W ⊂ V, all the axioms of a vector space are satisfied by W and hence W is a subspace of V.
An example of a subspace
Let W = {(a, b, 0) : a, b ∈ R}. Then W is a subspace of R3. (To see this,
apply Definition 5.23.1). Geometrically, this means that the xy-plane of R3
(that is, the set of points of R3 with the z-coordinate zero) is a subspace of
R3.
Proposition 5.23.3:
If W1 and W2 are subspaces of a vector space V , then W1 ∩ W2 is also a
subspace of V . More generally, the intersection of any family of subspaces of
a vector space V is also a subspace of V .
Proof. Let u, v ∈ W = W1 ∩W2, and α, β ∈ F . Then αu + βv belongs to
W1 as well as to W2 by Proposition 5.23.2, and therefore to W . Hence W is
a subspace of V again by Proposition 5.23.2. The general case is similar.
5.24 Spanning Sets
Definition 5.24.1:
Let S be a subset of a vector space V over F . By the subspace spanned by
S, denoted by < S >, we mean the smallest subspace of V that contains S.
If < S >= V , we call S a spanning set of V .
Clearly, there is at least one subspace of V containing S, namely, V itself. Let 𝒮 denote the collection of all subspaces of V containing S. Then ∩_{W∈𝒮} W is also a subspace of V containing S. Clearly, it is the smallest subspace of V containing S, and hence < S > = ∩_{W∈𝒮} W.
Example 5.24.2:
We shall determine the smallest subspace W of R3 containing the vectors (1, 2, 1) and (2, 3, 4).
Clearly, W must contain the subspace spanned by (1, 2, 1), that is, the line joining the origin (0, 0, 0) and (1, 2, 1). Similarly, W must also contain the line joining (0, 0, 0) and (2, 3, 4). These two distinct lines meet at the origin and hence determine a unique plane through the origin, and this is the subspace spanned by the two vectors (1, 2, 1) and (2, 3, 4). (See Proposition 5.24.3 below.)
Proposition 5.24.3:
Let S be a subset of a vector space V over F. Then < S > = L(S), where
L(S) = {α1 s1 + α2 s2 + · · · + αr sr : si ∈ S and αi ∈ F for 1 ≤ i ≤ r, r ∈ N}
= the set of all finite linear combinations of vectors of S over F.
Proof. First, it is easy to check that L(S) is a subspace of V. In fact, let u, v ∈ L(S), so that
u = α1 s1 + · · · + αr sr, and v = β1 s′1 + · · · + βt s′t,
where si, s′j ∈ S and αi, βj ∈ F for each i and j. Hence if α, β ∈ F, then
αu + βv = (αα1) s1 + · · · + (ααr) sr + (ββ1) s′1 + · · · + (ββt) s′t ∈ L(S).
Hence by Proposition 5.23.2, L(S) is a subspace of V . Further 1·s = s ∈ L(S)
for each s ∈ S, and hence L(S) contains S. But by definition, < S > is the
smallest subspace of V containing S. Hence < S > ⊆ L(S).
Now, let W be any subspace of V containing S. Then any linear com-
bination of vectors of S is a vector of W , and hence L(S) ⊆ W . In other
words, any subspace of V that contains the set S must contain the subspace
L(S). Once again, as < S > is the smallest subspace of V containing S,
L(S) ⊆ < S >. Thus < S > = L(S).
Note: If S = {u1, . . . , un} is a finite set, then < S > = < u1, . . . , un > = the subspace of linear combinations of u1, . . . , un over F. In this case, we say that S generates the subspace < S >, or that S is a set of generators for < S >. Also, L(S) is called the linear span of S in V.
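For finite subsets S of Rn (or Qn), membership in the linear span can be tested mechanically: v ∈ < S > iff appending v to S does not increase the rank of the matrix whose rows are the vectors. A minimal sketch using exact rational arithmetic (the names `rank` and `in_span` are ours; Gaussian elimination stands in for the theory above):

```python
from fractions import Fraction

def rank(rows):
    """Row rank via Gaussian elimination over Q (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(vectors, v):
    """v is in <vectors> iff appending v does not increase the rank."""
    return rank(vectors + [v]) == rank(vectors)

# The plane spanned by (1, 2, 1) and (2, 3, 4) from Example 5.24.2:
S = [(1, 2, 1), (2, 3, 4)]
assert in_span(S, (3, 5, 5))          # (1,2,1) + (2,3,4)
assert not in_span(S, (1, 0, 0))
```

Exact `Fraction` arithmetic avoids the rounding issues a floating-point elimination would have on such tests.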
Proposition 5.24.4:
Let u1, . . . , un and v be vectors of a vector space V . Suppose that v ∈<
u1, u2, . . . , un >. Then < u1, . . . , un >=< u1, . . . , un; v >.
Proof. Any element α1u1 + · · · + αnun, αi ∈ F for each i, of < u1, . . . , un >
can be rewritten as
α1u1 + · · ·+ αnun + 0 · v
and hence belongs to < u1, . . . , un; v >. Thus
< u1, . . . , un >⊆< u1, . . . , un; v >
Conversely, if
w = α1u1 + · · ·+ αnun + βv ∈< u1, . . . , un; v >,
then, as v ∈ < u1, . . . , un >, v = γ1 u1 + · · · + γn un with γi ∈ F, and therefore
w = (α1 u1 + · · · + αn un) + β(γ1 u1 + · · · + γn un) = Σ_{i=1}^n (αi + βγi) ui ∈ < u1, . . . , un >.
Thus < u1, . . . , un; v >⊆< u1, . . . , un > and therefore
< u1, . . . , un >=< u1, . . . , un; v >
Corollary 5.24.5:
If S is any nonempty subset of a vector space V, and v ∈ < S >, then < S ∪ {v} > = < S >.
Proof. v ∈ < S > implies, by Proposition 5.24.3, that v is a linear combination of a finite set of vectors in S. The rest of the proof is as in the proof of Proposition 5.24.4.
5.25 Linear Independence and Base
Let V = R3, and e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) in R3. If
v = (x, y, z) is any vector of R3, then v = xe1 + ye2 + ze3. Trivially, this
is the only way to express v as a linear combination of e1, e2, e3. For this
reason, we call e1, e2, e3 a base for R3. We now formalize these notions.
Definition 5.25.1: (i) A finite subset S = {v1, . . . , vn} of vectors of a vector space V over a field F is said to be linearly independent if the
equation
α1v1 + α2v2 + · · ·+ αnvn = 0, αi ∈ F
implies that αi = 0 for each i.
In other words, a linearly independent set of vectors admits only the trivial linear relation among them, namely,
0 · v1 + 0 · v2 + · · · + 0 · vn = 0.
In this case we also say that the vectors v1, . . . , vn are linearly indepen-
dent over F . In the above equation, the zero on the right refers to the
zero vector of V while the zeros on the left refer to the scalar zero, that
is, the zero element of F .
(ii) An infinite subset S of V is linearly independent in V if every finite
subset of vectors of S is linearly independent.
(iii) A subset S of V is linearly dependent over F if it is not linearly independent over F. This means that there exists a finite subset {v1, . . . , vn} of S and a set of scalars α1, . . . , αn, not all zero, in F such that
α1 v1 + · · · + αn vn = 0.
If {v1, . . . , vn} is linearly independent over F, we also say that the vectors v1, . . . , vn are linearly independent over F.
Remark 5.25.2: (i) The zero vector of V forms a linearly dependent set
since it satisfies the nontrivial equation 1 · 0 = 0, where 1 ∈ F and
0 ∈ V .
(ii) Two vectors of V are linearly dependent over F iff one of them is a scalar multiple of the other.
(iii) If v ∈ V and v ≠ 0, then {v} is linearly independent (since for α ∈ F, αv = 0 implies that α = 0).
(iv) The empty set is always taken to be linearly independent.
Proposition 5.25.3:
Any subset T of a linearly independent set S of a vector space is linearly
independent.
Proof. First assume that S is a finite subset of V. We can take T = {v1, . . . , vr} and S = {v1, . . . , vr; vr+1, . . . , vn}, n ≥ r. The relation
α1 v1 + · · · + αr vr = 0, αi ∈ F,
is equivalent to the condition that
(α1 v1 + · · · + αr vr) + (0 · vr+1 + · · · + 0 · vn) = 0.
But this implies, as S is linearly independent over F, that αi = 0 for 1 ≤ i ≤ r. Hence T is linearly independent.
If S is an infinite set and T ⊆ S, then any finite subset of T is a finite
subset of S and hence linearly independent. Hence T is linearly independent
over F .
A restatement of Proposition 5.25.3 is that any superset (⊂ V ) of a
linearly dependent subset of V is linearly dependent.
Corollary 5.25.4:
If v ∈ L(S), then S ∪ {v} is linearly dependent.
Proof. By hypothesis, there exist v1, . . . , vn in S, and α1, . . . , αn ∈ F such that α1 v1 + · · · + αn vn + (−1)v = 0. Hence {v1, . . . , vn; v} is linearly dependent, and so by Proposition 5.25.3, S ∪ {v} is linearly dependent.
Examples
1. C is a vector space over R. The vectors 1 and i of C are linearly
independent over R. In fact, if α, β ∈ R, then
α · 1 + β · i = 0
gives that α + βi = 0, and therefore α = 0 = β. One can check that {1 + i, 1 − i} is also linearly independent over R, while {2 + i, 1 + i, 1 − i} is linearly dependent over R. The last assertion follows from the fact that if u = 2 + i, v = 1 + i and w = 1 − i, then u + w = 3 and v + w = 2, so that
2(u + w) = 3(v + w),
giving
2u − 3v − w = 0.
2. The infinite set of polynomials S = {1, X, X2, . . .} in the vector space R[X] of polynomials in X with real coefficients is linearly independent. Recall that an infinite set S is linearly independent iff every finite subset of S is linearly independent. So consider a finite subset {X^(i1), X^(i2), . . . , X^(in)} of S. The equation
λ1 X^(i1) + λ2 X^(i2) + · · · + λn X^(in) = 0   (A)
where the scalars λi, 1 ≤ i ≤ n, all belong to R, implies that the
polynomial on the left side of equation (A) is the zero polynomial and
hence must be zero for every real value of X. In other words, every
real number is a zero (root) of this polynomial. This is possible only if
each λi is zero. Hence the set S is linearly independent over R.
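The dependence relation of Example 1 can be verified directly with Python's complex numbers (a sketch; we treat C as a real vector space, with i written `1j`):

```python
u, v, w = 2 + 1j, 1 + 1j, 1 - 1j   # 2+i, 1+i, 1-i as Python complex numbers

# The dependence relation derived in the text: 2u - 3v - w = 0.
assert 2 * u - 3 * v - w == 0

# {1+i, 1-i} is independent over R: a(1+i) + b(1-i) = (a+b) + (a-b)i
# vanishes only when a + b = 0 and a - b = 0, i.e. a = b = 0.
# (Spot-checked here on a grid of integer coefficients.)
assert all(a * v + b * w != 0
           for a in range(-3, 4) for b in range(-3, 4) if (a, b) != (0, 0))
```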
5.26 Bases of a Vector Space
Definition 5.26.1:
A basis (or base) of a vector space V over a field F is a subset B of V such
that
(i) B is linearly independent over F , and
(ii) B spans V ; in symbols, < B >= V .
Condition (ii) implies that every vector of V is a linear combination of (a
finite number of) vectors of B while condition (i) ensures that the expression
is unique. Indeed, if u = α1 u1 + · · · + αn un = β1 u1 + · · · + βn un, where the ui's are all in B, then (α1 − β1)u1 + · · · + (αn − βn)un = 0. The linear independence of the vectors u1, . . . , un means αi − βi = 0, that is, αi = βi for each i. (Notice that we have taken the same u1, . . . , un in both the expressions, as we can always add terms with zero coefficients. For example, if u = α1 u1 + α2 u2 = β1 u′1 + β2 u′2, then
u = α1 u1 + α2 u2 + 0 · u′1 + 0 · u′2 = 0 · u1 + 0 · u2 + β1 u′1 + β2 u′2.)
Example 5.26.2:
The vectors e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) form a basis for R3.
This follows from the following two facts.
1. {e1, e2, e3} is linearly independent in R3. In fact, α1 e1 + α2 e2 + α3 e3 = 0, αi ∈ R, implies that
(α1, α2, α3) = α1(1, 0, 0) + α2(0, 1, 0) + α3(0, 0, 1) = (0, 0, 0),
and hence αi = 0, 1 ≤ i ≤ 3.
2. < e1, e2, e3 > = R3. To see this, note that any vector α1 e1 + α2 e2 + α3 e3 of < e1, e2, e3 > equals (α1, α2, α3) ∈ R3 and, conversely, any (α1, α2, α3) in R3 is α1 e1 + α2 e2 + α3 e3 and hence belongs to < e1, e2, e3 >.
5.27 Dimension of a Vector Space
Definition 5.27.1:
By a finite-dimensional vector space, we mean a vector space that can be
generated (or spanned) by a finite number of vectors in it.
Our immediate goal is to establish that any finite-dimensional vector
space has a finite basis and that any two bases of a finite-dimensional vector
space have the same number of elements.
Lemma 5.27.2:
No finite-dimensional vector space can have an infinite basis.
Proof. Let V be a finite-dimensional vector space with a finite spanning set S = {v1, . . . , vn}. Suppose to the contrary that V has an infinite basis B. Then, as B is a basis, each vi is a linear combination of a finite subset Bi (1 ≤ i ≤ n) of B. Let B′ = ∪_{i=1}^n Bi. Then B′ is also a finite subset of B. As B′ ⊂ B, B′ is linearly independent, and further, as each v ∈ V is a linear combination of v1, . . . , vn, v is also a linear combination of the vectors of B′. Hence B′ is also a basis for V. If x ∈ B \ B′, then x ∈ L(B′) and so B′ ∪ {x} is a linearly dependent subset of the linearly independent set B, a contradiction.
Lemma 5.27.3:
A finite sequence {v1, . . . , vn} of non-zero vectors of a vector space V is
linearly dependent iff for some k, 2 ≤ k ≤ n, vk is a linear combination of its
preceding vectors.
Proof. In one direction, the proof is trivial: if vk ∈ < v1, . . . , vk−1 >, then by Proposition 5.25.3, {v1, . . . , vk−1; vk} is linearly dependent and so is its superset {v1, . . . , vn}.
Conversely, assume that {v1, v2, . . . , vn} is linearly dependent. As v1 is a non-zero vector, {v1} is linearly independent (see (iii) of Remark 5.25.2). Hence there must exist a k, 2 ≤ k ≤ n, such that {v1, . . . , vk−1} is linearly independent while {v1, . . . , vk} is linearly dependent, since at worst k can be n. Hence there exists a set of scalars α1, . . . , αk, not all zero, such that
α1 v1 + · · · + αk vk = 0.
Now αk ≠ 0; for if αk = 0, there would exist a nontrivial linear relation connecting v1, . . . , vk−1, contradicting the fact that {v1, . . . , vk−1} is linearly independent.
Thus
vk = −αk^(−1) α1 v1 − · · · − αk^(−1) αk−1 vk−1.
Lemma 5.27.3 implies, by Proposition 5.24.4, that under the stated con-
ditions on vk,
< v1, . . . , vk, . . . , vn > = < v1, . . . , vk−1, vk+1, . . . , vn > = < v1, . . . , v̂k, . . . , vn >,
where the hat over vk indicates that the vector vk should be deleted.
We next prove a very important property of finite-dimensional vector
spaces.
Theorem 5.27.4:
Any finite-dimensional vector space has a basis. Moreover, any two bases of
a finite-dimensional vector space have the same number of elements.
Proof. Let V be a finite-dimensional vector space. By Lemma 5.27.2, every
basis of V is finite. Let S = {u1, . . . , um} and T = {v1, . . . , vn} be any two
bases of V . We want to prove that m = n.
Now v1 ∈ V = < u1, . . . , um >. Hence the set S1 = {v1; u1, . . . , um} is linearly dependent. By Lemma 5.27.3, there exists a vector ui1 ∈ {u1, . . . , um} such that
< v1; u1, . . . , um > = < v1; u1, . . . , ûi1, . . . , um >.
Now consider the set of vectors
S2 = {v2, v1; u1, . . . , ûi1, . . . , um}.
As v2 ∈ V = < v1; u1, . . . , ûi1, . . . , um >, there exists a vector ui2 ∈ {u1, . . . , ûi1, . . . , um} such that ui2 is a linear combination of the vectors preceding it in the sequence S2. (Such a vector cannot be a vector of T, as every subset of T is linearly independent.) Hence if
S3 = {v1, v2; u1, . . . , ûi1, . . . , ûi2, . . . , um},
then < S3 > = V.
Thus every time we introduce a vector from T , we are in a position to delete
a vector from S. Hence |T | ≤ |S|, that is, n ≤ m. Interchanging the roles
of the bases S and T , we see, by a similar argument, that m ≤ n. Thus
m = n.
Note that we have actually shown that any finite spanning subset of a
finite-dimensional vector space V does indeed contain a finite basis of V .
Theorem 5.27.4 makes the following definition unambiguous.
Definition 5.27.5:
The dimension of a finite-dimensional vector space is the number of elements
in any one of its bases.
If V is of dimension n over F , we write dimF V = n or, simply, dimV = n,
when F is known.
Examples
1. Rn is of dimension n over R. In fact, it is easy to check that the set of
vectors
S = {e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1)}
is a basis for Rn over R.
2. Cn is of dimension n over C.
3. Cn is of dimension 2n over R. In fact, if ek ∈ Cn has 1 in the k-th position and 0 in the remaining positions, and fk ∈ Cn has i = √−1 in the k-th position and 0 in the remaining positions, then
S = {e1, . . . , en; f1, . . . , fn}
forms a basis of Cn over R. (Verify!)
4. Let Pn(X) denote the set of polynomials in X with real coefficients of degree not exceeding n. Then B = {1, X, X2, . . . , Xn} is a basis for Pn(X). Hence dimR Pn(X) = n + 1.
5. The vector space R over Q is infinite-dimensional. This can be seen as follows: Suppose dimQ R = n (finite). Then R has a basis {v1, . . . , vn} over Q. Hence
R = {α1 v1 + · · · + αn vn : αi ∈ Q for each i}.
But as Q is countable, the number of such linear combinations is countable (see Section ??). This is a contradiction, as R is uncountable. Thus R is infinite-dimensional over Q.
Proposition 5.27.6:
Any maximal linearly independent subset of a finite-dimensional vector space
V is a basis for V .
Proof. Let B be a maximal linearly independent subset of V, that is, B is not a proper subset of any linearly independent subset B′ of V. Suppose B is not a basis for V. This means, by the definition of a basis for a vector space, that there exists a vector x in V such that x ∉ < B >. Then B ∪ {x} must be a linearly independent subset of V; for, suppose B ∪ {x} is linearly dependent. Then there exist v1, . . . , vn in B satisfying a nontrivial relation
α1 v1 + · · · + αn vn + βx = 0, αi ∈ F for each i, and β ∈ F.
Now, β ≠ 0, as otherwise v1, . . . , vn would be linearly dependent over F. Thus x = −β^(−1)(α1 v1 + · · · + αn vn) ∈ < B >, a contradiction. Thus B ∪ {x} is linearly independent. But then the fact that B ∪ {x} ⊃ B violates the maximality of B. Hence B must be a basis for V.
5.28 Exercises
1. Show that Z is not a vector space over Q.
2. If n ∈ N, show that the set of all real polynomials of degree n does not
form a vector space over R (under usual addition and scalar multipli-
cation of polynomials).
3. Which of the following are vector spaces over R?
(i) V1 = {(x, y, z) ∈ R3 : y + z = 0}.
(ii) V2 = {(x, y, z) ∈ R3 : y + z = 1}.
(iii) V3 = {(x, y, z) ∈ R3 : y ≥ 0}.
(iv) V4 = {(x, y, z) ∈ R3 : z = 0}.
4. Show that the dimension of the vector space of all m by n real matrices
over R is mn. [Hint: For m = 2, n = 3, the matrices
[ 1 0 0 ; 0 0 0 ], [ 0 1 0 ; 0 0 0 ], [ 0 0 1 ; 0 0 0 ],
[ 0 0 0 ; 1 0 0 ], [ 0 0 0 ; 0 1 0 ], [ 0 0 0 ; 0 0 1 ]
(rows separated by semicolons) form a basis for the space of all 2 by 3
real matrices. Verify this first.]
5. Prove that a subspace of a finite-dimensional vector space is finite-
dimensional.
6. Show that the vector space of real polynomials in X is infinite-dimensional
over R.
7. Find a basis and the dimension of the subspace of R4 spanned by the
vectors u1 = (1, 2, 2, 0), u2 = (2, 4, 0, 1) and u3 = (4, 8, 4, 1).
8. Find the dimension of the subspace of R3 spanned by v1 = (2, 3, 7),
v2 = (1, 0,−1), v3 = (1, 1, 2) and v4 = (0, 1, 3).
[Hint: v1 = 3v3 − v2 and v4 = v3 − v2.]
9. State with reasons whether each of the following statements is true or
false:
(i) A vector space V can have two disjoint subspaces.
(ii) Every vector space of dimension n has a subspace of dimension
m for each m ≤ n.
(iii) A two-dimensional vector space has exactly three subspaces.
(iv) In a vector space, any two generating (that is, spanning) subsets
are disjoint.
(v) If n vectors of a vector space V span a subspace U of V , then
dimU = n.
5.29 Solutions of Linear Equations and Rank
of a Matrix
Let
A = [ a11 a12 . . . a1n ; a21 a22 . . . a2n ; . . . ; am1 am2 . . . amn ]
be an m by n matrix over a field F . To be precise, we take F = R, the field
of real numbers. Let R1, R2, . . . , Rm be the row vectors and C1, C2, . . . , Cn
the column vectors of A. Then each Ri ∈ Rn and each Cj ∈ Rm. The row
space of A is the subspace <R1, . . . , Rm > of Rn, and its dimension is the
row rank of A. Clearly, (row rank of A) ≤ m since any m vectors of a vector
space span a subspace of dimension at most m. The column space of A and
the column rank of A (≤ n) are defined in an analogous manner.
We now consider three elementary row transformations (or operations)
defined on the row vectors of A:
(i) Rij—interchange of the i-th and j-th rows of A.
(ii) kRi—multiplication of the i-th row vector Ri of A by a non-zero scalar
(real number) k.
(iii) Ri + cRj—addition to the i-th row of A, c times the j-th row of A, c
being a scalar.
The elementary column transformations are defined in an analogous manner.
The inverse of each of these three transformations is again a transformation
of the same type. For instance, the inverse of Ri + cRj is obtained by adding
−c times Rj to the new i-th row.
Let A∗ be a matrix obtained by applying a finite sequence of elementary
row transformations to a matrix A. Then the row space of A = row space of
A∗, and hence, row rank of A = dim( row space of A) = dim( row space of
A∗) = row rank of A∗. Now a matrix A∗ is said to be in row-reduced echelon
form if:
(i) The leading non-zero entry of any non-zero row (if any) of A∗ is 1.
(ii) The leading 1’s in the non-zero rows of A∗ occur in increasing order of
their columns.
(iii) Each column of A∗ containing a leading 1 of a row of A∗ has all its
other entries zero.
(iv) The non-zero rows of A∗ precede its zero rows, if any.
Now let D be a square matrix of order n. The three elementary row (respec-
tively column) operations considered above do not change the singular or
nonsingular nature of D. In other words, if D∗ is a row-reduced echelon form
of D, then D is singular iff D∗ is singular. In particular, D is nonsingular iff
D∗ = In, the identity matrix of order n. Hence if a row-reduced echelon form
A∗ of a matrix A has r non-zero rows, the maximum order of a nonsingular
square submatrix of A is r. This number is called the rank of A.
Definition 5.29.1:
The rank of a matrix A is the maximum order of a nonsingular square sub-
matrix of A. Equivalently, it is the maximum order of a nonvanishing deter-
minant minor of A.
Example 5.29.2:
Find the row-reduced echelon form of
A = [ 1 2 3 −1 ; 2 1 −1 4 ; 3 3 2 3 ; 6 6 4 6 ].
As the leading entry of R1 is 1, we perform the operations R2−2R1; R3−3R1;
R4 − 6R1 (where Ri stands for the i-th row of A). This gives
A1 = [ 1 2 3 −1 ; 0 −3 −7 6 ; 0 −3 −7 6 ; 0 −6 −14 12 ].

Next perform −(1/3)R2 (that is, replace R2 by −(1/3)R2). This gives

A′1 = [ 1 2 3 −1 ; 0 1 7/3 −2 ; 0 −3 −7 6 ; 0 −6 −14 12 ].
.
Now perform R1 − 2R2 (that is, replace R1 by R1 − 2R2 etc.); R3 + 3R2;
R4 + 6R2. This gives the matrix
A2 = [ 1 0 −5/3 3 ; 0 1 7/3 −2 ; 0 0 0 0 ; 0 0 0 0 ].
A2 is the row-reduced echelon form of A. Note that A2 is uniquely determined
by A. Since the maximum order of a non-singular submatrix of A2 is 2, rank
of A = 2. Moreover, row space of A2 = <R1, R2 (of A2)>. Clearly R1 and
R2 are linearly independent over R since, for α1, α2 ∈ R, α1R1 + α2R2 =
(α1, α2, −5α1/3 + 7α2/3, 3α1 − 2α2) = (0, 0, 0, 0) implies that α1 = 0 =
α2. Thus the row rank of A is 2 and therefore the column rank of A is also
2.
Remark 5.29.3:
Since the last three rows of A1 are proportional (that is, each of these rows
is a scalar multiple of the others), any 3 × 3 submatrix of A1 is singular. Since
A1 has a nonsingular submatrix of order 2 (for example, [ 1 2 ; 0 −3 ]), A1 is
of rank 2 and we can conclude that A is also of rank 2.
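The row reduction of Example 5.29.2 is easy to mechanize. The sketch below (plain Python with exact `Fraction` arithmetic; the function name `rref` is our own) applies the three elementary row operations to bring a matrix to row-reduced echelon form and reports its rank; run on the matrix A of the example it reproduces A2:

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce a matrix (list of row lists) to reduced echelon form.
    Returns (R, rank). Exact rational arithmetic avoids round-off."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivot_row = 0
    for col in range(n):
        # find a row at or below pivot_row with a non-zero entry in col
        pr = next((r for r in range(pivot_row, m) if R[r][col] != 0), None)
        if pr is None:
            continue
        R[pivot_row], R[pr] = R[pr], R[pivot_row]        # R_ij: row interchange
        piv = R[pivot_row][col]
        R[pivot_row] = [x / piv for x in R[pivot_row]]   # kR_i: scale leading entry to 1
        for r in range(m):                               # R_r + cR_pivot: clear the column
            if r != pivot_row and R[r][col] != 0:
                c = R[r][col]
                R[r] = [a - c * b for a, b in zip(R[r], R[pivot_row])]
        pivot_row += 1
    return R, pivot_row

A = [[1, 2, 3, -1], [2, 1, -1, 4], [3, 3, 2, 3], [6, 6, 4, 6]]
R, rank = rref(A)
print(rank)          # 2, as found in Example 5.29.2
print(R[0], R[1])    # the non-zero rows (1, 0, -5/3, 3) and (0, 1, 7/3, -2)
```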
5.30 Solutions of Linear Equations
Consider the system of linear homogeneous equations
X1 + 2X2 + 3X3 −X4 = 0
2X1 +X2 −X3 + 4X4 = 0 (5.3)
3X1 + 3X2 + 2X3 + 3X4 = 0
6X1 + 6X2 + 4X3 + 6X4 = 0.
These equations are called homogeneous because if (X1, X2, X3, X4) is a so-
lution of these equations, then so is (kX1, kX2, kX3, kX4) for any scalar k.
Trivially (0, 0, 0, 0) is a solution. We express these equations in the matrix
form
AX = 0, (5.4)
where
A = [ 1 2 3 −1 ; 2 1 −1 4 ; 3 3 2 3 ; 6 6 4 6 ], X = (X1, X2, X3, X4)^t, and 0 = (0, 0, 0, 0)^t.
If X1 and X2 are any two solutions of (5.4), then so is aX1 + bX2 for scalars
a and b since A(aX1 + bX2) = a(AX1) + b(AX2) = a · 0 + b · 0 = 0. Thus the
set of solutions of (5.4) is (as X ∈ Rn) a vector subspace of Rn, where n =
the number of indeterminates in the equations (5.3).
It is clear that the three elementary row operations performed on a sys-
tem of homogeneous linear equations do not alter the set of solutions of the
equations. Hence if A∗ is the row reduced echelon form of A, the solution
sets of AX = 0 and A∗X = 0 are the same.
In Example 5.29.2, A∗ = A2. Hence the equations A∗X = 0 are:

X1 − (5/3)X3 + 3X4 = 0, and
X2 + (7/3)X3 − 2X4 = 0.

These give

X1 = (5/3)X3 − 3X4,
X2 = −(7/3)X3 + 2X4,
so that
X = (X1, X2, X3, X4)^t = X3 (5/3, −7/3, 1, 0)^t + X4 (−3, 2, 0, 1)^t.

Thus the space of solutions of AX = 0 is spanned by the two linearly inde-
pendent vectors (5/3, −7/3, 1, 0)^t and (−3, 2, 0, 1)^t and hence is of dimension 2. This number
2 corresponds to the fact that X1, X2, X3, and X4 are all expressible in terms
of X3 and X4. Here X1 and X2 correspond to the identity submatrix of order
2 of A∗, that is, the rank of A. Also, dimension 2 = 4 − 2 = (number of
indeterminates) − (rank of A). The general case is clearly similar, where the
system of equations is given by ai1X1 + · · · + ainXn = 0, 1 ≤ i ≤ m. These
are given by the matrix equation AX = 0, where A is the m by n matrix
(aij) of coefficients and X = (X1, . . . , Xn)^t; we state the general result as a theorem.
Theorem 5.30.1:
The solution space of a system of homogeneous linear equations is of dimen-
sion n−r, where n is the number of unknowns and r is the rank of the matrix
A of coefficients.
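Theorem 5.30.1 can be checked computationally. The sketch below (an illustrative helper of our own, not from the text) row-reduces A and reads off one basis vector of the solution space per free variable, so the dimension found is n minus the number of pivots:

```python
from fractions import Fraction

def nullspace_basis(rows):
    """Basis of {X : AX = 0}, via row reduction; dimension = n - rank."""
    A = [[Fraction(x) for x in r] for r in rows]
    m, n = len(A), len(A[0])
    pivots, pr = [], 0
    for c in range(n):                       # reduce to row-reduced echelon form
        r = next((i for i in range(pr, m) if A[i][c] != 0), None)
        if r is None:
            continue
        A[pr], A[r] = A[r], A[pr]
        A[pr] = [x / A[pr][c] for x in A[pr]]
        for i in range(m):
            if i != pr and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[pr])]
        pivots.append(c); pr += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:                           # one basis vector per free variable
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, pc in zip(A, pivots):       # pivot variable in terms of free one
            v[pc] = -row[f]
        basis.append(v)
    return basis

A = [[1, 2, 3, -1], [2, 1, -1, 4], [3, 3, 2, 3], [6, 6, 4, 6]]
for v in nullspace_basis(A):
    print(v)   # (5/3, -7/3, 1, 0) and (-3, 2, 0, 1), so the dimension is 4 - 2 = 2
```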
5.31 Solutions of Nonhomogeneous Linear Equa-
tions
A system of nonhomogeneous linear equations is of the form
a11X1 + a12X2 + · · ·+ a1nXn = b1
. . .
am1X1 + am2X2 + · · ·+ amnXn = bm.
These m equations are equivalent to the single matrix equation
AX = B,
where A = (aij) is an m by n matrix and B is a non-zero column vector of
length m.
It is possible that such a system of equations has no solution at all. For
example, consider the system of equations
X1 −X2 +X3 = 2
X1 +X2 −X3 = 0
3X1 = 6.
From the last equation, we get X1 = 2. This, when substituted in the first
two equations, yields −X2 + X3 = 0, X2 − X3 = −2 which are mutually
contradictory. Such equations are called inconsistent equations.
When are the equations represented by AX = B consistent?
Theorem 5.31.1:
The equations AX = B are consistent if and only if B belongs to the column
space of A.
Proof. The equations are consistent iff there exists a vector X0 = (α1, . . . , αn)^t such
that AX0 = B. But this happens iff α1C1+· · ·+αnCn = B, where C1, . . . , Cn
are the column vectors of A, that is, iff B belongs to the column space of
A.
Corollary 5.31.2:
The equations represented by AX = B are consistent iff rank of A = rank
of (A,B). [(A,B) denotes the matrix obtained from A by adding one more
column vector B at the end. It is called the matrix augmented by B].
Proof. By the above theorem, the equations are consistent iff
B ∈<C1, . . . , Cn>. But this is the case, by Proposition 5.24.4, iff
< C1, . . . , Cn >=< C1, . . . , Cn, B >. The latter condition is equivalent to,
column rank of A = column rank of (A,B) and consequently to rank of A =
rank of (A,B).
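Corollary 5.31.2 gives a mechanical consistency test: compare the rank of A with the rank of the augmented matrix (A, B). A sketch (the `rank` helper is our own), applied to the inconsistent system shown earlier in this section:

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in r] for r in rows]
    m, n, pr = len(A), len(A[0]), 0
    for c in range(n):
        r = next((i for i in range(pr, m) if A[i][c] != 0), None)
        if r is None:
            continue
        A[pr], A[r] = A[r], A[pr]            # bring a non-zero pivot up
        for i in range(pr + 1, m):           # clear entries below the pivot
            f = A[i][c] / A[pr][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[pr])]
        pr += 1
    return pr

# X1 - X2 + X3 = 2, X1 + X2 - X3 = 0, 3X1 = 6 (the inconsistent example)
A = [[1, -1, 1], [1, 1, -1], [3, 0, 0]]
B = [2, 0, 6]
aug = [row + [b] for row, b in zip(A, B)]    # the augmented matrix (A, B)
print(rank(A), rank(aug))                    # 2 3 -> ranks differ, so inconsistent
```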
We now ask the natural question: if the system AX = B is consistent,
how do we solve it?
Theorem 5.31.3:
Let X0 be any particular solution of the equation AX = B. Then the set of
all solutions of AX = B is given by {X0 + U : U varies over the set
of solutions of the auxiliary equation AX = 0}.
Proof. If AU = 0, then A(X0 +U) = AX0 +AU = B+0 = B, so that X0 +U
is a solution of AX = B.
Conversely, let X1 be an arbitrary solution of AX = B, so that AX1 = B.
Then AX0 = AX1 = B gives that A(X1 −X0) = 0. Setting X1 −X0 = U ,
we get, X1 = X0 + U , and AU = 0.
As in the case of homogeneous linear equations, the set of solutions of
AX = B remains unchanged when we perform any finite number of elementary
row transformations on A, provided we take care to perform simultaneously
the same operations on the column vector B on the right. As before, we
row-reduce A to its echelon form A∗, and the equation AX = B gets trans-
formed into an equivalent equation A∗X = B∗.
5.32 LUP Decomposition
Definition 5.32.1:
By an LUP decomposition of a square matrix A we mean an equation of the
form
PA = LU (5.5)
where P is a permutation matrix, L a unit lower-triangular matrix, and U an
upper triangular matrix. (See Chapter 1 for definitions.) Suppose we have
determined matrices P,L and U so that equation (5.5) holds. The equation
AX = b is equivalent to (PA)X = Pb in that both have the same set of
solutions X. This is because P−1 exists, and hence (PA)X = Pb is the same as
AX = b. Now set Pb = b′. Then PAX = Pb gives
LUX = b′. (5.6)
Hence if we set UX = Y (a column vector), then (5.6) becomes LY = b′. We
know from Section 5.31 how to solve LY = b′. A solution Y of this equation,
when substituted in UX = Y gives X again by the same method.
5.32.1 Computing an LU Decomposition
We first consider the case when A = (aij) is a nonsingular matrix of order
n. We begin by obtaining an LU decomposition for A; that is, an LUP
decomposition with P = In in (5.5). The process by which we obtain the LU
decomposition for A is known as Gaussian elimination.
Assume that a11 ≠ 0. We write

A = [ a11 a12 . . . a1n ; a21 a22 . . . a2n ; . . . ; an1 an2 . . . ann ] = [ a11 w^t ; v A′ ],
where A′ is a square matrix of order n − 1, v = (a21, . . . , an1)^t and
w^t = (a12, . . . , a1n). We can now factor A as

A = [ 1 0 ; v/a11 In−1 ] [ a11 w^t ; 0 A′ − v w^t / a11 ].

Note that v w^t is also a matrix of order n − 1. The matrix A1 = A′ − v w^t / a11
is called the Schur complement of A with respect to a11.
We now recursively find an LU decomposition of A. If we assume that
A1 = L′U′, where L′ is unit lower-triangular and U′ is upper-triangular, then

A = [ 1 0 ; v/a11 In−1 ] [ a11 w^t ; 0 A1 ]
  = [ 1 0 ; v/a11 In−1 ] [ a11 w^t ; 0 L′U′ ]
  = [ 1 0 ; v/a11 L′ ] [ a11 w^t ; 0 U′ ]
= LU, (5.7)

where L = [ 1 0 ; v/a11 L′ ] and U = [ a11 w^t ; 0 U′ ]. The validity of the two middle
equalities in (5.7) can be verified by routine block multiplication
of matrices (see ???). This method is based on the supposition that a11
and all the leading entries of the successive Schur complements are all non-
zero. If a11 is zero, we interchange the first row of A with a subsequent row
having a non-zero first entry. This amounts to premultiplying both sides
by the corresponding permutation matrix P yielding the matrix PA on the
left. We now proceed as with the case when a11 6= 0. If a leading entry of
a subsequent Schur complement is zero, once again we make interchanges
of rows—not just the rows of the relevant Schur complement but the full
rows obtained from A. This again amounts to premultiplication by a permutation
matrix. Since any product of permutation matrices is a permutation matrix,
this process finally ends up with a matrix P ′A, where P ′ is a permutation
matrix of order n.
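The recursion just described translates directly into code. The sketch below (our own naming; it assumes A is nonsingular and pivots on the first non-zero entry it finds, so the permutation it records may differ from an interchange chosen by hand) performs Gaussian elimination on successive Schur complements, accumulating the multipliers v/a11 in L:

```python
from fractions import Fraction

def lup(A):
    """PA = LU by Gaussian elimination on successive Schur complements.
    When a leading entry is zero, swap in the first later row with a
    non-zero entry in that column, recording the swap in P."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(i == j) for j in range(n)] for i in range(n)]
    P = list(range(n))                        # P kept as a row permutation
    for k in range(n - 1):
        piv = next(i for i in range(k, n) if U[i][k] != 0)
        if piv != k:                          # row interchange
            U[k], U[piv] = U[piv], U[k]
            P[k], P[piv] = P[piv], P[k]
            for j in range(k):                # keep earlier multipliers aligned
                L[k][j], L[piv][j] = L[piv][j], L[k][j]
        for i in range(k + 1, n):             # eliminate below the pivot
            L[i][k] = U[i][k] / U[k][k]       # multiplier v_i / a_kk
            U[i] = [a - L[i][k] * b for a, b in zip(U[i], U[k])]
    return P, L, U

A = [[2, 3, 1, 2], [4, 7, 4, 7], [2, 7, 13, 16], [6, 10, 13, 15]]
P, L, U = lup(A)
print(P)   # [0, 1, 2, 3]: no interchanges needed, so this is an LU decomposition
```

On this matrix the routine reproduces the L and U found by hand in Example 5.32.2.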
We now present two examples, one to obtain the LU decomposition when
it is possible and another to determine the LUP decomposition.
Example 5.32.2:
Find the LU decomposition of A = [ 2 3 1 2 ; 4 7 4 7 ; 2 7 13 16 ; 6 10 13 15 ].

Here a11 = 2, v = (4, 2, 6)^t and w^t = (3, 1, 2), where w^t denotes the
transpose of w.

Therefore v/a11 = (2, 1, 3)^t, and so v w^t / a11 = [ 6 2 4 ; 3 1 2 ; 9 3 6 ].

Hence the Schur complement of A is

A1 = [ 7 4 7 ; 7 13 16 ; 10 13 15 ] − [ 6 2 4 ; 3 1 2 ; 9 3 6 ] = [ 1 2 3 ; 4 12 14 ; 1 10 9 ].
Now the Schur complement of A1 is

A2 = [ 12 14 ; 10 9 ] − (4, 1)^t (2, 3) = [ 12 14 ; 10 9 ] − [ 8 12 ; 2 3 ] = [ 4 2 ; 8 6 ].
This gives the Schur complement of A2 as
A3 = (6)− (2)(2) = (2)
= (1)(2) = L3U3,
where L3 = (1) is lower unit triangular and U3 = (2) is upper triangular.
Tracing back we get

A2 = [ 1 0 ; v2/a11 L3 ] [ 4 w2^t ; 0 U3 ] = [ 1 0 ; 2 1 ] [ 4 2 ; 0 2 ] = L2U2.

This gives

A1 = [ 1 0 0 ; 4 1 0 ; 1 2 1 ] [ 1 2 3 ; 0 4 2 ; 0 0 2 ].

Consequently,

A = [ 1 0 0 0 ; 2 1 0 0 ; 1 4 1 0 ; 3 1 2 1 ] [ 2 3 1 2 ; 0 1 2 3 ; 0 0 4 2 ; 0 0 0 2 ] = LU,
= LU,
where
L =
1 0 0 02 1 0 01 4 1 03 1 2 1
is unit lower-triangular (see Ch. 1 for definition), and
U =
2 3 1 20 1 2 30 0 4 20 0 0 2
is upper-triangular.
Example 5.32.3:
Find the LUP decomposition of A = [ 2 3 1 2 ; 4 6 4 7 ; 2 7 13 16 ; 6 10 13 15 ].
.
Suppose we proceed as before: the Schur complement of A is

A1 = [ 6 4 7 ; 7 13 16 ; 10 13 15 ] − (2, 1, 3)^t (3, 1, 2)
   = [ 6 4 7 ; 7 13 16 ; 10 13 15 ] − [ 6 2 4 ; 3 1 2 ; 9 3 6 ] = [ 0 2 3 ; 4 12 14 ; 1 10 9 ].
Since the leading entry is zero, we interchange the first row of A1 with some
other row. Suppose we interchange the first and third rows of A1. This
amounts to considering the matrix PA instead of A, where P = [ 1 0 0 0 ; 0 0 0 1 ; 0 0 1 0 ; 0 1 0 0 ].
Note that the first row of the new Schur complement corresponds to the fourth
row of A, and its last row to the second row of A. Thus the Schur complement
of PA (instead of A) is A′1 = [ 1 10 9 ; 4 12 14 ; 0 2 3 ]. We now proceed with A′1 as before.
1 as before.
The Schur complement of A′1 is

A2 = [ 12 14 ; 2 3 ] − (4, 0)^t (10, 9) = [ 12 14 ; 2 3 ] − [ 40 36 ; 0 0 ] = [ −28 −22 ; 2 3 ].
The Schur complement of A2 is

A3 = (3) − (2/(−28))(−22) = 3 − 11/7 = 10/7 = (1)(10/7) = L3U3,

where L3 = (1) and U3 = (10/7). Hence

A2 = [ 1 0 ; −1/14 1 ] [ −28 −22 ; 0 10/7 ].
This gives

A′1 = [ 1 0 0 ; 4 1 0 ; 0 −1/14 1 ] [ 1 10 9 ; 0 −28 −22 ; 0 0 10/7 ].

Thus

PA = [ 1 0 0 0 ; 3 1 0 0 ; 1 4 1 0 ; 2 0 −1/14 1 ] [ 2 3 1 2 ; 0 1 10 9 ; 0 0 −28 −22 ; 0 0 0 10/7 ] = LU,
where L and U are the first and second matrices in the product. Notice that
we have interchanged the second and fourth rows of A while computing L.
We have assumed, to start with, that A is a nonsingular matrix. Even if
A is invertible, it is possible that A has no LU decomposition. For example,
the nonsingular matrix [ 0 1 ; 1 0 ] has no LU decomposition. If A is singular it
is possible that not only a column of a Schur complement but even the full
column corresponding to it may be zero. In that case, we have to interchange
columns. This would result in a matrix of the form AP rather than PA.
It is also possible that we get the form P1AP2 where P1 and P2 are both
permutation matrices.
Example 5.32.4:
Solve the system of linear equations
2X1 + 3X2 +X3 + 2X4 = 10
4X1 + 6X2 + 4X3 + 7X4 = 25 (5.8)
2X1 + 7X2 + 13X3 + 16X4 = 40
6X1 + 10X2 + 13X3 + 15X4 = 50.
These equations are equivalent to AX = B, where A is the matrix of
Example 5.32.3 and B = (10, 25, 40, 50)^t. If we interchange any pair of rows of (A|B),
it amounts to interchanging the corresponding equations. However, this will
in no way alter the solution. Hence the solutions of (5.8) are the same
as solutions of PAX = PB, where P is the permutation matrix obtained
in Example 5.32.3. Thus the solutions are the same as the solutions of
LUX = B′, where L and U are again those obtained in Example 5.32.3
and B′ is obtained from B by interchanging the second and fourth entries.
Now set UX = Y so that the given equations become LY = B′, where
Y = (Y1, Y2, Y3, Y4)^t. This gives

[ 1 0 0 0 ; 3 1 0 0 ; 1 4 1 0 ; 2 0 −1/14 1 ] (Y1, Y2, Y3, Y4)^t = (10, 50, 40, 25)^t.
These are equivalent to

Y1 = 10
3Y1 + Y2 = 50
Y1 + 4Y2 + Y3 = 40
2Y1 − (1/14)Y3 + Y4 = 25,

and we get Y1 = 10, Y2 = 20, Y3 = −50 and Y4 = 10/7. Substituting these
values in UX = Y , we get
[ 2 3 1 2 ; 0 1 10 9 ; 0 0 −28 −22 ; 0 0 0 10/7 ] (X1, X2, X3, X4)^t = (10, 20, −50, 10/7)^t.

These give

2X1 + 3X2 + X3 + 2X4 = 10
X2 + 10X3 + 9X4 = 20
−28X3 − 22X4 = −50
(10/7)X4 = 10/7.
Solving backward, we get X1 = 2, X2 = X3 = X4 = 1.
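The two triangular solves in this example can be sketched as follows (the L, U and permuted right-hand side are copied from Examples 5.32.3 and 5.32.4; the function names are our own):

```python
from fractions import Fraction as F

def forward_sub(L, b):
    """Solve LY = b for unit lower-triangular L, top row downward."""
    y = []
    for i, row in enumerate(L):
        y.append(b[i] - sum(row[j] * y[j] for j in range(i)))
    return y

def back_sub(U, y):
    """Solve UX = y for upper-triangular U, bottom row upward."""
    n = len(U)
    x = [F(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# L, U and B' (B with second and fourth entries interchanged)
L = [[1, 0, 0, 0], [3, 1, 0, 0], [1, 4, 1, 0], [2, 0, F(-1, 14), 1]]
U = [[2, 3, 1, 2], [0, 1, 10, 9], [0, 0, -28, -22], [0, 0, 0, F(10, 7)]]
b_perm = [10, 50, 40, 25]

Y = forward_sub(L, b_perm)   # Y1..Y4 = 10, 20, -50, 10/7
X = back_sub(U, Y)           # X = (2, 1, 1, 1)
print(Y)
print(X)
```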
5.33 Exercises
1. Examine if the following equations are consistent.
X1 +X2 +X3 +X4 = 0
2X1 −X2 + 3X3 + 4X4 = 1
3X1 + 4X3 + 5X4 = 2
2. Solve the system of homogeneous linear equations:
4X1 + 4X2 + 3X3 − 5X4 = 0
X1 +X2 + 2X3 − 3X4 = 0
2X1 + 2X2 −X3 +X4 = 0
X1 +X2 + 2X3 − 3X4 = 0
Show that the solution space is of dimension 2.
3. Solve:
X1 +X2 +X3 +X4 = 0
X1 + 3X2 + 2X3 + 4X4 = 0
2X1 +X3 −X4 = 0
4. Solve by using LUP decomposition:
(i)
2X1 + 3X2 − 5X3 + 4X4 = 8
3X1 +X2 − 4X3 + 5X4 = 10
7X1 + 3X2 − 2X3 +X4 = 10
4X1 +X2 −X3 + 3X4 = 10
(ii)
3X1 − 2X2 +X3 = 7
X1 +X2 +X3 = 12
−X1 + 4X2 −X3 = 3
(iii)
2X1 + 4X2 − 5X3 +X4 = 8
4X1 − 5X3 −X4 = 16
−4X1 + 2X2 +X4 = −5
6X1 + 4X2 − 10X3 + 7X4 = 13
5.34 Finite Fields
In this section, we discuss the basic properties of finite fields. Finite fields
are fundamental to the study of codes and cryptography.
Recall that a field F is finite if |F | is finite; |F | is called the order of F . The
characteristic of a finite field F , as seen in Section 5.21, is a prime number p,
and the prime field P of F is a field of p elements. P consists of the p
elements 1F , 2 · 1F = 1F + 1F , . . . , p · 1F = 0F . Clearly, F is a vector space
over P . If the dimension of F over P is n, then n is finite. Hence F has a
basis u1, . . . , un of n elements over P . This means that each element v ∈ F
is a unique linear combination of u1, . . . , un, say,
v = α1u1 + α2u2 + · · ·+ αnun, αi ∈ P, 1 ≤ i ≤ n.
For each i, αi can take |P | = p values, and so there are p · p · · · p (n times) = p^n
distinct elements in F . Thus we have proved the following result.
Theorem 5.34.1:
The order of a finite field is a power of a prime number.
Finite fields are known as Galois fields after the French mathematician
Evariste Galois (1811–1832) who first studied them. A finite field of order q
is denoted by GF (q).
We now look at the converse of Theorem 5.34.1. Given a prime power p^n
(where p is a prime), does there exist a field of order p^n? The answer to this
question is in the affirmative. We give below two different constructions that
yield a field of order p^n.

Theorem 5.34.2:
Given p^n (where p is a prime), there exists a field of p^n elements.
Construction 1: Consider the polynomial X^{p^n} − X ∈ Zp[X] of degree p^n.
(Recall that Zp[X] stands for the ring of polynomials in X with coefficients
from the field Zp of p elements.) The derivative of this polynomial is

p^n X^{p^n − 1} − 1 = −1 ∈ Zp[X],

and is therefore relatively prime to it. Hence the p^n roots of X^{p^n} − X are
all distinct. (Here, though no concept of the limit is involved, the notion of
the derivative has been employed as though it were a real polynomial.) It is
known [28] that the roots of this polynomial lie in an extension field K ⊃ Zp.
K is also of characteristic p. If a and b are any two roots of X^{p^n} − X, then

a^{p^n} = a, and b^{p^n} = b.

Now by Theorem 5.21.3,

(a ± b)^{p^n} = a^{p^n} ± b^{p^n},

and, by the commutativity of multiplication in K,

a^{p^n} b^{p^n} = (ab)^{p^n},

and so a ± b and ab are also roots of X^{p^n} − X. Moreover, if a is a non-zero
root of X^{p^n} − X, then so is a^{−1}, since (a^{−1})^{p^n} = (a^{p^n})^{−1} = a^{−1}. Also the
associative and distributive laws are valid for the set of roots since they are
all elements of the field K. Finally, 0 and 1 are also roots. In other words,
the p^n roots of X^{p^n} − X ∈ Zp[X] form a field of order p^n.
Construction 2: Let f(X) = X^n + a1X^{n−1} + · · · + an ∈ Zp[X] be a polynomial of
degree n irreducible over Zp. (The existence of such an irreducible polynomial
(with leading coefficient 1) of degree n is guaranteed by a result in
Algebra; see [5].) Let F denote the ring of polynomials in Zp[X] reduced modulo
f(X) (that is, if g(X) ∈ Zp[X], divide g(X) by f(X) and take the remainder
g1(X), which is 0 or of degree less than n). Then every non-zero polynomial
in F is a polynomial of Zp[X] of degree at most n − 1. Moreover, if
a0 + a1X + · · · + an−1X^{n−1} and b0 + b1X + · · · + bn−1X^{n−1} are two polynomials
in F of degrees at most n − 1, and if they are equal, then

(a0 − b0) + (a1 − b1)X + · · · + (an−1 − bn−1)X^{n−1}

is the zero polynomial of F , and hence is a multiple of f(X) in Zp[X]. As
the degree of f is n, this is possible only if ai = bi, 0 ≤ i ≤ n − 1. Now if
a0 + a1X + · · · + an−1X^{n−1} is any polynomial of F , each ai ∈ Zp and hence has
p choices. Hence the number of polynomials of the form

a0 + a1X + · · · + an−1X^{n−1} ∈ Zp[X]

is p^n.
We now show that F is a field. Clearly, F is a commutative ring with
unit element 1 (= 0 · X^{n−1} + · · · + 0 · X + 1). Hence we need only verify that
if a(X) ∈ F is not zero, then there exists b(X) ∈ F with a(X)b(X) = 1. As
a(X) ≠ 0 and f(X) is irreducible over Zp, the gcd (a(X), f(X)) = 1. So by the
Euclidean algorithm (Section 3.3), there exist polynomials C(X) and g(X)
in Zp[X] such that

a(X)C(X) + f(X)g(X) = 1 (5.9)

in Zp[X]. Now there exists C1(X) ∈ F with C1(X) ≡ C(X) (mod f(X)).
This means that there exists a polynomial h(X) in Zp[X] with C(X) −
C1(X) = h(X)f(X), and hence C(X) = C1(X) + h(X)f(X). Substitut-
ing this in (5.9) and taking modulo f(X), we get a(X)C1(X) = 1 in F . Hence
a(X) has C1(X) as inverse in F . Thus every non-zero element of F has a
multiplicative inverse in F , and so F is a field of p^n elements.
We have constructed a field of p^n elements in two different ways—one,
as the field of roots of the polynomial X^{p^n} − X ∈ Zp[X], and the other, as
the field of polynomials in Zp[X] reduced modulo the irreducible polynomial
f(X) of degree n over Zp. Essentially, there is not much of a difference
between the two constructions, as our next theorem shows.
Theorem 5.34.3:
Any two finite fields of the same order are isomorphic under a field isomorphism.
Example 5.34.4:
Take p = 2 and n = 3. The polynomial X^3 + X + 1 of degree 3 is irreducible
over Z2. (If it were reducible, one of the factors would be of degree 1, and it
would be either X or X + 1 = X − 1 ∈ Z2[X]. But 0 and 1 are not roots
of X^3 + X + 1 ∈ Z2[X].) The 2^3 = 8 polynomials over Z2 reduced modulo
X^3 + X + 1 are:

0, 1, X, X + 1, X^2, X^2 + 1, X^2 + X, X^2 + X + 1,

and they form a field. (Note that, modulo X^3 + X + 1, we have X^3 = X + 1
and X^3 + X = 1, since X^3 + X + 1 = 0.) We have, for instance,
(X^2 + X + 1) + (X + 1) = X^2 and (X^2 + 1)(X + 1) =
X^3 + X^2 + X + 1 = X^2. Also (X + 1)^2 = X^2 + 1.
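Arithmetic in the field of Example 5.34.4 is easy to mechanize by encoding a polynomial over Z2 as a bitmask (bit i = coefficient of X^i); the mask 0b1011 then encodes the reduction polynomial X^3 + X + 1. A small sketch (the helper name is our own):

```python
def gf_mul(a, b, mod=0b1011, deg=3):
    """Multiply two elements of GF(2^deg), given as coefficient bitmasks,
    modulo the irreducible polynomial encoded by `mod`."""
    prod = 0
    for i in range(b.bit_length()):          # schoolbook product over Z2 (XOR = addition)
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(prod.bit_length() - 1, deg - 1, -1):   # reduce modulo `mod`
        if (prod >> i) & 1:
            prod ^= mod << (i - deg)
    return prod

# (X^2 + 1)(X + 1) = X^2 in GF(8), as computed above
print(bin(gf_mul(0b101, 0b011)))   # 0b100
# (X + 1)^2 = X^2 + 1
print(bin(gf_mul(0b011, 0b011)))   # 0b101
```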
We know that if F is a field, the set F ∗ of non-zero elements of F is a
group. In the case when F is a finite field, F ∗ has an additional algebraic
structure.
Theorem 5.34.5:
If F is a finite field, F ∗ (the set of non-zero elements of F ) is a cyclic group.
Proof. We know (as F is a field) that F ∗ is a group under multiplication. Hence
we need only show that F ∗ is generated by a single element.

Let α be an element of the group F ∗ of maximum order, say k. Necessarily,
k ≤ q − 1, where q = |F |. Choose β ∈ F ∗, β ≠ α, 1. Let o(β) (= the order
of β) = l. Then l > 1. We first show that l | k. Now o(β^{(k,l)}) = l/(k, l), where
(k, l) denotes the gcd of k and l. Further, as (o(α), o(β^{(k,l)})) = (k, l/(k, l)) =
1, we have o(αβ^{(k,l)}) = o(α) · o(β^{(k,l)}) = k · l/(k, l) = [k, l], the lcm of k and l.
But, by our choice, the maximum order of any element of F ∗ is k. Therefore
[k, l] = k, which implies that l | k. But l = o(β). Therefore β^k = 1. Thus each
of the q − 1 elements x of F ∗ satisfies x^k = 1 and so is a root of X^k − 1. Since
this polynomial has at most k roots and |F ∗| = q − 1, we get k = q − 1. Thus
o(α) = q − 1 and so F ∗ is the cyclic group generated by α.
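Theorem 5.34.5 can be sanity-checked for a prime field by tabulating multiplicative orders in Zp∗ = GF(p)∗ by brute force (illustrative code; the function name is our own):

```python
def orders(p):
    """Multiplicative order of each element of Zp* for a prime p."""
    out = {}
    for a in range(1, p):
        x, k = a % p, 1
        while x != 1:            # repeatedly multiply by a until we reach 1
            x = x * a % p
            k += 1
        out[a] = k
    return out

ords = orders(7)
gens = [a for a, k in ords.items() if k == 6]
print(gens)   # [3, 5]: Z7* is cyclic, generated by 3 (or by 5)
```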
Definition 5.34.6: (i) A primitive element of a finite field F is a generator
of the cyclic group F ∗.
(ii) A monic polynomial is a polynomial with leading coefficient 1. For
example, X2 + 2X + 1 ∈ R[X] is monic while 2X2 + 1 is not.
(iii) Let F be a finite field with prime field P . A primitive polynomial in
F [X] over P is the minimal polynomial in P [X] of a primitive element
of F . A minimal polynomial in P [X] of an element α ∈ F is a monic
polynomial of least degree in P [X] having α as a root.
Clearly the minimal polynomial of any element of F in P [X] is irreducible
over P .
Let α be a primitive element of F , and let f(X) = X^n + a1X^{n−1} + · · · + an ∈
P [X] be the primitive polynomial of α. Then any polynomial in P [α] of
degree n or more can be reduced to a polynomial in P [α] of degree at most
n − 1. Moreover, no two distinct polynomials of P [α] of degree at most n − 1
can be equal; otherwise α would be a root of a polynomial of degree less than
n over P . Hence the polynomials of the form

a0 + a1α + · · · + an−1α^{n−1}, ai ∈ P,

in P [α] are all distinct and |P [α]| = p^n, where p = |P |. These p^n elements
constitute a subfield F ′ of F and α ∈ F ′. But then F ⊆ F ′ and hence
F = F ′. Thus |F | = |F ′| = p^n. As α is a primitive element of F , this means
that

F = {0} ∪ {α, α^2, . . . , α^{p^n − 1} = 1}.
Example 5.34.7:
Consider the polynomial X^4 + X + 1 ∈ Z2[X]. This is irreducible over Z2
(check that it can have no linear or quadratic factor in Z2[X]). Let α be a
root (in an extension field of Z2) of this polynomial, so that α^4 + α + 1 = 0.
This means that α^4 = α + 1.

We now prove that α is a primitive element of a field of 16 elements over
Z2 by checking that the 15 powers α, α^2, . . . , α^{15} are all distinct and that
α^{15} = 1. Indeed, we have

α^1 = α
α^2 = α^2
α^3 = α^3
α^4 = α + 1
α^5 = α α^4 = α(α + 1) = α^2 + α
α^6 = α α^5 = α^3 + α^2
α^7 = α α^6 = α^4 + α^3 = α^3 + α + 1
α^8 = α α^7 = α^4 + (α^2 + α) = (α + 1) + (α^2 + α) = α^2 + 1
α^9 = α α^8 = α^3 + α
α^10 = α α^9 = α^4 + α^2 = α^2 + α + 1
α^11 = α α^10 = α^3 + α^2 + α
α^12 = α α^11 = α^4 + (α^3 + α^2) = (α + 1) + (α^3 + α^2) = α^3 + α^2 + α + 1
α^13 = α α^12 = α^4 + (α^3 + α^2 + α) = (α + 1) + (α^3 + α^2 + α) = α^3 + α^2 + 1
α^14 = α α^13 = α^4 + (α^3 + α) = (α + 1) + (α^3 + α) = α^3 + 1
α^15 = α α^14 = α^4 + α = (α + 1) + α = 1.

Thus F = {0} ∪ F ∗ = {0, α, α^2, . . . , α^{15} = 1} and so α is a primitive element
of F = GF (2^4).
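The table of powers above can be generated mechanically, again encoding a polynomial in α over Z2 as a bitmask; xoring with 0b10011 applies the relation α^4 = α + 1 whenever a multiplication by α overflows degree 3. A sketch (the helper name is our own):

```python
def alpha_powers(mod=0b10011, deg=4):
    """Successive powers alpha^1, ..., alpha^(2^deg - 1) of a root alpha of
    X^4 + X + 1 over Z2, each encoded as a bitmask (bit i = coeff of alpha^i)."""
    pows = []
    a = 0b0010                     # a = alpha
    for _ in range(2 ** deg - 1):
        pows.append(a)
        a <<= 1                    # multiply by alpha
        if a >> deg:               # degree overflow: use alpha^4 = alpha + 1
            a ^= mod
    return pows

pows = alpha_powers()
print(len(set(pows)))   # 15: the powers are all distinct
print(pows[14])         # 1, i.e. alpha^15 = 1, so alpha is primitive
```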
We observe that a polynomial irreducible over a field F need not be
primitive over F . For instance, the polynomial f(X) = X^4 + X^3 + X^2 + X + 1 ∈
Z2[X] is irreducible over Z2, but it is not primitive. To check that f(X) is
irreducible, verify that f(X) has no linear or quadratic factor over Z2. Next,
for any root α of f(X), check that α^5 = 1, so that o(α) < 15 and f(X) is not
primitive over Z2. (Recall that if f(X) were a primitive polynomial, some
root of f(X) would be a primitive element of GF (2^4).)
5.35 Factorization of Polynomials over Finite
Fields
Let α be a primitive element of the finite field F = GF (p^n), where p is a
prime. Then F ∗ = {α, α^2, . . . , α^{p^n − 1} = 1}, and for any x ∈ F , x^{p^n} = x.
Hence for each i, 1 ≤ i ≤ p^n − 1,

(α^i)^{p^n} = α^i.

This shows that there exists a least positive integer t such that α^{i·p^{t+1}} = α^i.
Then set

Ci = {i, pi, p^2 i, . . . , p^t i}, 0 ≤ i ≤ p^n − 1,

where the entries are taken modulo p^n − 1. The sets Ci are called the cyclotomic
cosets modulo p defined with respect to F and α. Now, corresponding to the
coset Ci, 0 ≤ i ≤ p^n − 1, consider the polynomial

fi(X) = (X − α^i)(X − α^{i·p})(X − α^{i·p^2}) · · · (X − α^{i·p^t}).

The coefficients of fi are elementary symmetric functions of α^i, α^{ip}, . . . , α^{ip^t},
and if β denotes any of these coefficients, then β satisfies the relation β^p = β.
Hence β ∈ Zp and fi(X) ∈ Zp[X] for each i, 0 ≤ i ≤ p^n − 1. Each element
of Ci determines the same cyclotomic coset, that is, Ci = Cip = Cip^2 = · · · =
Cip^t. Moreover, if j ∉ Ci, then Ci ∩ Cj = φ. This gives a factorization of X^{p^n} − X
into irreducible factors over Zp. In fact, X^{p^n} − X = X(X^{p^n − 1} − 1), and

X^{p^n − 1} − 1 = (X − α)(X − α^2) · · · (X − α^{p^n − 1}) = ∏_i ( ∏_{j∈Ci} (X − α^j) ),
where the first product is taken over all the distinct cyclotomic cosets. What
is more, each polynomial fi(X) is irreducible over Zp, as shown below. To see
this, let

g(X) = a0 + a1X + · · · + akX^k ∈ Zp[X].

Then

(g(X))^p = a0^p + a1^p X^p + · · · + ak^p (X^k)^p = a0 + a1X^p + · · · + akX^{kp} = g(X^p).

Consequently, if β is a root of g, then 0 = (g(β))^p = g(β^p); that is, β^p is also
a root of g(X). Hence if a non-constant factor g(X) of fi(X) over Zp has α^j ,
j ∈ Ci, as a root, then all the powers α^k, k ∈ Ci, are roots of g(X), and so
g(X) must contain all the terms (X − α^k), k ∈ Ci, as factors. In other words,
fi(X) is irreducible over Zp.
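The cyclotomic cosets themselves are easy to enumerate: repeatedly multiply by p modulo p^n − 1 until the starting value recurs. An illustrative sketch (the function name is our own):

```python
def cyclotomic_cosets(p, n):
    """Cyclotomic cosets {i, ip, ip^2, ...} taken modulo p^n - 1."""
    m = p ** n - 1
    seen, cosets = set(), []
    for i in range(m):
        if i in seen:
            continue
        coset, j = [], i
        while j not in coset:        # multiply by p until the cycle closes
            coset.append(j)
            j = j * p % m
        seen.update(coset)
        cosets.append(coset)
    return cosets

print(cyclotomic_cosets(2, 4))
# [[0], [1, 2, 4, 8], [3, 6, 12, 9], [5, 10], [7, 14, 13, 11]]
```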
Thus the determination of the cyclotomic cosets yields a simple device to
factorize X^{p^n} − X into irreducible factors over Zp. We illustrate this fact by
an example.
Example 5.35.1:
We factorize X^{2^4} − X into irreducible factors over Z2.

Let α be a primitive element of the field GF (2^4). As a primitive polyno-
mial of degree 4 over Z2 having α as a root, we can take (see Example 5.34.7)
X^4 + X + 1.

The cyclotomic cosets modulo 2 w.r.t. GF (2^4) and α are:

C0 = {0}
C1 = {1, 2, 2^2 = 4, 2^3 = 8}  (Note: 2^4 = 16 ≡ 1 (mod 15))
C3 = {3, 6, 12, 9}
C5 = {5, 10}
C7 = {7, 14, 13, 11}.

Note that C2 = C1 = C4, and so on. Thus

X^{16} − X = X(X^{15} − 1) = X ∏_{i=1}^{15} (X − α^i)
= X (X − α^0) [ ∏_{i∈C1} (X − α^i) ] [ ∏_{i∈C3} (X − α^i) ]
  × [ ∏_{i∈C5} (X − α^i) ] [ ∏_{i∈C7} (X − α^i) ]
= X (X + 1) (X^4 + X + 1) (X^4 + X^3 + X^2 + X + 1)
  × (X^2 + X + 1) (X^4 + X^3 + 1). (5.10)
In computing the products, we have used the relation α^4 + α + 1 = 0, that
is, α^4 = α + 1. Hence, for instance,
∏_{i∈C5} (X − α^i) = (X − α^5)(X − α^{10})
= X^2 − (α^5 + α^{10})X + α^{15}
= X^2 + [ (α^2 + α) + (α^2 + α + 1) ]X + α^{15}
= X^2 + X + 1.
The six factors on the right of Equation (5.10) are all irreducible over Z2. The
minimal polynomials of α, α^3 and α^7 are all of degree 4 over Z2. However,
while α and α^7 are primitive elements of GF (2^4) (so that the polynomials
X^4 + X + 1 and X^4 + X^3 + 1 are primitive), α^3 is not (even though its minimal
polynomial is also of degree 4).
Primitive polynomials ???? are listed in [44].
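The factorization (5.10) can be verified mechanically by multiplying the six factors as Z2[X] polynomials encoded as coefficient bitmasks (a sketch with our own helper):

```python
def polymul2(a, b):
    """Product of two Z2[X] polynomials given as coefficient bitmasks
    (bit i = coefficient of X^i); addition over Z2 is XOR."""
    prod = 0
    for i in range(b.bit_length()):
        if (b >> i) & 1:
            prod ^= a << i
    return prod

# the six factors from Equation (5.10), low bit = constant term
factors = [0b10,        # X
           0b11,        # X + 1
           0b10011,     # X^4 + X + 1
           0b11111,     # X^4 + X^3 + X^2 + X + 1
           0b111,       # X^2 + X + 1
           0b11001]     # X^4 + X^3 + 1
prod = 1
for f in factors:
    prod = polymul2(prod, f)
print(prod == (1 << 16) | 0b10)   # True: the product is X^16 + X (= X^16 - X over Z2)
```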
5.35.1 Exercises
1. Construct the following fields:
GF (2^4), GF (2^5) and GF (3^2).
2. Show that GF (2^5) has no subfield GF (2^3).
3. Factorize X^{2^3} + X and X^{2^5} + X over Z2.
4. Factorize X^{3^2} − X over Z3.
5. Using Theorem 5.34.5, prove Fermat's Little Theorem: for any prime
p, a^{p−1} ≡ 1 (mod p) whenever a ≢ 0 (mod p).
5.36 Mutually Orthogonal Latin Squares [MOLS]
In this section, we show, as an application of finite fields, the existence of
n− 1 mutually orthogonal latin squares of order n.
A latin square of order n is a double array L of n rows and n columns in
which the entries belong to a set S of n elements such that no two entries
of the same row or column of L are equal. Usually, we take S to be the set
{1, 2, . . . , n}, but this is not always essential.
For example, [ 1 2 ; 2 1 ] and [ 1 2 3 ; 2 3 1 ; 3 1 2 ] are latin squares of orders 2 and 3 re-
spectively. Let L1 = (aij) and L2 = (bij) be two latin squares of order n
with entries in S. We say that L1 and L2 are orthogonal latin squares if
the n^2 ordered pairs (aij , bij) are all distinct. For example, L1 = [ 1 2 3 ; 2 3 1 ; 3 1 2 ]
and L2 = [ 1 2 3 ; 3 1 2 ; 2 3 1 ] are orthogonal latin squares of order 3, since the nine or-
dered pairs (1, 1), (2, 2), (3, 3); (2, 3), (3, 1), (1, 2); (3, 2), (1, 3), (2, 1) are all
distinct. However, if M1 = [ 1 2 ; 2 1 ] and M2 = [ 2 1 ; 1 2 ], then the 4 ordered pairs
(1, 2), (2, 1), (2, 1) and (1, 2) are not all distinct. Hence M1 and M2 are not
orthogonal.
orthogonal. The study of orthogonal latin squares started with Euler, who
had proposed the following problem of 36 officers. The problem asks for
an arrangement of 36 officers of 6 ranks and from 6 regiments in a square
formation of size 6 by 6. Each row and each column of this arrangement is to contain only one officer of each rank and only one officer from each regiment.
We label the ranks and the regiments from 1 through 6, and assign to each
officer an ordered pair of integers in 1 through 6. The first component of the
ordered pair corresponds to the rank of the officer and the second component
his regiment. Euler’s problem then reduces to finding a pair of orthogonal
latin squares of order 6. Euler conjectured in 1782 that there exists no pair of orthogonal latin squares of order n ≡ 2 (mod 4). Euler himself verified the conjecture for n = 2, while Tarry in 1900 verified it for n = 6 by a systematic case-by-case analysis. But the most significant result with regard to the Euler conjecture came from Bose, Shrikhande and Parker, who disproved the conjecture by establishing that if n ≡ 2 (mod 4) and n > 6, then there exists a pair of orthogonal latin squares of order n.
A set {L1, . . . , Lt} of t latin squares of order n on S is called a set of mutually orthogonal latin squares (MOLS) if Li and Lj are orthogonal whenever i ≠ j. It is easy to see [59] that the number t of MOLS of order n is bounded by n − 1. Further, the existence of a set of n − 1 MOLS of order n is known to be equivalent to the existence of a finite projective plane of order n. A long-standing conjecture is that if n is not a prime power, then there exists no complete set of MOLS of order n.
We now show that if n is a prime power, there exists a set of n−1 MOLS
of order n. (Equivalently, this implies that there exists a projective plane of
any prime power order, though we do not prove this here).
Theorem 5.36.1:
Let n = pk, where p is a prime and k is a positive integer. Then for n ≥ 3,
there exists a complete set of MOLS of order n.
Proof. By Theorem 5.34.2, we know that there exists a finite field GF (pk) =
GF (n) = F , say. Denote the elements of F by a0 = 0, a1 = 1, a2, . . . , an−1.
Define the n − 1 matrices A1, . . . , An−1 of order n by

A_t = (a^(t)_ij),  0 ≤ i, j ≤ n − 1 and 1 ≤ t ≤ n − 1,  where a^(t)_ij = a_t a_i + a_j.

The entries a^(t)_ij are all elements of the field F . We claim that each A_t is a latin square. Suppose, for instance, that two entries a^(t)_ij and a^(t)_il of some i-th row of A_t are equal. This implies that

a_t a_i + a_j = a_t a_i + a_l,

and hence a_j = a_l. Consequently j = l. Thus all the entries of the i-th row of A_t are distinct. Similarly, since a_t ≠ 0, no two entries of the same column of A_t are equal. Hence A_t is a latin square.
We next claim that {A1, . . . , An−1} is a set of MOLS. Suppose 1 ≤ r < u ≤ n − 1. Then A_r and A_u are orthogonal. For suppose that

(a^(r)_ij, a^(u)_ij) = (a^(r)_i′j′, a^(u)_i′j′).

This means that

a_r a_i + a_j = a_r a_i′ + a_j′,  and
a_u a_i + a_j = a_u a_i′ + a_j′.

Subtraction gives

(a_r − a_u) a_i = (a_r − a_u) a_i′,

and hence, as a_r ≠ a_u, a_i = a_i′. Consequently, i = i′ and j = j′. Thus A_r and A_u are orthogonal.
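For the special case n = p, a prime, GF(p) is simply Z_p and the construction in the proof can be carried out directly with integer arithmetic mod p. The following sketch (our illustration; the function names are assumptions) takes a_i = i, so that the (i, j)-entry of A_t is (t·i + j) mod p.

```python
# Build the n-1 MOLS of Theorem 5.36.1 for n = p, a prime,
# taking a_i = i so that A_t[i][j] = (t*i + j) mod p.

def mols(p):
    """Return the latin squares A_1, ..., A_{p-1} of order p."""
    return [[[(t * i + j) % p for j in range(p)] for i in range(p)]
            for t in range(1, p)]

def is_latin(square):
    """Check that every row and every column has distinct entries."""
    n = len(square)
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len({square[i][j] for i in range(n)}) == n
                  for j in range(n))
    return rows_ok and cols_ok

def are_orthogonal(L1, L2):
    """Check that the n^2 superimposed pairs are all distinct."""
    n = len(L1)
    pairs = {(L1[i][j], L2[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n
```

For p = 5, `mols(5)` yields four pairwise-orthogonal latin squares. For prime powers p^k with k > 1, the same recipe needs genuine GF(p^k) arithmetic; integers mod p^k do not work, since some nonzero differences a_r − a_u would then fail to be invertible.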
Chapter 6
Graph Theory
6.1 Introduction
Graph theory, broadly speaking, studies relationships among a given set of objects with some structure. Informally, a graph is a set of objects called
vertices (or points) connected by links called edges (or lines). A graph is
usually depicted by means of a diagram in the plane in which the vertices
are denoted by points of the plane and edges by lines joining certain pairs of
vertices. Graph theory has its origins in recreational mathematics and its use
as a tool to solve practical problems. Since 1930, graph theory has received
considerable attention as a mathematical discipline. This is due to its wide
range of practical applications in many areas of society. In fact, in recent times, its importance has grown phenomenally in view of its wide-ranging connections to Computer Science.
The earliest known recorded result in graph theory is due to the Swiss mathematician Leonhard Euler (1707–1783) in 1736. The city of Königsberg (now Kaliningrad) has seven bridges linking two islands A and B and the
banks C and D of the Pregel river (now called the Pregolya). The people of Königsberg wondered if it was possible to take a stroll across the seven bridges, crossing each bridge exactly once and returning to the starting point.
Euler showed that it was not possible.
Figure 6.1: The seven bridges of Königsberg across the Pregel river, linking the islands A and B with the banks C and D.
In 1859, William Rowan Hamilton (1805–1865) discovered the idea of a
“cycle” that visits each vertex of a graph exactly once. Hamilton’s game
was a wooden regular dodecahedron with 20 vertices (labeled by cities). The
objective was to find a cycle traveling along the edges of the solid so that
each city was visited exactly once. Fig. 6.2 below is the planar graph for a
dodecahedron—the required cycle is designated by dark edges. Such a cycle
is now called a Hamilton cycle.
Chapter 6 Graph Theory 318
Figure 6.2: The planar graph of the dodecahedron, with a Hamilton cycle designated by dark edges.
Trees are special kinds of connected graphs that contain no cycles. In
1847, G. R. Kirchhoff (1824–1887) first used trees in his work on electrical networks. It is also known that Arthur Cayley (1821–1895) used trees systematically in his attempts to enumerate the isomers of the saturated hydrocarbons.
Many problems in graph theory are easy to pose although their solutions
may require research effort. Perhaps the most famous one is the Four Color
Theorem (FCT) which states that every planar map can be colored using
not more than four colors so that regions sharing a common boundary are colored differently. The final proof of the FCT, given by Kenneth Appel and Wolfgang Haken in 1976, required about 1200 hours of computer time.
The present chapter gives an exposition of some of the basic ideas in graph theory.
6.2 Basic definitions and ideas
6.2.1 Types of Graphs
Definition 6.2.1:
A graph G consists of a vertex (or point) set V (G) = {v1, . . . , vn} and an edge (or line) set E(G) = {e1, . . . , em}, where each edge consists of an unordered pair of vertices. If e is an edge of G, then it is represented by the unordered pair {a, b} (denoted by ab when no confusion arises). We call a and b the endpoints of e. If an edge e = {u, v} ∈ E(G), then u and v are said to be adjacent. A null graph on n vertices, denoted by Nn, has no edges. The empty graph is denoted by φ and has vertex set φ (and therefore edge set φ).
A loop is an edge whose endpoints are the same. Parallel edges or multiple
edges are edges that have the same pair of endpoints. A simple graph is one
that has no loops and no multiple edges.
The order of a graph G, denoted by n(G) or simply n, is the number of
vertices in V (G). A graph of order 1 is called trivial. The size of a graph G,
denoted by m(G) or simply m, is the number of edges in E(G). If a graph
G has finite order and finite size, then G is said to be a finite graph.
Unless stated otherwise, we consider only simple finite graphs.
It is possible to assign a direction or orientation to each edge in a graph; we then treat an edge e (with endpoints u and v) as an ordered pair (u, v) or (v, u), often denoted by →uv or →vu.
Definition 6.2.2:
A directed graph or a digraph G consists of a vertex set V (G) = {v1, . . . , vn}
Figure 6.3: The graph G1 with V (G1) = {1, 2, 3, 4, 5} and E(G1) = {{1, 2}, {2, 3}, {4, 5}}, and the digraph G2 with V (G2) = {1, 2, 3, 4, 5, 6, 7} and arcs (1, 2), (2, 3), (1, 4), (3, 6), (6, 5), (6, 7), (7, 6).
and an arc set A(G) = {e1, . . . , em}, where each arc is an ordered pair of vertices. A simple digraph is one in which each ordered pair of vertices occurs at most once as an arc. We indicate an arc as (u, v) or →uv, where u and v are the endpoints. Note that this is different from the arc →vu with the same endpoints. If e = (u, v), then u is called the tail and v is called the head of e.
In depicting a digraph, we assign a direction, marked by an arrow, to each edge. Thus an arc →uv of a digraph will be a line from u to v with an arrow in the direction from u to v. Figures 6.3 and 6.4 show some examples of graphs and digraphs. G1 is a graph in which the set of vertices {1, 2, 3} forms one component and the set of vertices {4, 5} forms another component. We thus have a graph that is not connected. The connected components of such a graph may be studied separately. G2 is an example of a digraph.
Consider the graphs G3 and G4. We can make the following correspondence (denoted by the double arrow ↔) between V (G3) and V (G4):

1 ↔ a, 2 ↔ b, 3 ↔ c, 4 ↔ d,
Figure 6.4: The graph G3 with V (G3) = {1, 2, 3, 4, 5, 6, 7, 8} and E(G3) = {{1, 4}, {4, 8}, {8, 5}, {5, 1}, {1, 2}, {5, 6}, {8, 7}, {4, 3}, {2, 3}, {3, 7}, {7, 6}, {6, 2}}, and the graph G4 with V (G4) = {a, b, c, d, e, f, g, h} and E(G4) = {ab, bc, cd, da, ef, fg, gh, he, ae, bf, cg, dh}.
5 ↔ e, 6 ↔ f, 7 ↔ g, 8 ↔ h.

Letting i, j ∈ {1, 2, . . . , 8} and x, y ∈ {a, b, . . . , h}, we can easily check that if i ↔ x and j ↔ y, then ij ∈ E(G3) if and only if xy ∈ E(G4). In other words, G3 and G4 are the “same graph”. More formally, we have the following definition.
Definition 6.2.3:
Two simple graphs G and H are isomorphic if there is a bijection φ : V (G) → V (H) such that {u, v} ∈ E(G) if and only if {φ(u), φ(v)} ∈ E(H).
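The definition translates directly into a brute-force test: try every bijection φ and check that it preserves edges and non-edges. A sketch (our illustration; feasible only for small graphs, since it tries all |V|! bijections), applied to the graphs G3 and G4 of Fig. 6.4:

```python
# Brute-force isomorphism test for small simple graphs, following
# Definition 6.2.3: G and H are isomorphic iff some bijection
# phi: V(G) -> V(H) maps edges to edges and non-edges to non-edges.

from itertools import permutations

def are_isomorphic(VG, EG, VH, EH):
    """VG, VH: vertex lists; EG, EH: sets of frozenset edges."""
    if len(VG) != len(VH) or len(EG) != len(EH):
        return False
    for perm in permutations(VH):
        phi = dict(zip(VG, perm))
        if all((frozenset((phi[u], phi[v])) in EH) == (frozenset((u, v)) in EG)
               for u in VG for v in VG if u != v):
            return True
    return False

# The graphs G3 and G4 of Fig. 6.4 (both drawings of the 3-cube).
VG3 = [1, 2, 3, 4, 5, 6, 7, 8]
EG3 = {frozenset(e) for e in [(1, 4), (4, 8), (8, 5), (5, 1), (1, 2), (5, 6),
                              (8, 7), (4, 3), (2, 3), (3, 7), (7, 6), (6, 2)]}
VG4 = list("abcdefgh")
EG4 = {frozenset(e) for e in ["ab", "bc", "cd", "da", "ef", "fg",
                              "gh", "he", "ae", "bf", "cg", "dh"]}

assert are_isomorphic(VG3, EG3, VG4, EG4)
```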
Exercise 6.2.4:
Show that the Petersen graph P (most commonly drawn as G1 below) is
isomorphic to the graphs G2, G3 and G4 below:
Figure 6.5: The Petersen graph G1 and the graphs G2, G3 and G4.
Exercise 6.2.5:
Show that the Petersen graph is isomorphic to the following graph Q: the vertex set V (Q) is the set of unordered pairs {i, j}, i ≠ j, 1 ≤ i, j ≤ 5. Two vertices {i, j} and {k, l} (i, j, k, l ∈ {1, 2, . . . , 5}) form an edge if and only if {i, j} ∩ {k, l} = φ.
Example 6.2.6:
Let V be a set of cardinality n (vertices). From V we can obtain (n choose 2) = n(n − 1)/2 unordered pairs (these can be treated as possible edges). Each subset of these unordered pairs defines a simple graph, and hence there are 2^(n(n−1)/2) simple graphs with vertex set V .

If |V | = 4, then there are 64 simple graphs on four vertices. However, they fall into only eleven isomorphism classes, as shown in Fig 6.6.
Figure 6.6: The eleven isomorphism classes G1, G2, . . . , G11 of simple graphs on four vertices.
Consider the pairs Gi and G12−i in Fig 6.6. We see that G12−i is obtained from Gi by removing its edge(s) and introducing all edges not in Gi. This is the idea of complementation of a graph. Note that G6 is isomorphic to its complement (a formal definition follows).

While there are only 11 non-isomorphic simple graphs on four vertices, there are as many as 1044 non-isomorphic simple graphs on seven vertices! When the order and the size of graphs are small (“small graphs”), it is usually easy to check for isomorphism. However, the problem of deciding whether two given graphs are isomorphic is difficult in general.
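The count of eleven classes on four vertices can itself be verified by brute force: generate all 2^6 graphs on four labeled vertices, reduce each to a canonical form under the 4! vertex permutations, and count the distinct forms. A sketch (our illustration; function names are ours):

```python
# Count isomorphism classes of simple graphs on 4 vertices.
from itertools import combinations, permutations

VERTICES = range(4)
PAIRS = list(combinations(VERTICES, 2))    # the 6 possible edges

def canonical(edges):
    """Lexicographically smallest relabeling of an edge set."""
    best = None
    for perm in permutations(VERTICES):
        relabeled = {frozenset((perm[u], perm[v])) for u, v in edges}
        key = tuple(sorted(tuple(sorted(e)) for e in relabeled))
        if best is None or key < best:
            best = key
    return best

classes = set()
for bits in range(2 ** len(PAIRS)):        # all 64 labeled graphs
    edges = [PAIRS[k] for k in range(len(PAIRS)) if bits >> k & 1]
    classes.add(canonical(edges))

assert len(classes) == 11                  # the eleven classes of Fig 6.6
```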
Definition 6.2.7:
The complement G̅ of a simple graph G is the simple graph with vertex set V (G̅) = V (G) and edge set E(G̅) defined thus: uv ∈ E(G̅) if and only if uv ∉ E(G). That is, E(G̅) = {uv | uv ∉ E(G)}.
Example 6.2.8:
The following graph G is isomorphic to its complement G̅:
Figure 6.7: The graph G on the vertices 1, . . . , 8 and its complement G̅.
The mapping for the isomorphism is:

1 → 4, 2 → 6, 3 → 1, 4 → 7, 5 → 2, 6 → 8, 7 → 3, 8 → 5.
Exercise 6.2.9:
Show that two graphs G and H are isomorphic if and only if G̅ and H̅ are isomorphic.
Definition 6.2.10:
A simple graph is called self-complementary if it is isomorphic to its own
complement.
The complement of the null graph Nn is a graph with n vertices in which every distinct pair of vertices forms an edge. Such a graph is called a complete graph or a clique. The complete graph on n vertices is denoted by Kn. It easily follows that Kn has n(n − 1)/2 edges.
Exercise 6.2.11:
The line-graph L(G) of a given graph G = (V (G), E(G)) is the simple graph whose vertices are the edges of G, with ef ∈ E(L(G)) if and only if the edges e and f of G have a common endpoint. Prove that the Petersen graph is the complement of the line-graph of K5.
Exercise 6.2.12:
Prove that if G is a self-complementary graph with n vertices, then n is either 4t or 4t + 1, for some integer t. (Hint: consider the number of edges in Kn.)
A subgraph of a graph G is a graph H such that V (H) ⊆ V (G) and
E(H) ⊆ E(G). In notation, we write H ⊆ G and say that “H is a subgraph
of G”. For S ⊆ V (G), the induced subgraph G[S] (also written ⟨S⟩) of G is a subgraph H of G such that V (H) = S and E(H) contains all edges of G whose endpoints are in S (see Fig 6.8).
Figure 6.8: G1 is an induced subgraph of G; G2 is not.
Note that a complete graph may have many subgraphs that are not cliques, but every induced subgraph of a complete graph is a clique.
The components of a graph G are its maximal connected subgraphs. A
component is nontrivial if it contains an edge.
An independent set of a graph G is a vertex subset S ⊆ V (G) such that no two vertices of S are adjacent in G. It is easy to check that a clique of G is an independent set of the complement G̅, and vice versa.
We next introduce a special class of graphs, called bipartite graphs. A graph G is called bipartite if V (G) can be partitioned into two subsets X and
Y such that each edge of G has one endpoint in X and the other in Y . We
express this by writing G = G(X,Y ).
A complete bipartite graph G is a bipartite graph G(X,Y ) whose edge set
consists of all possible pairs of vertices having one endpoint in X and the
other in Y . If X has m vertices and Y has n vertices, such a graph is denoted
by Km,n. Note that Km,n is isomorphic to Kn,m. It is easy to see that Km,n
has mn edges.
Example 6.2.13:
The graph of Fig 6.9(a) is the 3-cube. It is a bipartite graph (though not complete). Fig. 6.9(b) is a redrawing of Fig. 6.9(a) exhibiting the bipartition X = {x1, x2, x3, x4} and Y = {y1, y2, y3, y4}. The graphs of Fig. 6.9(c) are isomorphic to the complete bipartite graph K3,3.
Figure 6.9: (a) The 3-cube; (b) a redrawing of (a) exhibiting the bipartition X = {x1, x2, x3, x4} and Y = {y1, y2, y3, y4}; (c) three drawings of the complete bipartite graph K3,3.
Exercise 6.2.14:
Let G be the graph whose vertex set is the set of binary strings of length n (n ≥ 1). A vertex x of G is adjacent to a vertex y of G if and only if x and y differ in exactly one position. Prove that G is a bipartite graph.
We next introduce the notion of a walk and related concepts.
Definition 6.2.15:
A walk of length k in a graph G is a non-null alternating sequence v0 e1 v1 e2 . . . ek vk of vertices and edges of G (starting and ending with vertices) such that ei = vi−1 vi for all i. A trail is a walk in which no edge is repeated. A path is a walk with no repeated vertex. A (u, v)-walk is one whose first vertex is u and last vertex is v (u and v are the end vertices of the walk). A walk or trail is closed if it has length at least one and its end vertices are the same.
A cycle is a closed path. A cycle on n vertices is denoted by Cn (where the
vertices are unlabeled).
Note that in a simple graph a walk is completely specified by its sequence
of vertices.
We now formally state the notion of connectedness.
Definition 6.2.16:
A graph G is connected if it has a (u, v)-path for each pair u, v ∈ V (G).
Exercise 6.2.17:
Let G be a simple graph. Show that if G is not connected, then its complement G̅ is connected.
6.2.2 Two Interesting Applications
We now illustrate how the above concepts can be applied to problems.
(a) A Party Problem
Six people are at a party. Show that there are three people who all know
each other or there are three people who are mutually strangers.
Perhaps the easiest way to solve the problem is by using graph theory. Consider the complete graph K6 and associate the six people with its six vertices. We color the edge joining two vertices black if the corresponding people know each other. If two people do not know each other, we color the edge joining the corresponding vertices grey. If there are three people who know (don't know) each other, then we have a black (grey) triangle in K6.
Given an assignment of colors to all edges of K6, a subgraph H is called
monochromatic if all edges of H have the same color.
The party problem can now be posed as follows:
If we arbitrarily color the edges of K6 black or grey, then there must be
a monochromatic clique on three vertices.
Let u, v, w, x, y, z be the vertices of K6. An arbitrary vertex, say u, has degree 5. So when we color the five edges incident with u, we must use the color black or grey at least three times. Without loss of generality, assume
be uv, ux and uw. If any one of the edges vw, vx or xw is now colored
black, we get the required black triangle. Hence we suppose all these edges
are colored grey. But then this gives a grey triangle.
Figure 6.10: The vertices u, v, w, x, y, z of K6; the three edges uv, uw and ux incident with u are colored black.
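The argument above can also be confirmed exhaustively: K6 has 15 edges, so there are only 2^15 = 32768 two-colorings, and each can be checked for a monochromatic triangle. A sketch (our illustration; names are ours):

```python
# Verify the party problem: every 2-coloring of the edges of K6
# contains a monochromatic triangle.
from itertools import combinations

EDGES = list(combinations(range(6), 2))       # the 15 edges of K6
TRIANGLES = list(combinations(range(6), 3))   # the 20 triangles of K6

def has_mono_triangle(coloring):
    """coloring: dict mapping each edge to 0 (black) or 1 (grey)."""
    for a, b, c in TRIANGLES:
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

# Encode each of the 2^15 colorings by the bits of an integer.
all_colorings_work = all(
    has_mono_triangle({e: bits >> k & 1 for k, e in enumerate(EDGES)})
    for bits in range(2 ** 15)
)
assert all_colorings_work
```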
We remark that the generalization of the above party problem leads to
Ramsey Theory. However, we will not develop the theory in this book.
(b) Proof of the Schröder–Bernstein Theorem

An interesting application of bipartite graphs appears in the proof of the Schröder–Bernstein theorem (see Theorem 1.5.6). We begin with a result which we state without proof: a graph G is bipartite iff no cycle of G is odd.

Let f : X → Y and g : Y → X be 1–1 maps. Form a bipartite graph G = G(X, Y ) with bipartition (X, Y ). If x ∈ X, set (x, f(x)) ∈ E(G), and if y ∈ Y , set (y, g(y)) ∈ E(G). The components of G can only be of one of the following four types (see Fig. 6.11):
Figure 6.11: Two of the component types of G: a one-way infinite path x, f(x), g(f(x)), f(g(f(x))), . . . starting in X, and a path containing x′ = g(y′), y′′ = f(x′), x′′ = g(y′′).
(i) a one-way infinite path starting at a vertex x ∈ X. Such a path is of the form x, f(x), g(f(x)), f(g(f(x))), . . .

(ii) a one-way infinite path starting at a vertex y′ ∈ Y . Such a path is of the form y′, g(y′), f(g(y′)), . . .

(iii) a two-way infinite path with vertices alternating between X and Y . Such a path is of the form . . . , x, y, x′, y′, . . . , where x, x′, . . . ∈ X and y, y′, . . . ∈ Y .

(iv) an even cycle of the form x1 y1 x2 y2 . . . xn yn x1, where xi ∈ X and yj ∈ Y for each i and j.
It is now easy to set up a bijection φ from X onto Y . If z is a vertex of a component of type (i), set φ(z) = f(z) for each vertex z ∈ X of the component. For a component of type (ii), pair each vertex y ∈ Y of the component with g(y); that is, set φ(g(y)) = y. In type (iii), define φ(x) = y, φ(x′) = y′, and so on. Finally, in type (iv), if the component is x1 y1 x2 y2 . . . xn yn x1, set φ(xi) = yi, 1 ≤ i ≤ n. Then φ : X → Y is a bijection from X onto Y . Thus X and Y are equipotent sets.
6.2.3 First Theorem of Graph Theory
We next come to the first theorem of graph theory; first we introduce the notion of the degree of a vertex.
An edge e of a graph G is said to be incident with a vertex v if v is an endvertex of e. We then also say that v is incident with e. Two edges e and f having a common endvertex v are said to be adjacent.
Definition 6.2.18:
Given a graph G, let v ∈ V (G). The degree d(v) or dG(v) of v is the number
of edges of G incident with v.
A vertex of degree 1 in a graph G is called an end-vertex of G or a leaf
of G. We denote the maximum degree in a graph G by ∆(G); the minimum
degree is denoted by δ(G). A graph is regular if ∆(G) = δ(G). Also, a
graph is k-regular if ∆(G) = δ(G) = k. It is easy to check that a k-regular
graph with n vertices has nk/2 edges. The complete graph Kn is (n − 1)-regular, the complete bipartite graph Kn,n is n-regular, and the Petersen graph is 3-regular.
A 3-regular simple connected graph is called a cubic graph. It turns out that cubic graphs on n vertices exist precisely for even values of n. Note that K3,3 is a cubic graph. Some cubic graphs are shown in Fig. 6.12.
Figure 6.12: Some cubic graphs.
Exercise 6.2.19:
Consider any k-regular graph G for odd k. Prove that the number of edges
in G is a multiple of k.
Exercise 6.2.20:
Prove that every 5-regular graph contains a cycle of length at least six.
Theorem 6.2.21 (First Theorem of Graph Theory, or Degree-Sum Formula):
For any graph G,

∑_{v ∈ V (G)} d(v) = 2 m(G).
Proof. When the degrees of all the vertices are summed up, each edge is
counted twice. Hence the result.
By the Degree-Sum Formula, the average vertex degree is 2m(G)/n(G), where n(G) is the order of G. Hence,

δ(G) ≤ 2m(G)/n(G) ≤ ∆(G).
A vertex of a graph is called odd or even depending on whether its degree is
odd or even.
Corollary 6.2.22 (Handshake Lemma):
In any graph G, there is an even number of odd vertices.
Proof. Let A and B be respectively the sets of odd and even vertices of G. Then for each u ∈ B, d(u) is even, and so ∑_{u∈B} d(u) is also even. By the Degree-Sum Formula,

∑_{u∈B} d(u) + ∑_{w∈A} d(w) = ∑_{v∈V (G)} d(v) = 2 m(G).

This gives

∑_{w∈A} d(w) = 2 m(G) − ∑_{u∈B} d(u) = an even number

(being the difference of two even numbers). Since each vertex in A contributes an odd number to this even sum, the number |A| of odd vertices must be even. Hence the result. This can be interpreted thus: the number of participants at a birthday party who shake hands with an odd number of other participants is always even.
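Both the Degree-Sum Formula and the Handshake Lemma are easy to check computationally on any graph given by its edge list. A sketch (our illustration), using the 3-cube G3 of Fig. 6.4:

```python
# Check the Degree-Sum Formula and the Handshake Lemma on a graph
# given as a list of edges.

def degrees(vertices, edges):
    """Return the degree of every vertex."""
    d = {v: 0 for v in vertices}
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d

# The graph G3 of Fig. 6.4 (the 3-cube): 8 vertices, 12 edges.
V = range(1, 9)
E = [(1, 4), (4, 8), (8, 5), (5, 1), (1, 2), (5, 6),
     (8, 7), (4, 3), (2, 3), (3, 7), (7, 6), (6, 2)]

d = degrees(V, E)
assert sum(d.values()) == 2 * len(E)     # Degree-Sum Formula
odd = [v for v in V if d[v] % 2 == 1]
assert len(odd) % 2 == 0                 # Handshake Lemma
```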
6.3 Representations of Graphs
A graph G = (V (G), E(G)) can be represented as a collection of adjacency lists or as an adjacency matrix (defined below). For sparse graphs, for which |E(G)| is much smaller than |V (G)|^2, the adjacency list representation is preferred. For dense graphs, for which |E(G)| is close to |V (G)|^2, the adjacency matrix representation is good.
Definition 6.3.1:
The adjacency list representation of a graph G = (V (G), E(G)) consists of an array Adj of |V (G)| lists, one for each vertex. For each vertex u of G, Adj[u] points to the list of all vertices v that are adjacent to u.
For a directed graph, Adj[u] points to the list of all vertices v such that →uv is an arc of the digraph. It is easy to see that the adjacency list representation of a graph (directed or undirected) has the desirable property that the amount of memory it requires is

O(max(|V |, |E|)) = O(|V | + |E|).
Given an adjacency list representation of a graph, the only way to determine whether an edge uv is present in the graph is to search the list Adj[u] for v. This test is much simpler in the adjacency-matrix representation of a graph, defined below.
Definition 6.3.2:
To represent a graph G = (V (G), E(G)), we first number the vertices of G as 1, 2, . . . , |V | in some arbitrary manner. The adjacency matrix of G is then the |V | × |V | matrix A = (aij), where

aij = 1 if ij ∈ E(G), and aij = 0 otherwise.

The above definition applies to directed graphs also, where we specify the elements aij as

aij = 1 if →ij ∈ A(G), and aij = 0 otherwise.
Note that a graph may have many adjacency lists and adjacency matrices because the numbering of the vertices is arbitrary. However, all the representations yield graphs that are isomorphic. It is thus possible to study properties of graphs that do not depend on the labels of the vertices.
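The two representations can be sketched as follows (our illustration; names are ours). The adjacency list stores each undirected edge twice, while the matrix answers the query "is uv an edge?" with a single lookup.

```python
# Adjacency list and adjacency matrix for an undirected graph whose
# vertices are numbered 1, ..., n.

def adjacency_list(n, edges):
    adj = {u: [] for u in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)          # undirected: record both directions
    return adj

def adjacency_matrix(n, edges):
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1       # vertices are 1-based, indices 0-based
        a[v - 1][u - 1] = 1
    return a

# The graph G1 of Fig. 6.3.
adj = adjacency_list(5, [(1, 2), (2, 3), (4, 5)])
mat = adjacency_matrix(5, [(1, 2), (2, 3), (4, 5)])

assert adj[2] == [1, 3]                    # neighbors of vertex 2
assert mat[0][1] == 1 and mat[0][2] == 0   # edge 1-2 present, 1-3 absent
```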
Theorem 6.3.3:
Let G be a graph with n vertices v1, . . . , vn. Let A be the adjacency matrix of G with this labeling of vertices. Let A^k (k a positive integer) be the product of k copies of A. Then the (i, j)-th entry of A^k is the number of different (vi, vj)-walks in G of length k.
Proof. We prove the result by induction on k. For k = 1, the theorem follows
from the definition of the adjacency matrix of G, since a walk of length 1
from vi to vj is just the edge vivj.
We now assume that the result is true for A^(k−1) (k > 1). Let A^(k−1) = (bij); i.e., bij is the number of different walks of length k − 1 from vi to vj. Let A^k = (cij). We want to prove that cij is the number of different walks of length k from vi to vj. By definition, A^k = A^(k−1) A, and by the definition of matrix multiplication,

cij = ∑_{r=1}^{n} [(i, r)-th element of A^(k−1)] × [(r, j)-th element of A] = ∑_{r=1}^{n} bir arj.

Now every (vi, vj)-walk of length k consists of a (vi, vr)-walk of length k − 1, for some r, followed by the edge vr vj. Here arj = 1 or 0 according as vr is adjacent to vj or not, and by the induction hypothesis, the number of (vi, vr)-walks of length k − 1 is the (i, r)-th entry bir of the matrix A^(k−1). Hence the total number of (vi, vj)-walks of length k is

∑_{r=1}^{n} bir arj = cij.
Hence the result is true for Ak.
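Theorem 6.3.3 can be observed directly by computing matrix powers. The sketch below (our illustration, using plain-Python matrix multiplication) counts walks of length 2 in the cycle C4.

```python
# Count walks via powers of the adjacency matrix (Theorem 6.3.3).

def matmul(A, B):
    """Product of two square matrices of the same order."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    """k-th power of a square matrix, k >= 1."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

# The cycle C4 on vertices 0-1-2-3-0.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

A2 = matpow(A, 2)
assert A2[0][0] == 2   # two closed walks of length 2 at vertex 0
assert A2[0][2] == 2   # the walks 0-1-2 and 0-3-2
assert A2[0][1] == 0   # no length-2 walk between adjacent vertices of C4
```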
The next theorem uses the above result to determine whether or not a
graph is connected.
Theorem 6.3.4:
Let G be a graph with n vertices v1, . . . , vn and let A be the adjacency matrix of G. Let B = (bij) be the matrix

B = A + A^2 + · · · + A^(n−1).
Then G is connected if and only if for every pair of distinct indices i, j,
bij 6= 0; that is, G is connected if and only if B has no zero elements off the
main diagonal.
Proof. Let a^(k)_ij denote the (i, j)-th entry of A^k (k = 1, . . . , n − 1). We then have

bij = a^(1)_ij + a^(2)_ij + · · · + a^(n−1)_ij.

By Theorem 6.3.3, a^(k)_ij is the number of distinct (vi, vj)-walks of length k. Thus

bij = (number of different (vi, vj)-walks of length 1) + (number of different (vi, vj)-walks of length 2) + · · · + (number of different (vi, vj)-walks of length n − 1).

In other words, bij is the number of different (vi, vj)-walks of length less than n.
Assume that G is connected. Then for every pair i, j (i ≠ j) there is a path from vi to vj. Since G has only n vertices, any path has length at most n − 1. Hence there is a path of length less than n from vi to vj. This implies that bij ≠ 0.

Conversely, assume that bij ≠ 0 for every pair i, j (i ≠ j). Then from the above discussion it follows that there is at least one walk of length less than n from vi to vj. This holds for every pair i, j (i ≠ j), and therefore we conclude that G is connected.
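Theorem 6.3.4 gives a direct (if inefficient) connectivity test: compute B = A + A^2 + · · · + A^(n−1) and inspect the off-diagonal entries. A sketch (our illustration, with plain-Python matrix products):

```python
# Connectivity test of Theorem 6.3.4: G is connected iff
# B = A + A^2 + ... + A^{n-1} has no zero off-diagonal entry.

def is_connected(A):
    n = len(A)
    power = [row[:] for row in A]          # current power A^k
    B = [row[:] for row in A]              # running sum, starts at A
    for _ in range(n - 2):                 # add A^2, ..., A^{n-1}
        power = [[sum(power[i][r] * A[r][j] for r in range(n))
                  for j in range(n)] for i in range(n)]
        B = [[B[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return all(B[i][j] != 0 for i in range(n) for j in range(n) if i != j)

# The path 0-1-2 is connected; adding an isolated vertex disconnects it.
P3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
assert is_connected(P3)

P3_plus_isolated = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
assert not is_connected(P3_plus_isolated)
```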
Exercise 6.3.5:
Let A be the adjacency matrix of a connected graph G with n vertices; is it always necessary to compute up to A^(n−1) to conclude that G is connected? Justify your answer.
Let G be the simple connected graph shown in Fig. 6.13.
Figure 6.13: The graph G with V (G) = {1, 2, 3, 4} and E(G) = {{1, 2}, {2, 3}, {2, 4}, {3, 4}}.
We label the vertices of G so that the edge {1, 2} is necessarily present in G. The graph G can now be regarded as a selection of edges from the lexicographically ordered sequence

{1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4}.

Indicating the presence or absence of an edge in G by 1 or 0 respectively, the above sequence yields the binary string

1 0 0 1 1 1,

which represents the number 39 in the decimal system. Thus we can uniquely represent a graph by a number.
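This encoding is easy to automate (our illustration; names are ours): list the n(n − 1)/2 possible edges lexicographically, write 1 for each edge present, and read the resulting string as a binary number.

```python
# Encode a labeled simple graph on vertices 1..n as a single number,
# using the lexicographic order of the possible edges.
from itertools import combinations

def graph_to_number(n, edges):
    edge_set = {frozenset(e) for e in edges}
    bits = "".join("1" if frozenset(p) in edge_set else "0"
                   for p in combinations(range(1, n + 1), 2))
    return int(bits, 2)

# The graph of Fig. 6.13: edges {1,2}, {2,3}, {2,4}, {3,4} encode as
# the binary string 100111, i.e. the number 39.
assert graph_to_number(4, [(1, 2), (2, 3), (2, 4), (3, 4)]) == 39
```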
6.4 Basic Ideas in Connectivity of Graphs
6.4.1 Some Graph Operations
Let G = (V,E) be a graph. We define graph operations that lead to new
graphs from G.
(a) Edge addition: Let uv ∉ E(G). Then the graph

G + uv = (V (G), E(G) ∪ {uv}).

(b) Edge deletion: Let e ∈ E(G). Then the graph

G − e = (V (G), E(G) \ {e}).

(c) Vertex deletion: Let v be a vertex of G. Then the graph

G − v = (V (G) \ {v}, {e ∈ E(G) | e not incident with v}),

i.e., we delete the vertex v and all edges having v as an endvertex.

(d) Edge subdivision: For e = xy ∈ E(G) and a new vertex z ∉ V (G),

G % e = (V (G) ∪ {z}, (E(G) \ {xy}) ∪ {xz, zy}),

i.e., we “insert a new vertex z” on the edge xy.

(e) Deletion of a set of vertices: Let W ⊂ V (G). The graph G \ W is the graph obtained from G by deleting all the vertices of W as also each edge that has at least one endvertex in W .
The above operations are illustrated in Fig 6.14.
Figure 6.14: A graph G and the graphs G + uv, G − e, G − w, G % e and G \ W , where W = {w, w′}.
6.4.2 Vertex Cuts, Edge Cuts and Connectivity
Definition 6.4.1:
In a connected graph G, a subset V ′ ⊆ V (G) is a vertex cut of G if G − V ′ is disconnected. It is a k-vertex cut if |V ′| = k. V ′ is also called a separating set of vertices of G. A vertex v of a connected graph G is a cut vertex of G if {v} is a vertex cut of G.
Definition 6.4.2:
Let G be a nontrivial graph and let S be a proper nonempty subset of V . Let [S, S̅] denote the set of all edges of G having one endpoint in S and the other in S̅. A set of edges of G of the form [S, S̅] is called an edge cut of G. An edge e ∈ E(G) is a cut edge of G if {e} is an edge cut of G. An edge cut of cardinality k is called a k-edge cut of G. If e is a cut edge of a connected graph G, then G − e has exactly two components.
Example 6.4.3:
Consider the graph in Fig. 6.15.
Figure 6.15: A graph on the vertices u, v, w, x, y, z.
{v} and {w, x} are vertex cuts. The edge subsets {wy, xy}, {uv} and {xz} are all edge cuts. Vertex v is a cut vertex. Edges uv and xz are cut edges.
The following theorem characterizes a cut vertex of G.
Theorem 6.4.4:
A vertex v of a connected graph G with at least three vertices is a cut vertex
of G if and only if there exist vertices u and w of G, distinct from v, such
that v is in every (u,w)-path in G.
Proof. If v is a cut vertex of G, then G − v is disconnected. Let G1 and G2 be two components of G − v. Choose u and w such that u ∈ V (G1) and w ∈ V (G2). Then every (u, w)-path in G must contain v, as otherwise u and w would belong to the same component of G − v.
Conversely, assume that the condition of the theorem holds. Then the
deletion of v splits every (u,w)-path in G and hence u and w lie in different
components of G−v. So G−v is disconnected and therefore v is a cut vertex
of G.
The following two theorems characterize a cut edge of a graph.
Theorem 6.4.5:
In a connected graph G, an edge e = uv is a cut edge of G if and only if e does not belong to any cycle of G.

Proof. Let e = uv be a cut edge of G and let [S, S̅] = {e} be the edge cut defined by the two components of G − e, so that u ∈ S and v ∈ S̅. If e belongs to a cycle of G, then [S, S̅] must contain at least one more edge, contradicting {e} = [S, S̅]. Hence e cannot belong to a cycle.

Conversely, assume that e is not a cut edge of G. Then G − e is connected and hence there exists a (u, v)-path P in G − e. Then P together with the edge e forms a cycle in G.
Theorem 6.4.6:
In a connected graph G, an edge e = uv is a cut edge if and only if there
exist vertices x and y such that e belongs to every (x, y)-path in G.
Proof. Let e = uv be a cut edge in G. Then G − e has two components, say G1 and G2. Let x ∈ V (G1) and y ∈ V (G2). Then there is no (x, y)-path in G − e. Hence every (x, y)-path in G must contain e.

Conversely, assume that there exist vertices x and y satisfying the condition of the theorem. Then there exists no (x, y)-path in G − e, and this means that G − e is disconnected. Hence e is a cut edge of G.
Exercise 6.4.7:
Prove or disprove: Let G be a simple connected graph with |V (G)| ≥ 3.
Then G has a cut edge if and only if it has a cut vertex.
6.4.3 Vertex Connectivity and Edge-Connectivity
We next introduce two parameters of a graph which, in a way, measure the
“connectedness” of the graph.
Definition 6.4.8:
Let G be a nontrivial connected graph having at least one pair of non-adjacent
vertices. The minimum k for which there exists a k-vertex cut is called the
vertex connectivity or simply the connectivity of G; it is denoted by κ(G).
If G has no pair of nonadjacent vertices, that is if G is a complete graph of
order n, then κ(G) is defined to be n− 1.
Note that the removal of any set of n− 1 vertices of Kn results in a K1.
A subset of vertices or edges of a connected graph G is said to disconnect
the graph if its deletion results in a disconnected graph.
Definition 6.4.9:
The edge connectivity of a connected graph G is the smallest k for which
there exists a k-edge cut (i.e., an edge cut containing k edges). The edge
connectivity of G is denoted by λ(G).
If λ(G) is the edge connectivity of a graph G, then there exists a set of λ(G)
edges whose deletion disconnects the graph, while no set of fewer than λ(G)
edges has this property. Thus we have the following definition.
Definition 6.4.10:
A graph G is r-connected if κ(G) ≥ r. G is r-edge connected if λ(G) ≥ r.
The parameters κ(G), λ(G) and δ(G) are related by the following inequal-
ities.
Theorem 6.4.11:
For a connected graph G,
κ(G) ≤ λ(G) ≤ δ(G)
Proof. By our definitions of κ(G) and λ(G), κ(G) ≥ 1, λ(G) ≥ 1 and δ(G) ≥ 1.
Let E be an edge cut of G with λ(G) edges and let uv ∈ E. For each edge of
E that does not have both u and v as endpoints, remove an endpoint that is
different from u and v. If there are t such edges, at most t vertices would have
been removed. If the resulting graph is disconnected, then κ(G) ≤ t ≤ λ(G).
Otherwise, there will remain a subset of edges of E having u and v as end
vertices, the removal of which will disconnect the graph. Hence additional
removal of one of u or v will disconnect the graph or a trivial graph will result.
In the process, a set of at most t+ 1 vertices would have been removed and
so κ(G) ≤ t+ 1 ≤ λ(G). Finally it is clear that λ(G) ≤ δ(G); in fact, if v is
a vertex of G with dG(v) = δ(G), then the set of δ(G) edges incident with v
forms the edge cut [v, V \ v] of G. Thus λ(G) ≤ δ(G).
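As a concrete check of Theorem 6.4.11, the three parameters can be computed by brute force on a small graph. The following Python sketch is ours, not part of the text: it tries all vertex subsets for κ(G) and all edge subsets for λ(G), and verifies κ ≤ λ ≤ δ on a 4-cycle with one chord.

```python
from itertools import combinations

def is_connected(vertices, edges):
    """Depth-first check that the graph induced on `vertices` is connected."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == vertices

def kappa(V, E):
    """Vertex connectivity: smallest vertex cut, or n - 1 for complete graphs."""
    n = len(V)
    for k in range(n):
        for S in combinations(V, k):
            rest = [v for v in V if v not in S]
            if len(rest) > 1 and not is_connected(rest, E):
                return k
    return n - 1

def lam(V, E):
    """Edge connectivity: smallest edge cut of a connected graph."""
    for k in range(len(E) + 1):
        for F in combinations(E, k):
            if not is_connected(V, [e for e in E if e not in F]):
                return k
    return len(E)

# A 4-cycle with one chord: kappa = lambda = delta = 2.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
delta = min(sum(v in e for e in E) for v in V)
print(kappa(V, E), lam(V, E), delta)  # 2 2 2
```

Brute force is exponential and only usable on very small graphs; efficient algorithms compute these parameters via maximum flows.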
We now characterize 2-connected graphs. Two paths from a vertex u to
another vertex v (u ≠ v) are internally disjoint if they have no common
vertex except u and v.
The following theorem due to Whitney characterizes 2-connected graphs.
Theorem 6.4.12:
A graph G with at least three vertices is 2-connected if and only if there
exists a pair of internally disjoint paths between any pair of distinct vertices.
Proof. Assume first that any two distinct vertices of G are joined by at
least two internally disjoint paths. Then G is connected, and no vertex w
of G is a cut vertex: if w were one, then by Theorem 6.4.4 there would exist
vertices u and v such that every (u, v)-path in G contains w, contradicting
the existence of two internally disjoint (u, v)-paths. Since |V (G)| ≥ 3, it
follows that G is 2-connected.
Conversely, assume that G is 2-connected. We apply induction on d(u, v)
to prove that G has two internally disjoint (u, v)-paths. When d(u, v) = 1,
the graph G − uv is connected, since λ(G) ≥ κ(G) ≥ 2. Any (u, v)-path in
G − uv is internally disjoint in G from the (u, v)-path consisting of the edge
uv; thus u and v are joined by two internally disjoint paths in G.
Let d(u, v) = k > 1 and assume that G has internally-disjoint (x, y)-paths
whenever 1 ≤ d(x, y) < k. Let w be the vertex appearing before v on a
shortest (u, v)-path. Since d(u,w) = k − 1, by induction hypothesis, G has
two internally disjoint (u,w)-paths, say P and Q. (see Fig. 6.16).
Figure 6.16: The internally disjoint (u,w)-paths P and Q, the (u, v)-path R, and the last vertex z of R on P ∪ Q.
Since G − w is connected, a (u, v)-path, say R, must exist in G − w. If R
avoids P or Q, we are done; otherwise let z be the last vertex of R belonging
to P ∪ Q (see Fig. 6.16). Without loss of generality, assume that z ∈ P.
We combine the (u, z)-section of P with the (z, v)-section of R to obtain a
(u, v)-path that is internally disjoint from the (u, v)-path formed by Q
followed by the edge wv.
We next give further characterizations of 2-connected graphs.
Lemma 6.4.13 (Expansion Lemma):
Let G be a k-connected graph and let G′ be obtained from G by adding a
new vertex x and making it adjacent to at least k vertices of G. Then G′ is
also k-connected.
Proof. Let S be a separating set of G′; we have to show that |S| ≥ k. If
x ∈ S, then S \ {x} separates G, so |S \ {x}| ≥ k and hence |S| ≥ k + 1. If
x ∉ S and S separates G, then |S| ≥ k. Otherwise S separates G′ but not
G, so S must cut x off from the rest of G′; hence the neighbor set of x in
G′ is contained in S, and since x has at least k neighbors, |S| ≥ k.
Theorem 6.4.14:
For a graph G with |V (G)| ≥ 3, the following conditions are equivalent, and
each characterizes 2-connected graphs.
(a) G is connected and has no cut vertex.
(b) For all u, v ∈ V (G), u ≠ v, there are two internally disjoint (u, v)-paths.
(c) For all u, v ∈ V (G), u ≠ v, there is a cycle through u and v.
(d) δ(G) ≥ 2 and every pair of edges in G lies on a common cycle.
Proof. The equivalence of (a) and (b) follows from Theorem 6.4.12. A cycle
through two vertices u and v decomposes into a pair of internally disjoint
(u, v)-paths, and conversely two such paths together form a cycle through u
and v. Therefore (b) and (c) are equivalent.
Next we prove that (d) ⇒ (c). Let x, y ∈ V (G) be any two vertices of G.
Since δ(G) ≥ 2, there are edges incident with x and with y; they are either
of the form ux and uy (sharing an endpoint) or of the form ux and wy. By
(d) these two edges lie on a common cycle, and hence x and y lie on a
common cycle (see Fig. 6.17).
Figure 6.17: The two cases: edges ux and uy sharing the endpoint u, and edges ux and wy.
Therefore (d) ⇒ (c).
For proving (c) ⇒ (d), note first that by the equivalences already established,
(c) implies that G is 2-connected, so that δ(G) ≥ κ(G) ≥ 2. Now let uv, xy ∈
E(G). We add to G a vertex w with neighborhood {u, v} and a vertex z with
neighborhood {x, y}. By the Expansion Lemma, the resulting graph G′
is 2-connected, and hence w and z lie on a common cycle C in G′. Since w and z
each have degree 2, this cycle contains the paths u, w, v and x, z, y but neither
uv nor xy. Replacing the paths u, w, v and x, z, y in C by the edges uv and xy
yields the desired cycle in G.
We conclude this section after introducing the notion of a block.
Definition 6.4.15:
A graph G is nonseparable if it is nontrivial, connected and has no cut
vertices. A block of a graph G is a maximal nonseparable subgraph of G.
Note that if G has no cut vertices then G itself is a block.
Example 6.4.16:
A graph G is shown in Fig. 6.18(a), and Fig. 6.18(b) shows its blocks B1, B2, B3
and B4.
Figure 6.18: (a) A graph G on the vertices a, b, c, d, e, f, g, h; (b) its blocks B1, B2, B3 and B4.
Let G be a connected graph with |V (G)| ≥ 3. The following statements
are straightforward.
1. Each block of G with at least three vertices is a 2-connected subgraph of G.
2. Each edge of G belongs to one of its blocks, and hence G is the union of its blocks.
3. Any two blocks of G have at most one vertex in common; such a vertex, if it exists, is a cut vertex of G.
4. A vertex of G that is not a cut vertex belongs to exactly one of its blocks.
5. A vertex of G is a cut vertex of G if and only if it belongs to at least two blocks of G.
6.5 Trees and their properties
6.5.1 Basic Definition
The notion of a tree in graph theory is a simple and fundamental concept
with important applications in Computer Science.
Definition 6.5.1:
A tree is a connected acyclic graph, that is, a connected graph with no
cycle. A forest is an acyclic graph; each component of a forest is a tree.
Fig. 6.19 shows an arbitrary tree, and the graphs of Fig. 6.20 are all the
(unlabeled) trees with at most five vertices.
Figure 6.19: An arbitrary tree.
Figure 6.20: Trees with at most five vertices (one tree each on 1, 2 and 3 vertices, two trees on 4 vertices, three trees on 5 vertices).
A tree can be characterized in different ways. We do this in the next
theorem.
Theorem 6.5.2 (Tree characterizations):
Given a graph G = (V,E), the following conditions are equivalent:
(i) G is a tree.
(ii) There is a unique path between any pair of vertices in G.
(iii) G is connected and each edge of G is a cut edge.
(iv) G is acyclic and the graph formed from G by “adding an edge” (that is,
a graph of the form G + e where e joins a pair of nonadjacent vertices
of G) contains a unique cycle.
(v) G is connected and |V (G)| = |E(G)|+ 1.
Before proving Theorem 6.5.2, we first prove two lemmas.
Lemma 6.5.3:
Any tree with at least two vertices contains at least two leaves.
Proof. Given a tree T = (V,E), let P = (v0, e1, v1, . . . , ek, vk) be a longest
path of T. Clearly the length of P is at least 1, so v0 ≠ vk. We claim that
both v0 and vk are leaves; we prove this by contradiction. If v0 is not a leaf,
then there exists an edge e = v0v with e ≠ e1. Either v is in P or it is not:
if v is in P (that is, v = vi for some i ≥ 2), then the edge e together with
the section of the path P from v0 to vi forms a cycle in T, contradicting
the fact that T is acyclic. If v is not in P, then appending e to P gives a
longer path, contradicting the choice of P. The same argument applies to vk.
Lemma 6.5.4:
Let v be a leaf in a graph G. Then G is a tree if and only if G− v is a tree.
Proof. We first assume that G is a tree, and let x, y be two vertices of G − v.
Since G is connected, x and y are joined by a path in G. The only vertices
of this path that can have degree 1 are its endpoints x and y; since v is a
leaf and v ∉ {x, y}, the path does not contain v. Consequently the path lies
entirely in G − v, and hence G − v is connected. As G is acyclic, G − v is
also acyclic, and so it is a tree.
We now assume that G − v is a tree. As v is a leaf, there is exactly one
edge uv incident with it. Clearly u ∈ V (G − v), and (G − v) + uv = G has no
cycle. As G − v is connected (because it is a tree), G = (G − v) + uv is
connected. Hence G is a tree.
We now prove Theorem 6.5.2.
Proof of Theorem 6.5.2. We prove that each of the statements (ii) through
(v) is equivalent to statement (i). The proofs go by induction on the number
of vertices of G, using Lemma 6.5.4. For the induction basis, we observe that
all the statements (i) through (v) are valid if G contains a single vertex only.
We first show that (i) implies each of (ii) to (v). Let G be a tree with at
least two vertices, let v be a leaf of G and let v′ be the vertex adjacent to v
in G. By the induction hypothesis, G − v satisfies (ii) to (v), and the
validity of (ii), (iii) and (v) for G is then easily checked. For (iv): since G
is connected, any two vertices x, y ∈ V (G) are joined by a path P, and if
xy ∉ E(G), then P + xy creates a cycle, which is unique because the path P
is unique. Therefore (i) implies (iv) as well.
We now prove that each of the conditions (ii) to (v) implies (i). In (ii)
and (iii) we already assume connectedness. Also, a graph satisfying (ii) or
(iii) cannot contain a cycle: for (ii), this is because two vertices in a cycle are
connected by two distinct paths and for (iii), the reason is that by omitting
an edge in a cycle we obtain a connected graph. Thus (ii) implies (i) and
(iii) also implies (i).
To verify that (iv) implies (i), it suffices to check that G is connected. If
x, y ∈ V (G), then either xy ∈ E(G) or the graph G + xy contains a unique
cycle. Necessarily, as G is acyclic, this cycle must contain the edge xy. Now
removal of the edge xy from this cycle gives a path from x to y in G. Thus
G is connected. We finally prove that (v) implies (i). Let G be a connected
graph satisfying |V (G)| = |E(G)| + 1 ≥ 2. The sum of the degrees of all
vertices is 2|V (G)| − 2. This means that not all vertices can have degree 2
or more. Since all degrees are at least 1 (by connectedness) there exists a
vertex v of degree exactly 1, that is, a leaf of G. The graph G′ = G − v is
again connected and it satisfies |V (G′)| = |E(G′)| + 1. Hence it is a tree by
the induction hypothesis and thus G is a tree as well.
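Characterization (v) of Theorem 6.5.2 gives a simple computational test for being a tree: check that |E(G)| = |V (G)| − 1 and that the graph is connected. A minimal Python sketch (the naming is ours, and vertices are assumed to be 0, …, n − 1):

```python
from collections import deque

def is_tree(n, edges):
    """Theorem 6.5.2(v): a graph is a tree iff it is connected and |E| = |V| - 1.
    Vertices are the integers 0 .. n-1."""
    if len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Breadth-first search from vertex 0 to test connectedness.
    seen = [False] * n
    seen[0] = True
    q = deque([0])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if not seen[y]:
                seen[y] = True
                q.append(y)
    return all(seen)

print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # a star: True
print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # triangle plus isolated vertex: False
```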
6.5.2 Sum of distances from a leaf of a tree
Definition 6.5.5:
Let G be a graph and let u, v be a pair of vertices connected by a path in G.
Then the distance from u to v, denoted by dG(u, v) or simply d(u, v), is the
least length (in terms of the number of edges) of a (u, v)-path in G. If G has
no (u, v)-path, we define d(u, v) =∞.
Definition 6.5.6:
The diameter of a connected graph G, denoted by diam(G), is defined as
the maximum of the distances between pairs of vertices of G. In symbols,

diam(G) = \max_{u,v \in V(G)} d(u, v).
Since in a tree any two vertices are connected by a unique path, the
diameter of a tree T is the length of a longest path in T .
We next prove a theorem that gives a bound on the sum of the distances
of all the vertices of a tree T from a given vertex of T .
Theorem 6.5.7:
Let u be a vertex of a tree T with n vertices. Then

\sum_{v \in V(T)} d(u, v) \le \binom{n}{2}.
Proof. The proof goes by induction on the number of vertices of T. The
result holds trivially for n = 2. Let n > 2. The graph T − u is a forest with
components T1, . . . , Tk, where k ≥ 1. As T is connected, u has a neighbor
in each Ti; also, since T has no cycles, u has exactly one neighbor, say vi, in
each Ti (the situation is shown in Fig. 6.21).
Figure 6.21: The components T1, T2, T3 of T − u and the unique neighbors v1, v2, v3 of u in them.
If v ∈ V (Ti), then the unique (u, v)-path in T passes through vi, and we
have

d_T(u, v) = 1 + d_{T_i}(v_i, v).

Letting n_i = |V(T_i)|, we obtain

\sum_{v \in V(T_i)} d_T(u, v) = n_i + \sum_{v \in V(T_i)} d_{T_i}(v_i, v).

By the induction hypothesis,

\sum_{v \in V(T_i)} d_{T_i}(v_i, v) \le \binom{n_i}{2}.

If we now sum the formula for distances from u over all the components of
T − u, we obtain (as \sum_{i=1}^{k} n_i = n - 1)

\sum_{v \in V(T)} d_T(u, v) \le (n - 1) + \sum_{i=1}^{k} \binom{n_i}{2}.
We note that \sum_{i=1}^{k} \binom{n_i}{2} \le \binom{n-1}{2} since
\sum_i n_i = n - 1: the right-hand side counts the edges of K_{n-1}, while
the left-hand side counts the edges of a subgraph of K_{n-1} (a disjoint
union of cliques). Hence

\sum_{v \in V(T)} d_T(u, v) \le (n - 1) + \binom{n-1}{2} = \binom{n}{2}.
6.6 Spanning Tree
6.6.1 Definition and Basic Results
We now introduce the notion of a spanning tree.
Definition 6.6.1:
A spanning subgraph of a graph G is a subgraph H of G such that V (H) =
V (G). A spanning tree of G is a spanning subgraph of G which is a tree.
It is not difficult to show that every connected graph has a spanning
tree. Note that not every spanning subgraph is connected and a connected
subgraph of G need not be a spanning subgraph.
Exercise 6.6.2:
Prove that a graph G is connected if and only if it has a spanning tree.
By labeling the vertices of K4, we can easily see that it has 16 different
spanning trees (see Fig. 6.22). However, K4 has only two non-isomorphic
unlabeled spanning trees namely K1,3 and P4.
Figure 6.22: The 16 labeled spanning trees of K4.
We state without proof a general theorem due to the English mathematician
Arthur Cayley.
Theorem 6.6.3 (A. Cayley (1889)):
The complete graph Kn has n^{n-2} different labeled spanning trees.
Corollary 6.6.4 (to Theorem 6.5.7):
If u is a vertex of a connected graph G of order n, then

\sum_{v \in V(G)} d(u, v) \le \binom{n}{2}.
Proof. Let T be a spanning tree of G. Every (u, v)-path in T also appears in
G (G may have other (u, v)-paths, shorter than those in T). Therefore
dG(u, v) ≤ dT(u, v) for every vertex v. This implies

\sum_{v \in V(G)} d_G(u, v) \le \sum_{v \in V(G)} d_T(u, v).

By Theorem 6.5.7, \sum_{v \in V(G)} d_T(u, v) \le \binom{n}{2}. Hence

\sum_{v \in V(G)} d_G(u, v) \le \binom{n}{2}.
Exercise 6.6.5:
Let G be a graph with exactly one spanning tree. Prove that G is a tree.
Exercise 6.6.6:
If ∆(G) = k, then show that a connected graph G has a spanning tree with
at least k leaves.
The sum of the distances over all pairs of distinct vertices of a graph G
is known as the Wiener index of G, denoted by W(G). Thus
W(G) = \sum_{\{u,v\} \subseteq V(G)} d(u, v), the sum being over unordered
pairs of distinct vertices.
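The Wiener index of a small graph can be computed straight from the definition by a breadth-first search from each vertex. The following Python sketch is ours; the adjacency-dictionary representation is an assumption, not from the text:

```python
from collections import deque

def wiener_index(adj):
    """Sum of d(u, v) over unordered pairs of vertices; distances by BFS.
    `adj` maps each vertex to a list of its neighbors."""
    total = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            x = q.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice

# The path P4 on vertices 1..4: W(P4) = binom(5, 3) = 10.
p4 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(wiener_index(p4))  # 10
```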
Chemical molecules can be modeled as graphs by treating the atoms
as vertices and the atomic bonds as edges. The Wiener index was originally
introduced by Harold Wiener, who observed a correlation between this index
and the boiling points of paraffins. Consider the path Pn on n vertices labeled
1 through n, wherein vertex i is connected to vertex i − 1 (1 < i ≤ n).
Figure 6.23: The path Pn obtained from Pn−1 by appending the vertex n.
From Fig. 6.23 it easily follows that

W(P_n) = W(P_{n-1}) + \frac{n(n-1)}{2},

yielding W(P_n) = \binom{n+1}{3}.
Exercise 6.6.7:
Prove the following:
(i) W(K_n) = \binom{n}{2}.
(ii) W(K_{m,n}) = (m + n)^2 - mn - m - n.
(iii) W(C_{2n}) = (2n)^3/8.
(iv) W(C_{2n+1}) = (2n + 2)(2n + 1)(2n)/8.
(v) The Wiener index of the Petersen graph is 75.
We next give an algorithm for finding a spanning tree of a graph.
Algorithm SP-TREE
Let G = (V,E) be the given graph, where |V (G)| = n and |E(G)| = m.
Let (e1, e2, . . . , em) be the edge sequence of G, labeled in some way. We
now successively construct subsets E0, E1, . . . of E(G).
Set E0 = ∅. From Ei−1, the set Ei is obtained as follows:

Ei = Ei−1 ∪ {ei} if the graph (V (G), Ei−1 ∪ {ei}) has no cycle, and
Ei = Ei−1 otherwise.

We stop if Ei already has n − 1 edges or if i = m (i.e., all edges have been
considered).
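Algorithm SP-TREE can be implemented with a union-find (disjoint-set) structure in place of the explicit cycle test: the edge ei creates a cycle precisely when its two endpoints already lie in the same component. The union-find substitution and all names below are ours; a Python sketch:

```python
def sp_tree(vertices, edge_sequence):
    """Algorithm SP-TREE: scan edges in the given order, keeping an edge iff
    it creates no cycle.  A union-find structure tracks the components of
    (V, E_i), replacing the explicit acyclicity test."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    kept = []
    for u, v in edge_sequence:
        ru, rv = find(u), find(v)
        if ru != rv:            # endpoints in different components: no cycle
            parent[ru] = rv
            kept.append((u, v))
        if len(kept) == len(vertices) - 1:
            break               # n - 1 edges found: spanning tree complete
    return kept

V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]  # (1,3) would close a cycle, so it is skipped
print(sp_tree(V, E))  # [(1, 2), (2, 3), (3, 4)]
```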
We prove that algorithm SP-TREE is correct.
Proposition 6.6.8:
If algorithm SP-TREE outputs a graph T with n − 1 edges, then T is a
spanning tree of G. If T has k < (n − 1) edges, then G is a disconnected
graph with n− k components.
Proof. From the way the sets Ei are constructed the graph T contains no
cycle. If k = |E(T)| = n − 1, then by Theorem 6.5.2(v), T is a tree and hence
it is a spanning tree. If k < n − 1, then T is a disconnected acyclic graph,
that is, a forest; a forest with n vertices and k edges has n − k components.
We prove that the vertex sets of the components of the graph T coincide
with those of the components of the graph G. Assume the contrary and
let x and y be vertices lying in the same component of G but in distinct
components of T . Let C be the component of T containing the vertex x.
Consider some path,
(x = x0, e1, x1, e2, . . . , ek, xk = y)
from x to y in G as shown in Fig. 6.24.
Figure 6.24: An (x, y)-path in G; the edge e leaves the component C of T containing x.
Let i be the last index for which xi is contained in the component C. Obviously
i < k and xi+1 ∉ C. The edge e = xixi+1 thus does not belong to
T and so it had to form a cycle with some edges already selected into T at
some stage of the algorithm. Therefore the graph T +e also contains a cycle;
but this is impossible as e connects two distinct components of T . This gives
the desired contradiction.
6.6.2 Minimum Spanning Tree
The design of electronic circuits using integrated chips often requires that
the pins of several components be at the same potential. This is achieved
by wiring them together. To interconnect a set of n pins, we can use an
arrangement of n − 1 wires, each connecting two points. Of the various
arrangements on a circuit board, the one that uses the least amount of wire
is desirable.
The above wiring problem can be modeled thus: we are given a connected
graph G = (V (G), E(G)), where V (G) corresponds to the set of pins and
E(G) corresponds to the possible interconnections. Associated with each
edge uv ∈ E(G), we have a “weight” w(u, v) specifying the cost (amount
of wire needed) to connect u and v. We then wish to find a spanning tree
T = (V, E(T)) of G whose total weight

W(E(T)) = \sum_{uv \in E(T)} w(u, v)
is minimum. Such a tree T is called a minimum spanning tree (MST) of the
graph G.
We now present what is popularly known as Prim's algorithm (alterna-
tively called Jarník's algorithm) for finding an MST of a graph.
6.6.3 Algorithm PRIM
Let the input graph be G = (V,E), with |V (G)| = n and |E(G)| = m. We
will successively construct sets V0, V1, V2, . . . ⊆ V of vertices and sets
E0, E1, E2, . . . ⊆ E of edges as given below.
Let E0 = ∅ and V0 = {v}, where v is an arbitrary vertex. Having
constructed Vi−1 and Ei−1, we next find an edge ei = xiyi ∈ E(G) such that
(i) xi ∈ Vi−1 and yi ∈ V \ Vi−1, and
(ii) among all such edges, ei has minimum weight.
We then set Vi = Vi−1 ∪ {yi} and Ei = Ei−1 ∪ {ei}. If no such edge exists,
the algorithm terminates. Let Et denote the edge set at termination. The
algorithm outputs the graph T = (V, Et) as an MST.
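The edge-selection step of algorithm PRIM is conveniently realized with a priority queue of edges leaving the current vertex set. The heap-based organization and the names below are our implementation choices, not part of the text's description; a Python sketch:

```python
import heapq

def prim_mst(adj, start):
    """Algorithm PRIM sketch: grow a tree from `start`, always adding a
    minimum-weight edge with exactly one endpoint in the current vertex set.
    `adj[u]` is a list of (weight, neighbor) pairs."""
    in_tree = {start}
    tree_edges = []
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, x, y = heapq.heappop(heap)
        if y in in_tree:
            continue            # both endpoints already in the tree: skip
        in_tree.add(y)
        tree_edges.append((x, y, w))
        for w2, z in adj[y]:
            if z not in in_tree:
                heapq.heappush(heap, (w2, y, z))
    return tree_edges

# A weighted 4-cycle a-b-c-d with a heavy edge a-d and a chord a-c.
adj = {
    'a': [(1, 'b'), (4, 'd'), (3, 'c')],
    'b': [(1, 'a'), (2, 'c')],
    'c': [(2, 'b'), (1, 'd'), (3, 'a')],
    'd': [(1, 'c'), (4, 'a')],
}
mst = prim_mst(adj, 'a')
print(sum(w for _, _, w in mst))  # total weight 4
```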
Proof of correctness of Algorithm PRIM. Let T = (V, Et) be the output
of algorithm PRIM, and let the edges of Et be numbered e1, e2, . . . , en−1 in
the order in which they were added by the algorithm.
Assume that T is not an MST. For any MST T′ of G, let k(T′) denote the
largest index such that e1, e2, . . . , ek(T′) ∈ E(T′); thus the edge with index
k(T′) + 1 is not in E(T′). Among all MSTs, we select one, say T* = (V, E*),
for which k = k(T*) is maximum.
Let us now consider the stage in the algorithm's execution at which the
edge ek+1 was added. Let Tk = (Vk, Ek) be the tree formed by the addition
of the edges e1, . . . , ek. Then ek+1 = xy, where x ∈ V (Tk) and y ∉ V (Tk).
The graph T* + ek+1 contains a cycle C; such a cycle necessarily contains
the edge ek+1.
The cycle C consists of the edge ek+1 = xy together with a path, say P,
connecting the vertices x and y in the spanning tree T*. At least one edge
of P has one endpoint in the set Vk and the other outside Vk; let e be such
an edge. Obviously e is different from ek+1 (see Fig. 6.25); moreover e ∈ E*
and ek+1 ∉ E*.

Figure 6.25: The edge ek+1 = xy, the path P joining x and y in T*, and an edge e of P leaving Vk.

Both e and ek+1 join a vertex of Vk to a vertex outside Vk, so by the edge-
selection rule of the algorithm, w(ek+1) ≤ w(e).
Now consider the graph T′′ = (T* + ek+1) − e. This graph has n − 1 edges
and, as is easily seen, is connected; hence it is a spanning tree. Moreover

w(E(T′′)) = w(E*) − w(e) + w(ek+1) ≤ w(E*),

so T′′ is an MST as well, but with k(T′′) > k(T*). This contradicts the
choice of T*, and hence T must be an MST.
We next present another elegant algorithm to find the MST of a weighted
graph. This is Kruskal’s algorithm.
6.6.4 Algorithm KRUSKAL
Given a weighted graph G, let us denote the edges of G by the sequence
e1, e2, . . . , em, where

w(e1) ≤ w(e2) ≤ · · · ≤ w(em).

For the above ordering of edges, execute algorithm SP-TREE.
Exercise 6.6.9:
Prove that algorithm KRUSKAL does produce an MST.
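Since KRUSKAL is simply SP-TREE executed on the weight-sorted edge sequence, it can be sketched compactly, again with a union-find structure (our substitution) serving as the cycle test:

```python
def kruskal_mst(vertices, weighted_edges):
    """Algorithm KRUSKAL sketch: sort the edges by weight, then run the
    SP-TREE scan, keeping each edge that joins two different components."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(weighted_edges):  # nondecreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

# A weighted 4-cycle a-b-c-d with a heavy edge a-d and a chord a-c.
edges = [(1, 'a', 'b'), (2, 'b', 'c'), (1, 'c', 'd'), (4, 'a', 'd'), (3, 'a', 'c')]
mst = kruskal_mst(['a', 'b', 'c', 'd'], edges)
print(sum(w for w, _, _ in mst))  # 4
```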
6.7 Independent Sets and Vertex Coverings
6.7.1 Basic Definitions
Definition 6.7.1:
Recall (see Section 6.2.1 on page 319) that an independent set of a graph
G is a subset S ⊆ V (G) such that no two vertices of S are adjacent in G.
S is a maximum independent set of G if G has no independent set S ′ with
|S′| > |S|. A maximal independent set of G is an independent set that is not
a proper subset of any other independent set of G.
Figure 6.26: A graph in which the center vertex v is adjacent to each of p, q, r, s, t, u.
In Fig. 6.26, {v} and {p, q, r, s, t, u} are both maximal independent sets.
The latter set is also a maximum independent set.
Definition 6.7.2:
A subset K ⊆ V (G) is called a covering of G if every edge of G is incident
with at least one vertex of K. A covering K is minimum if there is no
covering K ′ of G such that |K ′| < |K|; it is minimal if there is no covering
K ′′ of G such that K ′′ is a proper subset of K.
Figure 6.27: The wheel W5.
In the graph W5 of Fig. 6.27, {u, v, w, x, y} is a covering of W5; {u, w, x, z}
is a minimal covering.
Theorem 6.7.3:
In a graph G = (V (G), E(G)), a subset S ⊆ V (G) is independent if and only
if V (G) \ S is a covering of G.
Proof. S is independent if and only if no edge of G has both of its ends in
S, that is, if and only if every edge of G is incident with at least one vertex
of V (G) \ S. The latter is precisely the condition that V (G) \ S is a covering
of G.
Definition 6.7.4:
The number of vertices in a maximum independent set of G is called the
independence number of G and is denoted by α(G). The number of vertices
in a minimum covering of G is the covering number of G and is denoted by
β(G).
Corollary 6.7.5:
For a graph G of order n, α(G) + β(G) = n.
Proof. Let S be a maximum independent set of G. By Theorem 6.7.3, V (G)\
S is a covering of G and therefore |V (G) \ S| = n − α(G) ≥ β(G) or n ≥
α(G)+β(G). Similarly, let K be a minimum covering of G. Then V (G)\K is
an independent set and so |V (G)\K| = n−β(G) ≤ α(G) or n ≤ α(G)+β(G).
The two inequalities together imply that α(G) + β(G) = n.
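Corollary 6.7.5 can be checked by brute force on small graphs: compute α(G) and β(G) by enumerating vertex subsets and verify that they sum to n. The following exponential-time Python sketch is ours, for illustration only:

```python
from itertools import combinations

def independence_number(V, E):
    """alpha(G): size of a largest vertex set containing no edge (brute force)."""
    for k in range(len(V), -1, -1):
        for S in combinations(V, k):
            if all(not (u in S and v in S) for u, v in E):
                return k
    return 0

def covering_number(V, E):
    """beta(G): size of a smallest vertex set meeting every edge (brute force)."""
    for k in range(len(V) + 1):
        for K in combinations(V, k):
            if all(u in K or v in K for u, v in E):
                return k

# The cycle C5: alpha = 2, beta = 3, and alpha + beta = 5.
V = list(range(5))
E = [(i, (i + 1) % 5) for i in range(5)]
print(independence_number(V, E), covering_number(V, E))  # 2 3
```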
Exercise 6.7.6:
Prove that the smallest possible maximal independent set in a d-regular graph
is n/(d+ 1).
Exercise 6.7.7:
Chapter 6 Graph Theory 366
Show that a graph G having all degrees at most d satisfies the following
inequality:

α(G) ≥ |V (G)|/(d + 1).
6.8 Vertex Colorings of Graphs
6.8.1 Basic Ideas
We begin with a simple problem, known as the “storage problem”. The
Chemistry department of a college wants to store various chemicals so that
incompatible chemicals (two chemicals are incompatible if they react violently
when brought together) are stored in different rooms. The
college is interested in knowing the minimum number of rooms required to
store all the chemicals so that no two incompatible chemicals are in the same
room.
To solve the above problem, we form a graph G = (V (G), E(G)), where
V (G) corresponds to the chemicals and uv ∈ E(G) if and only if the chemicals
corresponding to u and v are incompatible. Then any set of compatible
chemicals corresponds to an independent set of G. Thus a safe storing scheme
of chemicals corresponds to a partition of V (G) into independent subsets of
G. The cardinality of such a minimum partition of V (G) is then the required
number of rooms. This minimum cardinality is called the chromatic number
of the graph G.
Definition 6.8.1:
The chromatic number χ(G) of a graph G is the minimum number of inde-
pendent subsets that partition the vertex set of G. Any such minimum
partition is called a chromatic partition of V (G).
We can interpret the storage problem described above as a vertex coloring
problem: we are asked to color the vertices of a graph G so that no two
adjacent vertices receive the same color; that is, no two incompatible chemicals
are marked by the same color. If we use the minimum number of colors, we
have solved the problem.
Definition 6.8.2:
A k-coloring of a graph G is a labeling f : V (G) → {1, . . . , k}. The labels
are interpreted as colors; all vertices with the same color form a color class.
A k-coloring f is proper if f(u) ≠ f(v) for every edge uv ∈ E(G). A graph
G is k-colorable if it has a proper k-coloring. We will call the labels “colors”
because their numerical value is not important. Note that χ(G) is then the
minimum number of colors needed for a proper coloring of G. We also say “G
is k-chromatic” to mean χ(G) = k. It is obvious that χ(Kn) = n. Further,
χ(G) = 2 if and only if G is bipartite and has at least one edge. We can also
reason that χ(Cn) = 2 if n is even and χ(Cn) = 3 if n is odd.
Exercise 6.8.3:
Prove that χ(G) = 2 if and only if G is a bipartite graph with at least one
edge.
The Petersen graph P has chromatic number 3. Fig. 6.28 shows a proper
3-coloring of P. Certainly, P is not 2-colorable, since it contains an odd
cycle.
Figure 6.28: A proper 3-coloring of the Petersen graph.
Since each color class in a proper coloring of a graph G is an independent
set, we see that χ(G) ≥ |V (G)|/α(G), where α(G) is the independence number
of G.
Definition 6.8.4:
A graph G is called critical if for every proper subgraph H of G, we have
χ(H) < χ(G). Also, G is called k-critical if it is k-chromatic and critical.
The above definition holds for any graph. When G is connected it is
equivalent to the condition that χ(G − e) < χ(G) for each edge e ∈ E(G);
but then this is equivalent to saying χ(G− e) = χ(G)− 1. If χ(G) = 1, then
G is either trivial or totally disconnected. Hence G is 1-critical if and only
if G is K1. Also χ(G) = 2 implies that G is bipartite and has at least one
edge. Hence G is 2-critical if and only if G is K2.
Exercise 6.8.5:
Prove that every critical graph is connected.
Exercise 6.8.6:
Show that if G is k-critical, then for any v ∈ V (G) and e ∈ E(G),
χ(G− v) = χ(G− e) = k − 1.
It is clear that any k-chromatic graph contains a k-critical subgraph. This
can be seen by removing vertices and edges in succession, whenever possible,
without decreasing the chromatic number.
Theorem 6.8.7:
If G is k-critical, then δ(G) ≥ k − 1.
Proof. Suppose δ(G) ≤ k − 2, and let v be a vertex of minimum degree in
the graph G. Since G is k-critical, χ(G − v) = χ(G) − 1 = k − 1 (refer
Exercise 6.8.6). Hence in any proper (k − 1)-coloring of G − v, at most
δ(G) ≤ k − 2 colors are used on the neighbors of v in G. Thus at least one
of the k − 1 colors, say c, does not appear on any neighbor of v. Giving v
the color c yields a proper (k − 1)-coloring of G, which is impossible since
G is k-chromatic. Hence δ(G) ≥ k − 1.
6.8.2 Bounds for χ(G)
We begin with the following corollary to Theorem 6.8.7.
Corollary 6.8.8:
For any graph G, χ(G) ≤ 1 + ∆(G).
Proof. Let G be a k-chromatic graph and let H be a k-critical subgraph of
G. Then χ(H) = χ(G) = k.
By Theorem 6.8.7, δ(H) ≥ k − 1 and hence k ≤ 1 + δ(H) ≤ 1 + ∆(H) ≤
1 + ∆(G).
We can also derive the above bound from the “greedy” coloring algorithm
given below.
Algorithm GREEDY-COLORING
Given V (G) with |V (G)| = n, let v1, v2, . . . , vn be an ordering of the
vertices. Color the vertices in this order, assigning to vi the smallest-indexed
label (color) not already used on its lower-indexed neighbors.
In any vertex ordering, each vertex vi has at most ∆(G) neighbors preceding
it, so at most ∆(G) colors are unavailable when vi is to be colored; hence one
of the colors 1, 2, . . . , ∆(G) + 1 is always free for vi. The algorithm
GREEDY-COLORING therefore never uses more than ∆(G) + 1 colors, which
constructively proves that χ(G) ≤ ∆(G) + 1.
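Algorithm GREEDY-COLORING translates directly into code. The following Python sketch (the names are ours) colors the vertices in the given order; on the odd cycle C5 it uses 3 = ∆(G) + 1 colors, in line with the discussion above:

```python
def greedy_coloring(adj, order):
    """GREEDY-COLORING: scan vertices in `order`, giving each the smallest
    color (1, 2, ...) not already used on a previously colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# The odd cycle C5: Delta = 2, and greedy uses 3 = Delta + 1 colors here.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = greedy_coloring(c5, range(5))
print(max(coloring.values()))  # 3
```

The number of colors used can depend heavily on the vertex ordering; some ordering always achieves χ(G), but finding it is as hard as computing χ(G) itself.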
If G is an odd cycle, then χ(G) = 3 = 2+1 = 1+∆(G); if G is a complete
graph Kk, χ(G) = k = 1 + (k − 1) = 1 + ∆(G). That these are the only two
extremal families of graphs for which χ(G) = 1 + ∆(G) is asserted by the
following theorem, which we state without proof.
Theorem 6.8.9 (Brook’s Theorem):
If G is a connected graph other than a complete graph or an odd cycle, then
χ(G) ≤ ∆(G).
Exercise 6.8.10:
If χ(G) = k, then show that G contains at least k vertices each of degree at
least k − 1.
Chapter 7
Coding Theory
7.1 Introduction
Coding Theory has its origin in communication engineering. Since Shannon's
seminal paper of 1948 [?], it has been greatly influenced by mathematics,
with a variety of mathematical techniques available to tackle its problems.
Algebraic coding theory draws heavily on matrices, groups, rings, fields,
vector spaces, algebraic number theory and even algebraic geometry. In
algebraic coding, each message is regarded as a block of symbols taken from
a finite alphabet; on most occasions, these are elements of Z2 = {0, 1}.
Each message is then a finite string of 0's and 1's. For instance, 00110111 is
a message. Usually, the messages get transmitted through a communication
channel. Such channels may be subject to noise, and consequently the
messages may get changed. The purpose of an error-correcting code is to add
redundancy symbols to the message, based, of course, on some rule, so that
the original message can be retrieved even if it is garbled.
A typical communication channel is shown in Figure 7.1.

Message 1101 → Encoder 1101001 → Channel (noise) → Received message 0100001 → Decoder 1101001 → Original message 1101
Figure 7.1: The Communication Channel

The first box of the
channel indicates the message. It is then passed to the encoder, which
adds a certain number of redundancy symbols. In Figure 7.1 these are 001,
which when appended to the message 1101 give the coded message 1101001.
Because of channel noise, the coded message gets distorted, and the received
message in Fig. 7.1 is 0100001. This message then enters the decoder, which
applies the decoding algorithm and retrieves the coded message using the
added redundancy symbols. From this, the original message is read off in
the last box. The decoder has thus corrected two errors, that is, errors in
two places.
The efficiency of a code is measured by the number of errors it can correct. A code
is k-error-correcting if it can correct k or fewer errors (the notion of a perfect
code is made precise in Definition 7.7.2). The aim of coding theory is to devise
efficient codes. Its importance lies in the fact that erroneous messages could
prove disastrous.
It is relatively easier to detect errors than to correct them. Sometimes,
even detection may prove to be helpful as in the case of a feedback channel,
that is, a channel that has a provision for retransmission of the messages.
Suppose the message 1111 is sent through a feedback channel. If a single error
occurs and the received message is 0111, we can ask for a feedback twice and
may get 1111 on both the occasions. We then conclude that the received
message should have been 1111. Obviously, this method is not foolproof, as the
original message could have been 0011. On the other hand, if the channel is
two-way, that is, if it can detect errors so that the receiver knows the places
where the errors have occurred and also has a provision for feedback,
then decoding of the received message can be more effective.

Figure 7.2: The binary symmetric channel: 0 → 0 and 1 → 1 with probability q; 0 → 1 and 1 → 0 with probability p
7.2 Binary Symmetric Channels
One of the simplest channels is the binary symmetric channel (BSC). This
channel has no memory and it simply transmits two symbols 0 and 1. It
has the property that the probability that a transmitted message is received
correctly is q while the probability that it is not is p = 1−q. This is pictorially
represented in Figure 7.2. Before considering an example of a BSC, we first
give the formal definition of a code.
Definition 7.2.1:
A code C of length n over a field F is a set of vectors in F^n, the space of
ordered n-tuples over F. Any element of C is called a codeword of C.
As an example, the set of vectors C = {(10110), (00110), (11001), (11010)}
is a code of length 5 over the field Z2. This code C has four codewords.
Suppose a binary codeword of length 4 (that is, a 4-digit codeword) is
sent through a BSC with probability q = 0.9. Then the probability that the
sent word is received correctly is q4 = (0.9)4 = 0.6561.
We now consider another code, namely, the Hamming (7, 4)-code (see
Section 7.5 below). This code has as its words the binary vectors (1000 111),
(0100 110), (0010 101), (0001 011) of length 7 and all of their linear
combinations over the field Z2.
The first four positions are information positions and the last three are
the redundancy positions. There are in all 2^4 = 16 codewords in the code.
We shall see later that this code can correct one error. Hence the probability
that a received vector yields the transmitted vector is q^7 + 7pq^6, where the first
term corresponds to the case of no error and the term 7pq^6 corresponds to a
single error in each of the seven possible positions. As q = 0.9, q^7 + 7pq^6 ≈
0.4783 + 0.3720 = 0.8503, which is quite large compared to the probability
0.6561 arrived at earlier for the uncoded 4-digit word.
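These two probabilities can be checked with a short computation; the following sketch (illustrative plain Python, not part of the text) reproduces the numbers above.

```python
# Probability that a transmitted word is received correctly over a BSC
# with symbol-correctness probability q.
q = 0.9
p = 1 - q

# Uncoded 4-digit word: all four symbols must survive.
p_uncoded = q ** 4                   # 0.6561

# Hamming (7,4)-code corrects one error: success means 0 or 1 symbol
# errors among the 7 transmitted symbols.
p_hamming = q ** 7 + 7 * p * q ** 6  # about 0.8503

print(round(p_uncoded, 4), round(p_hamming, 4))
```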
The Hamming code is an example of a class of codes called linear codes. We
now present some basic facts about linear codes.
7.3 Linear Codes
Definition 7.3.1:
An [n, k]-linear code C over a finite field F is a k-dimensional subspace of
F^n, the vector space of ordered n-tuples over F.
If F has q elements, that is, F = GF(q), the [n, k]-code will have q^k
codewords. The codewords of C are all of length n, as they are n-vectors over
F; k is the dimension of C.
Definition 7.3.2:
A nonlinear code of length n over a field F is just a subset of the vector space
F^n over F.
C is a binary code if F = Z2.
A linear code C is best represented by any one of its generator matrices.
Definition 7.3.3:
A generator matrix of a linear code C over F is a matrix whose row vectors
form a basis for C over F .
If C is an [n, k]-linear code over F , then a generator matrix of C is a k
by n matrix G over F whose row vectors form a basis for C.
For example, consider the binary code C1 with generator matrix
G1 =
[ 1 0 0 1 1 ]
[ 0 1 0 1 0 ]
[ 0 0 1 0 1 ].
Clearly, all the three row vectors of G1 are linearly independent over Z2.
Hence C1 has 2^3 = 8 codewords. The first three columns of G1 are linearly
independent over Z2. Therefore, the first three positions of any codeword of
C1 may be taken as information positions and the remaining two as
redundancy positions. In fact, the positions corresponding to any three linearly
independent columns of G1 may be taken as information positions and the
rest as redundancies. Now any word X of C1 is given by

X = x1 R1 + x2 R2 + x3 R3,     (7.1)

where x1, x2, x3 are all in Z2 and R1, R2, R3 are the three row vectors of
G1 in order. Hence by (7.1), X = (x1, x2, x3, x1 + x2, x1 + x3). If we write
X = (x1, x2, x3, x4, x5), we have the relations

x4 = x1 + x2, and
x5 = x1 + x3.     (7.2)

In other words, the first redundancy coordinate of any codeword is the sum of
the first two information coordinates of that word, while the next redundancy
coordinate is the sum of the first and third information coordinates.

Equations (7.2) are the parity-check equations of the code C1. They can
be rewritten as

x1 + x2 − x4 = 0, and
x1 + x3 − x5 = 0.     (7.3)

In the binary case, equations (7.3) become

x1 + x2 + x4 = 0, and
x1 + x3 + x5 = 0.     (7.4)

In other words, the vector X = (x1, x2, x3, x4, x5) ∈ C1 iff its coordinates
satisfy (7.4). Equivalently, X ∈ C1 iff it is orthogonal to the two vectors
(11010) and (10101). If we take these two vectors as the row vectors of a
matrix H1, then H1 is the 2 by 5 matrix
H1 =
[ 1 1 0 1 0 ]
[ 1 0 1 0 1 ].
H1 is called a parity-check matrix of the code C1. The row vectors of H1
are orthogonal to the row vectors of G1. (Recall that two vectors X =
(x1, . . . , xn) and Y = (y1, . . . , yn) of the same length n are orthogonal if their
inner product (that is, the scalar product <X, Y> = x1 y1 + · · · + xn yn) is zero.)
Now if a vector v is orthogonal to u1, . . . , uk, then it is orthogonal to any linear
combination of u1, . . . , uk. Hence the row vectors of H1, which are orthogonal
to the row vectors of G1, are orthogonal to all the vectors of the row space of
G1, that is, to all the vectors of C1. Thus

C1 = {X ∈ Z2^5 : H1 X^t = 0} = null space of the matrix H1,

where X^t is the transpose of X. The orthogonality relations H1 X^t = 0
give the parity-check conditions for the code C1. These conditions fix the
redundancy positions, given the message positions of any codeword. A similar
result holds for any linear code. Thus any linear code over a field F is
either the row space of one of its generator matrices or the null space of its
corresponding parity-check matrix.
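For the code C1 above, the equality of the row space of G1 and the null space of H1 can be verified by brute force over Z2; the following sketch (our illustrative code, not the book's) enumerates both.

```python
# Verify, for the binary code C1, that the row space of G1 equals the
# null space of H1 (all arithmetic is mod 2).
from itertools import product

G1 = [(1, 0, 0, 1, 1), (0, 1, 0, 1, 0), (0, 0, 1, 0, 1)]
H1 = [(1, 1, 0, 1, 0), (1, 0, 1, 0, 1)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

# Row space of G1: all linear combinations of its rows over Z2.
row_space = set()
for c in product((0, 1), repeat=3):
    word = tuple(sum(ci * ri for ci, ri in zip(c, col)) % 2
                 for col in zip(*G1))
    row_space.add(word)

# Null space of H1: vectors orthogonal to both rows of H1.
null_space = {v for v in product((0, 1), repeat=5)
              if all(dot(h, v) == 0 for h in H1)}

assert row_space == null_space and len(row_space) == 8
```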
So far we have been considering binary linear codes. We now consider
linear codes over an arbitrary finite field F. As mentioned in Definition 7.3.1,
an [n, k]-linear code C over F is a k-dimensional subspace of F^n, the space
of all ordered n-tuples over F. If {u1, . . . , uk} is a basis of C over F, every
word of C is a unique linear combination

α1 u1 + · · · + αk uk,  αi ∈ F for each i.

Since each αi can take q values, 1 ≤ i ≤ k, C has q · q · · · q (k times) = q^k
codewords.
Let G be the k by n matrix over F having u1, . . . , uk of F^n as its row
vectors. Then, as G has k (= dimension of C) rows and all the k rows form a
linearly independent set over F, G is a generator matrix of C. Consequently,
C is the row space of G over F. The null space of C is the space of vectors
X ∈ F^n which are orthogonal to all the words of C. In other words, it is the
dual space C⊥ of C. As C is of dimension k over F, C⊥ is of dimension n − k
over F. Let {X1, . . . , Xn−k} be a basis of C⊥ over F. If H is the matrix
whose row vectors are X1, . . . , Xn−k, then H is a parity-check matrix of C.
It is an (n − k) by n matrix. Thus

C = row space of G = null space of H = {X ∈ F^n : H X^t = 0}.
Theorem 7.3.4:
Let G = (Ik|A) be a generator matrix of a linear code C over F , where Ik is
the identity matrix of order k over F , and A is a k by (n − k) matrix over
F. Then a generator matrix of C⊥ is given by

H = (−A^t | I_{n−k})

over F.

Proof. Each row of H is orthogonal to all the rows of G since, by block
multiplication (see ...),

G H^t = [I_k | A] [−A^t | I_{n−k}]^t = I_k (−A) + A I_{n−k} = −A + A = 0.
Recall that in the example following Definition 7.3.3, k = 3, n = 5, and
G1 = (I_3 | A), where

A =
[ 1 1 ]
[ 1 0 ]
[ 0 1 ],

while H1 = (−A^t | I_2) = (A^t | I_2) over Z2.
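Theorem 7.3.4 can be checked mechanically for this example; the sketch below (our own code, with our variable names) rebuilds H1 from G1 = [I_3 | A] and verifies G H^t = 0 over Z2, where −A^t = A^t.

```python
# Build H = [-A^t | I_{n-k}] from G = [I_k | A] (binary case, so -A^t = A^t)
# and check that every row of G is orthogonal to every row of H.
k, n = 3, 5
G = [(1, 0, 0, 1, 1), (0, 1, 0, 1, 0), (0, 0, 1, 0, 1)]   # G1 = [I3 | A]

A = [row[k:] for row in G]            # the k by (n-k) block A
At = list(zip(*A))                    # rows of A^t
identity = [tuple(int(i == j) for j in range(n - k)) for i in range(n - k)]
H = [tuple(At[i]) + identity[i] for i in range(n - k)]

for g in G:
    for h in H:
        assert sum(x * y for x, y in zip(g, h)) % 2 == 0

print(H)  # the parity-check matrix H1 of the example
```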
Corollary 7.3.5:
G = [I_k | A] is a generator matrix of a code C of length n iff H = [−A^t | I_{n−k}]
is a parity-check matrix of C.
7.4 The Minimum Weight (Hamming Weight) of a Code
Definition 7.4.1:
The weight W (v) of a codeword v of a code C is the number of nonzero
coordinates in v. The minimum weight of C is the least of the weights of its
nonzero codewords. The weight of the zero vector of C is naturally zero.
Example 7.4.2:
As an example, consider the binary code C2 with generator matrix

G2 =
[ 1 0 1 1 0 ]
[ 0 1 1 0 1 ]
= [I_2 | A].

C2 has four codewords. Its three nonzero words are u1 = (10110),
u2 = (01101), and u3 = u1 + u2 = (11011). Their weights are 3, 3 and
4 respectively. Hence the minimum weight of C2 is 3. A parity-check matrix
of C2 is (refer Theorem 7.3.4)

H2 = [−A^t | I_{5−2}] = [A^t | I_3] =
[ 1 1 1 0 0 ]
[ 1 0 0 1 0 ]
[ 0 1 0 0 1 ]
(as F = Z2).
Definition 7.4.3:
Let X,Y ∈ F n. The distance d(X,Y ), also called the Hamming distance
between X and Y , is defined to be the number of places in which X and Y
differ.
If X and Y are codewords of a linear code C over a field F, then X − Y
is also in C and has nonzero coordinates exactly at the places where X and Y
differ. Accordingly, if X and Y are words of a linear code C, then

d(X, Y) = wt(X − Y).     (7.5)

The minimum distance of a code C is the least of the distances d(X, Y) over
all pairs of distinct codewords X, Y of C. Equation (7.5) immediately yields
our next theorem.

Theorem 7.4.4:
The minimum distance of a linear code C is the minimum weight of a nonzero
codeword of C.

Thus for the linear code C2 of Example 7.4.2, the minimum distance is 3.
The Hamming distance d(X, Y) of Definition 7.4.3 does indeed define a
distance function (that is, a metric) on C. That is to say, it has the following
three properties:
For all X,Y, Z in C,
(i) d(X,Y ) ≥ 0, and d(X,Y ) = 0 iff X = Y .
(ii) d(X,Y ) = d(Y,X).
(iii) d(X,Z) ≤ d(X,Y ) + d(Y, Z).
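The weight and distance computations of this section can be checked directly; a sketch for the code C2 of Example 7.4.2 (our own illustrative code):

```python
# Hamming weight and distance for the binary code C2; the minimum
# distance equals the minimum nonzero weight (Theorem 7.4.4).
from itertools import combinations

C2 = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 1, 0, 1), (1, 1, 0, 1, 1)]

def wt(x):
    return sum(1 for c in x if c != 0)

def d(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)

min_wt = min(wt(x) for x in C2 if any(x))
min_d = min(d(x, y) for x, y in combinations(C2, 2))
assert min_wt == min_d == 3
```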
7.5 Hamming Codes
Hamming codes are binary linear codes. They can be defined either by their
generator matrices or by their parity-check matrices. We prefer the latter.
Let us start by defining the [7, 4]-Hamming code H3. The seven column
vectors of its parity-check matrix H are the binary representations of the
numbers 1 to 7, written in such a way that the last three of its column
vectors form I_3, the identity matrix of order 3. Thus

H =
[ 1 1 1 0 1 0 0 ]
[ 1 1 0 1 0 1 0 ]
[ 1 0 1 1 0 0 1 ].

The columns of H are the binary representations of the numbers 7, 6, 5, 3,
4, 2, 1 respectively. As H is of the form [−A^t | I_3], the generator matrix of H3
is given by

G = [I_4 | A] =
[ 1 0 0 0 1 1 1 ]
[ 0 1 0 0 1 1 0 ]
[ 0 0 1 0 1 0 1 ]
[ 0 0 0 1 0 1 1 ].

H3 is of length 2^3 − 1 = 7 and dimension 4 = 7 − 3 = 2^3 − 1 − 3.
What is the minimum distance of H3? One way of finding it is to list all
the 2^4 − 1 nonzero codewords (see Theorem 7.4.4). However, a better way
of determining it is the following. The first row of G is of weight 4 while the
remaining rows are of weight 3. The sum of any two or three of these row
vectors, as well as the sum of all four row vectors of G, is of weight
at least 3. Hence the minimum distance of H3 is 3.
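The claim can also be confirmed by enumerating all 16 codewords; a brute-force sketch (our own code):

```python
# Enumerate the 16 codewords of the [7,4]-Hamming code H3 from its
# generator matrix and confirm that the minimum weight is 3.
from itertools import product

G = [(1, 0, 0, 0, 1, 1, 1), (0, 1, 0, 0, 1, 1, 0),
     (0, 0, 1, 0, 1, 0, 1), (0, 0, 0, 1, 0, 1, 1)]

code = set()
for c in product((0, 1), repeat=4):
    word = tuple(sum(ci * g[j] for ci, g in zip(c, G)) % 2 for j in range(7))
    code.add(word)

assert len(code) == 16
assert min(sum(w) for w in code if any(w)) == 3
```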
7.6 Standard Array Decoding
We now write the coset decomposition of Z2^7 with respect to the subspace H3.
(Recall that H3 is a subgroup of the additive group Z2^7.) As Z2^7 has 2^7 vectors,
and H3 has 2^4 codewords, the number of cosets of H3 in Z2^7 is 2^7 / 2^4 = 2^3
(see ...). Each coset is of the form X + H3 = {X + v : v ∈ H3}. Any two
cosets are either identical or disjoint. The vector X is a representative of the
coset X + H3. The zero vector is a representative of the coset H3. If X and
Y are distinct vectors each of weight 1, the coset X + H3 ≠ Y + H3, since X − Y is of weight
at most 2 and hence does not belong to H3 (as H3 is of minimum weight
3). Hence the seven vectors of weight 1 in Z2^7, together with the zero vector,
define 8 = 2^3 pairwise disjoint cosets exhausting all the 2^3 × 2^4 = 2^7 vectors
of Z2^7. These eight vectors (namely, the seven vectors of weight one and the
zero vector) are called coset leaders.
We now construct a double array (Figure 7.3) of vectors of Z2^7 with the
cosets defined by the eight coset leaders mentioned above. The first row is
the coset defined by the zero vector, namely, H3 itself.
Figure 7.3 gives the standard array for the code H3. If u is the message
vector (that is, codeword) and v is the received vector, then v− u = e is the
error vector. If we assume that v has one or no error, then e is of weight 1 or 0.
Accordingly e is a coset leader of the standard array. Hence to get u from v,
we subtract e from v. In the binary case (as −e = e), u = v+e. For instance,
if in Figure 7.3, v = (1100 110), then v is present in the second coset for which
the leader is e = (1000 000). Hence the message is u = v + e = (0100 110).
This incidentally shows that H3 can correct single errors. However, if for
instance, u = (0100 110) and v = (1000 110), then e = (1100 000) is of weight
2 and is not a coset leader of the standard array of Figure 7.3. In this
case, the standard array decoding of H3 will not work as it would wrongly
decode v as (1000 110) − (0000 001) = (1000 111) ∈ H3. (Notice that v is
present in the last row of Figure 7.3). The error is due to the fact that v has
two errors and not just one. Standard array decoding is therefore maximum-likelihood
decoding. The general Hamming code Hm is defined analogously to
H3. Its parity-check matrix H has the binary representations of the numbers
1, 2, 3, . . . , 2^m − 1 as its column vectors. Each such vector is a vector of
length m. Hence H is an m by 2^m − 1 binary matrix, and the dimension
of Hm is 2^m − 1 − m (= (number of columns of H) − (number of rows
of H)). In other words, Hm is a [2^m − 1, 2^m − 1 − m] linear code over Z2.
Notice that the matrix H has rank m since H contains I_m as a submatrix.
(first column of the array: coset leaders; first row: the code C = H3 itself)
(0000000) (1000111) (0100110) (0010101) (0001011) (1100001) (1010010) (1001100) (0110011) (0101101) (0011110) (0111000) (1011001) (1101010) (1110100) (1111111)
(1000000) (0000111) (1100110) (1010101) (1001011) (0100001) (0010010) (0001100) (1110011) (1101101) (1011110) (1111000) (0011001) (0101010) (0110100) (0111111)
(0100000) (1100111) (0000110) (0110101) (0101011) (1000001) (1110010) (1101100) (0010011) (0001101) (0111110) (0011000) (1111001) (1001010) (1010100) (1011111)
(0010000) (1010111) (0110110) (0000101) (0011011) (1110001) (1000010) (1011100) (0100011) (0111101) (0001110) (0101000) (1001001) (1111010) (1100100) (1101111)
(0001000) (1001111) (0101110) (0011101) (0000011) (1101001) (1011010) (1000100) (0111011) (0100101) (0010110) (0110000) (1010001) (1100010) (1111100) (1110111)
(0000100) (1000011) (0100010) (0010001) (0001111) (1100101) (1010110) (1001000) (0110111) (0101001) (0011010) (0111100) (1011101) (1101110) (1110000) (1111011)
(0000010) (1000101) (0100100) (0010111) (0001001) (1100011) (1010000) (1001110) (0110001) (0101111) (0011100) (0111010) (1011011) (1101000) (1110110) (1111101)
(0000001) (1000110) (0100111) (0010100) (0001010) (1100000) (1010011) (1001101) (0110010) (0101100) (0011111) (0111001) (1011000) (1101011) (1110101) (1111110)
Figure 7.3: The standard array for the Hamming code H3
Figure 7.4: (a) Three columns i, j, k of H containing the m-vectors (0 . . . 011), (0 . . . 0101) and (0 . . . 0110); (b) two columns a, b of H at the positions of the two 1's of a supposed word u of weight 2
The minimum distance of Hm, m ≥ 2, is 3. This can be seen as follows.
Recall that Hm = {X ∈ Z2^(2^m − 1) : H X^t = 0}. Let i, j, k denote respectively the
numbers of the columns of H in which the m-vectors (0 . . . 011), (0 . . . 0101)
and (0 . . . 0110) are present (see Figure 7.4 (a)). Let v be the binary vector
of length 2^m − 1 which has 1 in the i-th, j-th and k-th positions and zero
at all other positions. Clearly, v is orthogonal to all the row vectors of H and
hence belongs to Hm. Hence Hm has a word of weight 3. Further, Hm has no
word of weight 2 or 1. Suppose Hm has a word u of weight 2. Let a, b be the
positions where u has 1 (Figure 7.4 (b)). As the columns of H are distinct,
the a-th and b-th columns of H are not identical. It follows that u is not
orthogonal to all the row vectors of H (for instance, u is not orthogonal to
any row of H in which the a-th and b-th entries differ). Hence Hm has no
word of weight 2. Similarly, it has no word of weight 1. Thus the minimum
weight of Hm is 3.
7.7 Sphere Packings
As before, let F^n denote the vector space of all ordered n-tuples over F.
Recall (Section 7.4) that F^n is a metric space with the Hamming distance
between vectors of F^n as the metric.

Definition 7.7.1:
In F^n, the sphere with centre X and radius r is the set

S(X, r) = {Y ∈ F^n : d(X, Y) ≤ r} ⊆ F^n.
Definition 7.7.2:
An r-error-correcting linear code C is perfect if the spheres of radius r with
the words of C as centres are pairwise disjoint and their union is F^n.

The above definition is justified because if v is a received vector with
at most r errors, then v is at distance at most r from a unique codeword
u of C, and hence belongs to the unique sphere S(u, r); accordingly, v will be
decoded as u.
Theorem 7.7.3:
The Hamming code Hm is a single-error-correcting perfect code.
Proof. Hm is a code of dimension 2^m − 1 − m over Z2 and hence has 2^(2^m − 1 − m)
words. Now if v is any codeword of Hm, S(v, 1) contains v (which is at
distance zero from v) and the 2^m − 1 vectors got from v (which is of
length 2^m − 1) by altering one position at a time. Thus S(v, 1) contains
1 + (2^m − 1) = 2^m vectors. As the minimum distance of Hm is 3, these
spheres of radius 1 are pairwise disjoint. Consequently, the union of the spheres
S(v, 1), as v varies over Hm, contains 2^(2^m − 1 − m) · 2^m = 2^(2^m − 1)
vectors, which is precisely the number of vectors in F^n, where n = 2^m − 1.
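The counting in the proof can be restated as a one-line check (here for m = 3; our own sketch):

```python
# Sphere-packing count behind Theorem 7.7.3: the 2^(n-m) spheres of
# radius 1, each containing 1 + n vectors, exactly fill Z_2^n.
m = 3
n = 2 ** m - 1              # length 7
num_words = 2 ** (n - m)    # 2^4 codewords of H3
sphere_size = 1 + n         # the centre plus n single-bit changes
assert num_words * sphere_size == 2 ** n
```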
Our next theorem shows that the minimum distance d of a linear code
is a good measure of its error-correcting capability.
Theorem 7.7.4:
If C is a linear code of minimum distance d, then C can correct t = ⌊(d − 1)/2⌋
or fewer errors.

Proof. It is enough to show that the spheres of radius t centred at the
codewords of C are pairwise disjoint. Indeed, if u and v are distinct codewords
of C, and if z ∈ S(u, t) ∩ S(v, t) (see Figure 7.5), then

d(u, v) ≤ d(u, z) + d(z, v) ≤ t + t = 2t ≤ d − 1 < d,

a contradiction to the fact that d is the minimum distance of C.

Figure 7.5: A vector z common to the spheres about u and v

For instance, for the Hamming code Hm, d = 3, and therefore t =
⌊(d − 1)/2⌋ = 1, and so Hm is single-error-correcting, as observed earlier in
Section 7.6.
7.8 Extended Codes
Let C be a binary linear code of length n. We can extend this code by adding
an overall parity-check at the end. This means, we add a zero at the end of
each word of even weight in C and add 1 at the end of every word of odd
weight. This gives an extended code C′ of length n+ 1.
To examine some of the properties of C′, we need a lemma.
Lemma 7.8.1:
Let w denote the weight function of a binary code C. Then

w(X + Y) = w(X) + w(Y) − 2(X ⋆ Y),     (7.6)

where X ⋆ Y is the number of common 1's in X and Y.

Proof. Let X and Y have common 1's in the i1, i2, . . . , ip-th positions, so that
X ⋆ Y = p. Let X have 1's in the i1, . . . , ip and j1, . . . , jq-th positions, and Y
in the i1, . . . , ip and l1, . . . , lr-th positions. Then w(X) = p + q, w(Y) = p + r
and w(X + Y) = q + r. The proof is now clear.

Coming back to the extended code C′: by definition, it is an even-weight
code, that is, every word of C′ is of even weight. In fact, if X and Y are in
C′, then the RHS of Equation (7.6) is even and therefore w(X + Y) is even.
A generator matrix of C′ is obtained by adding an overall parity check to the
rows of a generator matrix of C. Thus a generator matrix of the extended
code H3′ is

[ 1 0 0 0 1 1 1 0 ]
[ 0 1 0 0 1 1 0 1 ]
[ 0 0 1 0 1 0 1 1 ]
[ 0 0 0 1 0 1 1 1 ].
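The overall parity-check extension can be sketched as follows (the function name extend is ours, not the book's):

```python
# Extend a binary word by an overall parity check (Section 7.8): append 0
# to a word of even weight and 1 to a word of odd weight, so that every
# extended word has even weight.
def extend(word):
    return word + (sum(word) % 2,)

H3_basis = [(1, 0, 0, 0, 1, 1, 1), (0, 1, 0, 0, 1, 1, 0),
            (0, 0, 1, 0, 1, 0, 1), (0, 0, 0, 1, 0, 1, 1)]
extended = [extend(w) for w in H3_basis]

assert all(sum(w) % 2 == 0 for w in extended)
assert extended[0] == (1, 0, 0, 0, 1, 1, 1, 0)
```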
7.9 Syndrome Decoding
Let C be an [n, k]-linear code over GF(q) = F. The standard array decoding
scheme requires storage of q^n vectors of F^n and also comparisons of a received
vector with the coset leaders. The number of such comparisons is at most
q^(n−k), the number of distinct cosets in the standard array. Hence any method
that makes a sizeable reduction in storage and in the number of comparisons
is to be welcomed. One such method is the syndrome decoding scheme.
Definition 7.9.1:
The syndrome of a vector Y ∈ F^n with respect to a linear [n, k]-code over F
with parity-check matrix H is the vector H Y^t.

As H is an (n − k) by n matrix, the syndrome of Y is a column vector
of length n − k. We denote the syndrome of Y by S(Y). For instance, the
syndrome of Y = (1110 001) with respect to the Hamming code H3 is

H Y^t =
[ 1 1 1 0 1 0 0 ]
[ 1 1 0 1 0 1 0 ] (1 1 1 0 0 0 1)^t = (1 0 1)^t.
[ 1 0 1 1 0 0 1 ]
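This syndrome computation can be reproduced directly (our own sketch, using the parity-check matrix of H3 from Section 7.5):

```python
# Syndrome H Y^t of Y = (1110001) with respect to the Hamming code H3.
H = [(1, 1, 1, 0, 1, 0, 0), (1, 1, 0, 1, 0, 1, 0), (1, 0, 1, 1, 0, 0, 1)]
Y = (1, 1, 1, 0, 0, 0, 1)

syndrome = tuple(sum(a * b for a, b in zip(row, Y)) % 2 for row in H)
assert syndrome == (1, 0, 1)
```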
Theorem 7.9.2:
Two vectors of F^n belong to the same coset in the standard array decomposition
of a linear code C iff they have the same syndrome.
Proof. Let u and v belong to the same coset a + C, a ∈ F^n, of C. Then
u = a + X and v = a + Y, where X, Y are in C. Then S(u) = S(a + X) =
H(a + X)^t = H a^t + H X^t = S(a) (recall that, as X ∈ C, S(X) = H X^t = 0).
Similarly S(v) = S(a). Thus S(u) = S(v).

Conversely, let S(u) = S(v). Then H u^t = H v^t and therefore H(u − v)^t =
0. This means that u − v ∈ C, and hence the cosets u + C and v + C are
equal.
Theorem 7.9.2 shows that the syndromes of all the vectors of F^n are
determined by the syndromes of the coset leaders of the standard array of C.
In case C is an [n, k]-binary linear code, there are 2^(n−k) cosets and therefore
the number of distinct syndromes is 2^(n−k). Hence, in contrast to standard
array decoding, it is enough to store 2^(n−k) syndromes (instead of 2^n vectors)
in syndrome decoding. For instance, if C is a [100, 30]-binary linear code,
it is enough to store the 2^70 syndromes instead of the 2^100 vectors of Z2^100, a
huge saving indeed.
7.10 Exercises
1. Find all the codewords of the binary code with generator matrix

[ 1 0 1 1 1 ]
[ 1 0 0 1 1 ].

Find a parity-check matrix of the code. Write down the parity-check
equations.
2. Show by means of an example that the syndrome of a vector depends
on the choice of the parity-check matrix.
3. Decode the received vector (1100 011) in H3 using (i) the standard array
decoding, and (ii) syndrome decoding.
4. How many vectors of Z2^7 are there in S(u, 3), where u ∈ Z2^7?
5. How many vectors of F^n are there in S(u, 3), where u ∈ F^n and
|F| = q?
6. Show that a t-error-correcting binary perfect [n, k]-linear code satisfies
the relation

(n choose 0) + (n choose 1) + · · · + (n choose t) = 2^(n−k).

More generally, show that a t-error-correcting perfect [n, k]-linear code
over GF(q) satisfies the relation

(n choose 0) + (n choose 1)(q − 1) + · · · + (n choose t)(q − 1)^t = q^(n−k).
7. Show that there exists a set of eight binary vectors of length 6 such
that the distance between any two of them is at least 3.
8. Show that it is impossible to find nine binary vectors of length 6 such
that the distance between any two of them is at least 3.
9. Show that the function d(X,Y ) defined in Section 7.4 is indeed a metric.
10. Show that a linear code of minimum distance d can detect at most ⌊d/2⌋
errors.
Chapter 8
Cryptography
8.1 Introduction
Cryptography is the science of transmitting messages in a secure fashion.
Naturally, it has become an important tool in this information age. It has
already entered our day-to-day life through e-commerce, e-banking, etc.,
not to speak of defence.
To make a message secure, the sender usually sends the message in a
disguised form. The intended receiver removes the disguise and then reads
off the original message. The original message of the sender is the plaintext
and the disguised message is the ciphertext. The plaintext and the ciphertext
are usually written in the same alphabet. The plaintext and the ciphertext
are divided, for the sake of computational convenience, into units of a fixed
length. The process of converting a plaintext to a ciphertext is known as
enciphering or encryption, and the reverse process is known as deciphering
or decryption. A message unit may consist of a single letter or any ordered
k-tuple, k ≥ 2. Each such unit is converted into a number in a suitable
arithmetic, and the transformations are then carried out on this set of numbers.
An enciphering transformation f converts a plaintext message unit P (given
by its corresponding number) into a number that represents the corresponding
ciphertext message unit C, while its inverse transformation, namely the
deciphering transformation, does the opposite by taking C back to P. We
assume that there is a 1–1 correspondence between the set of all plaintext
units and the set of all ciphertext units. Hence each plaintext unit gives
rise to a unique ciphertext unit and vice versa. This can be represented
symbolically by

P —f→ C —f⁻¹→ P,

where f stands for the enciphering transformation and f⁻¹ for the deciphering
transformation. Such a setup is known as a cryptosystem.
8.2 Some Classical Cryptosystems
8.2.1 Caesar Cryptosystem
One of the earliest cryptosystems is the Caesar cryptosystem, attributed
to Julius Caesar (first century B.C.). In this cryptosystem, the alphabet is the set
of English characters A, B, C, . . . , X, Y, Z labelled 0, 1, 2, . . . , 23, 24, 25
respectively, so that 0 corresponds to A, 1 corresponds to B, and so on, and
finally 25 corresponds to Z. In this system, each message unit is of length
1 and hence consists of a single character. The encryption transformation
f : P → C is given by

f(a) = a + 3 (mod 26),     (8.1)
while the decryption transformation f⁻¹ : C → P is given by

f⁻¹(b) = b − 3 (mod 26).     (8.2)

Figure 8.1 gives the 1–1 correspondence between the characters A to Z and
the numbers 0 to 25:

A  B  C  D  E  F  G  H  I  J  K  L  M
0  1  2  3  4  5  6  7  8  9  10 11 12

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
13 14 15 16 17 18 19 20 21 22 23 24 25

Figure 8.1: Numerical labels of the English alphabet

For example, the word “OKAY” corresponds to the number sequence
“(14) (10) (0) (24)”, and this gets transformed, by eqn. (8.1), to
“(17) (13) (3) (1)”; the corresponding ciphertext is therefore “RNDB”. The
deciphering transformation applied to “RNDB” then gives back the message
“OKAY”.
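The Caesar transformation is easily sketched in code (our own illustration; the chr/ord arithmetic assumes the usual ASCII codes, with A at 65):

```python
# Caesar cryptosystem: f(a) = a + 3 (mod 26) on the alphabet A..Z
# labelled 0..25.
def encrypt(msg, key=3):
    return "".join(chr((ord(c) - 65 + key) % 26 + 65) for c in msg)

def decrypt(msg, key=3):
    return encrypt(msg, -key)

assert encrypt("OKAY") == "RNDB"
assert decrypt("RNDB") == "OKAY"
```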
8.2.2 Affine Cryptosystem
Suppose we want to encrypt the message “I LIKE IT”. In addition to the
English characters, we have in the message two spaces in between words.
So we add “space” to our alphabet by assigning to it the number 26. We
now do arithmetic modulo 27 instead of 26. Suppose, in addition, each
message unit is an ordered pair (sometimes called a digraph). Then each
unit corresponds to a unique number in the interval [0, 27² − 1]. Now, in the
message “I LIKE IT”, the number of characters including the two spaces is
9, an odd number. As our message units are ordered pairs, we add an extra
blank space at the end of the message. This makes the number of characters
10, and hence the message can be divided into 5 units U1, U2, . . . , U5:

|I−| |LI| |KE| |−I| |T−|

where U1 = I− etc. (Here, − stands for space.) Now U1 corresponds to the
number (see Figure 8.1) 8 · 27 + 26 = 242. Assume that the enciphering
transformation that acts on ordered pairs is given by

C ≡ aP + b (mod 27²),     (8.3)

where a and b are integers with a ≢ 0 and (a, 27) = 1. In eqn. (8.3), P
and C denote a pair of corresponding plaintext and ciphertext units. The
Extended Euclidean Algorithm [?] ensures that, as (a, 27) = 1, a has a unique
inverse a⁻¹ (mod 27²), so that

a a⁻¹ ≡ 1 (mod 27²).

This enables us to solve for P in terms of C from the congruence (8.3).
Indeed, we have

P ≡ a⁻¹(C − b) (mod 27²).     (8.4)

As a specific example, let us take a = 4 and b = 2. Then

C ≡ 4P + 2 (mod 27²).

Further, as (4, 27²) = 1, 4 has a unique inverse (mod 27²); in fact 4⁻¹ ≡ −182,
as 4 · (−182) ≡ 1 (mod 27²). This, when substituted in congruence (8.4), gives

P ≡ −182 (C − 2) (mod 27²).
Getting back to P = “I−” = 242 in I LIKE IT, we get

C ≡ 4 · 242 + 2 ≡ 970 ≡ 241 (mod 27²).

Now 241 = 8 · 27 + 25, and therefore it corresponds to the ordered pair “IZ”
in the ciphertext. Similarly, “LI”, “KE”, “(space)I” and “T(space)” correspond
to ?????????? respectively. Thus the ciphertext that corresponds to
the plaintext “I LIKE IT” is “IZ, . . . , . . . , . . . ,”. To get back the plaintext,
we apply the inverse transformation (8.4). As the numerical equivalent
of “IZ” is 241, relation (8.4) gives P ≡ (−182)(241 − 2) ≡ 242 (mod 27²),
and this, as seen earlier, corresponds to “I(−)”. An equation of the form
C = aP + b is known as an affine transformation. Hence such
cryptosystems are called affine cryptosystems.
In the Caesar cryptosystem, in the transformation f(a) ≡ a + 3 (mod 26),
the number 3 is known as the key of the transformation. In the affine
transformation given by eqn. (8.3), there are two keys, namely a and
b.
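The affine digraph system with keys a = 4 and b = 2 can be sketched as follows; the helper num and the variable names are ours, and −182 is the inverse of 4 modulo 729 computed in the text.

```python
# Affine digraph cryptosystem over the 27-letter alphabet A..Z plus
# space (labelled 0..26), working modulo 27**2 = 729.
M = 729
A_KEY, B_KEY = 4, 2
A_INV = -182                  # 4 * (-182) ≡ 1 (mod 729)

def num(pair):                # digraph -> number in [0, 728]
    x, y = (26 if c == " " else ord(c) - 65 for c in pair)
    return 27 * x + y

def encrypt(P):
    return (A_KEY * P + B_KEY) % M

def decrypt(C):
    return A_INV * (C - B_KEY) % M

P = num("I ")                 # the unit "I(space)" of "I LIKE IT"
assert P == 242
assert decrypt(encrypt(P)) == P
```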
8.2.3 Private Key Cryptosystems
In the Caesar cryptosystem and the affine cryptosystem, the keys are known
to the sender and the receiver in advance. That is to say, whatever
information the sender has with regard to his encryption is shared
by the receiver. For this reason, these cryptosystems are called private key
cryptosystems.
8.2.4 Hacking an affine cryptosystem
Suppose an intruder I (that is, a person other than the sender A and the
receiver B) who has no knowledge of the private keys wants to hack the
message, that is, decipher the message stealthily. We may suppose that the
type of cryptosystem used by A and B, including the unit
length, though not the keys, is known to I. Such information may get
leaked out over a passage of time or may be obtained even by spying. How
does I go about hacking it? He does it by a method known as frequency
analysis.
Assume for a moment that the message units are of length 1. Look at a
long string of the ciphertext and find out the most repeated character, the
next most repeated character, and so on. Suppose, for the sake of precision,
they are U, V, X, . . . . Now, in the English language, the most common
characters of the alphabet of 27 letters consisting of the English characters A
to Z and “space” are known to be “space” and E. Then “space” and E of
the plaintext correspond to U and V of the ciphertext respectively. If the
cryptosystem used is the affine system given by the equation

C ≡ aP + b (mod 27),

we have 20 ≡ 26a + b (mod 27)
and 21 ≡ 4a + b (mod 27).

Subtraction yields

22a ≡ −1 (mod 27).     (8.5)

As (22, 27) = 1, (8.5) has a unique solution a = 11. This gives b ≡ 21 − 4a =
21 − 44 = −23 ≡ 4 (mod 27). The cipher has thus been hacked.
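The two congruences can be solved mechanically; the sketch below (our own code) uses Python's three-argument pow for the modular inverse (available from Python 3.8).

```python
# Recover the keys from the frequency-analysis congruences
# 20 ≡ 26a + b and 21 ≡ 4a + b (mod 27): subtraction gives 22a ≡ -1.
a = (-1 * pow(22, -1, 27)) % 27   # solve 22a ≡ -1 (mod 27)
b = (21 - 4 * a) % 27

assert (a, b) == (11, 4)
assert (26 * a + b) % 27 == 20 and (4 * a + b) % 27 == 21
```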
Suppose now the cryptosystem is based on an affine transformation C =
aP + b with unit length 2. If the same alphabet consisting of 27 characters
of this section (namely, A to Z and space) is used, each unit corresponds to
Chapter 8 Cryptography 398
a unique nonnegative integer less that 272. Suppose the frequency analysis
of the ciphertext reveals that the most commonly occurring ordered pairs
are “CB” and “DX” in their decreasing orders of their frequencies. The
decryption transformation is of the form
P ≡ a′C + b′ (mod 272) (8.6)
Here a and b are the enciphering keys and a′, b′ are the deciphering keys.
Now it is known that in the English language the most frequently occurring ordered pairs, in decreasing order of frequency, are “E(space)” and “S(space)”. Symbolically,

“E(space)” −→ CA, and
“S(space)” −→ DX.

Writing these in terms of their numerical equivalents, we get

(4 × 27) + 26 = 134 −→ (2 × 27) + 0 = 54, and
(18 × 27) + 26 = 512 −→ (3 × 27) + 23 = 104. (8.7)
These, when substituted in (8.6), give the congruences:
134 ≡ 54a′ + b′ (mod 729), and
512 ≡ 104a′ + b′ (mod 729).
Subtraction gives
50a′ ≡ 378 (mod 729).
As (50, 729) = 1, this congruence has a unique solution, obtainable by the Extended Euclidean Algorithm [?]. In fact, a′ = 270, and therefore b′ ≡ 134 − 54 · 270 ≡ 134 (mod 729).
Chapter 8 Cryptography 399
Thus the deciphering keys a′ and b′ have been determined and the cryptosys-
tem has been hacked.
In our case, the gcd(50, 729) happened to be 1 and hence we had no prob-
lem in determining the deciphering key. If not, we have to try all the possible
solutions for a′ and take the plaintext that is meaningful. Instead, we can also
continue with our frequency analysis and compare the next most-repeated
ordered pairs in the plaintext and ciphertext and get a third congruence and
try for a solution in conjunction with one or both of the earlier congruences.
If these also fail, we may have to adopt ad hoc techniques to determine a′ and b′.
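As a check on the arithmetic, the congruences arising from (8.7) can be solved in a few lines. This sketch again relies on Python's modular inverse.

```python
# 134 = 54*a' + b' (mod 729) and 512 = 104*a' + b' (mod 729);
# subtracting gives 50*a' = 378 (mod 729), and (50, 729) = 1.
m = 27 ** 2                                 # 729
a1 = (378 * pow(50, -1, m)) % m             # a' via the modular inverse of 50
b1 = (134 - 54 * a1) % m

# verify both original congruences
assert (54 * a1 + b1) % m == 134
assert (104 * a1 + b1) % m == 512
print(a1, b1)   # -> 270 134
```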
8.3 Encryption Using Matrices
Assume once again that the message units are ordered pairs in the same
alphabet of size 27 of Section 8.2. We can use 2 by 2 matrices over the ring
Z27 to set up a private key cryptosystem in this case. In fact, if A is any 2 by 2 matrix with entries from Z27 and (X, Y) is any plaintext unit, we encipher it as B = A (X, Y)^T, where B is again a 2 by 1 matrix and therefore a ciphertext unit of length 2. If B = (X′, Y′)^T, we have the equations

(X′, Y′)^T = A (X, Y)^T and (X, Y)^T = A^{−1} (X′, Y′)^T. (8.8)
Chapter 8 Cryptography 400
The first equation of (8.8) gives the encryption while the second gives the decryption. Notice that A^{−1} must be computed in Z27. For A^{−1} to exist, we must have (det A, 27) = 1. If this is not the case, we may once again have to resort to ad hoc methods.
As an example, take A = [2 1; 4 3] (rows separated by semicolons). Then det A = 2, and (det A, 27) = (2, 27) = 1. Hence 2^{−1} (mod 27) exists and 2^{−1} = 14 ∈ Z27. This gives

A^{−1} = 14 [3 −1; −4 2] = [42 −14; −56 28] = [15 13; 25 1] over Z27. (8.9)
Suppose, for instance, we want to encipher “HEAD” using the above matrix transformation. We proceed as follows: “HE” corresponds to the vector (7, 4)^T, and “AD” to the vector (0, 3)^T. Hence the enciphering transformation gives the corresponding ciphertext as

A (7, 4)^T = (18, 40)^T ≡ (18, 13)^T (mod 27), and
A (0, 3)^T = (3, 9)^T (mod 27),

that is, (S, N)^T and (D, J)^T.
Thus the ciphertext of “HEAD” is “SNDJ”. We can decipher “SNDJ” in
exactly the same manner by taking A^{−1} in Z27. This gives the plaintext

A^{−1} (18, 13)^T, A^{−1} (3, 9)^T, where A^{−1} = [15 13; 25 1], as given by (8.9).

Therefore the plaintext is

(439, 463)^T, (162, 84)^T ≡ (7, 4)^T, (0, 3)^T (mod 27),

and this corresponds to “HEAD”.
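The matrix round trip above can be replayed in a few lines of Python. The helper names are ours, and the alphabet maps A to Z onto 0 to 25 (the space character, 26, is not needed for this example).

```python
# Enciphering "HEAD" with the 2x2 matrix A over Z27, then deciphering with
# A^{-1} as computed in (8.9).
M = 27
A = [[2, 1], [4, 3]]
A_inv = [[15, 13], [25, 1]]

def apply(mat, vec):
    # matrix-vector product over Z27
    return [(mat[0][0] * vec[0] + mat[0][1] * vec[1]) % M,
            (mat[1][0] * vec[0] + mat[1][1] * vec[1]) % M]

def to_nums(s):  return [ord(ch) - ord('A') for ch in s]
def to_text(ns): return ''.join(chr(n + ord('A')) for n in ns)

plain = to_nums("HEAD")                                  # [7, 4, 0, 3]
cipher = apply(A, plain[0:2]) + apply(A, plain[2:4])
print(to_text(cipher))                                   # -> SNDJ

back = apply(A_inv, cipher[0:2]) + apply(A_inv, cipher[2:4])
print(to_text(back))                                     # -> HEAD
```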
8.4 Exercises
1. Find the inverse of A = [17 5; 8 7] in Z27.

2. Find the inverse of A = [12 3; 5 17] in Z29.
3. Encipher the word “MATH” using the matrix A of Exercise 1 above as
the enciphering matrix in the alphabet A to Z of size 26. Check your
result by deciphering your ciphertext.
4. Solve the simultaneous congruences

x − y ≡ 4 (mod 26),
7x − 4y ≡ 10 (mod 26).

5. Encipher the word “STRIKES” using an affine transformation C ≡ 4P + 7 (mod 27^2) acting on units of length 2 over an alphabet of size 27 consisting of A to Z and the exclamation mark !, with 0 to 25 and 26 as the corresponding numerals.
6. Suppose that we know that our adversary is using a 2 by 2 enciphering
matrix with a 29-letter alphabet, where A to Z have numerical equiv-
alents 0 to 25, (space)=26, ?=27 and !=28. We receive the message
(space) C ? Y C F ! Q, T W I U M H Q V.
Suppose we know by some means that the last four letters of the plain-
text are our adversary’s signature “MIKE”. Determine the full plain-
text.
8.5 Other Private Key Cryptosystems
We now describe two other private key cryptosystems.
8.5.1 Vigenere Cipher
In this cipher, the plaintext is in the English alphabet. The key consists of an ordered set of d letters, for some fixed positive integer d. The plaintext is divided into message units of length d, and the ciphertext is obtained by adding the key to each message unit using modulo 26 addition.
For example, let d = 3 and the key be XYZ. If the message is “ABANDON”, the ciphertext is obtained by taking the numerical equivalents of the plaintext, namely,

0 1 0 (13) 3 (14) (13),

and adding modulo 26 the numerical equivalents (23) (24) (25) of the key “XYZ”, repeated cyclically. This yields

(23) (25) (25) (36) (27) (39) (36) (mod 26)
= (23) (25) (25) (10) 1 (13) (10)
= “X Z Z K B N K”

as the ciphertext.
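The computation can be sketched as follows (the helper name is ours):

```python
# Vigenere encipherment: add the key, repeated cyclically, modulo 26.
def vigenere(plaintext, key):
    nums = [ord(c) - ord('A') for c in plaintext]
    keys = [ord(c) - ord('A') for c in key]
    out = [(n + keys[i % len(keys)]) % 26 for i, n in enumerate(nums)]
    return ''.join(chr(n + ord('A')) for n in out)

print(vigenere("ABANDON", "XYZ"))   # -> XZZKBNK
```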
8.5.2 The One-Time Pad
This was introduced by G. S. Vernam in 1926. The alphabet Σ for the
plaintext is the set of 26 English characters. If the message M is of length N, the key K is generated as a pseudorandom sequence of characters of Σ, also of the same length N. The ciphertext is then obtained character-wise by the equation

C ≡ M + K (mod 26).

Notwithstanding the fact that the key K is as long as the message M, the system has its own drawbacks.
(i) There are only standard methods of generating pseudorandom sequences out of Σ, and their number is not large.

(ii) The long private key K must be communicated to the receiver in advance.
Despite these drawbacks, this cryptosystem is allegedly used at some of the highest levels of communication, such as the Washington-Moscow hotline [?].
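A minimal sketch of the scheme follows. The message and helper names are ours, and a fixed seed stands in for a genuinely unpredictable key source.

```python
# One-time pad sketch: generate a pseudorandom key as long as the message,
# add it modulo 26 to encrypt, subtract it to decrypt.
import random

def shift(text, key):
    # add key values to letters A..Z (0..25) modulo 26
    return ''.join(chr((ord(c) - ord('A') + k) % 26 + ord('A'))
                   for c, k in zip(text, key))

msg = "ATTACK"
rng = random.Random(7)                       # fixed seed, for repeatability only
key = [rng.randrange(26) for _ in msg]       # key has the same length as msg

cipher = shift(msg, key)                     # C = M + K (mod 26)
plain = shift(cipher, [-k for k in key])     # M = C - K (mod 26)
assert plain == msg
```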
There are several other private key cryptosystems. The interested reader
can see the references [?].
8.6 Public Key Cryptography
All cryptosystems described so far are private key cryptosystems. This means that someone who has enough information to encipher messages also has enough information to decipher them. As a result, in private key cryptography, any two persons in a group who want to exchange messages secretly must first have exchanged keys in a safe way (for instance, through a trusted courier).
In 1976, the face of cryptography was altered radically by the invention of public key cryptography by Diffie and Hellman [?]. In such a cryptosystem, the encryption can be done by anyone, but the decryption can be done only by the intended recipient, who alone is in possession of a secret key.
At the heart of this cryptography is the concept of a “one-way function”. Roughly speaking, a one-way function is a 1-1 function f such that, given k, it is possible to compute f(k) “rapidly”, while it is “extremely difficult” to compute the inverse of f in a “reasonable” amount of time. There is no way of asserting that such and such a function is a one-way function, since the computations depend on the technology of the day, the hardware and the software. So what passes for a one-way function today may fail to be a one-way function a few years hence.
As an example of a one-way function, consider two large primes p and q, each having at least 500 digits. It is “easy” to compute their product n = pq. However, given n, there is no efficient factoring algorithm to date that would give p and q in a reasonable amount of time. The same problem of forming the product pq, with p and q having 100 digits, passed for a one-way function in the 1980s but is no longer so today.
8.6.1 Working of Public Key Cryptosystems
A public key cryptosystem works in the following way. Each person A in a group has a public key PA and a secret key SA. The public keys are made public, as in a telephone book, with PA given against the name A. A computes his own secret key SA and keeps it to himself. The security of the system rests on the fact that no person of the group other than A, and no intruder, is able to find out SA. The keys PA and SA are chosen to be inverses of each other, in that for any message M,

(PA ∘ SA)M = M = (SA ∘ PA)M.
Transmission of Messages
Suppose A wants to send a message M to B in a secure fashion. The public key PB of B is known to everyone. A sends PB(M) to B. Now, to decipher the message, B applies SB to it and gets SB(PB M) = (SB ∘ PB)M = M. Note that none other than B can decipher the message sent by A, since B alone is in possession of SB.
Digital Signature
Suppose A wants to send some instruction to a bank (for instance, to transfer an amount to Mr. C from his account). If the intended message to the bank is M, A applies his secret key SA to M and sends SA(M) to the bank. He also gives his name for identification. The bank applies A’s public key PA to it and gets the message PA(SA M) = (PA ∘ SA)M = M. This procedure also authenticates A’s digital signature. This is in fact the method adopted in credit cards.
We now describe two public key cryptosystems. The first is RSA, named after its inventors, Rivest, Shamir and Adleman. Diffie and Hellman, though they invented public key cryptography in 1976, did not give a procedure to implement it; Rivest, Shamir and Adleman did so in 1978, two years later.
8.6.2 RSA Public Key Cryptosystem
Suppose there is a group of people who want to communicate among themselves secretly. In such a situation, RSA is the most commonly used public key cryptosystem. The length of the message units is fixed in advance, as is the alphabet in which the cryptosystem is operated. If, for instance, the alphabet consists of the English characters and the unit length is k, then any message unit is represented by a number less than 26^k = N (say).
Description of RSA
We now describe RSA.
1. Each person A (traditionally called Alice) chooses two large distinct
primes p and q and computes their product n = pq, where p and q are
so chosen that n > N .
2. Each A chooses a small positive integer e, 1 < e < φ(n), such that
(e, φ(n)) = 1, where (the Euler function) φ(n) = φ(pq) = φ(p)φ(q) =
(p− 1)(q − 1). (e is odd as φ(n) is even).
3. As (e, φ(n)) = 1, by the Extended Euclidean algorithm [?], e has a multiplicative inverse d modulo φ(n), that is,

ed ≡ 1 (mod φ(n)).
4. A (Alice) gives the ordered pair (n, e) as her public key and keeps d as
her private (secret) key.
5. Encryption P(M) of the message unit M is done by

P(M) ≡ M^e (mod n), (8.10)

while decryption S(M′) of the ciphertext unit M′ is given by

S(M′) ≡ M′^d (mod n). (8.11)
Thus both P and S (of A) act on the ring Zn. Before we establish the correctness of RSA, we observe that d (which is computed using the Extended Euclidean algorithm) can be computed in O(log^3 n) time. Further, the powers M^e and M′^d modulo n in eqns. (8.10) and (8.11) can also be computed in O(log^3 n) time [?]. Thus all computations in RSA can be done in polynomial time.
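A toy round trip with small primes illustrates steps 1 to 5. The numbers below are illustrative only; real RSA moduli are hundreds of digits long.

```python
# Toy RSA: key generation, encryption, decryption.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # a small e with (e, phi) = 1
d = pow(e, -1, phi)            # multiplicative inverse of e mod phi(n)

M = 1234                       # a message unit, M < n
C = pow(M, e, n)               # encryption:  C = M^e (mod n)
assert pow(C, d, n) == M       # decryption:  C^d = M (mod n)
print(d)                       # -> 2753
```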
Theorem 8.6.1 (Correctness of RSA):
Equations (8.10) and (8.11) are indeed inverse transformations.
Proof. We have

S(P(M)) = (M^e)^d ≡ M^{ed} (mod n).

Hence it suffices to show that

M^{ed} ≡ M (mod n).

Now, by the definition of d,

ed ≡ 1 (mod φ(n)).

But φ(n) = φ(pq) = φ(p)φ(q) = (p − 1)(q − 1), and therefore ed = 1 + k(p − 1)(q − 1) for some integer k. Hence

M^{ed} = M^{1+k(p−1)(q−1)} = M · M^{k(p−1)(q−1)}.

By Fermat’s Little Theorem, if (M, p) = 1, then M^{p−1} ≡ 1 (mod p), and therefore

M^{ed} = M · (M^{p−1})^{k(q−1)} ≡ M (mod p).

If however (M, p) ≠ 1, then (M, p) = p, and trivially M ≡ 0 (mod p), so that again

M^{ed} ≡ M (mod p).

Hence in both cases,

M^{ed} ≡ M (mod p). (8.12)

For a similar reason,

M^{ed} ≡ M (mod q). (8.13)

As p and q are distinct primes, the congruences (8.12) and (8.13) imply that

M^{ed} ≡ M (mod pq) ≡ M (mod n).
The above description shows that if Bob wants to send the message M to
Alice, he will send it as M e(mod n) using the public key of Alice. To decipher
the message, Alice will raise this number to the power d and get M ed ≡ M
(mod n), the original message sent by Bob.
The security of RSA rests on the supposition that none other than Alice can determine her private key d. A person can compute d if he/she knows φ(n) = (p − 1)(q − 1) = n − (p + q) + 1, that is to say, if he/she knows the sum p + q; and knowing n and p + q is equivalent to knowing the factors p and q of n, since p and q are the roots of x^2 − (p + q)x + n = 0. Thus, in essence, the security of RSA is based on the assumption that factoring a large number n that is a product of two distinct primes is “difficult”. However, to quote Koblitz [?], “no one can say with certainty that breaking RSA requires factoring n. In fact, there is even some indirect evidence that breaking RSA cryptosystem might not be quite as hard as factoring n. RSA is the public key cryptosystem that has had by far the most commercial success. But, increasingly, it is being challenged by elliptic curve cryptography”.
8.6.3 The ElGamal Public Key Cryptosystem
We have seen that RSA is based on the premise that factoring a very large integer which is a product of two “large” primes p and q is “difficult” compared to forming their product pq. In other words, given p and q, finding their product is a one-way function. The ElGamal public key cryptosystem uses a different one-way function, namely, one that computes powers of an element of a large finite group G. In other words, given G, g ∈ G, g ≠ e, and a positive integer a, the ElGamal cryptosystem is based on the assumption that computing g^a = b ∈ G is “easy” while, given b ∈ G and g ∈ G, it is “difficult” to find the exponent a.
Definition 8.6.2:
Let G be a finite group and b ∈ G. If y ∈ G, then the discrete logarithm of y with respect to the base b is any non-negative integer x less than o(G), the order of G, such that b^x = y; we write log_b y = x.

As per the definition, log_b y may or may not exist. However, if we take G = Fq*, the group of nonzero elements of a finite field Fq of q elements, and g a generator of the cyclic group Fq* (see [?]), then for any y ∈ Fq*, the discrete logarithm log_g y exists.
Example 8.6.3:
5 is a generator of F17*. In F17*, the discrete logarithm of 12 with respect to base 5 is 9; in symbols, log_5 12 = 9. In fact, in F17*, the successive powers 5^1, 5^2, . . . , 5^16 of 5 are

5, 8, 6, 13, 14, 2, 10, 16 (= −1), 12, 9, 11, 4, 3, 15, 7, 1,

so that 5^9 = 12. This logarithm is called discrete as it is taken in a finite group.
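The table of powers can be generated mechanically, a convenient check on such examples:

```python
# Tabulate the powers of the generator 5 in F_17*; discrete logarithms can
# then be read off directly.
powers = {pow(5, x, 17): x for x in range(1, 17)}
print(powers[12])   # -> 9, i.e. log_5 12 = 9
```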
8.6.4 Description of ElGamal System
The ElGamal system works in the following way: All the users in the system
agree to work in an already chosen large finite field Fq. A generator g of
Fq* is fixed once and for all. Each message unit is then converted into a number in Fq. For instance, if the alphabet is the set of English characters and each message unit is of length 3, then the message unit BCD will have the numerical equivalent 26^2 · 1 + 26 · 2 + 3 (mod q). It is clear that in order that these numerical equivalents of the message units be all distinct, q should be quite large; in our case, q ≥ 26^3. Now each user A in the system randomly chooses an integer a = aA, 0 < a < q − 1, and keeps it as his or her secret key. A declares g^a ∈ Fq as his public key.
If B wants to send the message unit M to A, he chooses a random positive integer k, k < q − 1, and sends the ordered pair

(g^k, M g^{ak}) (8.14)

to A. Since B knows k and g^a is the public key of A, B can compute g^{ak}. How will A decipher B’s message? She will first raise the first entry of the ordered pair in (8.14) to the power a and compute it in Fq*. She will then divide the second entry M g^{ak} of the pair by g^{ak} and get M. A can do this as she has knowledge of a. An intruder who gets to know the pair (g^k, M g^{ak}) cannot find a = log_{g^k}(g^{ak}), since the security of the system rests on the fact that finding discrete logarithms is “difficult”; that is, given h and h^a in Fq, there is no efficient algorithm to determine a.
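A toy exchange in a small prime field shows the mechanics. The field, keys and message below are our own illustrative choices; one can check that 2 generates F_1019*.

```python
# Toy ElGamal exchange in F_q* with q = 1019 and generator g = 2.
q, g = 1019, 2
a = 347                     # A's secret key (illustrative)
public = pow(g, a, q)       # A's public key g^a

M, k = 731, 518             # B's message unit and random exponent (illustrative)
pair = (pow(g, k, q), (M * pow(public, k, q)) % q)   # (g^k, M * g^{ak})

# A's decryption: raise g^k to the power a, then divide the second entry by it
gak = pow(pair[0], a, q)
recovered = (pair[1] * pow(gak, -1, q)) % q
assert recovered == M
```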
There are other public key cryptosystems as well. The interested reader
can refer to [?].
8.7 Primality Testing
8.7.1 Nontrivial Square Roots (mod n)
We have seen that the most commonly applied public key cryptosystem, namely RSA, is built upon very large prime numbers (numbers having, say, 500 digits or more). So there arises the natural question: given a large positive integer, how do we know whether or not it is prime? A “primality test” is a test that tells whether a given number is prime.
Let n be a prime, and a a positive integer with a^2 ≡ 1 (mod n). Then a is called a square root of 1 modulo n. The congruence means that n divides (a − 1)(a + 1), and so n | (a − 1) or n | (a + 1); in other words, a ≡ ±1 (mod n). Conversely, if a ≡ ±1 (mod n), then a^2 ≡ 1 (mod n). Hence a prime number n has only the trivial square roots 1 and −1 modulo n. However, the converse is not true; that is, there exist composite numbers m having only trivial square roots modulo m. For instance, m = 10 is such a number. On the other hand, 11 is a nontrivial square root of 1 modulo the composite number 20, since 11^2 ≡ 1 (mod 20) while 11 ≢ ±1 (mod 20).
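A brute-force check of these two examples:

```python
# Square roots of 1 modulo n: 10 has only the trivial roots 1 and 9 (= -1),
# while 20 also has the nontrivial roots 11 and 19.
def square_roots_of_one(n):
    return [a for a in range(1, n) if (a * a) % n == 1]

print(square_roots_of_one(10))   # -> [1, 9]
print(square_roots_of_one(20))   # -> [1, 9, 11, 19]
```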
Consider the modular exponentiation algorithm, which determines a^c (mod n) in O(log^2 n · log c) time [?]. At each intermediate stage of the algorithm, the current output i is squared, taken modulo n, and then multiplied by a or by 1 as the case may be. If for some intermediate output i with i ≢ ±1 (mod n) the square i^2 (mod n) is 1, then we have found a nontrivial square root of 1 modulo n, namely i. Therefore we can immediately conclude that n is not a prime, and is therefore a composite number. This is one of the major steps in the Miller-Rabin primality testing algorithm to be described below.
8.7.2 Prime Number Theorem
For a positive real number x, let π(x) denote the number of primes less than or equal to x. The Prime Number Theorem states that π(x) is asymptotic to x/log x; in symbols, π(x) ≈ x/log x. Here the logarithm is to the base e. Consequently, π(n) ≈ n/log n or, equivalently, π(n)/n ≈ 1/log n. In other words, in order to find a 100-digit prime, one has to examine roughly log_e 10^100 ≈ 230 randomly chosen 100-digit numbers for primality. (This figure may drop by half if we omit even numbers.)
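A quick numerical comparison of π(x) with x/log x, using a sieve (so for small x only):

```python
# Compare pi(x) with x/log x for a few values of x.
from math import log

def pi(x):
    # sieve of Eratosthenes, then count the primes up to x
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return sum(sieve)

for x in (10**3, 10**4, 10**5):
    print(x, pi(x), round(x / log(x)))
# pi(x) = 168, 1229, 9592, against x/log x = 145, 1086, 8686
```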
8.7.3 Pseudoprimality Testing
Fermat’s Little Theorem (FLT) states that if n is prime, then for each a, 1 ≤ a ≤ n − 1,

a^{n−1} ≡ 1 (mod n). (8.15)

Note that for any given a, a^{n−1} (mod n) can be computed in polynomial time using the repeated squaring method [?]. However, the converse of Fermat’s Little Theorem is not true. This is because of the presence of Carmichael numbers. A Carmichael number is a composite number n satisfying (8.15) for each a prime to n. Carmichael numbers are sparse, but there are infinitely many of them [?]. The first few Carmichael numbers are 561, 1105, 1729.

Since we are interested in checking whether a given large number n is prime, n is certainly odd and hence (2, n) = 1. Consequently, if 2^{n−1} ≢ 1 (mod n), we can conclude with certainty, in view of FLT, that n is composite. However, if 2^{n−1} ≡ 1 (mod n), n may or may not be prime; if it is not a prime, then it is a pseudoprime with respect to base 2.
Definition 8.7.1:
n is called a pseudoprime to base a, where (a, n) = 1, if

(i) n is composite, and

(ii) a^{n−1} ≡ 1 (mod n).

In this case, n is also called a base-a pseudoprime.
Base-2 Pseudoprime Test
Given an odd positive integer n, check whether 2^{n−1} ≢ 1 (mod n). If so, n is composite. If not, 2^{n−1} ≡ 1 (mod n) and n may be a prime.
But then there is a chance that n is not a prime. How often does this happen? For n < 10,000, there are only 22 pseudoprimes to base 2; they are 341, 561, 645, 1105, . . . . Using better estimates due to Carl Pomerance (see [?]), we can conclude that the chance that a randomly chosen 50-digit (resp. 100-digit) number satisfies (8.15) but fails to be a prime is < 10^{−6} (resp. < 10^{−13}).

More generally, if (a, n) = 1, 1 < a < n, the pseudoprime test with reference to base a checks whether a^{n−1} ≢ 1 (mod n). If so, n is composite; if not, n may be a prime.
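A brute-force search reproduces the first few base-2 pseudoprimes mentioned above:

```python
# Base-2 pseudoprimes below 2000: odd composites n with 2^(n-1) = 1 (mod n).
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

psp = [n for n in range(3, 2000, 2)
       if not is_prime(n) and pow(2, n - 1, n) == 1]
print(psp)   # -> [341, 561, 645, 1105, 1387, 1729, 1905]
```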
8.7.4 The Miller-Rabin Primality Testing Algorithm
If we choose every a with (a, n) = 1, then we may have to try φ(n) base values a in the worst case. Instead, the Miller-Rabin test works as follows:

(i) It tries several randomly chosen base values a instead of just one.

(ii) While computing each modular exponentiation a^{n−1} (mod n), it stops as soon as it notices a nontrivial square root of 1 (mod n) and outputs composite.

We now proceed to present the pseudocode for the Miller-Rabin test. The code uses an auxiliary procedure WITNESS such that WITNESS (a, n) is TRUE iff a is a “witness” to the compositeness of n. We now present and justify the construction of WITNESS.
WITNESS (a, n)
1 let (b_k, b_{k−1}, . . . , b_0) be the binary representation of n − 1
2 d ←− 1
3 for i ←− k downto 0
4     do x ←− d
5        d ←− d · d (mod n)
6        if d = 1 and x ≠ 1 and x ≠ n − 1
7          then return TRUE
8        if b_i = 1
9          then d ←− d · a (mod n)
10 if d ≠ 1
11   then return TRUE
12 return FALSE

Description of WITNESS (a, n)
Line 1 determines the binary representation of n − 1. Lines 3 to 9 compute d as a^{n−1} (mod n). This is done using the modular exponentiation method [?] and can be done in polynomial time. Lines 6 and 7 check whether a nontrivial square root of 1 has been found; if so, the procedure declares that n is composite. Lines 10 and 11 return TRUE if a^{n−1} ≢ 1 (mod n); we again conclude that n is composite. If FALSE is returned in line 12, we conclude that n is either a prime or a pseudoprime to base a. WITNESS (a, n) gives a decisive conclusion only in the case when n is composite.
Correctness of WITNESS (a, n)
If line 7 returns TRUE, then, x is a nontrivial square root modulo n and
so n is composite (See Section 8.7). If WITNESS returns TRUE in line 11,
then it has checked that an−1 6≡ 1(mod n) and therefore by Fermat’s Little
Theorem, a is composite. If line 12 returns FALSE, then an−1 ≡ 1(mod n),
and therefore a is either a prime or a pseudoprime to base a (as mentioned
before).
8.7.5 Miller-Rabin Algorithm (n, s)

The Miller-Rabin algorithm is a probabilistic algorithm to test the compositeness of n. It chooses at random s base numbers a from {1, 2, . . . , n − 1} using a random-number generator RANDOM(1, n − 1). For each, it uses WITNESS (a, n) as an auxiliary procedure to check whether a witnesses the compositeness of n.

MILLER-RABIN (n, s)
1 for j ←− 1 to s
2     do a ←− RANDOM (1, n − 1)
3        if WITNESS (a, n)
4          then return COMPOSITE ⊲ Definitely
5 return PRIME ⊲ Almost surely
Description and Correctness of MILLER-RABIN (n, s)

The main loop (beginning at line 1) picks s random values of a from {1, 2, . . . , n − 1} (line 2). If one of the a’s so picked is a witness to the compositeness of n (line 3), then MILLER-RABIN outputs COMPOSITE in line 4. If no witness is found, MILLER-RABIN outputs PRIME in line 5. This output may be erroneous, but the probability of error does not depend on n; it depends only on s and on the luck of the random choices of the bases a. In fact, it can be shown that the probability of error is less than 1/2^s. Hence if the number of trials s is large, the probability of error is small.
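The pseudocode of WITNESS and MILLER-RABIN translates almost line for line into Python. This is a sketch, with the nontrivial-square-root check inlined in the squaring step; the output strings are ours.

```python
# Miller-Rabin compositeness test.
import random

def witness(a, n):
    # TRUE iff a is a witness to the compositeness of n
    d = 1
    for bit in bin(n - 1)[2:]:          # bits of n-1, most significant first
        x = d
        d = (d * d) % n                 # squaring step
        if d == 1 and x != 1 and x != n - 1:
            return True                 # nontrivial square root of 1 found
        if bit == '1':
            d = (d * a) % n             # multiply step
    return d != 1                       # a^(n-1) != 1 (mod n): Fermat fails

def miller_rabin(n, s=20):
    for _ in range(s):
        a = random.randrange(1, n)
        if witness(a, n):
            return "COMPOSITE"          # definitely
    return "PRIME"                      # almost surely

print(miller_rabin(561))                # a Carmichael number, caught as COMPOSITE
print(miller_rabin(104729))             # the 10000th prime: PRIME
```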
8.8 The Agrawal-Kayal-Saxena (AKS) Primal-
ity Testing Algorithm
8.8.1 Introduction
The Miller-Rabin primality testing algorithm is a probabilistic algorithm that uses Fermat’s Little Theorem. Another probabilistic algorithm is due to Solovay and Strassen. It uses the fact that if n is an odd prime, then a^{(n−1)/2} ≡ (a/n) (mod n), where (a/n) stands for the Legendre symbol. The Miller-Rabin primality test is known to be the fastest randomized primality testing algorithm, to within constant factors. However, the question of determining a deterministic polynomial time algorithm (see [?] for the definition) to test whether a given number is prime remained unsolved till July 2002. In August 2002, Agrawal, Kayal and Saxena of the Indian Institute of Technology, Kanpur, India, made the sensational revelation that they had found a polynomial time algorithm for primality testing. Their algorithm works in Õ(log^{7.5} n) time (recall from [?] that Õ(f(n)) stands for O(f(n) · (polynomial in log f(n)))). It is based on a generalization of Fermat’s Little Theorem to polynomial rings over finite fields. Notably, the correctness proof of their algorithm requires only simple tools of algebra. In the following section, we present the details of the AKS algorithm.
8.8.2 The Basis of AKS Algorithm
The AKS algorithm is based on the following identity for prime numbers
which is a generalization of Fermat’s Little Theorem.
Lemma 8.8.1:
Let a ∈ Z, n ∈ N, n ≥ 2, and (a, n) = 1. Then n is prime if and only if

(X + a)^n ≡ X^n + a (mod n). (8.16)

Proof. We have

(X + a)^n = X^n + Σ_{i=1}^{n−1} C(n, i) X^{n−i} a^i + a^n,

where C(n, i) denotes the binomial coefficient. If n is prime, each C(n, i), 1 ≤ i ≤ n − 1, is divisible by n. Further, as (a, n) = 1, by Fermat’s Little Theorem, a^n ≡ a (mod n). This establishes (8.16).
If n is composite, n has a prime factor q < n. Let q^k ‖ n (that is, q^k | n but q^{k+1} ∤ n). Consider the term C(n, q) X^{n−q} a^q in the expansion of (X + a)^n. We have

C(n, q) = n(n − 1) · · · (n − q + 1) / (1 · 2 · · · q).

Then q^k ∤ C(n, q). For if q^k | C(n, q), then, since q | q!, q^{k+1} would divide C(n, q) · q! = n(n − 1) · · · (n − q + 1); but none of n − 1, . . . , n − q + 1 is divisible by q, so q^{k+1} would have to divide n, a contradiction (as q^k ‖ n). Hence q^k, and therefore n, does not divide the coefficient C(n, q) a^q of X^{n−q} (note that (a, n) = 1). This shows that (X + a)^n − (X^n + a) is not identically zero over Zn.
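Lemma 8.8.1 can be checked by brute force for small n: for (a, n) = 1, the coefficient of X^i (0 < i < n) in (X + a)^n − (X^n + a) reduces to C(n, i) a^{n−i} mod n, so the identity holds iff every C(n, i) vanishes mod n and a^n ≡ a (mod n). A sketch with a = 1:

```python
# Brute-force check of the prime characterization (X + a)^n = X^n + a (mod n).
from math import comb

def satisfies_identity(n, a=1):
    # constant term a^n - a must vanish mod n ...
    if (pow(a, n, n) - a) % n != 0:
        return False
    # ... and, since (a, n) = 1, the X^i coefficients reduce to C(n, i) mod n
    return all(comb(n, i) % n == 0 for i in range(1, n))

print([n for n in range(2, 30) if satisfies_identity(n)])
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```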
The above identity suggests a simple test for primality: given input n, choose an a and test whether the congruence (8.16) is satisfied. However, this takes time Ω(n), because we need to evaluate n coefficients on the left-hand side in the worst case. A simple way to reduce the number of coefficients is to evaluate both sides of (8.16) modulo a polynomial of the form X^r − 1 for an appropriately chosen small r. In other words, test whether the following equation is satisfied:

(X + a)^n ≡ X^n + a (mod X^r − 1, n). (8.17)
From Lemma 8.8.1, it is immediate that all primes n satisfy eqn. (8.17) for
all values of a and r. The problem now is that some composites n may also
satisfy the eqn. (8.17) for a few values of a and r (and indeed they do).
However, we can almost restore the characterization: we show that for an
appropriately chosen r if the eqn. (8.17) is satisfied for several a’s, then n
must be a prime power. It turns out that the number of such a’s and the
appropriate r are both bounded by a polynomial in log n, and this yields a
deterministic polynomial time algorithm for testing primality.
8.8.3 Notation and Preliminaries
Fp denotes the finite field with p elements, where p is a prime. Recall that if p is prime and h(X) is a polynomial of degree d irreducible over Fp, then Fp[X]/(h(X)) is a finite field of order p^d. We will use the notation f(X) = g(X) (mod h(X), n) to represent the equation f(X) = g(X) in the ring Zn[X]/(h(X)); that is, if the coefficients of f(X), g(X) and h(X) are reduced modulo n, then h(X) divides f(X) − g(X).
As mentioned earlier, for any function f(n), Õ(f(n)) stands for O(f(n) · poly(log f(n))). For example,

Õ(log^k n) = O(log^k n · poly(log(log^k n)))
           = O(log^k n · poly(log log n))
           = O(log^{k+ε} n)

for any ε > 0.

All logarithms in this section are with respect to base 2.
Given r ∈ N and a ∈ Z with (a, r) = 1, the order of a modulo r is the smallest number k such that a^k ≡ 1 (mod r); it is denoted O_r(a). φ(r) is Euler’s totient function. Since, by Euler’s theorem, a^{φ(r)} ≡ 1 (mod r), and since a^{O_r(a)} ≡ 1 (mod r), by the definition of O_r(a) we have O_r(a) | φ(r).
We need the following simple fact about the lcm of the first m natural numbers (see [?] for a proof).
Lemma 8.8.2:
Let lcm(m) denote the lcm of the first m natural numbers. Then, for m ≥ 7, lcm(m) ≥ 2^m.
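The bound is easy to check numerically for small m (a spot check, not a proof):

```python
# Check lcm(1, ..., m) >= 2^m for 7 <= m < 60.
from math import lcm
from functools import reduce

def lcm_upto(m):
    return reduce(lcm, range(1, m + 1))

print(all(lcm_upto(m) >= 2 ** m for m in range(7, 60)))   # -> True
```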
8.8.4 The AKS Algorithm
Input: integer n > 1.

1 If (n = a^b for some a ∈ N and b > 1), output COMPOSITE
2 Find the smallest r such that O_r(n) > 4 log^2 n
3 If 1 < (a, n) < n for some a ≤ r, output COMPOSITE
4 If n ≤ r, output PRIME
5 For a = 1 to ⌊2√φ(r) · log n⌋ do: if (X + a)^n ≢ X^n + a (mod X^r − 1, n), output COMPOSITE
6 Output PRIME
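The six steps translate directly into a small Python sketch, with polynomials modulo (X^r − 1, n) stored as coefficient lists of length r. For the small n tested below the algorithm always stops by step 4; the code is far too slow for numbers of cryptographic size and is for illustration only.

```python
# Toy AKS primality test following steps 1-6 above.
from math import gcd, log2, floor, sqrt

def polymul(f, g, r, n):
    # multiply two polynomials in Z_n[X]/(X^r - 1)
    out = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                out[(i + j) % r] = (out[(i + j) % r] + fi * gj) % n
    return out

def polypow(f, e, r, n):
    # repeated squaring in Z_n[X]/(X^r - 1)
    result = [1] + [0] * (r - 1)
    while e:
        if e & 1:
            result = polymul(result, f, r, n)
        f = polymul(f, f, r, n)
        e >>= 1
    return result

def is_perfect_power(n):
    for b in range(2, n.bit_length() + 1):
        a = round(n ** (1.0 / b))
        if any(c > 1 and c ** b == n for c in (a - 1, a, a + 1)):
            return True
    return False

def order(n, r):
    # multiplicative order of n modulo r, assuming gcd(n, r) = 1
    k, v = 1, n % r
    while v != 1:
        v, k = (v * n) % r, k + 1
    return k

def aks(n):
    if is_perfect_power(n):                                   # step 1
        return "COMPOSITE"
    limit = 4 * log2(n) ** 2
    r = 2                                                     # step 2
    while not (gcd(n, r) == 1 and order(n, r) > limit):
        r += 1
    if any(1 < gcd(a, n) < n for a in range(2, r + 1)):       # step 3
        return "COMPOSITE"
    if n <= r:                                                # step 4
        return "PRIME"
    phi_r = sum(1 for k in range(1, r) if gcd(k, r) == 1)
    for a in range(1, floor(2 * sqrt(phi_r) * log2(n)) + 1):  # step 5
        lhs = polypow([a, 1] + [0] * (r - 2), n, r, n)        # (X + a)^n
        rhs = [0] * r                                         # X^n + a mod X^r - 1
        rhs[n % r] = 1
        rhs[0] = (rhs[0] + a) % n
        if lhs != rhs:
            return "COMPOSITE"
    return "PRIME"                                            # step 6

print([n for n in range(2, 40) if aks(n) == "PRIME"])
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```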
Theorem 8.8.3:
The AKS algorithm returns PRIME iff n is prime.
We now prove Theorem 8.8.3 through a sequence of lemmas.
Lemma 8.8.4:
If n is prime, then the AKS algorithm returns PRIME.
Proof. If n is prime, we have to show that AKS will not return COMPOSITE
in steps 1, 3 and 5. Certainly, the algorithm will not return COMPOSITE
in step 1. Also, if n is prime, there exists no a such that 1 < (a, n) < n, so
that the algorithm will not return COMPOSITE in step 3. By Lemma 8.8.1,
the for loop in step 5 cannot return COMPOSITE. Hence the algorithm will
identify n as PRIME either in step 4 or in step 6.
We now consider the steps at which the algorithm returns PRIME, namely, steps 4 and 6. Suppose the algorithm returns PRIME in step 4. Then n must be prime: if n were composite, say n = n1 n2 with 1 < n1, n2 < n, then, as n ≤ r, taking a = n1 gives a ≤ r and 1 < (a, n) = a < n, so the algorithm would have output COMPOSITE in step 3 itself. Thus we are left with only one case, namely, that the algorithm returns PRIME in step 6. For the purpose of subsequent analysis, we assume this to be the case.
The algorithm has two main steps (namely, 2 and 5). Step 2 finds an
appropriate r and step 5 verifies eqn. (8.17) for a number of a’s. We first
bound the magnitude of r.
Lemma 8.8.5:
There exists an r ≤ 16 log^5 n + 1 such that O_r(n) > 4 log^2 n.

Proof. Let r_1, . . . , r_t be all the numbers in {1, 2, . . . , ⌈16 log^5 n⌉} such that O_{r_i}(n) ≤ 4 log^2 n; then each r_i divides α_i = n^{O_{r_i}(n)} − 1. Now each α_i divides the product

P = Π_{i=1}^{⌊4 log^2 n⌋} (n^i − 1) < n^{16 log^4 n} = (2^{log n})^{16 log^4 n} = 2^{16 log^5 n}.

(Note that we have used the fact that Π_{i=1}^{t} (n^i − 1) < n^{t^2}, the proof of which follows readily by induction on t.) As r_i divides α_i and α_i divides P for each i, 1 ≤ i ≤ t, the lcm of the r_i’s also divides P. Hence (lcm of the r_i’s) < 2^{16 log^5 n}. However, by Lemma 8.8.2,

lcm(1, 2, . . . , ⌈16 log^5 n⌉) ≥ 2^{⌈16 log^5 n⌉}.

Hence there must exist a number r in {1, 2, . . . , ⌈16 log^5 n⌉}, that is, r ≤ 16 log^5 n + 1, such that O_r(n) > 4 log^2 n.
Let p be a prime divisor of n. We must have p > r: if p ≤ r, then (as p < n) n would have been declared COMPOSITE in step 3, while if p = n ≤ r, n would have been declared PRIME in step 4. It follows that (n, r) = 1 (otherwise some common prime divisor of n and r would be a prime divisor p of n with p ≤ r, a contradiction as seen above); in particular, (p, r) = 1. We fix p and r for the remainder of this section. Also, let l = ⌊2√φ(r) log n⌋.
Step 5 of the algorithm verifies l equations. Since the algorithm does not output COMPOSITE in this step (recall that we are now examining step 6), we have

(X + a)^n ≡ X^n + a (mod X^r − 1, n)

for every a, 1 ≤ a ≤ l. This implies that

(X + a)^n ≡ X^n + a (mod X^r − 1, p) (8.18)

for every a, 1 ≤ a ≤ l. By Lemma 8.8.1, we have

(X + a)^p ≡ X^p + a (mod X^r − 1, p) (8.19)

for 1 ≤ a ≤ l. Comparing eqn. (8.18) with (8.19), we notice that n behaves like the prime p. We give a name to this property:
Definition 8.8.6:
For a polynomial f(X) and a number m ∈ N, m is said to be introspective for f(X) if

[f(X)]^m ≡ f(X^m) (mod X^r − 1, p).

It is clear from eqns. (8.18) and (8.19) that both n and p are introspective for X + a, 1 ≤ a ≤ l. Our next lemma shows that introspective numbers are closed under multiplication.
Lemma 8.8.7:
If m and m′ are introspective numbers for f(X), then so is mm′.
Proof. Since m is introspective for f(X), we have

f(X^m) ≡ [f(X)]^m (mod X^r − 1, p),

and hence

[f(X^m)]^{m′} ≡ [f(X)]^{mm′} (mod X^r − 1, p). (8.20)

Also, since m′ is introspective for f(X), we have

f(X^{m′}) ≡ [f(X)]^{m′} (mod X^r − 1, p).

Replacing X by X^m in the last equation, we get

f(X^{mm′}) ≡ [f(X^m)]^{m′} (mod X^{mr} − 1, p),

and hence

f(X^{mm′}) ≡ [f(X^m)]^{m′} (mod X^r − 1, p) (8.21)

(since X^r − 1 divides X^{mr} − 1). Consequently, from (8.20) and (8.21),

[f(X)]^{mm′} ≡ f(X^{mm′}) (mod X^r − 1, p).

Thus mm′ is introspective for f(X).
Next we show that for a given number m, the set of polynomials for which
m is introspective is closed under multiplication.
Lemma 8.8.8:
If m is introspective for both f(X) and g(X), then it is also introspective for
the product f(X)g(X).
Proof. The proof follows from the equation:
[f(X) · g(X)]^m = [f(X)]^m [g(X)]^m = f(X^m) g(X^m) (mod X^r − 1, p).
Eqns. (8.18) and (8.19) together imply that both n and p are introspective
for (X + a). Hence by Lemmas 8.8.7 and 8.8.8, every number in the set
I = {n^i p^j : i, j ≥ 0} is introspective for every polynomial in the set
P = {∏_{a=1..l} (X + a)^(e_a) : e_a ≥ 0}. We now define two groups based on the sets I
and P that will play a crucial role in the proof.
The first group consists of the set G of all residues of numbers in I modulo
r. Since both n and p are prime to r, so is any number in I. Hence G ⊂ Z∗_r,
the multiplicative group of residues mod r that are relatively prime to r. It
is easy to check that G is a group. (The only thing that requires verification is
that n^i p^j has a multiplicative inverse in G. Since n^(O_r(n)) ≡ 1 (mod r), for each
i there exists i′, 0 ≤ i′ < O_r(n), such that n^i ≡ n^(i′) (mod r). Hence the inverse
of n^i (= n^(i′)) is n^(O_r(n)−i′). A similar argument applies for p, since (p, r) = 1
gives p^(O_r(p)) ≡ 1 (mod r).) Let |G| = the order of the group G = t (say). As G
is generated by n and p modulo r and since O_r(n) > 4 log² n, t > 4 log² n.
To define the second group, we need some basic facts about cyclotomic
polynomials over finite fields. Let Q_r(X) be the r-th cyclotomic polynomial
over the field F_p. Then Q_r(X) divides X^r − 1 and factors into
irreducible factors of the same degree d = O_r(p). Let h(X) be one such
irreducible factor of degree d. Then F = F_p[X]/(h(X)) is a field. The
second group that we want to consider is the group generated by X + 1, X +
2, . . . , X + l in the multiplicative group F∗ of nonzero elements of the field
F. Hence it consists simply of the residues of polynomials in P modulo h(X)
and p. Denote this group by G.
We claim that the order of this second group G is exponential in either t (the order of the first group) or l.
Lemma 8.8.9:
|G| ≥ min{2^t − 1, 2^l}.
Proof. First note that h(X) | Q_r(X) and Q_r(X) | (X^r − 1). Hence X may be
taken as a primitive r-th root of unity in F = F_p[X]/(h(X)).
We claim that (*) if f(X) and g(X) are polynomials in P of degree less than
t and f(X) ≠ g(X), then their images in F (obtained by reducing the
coefficients modulo p and then reducing modulo h(X)) are distinct. To see
this, assume that f(X) = g(X) in the field F (that is, the images of f(X)
and g(X) in the field F are the same). Let m ∈ I. Recall that every number
of I is introspective with respect to every polynomial in P. Hence m is
introspective with respect to both f(X) and g(X). This means that
f(X)^m = f(X^m) (mod X^r − 1, p),
and g(X)^m = g(X^m) (mod X^r − 1, p).
Consequently, f(X^m) = g(X^m) (mod X^r − 1, p),
and since h(X) | (X^r − 1),
f(X^m) = g(X^m) (mod h(X), p).
In other words, f(X^m) = g(X^m) in F, and therefore X^m is a root of the
polynomial Q(Y) = f(Y) − g(Y) for each m ∈ G. As X is a primitive r-th
root of unity and (m, r) = 1 for each m ∈ G, X^m is also a primitive r-th
root of unity. Since each m ∈ G is reduced modulo r, the powers X^m, m ∈ G,
are all distinct. Thus there are at least t = |G| primitive r-th roots of unity
X^m, m ∈ G, and each one of them is a root of Q(Y). But since f and g are
of degree less than t, the nonzero polynomial Q(Y) has degree less than t and
so cannot have t distinct roots. This contradiction shows
that f(X) ≠ g(X) in F.
We next observe that the numbers 1, 2, . . . , l are all distinct in F_p. This
is because
l = ⌊2√φ(r) log n⌋ < 2√r log n
< 2√r · (√t / 2) (as t > 4 log² n)
= √(rt) < r (as t < r; recall that G is a subgroup of Z∗_r)
< p (since p > r).
Hence {1, 2, . . . , l} ⊂ {1, 2, . . . , p − 1}. This shows that the elements X + 1,
X + 2, . . . , X + l are all distinct in F_p[X] and therefore in F_p[X]/(h(X)) =
F. If t ≤ l, then all the possible products of the polynomials in the set
{X + 1, X + 2, . . . , X + t}, except the one containing all t of them, are
distinct and of degree less than t. Their number is 2^t − 1 and all of them
belong to P. By (*), their images in F are all distinct; that is, |G| ≥ 2^t − 1.
If t > l, then there exist at least 2^l such polynomials (namely, the products
over all subsets of {X + 1, . . . , X + l}). These products are all of degree at
most l and hence of degree < t. Hence in this case, |G| ≥ 2^l. Thus |G| ≥
min{2^t − 1, 2^l}.
Finally we show that if n is not a prime power, then |G| is bounded above
by an exponential function of t = |G|.
Lemma 8.8.10:
If n is not a prime power, |G| ≤ (1/2) n^(2√t).
Proof. Set Î = {n^i · p^j : 0 ≤ i, j ≤ ⌊√t⌋}. If n is not a prime power (recall
that p | n), the number of terms in Î is (⌊√t⌋ + 1)² > t. When reduced mod r,
the elements of Î give elements of G. But |G| = t. Hence there exist at least
two distinct numbers in Î which become equal when reduced modulo r. Let
them be m1 and m2 with m1 > m2. Since r divides (m1 − m2), we have
X^(m1) = X^(m2) (mod X^r − 1). (8.22)
Let f(X) ∈ P. Then
[f(X)]^(m1) = f(X^(m1)) (mod X^r − 1, p)
= f(X^(m2)) (mod X^r − 1, p) by (8.22)
= [f(X)]^(m2) (mod X^r − 1, p)
= [f(X)]^(m2) (mod h(X), p) since h(X) | (X^r − 1).
This implies that
[f(X)]^(m1) = [f(X)]^(m2) in the field F. (8.23)
Now f(X) when reduced modulo (h(X), p) yields an element of G. Thus
every polynomial of G is a root of the polynomial
Q1(Y) = Y^(m1) − Y^(m2) over F.
Thus there are at least |G| distinct roots of Q1(Y) in F. Naturally, |G| ≤ degree of
Q1(Y). Now the degree of Q1(Y)
= m1 (as m1 > m2)
≤ the greatest number in Î = (np)^⌊√t⌋
< (n²/2)^(√t) (since p ≤ n/2, as p | n and p ≠ n)
= n^(2√t) / 2^(√t) ≤ n^(2√t) / 2.
Hence |G| ≤ m1 < (1/2) n^(2√t).
Lemma 8.8.9 gives a lower bound for |G| while Lemma 8.8.10 gives an
upper bound for |G|. These bounds enable us to prove the correctness of the
algorithm.
Chapter 9
Finite Automata
9.1 Introduction
The broad areas of automata, formal languages, computability and complex-
ity comprise the core of contemporary Computer Science theory. Progress in
these areas has given a formal meaning to computation, which itself received
serious attention in the 1930s. Automata theory provides a firm basis for
the mathematical models of computing devices; the finite automaton is one such
model. Ideas of simple finite automata have been implicitly used in elec-
tromechanical devices for over a century. A formal version of finite automata
appeared in 1943 in the McCulloch-Pitts model of neural networks. The
1950s saw intensive work on finite automata (often under the name “sequen-
tial machines”), including their recognizing power and equivalence to regular
expressions.
A finite automaton is a greatly restricted model of a modern computer.
The main restriction is that it has only limited memory (which cannot be
expanded) and has no auxiliary storage. The theory of finite automata is rich
and elegant. In the 1970s, with the development of the UNIX operating system,
practical applications of finite automata appeared—in lexical analysis
(lex), in text searching (grep) and in Unix-like utilities (awk). Extensions
to finite automata have been proposed in the literature. For example, the
Büchi automaton is one that accepts infinite input sequences.
We begin with the notion of languages, which is freely used in later discussions.
We then introduce regular expressions. Subsequent sections deal with finite
automata and their properties.
9.2 Languages
An alphabet is any finite set of symbols or letters. If we take the
ASCII characters as our alphabet, we can build text as it is commonly understood.
With the alphabet {0, 1}, we can build binary numbers. We will denote an
arbitrary alphabet by the Greek letter Σ. Given Σ, a string (also called a sentence
or a sequence) over Σ is any finite-length sequence of symbols of Σ.
For example, if Σ = {0, 1}, we can construct the strings ending in 0;
that is, the sequence 0, 10, 100, 110, 1000, 1100, 1010, 1110, . . .
(these strings can be precisely described in another way—they
are the binary numbers divisible by 2).
The length of a string x, denoted by |x| is defined to be the number of
symbols in it. For example |1010| = 4. The null string or empty string is a
unique string of length 0 (over Σ). It is denoted by ǫ. Thus |ǫ| = 0. The set
of all strings over an alphabet Σ is denoted by Σ⋆. For example, if Σ = {a},
then,
{a}⋆ = {ǫ, a, aa, aaa, aaaa, . . .}
Consider the alphabet A comprising both the upper and lower case forms of the 26
letters of the English alphabet as well as the period, the blank (written ␣) and the
question mark. That is,
A = {a, b, . . . , A, B, . . . , ., ␣, ?}
The sentence
This␣is␣a␣sentence.
is a sequence in A⋆. Similarly, the sentence
sihT␣si␣ynnuf.
is a sequence in A⋆.
The objective in defining a language is to selectively pick certain sequences
of Σ⋆, that make sense in some way or that satisfy some property.
Definition 9.2.1:
Let Σ be a finite set, the alphabet. A language over Σ is a subset of the set
Σ⋆.
Note that each of Σ⋆, φ and Σ denotes a language! It is not difficult to see
that since Σ is finite, Σ⋆ is a countably infinite set (the basic idea is this: for
each k ≥ 0, all strings of length k are enumerated before all strings of length
k + 1; strings of length exactly k are enumerated lexicographically, once we
fix some ordering of the symbols in Σ).
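The length-lexicographic enumeration just described can be sketched directly (the generator name is our choice):

```python
from itertools import count, islice, product

def strings(sigma):
    """Generate all of Sigma* in length-lexicographic order: length 0, then 1, 2, ..."""
    for k in count(0):
        for tup in product(sorted(sigma), repeat=k):
            yield "".join(tup)

# The first eight strings over {0, 1}:
print(list(islice(strings({"0", "1"}), 8)))  # ['', '0', '1', '00', '01', '10', '11', '000']
```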
Since languages are sets, they can be combined by union, intersection and
difference. Given a language A, we also define its complement in Σ⋆ as
Ā = ∼A = {x ∈ Σ⋆ | x ∉ A}.
In other words, ∼A is Σ⋆ \ A.
If A and B are languages over Σ, their concatenation is the language A · B,
or simply AB, defined as
AB = {xy | x ∈ A and y ∈ B}.
The powers A^n of a language A are inductively defined by
A^0 = {ǫ}, A^(n+1) = A A^n
For example, {a, b}^n is the set of strings over {a, b} of length n.
The closure or the Kleene star or the asterate A⋆ of a language A is the
set of all strings obtained by concatenating zero or more strings from A. This
is exactly equivalent to taking the union of all finite powers of A. That is,
A⋆ = ⋃_{n ≥ 0} A^n
We also define A+ to be the union of all nonzero powers of A:
A+ = A A⋆
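Concatenation, powers and the star, as defined above, can be sketched on finite languages; since A⋆ is an infinite union, the sketch truncates it at a chosen power (all function names are ours):

```python
def concat(A, B):
    """AB = { xy | x in A and y in B }."""
    return {x + y for x in A for y in B}

def power(A, n):
    """A^0 = {epsilon}; A^(n+1) = A . A^n."""
    result = {""}
    for _ in range(n):
        result = concat(A, result)
    return result

def star_upto(A, max_n):
    """Finite approximation of A*: the union of A^0, ..., A^max_n."""
    result = set()
    for n in range(max_n + 1):
        result |= power(A, n)
    return result

print(concat({"a"}, {"b", "bb"}))   # {'ab', 'abb'}
print(power({"a", "b"}, 2))         # all strings over {a, b} of length 2
```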
Exercise 9.2.2:
The language L is defined over the alphabet {a, b} recursively as follows:
i) ǫ ∈ L
ii) ∀x ∈ L, the strings xa, bx and abx are also in L
iii) nothing else is in L
Show that L is the set of all elements in {a, b}⋆ except those containing the
substring aab.
A primary issue in the theory of computation is the representation of
languages by finite specification. As we are interested in languages that may
have an infinite number of strings, the issue is quite challenging. Also, for many
languages, characterizing all the strings by a simple unique property may not
be possible. In any case, given a specification of a language, we are mostly
interested in the following problems:
(i) how to automatically generate the strings in the given language?
(ii) how to recognize whether a given string belongs to the given language?
Answers to these questions comprise much of the subject matter of the study
of automata and formal languages. Models of computation, such as finite
automata, pushdown automata, Turing Machines or λ-calculus evolved be-
fore modern computers came into existence. In parallel and independently, the
formal notion of grammar and language (Chomsky hierarchy, a hierarchy of
language classification) were developed. The equivalence between languages
and automata is now well-understood.
Our concern here is restricted to what are called regular languages,
described precisely by regular expressions (answering the question (i) above).
We are also concerned with the corresponding model of computation called
finite automata or finite state machines (which will answer the question (ii)
above).
9.3 Regular Expressions and Regular Languages
To concisely describe an identifier in a programming language, we may say
the following:
“an identifier (in some programming language) is a
letter followed by zero or more letters or digits.”
This can be syntactically described as,
identifier = letter · (letter + digit)⋆
The expression on the right is a typical regular expression. Interpreting ·
as concatenation, + as union and ⋆ as the Kleene star, we get the desired
meaning. We generalize the idea in the above example: we can construct
a language using the operations of union, concatenation and Kleene star on
“simple languages”. Given Σ, for any a ∈ Σ, we can form the simple language
{a}; we also allow the empty language and {ǫ} to be simple languages.
Definition 9.3.1:
Regular expressions (and the corresponding languages) over Σ are exactly
those expressions that can be constructed from the following rules:
1. φ is a regular expression, denoting the empty language.
2. ǫ is a regular expression, denoting the language {ǫ}, the language
containing only the empty string.
3. If a ∈ Σ, a is a regular expression denoting the language {a}, the
language with only one string (comprising the single symbol a).
Chapter 9 Finite Automata 436
4. If R and S are regular expressions denoting languages LR and LS re-
spectively, then:
i) (R) + (S) is a regular expression denoting the language LR ∪ LS.
ii) (R) · (S) is a regular expression denoting the language LR · LS.
iii) (R)⋆ is a regular expression denoting the language (L_R)⋆.
Languages that can be obtained by the above rules 1 through 4 are called
regular languages over Σ.
Regular expressions obtained as above are fully parenthesized. We can
relax this requirement if we agree to the precedence rules: the Kleene star has
the highest precedence, concatenation comes next, and + has the lowest
precedence (we use parentheses when necessary).
Example 9.3.2:
We write a + b⋆c to mean (a + ((b⋆)c)), a regular expression over {a, b, c}.
Example 9.3.3:
The regular expression (a + b)⋆ cannot be written as a + b⋆, because the two
expressions denote different languages.
Example 9.3.4:
To reason about the language represented by the regular expression (a+b)⋆a,
we note that (a + b)⋆ denotes strings of any length (including 0) formed by
taking a or b. This is then concatenated with the symbol a. Therefore the
language consists of all strings over {a, b} ending in a.
If two regular expressions R and S denote the same language, we write
R = S and we say R and S are equivalent.
Exercise 9.3.5:
Let Σ = {a, b}. Reason that
(i) (a + b)⋆ = (a⋆ + b⋆)⋆.
(ii) (aa + ab + ba + bb)⋆ = ((a + b)(a + b))⋆.
Every regular expression can be regarded as a way of generating the mem-
bers of a language. There are, however, some languages which have simple
descriptions even though they cannot be described by regular expressions.
For example, the language {0^n 1^n | n ≥ 1} can be shown to be not regular.
Thus regular expressions are limited in their power to specify languages in
general.
9.4 Finite Automata — Definition
In the previous section, we saw regular expressions as language generators.
We now describe a simple automaton or an abstract machine to recognize
strings that belong to a language generated by regular expressions. In the
discussions that follow, Σ should be obvious from the context.
Consider the following simple diagram, which depicts a finite automaton M1:
Figure 9.1: The automaton M1
Let 1010 be the input to the machine. We start in state q0 (marked by
an arrow) and we see a 1 in the input (scanned from left to right). On seeing
the 1 we make the transition as indicated and stay in state q0. On seeing the
second symbol 0, from state q0 we follow the transition and move to state q1.
In state q1 we see the third symbol 1. We make the transition to state q0. In
state q0 we see the last symbol 0 and make the transition to state q1 which is
an “accepting state” (marked by double circle). We say that the input string
1010 is accepted by the automaton M1. It is not difficult to see that M1
accepts exactly the strings described by (0 + 1)⋆0, i.e., all strings ending in 0.
Let us next consider the following automaton M2:
Figure 9.2: The automaton M2
We can easily reason that M2 accepts the empty string ǫ and those that
end in 0.
We now consider the following automaton M3, which “solves” a “practical
problem”—testing a number for divisibility by 3:
Figure 9.3: Automaton M3 − Divisibility by 3 tester
The above automaton recognizes all strings over Σ = {0, 1, 2, 3, . . . , 9}
that represent numbers exactly divisible
by 3. Note that at any time, in scanning an input, we are in:
— state q0, if the partial string scanned so far (i.e., the number corresponding to
it) is congruent to 0 (mod 3);
— state q1, if the partial string scanned so far is congruent to 1 (mod 3);
and
— state q2, if the partial string scanned so far is congruent to 2 (mod 3).
Given a machine M , if A is the set of all strings accepted by M , we
say that the language of the machine M is A. This is concisely written as
L(M) = A.
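The divisibility-by-3 tester can be sketched by tracking, as described above, the residue mod 3 of the number read so far. Unlike the figure, this sketch has no separate start state s, so it accepts the empty string; that simplification is ours:

```python
def accepts_div3(digits):
    """Run the M3 idea on a string of decimal digits; accept iff the number is divisible by 3."""
    state = 0                              # q0: value read so far is congruent to 0 (mod 3)
    for d in digits:
        # Appending digit d turns value v into 10*v + d; since 10 = 1 (mod 3),
        # this matches the digit-class transitions {0,3,6,9}, {1,4,7}, {2,5,8} in the figure.
        state = (10 * state + int(d)) % 3
    return state == 0

print(accepts_div3("123"))   # True  (123 = 3 * 41)
print(accepts_div3("124"))   # False
```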
Exercise 9.4.1:
By trial and error, design a finite automaton that precisely recognizes the
following language A:
A = {ω | ω is a string that has an equal number of occurrences of 01 and 10
as substrings}.
We now give the definition of a (deterministic) finite state automaton
(DFA).
Definition 9.4.2:
A (deterministic) finite automaton M is a 5-tuple
M = (Q, Σ, δ, s, F ) ,
where
Q is a finite set of states
Σ is a finite alphabet
δ: Q× Σ→ Q is a transition function
s ∈ Q is the start state
F ⊆ Q is the set of accept (or final) states.
Note that δ, the transition function, defines the rules for “moving” through
the automaton, as depicted pictorially. It can equivalently be described by a
table. If M is in state q and sees input a, it moves to state δ(q, a). No move
on ǫ is allowed; also, δ(q, a) is uniquely specified.
Example 9.4.3:
The following automaton M4 accepts strings ending in 11:
M4 = (Q, Σ, δ, s, F )
where Q = {a, b, c}, Σ = {0, 1}, s = a and F = {c}. δ is given by the
following table:
δ: 0 1
a a b
b a c
c a c
We now define δ̂, the “multistep version” of δ:
δ̂ : Q × Σ⋆ → Q
By definition, for any q ∈ Q, δ̂(q, ǫ) = q, and for any string x ∈ Σ⋆ and
symbol a ∈ Σ,
δ̂(q, xa) = δ(δ̂(q, x), a).
Thus δ̂(q, x) is the state the automaton M will end up in when
started in state q, fed the input x and allowed transitions according to δ.
Note that δ and δ̂ agree on strings of length one:
δ̂(q, a) = δ̂(q, ǫa), since a = ǫa
= δ(δ̂(q, ǫ), a) = δ(q, a).
Formally, a string x is accepted by the automaton M if δ̂(s, x) ∈ F and is
rejected by the automaton M if δ̂(s, x) ∉ F.
We can now formally define L(M), the language of the machine:
L(M) = {x ∈ Σ⋆ | δ̂(s, x) ∈ F}
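As a concrete illustration, the machine M4 of Example 9.4.3, the multistep transition function and the acceptance test can be sketched as follows (the dictionary encoding of the 5-tuple is an illustrative choice of ours):

```python
# The DFA M4: strings over {0, 1} ending in 11, per the table in Example 9.4.3.
M4 = {
    "Q": {"a", "b", "c"},
    "Sigma": {"0", "1"},
    "delta": {("a", "0"): "a", ("a", "1"): "b",
              ("b", "0"): "a", ("b", "1"): "c",
              ("c", "0"): "a", ("c", "1"): "c"},
    "s": "a",
    "F": {"c"},
}

def delta_hat(M, q, x):
    """Multistep version of delta: the state reached from q on input string x."""
    for symbol in x:
        q = M["delta"][(q, symbol)]
    return q

def accepts(M, x):
    """x is accepted iff the multistep transition from the start state lands in F."""
    return delta_hat(M, M["s"], x) in M["F"]

print(accepts(M4, "0111"))  # True
print(accepts(M4, "110"))   # False
```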
Definition 9.4.4:
A language A is said to be regular if A = L(M) for some DFA M .
9.4.1 The Product Automaton and Closure Properties
Given two regular languages A and B, we now show that A ∪B, A ∩B and
∼ A are also regular.
Let A and B be regular languages. Then there must be automata M1
and M2 such that L(M1) = A and L(M2) = B. Let,
M1 = (Q1, Σ, δ1, s1, F1)
M2 = (Q2, Σ, δ2, s2, F2)
To show that A∩B is regular, we have to show that there exists an automaton
M3 such that L(M3) = A∩B. We claim that M3 can be constructed as given
below:
Let M3 = (Q3, Σ, δ3, s3, F3), where
Q3 = Q1 × Q2 = {(p, q) | p ∈ Q1 and q ∈ Q2}
F3 = F1 × F2 = {(p, q) | p ∈ F1 and q ∈ F2}
s3 = (s1, s2)
δ3((p, q), a) = (δ1(p, a), δ2(q, a))
The automaton M3 is called the product of M1 and M2.
The operation of M3 should be intuitively clear. On an input x ∈ Σ⋆, it
will simulate the moves of M1 and M2 simultaneously and reach an accept
state only if x is accepted by both M1 and M2. A formal argument is given
below.
Lemma 9.4.5:
We first show that for all x ∈ Σ⋆,
δ̂3((p, q), x) = (δ̂1(p, x), δ̂2(q, x)).
The proof is by induction on |x|.
If x = ǫ, δ̂3((p, q), ǫ) = (p, q) = (δ̂1(p, ǫ), δ̂2(q, ǫ)).
Now assume that the lemma holds for x ∈ Σ⋆. We will show that it holds
for xa also, where a ∈ Σ.
δ̂3((p, q), xa) = δ3(δ̂3((p, q), x), a)   definition of δ̂3
= δ3((δ̂1(p, x), δ̂2(q, x)), a)   induction hypothesis
= (δ1(δ̂1(p, x), a), δ2(δ̂2(q, x), a))   definition of δ3
= (δ̂1(p, xa), δ̂2(q, xa))   definition of δ̂1 and δ̂2
We now show that L(M3) = L(M1) ∩ L(M2).
For all x ∈ Σ⋆,
x ∈ L(M3)
⇔ δ̂3(s3, x) ∈ F3   definition of acceptance
⇔ δ̂3((s1, s2), x) ∈ F1 × F2   definition of s3 and F3
⇔ (δ̂1(s1, x), δ̂2(s2, x)) ∈ F1 × F2   Lemma 9.4.5
⇔ δ̂1(s1, x) ∈ F1 and δ̂2(s2, x) ∈ F2   definition of set product
⇔ x ∈ L(M1) and x ∈ L(M2)   definition of acceptance
⇔ x ∈ L(M1) ∩ L(M2)   definition of intersection
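The product construction translates directly into code. A sketch, reusing a dictionary encoding of DFAs; the two example machines (even number of 1s, and strings ending in 0) are our hypothetical inputs:

```python
def run(M, x):
    """Simulate a DFA given as a dict; return True iff x is accepted."""
    q = M["s"]
    for a in x:
        q = M["delta"][(q, a)]
    return q in M["F"]

def product_dfa(M1, M2):
    """The product automaton M3 with L(M3) = L(M1) intersect L(M2)."""
    states = {(p, q) for p in M1["Q"] for q in M2["Q"]}
    return {
        "Q": states,
        "Sigma": set(M1["Sigma"]),
        "delta": {((p, q), a): (M1["delta"][(p, a)], M2["delta"][(q, a)])
                  for (p, q) in states for a in M1["Sigma"]},
        "s": (M1["s"], M2["s"]),
        "F": {(p, q) for p in M1["F"] for q in M2["F"]},
    }

# Hypothetical example DFAs: an even number of 1s, and strings ending in 0.
EVEN1 = {"Q": {"e", "o"}, "Sigma": {"0", "1"}, "s": "e", "F": {"e"},
         "delta": {("e", "0"): "e", ("e", "1"): "o",
                   ("o", "0"): "o", ("o", "1"): "e"}}
ENDS0 = {"Q": {"y", "n"}, "Sigma": {"0", "1"}, "s": "n", "F": {"y"},
         "delta": {("y", "0"): "y", ("y", "1"): "n",
                   ("n", "0"): "y", ("n", "1"): "n"}}

M3 = product_dfa(EVEN1, ENDS0)
print(run(M3, "1010"))  # True: two 1s and ends in 0
print(run(M3, "10"))    # False: odd number of 1s
```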
We next argue that regular sets are closed under complementation. To do
this, we take the DFA M , where L(M) = A, and interchange the set of accept
and reject states. The modified automaton accepts a string x exactly when
x is not accepted by M . So we have constructed the modified automaton to
accept ∼ A.
By one of the DeMorgan’s laws,
A ∪B = ∼ (∼ A∩ ∼ B)
Therefore, the regular languages are closed under union also.
In the sequel, we shall give only intuitively appealing informal correctness
arguments rather than rigorous formal proofs like the one above.
9.5 Nondeterministic Finite Automata
9.5.1 Nondeterminism
A DFA can be made more “flexible” by adding a feature called nondetermin-
ism. This feature allows the machine to make state changes that are only partially
determined by the current state and input symbol. That is, from a current
state and input symbol, the next state can be any one of several possible
legal states. Also, the machine may change its state by making an ǫ-transition,
i.e., there may be a transition from one state to another without reading any
new input symbol. It turns out, however, that every nondeterministic finite
automaton (NFA) is equivalent to a DFA.
An NFA is said to accept its input string x if it is possible to start in the
start state and scan x, moving according to the transition rules making a
series of choices (for moving to next state) that eventually leads to one of the
accepting states when the end of x is reached. Since there are many choices
for going to the next state, there may be many paths through the NFA in
response to the input x—some may lead to accept states and some may lead
to reject states. The NFA is said to accept x if at least one computation
path on x starting from the start state leads to an accept state. It should be
noted that in the NFA model itself there is no mechanism to determine as to
which transition to make in response to a next symbol from the input. We
illustrate these ideas with an example. Consider the following automaton
N1:
Figure 9.4: The automaton N1
In the automaton N1, from the state q0 there are two transitions on the
symbol 1. So N1 is indeed nondeterministic. It is easy to see that N1
accepts all strings in {0, 1}⋆ that contain a 1 in the third position from
the end.
On an input, say, 01110100, a computation can be such that we always
stay in state q0. But the computation where we stay in state q0 till we read
the first five symbols and then move to states q1, q2 and q3 on reading the
next three symbols accepts the string 01110100. Therefore N1 accepts the
string 01110100.
9.5.2 Definition of NFA
The definition is very similar to that of a DFA except that we need to describe
the new way of making transitions. In an NFA the input to the transition
function is a state plus an input symbol or the empty string; the transition
is to a set of possible next (legal) states.
Let P(Q) be the power set of Q. For any alphabet Σ, let Σǫ denote Σ ∪ {ǫ}.
The formal definition is as follows:
Definition 9.5.1:
A nondeterministic finite automaton is a 5-tuple (Q, Σ, δ, s, F ), where
Q is a finite set of states
Σ is a finite alphabet
δ: Q× Σǫ → P(Q) is the transition function
s ∈ Q is the start state and
F ⊆ Q is the set of accept states.
We can now formally state the notion of computation for an NFA. Let
N = (Q, Σ, δ, s, F ) be an NFA and let w be a string over the alphabet Σ.
Then we say that N accepts w if we can write w as w1w2 · · · wk, where each
wi, 1 ≤ i ≤ k, is in Σǫ, and there exists a sequence of states q0, q1, . . . , qk
such that
(i) q0 = s
(ii) qi+1 ∈ δ(qi, wi+1), for i = 0, . . . , k − 1
(iii) qk ∈ F
Condition (i) states that the machine starts in its start state. Condition (ii)
states that on reading the next symbol in any current state the next state is
any one of the allowed legal states. Condition (iii) states that the machine
exhausts its input to end up in any one of the final states.
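Conditions (i)–(iii) can be tested mechanically by tracking, after each input symbol, the set of all states N could possibly be in (including states reachable by ǫ-moves). In the sketch below, "" stands for ǫ and the dictionary encoding is our choice; N1 is the automaton of Figure 9.4:

```python
def eclose(N, states):
    """All states reachable from `states` by zero or more epsilon-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in N["delta"].get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(N, w):
    """N accepts w iff some sequence of choices leads to an accept state."""
    current = eclose(N, {N["s"]})
    for a in w:
        step = set()
        for q in current:
            step |= N["delta"].get((q, a), set())
        current = eclose(N, step)
    return bool(current & N["F"])

# The automaton N1 above: a 1 in the third position from the end.
N1 = {"s": 0, "F": {3},
      "delta": {(0, "0"): {0}, (0, "1"): {0, 1},
                (1, "0"): {2}, (1, "1"): {2},
                (2, "0"): {3}, (2, "1"): {3}}}
print(nfa_accepts(N1, "01110100"))  # True: third symbol from the end is 1
print(nfa_accepts(N1, "01110010"))  # False
```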
9.6 Subset Construction—Equivalence of DFA
and NFA
Definition 9.6.1:
Two automata M and N are said to be equivalent if L(M) = L(N).
A fundamental fact is that both deterministic and non-deterministic finite
automata recognize the same class of languages and are therefore equivalent.
Thus, corresponding to an NFA there exists a DFA—the DFA may require
more states. The equivalence can be proved using what is known as the
subset construction, which can be intuitively described as given below.
Let N be the given NFA. We start with a pebble in the start state of N .
We next track the moves in N using its transition function. Assume that we
have scanned y, a prefix of the input string and that the next symbol is b. We
now make all b moves (as indicated by δ) from every (current) state where we
have a pebble to each destination state to which we can move on b or on ǫ. In
each of these destination states, we place a pebble and we remove the pebbles
off the current states. Let P be the set of current states and P ′ be the set of
destination states. Then we can build a transition function for a DFA which
has P as one state and which moves to the state P ′ on seeing b from state
P . This intuitive idea can be precisely refined and captured. We first need
a preliminary computation before we describe the subset construction.
Let ǫ-CLOSURE(q), q ∈ Q be the states of N built by applying the
following rules:
(i) q is added to ǫ-CLOSURE(q);
(ii) If r1 is in ǫ-CLOSURE(q) and there is an edge, labelled ǫ from r1
to r2, then r2 is also added to ǫ-CLOSURE(q) if r2 is not already
there. Rule (ii) is repeated until no more new states can be added
to ǫ-CLOSURE(q).
Thus ǫ-CLOSURE(q) is the set of states that can be reached from q on
zero or more ǫ-transitions alone. If T is a set of states, then we define ǫ-
CLOSURE(T ) as the union over all states q in T of ǫ-CLOSURE(q). The
procedure given in Fig. 9.1 illustrates the computation of ǫ-CLOSURE(T )
using a stack.
begin
push all states of T onto STACK;
ǫ-CLOSURE(T) = T;
while (STACK not empty) do
begin
pop q, top element of STACK, off STACK;
for (each state r with an edge from q to r labelled ǫ) do
if (r is not in ǫ-CLOSURE (T)) then
begin add r to ǫ-CLOSURE (T);
push r onto STACK
end
end
end
Figure 9.1: Computing ǫ-CLOSURE
We now give the subset construction procedure. That is, we are given an
NFA N ; we are required to construct a DFA D equivalent to N .
Initially let ǫ-CLOSURE (s) be a state (the start state) of D, where s is
the start state of N . We assume that initially each state of D is “unmarked”.
We now execute the following procedure.
begin
while (there is an unmarked state q = (r1, r2, . . . , rn) of D) do
begin mark q;
for (each input symbol a ∈ Σ) do
begin
let T be the set of states to which there is
a transition on a from some state ri in q;
x = ǫ-CLOSURE (T);
if (x has not yet been added to the set of
states of D)
then make x an unmarked state of D;
Add a transition from q to x labelled a
if not already present;
end
end
end
Figure 9.2: The subset construction
We thus have the following theorem.
Theorem 9.6.2:
Every NFA has an equivalent DFA.
Thus NFAs give an alternative characterization of regular languages: a
language is regular if and only if some NFA recognizes it.
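The marking procedure of Fig. 9.2 can be sketched with a worklist in place of the unmarked states; DFA states are frozensets of NFA states, and "" stands for ǫ (the encoding is our illustrative choice). The example NFA is the one of Exercise 9.6.5 below, accepting strings ending in 01:

```python
def eclose(N, T):
    """Epsilon-closure of a set T of NFA states, returned as a frozenset."""
    stack, closure = list(T), set(T)
    while stack:
        q = stack.pop()
        for r in N["delta"].get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(N, sigma):
    """Build a DFA equivalent to NFA N; DFA states are frozensets of NFA states."""
    start = eclose(N, {N["s"]})
    states, delta, todo = {start}, {}, [start]
    while todo:                       # todo plays the role of the unmarked states
        S = todo.pop()
        for a in sigma:
            T = set()
            for q in S:
                T |= N["delta"].get((q, a), set())
            U = eclose(N, T)
            delta[(S, a)] = U
            if U not in states:
                states.add(U)
                todo.append(U)
    return {"Q": states, "Sigma": set(sigma), "delta": delta,
            "s": start, "F": {S for S in states if S & N["F"]}}

# NFA for strings ending in 01 (Exercise 9.6.5): q0 loops on 0,1; q0 -0-> q1 -1-> q2.
N = {"s": 0, "F": {2},
     "delta": {(0, "0"): {0, 1}, (0, "1"): {0}, (1, "1"): {2}}}
D = subset_construction(N, {"0", "1"})
print(len(D["Q"]))  # 3 DFA states: {0}, {0,1}, {0,2}
```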
Exercise 9.6.3:
The following example illustrates the fact that given an NFA with ǫ-transitions,
we can get another NFA with no ǫ-transitions.
Figure 9.5: An NFA with ǫ-transitions and an equivalent NFA without ǫ-transitions
Argue that the NFAs given above are equivalent.
Exercise 9.6.4:
Argue that both the following NFAs accept the language (01⋆ + 0⋆1).
Figure 9.6: Two NFAs accepting (01⋆ + 0⋆1)
Exercise 9.6.5:
Consider the following NFA which accepts all strings ending in 01 (interpreted
as a binary integer, the NFA accepts all integers of the form 4x+ 1, x = any
integer).
Figure 9.7: An NFA accepting all strings ending in 01
Show that by applying the subset construction procedure we get the following
equivalent DFA.
Figure 9.8: The equivalent DFA, with states {q0}, {q0, q1} and {q0, q2}
Exercise 9.6.6:
Consider the following NFA which accepts all strings of 0s and 1s such that
the nth symbol from the end is 1.
Figure 9.9: An NFA accepting strings whose nth symbol from the end is 1
Argue that the above NFA is a bad case for subset construction.
9.7 Closure of Regular Languages Under Con-
catenation and Kleene Star
We first show that the class of regular languages is closed under the con-
catenation operation. Given two regular languages A and B, let N1 and N2
be two NFAs such that L(N1) = A and L(N2) = B. From N1 and N2 we
construct a new NFA N , to recognize AB, as suggested by the following
figure.
Figure 9.10: Constructing the NFA N from N1 and N2 to recognize AB
The key idea is that we make the start state s1 of N1 the start state
s of N. From each accept state of N1 we make an ǫ-transition to the start
state s2 of N2. The accept states of N are the accept states of N2 only.
Let N1 = (Q1, Σ, δ1, s1, F1)
N2 = (Q2, Σ, δ2, s2, F2)
We construct N to recognize AB as follows:
N = (Q, Σ, δ, s1, F2)
where
(i) Q = Q1 ∪Q2
(ii) We define δ so that for any q ∈ Q and a ∈ Σǫ
δ(q, a) =
  δ1(q, a)          if q ∈ Q1 and q ∉ F1
  δ1(q, a)          if q ∈ F1 and a ≠ ǫ
  δ1(q, a) ∪ {s2}   if q ∈ F1 and a = ǫ
  δ2(q, a)          if q ∈ Q2
Finally we show that the class of regular languages is closed under the
star operation.
Given a regular language A, we wish to prove that the language A⋆ is also
regular. Let N be an NFA such that L(N) = A. We modify N , as suggested
by the following figure, to build an NFA N ′:
Figure 9.11: Constructing the NFA N′ from N to recognize A⋆
The NFA N ′ accepts any input that can be broken into several pieces
and each piece is accepted by N . In addition, N ′ also accepts ǫ which is a
member of A⋆.
Let N = (Q, Σ, δ, s, F ) be an NFA that recognizes A. We construct N ′
to recognize A⋆ as follows:
N ′ = (Q′, Σ, δ′, s′, F ′)
Q′ = Q ∪ {s′}
F′ = F ∪ {s′}
For any q ∈ Q′ and a ∈ Σǫ, we define
δ′(q, a) =
  δ(q, a)          if q ∈ Q and q ∉ F
  δ(q, a)          if q ∈ F and a ≠ ǫ
  δ(q, a) ∪ {s}    if q ∈ F and a = ǫ
  {s}              if q = s′ and a = ǫ
  φ                if q = s′ and a ≠ ǫ
We thus have the following theorem:
Theorem 9.7.1:
The class of languages accepted by finite automata is closed under intersec-
tion, complementation, union, concatenation and Kleene star.
Proof. Follows from Subsection 9.4.1 and the above discussions.
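The ǫ-transition constructions for concatenation and star above can be sketched concretely. The dictionary encoding (with "" for ǫ) and the tiny single-letter NFAs are our illustrative choices; disjoint state names are assumed:

```python
def eclose(N, T):
    """States reachable from T by zero or more epsilon-transitions."""
    stack, closure = list(T), set(T)
    while stack:
        q = stack.pop()
        for r in N["delta"].get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(N, w):
    """Standard NFA simulation via sets of possible states."""
    cur = eclose(N, {N["s"]})
    for a in w:
        nxt = set()
        for q in cur:
            nxt |= N["delta"].get((q, a), set())
        cur = eclose(N, nxt)
    return bool(cur & N["F"])

def concat_nfa(N1, N2):
    """AB: epsilon-edges from each accept state of N1 to N2's start; accepts of N2 only."""
    delta = {}
    for N in (N1, N2):
        for key, targets in N["delta"].items():
            delta.setdefault(key, set()).update(targets)
    for f in N1["F"]:
        delta.setdefault((f, ""), set()).add(N2["s"])
    return {"s": N1["s"], "F": set(N2["F"]), "delta": delta}

def star_nfa(N, new_start):
    """A*: new accepting start with an epsilon-edge into N; accepts loop back to N's start."""
    delta = {key: set(targets) for key, targets in N["delta"].items()}
    for f in N["F"]:
        delta.setdefault((f, ""), set()).add(N["s"])
    delta[(new_start, "")] = {N["s"]}
    return {"s": new_start, "F": set(N["F"]) | {new_start}, "delta": delta}

NA = {"s": "p0", "F": {"p1"}, "delta": {("p0", "a"): {"p1"}}}   # recognizes {a}
NB = {"s": "q0", "F": {"q1"}, "delta": {("q0", "b"): {"q1"}}}   # recognizes {b}
AB = concat_nfa(NA, NB)
A_STAR = star_nfa(NA, "s0")
print(accepts(AB, "ab"))        # True
print(accepts(A_STAR, ""))      # True: epsilon is in A*
print(accepts(A_STAR, "aaa"))   # True
```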
9.8 Regular Expressions and Finite Automata
Although regular expressions and finite automata appear to be quite differ-
ent, they are equivalent in their descriptive power. Thus any regular expres-
sion can be converted to a finite automaton and given any finite automaton,
we can get a regular expression capturing the language recognized by it.
Lemma 9.8.1:
If a language A is described by a regular expression R, then it is regular;
that is, there is an NFA to recognize A.
Proof. Given a regular expression R, we can construct an equivalent NFA N
by considering the following four cases:
1. R = φ. Then L(R) = φ and the following NFA recognizes L(R).
Figure 9.12:
2. R = ǫ. Then L(R) = {ǫ} and the following NFA recognizes L(R).
Figure 9.13:
3. R = a, for some a ∈ Σ. Then L(R) = {a} and the following NFA
recognizes L(R):
Figure 9.14: A two-state NFA with a single transition labelled a.
4. R can be one of the following (R1 and R2 are regular expressions):
i) R1 +R2
ii) R1 ·R2
iii) R1⋆
For this case we simply construct the equivalent NFA following the techniques
used in proving the closure properties (union, concatenation and Kleene star)
of regular languages.
The following example will provide a concrete illustration:
Example 9.8.2:
Consider the regular expression (ab+aab)⋆. We proceed to construct an NFA
to recognize the corresponding language:
Step 1 We construct NFAs for a, b :
Figure 9.15: One-transition NFAs for a and for b.
Step 2 We use the above NFAs and construct NFAs for ab and aab.
Figure 9.16: NFAs for ab and for aab, each formed by joining the NFAs of
Step 1 with ǫ-transitions.
Step 3 Using the above NFAs for ab and aab, we construct an NFA for
ab+ aab.
Figure 9.17: An NFA for ab + aab: a new start state with ǫ-transitions
branching to the NFAs for ab and aab.
Step 4 From the above NFA for ab + aab, we build the required NFA for
(ab+ aab)⋆.
Figure 9.18: An NFA for (ab + aab)⋆, obtained from the NFA for ab + aab
by the Kleene-star construction.
Lemma 9.8.3:
If a language is regular (that is, it is recognized by a finite automaton), then
it is described by a regular expression.
Proof. Let L ⊆ Σ⋆ be accepted by a DFA M = (Q, Σ, δ, s, F ). Then
L = {x ∈ Σ⋆ | δ(s, x) ∈ F }. Let F = {q1, q2, . . . , qk}. For every qi ∈ F we
consider the language Lqi defined as,
Lqi = {x ∈ Σ⋆ | δ(s, x) = qi}
Then L = Lq1 ∪ Lq2 ∪ · · · ∪ Lqk.
If Lqi is described by a regular expression rqi, then L can be described by
the regular expression rq1 + rq2 + · · · + rqk (recall: finite unions of regular
sets are regular).
For any two states p, q ∈ Q, let L(p, q) be the language defined as,
L(p, q) = {x ∈ Σ⋆ | δ(p, x) = q}
It is therefore sufficient to show that L(p, q) is described by a regular expres-
sion.
We now look at the number of distinct states through which M passes
in moving from state p to state q. It turns out that it is more convenient to
consider, for each k, a specific set of k states and to consider the set of strings
that cause M to go from state p to state q by going through only states in
that set. If k is large enough, this set of strings will be all of L(p, q).
Let us relabel the states of M using integers 1 through n, where n = |Q|.
For a string x ∈ Σ⋆, we say that x represents a path from state p to state q
going through state t if there exist strings y, z (both ≠ ǫ) such that,
x = yz and δ(p, y) = t, δ(t, z) = q
Now, for j ≥ 0, let
L(p, q, j) = the set of strings corresponding to paths from state p to
state q that go through no state numbered higher than j.
We have, L(p, q) = L(p, q, n), because no string in L(p, q) can go through a
state numbered higher than n for there are no states numbered greater than
n.
The problem now is to show that L(p, q, n) can be described by a regular
expression.
(This will be true if we can show that L(p, q, j) is described by a regular
expression for every j with 0 ≤ j ≤ n).
We proceed to do this by induction. For the basis step, we show that
L(p, q, 0) can be described by a regular expression. By definition, L(p, q, 0)
represents sets of strings corresponding to paths from state p to state q going
through no state numbered higher than 0—this means going through no state
at all, which means that the string can contain no more than one symbol.
Therefore,
L(p, q, 0) ⊆ Σ ∪ {ǫ}
More explicitly,
L(p, q, 0) =
    {a ∈ Σ | δ(p, a) = q}         if p ≠ q
    {a ∈ Σ | δ(p, a) = p} ∪ {ǫ}   if p = q.
It is easy to reason that L(p, q, 0) can be described by a regular expression.
The induction hypothesis is that, for some k ≥ 0 and for all states p, q
with 1 ≤ p, q ≤ n, the language L(p, q, k) can be described by a regular
expression. Assuming the hypothesis, we wish to show that for every p, q
in the same range, the language L(p, q, k + 1) can also be described by a
regular expression. We note that for k ≥ n, L(p, q, k + 1) = L(p, q, k) and
so we assume k < n.
By definition, a string x ∈ L(p, q, k+ 1) if it represents a path from p to
q that goes through no state numbered higher than k + 1. This is possible
in two ways. First, the path does not go higher than k—so, x ∈ L(p, q, k).
Second, the path goes through the state k + 1 (and nothing higher). In this
case, in general, it goes from state p to the state k + 1 (for the first time),
then possibly loops from state k + 1 back to itself (zero or more times), and
then from the state k+ 1 to the state q. This means, we can write x as yzw,
where y corresponds to the path from state p to the first visit of state k+ 1,
z corresponds to the looping in state k + 1, and w corresponds to the path
from state k + 1 to state q. We note that in each of the two parts y and w
and in each of the looping making z, the path does not go through any state
higher than k. Therefore,
y ∈ L(p, k + 1, k), w ∈ L(k + 1, q, k), z ∈ L(k + 1, k + 1, k)⋆
Considering both the above cases, we have,
x ∈ L(p, q, k) ∪ L(p, k + 1, k)L(k + 1, k + 1, k)⋆L(k + 1, q, k)
Since x is any string in L(p, q, k + 1), we can write,
L(p, q, k + 1) = L(p, q, k) ∪ L(p, k + 1, k)L(k + 1, k + 1, k)⋆L(k + 1, q, k)
The expression on the right hand side above can be described by a regular
expression if the individual languages in it can be described by regular ex-
pressions (this is so, by induction hypothesis). Therefore L(p, q, k + 1) can
be described by a regular expression.
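The L(p, q, k) recursion above can be turned directly into code. The sketch below is illustrative only: it emits Python-flavoured regular-expression strings (using `|` where the text writes +, and Python's `None` in place of φ), assuming a DFA whose states are numbered 1 . . . n with a total transition function; all names are assumptions of the sketch.

```python
import re
from itertools import product

def rpq(delta, p, q, k):
    """Regular expression (Python re syntax) for L(p, q, k): strings taking
    the DFA from p to q through no intermediate state numbered above k."""
    if k == 0:
        # paths through no state at all: single symbols, plus ǫ when p = q
        syms = sorted(a for (s, a), t in delta.items() if s == p and t == q)
        if p == q:
            syms.append("")                    # the ǫ alternative
        return "|".join(syms) if syms else None  # None plays the role of φ
    first = rpq(delta, p, q, k - 1)            # paths avoiding state k
    into = rpq(delta, p, k, k - 1)             # p to k
    loop = rpq(delta, k, k, k - 1)             # k back to itself
    out = rpq(delta, k, q, k - 1)              # k to q
    if into is None or out is None:            # cannot route through state k
        return first
    middle = f"({loop})*" if loop else ""
    via = f"({into}){middle}({out})"
    return f"({first})|{via}" if first is not None else via

# A small worked example: a two-state DFA over {0, 1} that accepts
# exactly the strings ending in 1 (state 2 is the accept state)
delta = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 1, (2, "1"): 2}
r = rpq(delta, 1, 2, 2)   # an expression for L(1, 2, n) with n = 2
```

Comparing `re.fullmatch(r, w)` against a direct check of "w ends in 1" over all short strings confirms the recursion.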
Theorem 9.8.4:
A language is regular if and only if some regular expression describes it.
Proof. Follows from lemmas 9.8.1 and 9.8.3 above.
9.9 DFA State Minimization
Consider the regular expression (a + b)⋆abb. If we construct the NFA for
this and apply the subset construction algorithm, we get the following DFA:
Figure 9.19: The five-state DFA (states A, B, C, D, E) for (a + b)⋆abb
obtained by the subset construction.
The above DFA has five states. The following DFA with only four states
also accepts the language described by (a+ b)⋆abb:
Figure 9.20: An equivalent four-state DFA (states A, B, C, D) for (a+b)⋆abb.
The above example suggests that a given DFA can often be simplified to
give an equivalent DFA with fewer states.
We now give an algorithm that gives a general method of reducing the
number of states of a DFA. Let the given DFA be,
M = (Q, Σ, δ, s, F )
We assume that from every state there is a transition on every input (if q is
a state not conforming to this, then introduce a new “dead state” d, add
transitions from q to d on the inputs that are not already present, and also
add transitions from d to d on all inputs).
We say that a string w distinguishes a state q1 from a state q2 if δ(q1, w) ∈ F
and δ(q2, w) ∉ F or vice versa.
The minimization procedure works on M by finding all groups of states
that can be distinguished by some input string. Those groups of states that
cannot be distinguished are then merged to form a single state for the entire
group. The algorithm works by keeping a partition of Q such that each group
of states consists of states which have not yet been distinguished from one
another and such that any pair of states chosen from different groups have
been found distinguishable by some input.
Initially the two groups are F and Q \ F . The fundamental step is to take
a group of states, say A = {q1, q2, . . . , qk}, and some input symbol a ∈ Σ and
consider δ(qi, a) for every qi ∈ A. If these transitions are to states that fall
into two or more different groups of the current partition, then we must split
A into subsets so that the transitions from the subsets of A are all confined
to a single group of the current partition. For example, let δ(q1, a) = t1 and
δ(q2, a) = t2 and let t1 and t2 be in different groups. Then we must split A
into at least two subsets so that one subset contains q1 and another contains
q2. Note that t1 and t2 are distinguished by some string w and so q1 and q2
are distinguished by the string aw.
The algorithm repeats the process of splitting groups until no more groups
need to be split. The following fact can be formally proved: If there exists a
string w such that δ(q1, w) ∈ F and δ(q2, w) 6∈ F or vice versa then q1 and
q2 cannot be in the same group; if no such w exists then q1 and q2 can be in
the same group.
Algorithm DFA Minimization
Input: A DFA M = (Q, Σ, δ, s, F ) with transitions defined for all states.
Output: A DFA M ′ such that L(M ′) = L(M).
1. We construct a partition Π of Q. Initially Π consists of F and Q \ F
only. We next refine Π to Πnew, a new partition, using the procedure
Refine-Π given in Fig. 9.3. That is, Πnew consists of the groups of Π, each
split into one or more subgroups. If Πnew ≠ Π, we replace Π by Πnew and
repeat the procedure Refine-Π. If Πnew = Π, we terminate the process of
refining Π. Let G1, G2, . . . , Gk be the final groups of Π.

procedure Refine-Π
begin
    for (each group G of Π) do
    begin
        partition G into subgroups such that two states
        q1 and q2 of G are in the same subgroup iff for
        all a ∈ Σ, δ(q1, a) and δ(q2, a) are in the same group of Π;
        /* in the worst case, a state will be in a
        subgroup by itself */
        place all subgroups so formed in Πnew
    end
end

Figure 9.3:
2. For each Gi in Π we pick a representative, an arbitrary state in Gi. The
representatives will be the states of the DFA M ′.
Let qi be the representative of Gi and for a ∈ Σ let δ(qi, a) ∈ Gj . Note
that Gj may be the same as Gi. Let qj be the representative of Gj . Then in
M ′ we add the transition from qi to qj on a. Let the initial state of M ′ be
the representative of the group containing the initial state s of M . Also let
the final states of M ′ be the representatives which are in F .
3. If M ′ has a dead state d, then remove d from M ′. Also remove any
state not reachable from the initial state. Any transitions from other states
to d then become undefined.
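The refinement loop of step 1 can be sketched in Python. The encoding of the DFA (a transition map keyed by (state, symbol)) and the example machine below, a hypothetical four-state DFA for the language of (0 + 1)⋆10 containing two equivalent states, are assumptions of this sketch.

```python
def minimize_groups(states, alphabet, delta, finals):
    """Partition refinement as in the algorithm above: start from the
    partition {F, Q \\ F} and repeatedly split a group whenever two of
    its states move, on some symbol, into different groups."""
    part = [g for g in (frozenset(finals), frozenset(states - finals)) if g]
    while True:
        new_part = []
        for g in part:
            buckets = {}
            for q in g:
                # the tuple of group indices reached on each input symbol
                # decides which subgroup q falls into
                key = tuple(next(i for i, h in enumerate(part)
                                 if delta[(q, a)] in h)
                            for a in sorted(alphabet))
                buckets.setdefault(key, set()).add(q)
            new_part.extend(frozenset(b) for b in buckets.values())
        if len(new_part) == len(part):   # no group was split: we are done
            return new_part
        part = new_part

# A hypothetical four-state DFA for strings over {0, 1} ending in "10";
# states 1 and 4 behave identically and should be merged
delta = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 3, (2, "1"): 2,
         (3, "0"): 4, (3, "1"): 2, (4, "0"): 4, (4, "1"): 2}
groups = minimize_groups({1, 2, 3, 4}, "01", delta, {3})
```

On this machine the algorithm merges states 1 and 4 and leaves 2 and 3 alone, giving three groups.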
Example 9.9.1:
The following DFA recognizes the language described by the regular expres-
sion (0 + 1)⋆10.
Figure 9.21: A seven-state DFA (states 1–7) recognizing the language
described by (0 + 1)⋆10.
Applying the above algorithm gives the final partition of states as {1, 2, 4},
{3, 5, 7} and {6}. The minimized DFA is given below:
Figure 9.22: The minimized DFA, with states {1, 2, 4}, {3, 5, 7} and {6}.
Section 9.11 will answer the question of the uniqueness of the minimal
DFA.
9.10 Myhill-Nerode Relations
9.10.1 Isomorphism of DFAs
We begin with the notion of isomorphism of DFAs.
Let D1 and D2 be two DFAs where,
D1 = (Q1, Σ, δ1, s1, F1) and
D2 = (Q2, Σ, δ2, s2, F2)
The DFAs D1 and D2 are said to be isomorphic if there is a 1–1 onto mapping
f : Q1 → Q2 such that,
(i) f(s1) = s2
(ii) f (δ1(p, a)) = δ2 (f(p), a) for all p ∈ Q1 and a ∈ Σ
(iii) p ∈ F1 if and only if f(p) ∈ F2, for all p ∈ Q1
Conditions (i), (ii) and (iii) imply that D1 and D2 are essentially the same
automaton up to renaming of states. Therefore they accept the same input
set. We can show that the minimal-state DFA corresponding to the set it
accepts is unique up to isomorphism. This can be done via a beautiful
correspondence between DFAs with input alphabet Σ and certain equivalence
relations on Σ⋆.
9.10.2 Myhill-Nerode Relation
Let M be the following DFA with no inaccessible states:
M = (Q, Σ, δ, s, F )
Let L(M) = R ⊆ Σ⋆. That is, R is the regular set accepted by M .
We define an equivalence relation ≡M on Σ⋆:
For any two strings x, y ∈ Σ⋆, we say x ≡M y if and only if δ(s, x) =
δ(s, y). It is easy to see that ≡M is an equivalence relation on Σ⋆. Thus the
automaton M induces the equivalence relation ≡M on Σ⋆. It is interesting
to note that ≡M also satisfies the following properties:
(a) ≡M is a right congruence:
For any x, y ∈ Σ⋆ and a ∈ Σ, if x ≡M y then xa ≡M ya.
To see this, we assume that x ≡M y. Then
δ(s, xa) = δ(δ(s, x), a), by definition
         = δ(δ(s, y), a), by assumption
         = δ(s, ya), by definition
Hence xa ≡M ya.
(b) ≡M refines R:
For any x, y ∈ Σ⋆, if x ≡M y then (x ∈ R⇔ y ∈ R). By definition
x ≡M y means δ(s, x) = δ(s, y), which is either an accept state or a
reject state. Therefore either both x and y are accepted or both x and
y are rejected. Stated another way, every equivalence class induced by
≡M has all its elements in R or none of its elements in R.
(c) ≡M is of finite index:
That is, the number of equivalence classes induced by ≡M is finite.
Corresponding to each state q ∈ Q, there is exactly one equivalence
class given by,
{x ∈ Σ⋆ | δ(s, x) = q}
The number of states is finite and hence the number of equivalence
classes is also finite.
Definition 9.10.1:
An equivalence relation ≡M on Σ⋆ is a Myhill-Nerode relation for R, a regular
set, if it satisfies the properties (a), (b) and (c) above; that is, ≡M is a right
congruence of finite index, refining R.
The interesting fact about the definition above is that it characterises ex-
actly the relations on Σ⋆ that are ≡M for some automaton M . That is, we
can construct M from ≡M using only the fact that ≡M is a Myhill-Nerode
relation.
9.10.3 Construction of the DFA from a given Myhill-
Nerode Relation
We first show the construction of the DFA M≡, for a given R ⊆ Σ⋆, from
any given Myhill-Nerode relation ≡ for R. We do not assume that R is
regular. Given any string x, its equivalence class is defined by,
[x] = {y | y ≡ x}
Note that there are infinitely many strings, but there are only finitely many
equivalence classes, by property (c) above.
We now define the DFA M≡ = (Q, Σ, δ, s, F ) as follows:
Q = {[x] | x ∈ Σ⋆}
s = [ǫ]
F = {[x] | x ∈ R} – well defined because x ∈ R iff [x] ∈ F by property (b)
δ([x], a) = [xa] – well defined by property (a)
We now have to show that L(M≡) = R.
Lemma 9.10.2:
δ ([x], y) = [xy]
Proof. The proof is by induction on |y|. For the basis, we note that,
δ([x], ǫ) = [x] = [xǫ]
In the induction step, we consider δ([x], ya). By definition,
δ([x], ya) = δ(δ([x], y), a)
           = δ([xy], a), using the induction hypothesis
           = [xya]
Theorem 9.10.3:
L(M≡) = R
Proof. For any string x ∈ Σ⋆,
x ∈ L(M≡)
⇔ δ([ǫ], x) ∈ F, by definition of acceptance
⇔ [x] ∈ F, using lemma 9.10.2
⇔ x ∈ R, by definition of F
9.10.4 Myhill-Nerode Relation and the Corresponding
DFA
We have done the following constructions:
(i) given M , we have defined ≡M , a Myhill-Nerode relation.
(ii) given ≡ (a Myhill-Nerode relation), we have defined M≡.
We will now show that the constructions (i) and (ii) are inverses up to iso-
morphism of automata.
Lemma 9.10.4:
Let ≡ be a Myhill-Nerode relation for R ⊆ Σ⋆. Let M≡ be the corresponding
DFA. From M≡ if we now define the corresponding Myhill-Nerode relation,
say ≡M≡ , it is identical to ≡.
Proof. Given ≡, let the corresponding DFA be M≡ = (Q, Σ, δ, s, F ).
Then, for any two strings x, y ∈ Σ⋆, by definition,
x ≡M≡ y ⇔ δ(s, x) = δ(s, y)
         ⇔ δ([ǫ], x) = δ([ǫ], y)
         ⇔ [x] = [y], by lemma 9.10.2
         ⇔ x ≡ y.
Hence ≡M≡ is identical to ≡.
Lemma 9.10.5:
Let the DFA for R be M , with no inaccessible states. Let the corresponding
Myhill-Nerode relation be ≡M . If from ≡M we construct the corresponding
DFA, say M≡M, it is isomorphic to M .
Proof.
Let M = (Q, Σ, δ, s, F ) and let M≡M = (Q′, Σ, δ′, s′, F ′).
By construction, for the DFA M≡M,
[x] = {y | y ≡M x} = {y | δ(s, y) = δ(s, x)}
Q′ = {[x] | x ∈ Σ⋆}
s′ = [ǫ]
F ′ = {[x] | x ∈ R}
δ′([x], a) = [xa]
We now have to show that M≡M and M are isomorphic under the map:
f : Q′ → Q where f([x]) = δ(s, x)
By the definition of ≡M ,
[x] = [y] iff δ (s, x) = δ (s, y) .
So the map f is well-defined on the equivalence classes induced by ≡M .
Also f is 1–1. Since M has no inaccessible states f is onto.
To argue that f is an isomorphism we need to show that,
(i) f(s′) = s
(ii) f(δ′([x], a)) = δ(f([x]), a), and
(iii) [x] ∈ F ′ iff f ([x]) ∈ F
We show these as follows:
(i) f(s′) = f([ǫ]), definition of s′
         = δ(s, ǫ), definition of f
         = s, definition of δ
(ii) f(δ′([x], a)) = f([xa]), definition of δ′
         = δ(s, xa), definition of f
         = δ(δ(s, x), a), definition of δ
         = δ(f([x]), a), definition of f
(iii) [x] ∈ F ′ iff x ∈ R, definition of acceptance and property (b)
iff δ (s, x) ∈ F, since L(M) = R
iff f ([x]) ∈ F, definition of f.
We thus have the following theorem.
Theorem 9.10.6:
Let Σ be a finite alphabet. Up to isomorphism of automata, there is a 1–1
correspondence between DFAs (with no inaccessible states) over Σ accepting
R ⊆ Σ⋆ and Myhill-Nerode relations for R on Σ⋆.
Theorem 9.10.6 implies that we can deal with regular sets and finite
automata in terms of a few simple algebraic properties.
9.11 Myhill-Nerode Theorem
9.11.1 Notion of Refinement
Definition 9.11.1:
A relation r1 is said to refine another relation r2 if r1 ⊆ r2, considered as sets
of ordered pairs. That is, r1 refines r2 if, for all x, y, whenever x r1 y holds
then x r2 y also holds.
For equivalence relations ≡1 and ≡2, this means that for every x, the
≡1-class of x is included in the ≡2-class of x.
For example, the equivalence relation i ≡ j mod 6 on the integers refines
the equivalence relation i ≡ j mod 3.
We will now show that there exists a coarsest Myhill-Nerode relation ≡R
for any given regular set R; that is, any other Myhill-Nerode relation for R
refines ≡R. The relation ≡R corresponds to the unique minimal DFA for R.
Property (b) of the definition of Myhill-Nerode relations says that a
Myhill-Nerode relation ≡ for R refines the equivalence relation with equiv-
alence classes R and Σ \ R. The relation of refinement between equivalence
relations is a partial order:
• it is reflexive: every relation refines itself.
• it is antisymmetric: if ≡1 refines ≡2 and if ≡2 refines ≡1 then ≡1 and
≡2 are the same relation.
• it is transitive: if ≡1 refines ≡2 and if ≡2 refines ≡3 then ≡1 refines ≡3.
Note that, if ≡1 refines ≡2, then ≡1 is the finer and ≡2 is the coarser of
the two relations. The identity relation on any set S, {(x, x) | x ∈ S}, is the
finest equivalence relation on S. Also the universal relation on any set S,
{(x, y) | x, y ∈ S}, is the coarsest equivalence relation on S.
9.11.2 Myhill-Nerode Theorem and Its Proof
Let R ⊆ Σ⋆, regular or not. The equivalence relation ≡R on Σ⋆ is defined in
terms of R as follows:
For any x, y ∈ Σ⋆, x ≡R y if and only if for all z ∈ Σ⋆, (xz ∈ R ⇔ yz ∈ R).
That is, two strings x and y are equivalent under ≡R if whenever we
append any other string z to both of them, the resulting strings xz and yz
are either both in R or both not in R. It is easy to reason that ≡R is an
equivalence relation for any R. Lemma 9.11.2 below will show that ≡R for
any R (regular or not) satisfies the properties (a) and (b) of Myhill-Nerode
relations and is the coarsest such relation on Σ⋆. In case R is
regular, ≡R is of finite index and is therefore a Myhill-Nerode relation for
R. In fact, it is the coarsest possible Myhill-Nerode relation for R and it
corresponds to the unique minimal finite automaton for R.
Lemma 9.11.2:
Let R ⊆ Σ⋆ be any set, regular or not. Let the relation ≡R be defined as
follows:
For any x, y ∈ Σ⋆, x ≡R y if and only if for all z ∈ Σ⋆, (xz ∈ R ⇔ yz ∈ R).
Then ≡R is a right congruence refining R and is the coarsest such relation
on Σ⋆.
Proof. We first show that ≡R is a right congruence:
Taking z = aw, a ∈ Σ, w ∈ Σ⋆, in the definition of ≡R we have,
x ≡R y ⇒ ∀a ∈ Σ, ∀w ∈ Σ⋆ (xaw ∈ R ⇔ yaw ∈ R)
       ⇒ ∀a ∈ Σ (xa ≡R ya)
We next show that ≡R refines R. Taking z = ǫ in the definition of ≡R,
x ≡R y ⇒ (x ∈ R ⇔ y ∈ R).
We now show that ≡R is the coarsest such relation; that is, any other
equivalence relation ≡ satisfying properties (a) and (b) refines ≡R:
x ≡ y ⇒ ∀z ∈ Σ⋆ (xz ≡ yz) , by induction on |z| and using property (a)
⇒ ∀z ∈ Σ⋆ (xz ∈ R⇔ yz ∈ R) , by property (b)
⇒ x≡R y, by definition of ≡R.
We are now ready to state and prove the Myhill-Nerode Theorem.
Theorem 9.11.3 (Myhill-Nerode Theorem):
Let R ⊆ Σ⋆. The following statements are equivalent:
(i) R is regular.
(ii) there exists a Myhill-Nerode relation for R.
(iii) the relation ≡R is of finite index.
Proof. We first show (i) ⇒ (ii): Given a DFA M for R (which exists because
R is regular) we can construct ≡M , a Myhill-Nerode relation for R (as shown
in Section 9.10.2).
We next show (ii) ⇒ (iii): By lemma 9.11.2, any Myhill-Nerode relation
for R is of finite index and refines ≡R; therefore ≡R is of finite index.
We finally show (iii) ⇒ (i): If ≡R is of finite index, then it is a Myhill-
Nerode relation for R, and given ≡R we can construct the corresponding
DFA M≡R for R.
Since ≡R is the unique coarsest Myhill-Nerode relation for R, a regular
set, it corresponds to the DFA for R with the fewest states among all DFAs
for R.
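For a concrete feel for ≡R, one can approximate it by comparing membership xz ∈ R over all test strings z up to a bounded length. This is only an approximation in general (two strings separated only by a longer z would be wrongly merged), but for a simple R it already reveals the finite index. Everything below, including the choice R = strings over {a, b} ending in ab, is an illustrative assumption.

```python
from itertools import product

def approx_eqR_classes(in_R, alphabet, max_len=4, probe_len=3):
    """Group all strings of length <= max_len by the signature
    z -> (x + z in R) over all probes z of length <= probe_len;
    equal signatures approximate 'same ≡R class'."""
    probes = ["".join(z) for n in range(probe_len + 1)
              for z in product(alphabet, repeat=n)]
    classes = {}
    for n in range(max_len + 1):
        for x in map("".join, product(alphabet, repeat=n)):
            signature = tuple(in_R(x + z) for z in probes)
            classes.setdefault(signature, []).append(x)
    return list(classes.values())

# R = strings over {a, b} ending in "ab"; its minimal DFA has three
# states, so ≡R should have exactly three equivalence classes
classes = approx_eqR_classes(lambda w: w.endswith("ab"), "ab")
```

Here the three classes correspond to "no progress towards ab", "just read an a", and "just read ab" (the members of R), mirroring the states of the minimal DFA.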
9.12 Non-regular Languages
9.12.1 An Example
We now show that the power of finite automata is limited. Specifically we
show that there exist languages that are not regular.
Consider the set A given by,
A = {anbn | n ≥ 0} = {ǫ, ab, aabb, aaabbb, . . .}
The intuitive argument to show that there exists no DFA M such that
L(M) = A goes thus: if such an M exists, then M has to remember, when
passing the centre point between the a’s and the b’s, how many a’s it has
seen. It has to do this for arbitrarily long strings anbn (n may be arbitrarily
large, in particular much larger than the number of states). This is an
unbounded amount of information and it is not possible to remember it with
only finite memory.
The formal argument is given below:
Assume that A is regular and assume that a DFA M exists such that
L(M) = A. Let k be the number of states of this assumed DFA M . Consider
the action of M on the input anbn, where n ≫ k. Let the start state be s
and let the machine reach a final state r after scanning the input string anbn:
Figure 9.23: The run of M on anbn: starting in state s, M reads the n a’s
followed by the n b’s and ends in a final state r.
Since n ≥ k, by the pigeon hole principle there must exist some state, say
p, that M enters more than once while scanning the a’s. We break the
string anbn into three pieces u, v, w, where v is the string of a’s scanned
between the two occurrences of entry into state p. This is depicted below:
Figure 9.24: The string anbn split as u, v, w: M is in state p after reading u,
again in state p after reading uv, and in the final state r after reading w.
Note that we have assumed |v| > 0. We then have
δ(s, u) = p, δ(p, v) = p, δ(p, w) = r ∈ F
We now show that the substring v can be deleted and the resulting string
will still be erroneously accepted:
δ(s, uw) = δ(δ(s, u), w) = δ(p, w) = r ∈ F
The acceptance is erroneous because after deleting v, the number of a’s in the
resulting string is strictly less than the number of b’s. This is a contradiction
and the assumption that there exists a DFA M to recognize A is wrong. In other
words, A is not regular. Note that we could also insert (“pump in”) extra
copies of v and the resulting string would be erroneously accepted.
We formalize the idea used in the above example in the form of a theorem
called the “pumping lemma”.
9.12.2 The Pumping Lemma
The theorem commonly known as “pumping lemma” is about a special prop-
erty of regular languages. Languages that are not regular lack the property.
The property states that each string (of a regular language) contains a sec-
tion that can be “repeated any number of times” with the resulting string
still remaining in the language.
Theorem 9.12.1 (Pumping Lemma):
Let A be a regular set. Then there is a number p (the pumping length) such
that for any string w ∈ A with |w| ≥ p, w may be divided into three parts,
as w = xyz satisfying the following conditions:
(i) xyiz ∈ A for every i ≥ 0,
(ii) |y| > 0 and
(iii) |xy| ≤ p
Proof. We start with an informal approach to the proof.
Let M = (Q, Σ, δ, s, F ) be a DFA such that L(M) = A. Let us take p to
be |Q|, the number of states. If for all w ∈ A it happens that |w| < p, then
the theorem is vacuously true, because the three conditions need hold only
for strings of length at least p. Let w ∈ A be such that |w| = n ≥ p. Consider
the sequence of states (starting with s) that M goes through in accepting
w. It starts in state s, then goes to say q3, then say q21, then say q10 and
so on, till it reaches a final state, say q14, at the end of the last symbol in w.
Since |w| = n, the sequence of states s, q3, q21, q10, . . . , q14 has length n + 1.
Also as n ≥ p, n + 1 is greater than p, the number of states of M . By the
pigeon hole principle, the sequence of states must contain a repeated state.
The following figure depicts the situation—state q10 is repeated:
Figure 9.25: The run of M on w = w1w2 · · ·wn through the states
s, q3, q21, q10, . . . , q10, . . . , q14; the state q10 is repeated.
We divide w into three parts thus: x is the part of w appearing before
the first occurrence of q10; y is the part of w between the two occurrences of
q10; z is the remaining part of w, coming after the second occurrence of q10.
Therefore,
δ(s, x) = q10, δ(q10, y) = q10, δ(q10, z) = q14
Let us “pump in” a copy of y into w—we get the string xyyz. The extra
copy of y will start in state q10 and will end in q10. So z will still start in
state q10 and will go to the accept state q14. Thus xyyz will be accepted.
Similar reasoning shows that the string xyiz, i > 0 will be accepted. It is
easy to reason that xz is also accepted. Thus condition (i) is satisfied.
Since y is the part occurring between the two occurrences of state q10,
|y| > 0 and so condition (ii) is satisfied.
To get condition (iii), we make sure that q10 is the first repeated state
in the sequence. By the pigeon hole principle, the first p + 1 states in the
sequence must contain a repetition. Therefore |xy| ≤ p.
We now formalize the above ideas.
As before, let w = w1w2w3 · · ·wn, n ≥ p. Let r1(= s), r2, r3, . . . , rn+1
be the sequence of states that M enters while processing w, so that ri+1 =
δ(ri, wi), 1 ≤ i ≤ n. This sequence of states has length n+1 which is at least
p + 1. By the pigeon hole principle there must be two identical states among
the first p + 1 states in the sequence. Let the first occurrence of the repeated
state be ra and the second occurrence of the same state be rb. Because rb
occurs among the first p + 1 states starting at r1, we have b ≤ p + 1. Now let,
x = w1w2 · · ·wa−1
y = wa · · ·wb−1
z = wb · · ·wn
As x takes M from state r1 to ra, y takes M from ra to rb, and z takes M
from rb to rn+1, an accept state, M must accept xyiz for i ≥ 0 (see the
informal argument). We know that a ≠ b, so |y| > 0. Also b ≤ p + 1 and so
|xy| ≤ p. Thus all the conditions of the theorem are satisfied.
Thus all the conditions of the theorem are satisfied.
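The constructive part of the proof can be mirrored in code: run the DFA on w, find the first repeated state in the state sequence, and cut w there into x, y, z. The DFA encoding and the example machine below (a two-state DFA for strings over {0, 1} ending in 1, so p = 2) are assumptions of this sketch.

```python
def run(delta, start, w):
    """State reached from `start` after reading w (delta is assumed total)."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q

def pump_split(delta, start, w):
    """Split w = xyz at the first repeated state of the run, as in the proof:
    x ends at the first visit, y at the second visit, z is the rest."""
    seq = [start]
    for c in w:
        seq.append(delta[(seq[-1], c)])
    first_seen = {}
    for i, q in enumerate(seq):
        if q in first_seen:                 # state q repeats at positions a < b
            a, b = first_seen[q], i
            return w[:a], w[a:b], w[b:]
        first_seen[q] = i
    return None                             # |w| < number of states: no repeat

# Example: two-state DFA accepting strings over {0, 1} that end in 1
delta = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 1, (2, "1"): 2}
x, y, z = pump_split(delta, 1, "0101")
```

Pumping y any number of times keeps the string in the language: run(delta, 1, x + y * i + z) lands in the accept state for every i ≥ 0, and |y| > 0 with |xy| ≤ p as the theorem requires.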