INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE
Transcript of INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE
B OA Z BA R A K
I N T RO D U C T I O N TOT H E O R E T I CA LCO M P U T E R S C I E N C E
T E X T B O O K I N P R E PA R AT I O N .AVA I L A B L E O N HTTPS://INTROTCS.ORG
Text available on https://github.com/boazbk/tcs - please post any issues there - thank you!
This version was compiled on Wednesday 26th August, 2020 18:10
Copyright © 2020 Boaz Barak
This work is licensed under a CreativeCommons âAttribution-NonCommercial-NoDerivatives 4.0 Internationalâ license.
To Ravit, Alma and Goren.
Contents
Preface 9
Preliminaries 17
0 Introduction 19
1 Mathematical Background 37
2 Computation and Representation 73
I Finite computation 111
3 Defining computation 113
4 Syntactic sugar, and computing every function 149
5 Code as data, data as code 175
II Uniform computation 205
6 Functions with Infinite domains, Automata, and Regularexpressions 207
7 Loops and infinity 241
8 Equivalent models of computation 271
9 Universality and uncomputability 315
10 Restricted computational models 347
11 Is every theorem provable? 365
Compiled on 8.26.2020 18:10
6
III Efficient algorithms 385
12 Efficient computation: An informal introduction 387
13 Modeling running time 407
14 Polynomial-time reductions 439
15 NP, NP completeness, and the Cook-Levin Theorem 465
16 What if P equals NP? 483
17 Space bounded computation 503
IV Randomized computation 505
18 Probability Theory 101 507
19 Probabilistic computation 527
20 Modeling randomized computation 539
V Advanced topics 561
21 Cryptography 563
22 Proofs and algorithms 591
23 Quantum computing 593
VI Appendices 625
Contents (detailed)
Preface 90.1 To the student . . . . . . . . . . . . . . . . . . . . . . . . 10
0.1.1 Is the effort worth it? . . . . . . . . . . . . . . . . 110.2 To potential instructors . . . . . . . . . . . . . . . . . . . 120.3 Acknowledgements . . . . . . . . . . . . . . . . . . . . . 14
Preliminaries 17
0 Introduction 190.1 Integer multiplication: an example of an algorithm . . . 200.2 Extended Example: A faster way to multiply (optional) 220.3 Algorithms beyond arithmetic . . . . . . . . . . . . . . . 270.4 On the importance of negative results . . . . . . . . . . 280.5 Roadmap to the rest of this book . . . . . . . . . . . . . 29
0.5.1 Dependencies between chapters . . . . . . . . . . 300.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 320.7 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 33
1 Mathematical Background 371.1 This chapter: a readerâs manual . . . . . . . . . . . . . . 371.2 A quick overview of mathematical prerequisites . . . . 381.3 Reading mathematical texts . . . . . . . . . . . . . . . . 39
1.3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . 401.3.2 Assertions: Theorems, lemmas, claims . . . . . . 401.3.3 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.4 Basic discrete math objects . . . . . . . . . . . . . . . . . 411.4.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 411.4.2 Special sets . . . . . . . . . . . . . . . . . . . . . . 421.4.3 Functions . . . . . . . . . . . . . . . . . . . . . . . 441.4.4 Graphs . . . . . . . . . . . . . . . . . . . . . . . . 461.4.5 Logic operators and quantifiers . . . . . . . . . . 491.4.6 Quantifiers for summations and products . . . . 501.4.7 Parsing formulas: bound and free variables . . . 501.4.8 Asymptotics and Big-đ notation . . . . . . . . . . 52
8
1.4.9 Some ârules of thumbâ for Big-đ notation . . . . 531.5 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.5.1 Proofs and programs . . . . . . . . . . . . . . . . 551.5.2 Proof writing style . . . . . . . . . . . . . . . . . . 551.5.3 Patterns in proofs . . . . . . . . . . . . . . . . . . 56
1.6 Extended example: Topological Sorting . . . . . . . . . 581.6.1 Mathematical induction . . . . . . . . . . . . . . . 601.6.2 Proving the result by induction . . . . . . . . . . 611.6.3 Minimality and uniqueness . . . . . . . . . . . . . 63
1.7 This book: notation and conventions . . . . . . . . . . . 651.7.1 Variable name conventions . . . . . . . . . . . . . 661.7.2 Some idioms . . . . . . . . . . . . . . . . . . . . . 67
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 691.9 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 71
2 Computation and Representation 732.1 Defining representations . . . . . . . . . . . . . . . . . . 75
2.1.1 Representing natural numbers . . . . . . . . . . . 762.1.2 Meaning of representations (discussion) . . . . . 78
2.2 Representations beyond natural numbers . . . . . . . . 782.2.1 Representing (potentially negative) integers . . . 792.2.2 Twoâs complement representation (optional) . . 792.2.3 Rational numbers, and representing pairs of
strings . . . . . . . . . . . . . . . . . . . . . . . . . 802.3 Representing real numbers . . . . . . . . . . . . . . . . . 822.4 Cantorâs Theorem, countable sets, and string represen-
tations of the real numbers . . . . . . . . . . . . . . . . . 832.4.1 Corollary: Boolean functions are uncountable . . 892.4.2 Equivalent conditions for countability . . . . . . 89
2.5 Representing objects beyond numbers . . . . . . . . . . 902.5.1 Finite representations . . . . . . . . . . . . . . . . 912.5.2 Prefix-free encoding . . . . . . . . . . . . . . . . . 912.5.3 Making representations prefix-free . . . . . . . . 942.5.4 âProof by Pythonâ (optional) . . . . . . . . . . . 952.5.5 Representing letters and text . . . . . . . . . . . . 972.5.6 Representing vectors, matrices, images . . . . . . 992.5.7 Representing graphs . . . . . . . . . . . . . . . . . 992.5.8 Representing lists and nested lists . . . . . . . . . 992.5.9 Notation . . . . . . . . . . . . . . . . . . . . . . . . 100
2.6 Defining computational tasks as mathematical functions 1002.6.1 Distinguish functions from programs! . . . . . . 102
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042.8 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 108
9
I Finite computation 111
3 Defining computation 1133.1 Defining computation . . . . . . . . . . . . . . . . . . . . 1153.2 Computing using AND, OR, and NOT. . . . . . . . . . . 116
3.2.1 Some properties of AND and OR . . . . . . . . . 1183.2.2 Extended example: Computing XOR from
AND, OR, and NOT . . . . . . . . . . . . . . . . . 1193.2.3 Informally defining âbasic operationsâ and
âalgorithmsâ . . . . . . . . . . . . . . . . . . . . . 1213.3 Boolean Circuits . . . . . . . . . . . . . . . . . . . . . . . 123
3.3.1 Boolean circuits: a formal definition . . . . . . . . 1243.3.2 Equivalence of circuits and straight-line programs 127
3.4 Physical implementations of computing devices (di-gression) . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.4.1 Transistors . . . . . . . . . . . . . . . . . . . . . . 1313.4.2 Logical gates from transistors . . . . . . . . . . . 1323.4.3 Biological computing . . . . . . . . . . . . . . . . 1323.4.4 Cellular automata and the game of life . . . . . . 1323.4.5 Neural networks . . . . . . . . . . . . . . . . . . . 1323.4.6 A computer made from marbles and pipes . . . . 133
3.5 The NAND function . . . . . . . . . . . . . . . . . . . . 1343.5.1 NAND Circuits . . . . . . . . . . . . . . . . . . . . 1353.5.2 More examples of NAND circuits (optional) . . . 1363.5.3 The NAND-CIRC Programming language . . . . 138
3.6 Equivalence of all these models . . . . . . . . . . . . . . 1403.6.1 Circuits with other gate sets . . . . . . . . . . . . 1413.6.2 Specification vs. implementation (again) . . . . . 142
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 1433.8 Biographical notes . . . . . . . . . . . . . . . . . . . . . . 146
4 Syntactic sugar, and computing every function 1494.1 Some examples of syntactic sugar . . . . . . . . . . . . . 151
4.1.1 User-defined procedures . . . . . . . . . . . . . . 1514.1.2 Proof by Python (optional) . . . . . . . . . . . . . 1534.1.3 Conditional statements . . . . . . . . . . . . . . . 154
4.2 Extended example: Addition and Multiplication (op-tional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.3 The LOOKUP function . . . . . . . . . . . . . . . . . . . 1584.3.1 Constructing a NAND-CIRC program for
LOOKUP . . . . . . . . . . . . . . . . . . . . . . . 1594.4 Computing every function . . . . . . . . . . . . . . . . . 161
4.4.1 Proof of NANDâs Universality . . . . . . . . . . . 1624.4.2 Improving by a factor of đ (optional) . . . . . . . 163
10
4.5 Computing every function: An alternative proof . . . . 1654.6 The class SIZE(đ ) . . . . . . . . . . . . . . . . . . . . . . 1674.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 1704.8 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 174
5 Code as data, data as code 1755.1 Representing programs as strings . . . . . . . . . . . . . 1775.2 Counting programs, and lower bounds on the size of
NAND-CIRC programs . . . . . . . . . . . . . . . . . . . 1785.2.1 Size hierarchy theorem (optional) . . . . . . . . . 180
5.3 The tuples representation . . . . . . . . . . . . . . . . . 1825.3.1 From tuples to strings . . . . . . . . . . . . . . . . 183
5.4 A NAND-CIRC interpreter in NAND-CIRC . . . . . . . 1845.4.1 Efficient universal programs . . . . . . . . . . . . 1855.4.2 A NAND-CIRC interpeter in âpseudocodeâ . . . 1865.4.3 A NAND interpreter in Python . . . . . . . . . . 1875.4.4 Constructing the NAND-CIRC interpreter in
NAND-CIRC . . . . . . . . . . . . . . . . . . . . . 1885.5 A Python interpreter in NAND-CIRC (discussion) . . . 1915.6 The physical extended Church-Turing thesis (discus-
sion) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1935.6.1 Attempts at refuting the PECTT . . . . . . . . . . 195
5.7 Recap of Part I: Finite Computation . . . . . . . . . . . . 1995.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 2015.9 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 203
II Uniform computation 205
6 Functions with Infinite domains, Automata, and Regularexpressions 2076.1 Functions with inputs of unbounded length . . . . . . . 208
6.1.1 Varying inputs and outputs . . . . . . . . . . . . . 2096.1.2 Formal Languages . . . . . . . . . . . . . . . . . . 2116.1.3 Restrictions of functions . . . . . . . . . . . . . . . 211
6.2 Deterministic finite automata (optional) . . . . . . . . . 2126.2.1 Anatomy of an automaton (finite vs. unbounded) 2156.2.2 DFA-computable functions . . . . . . . . . . . . . 216
6.3 Regular expressions . . . . . . . . . . . . . . . . . . . . . 2176.3.1 Algorithms for matching regular expressions . . 221
6.4 Efficient matching of regular expressions (optional) . . 2236.4.1 Matching regular expressions using DFAs . . . . 2276.4.2 Equivalence of regular expressions and automata 2286.4.3 Closure properties of regular expressions . . . . 230
11
6.5 Limitations of regular expressions and the pumpinglemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
6.6 Answering semantic questions about regular expres-sions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 2396.8 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 240
7 Loops and infinity 2417.1 Turing Machines . . . . . . . . . . . . . . . . . . . . . . . 242
7.1.1 Extended example: A Turing machine for palin-dromes . . . . . . . . . . . . . . . . . . . . . . . . 244
7.1.2 Turing machines: a formal definition . . . . . . . 2457.1.3 Computable functions . . . . . . . . . . . . . . . . 2477.1.4 Infinite loops and partial functions . . . . . . . . 248
7.2 Turing machines as programming languages . . . . . . 2497.2.1 The NAND-TM Programming language . . . . . 2517.2.2 Sneak peak: NAND-TM vs Turing machines . . . 2547.2.3 Examples . . . . . . . . . . . . . . . . . . . . . . . 254
7.3 Equivalence of Turing machines and NAND-TM pro-grams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.3.1 Specification vs implementation (again) . . . . . 2607.4 NAND-TM syntactic sugar . . . . . . . . . . . . . . . . . 261
7.4.1 âGOTOâ and inner loops . . . . . . . . . . . . . . 2617.5 Uniformity, and NAND vs NAND-TM (discussion) . . 2647.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 2657.7 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 268
8 Equivalent models of computation 2718.1 RAM machines and NAND-RAM . . . . . . . . . . . . . 2738.2 The gory details (optional) . . . . . . . . . . . . . . . . . 277
8.2.1 Indexed access in NAND-TM . . . . . . . . . . . . 2778.2.2 Two dimensional arrays in NAND-TM . . . . . . 2798.2.3 All the rest . . . . . . . . . . . . . . . . . . . . . . 279
8.3 Turing equivalence (discussion) . . . . . . . . . . . . . . 2808.3.1 The âBest of both worldsâ paradigm . . . . . . . 2818.3.2 Letâs talk about abstractions . . . . . . . . . . . . 2818.3.3 Turing completeness and equivalence, a formal
definition (optional) . . . . . . . . . . . . . . . . . 2838.4 Cellular automata . . . . . . . . . . . . . . . . . . . . . . 284
8.4.1 One dimensional cellular automata are Turingcomplete . . . . . . . . . . . . . . . . . . . . . . . 286
8.4.2 Configurations of Turing machines and thenext-step function . . . . . . . . . . . . . . . . . . 287
12
8.5 Lambda calculus and functional programming lan-guages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
8.5.1 Applying functions to functions . . . . . . . . . . 2908.5.2 Obtaining multi-argument functions via Currying 2918.5.3 Formal description of the λ calculus . . . . . . . . 2928.5.4 Infinite loops in the λ calculus . . . . . . . . . . . 295
8.6 The âEnhancedâ λ calculus . . . . . . . . . . . . . . . . 2958.6.1 Computing a function in the enhanced λ calculus 2978.6.2 Enhanced λ calculus is Turing-complete . . . . . 298
8.7 From enhanced to pure λ calculus . . . . . . . . . . . . 3018.7.1 List processing . . . . . . . . . . . . . . . . . . . . 3028.7.2 The Y combinator, or recursion without recursion 303
8.8 The Church-Turing Thesis (discussion) . . . . . . . . . 3068.8.1 Different models of computation . . . . . . . . . . 307
8.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 3088.10 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 312
9 Universality and uncomputability 3159.1 Universality or a meta-circular evaluator . . . . . . . . . 316
9.1.1 Proving the existence of a universal TuringMachine . . . . . . . . . . . . . . . . . . . . . . . . 318
9.1.2 Implications of universality (discussion) . . . . . 3209.2 Is every function computable? . . . . . . . . . . . . . . . 3219.3 The Halting problem . . . . . . . . . . . . . . . . . . . . 323
9.3.1 Is the Halting problem really hard? (discussion) 3269.3.2 A direct proof of the uncomputability of HALT
(optional) . . . . . . . . . . . . . . . . . . . . . . . 3279.4 Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . 329
9.4.1 Example: Halting on the zero problem . . . . . . 3309.5 Riceâs Theorem and the impossibility of general soft-
ware verification . . . . . . . . . . . . . . . . . . . . . . . 3339.5.1 Riceâs Theorem . . . . . . . . . . . . . . . . . . . . 3359.5.2 Halting and Riceâs Theorem for other Turing-
complete models . . . . . . . . . . . . . . . . . . . 3399.5.3 Is software verification doomed? (discussion) . . 340
9.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 3429.7 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 345
10 Restricted computational models 34710.1 Turing completeness as a bug . . . . . . . . . . . . . . . 34710.2 Context free grammars . . . . . . . . . . . . . . . . . . . 349
10.2.1 Context-free grammars as a computational model 35110.2.2 The power of context free grammars . . . . . . . 35310.2.3 Limitations of context-free grammars (optional) 355
13
10.3 Semantic properties of context free languages . . . . . . 35710.3.1 Uncomputability of context-free grammar
equivalence (optional) . . . . . . . . . . . . . . . 35710.4 Summary of semantic properties for regular expres-
sions and context-free grammars . . . . . . . . . . . . . 36010.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 36110.6 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 362
11 Is every theorem provable? 36511.1 Hilbertâs Program and Gödelâs Incompleteness Theorem 365
11.1.1 Defining âProof Systemsâ . . . . . . . . . . . . . . 36711.2 Gödelâs Incompleteness Theorem: Computational
variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36811.3 Quantified integer statements . . . . . . . . . . . . . . . 37011.4 Diophantine equations and the MRDP Theorem . . . . 37211.5 Hardness of quantified integer statements . . . . . . . . 374
11.5.1 Step 1: Quantified mixed statements and com-putation histories . . . . . . . . . . . . . . . . . . 375
11.5.2 Step 2: Reducing mixed statements to integerstatements . . . . . . . . . . . . . . . . . . . . . . 378
11.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 38011.7 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 382
III Efficient algorithms 385
12 Efficient computation: An informal introduction 38712.1 Problems on graphs . . . . . . . . . . . . . . . . . . . . . 389
12.1.1 Finding the shortest path in a graph . . . . . . . . 39012.1.2 Finding the longest path in a graph . . . . . . . . 39212.1.3 Finding the minimum cut in a graph . . . . . . . 39212.1.4 Min-Cut Max-Flow and Linear programming . . 39312.1.5 Finding the maximum cut in a graph . . . . . . . 39512.1.6 A note on convexity . . . . . . . . . . . . . . . . . 395
12.2 Beyond graphs . . . . . . . . . . . . . . . . . . . . . . . . 39712.2.1 SAT . . . . . . . . . . . . . . . . . . . . . . . . . . 39712.2.2 Solving linear equations . . . . . . . . . . . . . . . 39812.2.3 Solving quadratic equations . . . . . . . . . . . . 399
12.3 More advanced examples . . . . . . . . . . . . . . . . . 39912.3.1 Determinant of a matrix . . . . . . . . . . . . . . . 39912.3.2 Permanent of a matrix . . . . . . . . . . . . . . . . 40112.3.3 Finding a zero-sum equilibrium . . . . . . . . . . 40112.3.4 Finding a Nash equilibrium . . . . . . . . . . . . 40212.3.5 Primality testing . . . . . . . . . . . . . . . . . . . 40212.3.6 Integer factoring . . . . . . . . . . . . . . . . . . . 403
14
12.4 Our current knowledge . . . . . . . . . . . . . . . . . . . 40312.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 40412.6 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 40412.7 Further explorations . . . . . . . . . . . . . . . . . . . . 405
13 Modeling running time 40713.1 Formally defining running time . . . . . . . . . . . . . . 409
13.1.1 Polynomial and Exponential Time . . . . . . . . . 41013.2 Modeling running time using RAM Machines /
NAND-RAM . . . . . . . . . . . . . . . . . . . . . . . . . 41213.3 Extended Church-Turing Thesis (discussion) . . . . . . 41713.4 Efficient universal machine: a NAND-RAM inter-
preter in NAND-RAM . . . . . . . . . . . . . . . . . . . 41813.4.1 Timed Universal Turing Machine . . . . . . . . . 420
13.5 The time hierarchy theorem . . . . . . . . . . . . . . . . 42113.6 Non uniform computation . . . . . . . . . . . . . . . . . 425
13.6.1 Oblivious NAND-TM programs . . . . . . . . . . 42713.6.2 âUnrolling the loopâ: algorithmic transforma-
tion of Turing Machines to circuits . . . . . . . . . 43013.6.3 Can uniform algorithms simulate non uniform
ones? . . . . . . . . . . . . . . . . . . . . . . . . . 43213.6.4 Uniform vs. Nonuniform computation: A recap . 434
13.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 43513.8 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 438
14 Polynomial-time reductions 43914.1 Formal definitions of problems . . . . . . . . . . . . . . 44014.2 Polynomial-time reductions . . . . . . . . . . . . . . . . 441
14.2.1 Whistling pigs and flying horses . . . . . . . . . . 44214.3 Reducing 3SAT to zero one and quadratic equations . . 444
14.3.1 Quadratic equations . . . . . . . . . . . . . . . . . 44714.4 The independent set problem . . . . . . . . . . . . . . . 44914.5 Some exercises and anatomy of a reduction. . . . . . . . 452
14.5.1 Dominating set . . . . . . . . . . . . . . . . . . . . 45314.5.2 Anatomy of a reduction . . . . . . . . . . . . . . . 456
14.6 Reducing Independent Set to Maximum Cut . . . . . . 45814.7 Reducing 3SAT to Longest Path . . . . . . . . . . . . . . 459
14.7.1 Summary of relations . . . . . . . . . . . . . . . . 46214.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 46214.9 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 463
15 NP, NP completeness, and the Cook-Levin Theorem 46515.1 The class NP . . . . . . . . . . . . . . . . . . . . . . . . . 465
15.1.1 Examples of functions in NP . . . . . . . . . . . . 46815.1.2 Basic facts about NP . . . . . . . . . . . . . . . . . 470
15
15.2 From NP to 3SAT: The Cook-Levin Theorem . . . . . . . 47115.2.1 What does this mean? . . . . . . . . . . . . . . . . 47315.2.2 The Cook-Levin Theorem: Proof outline . . . . . 473
15.3 The NANDSAT Problem, and why it is NP hard . . . . 47415.4 The 3NAND problem . . . . . . . . . . . . . . . . . . . . 47615.5 From 3NAND to 3SAT . . . . . . . . . . . . . . . . . . . 47915.6 Wrapping up . . . . . . . . . . . . . . . . . . . . . . . . . 48015.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 48115.8 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 482
16 What if P equals NP? 48316.1 Search-to-decision reduction . . . . . . . . . . . . . . . . 48416.2 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 487
16.2.1 Example: Supervised learning . . . . . . . . . . . 49016.2.2 Example: Breaking cryptosystems . . . . . . . . . 491
16.3 Finding mathematical proofs . . . . . . . . . . . . . . . 49116.4 Quantifier elimination (advanced) . . . . . . . . . . . . 492
16.4.1 Application: self improving algorithm for 3SAT . 49516.5 Approximating counting problems and posterior
sampling (advanced, optional) . . . . . . . . . . . . . . 49516.6 What does all of this imply? . . . . . . . . . . . . . . . . 49616.7 Can P â NP be neither true nor false? . . . . . . . . . . 49816.8 Is P = NP âin practiceâ? . . . . . . . . . . . . . . . . . . 49916.9 What if P â NP? . . . . . . . . . . . . . . . . . . . . . . . 50016.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 50216.11 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 502
17 Space bounded computation 50317.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 50317.2 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 503
IV Randomized computation 505
18 Probability Theory 101 50718.1 Random coins . . . . . . . . . . . . . . . . . . . . . . . . 507
18.1.1 Random variables . . . . . . . . . . . . . . . . . . 51018.1.2 Distributions over strings . . . . . . . . . . . . . . 51218.1.3 More general sample spaces . . . . . . . . . . . . 512
18.2 Correlations and independence . . . . . . . . . . . . . . 51318.2.1 Independent random variables . . . . . . . . . . . 51418.2.2 Collections of independent random variables . . 516
18.3 Concentration and tail bounds . . . . . . . . . . . . . . . 51618.3.1 Chebyshevâs Inequality . . . . . . . . . . . . . . . 51818.3.2 The Chernoff bound . . . . . . . . . . . . . . . . . 519
16
18.3.3 Application: Supervised learning and empiricalrisk minimization . . . . . . . . . . . . . . . . . . 520
18.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 52218.5 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 525
19 Probabilistic computation 52719.1 Finding approximately good maximum cuts . . . . . . . 528
19.1.1 Amplifying the success of randomized algorithms 52919.1.2 Success amplification . . . . . . . . . . . . . . . . 53019.1.3 Two-sided amplification . . . . . . . . . . . . . . . 53119.1.4 What does this mean? . . . . . . . . . . . . . . . . 53119.1.5 Solving SAT through randomization . . . . . . . 53219.1.6 Bipartite matching . . . . . . . . . . . . . . . . . . 534
19.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 53719.3 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 53719.4 Acknowledgements . . . . . . . . . . . . . . . . . . . . . 537
20 Modeling randomized computation 53920.1 Modeling randomized computation . . . . . . . . . . . 540
20.1.1 An alternative view: random coins as an âextrainputâ . . . . . . . . . . . . . . . . . . . . . . . . . 542
20.1.2 Success amplification of two-sided error algo-rithms . . . . . . . . . . . . . . . . . . . . . . . . . 544
20.2 BPP and NP completeness . . . . . . . . . . . . . . . . . 54520.3 The power of randomization . . . . . . . . . . . . . . . . 547
20.3.1 Solving BPP in exponential time . . . . . . . . . . 54720.3.2 Simulating randomized algorithms by circuits . . 547
20.4 Derandomization . . . . . . . . . . . . . . . . . . . . . . 54920.4.1 Pseudorandom generators . . . . . . . . . . . . . 55020.4.2 From existence to constructivity . . . . . . . . . . 55220.4.3 Usefulness of pseudorandom generators . . . . . 553
20.5 P = NP and BPP vs P . . . . . . . . . . . . . . . . . . . . 55420.6 Non-constructive existence of pseudorandom genera-
tors (advanced, optional) . . . . . . . . . . . . . . . . . 55720.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 56020.8 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 560
V Advanced topics 561
21 Cryptography 56321.1 Classical cryptosystems . . . . . . . . . . . . . . . . . . . 56421.2 Defining encryption . . . . . . . . . . . . . . . . . . . . . 56621.3 Defining security of encryption . . . . . . . . . . . . . . 56721.4 Perfect secrecy . . . . . . . . . . . . . . . . . . . . . . . . 569
17
21.4.1 Example: Perfect secrecy in the battlefield . . . . 57021.4.2 Constructing perfectly secret encryption . . . . . 571
21.5 Necessity of long keys . . . . . . . . . . . . . . . . . . . 57221.6 Computational secrecy . . . . . . . . . . . . . . . . . . . 574
21.6.1 Stream ciphers or the âderandomized one-timepadâ . . . . . . . . . . . . . . . . . . . . . . . . . . 576
21.7 Computational secrecy and NP . . . . . . . . . . . . . . 57921.8 Public key cryptography . . . . . . . . . . . . . . . . . . 581
21.8.1 Defining public key encryption . . . . . . . . . . 58321.8.2 Diffie-Hellman key exchange . . . . . . . . . . . . 584
21.9 Other security notions . . . . . . . . . . . . . . . . . . . 58621.10 Magic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
21.10.1 Zero knowledge proofs . . . . . . . . . . . . . . . 58621.10.2 Fully homomorphic encryption . . . . . . . . . . 58721.10.3 Multiparty secure computation . . . . . . . . . . 587
21.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 58921.12 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 589
22 Proofs and algorithms 59122.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 59122.2 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 591
23 Quantum computing 59323.1 The double slit experiment . . . . . . . . . . . . . . . . . 59423.2 Quantum amplitudes . . . . . . . . . . . . . . . . . . . . 594
23.2.1 Linear algebra quick review . . . . . . . . . . . . 59723.3 Bellâs Inequality . . . . . . . . . . . . . . . . . . . . . . . 59823.4 Quantum weirdness . . . . . . . . . . . . . . . . . . . . . 60023.5 Quantum computing and computation - an executive
summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 60023.6 Quantum systems . . . . . . . . . . . . . . . . . . . . . . 602
23.6.1 Quantum amplitudes . . . . . . . . . . . . . . . . 60423.6.2 Quantum systems: an executive summary . . . . 605
23.7 Analysis of Bellâs Inequality (optional) . . . . . . . . . . 60523.8 Quantum computation . . . . . . . . . . . . . . . . . . . 608
23.8.1 Quantum circuits . . . . . . . . . . . . . . . . . . 60823.8.2 QNAND-CIRC programs (optional) . . . . . . . 61123.8.3 Uniform computation . . . . . . . . . . . . . . . . 612
23.9 Physically realizing quantum computation . . . . . . . 61323.10 Shorâs Algorithm: Hearing the shape of prime factors . 614
23.10.1 Period finding . . . . . . . . . . . . . . . . . . . . 61423.10.2 Shorâs Algorithm: A birdâs eye view . . . . . . . . 615
23.11 Quantum Fourier Transform (advanced, optional) . . . 617
18
23.11.1 Quantum Fourier Transform over the BooleanCube: Simonâs Algorithm . . . . . . . . . . . . . . 619
23.11.2 From Fourier to Period finding: Simonâs Algo-rithm (advanced, optional) . . . . . . . . . . . . . 620
23.11.3 From Simon to Shor (advanced, optional) . . . . 62123.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 62323.13 Bibliographical notes . . . . . . . . . . . . . . . . . . . . 623
VI Appendices 625
Preface
âWe make ourselves no promises, but we cherish the hope that the unobstructedpursuit of useless knowledge will prove to have consequences in the futureas in the pastâ ⊠âAn institution which sets free successive generations ofhuman souls is amply justified whether or not this graduate or that makes aso-called useful contribution to human knowledge. A poem, a symphony, apainting, a mathematical truth, a new scientific fact, all bear in themselves allthe justification that universities, colleges, and institutes of research need orrequireâ, Abraham Flexner, The Usefulness of Useless Knowledge, 1939.
âI suggest that you take the hardest courses that you can, because you learnthe most when you challenge yourself⊠CS 121 I found pretty hard.â, MarkZuckerberg, 2005.
This is a textbook for an undergraduate introductory course onTheoretical Computer Science. The educational goals of this book areto convey the following:
âą That computation arises in a variety of natural and human-madesystems, and not only in modern silicon-based computers.
âą Similarly, beyond being an extremely important tool, computationalso serves as a useful lens to describe natural, physical, mathemati-cal and even social concepts.
âą The notion of universality of many different computational models,and the related notion of the duality between code and data.
âą The idea that one can precisely define a mathematical model ofcomputation, and then use that to prove (or sometimes only conjec-ture) lower bounds and impossibility results.
âą Some of the surprising results and discoveries in modern theoreti-cal computer science, including the prevalence of NP-completeness,the power of interaction, the power of randomness on one handand the possibility of derandomization on the other, the abilityto use hardness âfor goodâ in cryptography, and the fascinatingpossibility of quantum computing.
Compiled on 8.26.2020 18:10
20
I hope that following this course, students would be able to rec-ognize computation, with both its power and pitfalls, as it arises invarious settings, including seemingly âstaticâ content or ârestrictedâformalisms such as macros and scripts. They should be able to followthrough the logic of proofs about computation, including the cen-tral concept of a reduction, as well as understanding âself-referentialâproofs (such as diagonalization-based proofs that involve programsgiven their own code as input). Students should understand thatsome problems are inherently intractable, and be able to recognize thepotential for intractability when they are faced with a new problem.While this book only touches on cryptography, students should un-derstand the basic idea of how we can use computational hardness forcryptographic purposes. However, more than any specific skill, thisbook aims to introduce students to a new way of thinking of computa-tion as an object in its own right and to illustrate how this new way ofthinking leads to far-reaching insights and applications.
My aim in writing this text is to try to convey these concepts in thesimplest possible way and try to make sure that the formal notationand model help elucidate, rather than obscure, the main ideas. I alsotried to take advantage of modern studentsâ familiarity (or at leastinterest!) in programming, and hence use (highly simplified) pro-gramming languages to describe our models of computation. Thatsaid, this book does not assume fluency with any particular program-ming language, but rather only some familiarity with the generalnotion of programming. We will use programming metaphors andidioms, occasionally mentioning specific programming languagessuch as Python, C, or Lisp, but students should be able to follow thesedescriptions even if they are not familiar with these languages.
Proofs in this book, including the existence of a universal TuringMachine, the fact that every finite function can be computed by somecircuit, the Cook-Levin theorem, and many others, are often con-structive and algorithmic, in the sense that they ultimately involvetransforming one program to another. While it is possible to followthese proofs without seeing the code, I do think that having accessto the code, and the ability to play around with it and see how it actson various programs, can make these theorems more concrete for thestudents. To that end, an accompanying website (which is still workin progress) allows executing programs in the various computationalmodels we define, as well as see constructive proofs of some of thetheorems.
0.1 TO THE STUDENT
This book can be challenging, mainly because it brings together avariety of ideas and techniques in the study of computation. There
21
are quite a few technical hurdles to master, whether it is followingthe diagonalization argument for proving the Halting Problem isundecidable, combinatorial gadgets in NP-completeness reductions,analyzing probabilistic algorithms, or arguing about the adversary toprove the security of cryptographic primitives.
The best way to engage with this material is to read these notes ac-tively, so make sure you have a pen ready. While reading, I encourageyou to stop and think about the following:
âą When I state a theorem, stop and take a shot at proving it on yourown before reading the proof. You will be amazed by how muchbetter you can understand a proof even after only 5 minutes ofattempting it on your own.
âą When reading a definition, make sure that you understand whatthe definition means, and what the natural examples are of objectsthat satisfy it and objects that do not. Try to think of the motivationbehind the definition, and whether there are other natural ways toformalize the same concept.
âą Actively notice which questions arise in your mind as you read thetext, and whether or not they are answered in the text.
As a general rule, it is more important that you understand thedefinitions than the theorems, and it is more important that youunderstand a theorem statement than its proof. After all, before youcan prove a theorem, you need to understand what it states, and tounderstand what a theorem is about, you need to know the definitionsof the objects involved. Whenever a proof of a theorem is at leastsomewhat complicated, I provide a âproof idea.â Feel free to skip theactual proof in a first reading, focusing only on the proof idea.
This book contains some code snippets, but this is by no meansa programming text. You donât need to know how to program tofollow this material. The reason we use code is that it is a precise wayto describe computation. Particular implementation details are notas important to us, and so we will emphasize code readability at theexpense of considerations such as error handling, encapsulation, etc.that can be extremely important for real-world programming.
0.1.1 Is the effort worth it?This is not an easy book, and you might reasonably wonder whyshould you spend the effort in learning this material. A traditionaljustification for a âTheory of Computationâ course is that you mightencounter these concepts later on in your career. Perhaps you willcome across a hard problem and realize it is NP complete, or find aneed to use what you learned about regular expressions. This might
22
1 An earlier book that starts with circuits as the initialmodel is John Savageâs [Sav98].
very well be true, but the main benefit of this book is not in teachingyou any practical tool or technique, but instead in giving you a differ-ent way of thinking: an ability to recognize computational phenomenaeven when they occur in non-obvious settings, a way to model compu-tational tasks and questions, and to reason about them.
Regardless of any use you will derive from this book, I believelearning this material is important because it contains concepts thatare both beautiful and fundamental. The role that energy and matterplayed in the 20th century is played in the 21st by computation andinformation, not just as tools for our technology and economy, but alsoas the basic building blocks we use to understand the world. Thisbook will give you a taste of some of the theory behind those, andhopefully spark your curiosity to study more.
0.2 TO POTENTIAL INSTRUCTORS
I wrote this book for my Harvard course, but I hope that other lectur-ers will find it useful as well. To some extent, it is similar in contentto âTheory of Computationâ or âGreat Ideasâ courses such as thosetaught at CMU or MIT.
The most significant difference between our approach and moretraditional ones (such as Hopcroft and Ullmanâs [HU69; HU79] andSipserâs [Sip97]) is that we do not start with finite automata as our ini-tial computational model. Instead, our initial computational modelis Boolean Circuits.1 We believe that Boolean Circuits are more fun-damental to the theory of computing (and even its practice!) thanautomata. In particular, Boolean Circuits are a prerequisite for manyconcepts that one would want to teach in a modern course on Theoret-ical Computer Science, including cryptography, quantum computing,derandomization, attempts at proving P â NP, and more. Even incases where Boolean Circuits are not strictly required, they can of-ten offer significant simplifications (as in the case of the proof of theCook-Levin Theorem).
Furthermore, I believe there are pedagogical reasons to start withBoolean circuits as opposed to finite automata. Boolean circuits are amore natural model of computation, and one that corresponds moreclosely to computing in silicon, making the connection to practicemore immediate to the students. Finite functions are arguably easierto grasp than infinite ones, as we can fully write down their truth ta-ble. The theorem that every finite function can be computed by someBoolean circuit is both simple enough and important enough to serveas an excellent starting point for this course. Moreover, many of themain conceptual points of the theory of computation, including thenotions of the duality between code and data, and the idea of universal-ity, can already be seen in this context.
23
After Boolean circuits, we move on to Turing machines and proveresults such as the existence of a universal Turing machine, the un-computability of the halting problem, and Riceâs Theorem. Automataare discussed after we see Turing machines and undecidability, as anexample for a restricted computational model where problems such asdetermining halting can be effectively solved.
While this is not our motivation, the order we present circuits, Tur-ing machines, and automata roughly corresponds to the chronologicalorder of their discovery. Boolean algebra goes back to Booleâs andDeMorganâs works in the 1840s [Boo47; De 47] (though the defini-tion of Boolean circuits and the connection to physical computationwas given 90 years later by Shannon [Sha38]). Alan Turing definedwhat we now call âTuring Machinesâ in the 1930s [Tur37], while finiteautomata were introduced in the 1943 work of McCulloch and Pitts[MP43] but only really understood in the seminal 1959 work of Rabinand Scott [RS59].
More importantly, while models such as finite-state machines, reg-ular expressions, and context-free grammars are incredibly importantfor practice, the main applications for these models (whether it is forparsing, for analyzing properties such as liveness and safety, or even forsoftware-defined routing tables) rely crucially on the fact that theseare tractable models for which we can effectively answer semantic ques-tions. This practical motivation can be better appreciated after studentssee the undecidability of semantic properties of general computingmodels.
The fact that we start with circuits makes proving the Cook-LevinTheorem much easier. In fact, our proof of this theorem can be (andis) done using a handful of lines of Python. Combining this proofwith the standard reductions (which are also implemented in Python)allows students to appreciate visually how a question about computa-tion can be mapped into a question about (for example) the existenceof an independent set in a graph.
Some other differences between this book and previous texts arethe following:
1. For measuring time complexity, we use the standard RAM machinemodel used (implicitly) in algorithms courses, rather than Tur-ing machines. While these two models are of course polynomiallyequivalent, and hence make no difference for the definitions of theclasses P, NP, and EXP, our choice makes the distinction betweennotions such as đ(đ) or đ(đ2) time more meaningful. This choicealso ensures that these finer-grained time complexity classes corre-spond to the informal definitions of linear and quadratic time that
24
students encounter in their algorithms lectures (or their whiteboardcoding interviewsâŠ).
2. We use the terminology of functions rather than languages. That is,rather than saying that a Turing Machine đ decides a language đż â0, 1â, we say that it computes a function đč ⶠ0, 1â â 0, 1. Theterminology of âlanguagesâ arises from Chomskyâs work [Cho56],but it is often more confusing than illuminating. The languageterminology also makes it cumbersome to discuss concepts suchas algorithms that compute functions with more than one bit ofoutput (including basic tasks such as addition, multiplication,etcâŠ). The fact that we use functions rather than languages meanswe have to be extra vigilant about students distinguishing betweenthe specification of a computational task (e.g., the function) and itsimplementation (e.g., the program). On the other hand, this point isso important that it is worth repeatedly emphasizing and drillinginto the students, regardless of the notation used. The book doesmention the language terminology and reminds of it occasionally,to make it easier for students to consult outside resources.
Reducing the time dedicated to finite automata and context-freelanguages allows instructors to spend more time on topics that a mod-ern course in the theory of computing needs to touch upon. Theseinclude randomness and computation, the interactions between proofsand programs (including Gödelâs incompleteness theorem, interactiveproof systems, and even a bit on the đ-calculus and the Curry-Howardcorrespondence), cryptography, and quantum computing.
This book contains sufficient detail to enable its use for self-study.Toward that end, every chapter starts with a list of learning objectives,ends with a recap, and is peppered with âpause boxesâ which encour-age students to stop and work out an argument or make sure theyunderstand a definition before continuing further.
Section 0.5 contains a âroadmapâ for this book, with descriptionsof the different chapters, as well as the dependency structure betweenthem. This can help in planning a course based on this book.
0.3 ACKNOWLEDGEMENTS
This text is continually evolving, and I am getting input from manypeople, for which I am deeply grateful. Salil Vadhan co-taught withme the first iteration of this course and gave me a tremendous amountof useful feedback and insights during this process. Michele Amorettiand Marika Swanberg carefully read several chapters of this text andgave extremely helpful detailed comments. Dave Evans and RichardXu contributed many pull requests fixing errors and improving phras-
25
ing. Thanks to Anil Ada, Venkat Guruswami, and Ryan OâDonnell forhelpful tips from their experience in teaching CMU 15-251.
Thanks to everyone that sent me comments, typo reports, or postedissues or pull requests on the GitHub repository https://github.com/boazbk/tcs. In particular I would like to acknowledge helpfulfeedback from Scott Aaronson, Michele Amoretti, Aadi Bajpai, Mar-guerite Basta, Anindya Basu, Sam Benkelman, JarosĆaw BĆasiok, EmilyChan, Christy Cheng, Michelle Chiang, Daniel Chiu, Chi-Ning Chou,Michael Colavita, Rodrigo Daboin Sanchez, Robert Darley Waddilove,Anlan Du, Juan Esteller, David Evans, Michael Fine, Simon Fischer,Leor Fishman, Zaymon Foulds-Cook, William Fu, Kent Furuie, PiotrGaluszka, Carolyn Ge, Mark Goldstein, Alexander Golovnev, SayanGoswami, Michael Haak, Rebecca Hao, Joosep Hook, Thomas HUET,Emily Jia, Chan Kang, Nina Katz-Christy, Vidak Kazic, Eddie Kohler,Estefania Lahera, Allison Lee, Benjamin Lee, OndĆej LengĂĄl, RaymondLin, Emma Ling, Alex Lombardi, Lisa Lu, Aditya Mahadevan, Chris-tian May, Jacob Meyerson, Leon Mlodzian, George Moe, Glenn Moss,Hamish Nicholson, Owen Niles, Sandip Nirmel, Sebastian Oberhoff,Thomas Orton, Joshua Pan, Pablo Parrilo, Juan Perdomo, Banks Pick-ett, Aaron Sachs, Abdelrhman Saleh, Brian Sapozhnikov, AnthonyScemama, Peter SchĂ€fer, Josh Seides, Alaisha Sharma, Haneul Shin,Noah Singer, Matthew Smedberg, Miguel Solano, Hikari Sorensen,David Steurer, Alec Sun, Amol Surati, Everett Sussman, Marika Swan-berg, Garrett Tanzer, Eric Thomas, Sarah Turnill, Salil Vadhan, PatrickWatts, Jonah Weissman, Ryan Williams, Licheng Xu, Richard Xu, Wan-qian Yang, Elizabeth Yeoh-Wang, Josh Zelinsky, Fred Zhang, GraceZhang, and Jessica Zhu.
I am using many open source software packages in the productionof these notes for which I am grateful. In particular, I am thankful toDonald Knuth and Leslie Lamport for LaTeX and to John MacFarlanefor Pandoc. David Steurer wrote the original scripts to produce thistext. The current version uses Sergio Correiaâs panflute. The templatesfor the LaTeX and HTML versions are derived from Tufte LaTeX,Gitbook and Bookdown. Thanks to Amy Hendrickson for some LaTeXconsulting. Juan Esteller and Gabe Montague initially implementedthe NAND* programming languages in OCaml and Javascript. I usedthe Jupyter project to write the supplemental code snippets.
Finally, I would like to thank my family: my wife Ravit, and mychildren Alma and Goren. Working on this book (and the correspond-ing course) took so much of my time that Alma wrote an essay for herfifth-grade class saying that âuniversities should not pressure profes-sors to work too much.â Iâm afraid all I have to show for this effort is600 pages of ultra-boring mathematical text.
PRELIMINARIES
1 This quote is typically read as disparaging theimportance of actual physical computers in ComputerScience, but note that telescopes are absolutelyessential to astronomy as they provide us with themeans to connect theoretical predictions with actualexperimental observations.2 To be fair, in the following sentence Graham saysâyou need to know how to calculate time and spacecomplexity and about Turing completenessâ. Thisbook includes these topics, as well as others such asNP-hardness, randomization, cryptography, quantumcomputing, and more.
0Introduction
âComputer Science is no more about computers than astronomy is abouttelescopesâ, attributed to Edsger Dijkstra.1
âHackers need to understand the theory of computation about as much aspainters need to understand paint chemistry.â, Paul Graham 2003.2
âThe subject of my talk is perhaps most directly indicated by simply askingtwo questions: first, is it harder to multiply than to add? and second, why?âŠI(would like to) show that there is no algorithm for multiplication computation-ally as simple as that for addition, and this proves something of a stumblingblock.â, Alan Cobham, 1964
The origin of much of science and medicine can be traced back tothe ancient Babylonians. However, the Babyloniansâ most significantcontribution to humanity was arguably the invention of the place-valuenumber system. The place-value system represents any number usinga collection of digits, whereby the position of the digit is used to de-termine its value, as opposed to a system such as Roman numerals,where every symbol has a fixed numerical value regardless of posi-tion. For example, the average distance to the moon is approximately238,900 of our miles or 259,956 Roman miles. The latter quantity, ex-pressed in standard Roman numerals isMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMDCCCCLVI
Writing the distance to the sun in Roman numerals would requireabout 100,000 symbols: a 50-page book just containing this singlenumber!
Compiled on 8.26.2020 18:10
Learning Objectives:âą Introduce and motivate the study of
computation for its own sake, irrespective ofparticular implementations.
âą The notion of an algorithm and some of itshistory.
âą Algorithms as not just tools, but also ways ofthinking and understanding.
âą Taste of Big-đ analysis and the surprisingcreativity in the design of efficientalgorithms.
30 introduction to theoretical computer science
For someone who thinks of numbers in an additive system likeRoman numerals, quantities like the distance to the moon or sun arenot merely largeâthey are unspeakable: cannot be expressed or evengrasped. Itâs no wonder that Eratosthenes, who was the first personto calculate the earthâs diameter (up to about ten percent error) andHipparchus who was the first to calculate the distance to the moon,did not use a Roman-numeral type system but rather the Babyloniansexagesimal (i.e., base 60) place-value system.
0.1 INTEGER MULTIPLICATION: AN EXAMPLE OF AN ALGORITHM
In the language of Computer Science, the place-value system for rep-resenting numbers is known as a data structure: a set of instructions,or ârecipeâ, for representing objects as symbols. An algorithm is a setof instructions, or ârecipeâ, for performing operations on such rep-resentations. Data structures and algorithms have enabled amazingapplications that have transformed human society, but their impor-tance goes beyond their practical utility. Structures from computerscience, such as bits, strings, graphs, and even the notion of a programitself, as well as concepts such as universality and replication, have notjust found (many) practical uses but contributed a new language anda new way to view the world.
In addition to coming up with the place-value system, the Babylo-nians also invented the âstandard algorithmsâ that we were all taughtin elementary school for adding and multiplying numbers. These al-gorithms have been essential throughout the ages for people usingabaci, papyrus, or pencil and paper, but in our computer age, do theystill serve any purpose beyond torturing third graders? To see whythese algorithms are still very much relevant, let us compare the Baby-lonian digit-by-digit multiplication algorithm (âgrade-school multi-plicationâ) with the naive algorithm that multiplies numbers throughrepeated addition. We start by formally describing both algorithms,see Algorithm 0.1 and Algorithm 0.2.
Algorithm 0.1 â Multiplication via repeated addition.
Input: Non-negative integers đ„, đŠOutput: Product đ„ â đŠ1: Let đđđ đąđđĄ â 0.2: for đ = 1, ⊠, đŠ do3: đđđ đąđđĄ â đđđ đąđđĄ + đ„4: end for5: return đđđ đąđđĄ
introduction 31
Algorithm 0.2 â Grade-school multiplication.
Input: Non-negative integers đ„, đŠOutput: Product đ„ â đŠ1: Write đ„ = đ„đâ1đ„đâ2 ⯠đ„0 and đŠ = đŠđâ1đŠđâ2 ⯠đŠ0 in dec-
imal place-value notation. # đ„0 is the ones digit of đ„, đ„1 isthe tens digit, etc.
2: Let đđđ đąđđĄ â 03: for đ = 0, ⊠, đ â 1 do4: for đ = 0, ⊠, đ â 1 do5: đđđ đąđđĄ â đđđ đąđđĄ + 10đ+đ â đ„đ â đŠđ6: end for7: end for8: return đđđ đąđđĄ
Both Algorithm 0.1 and Algorithm 0.2 assume that we alreadyknow how to add numbers, and Algorithm 0.2 also assumes that wecan multiply a number by a power of 10 (which is, after all, a sim-ple shift). Suppose that đ„ and đŠ are two integers of đ = 20 decimaldigits each. (This roughly corresponds to 64 binary digits, which isa common size in many programming languages.) Computing đ„ â đŠusing Algorithm 0.1 entails adding đ„ to itself đŠ times which entails(since đŠ is a 20-digit number) at least 1019 additions. In contrast, thegrade-school algorithm (i.e., Algorithm 0.2) involves đ2 shifts andsingle-digit products, and so at most 2đ2 = 800 single-digit opera-tions. To understand the difference, consider that a grade-schooler canperform a single-digit operation in about 2 seconds, and so would re-quire about 1, 600 seconds (about half an hour) to compute đ„ â đŠ usingAlgorithm 0.2. In contrast, even though it is more than a billion timesfaster than a human, if we used Algorithm 0.1 to compute đ„ â đŠ using amodern PC, it would take us 1020/109 = 1011 seconds (which is morethan three millennia!) to compute the same result.
Computers have not made algorithms obsolete. On the contrary,the vast increase in our ability to measure, store, and communicatedata has led to much higher demand for developing better and moresophisticated algorithms that empower us to make better decisionsbased on these data. We also see that in no small extent the notion ofalgorithm is independent of the actual computing device that executesit. The digit-by-digit multiplication algorithm is vastly better than iter-ated addition, regardless whether the technology we use to implementit is a silicon-based chip, or a third grader with pen and paper.
Theoretical computer science is concerned with the inherent proper-ties of algorithms and computation; namely, those properties that areindependent of current technology. We ask some questions that were
32 introduction to theoretical computer science
already pondered by the Babylonians, such as âwhat is the best way tomultiply two numbers?â, but also questions that rely on cutting-edgescience such as âcould we use the effects of quantum entanglement tofactor numbers faster?â.
RRemark 0.3 â Specification, implementation and analysisof algorithms.. A full description of an algorithm hasthree components:
âą Specification: What is the task that the algorithmperforms (e.g., multiplication in the case of Algo-rithm 0.1 and Algorithm 0.2.)
âą Implementation: How is the task accomplished:what is the sequence of instructions to be per-formed. Even though Algorithm 0.1 and Algo-rithm 0.2 perform the same computational task(i.e., they have the same specification), they do it indifferent ways (i.e., they have different implementa-tions).
âą Analysis: Why does this sequence of instructionsachieve the desired task. A full description of Algo-rithm 0.1 and Algorithm 0.2 will include a proof foreach one of these algorithms that on input đ„, đŠ, thealgorithm does indeed output đ„ â đŠ.
Often as part of the analysis we show that the algo-rithm is not only correct but also efficient. That is, wewant to show that not only will the algorithm computethe desired task, but will do so in prescribed numberof operations. For example Algorithm 0.2 computesthe multiplication function on inputs of đ digits usingđ(đ2) operations, while Algorithm 0.4 (describedbelow) computes the same function using đ(đ1.6)operations. (We define the đ notations used here inSection 1.4.8.)
0.2 EXTENDED EXAMPLE: A FASTER WAY TO MULTIPLY (OP-TIONAL)
Once you think of the standard digit-by-digit multiplication algo-rithm, it seems like the âobviously bestâ â way to multiply numbers.In 1960, the famous mathematician Andrey Kolmogorov organizeda seminar at Moscow State University in which he conjectured thatevery algorithm for multiplying two đ digit numbers would requirea number of basic operations that is proportional to đ2 (Ω(đ2) opera-tions, using đ-notation as defined in Chapter 1). In other words, Kol-mogorov conjectured that in any multiplication algorithm, doublingthe number of digits would quadruple the number of basic operationsrequired. A young student named Anatoly Karatsuba was in the au-
introduction 33
Figure 1: The grade-school multiplication algorithmillustrated for multiplying đ„ = 10đ„ + đ„ and đŠ =10đŠ + đŠ. It uses the formula (10đ„ + đ„) Ă (10đŠ + đŠ) =100đ„đŠ + 10(đ„đŠ + đ„đŠ) + đ„đŠ.
3 If đ„ is a number then âđ„â is the integer obtained byrounding it down, see Section 1.7.
dience, and within a week he disproved Kolmogorovâs conjecture bydiscovering an algorithm that requires only about đ¶đ1.6 operationsfor some constant đ¶. Such a number becomes much smaller than đ2as đ grows and so for large đ Karatsubaâs algorithm is superior to thegrade-school one. (For example, Pythonâs implementation switchesfrom the grade-school algorithm to Karatsubaâs algorithm for num-bers that are 1000 bits or larger.) While the difference between anđ(đ1.6) and an đ(đ2) algorithm can be sometimes crucial in practice(see Section 0.3 below), in this book we will mostly ignore such dis-tinctions. However, we describe Karatsubaâs algorithm below since itis a good example of how algorithms can often be surprising, as wellas a demonstration of the analysis of algorithms, which is central to thisbook and to theoretical computer science at large.
Karatsubaâs algorithm is based on a faster way to multiply two-digitnumbers. Suppose that đ„, đŠ â [100] = 0, ⊠, 99 are a pair of two-digit numbers. Letâs write đ„ for the âtensâ digit of đ„, and đ„ for theâonesâ digit, so that đ„ = 10đ„ + đ„, and write similarly đŠ = 10đŠ + đŠ forđ„, đ„, đŠ, đŠ â [10]. The grade-school algorithm for multiplying đ„ and đŠ isillustrated in Fig. 1.
The grade-school algorithm can be thought of as transforming thetask of multiplying a pair of two-digit numbers into four single-digitmultiplications via the formula(10đ„ + đ„) Ă (10đŠ + đŠ) = 100đ„đŠ + 10(đ„đŠ + đ„đŠ) + đ„đŠ (1)
Generally, in the grade-school algorithm doubling the number ofdigits in the input results in quadrupling the number of operations,leading to an đ(đ2) times algorithm. In contrast, Karatsubaâs algo-rithm is based on the observation that we can express Eq. (1) alsoas
(10đ„+đ„)Ă(10đŠ+đŠ) = (100â10)đ„đŠ+10 [(đ„ + đ„)(đŠ + đŠ)]â(10â1)đ„đŠ (2)
which reduces multiplying the two-digit number đ„ and đŠ to com-puting the following three simpler products: đ„đŠ, đ„đŠ and (đ„ + đ„)(đŠ + đŠ).By repeating the same strategy recursively, we can reduce the task ofmultiplying two đ-digit numbers to the task of multiplying three pairsof âđ/2â + 1 digit numbers.3 Since every time we double the number ofdigits we triple the number of operations, we will be able to multiplynumbers of đ = 2â digits using about 3â = đlog2 3 ⌠đ1.585 operations.
The above is the intuitive idea behind Karatsubaâs algorithm, but isnot enough to fully specify it. A complete description of an algorithmentails a precise specification of its operations together with its analysis:proof that the algorithm does in fact do what itâs supposed to do. The
34 introduction to theoretical computer science
Figure 2: Karatsubaâs multiplication algorithm illus-trated for multiplying đ„ = 10đ„ + đ„ and đŠ = 10đŠ + đŠ.We compute the three orange, green and purple prod-ucts đ„đŠ, đ„đŠ and (đ„ + đ„)(đŠ + đŠ) and then add andsubtract them to obtain the result.
Figure 3: Running time of Karatsubaâs algorithmvs. the grade-school algorithm. (Python implementa-tion available online.) Note the existence of a âcutoffâlength, where for sufficiently large inputs Karat-suba becomes more efficient than the grade-schoolalgorithm. The precise cutoff location varies by imple-mentation and platform details, but will always occureventually.
operations of Karatsubaâs algorithm are detailed in Algorithm 0.4,while the analysis is given in Lemma 0.5 and Lemma 0.6.
Algorithm 0.4 â Karatsuba multiplication.
Input: nonnegative integers đ„, đŠ each of at most đ digitsOutput: đ„ â đŠ1: procedure Karatsuba(đ„,đŠ)2: if đ †4 then return đ„ â đŠ ;3: Let đ = âđ/2â4: Write đ„ = 10đđ„ + đ„ and đŠ = 10đđŠ + đŠ5: đŽ â Karatsuba(đ„, đŠ)6: đ” â Karatsuba(đ„ + đ„, đŠ + đŠ)7: đ¶ â Karatsuba(đ„, đŠ)8: return (10đ â 10đ) â đŽ + 10đ â đ” + (1 â 10đ) â đ¶9: end procedure
Algorithm 0.4 is only half of the full description of Karatsubaâsalgorithm. The other half is the analysis, which entails proving that (1)Algorithm 0.4 indeed computes the multiplication operation and (2)it does so using đ(đlog2 3) operations. We now turn to showing bothfacts:Lemma 0.5 For every nonnegative integers đ„, đŠ, when given input đ„, đŠAlgorithm 0.4 will output đ„ â đŠ.Proof. Let đ be the maximum number of digits of đ„ and đŠ. We provethe lemma by induction on đ. The base case is đ †4 where the algo-rithm returns đ„ â đŠ by definition. (It does not matter which algorithmwe use to multiply four-digit numbers - we can even use repeatedaddition.) Otherwise, if đ > 4, we define đ = âđ/2â, and writeđ„ = 10đđ„ + đ„ and đŠ = 10đđŠ + đŠ.
Plugging this into đ„ â đŠ, we getđ„ â đŠ = 102đđ„đŠ + 10đ(đ„đŠ + đ„đŠ) + đ„đŠ . (3)
Rearranging the terms we see thatđ„ â đŠ = 102đđ„đŠ + 10đ [(đ„ + đ„)(đŠ + đŠ) â đ„đŠ â đ„đŠ] + đ„đŠ . (4)
since the numbers đ„,đ„, đŠ,đŠ,đ„ + đ„,đŠ + đŠ all have at most đ + 1 < đ digits,the induction hypothesis implies that the values đŽ, đ”, đ¶ computedby the recursive calls will satisfy đŽ = đ„đŠ, đ” = (đ„ + đ„)(đŠ + đŠ) andđ¶ = đ„đŠ. Plugging this into (4) we see that đ„ â đŠ equals the value(102đ â 10đ) â đŽ + 10đ â đ” + (1 â 10đ) â đ¶ computed by Algorithm 0.4.
introduction 35
Lemma 0.6 If đ„, đŠ are integers of at most đ digits, Algorithm 0.4 willtake đ(đlog2 3) operations on input đ„, đŠ.Proof. Fig. 2 illustrates the idea behind the proof, which we onlysketch here, leaving filling out the details as Exercise 0.4. The proofis again by induction. We define đ (đ) to be the maximum number ofsteps that Algorithm 0.4 takes on inputs of length at most đ. Since inthe base case đ †2, Exercise 0.4 performs a constant number of com-putation, we know that đ (2) †đ for some constant đ and for đ > 2, itsatisfies the recursive equationđ (đ) †3đ (âđ/2â + 1) + đâČđ (5)
for some constant đâČ (using the fact that addition can be done in đ(đ)operations).
The recursive equation (5) solves to đ(đlog2 3). The intuition be-hind this is presented in Fig. 2, and this is also a consequence of theso called âMaster Theoremâ on recurrence relations. As mentionedabove, we leave completing the proof to the reader as Exercise 0.4.
Figure 4: Karatsubaâs algorithm reduces an đ-bitmultiplication to three đ/2-bit multiplications,which in turn are reduced to nine đ/4-bit multi-plications and so on. We can represent the compu-tational cost of all these multiplications in a 3-arytree of depth log2 đ, where at the root the extra costis đđ operations, at the first level the extra cost isđ(đ/2) operations, and at each of the 3đ nodes oflevel đ, the extra cost is đ(đ/2đ). The total cost isđđ âlog2 đđ=0 (3/2)đ †10đđlog2 3 by the formula forsumming a geometric series.
Karatsubaâs algorithm is by no means the end of the line for multi-plication algorithms. In the 1960âs, Toom and Cook extended Karat-subaâs ideas to get an đ(đlogđ(2đâ1)) time multiplication algorithm forevery constant đ. In 1971, Schönhage and Strassen got even better al-gorithms using the Fast Fourier Transform; their idea was to somehowtreat integers as âsignalsâ and do the multiplication more efficientlyby moving to the Fourier domain. (The Fourier transform is a centraltool in mathematics and engineering, used in a great many applica-tions; if you have not seen it yet, you are likely encounter it at somepoint in your studies.) In the years that followed researchers kept im-proving the algorithm, and only very recently Harvey and Van Der
36 introduction to theoretical computer science
Hoeven managed to obtain an đ(đ logđ) time algorithm for multipli-cation (though it only starts beating the Schönhage-Strassen algorithmfor truly astronomical numbers). Yet, despite all this progress, westill donât know whether or not there is an đ(đ) time algorithm formultiplying two đ digit numbers!
RRemark 0.7 â Matrix Multiplication (advanced note).(This book contains many âadvancedâ or âoptionalânotes and sections. These may assume backgroundthat not every student has, and can be safely skippedover as none of the future parts depends on them.)Ideas similar to Karatsubaâs can be used to speed upmatrix multiplications as well. Matrices are a powerfulway to represent linear equations and operations,widely used in a great many applications of scientificcomputing, graphics, machine learning, and manymany more.One of the basic operations one can do withtwo matrices is to multiply them. For example,
if đ„ = (đ„0,0 đ„0,1đ„1,0 đ„1,1) and đŠ = (đŠ0,0 đŠ0,1đŠ1,0 đŠ1,1)then the product of đ„ and đŠ is the matrix(đ„0,0đŠ0,0 + đ„0,1đŠ1,0 đ„0,0đŠ0,1 + đ„0,1đŠ1,1đ„1,0đŠ0,0 + đ„1,1đŠ1,0 đ„1,0đŠ0,1 + đ„1,1đŠ1,1). You can
see that we can compute this matrix by eight productsof numbers.Now suppose that đ is even and đ„ and đŠ are a pair ofđ Ă đ matrices which we can think of as each com-posed of four (đ/2) Ă (đ/2) blocks đ„0,0, đ„0,1, đ„1,0, đ„1,1and đŠ0,0, đŠ0,1, đŠ1,0, đŠ1,1. Then the formula for the matrixproduct of đ„ and đŠ can be expressed in the same wayas above, just replacing products đ„đ,đđŠđ,đ with matrixproducts, and addition with matrix addition. Thismeans that we can use the formula above to give analgorithm that doubles the dimension of the matricesat the expense of increasing the number of operationby a factor of 8, which for đ = 2â results in 8â = đ3operations.In 1969 Volker Strassen noted that we can computethe product of a pair of two-by-two matrices usingonly seven products of numbers by observing thateach entry of the matrix đ„đŠ can be computed byadding and subtracting the following seven terms:đĄ1 = (đ„0,0 + đ„1,1)(đŠ0,0 + đŠ1,1), đĄ2 = (đ„0,0 + đ„1,1)đŠ0,0,đĄ3 = đ„0,0(đŠ0,1 â đŠ1,1), đĄ4 = đ„1,1(đŠ0,1 â đŠ0,0),đĄ5 = (đ„0,0 + đ„0,1)đŠ1,1, đĄ6 = (đ„1,0 â đ„0,0)(đŠ0,0 + đŠ0,1),đĄ7 = (đ„0,1 â đ„1,1)(đŠ1,0 + đŠ1,1). Indeed, one can verify
that đ„đŠ = (đĄ1 + đĄ4 â đĄ5 + đĄ7 đĄ3 + đĄ5đĄ2 + đĄ4 đĄ1 + đĄ3 â đĄ2 + đĄ6).
introduction 37
Using this observation, we can obtain an algorithmsuch that doubling the dimension of the matricesresults in increasing the number of operations by afactor of 7, which means that for đ = 2â the cost is7â = đlog2 7 ⌠đ2.807. A long sequence of work hassince improved this algorithm, and the current recordhas running time about đ(đ2.373). However, unlike thecase of integer multiplication, at the moment we donâtknow of any algorithm for matrix multiplication thatruns in time linear or even close to linear in the sizeof the input matrices (e.g., an đ(đ2đđđđŠđđđ(đ)) timealgorithm). People have tried to use group represen-tations, which can be thought of as generalizations ofthe Fourier transform, to obtain faster algorithms, butthis effort has not yet succeeded.
0.3 ALGORITHMS BEYOND ARITHMETIC
The quest for better algorithms is by no means restricted to arithmetictasks such as adding, multiplying or solving equations. Many graphalgorithms, including algorithms for finding paths, matchings, span-ning trees, cuts, and flows, have been discovered in the last severaldecades, and this is still an intensive area of research. (For example,the last few years saw many advances in algorithms for the maximumflow problem, borne out of unexpected connections with electrical cir-cuits and linear equation solvers.) These algorithms are being usednot just for the ânaturalâ applications of routing network traffic orGPS-based navigation, but also for applications as varied as drug dis-covery through searching for structures in gene-interaction graphs tocomputing risks from correlations in financial investments.
Google was founded based on the PageRank algorithm, which isan efficient algorithm to approximate the âprincipal eigenvectorâ of(a dampened version of) the adjacency matrix of the web graph. TheAkamai company was founded based on a new data structure, knownas consistent hashing, for a hash table where buckets are stored at dif-ferent servers. The backpropagation algorithm, which computes partialderivatives of a neural network in đ(đ) instead of đ(đ2) time, under-lies many of the recent phenomenal successes of learning deep neuralnetworks. Algorithms for solving linear equations under sparsityconstraints, a concept known as compressed sensing, have been usedto drastically reduce the amount and quality of data needed to ana-lyze MRI images. This made a critical difference for MRI imaging ofcancer tumors in children, where previously doctors needed to useanesthesia to suspend breath during the MRI exam, sometimes withdire consequences.
38 introduction to theoretical computer science
Even for classical questions, studied through the ages, new dis-coveries are still being made. For example, for the question of de-termining whether a given integer is prime or composite, which hasbeen studied since the days of Pythagoras, efficient probabilistic algo-rithms were only discovered in the 1970s, while the first deterministicpolynomial-time algorithm was only found in 2002. For the relatedproblem of actually finding the factors of a composite number, newalgorithms were found in the 1980s, and (as weâll see later in thiscourse) discoveries in the 1990s raised the tantalizing prospect ofobtaining faster algorithms through the use of quantum mechanicaleffects.
Despite all this progress, there are still many more questions thananswers in the world of algorithms. For almost all natural prob-lems, we do not know whether the current algorithm is the âbestâ,or whether a significantly better one is still waiting to be discovered.As alluded in Cobhamâs opening quote for this chapter, even for thebasic problem of multiplying numbers we have not yet answered thequestion of whether there is a multiplication algorithm that is as ef-ficient as our algorithms for addition. But at least we now know theright way to ask it.
0.4 ON THE IMPORTANCE OF NEGATIVE RESULTS
Finding better algorithms for problems such as multiplication, solv-ing equations, graph problems, or fitting neural networks to data, isundoubtedly a worthwhile endeavor. But why is it important to provethat such algorithms donât exist? One motivation is pure intellectualcuriosity. Another reason to study impossibility results is that theycorrespond to the fundamental limits of our world. In other words,impossibility results are laws of nature.
Here are some examples of impossibility results outside computerscience (see Section 0.7 for more about these). In physics, the impos-sibility of building a perpetual motion machine corresponds to the lawof conservation of energy. The impossibility of building a heat enginebeating Carnotâs bound corresponds to the second law of thermo-dynamics, while the impossibility of faster-than-light informationtransmission is a cornerstone of special relativity. In mathematics,while we all learned the formula for solving quadratic equations inhigh school, the impossibility of generalizing this formula to equationsof degree five or more gave birth to group theory. The impossibilityof proving Euclidâs fifth axiom from the first four gave rise to non-Euclidean geometries, which ended up crucial for the theory of generalrelativity.
In an analogous way, impossibility results for computation corre-spond to âcomputational laws of natureâ that tell us about the fun-
introduction 39
damental limits of any information processing apparatus, whetherbased on silicon, neurons, or quantum particles. Moreover, computerscientists found creative approaches to apply computational limitationsto achieve certain useful tasks. For example, much of modern Internettraffic is encrypted using the RSA encryption scheme, which relies onits security on the (conjectured) impossibility of efficiently factoringlarge integers. More recently, the Bitcoin system uses a digital ana-log of the âgold standardâ where, instead of using a precious metal,new currency is obtained by âminingâ solutions for computationallydifficult problems.
Chapter Recap
âą The history of algorithms goes back thousandsof years; they have been essential much of hu-man progress and these days form the basis ofmulti-billion dollar industries, as well as life-savingtechnologies.
âą There is often more than one algorithm to achievethe same computational task. Finding a faster al-gorithm can often make a much bigger differencethan improving computing hardware.
âą Better algorithms and data structures donât justspeed up calculations, but can yield new qualitativeinsights.
âą One question we will study is to find out what isthe most efficient algorithm for a given problem.
âą To show that an algorithm is the most efficient onefor a given problem, we need to be able to provethat it is impossible to solve the problem using asmaller amount of computational resources.
0.5 ROADMAP TO THE REST OF THIS BOOK
Often, when we try to solve a computational problem, whether it issolving a system of linear equations, finding the top eigenvector of amatrix, or trying to rank Internet search results, it is enough to use theâI know it when I see itâ standard for describing algorithms. As longas we find some way to solve the problem, we are happy and mightnot care much on the exact mathematical model for our algorithm.But when we want to answer a question such as âdoes there exist analgorithm to solve the problem đ ?â we need to be much more precise.
In particular, we will need to (1) define exactly what it means tosolve đ , and (2) define exactly what an algorithm is. Even (1) cansometimes be non-trivial but (2) is particularly challenging; it is notat all clear how (and even whether) we can encompass all potentialways to design algorithms. We will consider several simple models of
40 introduction to theoretical computer science
computation, and argue that, despite their simplicity, they do captureall âreasonableâ approaches to achieve computing, including all thosethat are currently used in modern computing devices.
Once we have these formal models of computation, we can tryto obtain impossibility results for computational tasks, showing thatsome problems can not be solved (or perhaps can not be solved withinthe resources of our universe). Archimedes once said that given afulcrum and a long enough lever, he could move the world. We willsee how reductions allow us to leverage one hardness result into aslew of a great many others, illuminating the boundaries betweenthe computable and uncomputable (or tractable and intractable)problems.
Later in this book we will go back to examining our models ofcomputation, and see how resources such as randomness or quantumentanglement could potentially change the power of our model. Inthe context of probabilistic algorithms, we will see a glimpse of howrandomness has become an indispensable tool for understandingcomputation, information, and communication. We will also see howcomputational difficulty can be an asset rather than a hindrance, andbe used for the âderandomizationâ of probabilistic algorithms. Thesame ideas also show up in cryptography, which has undergone notjust a technological but also an intellectual revolution in the last fewdecades, much of it building on the foundations that we explore inthis course.
Theoretical Computer Science is a vast topic, branching out andtouching upon many scientific and engineering disciplines. This bookprovides a very partial (and biased) sample of this area. More thananything, I hope I will manage to âinfectâ you with at least some ofmy love for this field, which is inspired and enriched by the connec-tion to practice, but is also deep and beautiful regardless of applica-tions.
0.5.1 Dependencies between chaptersThis book is divided into the following parts, see Fig. 5.
âą Preliminaries: Introduction, mathematical background, and repre-senting objects as strings.
âą Part I: Finite computation (Boolean circuits): Equivalence of cir-cuits and straight-line programs. Universal gate sets. Existence of acircuit for every function, representing circuits as strings, universalcircuit, lower bound on circuit size using the counting argument.
âą Part II: Uniform computation (Turing machines): Equivalence ofTuring machines and programs with loops. Equivalence of models
introduction 41
(including RAM machines, đ calculus, and cellular automata), con-figurations of Turing machines, existence of a universal Turing ma-chine, uncomputable functions (including the Halting problem andRiceâs Theorem), Gödelâs incompleteness theorem, restricted com-putational models models (regular and context free languages).
âą Part III: Efficient computation: Definition of running time, timehierarchy theorem, P and NP, P/poly, NP completeness and theCook-Levin Theorem, space bounded computation.
âą Part IV: Randomized computation: Probability, randomized algo-rithms, BPP, amplification, BPP â P/đđđđŠ, pseudorandom genera-tors and derandomization.
âą Part V: Advanced topics: Cryptography, proofs and algorithms(interactive and zero knowledge proofs, Curry-Howard correspon-dence), quantum computing.
Figure 5: The dependency structure of the differentparts. Part I introduces the model of Boolean cir-cuits to study finite functions with an emphasis onquantitative questions (how many gates to computea function). Part II introduces the model of Turingmachines to study functions that have unbounded inputlengths with an emphasis on qualitative questions (isthis function computable or not). Much of Part II doesnot depend on Part I, as Turing machines can be usedas the first computational model. Part III dependson both parts as it introduces a quantitative study offunctions with unbounded input length. The moreadvanced parts IV (randomized computation) andV (advanced topics) rely on the material of Parts I, IIand III.
The book largely proceeds in linear order, with each chapter build-ing on the previous ones, with the following exceptions:
âą The topics of đ calculus (Section 8.5 and Section 8.5), Gödelâs in-completeness theorem (Chapter 11), Automata/regular expres-sions and context-free grammars (Chapter 10), and space-boundedcomputation (Chapter 17), are not used in the following chapters.Hence you can choose whether to cover or skip any subset of them.
âą Part II (Uniform Computation / Turing Machines) does not havea strong dependency on Part I (Finite computation / Boolean cir-cuits) and it should be possible to teach them in the reverse order
42 introduction to theoretical computer science
with minor modification. Boolean circuits are used Part III (efficientcomputation) for results such as P â P/poly and the Cook-LevinTheorem, as well as in Part IV (for BPP â P/poly and derandom-ization) and Part V (specifically in cryptography and quantumcomputing).
âą All chapters in Part V (Advanced topics) are independent of oneanother and can be covered in any order.
A course based on this book can use all of Parts I, II, and III (possi-bly skipping over some or all of the đ calculus, Chapter 11, Chapter 10or Chapter 17), and then either cover all or some of Part IV (random-ized computation), and add a âsprinklingâ of advanced topics fromPart V based on student or instructor interest.
0.6 EXERCISES
Exercise 0.1 Rank the significance of the following inventions in speed-ing up multiplication of large (that is 100-digit or more) numbers.That is, use âback of the envelopeâ estimates to order them in terms ofthe speedup factor they offered over the previous state of affairs.
a. Discovery of the grade-school digit by digit algorithm (improvingupon repeated addition)
b. Discovery of Karatsubaâs algorithm (improving upon the digit bydigit algorithm)
c. Invention of modern electronic computers (improving upon calcu-lations with pen and paper).
Exercise 0.2 The 1977 Apple II personal computer had a processorspeed of 1.023 Mhz or about 106 operations per seconds. At thetime of this writing the worldâs fastest supercomputer performs 93âpetaflopsâ (1015 floating point operations per second) or about 1018basic steps per second. For each one of the following running times(as a function of the input length đ), compute for both computers howlarge an input they could handle in a week of computation, if they runan algorithm that has this running time:
a. đ operations.
b. đ2 operations.
c. đ logđ operations.
d. 2đ operations.
introduction 43
4 As we will see in Chapter Chapter 21, almost anycompany relying on cryptography needs to assumethe non existence of certain algorithms. In particular,RSA Security was founded based on the securityof the RSA cryptosystem, which presumes the nonexistence of an efficient algorithm to compute theprime factorization of large integers.5 Hint: Use a proof by induction - suppose that this istrue for all đâs from 1 to đ and prove that this is truealso for đ + 1.
e. đ! operations.
Exercise 0.3 â Usefulness of algorithmic non-existence. In this chapter wementioned several companies that were founded based on the discov-ery of new algorithms. Can you give an example for a company thatwas founded based on the non existence of an algorithm? See footnotefor hint.4
Exercise 0.4 â Analysis of Karatsubaâs Algorithm. a. Suppose thatđ1, đ2, đ3, ⊠is a sequence of numbers such that đ2 †10 andfor every đ, đđ †3đâđ/2â+1 + đ¶đ for some đ¶ â„ 1. Prove thatđđ †20đ¶đlog2 3 for every đ > 2.5
b. Prove that the number of single-digit operations that Karatsubaâsalgorithm takes to multiply two đ digit numbers is at most1000đlog2 3.
Exercise 0.5 Implement in the programming language of yourchoice functions Gradeschool_multiply(x,y) and Karat-suba_multiply(x,y) that take two arrays of digits x and y and returnan array representing the product of x and y (where x is identifiedwith the number x[0]+10*x[1]+100*x[2]+... etc..) using thegrade-school algorithm and the Karatsuba algorithm respectively.At what number of digits does the Karatsuba algorithm beat thegrade-school one?
Exercise 0.6 â Matrix Multiplication (optional, advanced). In this exercise, weshow that if for some đ > 2, we can write the product of two đ Ă đreal-valued matrices đŽ, đ” using at most đđ multiplications, then wecan multiply two đ Ă đ matrices in roughly đđ time for every largeenough đ.
To make this precise, we need to make some notation that is unfor-tunately somewhat cumbersome. Assume that there is some đ â âand đ †đđ such that for every đ Ă đ matrices đŽ, đ”, đ¶ such thatđ¶ = AB, we can write for every đ, đ â [đ]:đ¶đ,đ = đâ1ââ=0 đŒâđ,đđâ(đŽ)đâ(đ”) (6)
for some linear functions đ0, ⊠, đđâ1, đ0, ⊠, đđâ1 ⶠâđ2 â â andcoefficients đŒâđ,đđ,đâ[đ],ââ[đ]. Prove that under this assumption forevery đ > 0, if đ is sufficiently large, then there is an algorithm thatcomputes the product of two đ Ă đ matrices using at most đ(đđ+đ)arithmetic operations. See footnote for hint.6
44 introduction to theoretical computer science
0.7 BIBLIOGRAPHICAL NOTES
For a brief overview of what weâll see in this book, you could do farworse than read Bernard Chazelleâs wonderful essay on the Algo-rithm as an Idiom of modern science. The book of Moore and Mertens[MM11] gives a wonderful and comprehensive overview of the theoryof computation, including much of the content discussed in this chap-ter and the rest of this book. Aaronsonâs book [Aar13] is another greatread that touches upon many of the same themes.
For more on the algorithms the Babylonians used, see Knuthâspaper and Neugebauerâs classic book.
Many of the algorithms we mention in this chapter are coveredin algorithms textbooks such as those by Cormen, Leiserson, Rivert,and Stein [Cor+09], Kleinberg and Tardos [KT06], and Dasgupta, Pa-padimitriou and Vazirani [DPV08], as well as Jeff Ericksonâs textbook.Ericksonâs book is freely available online and contains a great exposi-tion of recursive algorithms in general and Karatsubaâs algorithm inparticular.
The story of Karatsubaâs discovery of his multiplication algorithmis recounted by him in [Kar95]. As mentioned above, further improve-ments were made by Toom and Cook [Too63; Coo66], Schönhage andStrassen [SS71], FĂŒrer [FĂŒr07], and recently by Harvey and Van DerHoeven [HV19], see this article for a nice overview. The last paperscrucially rely on the Fast Fourier transform algorithm. The fascinatingstory of the (re)discovery of this algorithm by John Tukey in the con-text of the cold war is recounted in [Coo87]. (We say re-discoverybecause it later turned out that the algorithm dates back to Gauss[HJB85].) The Fast Fourier Transform is covered in some of the booksmentioned below, and there are also online available lectures such asJeff Ericksonâs. See also this popular article by David Austin. Fast ma-trix multiplication was discovered by Strassen [Str69], and since thenthis has been an active area of research. [BlĂ€13] is a recommendedself-contained survey of this area.
The Backpropagation algorithm for fast differentiation of neural net-works was invented by Werbos [Wer74]. The Pagerank algorithm wasinvented by Larry Page and Sergey Brin [Pag+99]. It is closely relatedto the HITS algorithm of Kleinberg [Kle99]. The Akamai company wasfounded based on the consistent hashing data structure described in[Kar+97]. Compressed sensing has a long history but two foundationalpapers are [CRT06; Don06]. [Lus+08] gives a survey of applicationsof compressed sensing to MRI; see also this popular article by Ellen-berg [Ell10]. The deterministic polynomial-time algorithm for testingprimality was given by Agrawal, Kayal, and Saxena [AKS04].
introduction 45
We alluded briefly to classical impossibility results in mathematics,including the impossibility of proving Euclidâs fifth postulate fromthe other four, impossibility of trisecting an angle with a straightedgeand compass and the impossibility of solving a quintic equation viaradicals. A geometric proof of the impossibility of angle trisection(one of the three geometric problems of antiquity, going back to theancient Greeks) is given in this blog post of Tao. The book of MarioLivio [Liv05] covers some of the background and ideas behind theseimpossibility results. Some exciting recent research is focused ontrying to use computational complexity to shed light on fundamentalquestions in physics such understanding black holes and reconcilinggeneral relativity with quantum mechanics
1Mathematical Background
âI found that every number, which may be expressed from one to ten, surpassesthe preceding by one unit: afterwards the ten is doubled or tripled ⊠untila hundred; then the hundred is doubled and tripled in the same manner asthe units and the tens ⊠and so forth to the utmost limit of numeration.â,Muhammad ibn MĆ«sÄ al-KhwÄrizmÄ«, 820, translation by Fredric Rosen,1831.
In this chapter we review some of the mathematical concepts thatwe use in this book. These concepts are typically covered in coursesor textbooks on âmathematics for computer scienceâ or âdiscretemathematicsâ; see the âBibliographical Notesâ section (Section 1.9)for several excellent resources on these topics that are freely-availableonline.
A mathematicianâs apology. Some students might wonder why thisbook contains so much math. The reason is that mathematics is sim-ply a language for modeling concepts in a precise and unambiguousway. In this book we use math to model the concept of computation.For example, we will consider questions such as âis there an efficientalgorithm to find the prime factors of a given integer?â. (We will see thatthis question is particularly interesting, touching on areas as far apartas Internet security and quantum mechanics!) To even phrase such aquestion, we need to give a precise definition of the notion of an algo-rithm, and of what it means for an algorithm to be efficient. Also, sincethere is no empirical experiment to prove the nonexistence of an algo-rithm, the only way to establish such a result is using a mathematicalproof.
1.1 THIS CHAPTER: A READERâS MANUAL
Depending on your background, you can approach this chapter in twodifferent ways:
âą If you have already taken a âdiscrete mathematicsâ, âmathematicsfor computer scienceâ or similar courses, you do not need to read
Compiled on 8.26.2020 18:10
Learning Objectives:âą Recall basic mathematical notions such as
sets, functions, numbers, logical operatorsand quantifiers, strings, and graphs.
âą Rigorously define Big-đ notation.âą Proofs by induction.âą Practice with reading mathematical
definitions, statements, and proofs.âą Transform an intuitive argument into a
rigorous proof.
48 introduction to theoretical computer science
the whole chapter. You can just take quick look at Section 1.2 to seethe main tools we will use, Section 1.7 for our notation and conven-tions, and then skip ahead to the rest of this book. Alternatively,you can sit back, relax, and read this chapter just to get familiarwith our notation, as well as to enjoy (or not) my philosophicalmusings and attempts at humor.
âą If your background is less extensive, see Section 1.9 for some re-sources on these topics. This chapter briefly covers the conceptsthat we need, but you may find it helpful to see a more in-depthtreatment. As usual with math, the best way to get comfort withthis material is to work out exercises on your own.
âą You might also want to start brushing up on discrete probability,which weâll use later in this book (see Chapter 18).
1.2 A QUICK OVERVIEW OF MATHEMATICAL PREREQUISITES
The main mathematical concepts we will use are the following. Wejust list these notions below, deferring their definitions to the rest ofthis chapter. If you are familiar with all of these, then you might wantto just skip to Section 1.7 to see the full list of notation we use.
âą Proofs: First and foremost, this book involves a heavy dose of for-mal mathematical reasoning, which includes mathematical defini-tions, statements, and proofs.
âą Sets and set operations: We will use extensively mathematical sets.We use the basic set relations of membership (â) and containment(â), and set operations, principally union (âȘ), intersection (â©), andset difference (⧔).
âą Cartesian product and Kleene star operation: We also use theCartesian product of two sets đŽ and đ”, denoted as đŽ Ă đ” (that is,đŽ Ă đ” the set of pairs (đ, đ) where đ â đŽ and đ â đ”). We denote byđŽđ the đ fold Cartesian product (e.g., đŽ3 = đŽ Ă đŽ Ă đŽ) and by đŽâ(known as the Kleene star) the union of đŽđ for all đ â 0, 1, 2, âŠ.
âą Functions: The domain and codomain of a function, properties suchas being one-to-one (also known as injective) or onto (also knownas surjective) functions, as well as partial functions (that, unlikestandard or âtotalâ functions, are not necessarily defined on allelements of their domain).
âą Logical operations: The operations AND (â§), OR (âš), and NOT(ÂŹ) and the quantifiers âthere existsâ (â) and âfor allâ (â).
âą Basic combinatorics: Notions such as (đđ) (the number of đ-sizedsubsets of a set of size đ).
mathematical background 49
Figure 1.1: A snippet from the âmethodsâ section ofthe âAlphaGo Zeroâ paper by Silver et al, Nature, 2017.
Figure 1.2: A snippet from the âZerocashâ paper ofBen-Sasson et al, that forms the basis of the cryptocur-rency startup Zcash.
âą Graphs: Undirected and directed graphs, connectivity, paths, andcycles.
âą Big-đ notation: đ, đ, Ω, đ, Î notation for analyzing asymptoticgrowth of functions.
âą Discrete probability: We will use probability theory, and specifi-cally probability over finite samples spaces such as tossing đ coins,including notions such as random variables, expectation, and concen-tration. We will only use probability theory in the second half ofthis text, and will review it beforehand in Chapter 18. However,probabilistic reasoning is a subtle (and extremely useful!) skill, anditâs always good to start early in acquiring it.
In the rest of this chapter we briefly review the above notions. Thisis partially to remind the reader and reinforce material that mightnot be fresh in your mind, and partially to introduce our notationand conventions which might occasionally differ from those youâveencountered before.
1.3 READING MATHEMATICAL TEXTS
Mathematicians use jargon for the same reason that it is used in manyother professions such engineering, law, medicine, and others. Wewant to make terms precise and introduce shorthand for conceptsthat are frequently reused. Mathematical texts tend to âpack a lotof punchâ per sentence, and so the key is to read them slowly andcarefully, parsing each symbol at a time.
With time and practice you will see that reading mathematical textsbecomes easier and jargon is no longer an issue. Moreover, readingmathematical texts is one of the most transferable skills you could takefrom this book. Our world is changing rapidly, not just in the realmof technology, but also in many other human endeavors, whether itis medicine, economics, law or even culture. Whatever your futureaspirations, it is likely that you will encounter texts that use new con-cepts that you have not seen before (see Fig. 1.1 and Fig. 1.2 for tworecent examples from current âhot areasâ). Being able to internalizeand then apply new definitions can be hugely important. It is a skillthatâs much easier to acquire in the relatively safe and stable context ofa mathematical course, where one at least has the guarantee that theconcepts are fully specified, and you have access to your teaching stafffor questions.
The basic components of a mathematical text are definitions, asser-tions and proofs.
50 introduction to theoretical computer science
Figure 1.3: An annotated form of Definition 1.1,marking which part is being defined and how.
1.3.1 DefinitionsMathematicians often define new concepts in terms of old concepts.For example, here is a mathematical definition which you may haveencountered in the past (and will see again shortly):
Definition 1.1 â One to one function. Let đ, đ be sets. We say that afunction đ ⶠđ â đ is one to one (also known as injective) if for everytwo elements đ„, đ„âČ â đ, if đ„ â đ„âČ then đ(đ„) â đ(đ„âČ).Definition 1.1 captures a simple concept, but even so it uses quite
a bit of notation. When reading such a definition, it is often useful toannotate it with a pen as youâre going through it (see Fig. 1.3). Forexample, when you see an identifier such as đ , đ or đ„, make sure thatyou realize what sort of object is it: is it a set, a function, an element,a number, a gremlin? You might also find it useful to explain thedefinition in words to a friend (or to yourself).
1.3.2 Assertions: Theorems, lemmas, claimsTheorems, lemmas, claims and the like are true statements about theconcepts we defined. Deciding whether to call a particular statement aâTheoremâ, a âLemmaâ or a âClaimâ is a judgement call, and does notmake a mathematical difference. All three correspond to statementswhich were proven to be true. The difference is that a Theorem refersto a significant result, that we would want to remember and highlight.A Lemma often refers to a technical result, that is not necessarily im-portant in its own right, but can be often very useful in proving othertheorems. A Claim is a âthrow awayâ statement, that we need to usein order to prove some other bigger results, but do not care so muchabout for its own sake.
1.3.3 ProofsMathematical proofs are the arguments we use to demonstrate that ourtheorems, lemmas, and claims area indeed true. We discuss proofs inSection 1.5 below, but the main point is that the mathematical stan-dard of proof is very high. Unlike in some other realms, in mathe-matics a proof is an âairtightâ argument that demonstrates that thestatement is true beyond a shadow of a doubt. Some examples in thissection for mathematical proofs are given in Solved Exercise 1.1 andSection 1.6. As mentioned in the preface, as a general rule, it is moreimportant you understand the definitions than the theorems, and it ismore important you understand a theorem statement than its proof.
mathematical background 51
1.4 BASIC DISCRETE MATH OBJECTS
In this section we quickly review some of the mathematical objects(the âbasic data structuresâ of mathematics, if you will) we use in thisbook.
1.4.1 SetsA set is an unordered collection of objects. For example, when wewrite đ = 2, 4, 7, we mean that đ denotes the set that contains thenumbers 2, 4, and 7. (We use the notation â2 â đâ to denote that 2 isan element of đ.) Note that the set 2, 4, 7 and 7, 4, 2 are identical,since they contain the same elements. Also, a set either contains anelement or does not contain it â there is no notion of containing itâtwiceâ â and so we could even write the same set đ as 2, 2, 4, 7(though that would be a little weird). The cardinality of a finite set đ,denoted by |đ|, is the number of elements it contains. (Cardinality canbe defined for infinite sets as well; see the sources in Section 1.9.) So,in the example above, |đ| = 3. A set đ is a subset of a set đ , denotedby đ â đ , if every element of đ is also an element of đ . (We canalso describe this by saying that đ is a superset of đ.) For example,2, 7 â 2, 4, 7. The set that contains no elements is known as theempty set and it is denoted by â . If đŽ is a subset of đ” that is not equalto đ” we say that đŽ is a strict subset of đ”, and denote this by đŽ â đ”.
We can define sets by either listing all their elements or by writingdown a rule that they satisfy such as
EVEN = đ„ | đ„ = 2đŠ for some non-negative integer đŠ . (1.1)
Of course there is more than one way to write the same set, and of-ten we will use intuitive notation listing a few examples that illustratethe rule. For example, we can also define EVEN as
EVEN = 0, 2, 4, ⊠. (1.2)Note that a set can be either finite (such as the set 2, 4, 7) or in-
finite (such as the set EVEN). Also, the elements of a set donât haveto be numbers. We can talk about the sets such as the set đ, đ, đ, đ, đąof all the vowels in the English language, or the set New York, LosAngeles, Chicago, Houston, Philadelphia, Phoenix, San Antonio,San Diego, Dallas of all cities in the U.S. with population more thanone million per the 2010 census. A set can even have other sets as ele-ments, such as the set â , 1, 2, 2, 3, 1, 3 of all even-sized subsetsof 1, 2, 3.Operations on sets: The union of two sets đ, đ , denoted by đ âȘ đ ,is the set that contains all elements that are either in đ or in đ . Theintersection of đ and đ , denoted by đ â© đ , is the set of elements that are
52 introduction to theoretical computer science
1 The letter Z stands for the German word âZahlenâ,which means numbers.
both in đ and in đ . The set difference of đ and đ , denoted by đ ⧔ đ (andin some texts also by đ â đ ), is the set of elements that are in đ but notin đ .
Tuples, lists, strings, sequences: A tuple is an ordered collection of items.For example (1, 5, 2, 1) is a tuple with four elements (also known asa 4-tuple or quadruple). Since order matters, this is not the sametuple as the 4-tuple (1, 1, 5, 2) or the 3-tuple (1, 5, 2). A 2-tuple is alsoknown as a pair. We use the terms tuples and lists interchangeably.A tuple where every element comes from some finite set ÎŁ (such as0, 1) is also known as a string. Analogously to sets, we denote thelength of a tuple đ by |đ |. Just like sets, we can also think of infiniteanalogues of tuples, such as the ordered collection (1, 4, 9, âŠ) of allperfect squares. Infinite ordered collections are known as sequences;we might sometimes use the term âinfinite sequenceâ to emphasizethis, and use âfinite sequenceâ as a synonym for a tuple. (We canidentify a sequence (đ0, đ1, đ2, âŠ) of elements in some set đ with afunction đŽ ⶠâ â đ (where đđ = đŽ(đ) for every đ â â). Similarly,we can identify a đ-tuple (đ0, ⊠, đđâ1) of elements in đ with a functionđŽ ⶠ[đ] â đ.)
Cartesian product: If đ and đ are sets, then their Cartesian product,denoted by đ Ă đ , is the set of all ordered pairs (đ , đĄ) where đ â đ andđĄ â đ . For example, if đ = 1, 2, 3 and đ = 10, 12, then đ Ă đcontains the 6 elements (1, 10), (2, 10), (3, 10), (1, 12), (2, 12), (3, 12).Similarly if đ, đ , đ are sets then đ Ă đ Ă đ is the set of all orderedtriples (đ , đĄ, đą) where đ â đ, đĄ â đ , and đą â đ . More generally, forevery positive integer đ and sets đ0, ⊠, đđâ1, we denote by đ0 Ă đ1 Ă⯠à đđâ1 the set of ordered đ-tuples (đ 0, ⊠, đ đâ1) where đ đ â đđ forevery đ â 0, ⊠, đ â 1. For every set đ, we denote the set đ Ă đ by đ2,đ Ă đ Ă đ by đ3, đ Ă đ Ă đ Ă đ by đ4, and so on and so forth.
1.4.2 Special setsThere are several sets that we will use in this book time and again. Theset â = 0, 1, 2, ⊠(1.3)contains all natural numbers, i.e., non-negative integers. For any naturalnumber đ â â, we define the set [đ] as 0, ⊠, đ â 1 = đ â â â¶đ < đ. (We start our indexing of both â and [đ] from 0, while manyother texts index those sets from 1. Starting from zero or one is simplya convention that doesnât make much difference, as long as one isconsistent about it.)
We will also occasionally use the set †= ⊠, â2, â1, 0, +1, +2, âŠof (negative and non-negative) integers,1 as well as the set â of real
mathematical background 53
numbers. (This is the set that includes not just the integers, but alsofractional and irrational numbers; e.g., â contains numbers such as+0.5, âđ, etc.) We denote by â+ the set đ„ â â ⶠđ„ > 0 of positive realnumbers. This set is sometimes also denoted as (0, â).Strings: Another set we will use time and again is0, 1đ = (đ„0, ⊠, đ„đâ1) ⶠđ„0, ⊠, đ„đâ1 â 0, 1 (1.4)which is the set of all đ-length binary strings for some natural numberđ. That is 0, 1đ is the set of all đ-tuples of zeroes and ones. This isconsistent with our notation above: 0, 12 is the Cartesian product0, 1 Ă 0, 1, 0, 13 is the product 0, 1 Ă 0, 1 Ă 0, 1 and so on.
We will write the string (đ„0, đ„1, ⊠, đ„đâ1) as simply đ„0đ„1 ⯠đ„đâ1. Forexample, 0, 13 = 000, 001, 010, 011, 100, 101, 110, 111 . (1.5)
For every string đ„ â 0, 1đ and đ â [đ], we write đ„đ for the đđĄâelement of đ„.
We will also often talk about the set of binary strings of all lengths,which is
0, 1â = (đ„0, ⊠, đ„đâ1) ⶠđ â â , , đ„0, ⊠, đ„đâ1 â 0, 1 . (1.6)
Another way to write this set is as0, 1â = 0, 10 âȘ 0, 11 âȘ 0, 12 âȘ ⯠(1.7)
or more concisely as 0, 1â = âȘđââ0, 1đ . (1.8)The set 0, 1â includes the âstring of length 0â or âthe empty
stringâ, which we will denote by "". (In using this notation we fol-low the convention of many programming languages. Other textssometimes use đ or đ to denote the empty string.)
Generalizing the star operation: For every set ÎŁ, we defineÎŁâ = âȘđââÎŁđ . (1.9)For example, if ÎŁ = đ, đ, đ, đ, ⊠, đ§ then ÎŁâ denotes the set of all finitelength strings over the alphabet a-z.
Concatenation: The concatenation of two strings đ„ â ÎŁđ and đŠ â ÎŁđ isthe (đ + đ)-length string đ„đŠ obtained by writing đŠ after đ„. That is, ifđ„ â 0, 1đ and đŠ â 0, 1đ, then đ„đŠ is equal to the string đ§ â 0, 1đ+đsuch that for đ â [đ], đ§đ = đ„đ and for đ â đ, ⊠, đ + đ â 1, đ§đ = đŠđâđ.
54 introduction to theoretical computer science
2 For two natural numbers đ„ and đ, đ„ mod đ (short-hand for âmoduloâ) denotes the remainder of đ„when it is divided by đ. That is, it is the number đ in0, ⊠, đ â 1 such that đ„ = đđ + đ for some integer đ.We sometimes also use the notation đ„ = đŠ ( mod đ)to denote the assertion that đ„ mod đ is the same as đŠmod đ.
1.4.3 FunctionsIf đ and đ are nonempty sets, a function đč mapping đ to đ , denotedby đč ⶠđ â đ , associates with every element đ„ â đ an elementđč(đ„) â đ . The set đ is known as the domain of đč and the set đis known as the codomain of đč . The image of a function đč is the setđč(đ„) | đ„ â đ which is the subset of đč âs codomain consisting of alloutput elements that are mapped from some input. (Some texts userange to denote the image of a function, while other texts use rangeto denote the codomain of a function. Hence we will avoid using theterm ârangeâ altogether.) As in the case of sets, we can write a func-tion either by listing the table of all the values it gives for elementsin đ or by using a rule. For example if đ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9and đ = 0, 1, then the table below defines a function đč ⶠđ â đ .Note that this function is the same as the function defined by the ruleđč(đ„) = (đ„ mod 2).2
Table 1.1: An example of a function.
Input Output0 01 12 03 14 05 16 07 18 09 1
If đ ⶠđ â đ satisfies that đ(đ„) â đ(đŠ) for all đ„ â đŠ then we saythat đ is one-to-one (Definition 1.1, also known as an injective functionor simply an injection). If đč satisfies that for every đŠ â đ there is someđ„ â đ such that đč(đ„) = đŠ then we say that đč is onto (also known as asurjective function or simply a surjection). A function that is both one-to-one and onto is known as a bijective function or simply a bijection.A bijection from a set đ to itself is also known as a permutation of đ. Ifđč ⶠđ â đ is a bijection then for every đŠ â đ there is a unique đ„ â đsuch that đč(đ„) = đŠ. We denote this value đ„ by đč â1(đŠ). Note that đč â1is itself a bijection from đ to đ (can you see why?).
Giving a bijection between two sets is often a good way to showthey have the same size. In fact, the standard mathematical definitionof the notion that âđ and đ have the same cardinalityâ is that there
mathematical background 55
Figure 1.4: We can represent finite functions as adirected graph where we put an edge from đ„ tođ(đ„). The onto condition corresponds to requiringthat every vertex in the codomain of the functionhas in-degree at least one. The one-to-one conditioncorresponds to requiring that every vertex in thecodomain of the function has in-degree at most one. Inthe examples above đč is an onto function, đș is one toone, and đ» is neither onto nor one to one.
exists a bijection đ ⶠđ â đ . Further, the cardinality of a set đ isdefined to be đ if there is a bijection from đ to the set 0, ⊠, đ â 1.As we will see later in this book, this is a definition that generalizes todefining the cardinality of infinite sets.Partial functions: We will sometimes be interested in partial functionsfrom đ to đ . A partial function is allowed to be undefined on somesubset of đ. That is, if đč is a partial function from đ to đ , then forevery đ â đ, either there is (as in the case of standard functions) anelement đč(đ ) in đ , or đč(đ ) is undefined. For example, the partial func-tion đč(đ„) = âđ„ is only defined on non-negative real numbers. Whenwe want to distinguish between partial functions and standard (i.e.,non-partial) functions, we will call the latter total functions. When wesay âfunctionâ without any qualifier then we mean a total function.
The notion of partial functions is a strict generalization of func-tions, and so every function is a partial function, but not every partialfunction is a function. (That is, for every nonempty đ and đ , the setof partial functions from đ to đ is a proper superset of the set of totalfunctions from đ to đ .) When we want to emphasize that a functionđ from đŽ to đ” might not be total, we will write đ ⶠđŽ âđ đ”. We canthink of a partial function đč from đ to đ also as a total function fromđ to đ âȘ â„ where â„ is a special âfailure symbolâ. So, instead ofsaying that đč is undefined at đ„, we can say that đč(đ„) = â„.Basic facts about functions: Verifying that you can prove the followingresults is an excellent way to brush up on functions:âą If đč ⶠđ â đ and đș ⶠđ â đ are one-to-one functions, then their
composition đ» ⶠđ â đ defined as đ»(đ ) = đș(đč(đ )) is also one toone.
âą If đč ⶠđ â đ is one to one, then there exists an onto functionđș ⶠđ â đ such that đș(đč(đ )) = đ for every đ â đ.
âą If đș ⶠđ â đ is onto then there exists a one-to-one function đč ⶠđ âđ such that đș(đč(đ )) = đ for every đ â đ.
âą If đ and đ are finite sets then the following conditions are equiva-lent to one another: (a) |đ| †|đ |, (b) there is a one-to-one functionđč ⶠđ â đ , and (c) there is an onto function đș ⶠđ â đ. (This isactually true even for infinite đ and đ : in that case (b) (or equiva-lently (c)) is the commonly accepted definition for |đ| †|đ |.)
PYou can find the proofs of these results in many dis-crete math texts, including for example, Section 4.5in the Lehman-Leighton-Meyer notes. However, I
56 introduction to theoretical computer science
3 It is possible, and sometimes useful, to think of anundirected graph as the special case of a directedgraph that has the special property that for every pairđą, đŁ either both the edges (đą, đŁ) and (đŁ, đą) are presentor neither of them is. However, in many settings thereis a significant difference between undirected anddirected graphs, and so itâs typically best to think ofthem as separate categories.
Figure 1.5: An example of an undirected and a di-rected graph. The undirected graph has vertex set1, 2, 3, 4 and edge set 1, 2, 2, 3, 2, 4. Thedirected graph has vertex set đ, đ, đ and the edgeset (đ, đ), (đ, đ), (đ, đ), (đ, đ).
strongly suggest you try to prove them on your own,or at least convince yourself that they are true byproving special cases of those for small sizes (e.g.,|đ| = 3, |đ | = 4, |đ| = 5).
Let us prove one of these facts as an example:Lemma 1.2 If đ, đ are non-empty sets and đč ⶠđ â đ is one to one, thenthere exists an onto function đș ⶠđ â đ such that đș(đč(đ )) = đ forevery đ â đ.
Proof. Choose some đ 0 â đ. We will define the function đș ⶠđ â đ asfollows: for every đĄ â đ , if there is some đ â đ such that đč(đ ) = đĄ thenset đș(đĄ) = đ (the choice of đ is well defined since by the one-to-oneproperty of đč , there cannot be two distinct đ , đ âČ that both map to đĄ).Otherwise, set đș(đĄ) = đ 0. Now for every đ â đ, by the definition of đș,if đĄ = đč(đ ) then đș(đĄ) = đș(đč(đ )) = đ . Moreover, this also shows thatđș is onto, since it means that for every đ â đ there is some đĄ, namelyđĄ = đč(đ ), such that đș(đĄ) = đ .
1.4.4 GraphsGraphs are ubiquitous in Computer Science, and many other fields aswell. They are used to model a variety of data types including socialnetworks, scheduling constraints, road networks, deep neural nets,gene interactions, correlations between observations, and a greatmany more. Formal definitions of several kinds of graphs are givennext, but if you have not seen graphs before in a course, I urge you toread up on them in one of the sources mentioned in Section 1.9.
Graphs come in two basic flavors: undirected and directed.3
Definition 1.3 â Undirected graphs. An undirected graph đș = (đ , đž) con-sists of a set đ of vertices and a set đž of edges. Every edge is a sizetwo subset of đ . We say that two vertices đą, đŁ â đ are neighbors, ifthe edge đą, đŁ is in đž.
Given this definition, we can define several other properties ofgraphs and their vertices. We define the degree of đą to be the numberof neighbors đą has. A path in the graph is a tuple (đą0, ⊠, đąđ) â đ đ+1,for some đ > 0 such that đąđ+1 is a neighbor of đąđ for every đ â [đ]. Asimple path is a path (đą0, ⊠, đąđâ1) where all the đąđâs are distinct. A cycleis a path (đą0, ⊠, đąđ) where đą0 = đąđ. We say that two vertices đą, đŁ â đare connected if either đą = đŁ or there is a path from (đą0, ⊠, đąđ) wheređą0 = đą and đąđ = đŁ. We say that the graph đș is connected if every pair ofvertices in it is connected.
mathematical background 57
Here are some basic facts about undirected graphs. We give someinformal arguments below, but leave the full proofs as exercises (theproofs can be found in many of the resources listed in Section 1.9).Lemma 1.4 In any undirected graph đș = (đ , đž), the sum of the degreesof all vertices is equal to twice the number of edges.
Lemma 1.4 can be shown by seeing that every edge đą, đŁ con-tributes twice to the sum of the degrees (once for đą and the secondtime for đŁ).Lemma 1.5 The connectivity relation is transitive, in the sense that if đą isconnected to đŁ, and đŁ is connected to đ€, then đą is connected to đ€.
Lemma 1.5 can be shown by simply attaching a path of the form(đą, đą1, đą2, ⊠, đąđâ1, đŁ) to a path of the form (đŁ, đąâČ1, ⊠, đąâČđâČâ1, đ€) to obtainthe path (đą, đą1, ⊠, đąđâ1, đŁ, đąâČ1, ⊠, đąâČđâČâ1, đ€) that connects đą to đ€.Lemma 1.6 For every undirected graph đș = (đ , đž) and connected pairđą, đŁ, the shortest path from đą to đŁ is simple. In particular, for everyconnected pair there exists a simple path that connects them.
Lemma 1.6 can be shown by âshortcuttingâ any non simple pathfrom đą to đŁ where the same vertex đ€ appears twice to remove it (seeFig. 1.6). It is a good exercise to transforming this intuitive reasoningto a formal proof:
Figure 1.6: If there is a path from đą to đŁ in a graphthat passes twice through a vertex đ€ then we canâshortcutâ it by removing the loop from đ€ to itself tofind a path from đą to đŁ that only passes once throughđ€.
Solved Exercise 1.1 â Connected vertices have simple paths. Prove Lemma 1.6
Solution:
The proof follows the idea illustrated in Fig. 1.6. One complica-tion is that there can be more than one vertex that is visited twiceby a path, and so âshortcuttingâ might not necessarily result in a
58 introduction to theoretical computer science
simple path; we deal with this by looking at a shortest path betweenđą and đŁ. Details follow.Let đș = (đ , đž) be a graph and đą and đŁ in đ be two connected
vertices in đș. We will prove that there is a simple graph betweenđą and đŁ. Let đ be the shortest length of a path between đą and đŁand let đ = (đą0, đą1, đą2, ⊠, đąđâ1, đąđ) be a đ-length path from đą to đŁ(there can be more than one such path: if so we just choose one ofthem). (That is đą0 = đą, đąđ = đŁ, and (đąâ, đąâ+1) â đž for all â â [đ].)We claim that đ is simple. Indeed, suppose otherwise that there issome vertex đ€ that occurs twice in the path: đ€ = đąđ and đ€ = đąđ forsome đ < đ. Then we can âshortcutâ the path đ by considering thepath đ âČ = (đą0, đą1, ⊠, đąđâ1, đ€, đąđ+1, ⊠, đąđ) obtained by taking thefirst đ vertices of đ (from đą0 = 0 to the first occurrence of đ€) andthe last đ â đ ones (from the vertex đąđ+1 following the second oc-currence of đ€ to đąđ = đŁ). The path đ âČ is a valid path between đą andđŁ since every consecutive pair of vertices in it is connected by anedge (in particular, since đ€ = đąđ = đ€đ, both (đąđâ1, đ€) and (đ€, đąđ+1)are edges in đž), but since the length of đ âČ is đ â (đ â đ) < đ, thiscontradicts the minimality of đ .
RRemark 1.7 â Finding proofs. Solved Exercise 1.1 is agood example of the process of finding a proof. Youstart by ensuring you understand what the statementmeans, and then come up with an informal argumentwhy it should be true. You then transform the infor-mal argument into a rigorous proof. This proof neednot be very long or overly formal, but should clearlyestablish why the conclusion of the statement followsfrom its assumptions.
The concepts of degrees and connectivity extend naturally to di-rected graphs, defined as follows.
Definition 1.8 â Directed graphs. A directed graph đș = (đ , đž) consistsof a set đ and a set đž â đ Ă đ of ordered pairs of đ . We sometimesdenote the edge (đą, đŁ) also as đą â đŁ. If the edge đą â đŁ is presentin the graph then we say that đŁ is an out-neighbor of đą and đą is anin-neighbor of đŁ.A directed graph might contain both đą â đŁ and đŁ â đą in which
case đą will be both an in-neighbor and an out-neighbor of đŁ and viceversa. The in-degree of đą is the number of in-neighbors it has, and theout-degree of đŁ is the number of out-neighbors it has. A path in the
mathematical background 59
graph is a tuple (đą0, ⊠, đąđ) â đ đ+1, for some đ > 0 such that đąđ+1 is anout-neighbor of đąđ for every đ â [đ]. As in the undirected case, a simplepath is a path (đą0, ⊠, đąđâ1) where all the đąđâs are distinct and a cycleis a path (đą0, ⊠, đąđ) where đą0 = đąđ. One type of directed graphs weoften care about is directed acyclic graphs or DAGs, which, as their nameimplies, are directed graphs without any cycles:
Definition 1.9 â Directed Acyclic Graphs. We say that đș = (đ , đž) is adirected acyclic graph (DAG) if it is a directed graph and there doesnot exist a list of vertices đą0, đą1, ⊠, đąđ â đ such that đą0 = đąđ andfor every đ â [đ], the edge đąđ â đąđ+1 is in đž.
The lemmas we mentioned above have analogs for directed graphs.We again leave the proofs (which are essentially identical to theirundirected analogs) as exercises.Lemma 1.10 In any directed graph đș = (đ , đž), the sum of the in-degrees is equal to the sum of the out-degrees, which is equal to thenumber of edges.Lemma 1.11 In any directed graph đș, if there is a path from đą to đŁ and apath from đŁ to đ€, then there is a path from đą to đ€.Lemma 1.12 For every directed graph đș = (đ , đž) and a pair đą, đŁ suchthat there is a path from đą to đŁ, the shortest path from đą to đŁ is simple.
RRemark 1.13 â Labeled graphs. For some applicationswe will consider labeled graphs, where the vertices oredges have associated labels (which can be numbers,strings, or members of some other set). We can thinkof such a graph as having an associated (possiblypartial) labelling function đż ⶠđ âȘ đž â â, where â isthe set of potential labels. However we will typicallynot refer explicitly to this labeling function and simplysay things such as âvertex đŁ has the label đŒâ.
1.4.5 Logic operators and quantifiersIf đ and đ are some statements that can be true or false, then đ ANDđ (denoted as đ ⧠đ) is a statement that is true if and only if both đand đ are true, and đ OR đ (denoted as đ âš đ) is a statement that istrue if and only if either đ or đ is true. The negation of đ , denoted asÂŹđ or đ , is true if and only if đ is false.
Suppose that đ(đ„) is a statement that depends on some parameter đ„(also sometimes known as an unbound variable) in the sense that forevery instantiation of đ„ with a value from some set đ, đ(đ„) is eithertrue or false. For example, đ„ > 7 is a statement that is not a priori
60 introduction to theoretical computer science
4 In this book, we place the variable bound by a quan-tifier in a subscript and so write âđ„âđđ(đ„). Manyother texts do not use this subscript notation and sowill write the same statement as âđ„ â đ, đ(đ„).
true or false, but becomes true or false whenever we instantiate đ„ withsome real number. We denote by âđ„âđđ(đ„) the statement that is trueif and only if đ(đ„) is true for every đ„ â đ.4 We denote by âđ„âđđ(đ„) thestatement that is true if and only if there exists some đ„ â đ such thatđ(đ„) is true.
For example, the following is a formalization of the true statementthat there exists a natural number đ larger than 100 that is not divisi-ble by 3: âđââ(đ > 100) ⧠(âđâđđ + đ + đ â đ) . (1.10)
âFor sufficiently large n.â One expression that we will see come uptime and again in this book is the claim that some statement đ(đ) istrue âfor sufficiently large đâ. What this means is that there exists aninteger đ0 such that đ(đ) is true for every đ > đ0. We can formalizethis as âđ0âââđ>đ0đ(đ).1.4.6 Quantifiers for summations and productsThe following shorthands for summing up or taking products of sev-eral numbers are often convenient. If đ = đ 0, ⊠, đ đâ1 is a finite setand đ ⶠđ â â is a function, then we write âđ„âđ đ(đ„) as shorthand forđ(đ 0) + đ(đ 1) + đ(đ 2) + ⊠+ đ(đ đâ1) , (1.11)
and âđ„âđ đ(đ„) as shorthand forđ(đ 0) â đ(đ 1) â đ(đ 2) â ⊠â đ(đ đâ1) . (1.12)
For example, the sum of the squares of all numbers from 1 to 100can be written as âđâ1,âŠ,100 đ2 . (1.13)
Since summing up over intervals of integers is so common, thereis a special notation for it. For every two integers, đ †đ, âđđ=đ đ(đ)denotes âđâđ đ(đ) where đ = đ„ â †ⶠđ †đ„ †đ. Hence, we canwrite the sum (1.13) as 100âđ=1 đ2 . (1.14)
1.4.7 Parsing formulas: bound and free variablesIn mathematics, as in coding, we often have symbolic âvariablesâ orâparametersâ. It is important to be able to understand, given someformula, whether a given variable is bound or free in this formula. For
mathematical background 61
example, in the following statement đ is free but đ and đ are bound bythe â quantifier: âđ,đââ(đ â 1) ⧠(đ â đ) ⧠(đ = đ Ă đ) (1.15)
Since đ is free, it can be set to any value, and the truth of the state-ment (1.15) depends on the value of đ. For example, if đ = 8 then(1.15) is true, but for đ = 11 it is false. (Can you see why?)
The same issue appears when parsing code. For example, in thefollowing snippet from the C programming language
for (int i=0 ; i<n ; i=i+1) printf("*");
the variable i is bound within the for block but the variable n isfree.
The main property of bound variables is that we can rename them(as long as the new name doesnât conflict with another used variable)without changing the meaning of the statement. Thus for example thestatement âđ„,đŠââ(đ„ â 1) ⧠(đ„ â đ) ⧠(đ = đ„ Ă đŠ) (1.16)
is equivalent to (1.15) in the sense that it is true for exactly the sameset of đâs.
Similarly, the code
for (int j=0 ; j<n ; j=j+1) printf("*");
produces the same result as the code above that used i instead of j.
RRemark 1.14 â Aside: mathematical vs programming no-tation. Mathematical notation has a lot of similaritieswith programming language, and for the same rea-sons. Both are formalisms meant to convey complexconcepts in a precise way. However, there are somecultural differences. In programming languages, weoften try to use meaningful variable names such asNumberOfVerticeswhile in math we often use shortidentifiers such as đ. Part of it might have to do withthe tradition of mathematical proofs as being hand-written and verbally presented, as opposed to typedup and compiled. Another reason is if the wrongvariable name is used in a proof, at worst is causesconfusion to readers; when the wrong variable name
62 introduction to theoretical computer science
is used in a program, planes might crash, patientsmight die, and rockets could explode.One consequence of that is that in mathematics weoften end up reusing identifiers, and also ârun outâof letters and hence use Greek letters too, as well asdistinguish between small and capital letters anddifferent font faces. Similarly, mathematical notationtends to use quite a lot of âoverloadingâ, using oper-ators such as + for a great variety of objects (e.g., realnumbers, matrices, finite field elements, etc..), andassuming that the meaning can be inferred from thecontext.Both fields have a notion of âtypesâ, and in mathwe often try to reserve certain letters for variablesof a particular type. For example, variables such asđ, đ, đ, â, đ, đ will often denote integers, and đ willoften denote a small positive real number (see Sec-tion 1.7 for more on these conventions). When readingor writing mathematical texts, we usually donât havethe advantage of a âcompilerâ that will check typesafety for us. Hence it is important to keep track of thetype of each variable, and see that the operations thatare performed on it âmake senseâ.Kunâs book [Kun18] contains an extensive discus-sion on the similarities and differences between thecultures of mathematics and programming.
1.4.8 Asymptotics and Big-đ notationâlog log logđ has been proved to go to infinity, but has never been observed todo so.â, Anonymous, quoted by Carl Pomerance (2000)
It is often very cumbersome to describe precisely quantities suchas running time and is also not needed, since we are typically mostlyinterested in the âhigher order termsâ. That is, we want to understandthe scaling behavior of the quantity as the input variable grows. Forexample, as far as running time goes, the difference between an đ5-time algorithm and an đ2-time one is much more significant than thedifference between an 100đ2 + 10đ time algorithm and an 10đ2 timealgorithm. For this purpose, đ-notation is extremely useful as a wayto âdeclutterâ our text and focus our attention on what really matters.For example, using đ-notation, we can say that both 100đ2 + 10đand 10đ2 are simply Î(đ2) (which informally means âthe same up toconstant factorsâ), while đ2 = đ(đ5) (which informally means that đ2is âmuch smaller thanâ đ5).
Generally (though still informally), if đč, đș are two functions map-ping natural numbers to non-negative reals, then âđč = đ(đș)â meansthat đč(đ) †đș(đ) if we donât care about constant factors, whileâđč = đ(đș)â means that đč is much smaller than đș, in the sense that nomatter by what constant factor we multiply đč , if we take đ to be large
mathematical background 63
Figure 1.7: If đč(đ) = đ(đș(đ)) then for sufficientlylarge đ, đč(đ) will be smaller than đș(đ). For example,if Algorithm đŽ runs in time 1000 â đ + 106 andAlgorithm đ” runs in time 0.01 â đ2 then even thoughđ” might be more efficient for smaller inputs, whenthe inputs get sufficiently large, đŽ will run much fasterthan đ”.
enough then đș will be bigger (for this reason, sometimes đč = đ(đș)is written as đč âȘ đș). We will write đč = Î(đș) if đč = đ(đș) andđș = đ(đč), which one can think of as saying that đč is the same as đș ifwe donât care about constant factors. More formally, we define Big-đnotation as follows:
Definition 1.15 â Big-đ notation. Let â+ = đ„ â â | đ„ > 0 be the setof positive real numbers. For two functions đč, đș ⶠâ â â+, we saythat đč = đ(đș) if there exist numbers đ, đ0 â â such that đč(đ) â€đ â đș(đ) for every đ > đ0. We say that đč = Î(đș) if đč = đ(đș) andđș = đ(đč). We say that đč = Ω(đș) if đș = đ(đč).
We say that đč = đ(đș) if for every đ > 0 there is some đ0 suchthat đč(đ) < đđș(đ) for every đ > đ0. We say that đč = đ(đș) ifđș = đ(đč).Itâs often convenient to use âanonymous functionsâ in the context ofđ-notation. For example, when we write a statement such as đč(đ) =đ(đ3), we mean that đč = đ(đș) where đș is the function defined byđș(đ) = đ3. Chapter 7 in Jim Apsnesâ notes on discrete math provides
a good summary of đ notation; see also this tutorial for a gentler andmore programmer-oriented introduction.đ is not equality. Using the equality sign for đ-notation is extremelycommon, but is somewhat of a misnomer, since a statement such asđč = đ(đș) really means that đč is in the set đșâČ â¶ âđ,đ s.t. âđ>đđșâČ(đ) â€đđș(đ). If anything, it makes more sense to use inequalities and writeđč †đ(đș) and đč ℠Ω(đș), reserving equality for đč = Î(đș), andso we will sometimes use this notation too, but since the equalitynotation is quite firmly entrenched we often stick to it as well. (Sometexts write đč â đ(đș) instead of đč = đ(đș), but we will not use thisnotation.) Despite the misleading equality sign, you should rememberthat a statement such as đč = đ(đș) means that đč is âat mostâ đș insome rough sense when we ignore constants, and a statement such asđč = Ω(đș) means that đč is âat leastâ đș in the same rough sense.
1.4.9 Some ârules of thumbâ for Big-đ notationThere are some simple heuristics that can help when trying to com-pare two functions đč and đș:
âą Multiplicative constants donât matter in đ-notation, and so ifđč(đ) = đ(đș(đ)) then 100đč(đ) = đ(đș(đ)).âą When adding two functions, we only care about the larger one. For
example, for the purpose of đ-notation, đ3 + 100đ2 is the same asđ3, and in general in any polynomial, we only care about the largerexponent.
64 introduction to theoretical computer science
âą For every two constants đ, đ > 0, đđ = đ(đđ) if and only if đ †đ,and đđ = đ(đđ) if and only if đ < đ. For example, combining the twoobservations above, 100đ2 + 10đ + 100 = đ(đ3).
âą Polynomial is always smaller than exponential: đđ = đ(2đđ) forevery two constants đ > 0 and đ > 0 even if đ is much smaller thanđ. For example, 100đ100 = đ(2âđ).
âą Similarly, logarithmic is always smaller than polynomial: (logđ)đ(which we write as logđ đ) is đ(đđ) for every two constants đ, đ > 0.For example, combining the observations above, 100đ2 log100 đ =đ(đ3).
RRemark 1.16 â Big đ for other applications (optional).While Big-đ notation is often used to analyze runningtime of algorithms, this is by no means the only ap-plication. We can use đ notation to bound asymptoticrelations between any functions mapping integersto positive numbers. It can be used regardless ofwhether these functions are a measure of runningtime, memory usage, or any other quantity that mayhave nothing to do with computation. Here is oneexample which is unrelated to this book (and henceone that you can feel free to skip): one way to state theRiemann Hypothesis (one of the most famous openquestions in mathematics) is that it corresponds tothe conjecture that the number of primes between 0and đ is equal to â«đ2 1
lnđ„ đđ„ up to an additive error ofmagnitude at most đ(âđ logđ).
1.5 PROOFS
Many people think of mathematical proofs as a sequence of logicaldeductions that starts from some axioms and ultimately arrives at aconclusion. In fact, some dictionaries define proofs that way. This isnot entirely wrong, but at its essence mathematical proof of a state-ment X is simply an argument that convinces the reader that X is truebeyond a shadow of a doubt.
To produce such a proof you need to:
1. Understand precisely what X means.
2. Convince yourself that X is true.
3. Write your reasoning down in plain, precise and concise English(using formulas or notation only when they help clarity).
mathematical background 65
In many cases, the first part is the most important one. Understand-ing what a statement means is oftentimes more than halfway towardsunderstanding why it is true. In third part, to convince the readerbeyond a shadow of a doubt, we will often want to break down thereasoning to âbasic stepsâ, where each basic step is simple enoughto be âself evidentâ. The combination of all steps yields the desiredstatement.
1.5.1 Proofs and programsThere is a great deal of similarity between the process of writing proofsand that of writing programs, and both require a similar set of skills.Writing a program involves:
1. Understanding what is the task we want the program to achieve.
2. Convincing yourself that the task can be achieved by a computer,perhaps by planning on a whiteboard or notepad how you willbreak it up to simpler tasks.
3. Converting this plan into code that a compiler or interpreter canunderstand, by breaking up each task into a sequence of the basicoperations of some programming language.
In programs as in proofs, step 1 is often the most important one.A key difference is that the reader for proofs is a human being andthe reader for programs is a computer. (This difference is erodingwith time as more proofs are being written in a machine verifiable form;moreover, to ensure correctness and maintainability of programs, itis important that they can be read and understood by humans.) Thusour emphasis is on readability and having a clear logical flow for ourproof (which is not a bad idea for programs as well). When writing aproof, you should think of your audience as an intelligent but highlyskeptical and somewhat petty reader, that will âcall foulâ at every stepthat is not well justified.
1.5.2 Proof writing styleA mathematical proof is a piece of writing, but it is a specific genreof writing with certain conventions and preferred styles. As in anywriting, practice makes perfect, and it is also important to revise yourdrafts for clarity.
In a proof for the statement đ, all the text between the wordsâProof:â and âQEDâ should be focused on establishing that đ is true.Digressions, examples, or ruminations should be kept outside thesetwo words, so they do not confuse the reader. The proof should havea clear logical flow in the sense that every sentence or equation in itshould have some purpose and it should be crystal-clear to the reader
66 introduction to theoretical computer science
what this purpose is. When you write a proof, for every equation orsentence you include, ask yourself:
1. Is this sentence or equation stating that some statement is true?
2. If so, does this statement follow from the previous steps, or are wegoing to establish it in the next step?
3. What is the role of this sentence or equation? Is it one step towardsproving the original statement, or is it a step towards proving someintermediate claim that you have stated before?
4. Finally, would the answers to questions 1-3 be clear to the reader?If not, then you should reorder, rephrase or add explanations.
Some helpful resources on mathematical writing include this hand-out by Lee, this handout by Hutching, as well as several of the excel-lent handouts in Stanfordâs CS 103 class.
1.5.3 Patterns in proofsâIf it was so, it might be; and if it were so, it would be; but as it isnât, it ainât.Thatâs logic.â, Lewis Carroll, Through the looking-glass.
Just like in programming, there are several common patterns ofproofs that occur time and again. Here are some examples:
Proofs by contradiction: One way to prove that đ is true is to showthat if đ was false it would result in a contradiction. Such proofsoften start with a sentence such as âSuppose, towards a contradiction,that đ is falseâ and end with deriving some contradiction (such as aviolation of one of the assumptions in the theorem statement). Here isan example:Lemma 1.17 There are no natural numbers đ, đ such that
â2 = đđ .Proof. Suppose, towards a contradiction that this is false, and so letđ â â be the smallest number such that there exists some đ â âsatisfying
â2 = đđ . Squaring this equation we get that 2 = đ2/đ2 orđ2 = 2đ2 (â). But this means that đ2 is even, and since the product oftwo odd numbers is odd, it means that đ is even as well, or in otherwords, đ = 2đâČ for some đâČ â â. Yet plugging this into (â) shows that4đâČ2 = 2đ2 which means đ2 = 2đâČ2 is an even number as well. By thesame considerations as above we get that đ is even and hence đ/2 andđ/2 are two natural numbers satisfying đ/2đ/2 = â2, contradicting theminimality of đ.
mathematical background 67
Proofs of a universal statement: Often we want to prove a statement đ ofthe form âEvery object of type đ has property đ .â Such proofs oftenstart with a sentence such as âLet đ be an object of type đâ and end byshowing that đ has the property đ . Here is a simple example:Lemma 1.18 For every natural number đ â đ , either đ or đ + 1 is even.
Proof. Let đ â đ be some number. If đ/2 is a whole number thenwe are done, since then đ = 2(đ/2) and hence it is even. Otherwise,đ/2 + 1/2 is a whole number, and hence 2(đ/2 + 1/2) = đ + 1 is even.
Proofs of an implication: Another common case is that the statement đhas the form âđŽ implies đ”â. Such proofs often start with a sentencesuch as âAssume that đŽ is trueâ and end with a derivation of đ” fromđŽ. Here is a simple example:Lemma 1.19 If đ2 â„ 4đđ then there is a solution to the quadratic equa-tion đđ„2 + đđ„ + đ = 0.Proof. Suppose that đ2 â„ 4đđ. Then đ = đ2 â 4đđ is a non-negativenumber and hence it has a square root đ . Thus đ„ = (âđ + đ )/(2đ)satisfiesđđ„2 + đđ„ + đ = đ(âđ + đ )2/(4đ2) + đ(âđ + đ )/(2đ) + đ= (đ2 â 2đđ + đ 2)/(4đ) + (âđ2 + đđ )/(2đ) + đ . (1.17)
Rearranging the terms of (1.17) we getđ 2/(4đ) + đ â đ2/(4đ) = (đ2 â 4đđ)/(4đ) + đ â đ2/(4đ) = 0 (1.18)
Proofs of equivalence: If a statement has the form âđŽ if and only ifđ”â (often shortened as âđŽ iff đ”â) then we need to prove both that đŽimplies đ” and that đ” implies đŽ. We call the implication that đŽ impliesđ” the âonly ifâ direction, and the implication that đ” implies đŽ the âifâdirection.
Proofs by combining intermediate claims: When a proof is more complex,it is often helpful to break it apart into several steps. That is, to provethe statement đ, we might first prove statements đ1,đ2, and đ3 andthen prove that đ1 ⧠đ2 ⧠đ3 implies đ. (Recall that ⧠denotes thelogical AND operator.)
Proofs by case distinction: This is a special case of the above, where toprove a statement đ we split into several cases đ¶1, ⊠, đ¶đ, and provethat (a) the cases are exhaustive, in the sense that one of the cases đ¶đmust happen and (b) go one by one and prove that each one of thecases đ¶đ implies the result đ that we are after.
68 introduction to theoretical computer science
Proofs by induction: We discuss induction and give an example inSection 1.6.1 below. We can think of such proofs as a variant of theabove, where we have an unbounded number of intermediate claimsđ0, đ2, ⊠, đđ, and we prove that đ0 is true, as well as that đ0 impliesđ1, and that đ0 ⧠đ1 implies đ2, and so on and so forth. The websitefor CMU course 15-251 contains a useful handout on potential pitfallswhen making proofs by induction.
âWithout loss of generality (w.l.o.g)â: This term can be initially quite con-fusing. It is essentially a way to simplify proofs by case distinctions.The idea is that if Case 1 is equal to Case 2 up to a change of variablesor a similar transformation, then the proof of Case 1 will also implythe proof of Case 2. It is always a statement that should be viewedwith suspicion. Whenever you see it in a proof, ask yourself if youunderstand why the assumption made is truly without loss of gen-erality, and when you use it, try to see if the use is indeed justified.When writing a proof, sometimes it might be easiest to simply repeatthe proof of the second case (adding a remark that the proof is verysimilar to the first one).
RRemark 1.20 â Hierarchical Proofs (optional). Mathe-matical proofs are ultimately written in English prose.The well-known computer scientist Leslie Lamportargues that this is a problem, and proofs should bewritten in a more formal and rigorous way. In hismanuscript he proposes an approach for structuredhierarchical proofs, that have the following form:
âą A proof for a statement of the form âIf đŽ then đ”âis a sequence of numbered claims, starting withthe assumption that đŽ is true, and ending with theclaim that đ” is true.
âą Every claim is followed by a proof showing howit is derived from the previous assumptions orclaims.
âą The proof for each claim is itself a sequence ofsubclaims.
The advantage of Lamportâs format is that the rolethat every sentence in the proof plays is very clear.It is also much easier to transform such proofs intomachine-checkable forms. The disadvantage is thatsuch proofs can be tedious to read and write, withless differentiation between the important parts of thearguments versus the more routine ones.
mathematical background 69
1.6 EXTENDED EXAMPLE: TOPOLOGICAL SORTING
In this section we will prove the following: every directed acyclicgraph (DAG, see Definition 1.9) can be arranged in layers so that forall directed edges đą â đŁ, the layer of đŁ is larger than the layer of đą.This result is known as topological sorting and is used in many appli-cations, including task scheduling, build systems, software packagemanagement, spreadsheet cell calculations, and many others (seeFig. 1.8). In fact, we will also use it ourselves later on in this book.
Figure 1.8: An example of topological sorting. We con-sider a directed graph corresponding to a prerequisitegraph of the courses in some Computer Science pro-gram. The edge đą â đŁ means that the course đą is aprerequisite for the course đŁ. A layering or âtopologi-cal sortingâ of this graph is the same as mapping thecourses to semesters so that if we decide to take thecourse đŁ in semester đ(đŁ), then we have already takenall the prerequisites for đŁ (i.e., its in-neighbors) inprior semesters.
We start with the following definition. A layering of a directedgraph is a way to assign for every vertex đŁ a natural number(corresponding to its layer), such that đŁâs in-neighbors are inlower-numbered layers than đŁ, and đŁâs out-neighbors are inhigher-numbered layers. The formal definition is as follows:
Definition 1.21 â Layering of a DAG. Let đș = (đ , đž) be a directed graph.A layering of đș is a function đ ⶠđ â â such that for every edgeđą â đŁ of đș, đ(đą) < đ(đŁ).In this section we prove that a directed graph is acyclic if and only if
it has a valid layering.
Theorem 1.22 â Topological Sort. Let đș be a directed graph. Then đș isacyclic if and only if there exists a layering đ of đș.
To prove such a theorem, we need to first understand what itmeans. Since it is an âif and only ifâ statement, Theorem 1.22 corre-sponds to two statements:Lemma 1.23 For every directed graph đș, if đș is acyclic then it has alayering.Lemma 1.24 For every directed graph đș, if đș has a layering, then it isacyclic.
70 introduction to theoretical computer science
Figure 1.9: Some examples of DAGs of one, two andthree vertices, and valid ways to assign layers to thevertices.
To prove Theorem 1.22 we need to prove both Lemma 1.23 andLemma 1.24. Lemma 1.24 is actually not that hard to prove. Intuitively,if đș contains a cycle, then it cannot be the case that all edges on thecycle increase in layer number, since if we travel along the cycle atsome point we must come back to the place we started from. Theformal proof is as follows:
Proof. Let đș = (đ , đž) be a directed graph and let đ ⶠđ â â be alayering of đș as per Definition 1.21 . Suppose, towards a contradiction,that đș is not acyclic, and hence there exists some cycle đą0, đą1, ⊠, đąđsuch that đą0 = đąđ and for every đ â [đ] the edge đąđ â đąđ+1 is present inđș. Since đ is a layering, for every đ â [đ], đ(đąđ) < đ(đąđ+1), which meansthat đ(đą0) < đ(đą1) < ⯠< đ(đąđ) (1.19)
but this is a contradiction since đą0 = đąđ and hence đ(đą0) = đ(đąđ).
Lemma 1.23 corresponds to the more difficult (and useful) direc-tion. To prove it, we need to show how given an arbitrary DAG đș, wecan come up with a layering of the vertices of đș so that all edges âgoupâ.
PIf you have not seen the proof of this theorem before(or donât remember it), this would be an excellentpoint to pause and try to prove it yourself. One wayto do it would be to describe an algorithm that given asinput a directed acyclic graph đș on đ vertices and đâ2or fewer edges, constructs an array đč of length đ suchthat for every edge đą â đŁ in the graph đč[đą] < đč[đŁ].
1.6.1 Mathematical inductionThere are several ways to prove Lemma 1.23. One approach to do isto start by proving it for small graphs, such as graphs with 1, 2 or 3vertices (see Fig. 1.9, for which we can check all the cases, and then tryto extend the proof for larger graphs. The technical term for this proofapproach is proof by induction.
Induction is simply an application of the self-evident Modus Ponensrule that says that if
(a) đ is trueand
(b) đ implies đthen đ is true.
mathematical background 71
In the setting of proofs by induction we typically have a statementđ(đ) that is parameterized by some integer đ, and we prove that (a)đ(0) is true, and (b) For every đ > 0, if đ(0), ⊠, đ(đ â 1) are all truethen đ(đ) is true. (Usually proving (b) is the hard part, though thereare examples where the âbase caseâ (a) is quite subtle.) By applyingModus Ponens, we can deduce from (a) and (b) that đ(1) is true.Once we did so, since we now know that both đ(0) and đ(1) are true,then we can use this and (b) to deduce (again using Modus Ponens)that đ(2) is true. We can repeat the same reasoning again and againto obtain that đ(đ) is true for every đ. The statement (a) is called theâbase caseâ, while (b) is called the âinductive stepâ. The assumptionin (b) that đ(đ) holds for đ < đ is called the âinductive hypothesisâ.(The form of induction described here is sometimes called âstronginductionâ as opposed to âweak inductionâ where we replace (b)by the statement (bâ) that if đ(đ â 1) is true then đ(đ) is true; weakinduction can be thought of as the special case of strong inductionwhere we donât use the assumption that đ(0), ⊠, đ(đ â 2) are true.)
RRemark 1.25 â Induction and recursion. Proofs by in-duction are closely related to algorithms by recursion.In both cases we reduce solving a larger problem tosolving a smaller instance of itself. In a recursive algo-rithm to solve some problem P on an input of lengthđ we ask ourselves âwhat if someone handed me away to solve P on instances smaller than đ?â. In aninductive proof to prove a statement Q parameterizedby a number đ, we ask ourselves âwhat if I alreadyknew that đ(đâČ) is true for đâČ < đ?â. Both inductionand recursion are crucial concepts for this course andComputer Science at large (and even other areas ofinquiry, including not just mathematics but othersciences as well). Both can be confusing at first, butwith time and practice they become clearer. For moreon proofs by induction and recursion, you might findthe following Stanford CS 103 handout, this MIT 6.00lecture or this excerpt of the Lehman-Leighton bookuseful.
1.6.2 Proving the result by inductionThere are several ways to use induction to prove Lemma 1.23 by in-duction. We will use induction on the number đ of vertices, and so wewill define the statement đ(đ) as follows:
đ(đ) is âFor every DAG đș = (đ , đž) with đ vertices, there is a layering of đș.â
72 introduction to theoretical computer science
5 QED stands for âquod erat demonstrandumâ, whichis Latin for âwhat was to be demonstratedâ or âthevery thing it was required to have shownâ.
6 Using đ = 0 as the base case is logically valid, butcan be confusing. If you find the trivial đ = 0 caseto be confusing, you can always directly verify thestatement for đ = 1 and then use both đ = 0 andđ = 1 as the base cases.
The statement for đ(0) (where the graph contains no vertices) istrivial. Thus it will suffice to prove the following: for every đ > 0, ifđ(đ â 1) is true then đ(đ) is true.
To do so, we need to somehow find a way, given a graph đș of đvertices, to reduce the task of finding a layering for đș into the task offinding a layering for some other graph đșâČ of đâ1 vertices. The idea isthat we will find a source of đș: a vertex đŁ that has no in-neighbors. Wecan then assign to đŁ the layer 0, and layer the remaining vertices usingthe inductive hypothesis in layers 1, 2, âŠ.
The above is the intuition behind the proof of Lemma 1.23, butwhen writing the formal proof below, we use the benefit of hind-sight, and try to streamline what was a messy journey into a linearand easy-to-follow flow of logic that starts with the word âProof:âand ends with âQEDâ or the symbol .5 Discussions, examples anddigressions can be very insightful, but we keep them outside the spacedelimited between these two words, where (as described by this ex-cellent handout) âevery sentence must be load bearingâ. Just like wedo in programming, we can break the proof into little âsubroutinesâor âfunctionsâ (known as lemmas or claims in math language), whichwill be smaller statements that help us prove the main result. How-ever, the proof should be structured in a way that ensures that it isalways crystal-clear to the reader in what stage we are of the proof.The reader should be able to tell what is the role of every sentence inthe proof and which part it belongs to. We now present the formalproof of Lemma 1.23.
Proof of Lemma 1.23. Let đș = (đ , đž) be a DAG and đ = |đ | be thenumber of its vertices. We prove the lemma by induction on đ. Thebase case is đ = 0 where there are no vertices, and so the statement istrivially true.6 For the case of đ > 0, we make the inductive hypothesisthat every DAG đșâČ of at most đ â 1 vertices has a layering.
We make the following claim:Claim: đș must contain a vertex đŁ of in-degree zero.Proof of Claim: Suppose otherwise that every vertex đŁ â đ has an
in-neighbor. Let đŁ0 be some vertex of đș, let đŁ1 be an in-neighbor of đŁ0,đŁ2 be an in-neighbor of đŁ1, and continue in this way for đ steps untilwe construct a list đŁ0, đŁ1, ⊠, đŁđ such that for every đ â [đ], đŁđ+1 is anin-neighbor of đŁđ, or in other words the edge đŁđ+1 â đŁđ is present in thegraph. Since there are only đ vertices in this graph, one of the đ + 1vertices in this sequence must repeat itself, and so there exists đ < đsuch that đŁđ = đŁđ. But then the sequence đŁđ â đŁđâ1 â ⯠â đŁđ is a cyclein đș, contradicting our assumption that it is acyclic. (QED Claim)
Given the claim, we can let đŁ0 be some vertex of in-degree zero inđș, and let đșâČ be the graph obtained by removing đŁ0 from đș. đșâČ has
mathematical background 73
đ â 1 vertices and hence per the inductive hypothesis has a layeringđ âČ â¶ (đ ⧔ đŁ0) â â. We define đ ⶠđ â â as follows:
đ(đŁ) = â§âšâ©đ âČ(đŁ) + 1 đŁ â đŁ00 đŁ = đŁ0 . (1.20)
We claim that đ is a valid layering, namely that for every edge đą âđŁ, đ(đą) < đ(đŁ). To prove this, we split into cases:
âą Case 1: đą â đŁ0, đŁ â đŁ0. In this case the edge đą â đŁ exists in thegraph đșâČ and hence by the inductive hypothesis đ âČ(đą) < đ âČ(đŁ)which implies that đ âČ(đą) + 1 < đ âČ(đŁ) + 1.
âą Case 2: đą = đŁ0, đŁ â đŁ0. In this case đ(đą) = 0 and đ(đŁ) = đ âČ(đŁ) + 1 >0.âą Case 3: đą â đŁ0, đŁ = đŁ0. This case canât happen since đŁ0 does not
have in-neighbors.
âą Case 4: đą = đŁ0, đŁ = đŁ0. This case again canât happen since it meansthat đŁ0 is its own-neighbor â it is involved in a self loop which is aform cycle that is disallowed in an acyclic graph.
Thus, đ is a valid layering for đș which completes the proof.
PReading a proof is no less of an important skill thanproducing one. In fact, just like understanding code,it is a highly non-trivial skill in itself. Therefore Istrongly suggest that you re-read the above proof, ask-ing yourself at every sentence whether the assumptionit makes is justified, and whether this sentence trulydemonstrates what it purports to achieve. Anothergood habit is to ask yourself when reading a proof forevery variable you encounter (such as đą, đ, đșâČ, đ âČ, etc.in the above proof) the following questions: (1) Whattype of variable is it? is it a number? a graph? a vertex?a function? and (2) What do we know about it? Is itan arbitrary member of the set? Have we shown somefacts about it?, and (3) What are we trying to showabout it?.
1.6.3 Minimality and uniquenessTheorem 1.22 guarantees that for every DAG đș = (đ , đž) there existssome layering đ ⶠđ â â but this layering is not necessarily unique.For example, if đ ⶠđ â â is a valid layering of the graph then so isthe function đ âČ defined as đ âČ(đŁ) = 2 â đ(đŁ). However, it turns out that
74 introduction to theoretical computer science
the minimal layering is unique. A minimal layering is one where everyvertex is given the smallest layer number possible. We now formallydefine minimality and state the uniqueness theorem:
Theorem 1.26 â Minimal layering is unique. Let đș = (đ , đž) be a DAG. Wesay that a layering đ ⶠđ â â is minimal if for every vertex đŁ â đ , ifđŁ has no in-neighbors then đ(đŁ) = 0 and if đŁ has in-neighbors thenthere exists an in-neighbor đą of đŁ such that đ(đą) = đ(đŁ) â 1.
For every layering đ, đ ⶠđ â â of đș, if both đ and đ are minimalthen đ = đ.The definition of minimality in Theorem 1.26 implies that for every
vertex đŁ â đ , we cannot move it to a lower layer without makingthe layering invalid. If đŁ is a source (i.e., has in-degree zero) thena minimal layering đ must put it in layer 0, and for every other đŁ, ifđ(đŁ) = đ, then we cannot modify this to set đ(đŁ) †đ â 1 since thereis an-neighbor đą of đŁ satisfying đ(đą) = đ â 1. What Theorem 1.26says is that a minimal layering đ is unique in the sense that every otherminimal layering is equal to đ .Proof Idea:
The idea is to prove the theorem by induction on the layers. If đ andđ are minimal then they must agree on the source vertices, since bothđ and đ should assign these vertices to layer 0. We can then show thatif đ and đ agree up to layer đ â 1, then the minimality property impliesthat they need to agree in layer đ as well. In the actual proof we usea small trick to save on writing. Rather than proving the statementthat đ = đ (or in other words that đ(đŁ) = đ(đŁ) for every đŁ â đ ),we prove the weaker statement that đ(đŁ) †đ(đŁ) for every đŁ â đ .(This is a weaker statement since the condition that đ(đŁ) is lesser orequal than to đ(đŁ) is implied by the condition that đ(đŁ) is equal tođ(đŁ).) However, since đ and đ are just labels we give to two minimallayerings, by simply changing the names âđâ and âđâ the same proofalso shows that đ(đŁ) †đ(đŁ) for every đŁ â đ and hence that đ = đ.âProof of Theorem 1.26. Let đș = (đ , đž) be a DAG and đ, đ ⶠđ â â betwo minimal valid layering of đș. We will prove that for every đŁ â đ ,đ(đŁ) †đ(đŁ). Since we didnât assume anything about đ, đ except theirminimality, the same proof will imply that for every đŁ â đ , đ(đŁ) †đ(đŁ)and hence that đ(đŁ) = đ(đŁ) for every đŁ â đ , which is what we neededto show.
We will prove that đ(đŁ) †đ(đŁ) for every đŁ â đ by induction onđ = đ(đŁ). The case đ = 0 is immediate: since in this case đ(đŁ) = 0,đ(đŁ) must be at least đ(đŁ). For the case đ > 0, by the minimality of đ ,
mathematical background 75
if đ(đŁ) = đ then there must exist some in-neighbor đą of đŁ such thatđ(đą) = đ â 1. By the induction hypothesis we get that đ(đą) â„ đ â 1, andsince đ is a valid layering it must hold that đ(đŁ) > đ(đą) which meansthat đ(đŁ) â„ đ = đ(đŁ).
PThe proof of Theorem 1.26 is fully rigorous, but iswritten in a somewhat terse manner. Make sure thatyou read through it and understand why this is indeedan airtight proof of the Theoremâs statement.
1.7 THIS BOOK: NOTATION AND CONVENTIONS
Most of the notation we use in this book is standard and is used inmost mathematical texts. The main points where we diverge are:
âą We index the natural numbers â starting with 0 (though manyother texts, especially in computer science, do the same).
âą We also index the set [đ] starting with 0, and hence define it as0, ⊠, đâ1. In other texts it is often defined as 1, ⊠, đ. Similarly,we index our strings starting with 0, and hence a string đ„ â 0, 1đis written as đ„0đ„1 ⯠đ„đâ1.
âą If đ is a natural number then 1đ does not equal the number 1 butrather this is the length đ string 11 ⯠1 (that is a string of đ ones).Similarly, 0đ refers to the length đ string 00 ⯠0.
âą Partial functions are functions that are not necessarily defined onall inputs. When we write đ ⶠđŽ â đ” this means that đ is a totalfunction unless we say otherwise. When we want to emphasize thatđ can be a partial function, we will sometimes write đ ⶠđŽ âđ đ”.
âą As we will see later on in the course, we will mostly describe ourcomputational problems in terms of computing a Boolean functionđ ⶠ0, 1â â 0, 1. In contrast, many other textbooks refer to thesame task as deciding a language đż â 0, 1â. These two viewpointsare equivalent, since for every set đż â 0, 1â there is a correspond-ing function đč such that đč(đ„) = 1 if and only if đ„ â đż. Computingpartial functions corresponds to the task known in the literatureas a solving a promise problem. Because the language notation isso prevalent in other textbooks, we will occasionally remind thereader of this correspondence.
âą We use âđ„â and âđ„â for the âceilingâ and âfloorâ operators thatcorrespond to ârounding upâ or ârounding downâ a number to the
76 introduction to theoretical computer science
nearest integer. We use (đ„ mod đŠ) to denote the âremainderâ of đ„when divided by đŠ. That is, (đ„ mod đŠ) = đ„ â đŠâđ„/đŠâ. In contextwhen an integer is expected weâll typically âsilently roundâ thequantities to an integer. For example, if we say that đ„ is a string oflength âđ then this means that đ„ is of length ââđ â. (We round upfor the sake of convention, but in most such cases, it will not make adifference whether we round up or down.)
âą Like most Computer Science texts, we default to the logarithm inbase two. Thus, logđ is the same as log2 đ.
âą We will also use the notation đ(đ) = đđđđŠ(đ) as a shorthand forđ(đ) = đđ(1) (i.e., as shorthand for saying that there are someconstants đ, đ such that đ(đ) †đ â đđ for every sufficiently largeđ). Similarly, we will use đ(đ) = đđđđŠđđđ(đ) as shorthand forđ(đ) = đđđđŠ(logđ) (i.e., as shorthand for saying that there aresome constants đ, đ such that đ(đ) †đ â (logđ)đ for every sufficientlylarge đ).
âą As in often the case in mathematical literature, we use the apostro-phe character to enrich our set of identifiers. Typically if đ„ denotessome object, then đ„âČ, đ„âł, etc. will denote other objects of the sametype.
âą To save on âcognitive loadâ we will often use round constants suchas 10, 100, 1000 in the statements of both theorems and problemset questions. When you see such a âroundâ constant, you cantypically assume that it has no special significance and was justchosen arbitrarily. For example, if you see a theorem of the formâAlgorithm đŽ takes at most 1000 â đ2 steps to compute function đčon inputs of length đâ then probably the number 1000 is an abitrarysufficiently large constant, and one could prove the same theoremwith a bound of the form đ â đ2 for a constant đ that is smaller than1000. Similarly, if a problem asks you to prove that some quantity isat least đ/100, it is quite possible that in truth the quantity is at leastđ/đ for some constant đ that is smaller than 100.
1.7.1 Variable name conventionsLike programming, mathematics is full of variables. Whenever you seea variable, it is always important to keep track of what is its type (e.g.,whether the variable is a number, a string, a function, a graph, etc.).To make this easier, we try to stick to certain conventions and consis-tently use certain identifiers for variables of the same type. Some ofthese conventions are listed in Section 1.7.1 below. These conventionsare not immutable laws and we might occasionally deviate from them.
mathematical background 77
Also, such conventions do not replace the need to explicitly declare foreach new variable the type of object that it denotes.
Table 1.2: Conventions for identifiers in this book
Identifier Often denotes object of typeđ,đ,đ,â,đ,đNatural numbers (i.e., in â = 0, 1, 2, âŠ)đ, đż Small positive real numbers (very close to 0)đ„, đŠ, đ§, đ€ Typically strings in 0, 1â though sometimes numbers orother objects. We often identify an object with itsrepresentation as a string.đș A graph. The set of đșâs vertices is typically denoted by đ .Often đ = [đ]. The set of đșâs edges is typically denoted byđž.đ Setđ, đ, â Functions. We often (though not always) use lowercaseidentifiers for finite functions, which map 0, 1đ to 0, 1đ(often đ = 1).đč, đș, đ» Infinite (unbounded input) functions mapping 0, 1â to0, 1â or 0, 1â to 0, 1đ for some đ. Based on context,the identifiers đș, đ» are sometimes used to denotefunctions and sometimes graphs.đŽ, đ”, đ¶ Boolean circuitsđ, đ Turing machinesđ , đ Programsđ A function mapping â to â that corresponds to a timebound.đ A positive number (often an unspecified constant; e.g.,đ (đ) = đ(đ) corresponds to the existence of đ s.t.đ (đ) †đ â đ every đ > 0). We sometimes use đ, đ in asimilar way.ÎŁ Finite set (often used as the alphabet for a set of strings).
1.7.2 Some idiomsMathematical texts often employ certain conventions or âidiomsâ.Some examples of such idioms that we use in this text include thefollowing:
âą âLet đ be âŠâ, âlet đ denote âŠâ, or âlet đ = âŠâ: These are alldifferent ways for us to say that we are defining the symbol đ tostand for whatever expression is in the âŠ. When đ is a property ofsome objects we might define đ by writing something along thelines of âWe say that ⊠has the property đ if âŠ.â. While we often
78 introduction to theoretical computer science
try to define terms before they are used, sometimes a mathematicalsentence reads easier if we use a term before defining it, in whichcase we add âWhere đ is âŠâ to explain how đ is defined in thepreceding expression.
âą Quantifiers: Mathematical texts involve many quantifiers such asâfor allâ and âexistsâ. We sometimes spell these in words as in âforall đ â ââ or âthere is đ„ â 0, 1ââ, and sometimes use the formalsymbols â and â. It is important to keep track on which variable isquantified in what way the dependencies between the variables. Forexample, a sentence fragment such as âfor every đ > 0 there existsđâ means that đ can be chosen in a way that depends on đ. The orderof quantifiers is important. For example, the following is a truestatement: âfor every natural number đ > 1 there exists a prime numberđ such that đ divides đ.â In contrast, the following statement is false:âthere exists a prime number đ such that for every natural number đ > 1,đ divides đ.â
âą Numbered equations, theorems, definitions: To keep track of allthe terms we define and statements we prove, we often assign thema (typically numeric) label, and then refer back to them in otherparts of the text.
âą (i.e.,), (e.g.,): Mathematical texts tend to contain quite a few ofthese expressions. We use đ (i.e., đ ) in cases where đ is equivalentto đ and đ (e.g., đ ) in cases where đ is an example of đ (e.g., onecan use phrases such as âa natural number (i.e., a non-negativeinteger)â or âa natural number (e.g., 7)â).
âą âThusâ, âThereforeâ , âWe get thatâ: This means that the followingsentence is implied by the preceding one, as in âThe đ-vertex graphđș is connected. Therefore it contains at least đ â 1 edges.â Wesometimes use âindeedâ to indicate that the following text justifiesthe claim that was made in the preceding sentence as in âThe đ-vertex graph đș has at least đ â 1 edges. Indeed, this follows since đș isconnected.â
âą Constants: In Computer Science, we typically care about how ouralgorithmsâ resource consumption (such as running time) scaleswith certain quantities (such as the length of the input). We refer toquantities that do not depend on the length of the input as constantsand so often use statements such as âthere exists a constant đ > 0 suchthat for every đ â â, Algorithm đŽ runs in at most đ â đ2 steps on inputs oflength đ.â The qualifier âconstantâ for đ is not strictly needed but isadded to emphasize that đ here is a fixed number independent of đ.In fact sometimes, to reduce cognitive load, we will simply replace đ
mathematical background 79
by a sufficiently large round number such as 10, 100, or 1000, or useđ-notation and write âAlgorithm đŽ runs in đ(đ2) time.â
Chapter Recap
âą The basic âmathematical data structuresâ weâllneed are numbers, sets, tuples, strings, graphs andfunctions.
âą We can use basic objects to define more complexnotions. For example, graphs can be defined as a listof pairs.
âą Given precise definitions of objects, we can stateunambiguous and precise statements. We can thenuse mathematical proofs to determine whether thesestatements are true or false.
âą A mathematical proof is not a formal ritual butrather a clear, precise and âbulletproofâ argumentcertifying the truth of a certain statement.
âą Big-đ notation is an extremely useful formalismto suppress less significant details and allows usto focus on the high level behavior of quantities ofinterest.
âą The only way to get comfortable with mathematicalnotions is to apply them in the contexts of solvingproblems. You should expect to need to go backtime and again to the definitions and notation inthis chapter as you work through problems in thiscourse.
1.8 EXERCISES
Exercise 1.1 â Logical expressions. a. Write a logical expression đ(đ„)involving the variables đ„0, đ„1, đ„2 and the operators ⧠(AND), âš(OR), and ÂŹ (NOT), such that đ(đ„) is true if the majority of theinputs are True.
b. Write a logical expression đ(đ„) involving the variables đ„0, đ„1, đ„2and the operators ⧠(AND), âš (OR), and ÂŹ (NOT), such that đ(đ„)is true if the sum â2đ=0 đ„đ (identifying âtrueâ with 1 and âfalseâwith 0) is odd.
Exercise 1.2 â Quantifiers. Use the logical quantifiers â (for all), â (thereexists), as well as â§, âš, ÂŹ and the arithmetic operations +, Ă, =, >, < towrite the following:a. An expression đ(đ, đ) such that for every natural numbers đ, đ,đ(đ, đ) is true if and only if đ divides đ.b. An expression đ(đ) such that for every natural number đ, đ(đ) is
true if and only if đ is a power of three.
80 introduction to theoretical computer science
Exercise 1.3 Describe the following statement in English words:âđâââđ>đâđ, đ â â(đ Ă đ â đ) âš (đ = 1).
Exercise 1.4 â Set construction notation. Describe in words the followingsets:
a. đ = đ„ â 0, 1100 ⶠâđâ0,âŠ,99đ„đ = đ„99âđb. đ = đ„ â 0, 1â ⶠâđ,đâ2,âŠ,|đ„|â1đ â đ â |đ„|
Exercise 1.5 â Existence of one to one mappings. For each one of the fol-lowing pairs of sets (đ, đ ), prove or disprove the following statement:there is a one to one function đ mapping đ to đ .
a. Let đ > 10. đ = 0, 1đ and đ = [đ] Ă [đ] Ă [đ].b. Let đ > 10. đ is the set of all functions mapping 0, 1đ to 0, 1.đ = 0, 1đ3 .c. Let đ > 100. đ = đ â [đ] | đ is prime, đ = 0, 1âlogđâ1â.
Exercise 1.6 â Inclusion Exclusion. a. Let đŽ, đ” be finite sets. Prove that|đŽ âȘ đ”| = |đŽ| + |đ”| â |đŽ â© đ”|.b. Let đŽ0, ⊠, đŽđâ1 be finite sets. Prove that |đŽ0 âȘ ⯠âȘ đŽđâ1| â„âđâ1đ=0 |đŽđ| â â0â€đ<đ<đ |đŽđ â© đŽđ|.c. Let đŽ0, ⊠, đŽđâ1 be finite subsets of 1, ⊠, đ, such that |đŽđ| = đ for
every đ â [đ]. Prove that if đ > 100đ, then there exist two distinctsets đŽđ, đŽđ s.t. |đŽđ â© đŽđ| â„ đ2/(10đ).
Exercise 1.7 Prove that if đ, đ are finite and đč ⶠđ â đ is one to onethen |đ| †|đ |.
Exercise 1.8 Prove that if đ, đ are finite and đč ⶠđ â đ is onto then|đ| â„ |đ |.
Exercise 1.9 Prove that for every finite đ, đ , there are (|đ | + 1)|đ| partialfunctions from đ to đ .
Exercise 1.10 Suppose that đđđââ is a sequence such that đ0 †10 andfor đ > 1 đđ †5đâ đ5 â + 2đ. Prove by induction that đđ †100đ logđ forevery đ.
mathematical background 81
7 one way to do this is to use Stirlingâs approximationfor the factorial function..
Exercise 1.11 Prove that for every undirected graph đș of 100 vertices,if every vertex has degree at most 4, then there exists a subset đ of atleast 20 vertices such that no two vertices in đ are neighbors of oneanother.
Exercise 1.12 â đ-notation. For every pair of functions đč, đș below, deter-mine which of the following relations holds: đč = đ(đș), đč = Ω(đș),đč = đ(đș) or đč = đ(đș).a. đč(đ) = đ, đș(đ) = 100đ.b. đč(đ) = đ, đș(đ) = âđ.c. đč(đ) = đ logđ, đș(đ) = 2(log(đ))2 .d. đč(đ) = âđ, đș(đ) = 2âlogđe. đč(đ) = ( đâ0.2đâ) , đș(đ) = 20.1đ (where (đđ) is the number of đ-sized
subsets of a set of size đ) and đ(đ) = 20.1đ. See footnote for hint.7
Exercise 1.13 Give an example of a pair of functions đč, đș ⶠâ â â suchthat neither đč = đ(đș) nor đș = đ(đč) holds.
Exercise 1.14 Prove that for every undirected graph đș on đ vertices, if đșhas at least đ edges then đș contains a cycle.
Exercise 1.15 Prove that for every undirected graph đș of 1000 vertices,if every vertex has degree at most 4, then there exists a subset đ of atleast 200 vertices such that no two vertices in đ are neighbors of oneanother.
1.9 BIBLIOGRAPHICAL NOTES
The heading âA Mathematicianâs Apologyâ, refers to Hardyâs classicbook [Har41]. Even when Hardy is wrong, he is very much worthreading.
There are many online sources for the mathematical backgroundneeded for this book. In particular, the lecture notes for MIT 6.042âMathematics for Computer Scienceâ [LLM18] are extremely com-prehensive, and videos and assignments for this course are availableonline. Similarly, Berkeley CS 70: âDiscrete Mathematics and Proba-bility Theoryâ has extensive lecture notes online.
82 introduction to theoretical computer science
Other sources for discrete mathematics are Rosen [Ros19] andJim Aspensâ online book [Asp18]. Lewis and Zax [LZ19], as wellas the online book of Fleck [Fle18], give a more gentle overview ofthe much of the same material. Solow [Sol14] is a good introductionto proof reading and writing. Kun [Kun18] gives an introductionto mathematics aimed at readers with programming background.Stanfordâs CS 103 course has a wonderful collections of handouts onmathematical proof techniques and discrete mathematics.
The word graph in the sense of Definition 1.3 was coined by themathematician Sylvester in 1878 in analogy with the chemical graphsused to visualize molecules. There is an unfortunate confusion be-tween this term and the more common usage of the word âgraphâ asa way to plot data, and in particular a plot of some function đ(đ„) as afunction of đ„. One way to relate these two notions is to identify everyfunction đ ⶠđŽ â đ” with the directed graph đșđ over the vertex setđ = đŽ âȘ đ” such that đșđ contains the edge đ„ â đ(đ„) for every đ„ â đŽ. Ina graph đșđ constructed in this way, every vertex in đŽ has out-degreeequal to one. If the function đ is one to one then every vertex in đ” hasin-degree at most one. If the function đ is onto then every vertex in đ”has in-degree at least one. If đ is a bijection then every vertex in đ” hasin-degree exactly equal to one.
Carl Pomeranceâs quote is taken from the home page of DoronZeilberger.
Figure 2.1: Our basic notion of computation is someprocess that maps an input to an output
2Computation and Representation
âThe alphabet (sic) was a great invention, which enabled men (sic) to storeand to learn with little effort what others had learned the hard way â that is, tolearn from books rather than from direct, possibly painful, contact with the realworld.â, B.F. Skinner
âThe name of the song is called âHADDOCKâS EYES.ââ [said the Knight]âOh, thatâs the name of the song, is it?â Alice said, trying to feel interested.âNo, you donât understand,â the Knight said, looking a little vexed. âThatâswhat the name is CALLED. The name really is âTHE AGED AGEDMAN.âââThen I ought to have said âThatâs what the SONG is calledâ?â Alice cor-rected herself.âNo, you oughtnât: thatâs quite another thing! The SONG is called âWAYSANDMEANSâ: but thatâs only what itâs CALLED, you know!ââWell, what IS the song, then?â said Alice, who was by this time com-pletely bewildered.âI was coming to that,â the Knight said. âThe song really IS âA-SITTING ONA GATEâ: and the tuneâs my own invention.âLewis Carroll, Through the Looking-Glass
To a first approximation, computation is a process that maps an inputto an output.
When discussing computation, it is essential to separate the ques-tion of what is the task we need to perform (i.e., the specification) fromthe question of how we achieve this task (i.e., the implementation).For example, as weâve seen, there is more than one way to achieve thecomputational task of computing the product of two integers.
In this chapter we focus on the what part, namely defining compu-tational tasks. For starters, we need to define the inputs and outputs.Capturing all the potential inputs and outputs that we might everwant to compute on seems challenging, since computation today isapplied to a wide variety of objects. We do not compute merely onnumbers, but also on texts, images, videos, connection graphs of social
Compiled on 8.26.2020 18:10
Learning Objectives:âą Distinguish between specification and
implementation, or equivalently betweenmathematical functions andalgorithms/programs.
âą Representing an object as a string (often ofzeroes and ones).
âą Examples of representations for commonobjects such as numbers, vectors, lists, graphs.
âą Prefix-free representations.âą Cantorâs Theorem: The real numbers cannot
be represented exactly as finite strings.
84 introduction to theoretical computer science
Figure 2.2: We represent numbers, texts, images, net-works and many other objects using strings of zeroesand ones. Writing the zeroes and ones themselves ingreen font over a black background is optional.
networks, MRI scans, gene data, and even other programs. We willrepresent all these objects as strings of zeroes and ones, that is objectssuch as 0011101 or 1011 or any other finite list of 1âs and 0âs. (Thischoice is for convenience: there is nothing âholyâ about zeroes andones, and we could have used any other finite collection of symbols.)
Today, we are so used to the notion of digital representation thatwe are not surprised by the existence of such an encoding. But it isactually a deep insight with significant implications. Many animalscan convey a particular fear or desire, but what is unique about hu-mans is language: we use a finite collection of basic symbols to describea potentially unlimited range of experiences. Language allows trans-mission of information over both time and space and enables soci-eties that span a great many people and accumulate a body of sharedknowledge over time.
Over the last several decades, we have seen a revolution in what wecan represent and convey in digital form. We can capture experienceswith almost perfect fidelity, and disseminate it essentially instanta-neously to an unlimited audience. Moreover, once information is indigital form, we can compute over it, and gain insights from data thatwere not accessible in prior times. At the heart of this revolution is thesimple but profound observation that we can represent an unboundedvariety of objects using a finite set of symbols (and in fact using onlythe two symbols 0 and 1).
In later chapters, we will typically take such representations forgranted, and hence use expressions such as âprogram đ takes đ„ asinputâ when đ„ might be a number, a vector, a graph, or any otherobject, when we really mean that đ takes as input the representation ofđ„ as a binary string. However, in this chapter we will dwell a bit moreon how we can construct such representations.
This chapter: A non-mathy overviewThe main takeaways from this chapter are:
âą We can represent all kinds of objects we want to use as in-puts and outputs using binary strings. For example, we canuse the binary basis to represent integers and rational num-bers as binary strings (see Section 2.1.1 and Section 2.2).
âą We can use compose the representations of simple objectsto represent more complex objects. In this way, we canrepresent lists of integers or rational numbers, and usethat to represent objects such as matrices, images, andgraphs. Prefix-free encoding is one way to achieve such acomposition (see Section 2.5.2).
computation and representation 85
âą A computational task specifies a map between an input to anoutputâ a function. It is crucially important to distinguishbetween the âwhatâ and the âhowâ, or the specificationand implementation (see Section 2.6.1). A function simplydefines which output corresponds to which input. It doesnot specify how to compute the output from the input, andas weâve seen in the context of multiplication, there can bemore than one way to compute the same function.
âą While the set of all possible binary strings is infinite, itstill cannot represent everything. In particular, there is norepresentation of the real numbers (with absolute accuracy)as binary strings. This result is also known as âCantorâsTheoremâ (see Section 2.4) and is typically referred to asthe result that the âreals are uncountableâ. It is also im-plies that there are different levels of infinity though we willnot get into this topic in this book (see Remark 2.10).
The two âbig ideasâ â we discuss are Big Idea 1 - we can com-pose representations for simple objects to represent morecomplex objects and Big Idea 2 - it is crucial to distinguishbetween functions (âwhatâ) and programs (âhowâ). The latterwill be a theme we will come back to time and again in thisbook.
2.1 DEFINING REPRESENTATIONS
Every time we store numbers, images, sounds, databases, or other ob-jects on a computer, what we actually store in the computerâs memoryis the representation of these objects. Moreover, the idea of representa-tion is not restricted to digital computers. When we write down text ormake a drawing we are representing ideas or experiences as sequencesof symbols (which might as well be strings of zeroes and ones). Evenour brain does not store the actual sensory inputs we experience, butrather only a representation of them.
To use objects such as numbers, images, graphs, or others as inputsfor computation, we need to define precisely how to represent theseobjects as binary strings. A representation scheme is a way to map an ob-ject đ„ to a binary string đž(đ„) â 0, 1â. For example, a representationscheme for natural numbers is a function đž ⶠâ â 0, 1â. Of course,we cannot merely represent all numbers as the string â0011â (for ex-ample). A minimal requirement is that if two numbers đ„ and đ„âČ aredifferent then they would be represented by different strings. Anotherway to say this is that we require the encoding function đž to be one toone.
86 introduction to theoretical computer science
Figure 2.3: Representing each one the digits0, 1, 2, ⊠, 9 as a 12 Ă 8 bitmap image, which can bethought of as a string in 0, 196. Using this schemewe can represent a natural number đ„ of đ decimaldigits as a string in 0, 196đ. Image taken from blogpost of A. C. Andersen.
2.1.1 Representing natural numbersWe now show how we can represent natural numbers as binarystrings. Over the years people have represented numbers in a varietyof ways, including Roman numerals, tally marks, our own Hindu-Arabic decimal system, and many others. We can use any one ofthose as well as many others to represent a number as a string (seeFig. 2.3). However, for the sake of concreteness, we use the binarybasis as our default representation of natural numbers as strings.For example, we represent the number six as the string 110 since1 â 22 + 1 â 21 + 0 â 20 = 6, and similarly we represent the number thirty-five as the string đŠ = 100011 which satisfies â5đ=0 đŠđ â 2|đŠ|âđâ1 = 35.Some more examples are given in the table below.
Table 2.1: Representing numbers in the binary basis. The lefthandcolumn contains representations of natural numbers in the deci-mal basis, while the righthand column contains representationsof the same numbers in the binary basis.
Number (decimal representation) Number (binary representation)0 01 12 105 10116 1000040 10100053 110101389 1100001013750 111010100110
If đ is even, then the least significant digit of đâs binary representa-tion is 0, while if đ is odd then this digit equals 1. Just like the numberâđ/10â corresponds to âchopping offâ the least significant decimaldigit (e.g., â457/10â = â45.7â = 45), the number âđ/2â correspondsto the âchopping offâ the least significant binary digit. Hence the bi-nary representation can be formally defined as the following functionđđĄđ ⶠâ â 0, 1â (đđĄđ stands for ânatural numbers to stringsâ):
đđĄđ(đ) = â§âšâ©0 đ = 01 đ = 1đđĄđ(âđ/2â)đđđđđĄđŠ(đ) đ > 1 (2.1)
where đđđđđĄđŠ ⶠâ â 0, 1 is the function defined as đđđđđĄđŠ(đ) = 0if đ is even and đđđđđĄđŠ(đ) = 1 if đ is odd, and as usual, for stringsđ„, đŠ â 0, 1â, đ„đŠ denotes the concatenation of đ„ and đŠ. The function
computation and representation 87
đđĄđ is defined recursively: for every đ > 0 we define đđđ(đ) in termsof the representation of the smaller number âđ/2â. It is also possible todefine đđĄđ non-recursively, see Exercise 2.2.
Throughout most of this book, the particular choices of represen-tation of numbers as binary strings would not matter much: we justneed to know that such a representation exists. In fact, for many of ourpurposes we can even use the simpler representation of mapping anatural number đ to the length-đ all-zero string 0đ.
RRemark 2.1 â Binary representation in python (optional).We can implement the binary representation in Pythonas follows:
def NtS(n):# natural numbers to stringsif n > 1:
return NtS(n // 2) + str(n % 2)else:
return str(n % 2)
print(NtS(236))# 11101100
print(NtS(19))# 10011
We can also use Python to implement the inversetransformation, mapping a string back to the naturalnumber it represents.
def StN(x):# String to numberk = len(x)-1return sum(int(x[i])*(2**(k-i)) for i in
range(k+1))print(StN(NtS(236)))# 236
RRemark 2.2 â Programming examples. In this book,we sometimes use code examples as in Remark 2.1.The point is always to emphasize that certain com-putations can be achieved concretely, rather thanillustrating the features of Python or any other pro-gramming language. Indeed, one of the messages ofthis book is that all programming languages are ina certain precise sense equivalent to one another, andhence we could have just as well used JavaScript, C,COBOL, Visual Basic or even BrainF*ck. This bookis not about programming, and it is absolutely OK if
88 introduction to theoretical computer science
1 While the Babylonians already invented a positionalsystem much earlier, the decimal positional systemwe use today was invented by Indian mathematiciansaround the third century. Arab mathematicians tookit up in the 8th century. It first received significantattention in Europe with the publication of the 1202book âLiber Abaciâ by Leonardo of Pisa, also known asFibonacci, but it did not displace Roman numerals incommon usage until the 15th century.
you are not familiar with Python or do not follow codeexamples such as those in Remark 2.1.
2.1.2 Meaning of representations (discussion)It is natural for us to think of 236 as the âactualâ number, and of11101100 as âmerelyâ its representation. However, for most Euro-peans in the middle ages CCXXXVI would be the âactualâ number and236 (if they have even heard about it) would be the weird Hindu-Arabic positional representation.1 When our AI robot overlords ma-terialize, they will probably think of 11101100 as the âactualâ numberand of 236 as âmerelyâ a representation that they need to use whenthey give commands to humans.
So what is the âactualâ number? This is a question that philoso-phers of mathematics have pondered over throughout history. Platoargued that mathematical objects exist in some ideal sphere of exis-tence (that to a certain extent is more ârealâ than the world we per-ceive via our senses, as this latter world is merely the shadow of thisideal sphere). In Platoâs vision, the symbols 236 are merely notationfor some ideal object, that, in homage to the late musician, we canrefer to as âthe number commonly represented by 236â.
The Austrian philosopher Ludwig Wittgenstein, on the other hand,argued that mathematical objects do not exist at all, and the onlythings that exist are the actual marks on paper that make up 236,11101100 or CCXXXVI. In Wittgensteinâs view, mathematics is merelyabout formal manipulation of symbols that do not have any inherentmeaning. You can think of the âactualâ number as (somewhat recur-sively) âthat thing which is common to 236, 11101100 and CCXXXVIand all other past and future representations that are meant to capturethe same objectâ.
While reading this book, you are free to choose your own phi-losophy of mathematics, as long as you maintain the distinction be-tween the mathematical objects themselves and the various particularchoices of representing them, whether as splotches of ink, pixels on ascreen, zeroes and one, or any other form.
2.2 REPRESENTATIONS BEYOND NATURAL NUMBERS
We have seen that natural numbers can be represented as binarystrings. We now show that the same is true for other types of objects,including (potentially negative) integers, rational numbers, vectors,lists, graphs and many others. In many instances, choosing the ârightâstring representation for a piece of data is highly nontrivial, and find-ing the âbestâ one (e.g., most compact, best fidelity, most efficientlymanipulable, robust to errors, most informative features, etc.) is the
computation and representation 89
object of intense research. But for now, we focus on presenting somesimple representations for various objects that we would like to use asinputs and outputs for computation.
2.2.1 Representing (potentially negative) integersSince we can represent natural numbers as strings, we canrepresent the full set of integers (i.e., members of the set†= ⊠, â3, â2, â1, 0, +1, +2, +3, ⊠) by adding one more bitthat represents the sign. To represent a (potentially negative) numberđ, we prepend to the representation of the natural number |đ| a bit đthat equals 0 if đ â„ 0 and equals 1 if đ < 0. Formally, we define thefunction đđĄđ ⶠ†â 0, 1â as followsđđĄđ(đ) = â§âšâ©0 đđĄđ(đ) đ â„ 01 đđĄđ(âđ) đ < 0 (2.2)
where đđĄđ is defined as in (2.1).While the encoding function of a representation needs to be one
to one, it does not have to be onto. For example, in the representationabove there is no number that is represented by the empty stringbut it is still a fine representation, since every integer is representeduniquely by some string.
RRemark 2.3 â Interpretation and context. Given a stringđŠ â 0, 1â, how do we know if itâs âsupposedâ torepresent a (nonnegative) natural number or a (po-tentially negative) integer? For that matter, even ifwe know đŠ is âsupposedâ to be an integer, how dowe know what representation scheme it uses? Theshort answer is that we do not necessarily know thisinformation, unless it is supplied from the context. (Inprogramming languages, the compiler or interpreterdetermines the representation of the sequence of bitscorresponding to a variable based on the variableâstype.) We can treat the same string đŠ as representing anatural number, an integer, a piece of text, an image,or a green gremlin. Whenever we say a sentence suchas âlet đ be the number represented by the string đŠ,âwe will assume that we are fixing some canonical rep-resentation scheme such as the ones above. The choiceof the particular representation scheme will rarelymatter, except that we want to make sure to stick withthe same one for consistency.
2.2.2 Twoâs complement representation (optional)Section 2.2.1âs approach of representing an integer using a specificâsign bitâ is known as the Signed Magnitude Representation and was
90 introduction to theoretical computer science
Figure 2.4: In the twoâs complement representationwe represent a potentially negative integer đ ââ2đ, ⊠, 2đ â 1 as an đ + 1 length string using thebinary representation of the integer đ mod 2đ+1. Onthe lefthand side: this representation for đ = 3 (thered integers are the numbers being represented bythe blue binary strings). If a microprocessor does notcheck for overflows, adding the two positive numbers6 and 5 might result in the negative number â5 (sinceâ5 mod 16 = 11. The righthand side is a C programthat will on some 32 bit architecture print a negativenumber after adding two positive numbers. (Integeroverflow in C is considered undefined behavior whichmeans the result of this program, including whetherit runs or crashes, could differ depending on thearchitecture, compiler, and even compiler options andversion.)
used in some early computers. However, the twoâs complement rep-resentation is much more common in practice. The twoâs complementrepresentation of an integer đ in the set â2đ, â2đ + 1, ⊠, 2đ â 1 is thestring đđĄđđ(đ) of length đ + 1 defined as follows:đđĄđđ(đ) = â§âšâ©đđĄđđ+1(đ) 0 †đ †2đ â 1đđĄđđ+1(2đ+1 + đ) â2đ †đ †â1 , (2.3)
where đđĄđâ(đ) demotes the standard binary representation of a num-ber đ â 0, ⊠, 2â as string of length â, padded with leading zerosas needed. For example, if đ = 3 then đđĄđ3(1) = đđĄđ4(1) = 0001,đđĄđ3(2) = đđĄđ4(2) = 0010, đđĄđ3(â1) = đđĄđ4(16 â 1) = 1111, andđđĄđ3(â8) = đđĄđ4(16 â 8) = 1000. If đ is a negative number larger thanor equal to â2đ then 2đ+1 + đ is a number between 2đ and 2đ+1 â 1.Hence the twoâs complement representation of such a number đ is astring of length đ + 1 with its first digit equal to 1.
Another way to say this is that we represent a potentially negativenumber đ â â2đ, ⊠, 2đ â1 as the non-negative number đ mod 2đ+1(see also Fig. 2.4). This means that if two (potentially negative) num-bers đ and đâČ are not too large (i.e., |đ| + |đâČ| < 2đ+1), then we cancompute the representation of đ + đâČ by adding modulo 2đ+1 the rep-resentations of đ and đâČ as if they were non-negative integers. Thisproperty of the twoâs complement representation is its main attractionsince, depending on their architectures, microprocessors can oftenperform arithmetic operations modulo 2đ€ very efficiently (for certainvalues of đ€ such as 32 and 64). Many systems leave it to the pro-grammer to check that values are not too large and will carry out thismodular arithmetic regardless of the size of the numbers involved. Forthis reason, in some systems adding two large positive numbers canresult in a negative number (e.g., adding 2đ â 100 and 2đ â 200 mightresult in â300 since (2đ+1 â 300) mod 2đ+1 = â300, see also Fig. 2.4).
2.2.3 Rational numbers, and representing pairs of stringsWe can represent a rational number of the form đ/đ by represent-ing the two numbers đ and đ. However, merely concatenating therepresentations of đ and đ will not work. For example, the binary rep-resentation of 4 is 100 and the binary representation of 43 is 101011,but the concatenation 100101011 of these strings is also the concatena-tion of the representation 10010 of 18 and the representation 1011 of11. Hence, if we used such simple concatenation then we would notbe able to tell if the string 100101011 is supposed to represent 4/43 or18/11.
We tackle this by giving a general representation for pairs of strings.If we were using a pen and paper, we would just use a separator sym-bol such as â to represent, for example, the pair consisting of the num-
computation and representation 91
bers represented by 10 and 110001 as the length-9 string â01â110001â.In other words, there is a one to one map đč from pairs of stringsđ„, đŠ â 0, 1â into a single string đ§ over the alphabet ÎŁ = 0, 1, â(in other words, đ§ â ÎŁâ). Using such separators is similar to theway we use spaces and punctuation to separate words in English. Byadding a little redundancy, we achieve the same effect in the digitaldomain. We can map the three-element set ÎŁ to the three-element set00, 11, 01 â 0, 12 in a one-to-one fashion, and hence encode alength đ string đ§ â ÎŁâ as a length 2đ string đ€ â 0, 1â.
Our final representation for rational numbers is obtained by com-posing the following steps:1. Representing a non-negative rational number as a pair of natural
numbers.2. Representing a natural number by a string via the binary represen-
tation.3. Combining 1 and 2 to obtain a representation of a rational number
as a pair of strings.4. Representing a pair of strings over 0, 1 as a single string overÎŁ = 0, 1, â.5. Representing a string over ÎŁ as a longer string over 0, 1.
Example 2.4 â Representing a rational number as a string. Consider therational number đ = â5/8. We represent â5 as 1101 and +8 as01000, and so we can represent đ as the pair of strings (1101, 01000)and represent this pair as the length 10 string 1101â01000 overthe alphabet 0, 1, â. Now, applying the map 0 ⊠00, 1 ⊠11,â ⊠01, we can represent the latter string as the length 20 stringđ = 11110011010011000000 over the alphabet 0, 1.The same idea can be used to represent triples of strings, quadru-
ples, and so on as a string. Indeed, this is one instance of a very gen-eral principle that we use time and again in both the theory and prac-tice of computer science (for example, in Object Oriented program-ming):
Big Idea 1 If we can represent objects of type đ as strings, then wecan represent tuples of objects of type đ as strings as well.
Repeating the same idea, once we can represent objects of type đ ,we can also represent lists of lists of such objects, and even lists of listsof lists and so on and so forth. We will come back to this point whenwe discuss prefix free encoding in Section 2.5.2.
92 introduction to theoretical computer science
Figure 2.6: XKCD cartoon on floating-point arithmetic.
2.3 REPRESENTING REAL NUMBERS
The set of real numbers â contains all numbers including positive,negative, and fractional, as well as irrational numbers such as đ or đ.Every real number can be approximated by a rational number, andthus we can represent every real number đ„ by a rational number đ/đthat is very close to đ„. For example, we can represent đ by 22/7 withinan error of about 10â3. If we want a smaller error (e.g., about 10â4)then we can use 311/99, and so on and so forth.
Figure 2.5: The floating point representation of a realnumber đ„ â â is its approximation as a number ofthe form đđ â 2đ where đ â ±1, đ is an (potentiallynegative) integer, and đ is a rational number between1 and 2 expressed as a binary fraction 1.đ0đ1đ2 ⊠đđfor some đ1, ⊠, đđ â 0, 1 (that is đ = 1 + đ1/2 +đ2/4 + ⊠+ đđ/2đ). Commonly-used floating pointrepresentations fix the numbers â and đ of bits torepresent đ and đ respectively. In the example above,assuming we use twoâs complement representationfor đ, the number represented is â1 Ă 25 Ă (1 + 1/2 +1/4 + 1/64 + 1/512) = â56.5625.The above representation of real numbers via rational numbers
that approximate them is a fine choice for a representation scheme.However, typically in computing applications, it is more common touse the floating point representation scheme (see Fig. 2.5) to representreal numbers. In the floating point representation scheme we rep-resent đ„ â â by the pair (đ, đ) of (positive or negative) integers ofsome prescribed sizes (determined by the desired accuracy) such thatđ Ă 2đ is closest to đ„. Floating point representation is the base-twoversion of scientific notation, where one represents a number đŠ â đ as its approximation of the form đ Ă 10đ for đ, đ. It is called âfloatingpointâ because we can think of the number đ as specifying a sequenceof binary digits, and đ as describing the location of the âbinary pointâwithin this sequence. The use of floating representation is the reasonwhy in many programming systems, printing the expression 0.1+0.2will result in 0.30000000000000004 and not 0.3, see here, here andhere for more.
The reader might be (rightly) worried about the fact that the float-ing point representation (or the rational number one) can only ap-proximately represent real numbers. In many (though not all) com-putational applications, one can make the accuracy tight enough sothat this does not affect the final result, though sometimes we do needto be careful. Indeed, floating-point bugs can sometimes be no jok-ing matter. For example, floating point rounding errors have beenimplicated in the failure of a U.S. Patriot missile to intercept an IraqiScud missile, costing 28 lives, as well as a 100 million pound error incomputing payouts to British pensioners.
computation and representation 93
2 đ đĄđ stands for âreal numbers to stringsâ.
2.4 CANTORâS THEOREM, COUNTABLE SETS, AND STRING REP-RESENTATIONS OF THE REAL NUMBERS
âFor any collection of fruits, we can make more fruit salads than there arefruits. If not, we could label each salad with a different fruit, and consider thesalad of all fruits not in their salad. The label of this salad is in it if and only ifit is not.â, Martha Storey.
Given the issues with floating point approximations for real num-bers, a natural question is whether it is possible to represent real num-bers exactly as strings. Unfortunately, the following theorem showsthat this cannot be done:
Theorem 2.5 â Cantorâs Theorem. There does not exist a one-to-onefunction đ đĄđ ⶠâ â 0, 1â. 2
Countable sets. We say that a set đ is countable if there is an ontomap đ¶ ⶠâ â đ, or in other words, we can write đ as the sequenceđ¶(0), đ¶(1), đ¶(2), âŠ. Since the binary representation yields an ontomap from 0, 1â to â, and the composition of two onto maps is onto,a set đ is countable iff there is an onto map from 0, 1â to đ. Usingthe basic properties of functions (see Section 1.4.3), a set is countableif and only if there is a one-to-one function from đ to 0, 1â. Hence,we can rephrase Theorem 2.5 as follows:
Theorem 2.6 â Cantorâs Theorem (equivalent statement). The reals are un-countable. That is, there does not exist an onto function đđĄđ ⶠâ ââ.
Theorem 2.6 was proven by Georg Cantor in 1874. This result (andthe theory around it) was quite shocking to mathematicians at thetime. By showing that there is no one-to-one map from â to 0, 1â (orâ), Cantor showed that these two infinite sets have âdifferent forms ofinfinityâ and that the set of real numbers â is in some sense âbiggerâthan the infinite set 0, 1â. The notion that there are âshades of infin-ityâ was deeply disturbing to mathematicians and philosophers at thetime. The philosopher Ludwig Wittgenstein (whom we mentioned be-fore) called Cantorâs results âutter nonsenseâ and âlaughable.â Othersthought they were even worse than that. Leopold Kronecker calledCantor a âcorrupter of youth,â while Henri PoincarĂ© said that Can-torâs ideas âshould be banished from mathematics once and for all.âThe tide eventually turned, and these days Cantorâs work is univer-sally accepted as the cornerstone of set theory and the foundations ofmathematics. As David Hilbert said in 1925, âNo one shall expel us fromthe paradise which Cantor has created for us.â As we will see later in this
94 introduction to theoretical computer science
3 đčđĄđ stands for âfunctions to stringsâ.4 đčđĄđ stands for âfunctions to reals.â
book, Cantorâs ideas also play a huge role in the theory of computa-tion.
Now that we have discussed Theorem 2.5âs importance, let us seethe proof. It is achieved in two steps:
1. Define some infinite set đł for which it is easier for us to prove thatđł is not countable (namely, itâs easier for us to prove that there isno one-to-one function from đł to 0, 1â).
2. Prove that there is a one-to-one function đș mapping đł to â.
We can use a proof by contradiction to show that these two factstogether imply Theorem 2.5. Specifically, if we assume (towards thesake of contradiction) that there exists some one-to-one đč mapping âto 0, 1â then the function đ„ ⊠đč(đș(đ„)) obtained by composing đčwith the function đș from Step 2 above would be a one-to-one functionfrom đł to 0, 1â, which contradicts what we proved in Step 1!
To turn this idea into a full proof of Theorem 2.5 we need to:
âą Define the set đł.
âą Prove that there is no one-to-one function from đł to 0, 1ââą Prove that there is a one-to-one function from đł to â.
We now proceed to do precisely that. That is, we will define the set0, 1â, which will play the role of đł, and then state and prove twolemmas that show that this set satisfies our two desired properties.
Definition 2.7 We denote by 0, 1â the set đ | đ ⶠâ â 0, 1.That is, 0, 1â is a set of functions, and a function đ is in 0, 1â
iff its domain is â and its codomain is 0, 1. We can also think of0, 1â as the set of all infinite sequences of bits, since a function đ â¶â â 0, 1 can be identified with the sequence (đ(0), đ(1), đ(2), âŠ).The following two lemmas show that 0, 1â can play the role of đł toestablish Theorem 2.5.Lemma 2.8 There does not exist a one-to-one map đčđĄđ ⶠ0, 1â â0, 1â.3Lemma 2.9 There does exist a one-to-one map đčđĄđ ⶠ0, 1â â â.4
As weâve seen above, Lemma 2.8 and Lemma 2.9 together implyTheorem 2.5. To repeat the argument more formally, suppose, forthe sake of contradiction, that there did exist a one-to-one functionđ đĄđ ⶠâ â 0, 1â. By Lemma 2.9, there exists a one-to-one functionđčđĄđ ⶠ0, 1â â â. Thus, under this assumption, since the composi-tion of two one-to-one functions is one-to-one (see Exercise 2.12), the
computation and representation 95
function đčđĄđ ⶠ0, 1â â 0, 1â defined as đčđĄđ(đ) = đ đĄđ(đčđĄđ (đ))will be one to one, contradicting Lemma 2.8. See Fig. 2.7 for a graphi-cal illustration of this argument.
Figure 2.7: We prove Theorem 2.5 by combiningLemma 2.8 and Lemma 2.9. Lemma 2.9, which usesstandard calculus tools, shows the existence of aone-to-one map đčđĄđ from the set 0, 1â to the realnumbers. So, if a hypothetical one-to-one map đ đĄđ â¶â â 0, 1â existed, then we could compose themto get a one-to-one map đčđĄđ ⶠ0, 1â â 0, 1â.Yet this contradicts Lemma 2.8- the heart of the proof-which rules out the existence of such a map.
Now all that is left is to prove these two lemmas. We start by prov-ing Lemma 2.8 which is really the heart of Theorem 2.5.
Figure 2.8: We construct a function đ such that đ â đđĄđč(đ„) for every đ„ â 0, 1â by ensuring thatđ(đ(đ„)) â đđĄđč(đ„)(đ(đ„)) for every đ„ â 0, 1âwith lexicographic order đ(đ„). We can think of thisas building a table where the columns correspondto numbers đ â â and the rows correspond tođ„ â 0, 1â (sorted according to đ(đ„)). If the entryin the đ„-th row and the đ-th column corresponds tođ(đ)) where đ = đđĄđč(đ„) then đ is obtained by goingover the âdiagonalâ elements in this table (the entriescorresponding to the đ„-th row and đ(đ„)-th column)and ensuring that đ(đ„)(đ(đ„)) â đđĄđč(đ„)(đ„).
Warm-up: âBaby Cantorâ. The proof of Lemma 2.8 is rather subtle. Oneway to get intution for it is to consider the following finite statementâthere is no onto function đ ⶠ0, ⊠, 99 â 0, 1100. Of course weknow itâs true since the set 0, 1100 is bigger than the set [100], butletâs see a direct proof. For every đ ⶠ0, ⊠, 99 â 0, 1100, wecan define the string đ â 0, 1100 as follows: đ = (1 â đ(0)0, 1 âđ(1)1, ⊠, 1 â đ(99)99). If đ was onto, then there would exist someđ â [100] such that đ(đ) = đ(đ), but we claim that no such đ exists.
96 introduction to theoretical computer science
Indeed, if there was such đ, then the đ-th coordinate of đ would equalđ(đ)đ but by definition this coordinate equals 1 â đ(đ)đ. See also aâproof by codeâ of this statement.
Proof of Lemma 2.8. We will prove that there does not exist an ontofunction đđĄđč ⶠ0, 1â â 0, 1â. This implies the lemma sincefor every two sets đŽ and đ”, there exists an onto function from đŽ tođ” if and only if there exists a one-to-one function from đ” to đŽ (seeLemma 1.2).
The technique of this proof is known as the âdiagonal argumentâand is illustrated in Fig. 2.8. We assume, towards a contradiction, thatthere exists such a function đđĄđč ⶠ0, 1â â 0, 1â. We will showthat đđĄđč is not onto by demonstrating a function đ â 0, 1â such thatđ â đđĄđč(đ„) for every đ„ â 0, 1â. Consider the lexicographic orderingof binary strings (i.e., "",0,1,00,01,âŠ). For every đ â â, we let đ„đ be theđ-th string in this order. That is đ„0 = "", đ„1 = 0, đ„2 = 1 and so on andso forth. We define the function đ â 0, 1â as follows:đ(đ) = 1 â đđĄđč(đ„đ)(đ) (2.4)
for every đ â â. That is, to compute đ on input đ â â, we first com-pute đ = đđĄđč(đ„đ), where đ„đ â 0, 1â is the đ-th string in the lexico-graphical ordering. Since đ â 0, 1â, it is a function mapping â to0, 1. The value đ(đ) is defined to be the negation of đ(đ).
The definition of the function đ is a bit subtle. One way to thinkabout it is to imagine the function đđĄđč as being specified by an in-finitely long table, in which every row corresponds to a string đ„ â0, 1â (with strings sorted in lexicographic order), and contains thesequence đđĄđč(đ„)(0), đđĄđč(đ„)(1), đđĄđč(đ„)(2), âŠ. The diagonal elements inthis table are the values
đđĄđč("")(0), đđĄđč(0)(1), đđĄđč(1)(2), đđĄđč(00)(3), đđĄđč(01)(4), ⊠(2.5)
which correspond to the elements đđĄđč(đ„đ)(đ) in the đ-th row andđ-th column of this table for đ = 0, 1, 2, âŠ. The function đ we definedabove maps every đ â â to the negation of the đ-th diagonal value.
To complete the proof that đđĄđč is not onto we need to show thatđ â đđĄđč(đ„) for every đ„ â 0, 1â. Indeed, let đ„ â 0, 1â be some stringand let đ = đđĄđč(đ„). If đ is the position of đ„ in the lexicographicalorder then by construction đ(đ) = 1 â đ(đ) â đ(đ) which means thatđ â đ which is what we wanted to prove.
computation and representation 97
RRemark 2.10 â Generalizing beyond strings and reals.Lemma 2.8 doesnât really have much to do with thenatural numbers or the strings. An examination ofthe proof shows that it really shows that for everyset đ, there is no one-to-one map đč ⶠ0, 1đ â đwhere 0, 1đ denotes the set đ | đ ⶠđ â 0, 1of all Boolean functions with domain đ. Since we canidentify a subset đ â đ with its characteristic functionđ = 1đ (i.e., 1đ (đ„) = 1 iff đ„ â đ ), we can think of0, 1đ also as the set of all subsets of đ. This subsetis sometimes called the power set of đ and denoted byđ«(đ) or 2đ .The proof of Lemma 2.8 can be generalized to showthat there is no one-to-one map between a set and itspower set. In particular, it means that the set 0, 1â isâeven biggerâ than â. Cantor used these ideas to con-struct an infinite hierarchy of shades of infinity. Thenumber of such shades turns out to be much largerthan |â| or even |â|. He denoted the cardinality of âby â”0 and denoted the next largest infinite numberby â”1. (â” is the first letter in the Hebrew alphabet.)Cantor also made the continuum hypothesis that|â| = â”1. We will come back to the fascinating storyof this hypothesis later on in this book. This lecture ofAaronson mentions some of these issues (see also thisBerkeley CS 70 lecture).
To complete the proof of Theorem 2.5, we need to show Lemma 2.9.This requires some calculus background but is otherwise straightfor-ward. If you have not had much experience with limits of a real seriesbefore, then the formal proof below might be a little hard to follow.This part is not the core of Cantorâs argument, nor are such limitsimportant to the remainder of this book, so you can feel free to takeLemma 2.9 on faith and skip the proof.
Proof Idea:
We define đčđĄđ (đ) to be the number between 0 and 2 whose dec-imal expansion is đ(0).đ(1)đ(2) âŠ, or in other words đčđĄđ (đ) =ââđ=0 đ(đ) â 10âđ. If đ and đ are two distinct functions in 0, 1â, thenthere must be some input đ in which they disagree. If we take theminimum such đ, then the numbers đ(0).đ(1)đ(2) ⊠đ(đ â 1)đ(đ) âŠand đ(0).đ(1)đ(2) ⊠đ(đ) ⊠agree with each other all the way up to theđ â 1-th digit after the decimal point, and disagree on the đ-th digit.But then these numbers must be distinct. Concretely, if đ(đ) = 1 andđ(đ) = 0 then the first number is larger than the second, and otherwise(đ(đ) = 0 and đ(đ) = 1) the first number is smaller than the second.In the proof we have to be a little careful since these are numbers withinfinite expansions. For example, the number one half has two decimal
98 introduction to theoretical computer science
expansions 0.5 and 0.49999 âŻ. However, this issue does not come uphere, since we restrict attention only to numbers with decimal expan-sions that do not involve the digit 9.âProof of Lemma 2.9. For every đ â 0, 1â, we define đčđĄđ (đ) to be thenumber whose decimal expansion is đ(0).đ(1)đ(2)đ(3) âŠ. Formally,đčđĄđ (đ) = ââđ=0 đ(đ) â 10âđ (2.6)
It is a known result in calculus (whose proof we will not repeat here)that the series on the righthand side of (2.6) converges to a definitelimit in â.
We now prove that đčđĄđ is one to one. Let đ, đ be two distinct func-tions in 0, 1â. Since đ and đ are distinct, there must be some inputon which they differ, and we define đ to be the smallest such inputand assume without loss of generality that đ(đ) = 0 and đ(đ) = 1.(Otherwise, if đ(đ) = 1 and đ(đ) = 0, then we can simply switch theroles of đ and đ.) The numbers đčđĄđ (đ) and đčđĄđ (đ) agree with eachother up to the đâ1-th digit up after the decimal point. Since this digitequals 0 for đčđĄđ (đ) and equals 1 for đčđĄđ (đ), we claim that đčđĄđ (đ) isbigger than đčđĄđ (đ) by at least 0.5 â 10âđ. To see this note that the dif-ference đčđĄđ (đ) â đčđĄđ (đ) will be minimized if đ(â) = 0 for every â > đand đ(â) = 1 for every â > đ, in which case (since đ and đ agree up tothe đ â 1-th digit)
đčđĄđ (đ) â đčđĄđ (đ) = 10âđ â 10âđâ1 â 10âđâ2 â 10âđâ3 â ⯠(2.7)
Since the infinite series ââđ=0 10âđ converges to 10/9, it follows thatfor every such đ and đ, đčđĄđ (đ) â đčđĄđ (đ) â„ 10âđ â 10âđâ1 â (10/9) > 0.In particular we see that for every distinct đ, đ â 0, 1â, đčđĄđ (đ) â đčđĄđ (đ), implying that the function đčđĄđ is one to one.
RRemark 2.11 â Using decimal expansion (op-tional). In the proof above we used the fact that1 + 1/10 + 1/100 + ⯠converges to 10/9, whichplugging into (2.7) yields that the difference betweenđčđĄđ (đ) and đčđĄđ (â) is at least 10âđâ10âđâ1 â (10/9) > 0.While the choice of the decimal representation for đčđĄđ was arbitrary, we could not have used the binaryrepresentation in its place. Had we used the binaryexpansion instead of decimal, the corresponding se-quence 1 + 1/2 + 1/4 + ⯠converges to 2/1 = 2,
computation and representation 99
and since 2âđ = 2âđâ1 â 2, we could not have de-duced that đčđĄđ is one to one. Indeed there do existpairs of distinct sequences đ, đ â 0, 1â such thatââđ=0 đ(đ)2âđ = ââđ=0 đ(đ)2âđ. (For example, the se-quence 1, 0, 0, 0, ⊠and the sequence 0, 1, 1, 1, ⊠havethis property.)
2.4.1 Corollary: Boolean functions are uncountableCantorâs Theorem yields the following corollary that we will useseveral times in this book: the set of all Boolean functions (mapping0, 1â to 0, 1) is not countable:
Theorem 2.12 â Boolean functions are uncountable. Let ALL be the set ofall functions đč ⶠ0, 1â â 0, 1. Then ALL is uncountable. Equiv-alently, there does not exist an onto map đđĄđŽđżđż ⶠ0, 1â â ALL.
Proof Idea:
This is a direct consequence of Lemma 2.8, since we can use thebinary representation to show a one-to-one map from 0, 1â to ALL.Hence the uncountability of 0, 1â implies the uncountability ofALL.âProof of Theorem 2.12. Since 0, 1â is uncountable, the result willfollow by showing a one-to-one map from 0, 1â to ALL. The reasonis that the existence of such a map implies that if ALL was countable,and hence there was a one-to-one map from ALL to â, then therewould have been a one-to-one map from 0, 1â to â, contradictingLemma 2.8.
We now show this one-to-one map. We simply map a functionđ â 0, 1â to the function đč ⶠ0, 1â â 0, 1 as follows. We letđč(0) = đ(0), đč(1) = đ(1), đč(10) = đč(2), đč(11) = đč(3) and so on andso forth. That is, for every đ„ â 0, 1â that represents a natural numberđ in the binary basis, we define đč(đ„) = đ(đ). If đ„ does not representsuch a number (e.g., it has a leading zero), then we set đč(đ„) = 0.
This map is one-to-one since if đ â đ are two distinct elements in0, 1â, then there must be some input đ â â on which đ(đ) â đ(đ).But then if đ„ â 0, 1â is the string representing đ, we see that đč(đ„) â đș(đ„) where đč is the function in ALL that đ mapped to, and đș is thefunction that đ is mapped to.
100 introduction to theoretical computer science
2.4.2 Equivalent conditions for countabilityThe results above establish many equivalent ways to phrase the factthat a set is countable. Specifically, the following statements are allequivalent:
1. The set đ is countable
2. There exists an onto map from â to đ3. There exists an onto map from 0, 1â to đ.
4. There exists a one-to-one map from đ to â5. There exists a one-to-one map from đ to 0, 1â.6. There exists an onto map from some countable set đ to đ.
7. There exists a one-to-one map from đ to some countable set đ .
PMake sure you know how to prove the equivalence ofall the results above.
2.5 REPRESENTING OBJECTS BEYOND NUMBERS
Numbers are of course by no means the only objects that we can repre-sent as binary strings. A representation scheme for representing objectsfrom some set đȘ consists of an encoding function that maps an object inđȘ to a string, and a decoding function that decodes a string back to anobject in đȘ. Formally, we make the following definition:
Definition 2.13 â String representation. Let đȘ be any set. A representationscheme for đȘ is a pair of functions đž, đ· where đž ⶠđȘ â 0, 1â is atotal one-to-one function, đ· ⶠ0, 1â âđ đȘ is a (possibly partial)function, and such that đ· and đž satisfy that đ·(đž(đ)) = đ for everyđ â đȘ. đž is known as the encoding function and đ· is known as thedecoding function.
Note that the condition đ·(đž(đ)) = đ for every đ â đȘ impliesthat đ· is onto (can you see why?). It turns out that to construct arepresentation scheme we only need to find an encoding function. Thatis, every one-to-one encoding function has a corresponding decodingfunction, as shown in the following lemma:Lemma 2.14 Suppose that đž ⶠđȘ â 0, 1â is one-to-one. Then thereexists a function đ· ⶠ0, 1â â đȘ such that đ·(đž(đ)) = đ for everyđ â đȘ.
computation and representation 101
Proof. Let đ0 be some arbitrary element of đȘ. For every đ„ â 0, 1â,there exists either zero or a single đ â đȘ such that đž(đ) = đ„ (otherwiseđž would not be one-to-one). We will define đ·(đ„) to equal đ0 in thefirst case and this single object đ in the second case. By definitionđ·(đž(đ)) = đ for every đ â đȘ.
RRemark 2.15 â Total decoding functions. While thedecoding function of a representation scheme can ingeneral be a partial function, the proof of Lemma 2.14implies that every representation scheme has a totaldecoding function. This observation can sometimes beuseful.
2.5.1 Finite representationsIf đȘ is finite, then we can represent every object in đȘ as a string oflength at most some number đ. What is the value of đ? Let us denoteby 0, 1â€đ the set đ„ â 0, 1â ⶠ|đ„| †đ of strings of length at most đ.The size of 0, 1â€đ is equal to|0, 10| + |0, 11| + |0, 12| + ⯠+ |0, 1đ| = đâđ=0 2đ = 2đ+1 â 1 (2.8)
using the standard formula for summing a geometric progression.To obtain a representation of objects in đȘ as strings of length at
most đ we need to come up with a one-to-one function from đȘ to0, 1â€đ. We can do so, if and only if |đȘ| †2đ+1 â 1 as is implied bythe following lemma:Lemma 2.16 For every two finite sets đ, đ , there exists a one-to-oneđž ⶠđ â đ if and only if |đ| †|đ |.Proof. Let đ = |đ| and đ = |đ | and so write the elements of đ andđ as đ = đ 0, đ 1, ⊠, đ đâ1 and đ = đĄ0, đĄ1, ⊠, đĄđâ1. We need toshow that there is a one-to-one function đž ⶠđ â đ iff đ †đ. Forthe âifâ direction, if đ †đ we can simply define đž(đ đ) = đĄđ for everyđ â [đ]. Clearly for đ â đ, đĄđ = đž(đ đ) â đž(đ đ) = đĄđ, and hence thisfunction is one-to-one. In the other direction, suppose that đ > đ andđž ⶠđ â đ is some function. Then đž cannot be one-to-one. Indeed, forđ = 0, 1, ⊠, đ â 1 let us âmarkâ the element đĄđ = đž(đ đ) in đ . If đĄđ wasmarked before, then we have found two objects in đ mapping to thesame element đĄđ. Otherwise, since đ has đ elements, when we get tođ = đ â 1 we mark all the objects in đ . Hence, in this case, đž(đ đ) mustmap to an element that was already marked before. (This observationis sometimes known as the âPigeon Hole Principleâ: the principle that
102 introduction to theoretical computer science
if you have a pigeon coop with đ holes, and đ > đ pigeons, then theremust be two pigeons in the same hole. )
2.5.2 Prefix-free encodingWhen showing a representation scheme for rational numbers, we usedthe âhackâ of encoding the alphabet 0, 1, â to represent tuples ofstrings as a single string. This is a special case of the general paradigmof prefix-free encoding. The idea is the following: if our representationhas the property that no string đ„ representing an object đ is a prefix(i.e., an initial substring) of a string đŠ representing a different objectđâČ, then we can represent a list of objects by merely concatenating therepresentations of all the list members. For example, because in En-glish every sentence ends with a punctuation mark such as a period,exclamation, or question mark, no sentence can be a prefix of anotherand so we can represent a list of sentences by merely concatenatingthe sentences one after the other. (English has some complicationssuch as periods used for abbreviations (e.g., âe.g.â) or sentence quotescontaining punctuation, but the high level point of a prefix-free repre-sentation for sentences still holds.)
It turns out that we can transform every representation to a prefix-free form. This justifies Big Idea 1, and allows us to transform a repre-sentation scheme for objects of a type đ to a representation scheme oflists of objects of the type đ . By repeating the same technique, we canalso represent lists of lists of objects of type đ , and so on and so forth.But first let us formally define prefix-freeness:
Definition 2.17 â Prefix free encoding. For two strings đŠ, đŠâČ, we say that đŠis a prefix of đŠâČ if |đŠ| †|đŠâČ| and for every đ < |đŠ|, đŠâČđ = đŠđ.
Let đȘ be a non-empty set and đž ⶠđȘ â 0, 1â be a function. Wesay that đž is prefix-free if đž(đ) is non-empty for every đ â đȘ andthere does not exist a distinct pair of objects đ, đâČ â đȘ such thatđž(đ) is a prefix of đž(đâČ).Recall that for every set đȘ, the set đȘâ consists of all finite length
tuples (i.e., lists) of elements in đȘ. The following theorem shows thatif đž is a prefix-free encoding of đȘ then by concatenating encodings wecan obtain a valid (i.e., one-to-one) representation of đȘâ:
Theorem 2.18 â Prefix-free implies tuple encoding. Suppose that đž ⶠđȘ â0, 1â is prefix-free. Then the following map đž ⶠđȘâ â 0, 1â isone to one, for every đ0, ⊠, đđâ1 â đȘâ, we defineđž(đ0, ⊠, đđâ1) = đž(đ0)đž(đ1) ⯠đž(đđâ1) . (2.9)
computation and representation 103
Figure 2.9: If we have a prefix-free representation ofeach object then we can concatenate the representa-tions of đ objects to obtain a representation for thetuple (đ0, ⊠, đđâ1).
PTheorem 2.18 is an example of a theorem that is a littlehard to parse, but in fact is fairly straightforward toprove once you understand what it means. Therefore,I highly recommend that you pause here to makesure you understand the statement of this theorem.You should also try to prove it on your own beforeproceeding further.
Proof Idea:
The idea behind the proof is simple. Suppose that for examplewe want to decode a triple (đ0, đ1, đ2) from its representation đ„ =đž(đ0, đ1, đ2) = đž(đ0)đž(đ1)đž(đ2). We will do so by first finding thefirst prefix đ„0 of đ„ that is a representation of some object. Then wewill decode this object, remove đ„0 from đ„ to obtain a new string đ„âČ,and continue onwards to find the first prefix đ„1 of đ„âČ and so on and soforth (see Exercise 2.9). The prefix-freeness property of đž will ensurethat đ„0 will in fact be đž(đ0), đ„1 will be đž(đ1), etc.âProof of Theorem 2.18. We now show the formal proof. Suppose, to-wards the sake of contradiction, that there exist two distinct tuples(đ0, ⊠, đđâ1) and (đâČ0, ⊠, đâČđâČâ1) such thatđž(đ0, ⊠, đđâ1) = đž(đâČ0, ⊠, đâČđâČâ1) . (2.10)
We will denote the string đž(đ0, ⊠, đđâ1) by đ„.Let đ be the first index such that đđ â đâČđ. (If đđ = đâČđ for all đ then,
since we assume the two tuples are distinct, one of them must belarger than the other. In this case we assume without loss of generalitythat đâČ > đ and let đ = đ.) In the case that đ < đ, we see that the stringđ„ can be written in two different ways:
đ„ = đž(đ0, ⊠, đđâ1) = đ„0 ⯠đ„đâ1đž(đđ)đž(đđ+1) ⯠đž(đđâ1) (2.11)
and
đ„ = đž(đâČ0, ⊠, đâČđâČâ1) = đ„0 ⯠đ„đâ1đž(đâČđ)đž(đâČđ+1) ⯠đž(đâČđâ1) (2.12)
where đ„đ = đž(đđ) = đž(đâČđ) for all đ < đ. Let đŠ be the string obtainedafter removing the prefix đ„0 ⯠đ„đâđ from đ„. We see that đŠ can be writ-ten as both đŠ = đž(đđ)đ for some string đ â 0, 1â and as đŠ = đž(đâČđ)đ âČfor some đ âČ â 0, 1â. But this means that one of đž(đđ) and đž(đâČđ) mustbe a prefix of the other, contradicting the prefix-freeness of đž.
104 introduction to theoretical computer science
In the case that đ = đ and đâČ > đ, we get a contradiction in thefollowing way. In this case
đ„ = đž(đ0) ⯠đž(đđâ1) = đž(đ0) ⯠đž(đđâ1)đž(đâČđ) ⯠đž(đâČđâČâ1) (2.13)
which means that đž(đâČđ) ⯠đž(đâČđâČâ1) must correspond to the emptystring "". But in such a case đž(đâČđ) must be the empty string, which inparticular is the prefix of any other string, contradicting the prefix-freeness of đž.
RRemark 2.19 â Prefix freeness of list representation.Even if the representation đž of objects in đȘ is prefixfree, this does not mean that our representation đžof lists of such objects will be prefix free as well. Infact, it wonât be: for every three objects đ, đâČ, đâł therepresentation of the list (đ, đâČ) will be a prefix of therepresentation of the list (đ, đâČ, đâł). However, as we seein Lemma 2.20 below, we can transform every repre-sentation into prefix-free form, and so will be able touse that transformation if needed to represent lists oflists, lists of lists of lists, and so on and so forth.
2.5.3 Making representations prefix-freeSome natural representations are prefix-free. For example, every fixedoutput length representation (i.e., one-to-one function đž ⶠđȘ â 0, 1đ)is automatically prefix-free, since a string đ„ can only be a prefix ofan equal-length đ„âČ if đ„ and đ„âČ are identical. Moreover, the approachwe used for representing rational numbers can be used to show thefollowing:Lemma 2.20 Let đž ⶠđȘ â 0, 1â be a one-to-one function. Then there isa one-to-one prefix-free encoding đž such that |đž(đ)| †2|đž(đ)| + 2 forevery đ â đȘ.
PFor the sake of completeness, we will include theproof below, but it is a good idea for you to pausehere and try to prove it on your own, using the sametechnique we used for representing rational numbers.
Proof of Lemma 2.20. The idea behind the proof is to use the map 0 âŠ00, 1 ⊠11 to âdoubleâ every bit in the string đ„ and then mark theend of the string by concatenating to it the pair 01. If we encode a
computation and representation 105
string đ„ in this way, it ensures that the encoding of đ„ is never a prefixof the encoding of a distinct string đ„âČ. Formally, we define the functionPF ⶠ0, 1â â 0, 1â as follows:
PF(đ„) = đ„0đ„0đ„1đ„1 ⊠đ„đâ1đ„đâ101 (2.14)
for every đ„ â 0, 1â. If đž ⶠđȘ â 0, 1â is the (potentially notprefix-free) representation for đȘ, then we transform it into a prefix-free representation đž ⶠđȘ â 0, 1â by defining đž(đ) = PF(đž(đ)).
To prove the lemma we need to show that (1) đž is one-to-one and(2) đž is prefix-free. In fact, prefix freeness is a stronger condition thanone-to-one (if two strings are equal then in particular one of them is aprefix of the other) and hence it suffices to prove (2), which we nowdo.
Let đ â đâČ in đȘ be two distinct objects. We will prove that đž(đ)is a not a prefix of đž(đâČ). Define đ„ = đž(đ) and đ„âČ = đž(đâČ). Sinceđž is one-to-one, đ„ â đ„âČ. Under our assumption, PF(đ„) is a prefixof PF(đ„âČ). If |đ„| < |đ„âČ| then the two bits in positions 2|đ„|, 2|đ„| + 1in PF(đ„) have the value 01 but the corresponding bits in PF(đ„âČ) willequal either 00 or 11 (depending on the |đ„|-th bit of đ„âČ) and hencePF(đ„) cannot be a prefix of PF(đ„âČ). If |đ„| = |đ„âČ| then, since đ„ â đ„âČ,there must be a coordinate đ in which they differ, meaning that thestrings PF(đ„) and PF(đ„âČ) differ in the coordinates 2đ, 2đ + 1, whichagain means that PF(đ„) cannot be a prefix of PF(đ„âČ). If |đ„| > |đ„âČ|then |PF(đ„)| = 2|đ„| + 2 > |PF(đ„âČ)| = 2|đ„âČ| + 2 and hence PF(đ„) islonger than (and cannot be a prefix of) PF(đ„âČ). In all cases we see thatPF(đ„) = đž(đ) is not a prefix of PF(đ„âČ) = đž(đâČ), hence completing theproof.
The proof of Lemma 2.20 is not the only or even the best way totransform an arbitrary representation into prefix-free form. Exer-cise 2.10 asks you to construct a more efficient prefix-free transforma-tion satisfying |đž(đ)| †|đž(đ)| + đ(log |đž(đ)|).2.5.4 âProof by Pythonâ (optional)The proofs of Theorem 2.18 and Lemma 2.20 are constructive in thesense that they give us:
âą A way to transform the encoding and decoding functions of anyrepresentation of an object đ to encoding and decoding functionsthat are prefix-free, and
âą A way to extend prefix-free encoding and decoding of single objectsto encoding and decoding of lists of objects by concatenation.
106 introduction to theoretical computer science
Specifically, we could transform any pair of Python functions en-code and decode to functions pfencode and pfdecode that correspondto a prefix-free encoding and decoding. Similarly, given pfencode andpfdecode for single objects, we can extend them to encoding of lists.Let us show how this works for the case of the NtS and StN functionswe defined above.
We start with the âPython proofâ of Lemma 2.20: a way to trans-form an arbitrary representation into one that is prefix free. The func-tion prefixfree below takes as input a pair of encoding and decodingfunctions, and returns a triple of functions containing prefix-free encod-ing and decoding functions, as well as a function that checks whethera string is a valid encoding of an object.
# takes functions encode and decode mapping# objects to lists of bits and vice versa,# and returns functions pfencode and pfdecode that# maps objects to lists of bits and vice versa# in a prefix-free way.# Also returns a function pfvalid that says# whether a list is a valid encodingdef prefixfree(encode, decode):
def pfencode(o):L = encode(o)return [L[i//2] for i in range(2*len(L))]+[0,1]
def pfdecode(L):return decode([L[j] for j in range(0,len(L)-2,2)])
def pfvalid(L):return (len(L) % 2 == 0 ) and all(L[2*i]==L[2*i+1]
for i in range((len(L)-2)//2)) andL[-2:]==[0,1]
return pfencode, pfdecode, pfvalid
pfNtS, pfStN , pfvalidN = prefixfree(NtS,StN)
NtS(234)# 11101010pfNtS(234)# 111111001100110001pfStN(pfNtS(234))# 234pfvalidM(pfNtS(234))# true
computation and representation 107
PNote that the Python function prefixfree abovetakes two Python functions as input and outputsthree Python functions as output. (When itâs nottoo awkward, we use the term âPython functionâ orâsubroutineâ to distinguish between such snippets ofPython programs and mathematical functions.) Youdonât have to know Python in this course, but you doneed to get comfortable with the idea of functions asmathematical objects in their own right, that can beused as inputs and outputs of other functions.
We now show a âPython proofâ of Theorem 2.18. Namely, we showa function represlists that takes as input a prefix-free representationscheme (implemented via encoding, decoding, and validity testingfunctions) and outputs a representation scheme for lists of such ob-jects. If we want to make this representation prefix-free then we couldfit it into the function prefixfree above.
def represlists(pfencode,pfdecode,pfvalid):"""Takes functions pfencode, pfdecode and pfvalid,and returns functions encodelists, decodeliststhat can encode and decode lists of the objectsrespectively."""
def encodelist(L):"""Gets list of objects, encodes it as list of
bits"""return "".join([pfencode(obj) for obj in L])
def decodelist(S):"""Gets lists of bits, returns lists of objects"""i=0; j=1 ; res = []while j<=len(S):
if pfvalid(S[i:j]):res += [pfdecode(S[i:j])]i=j
j+= 1return res
return encodelist,decodelist
LtS , StL = represlists(pfNtS,pfStN,pfvalidN)
108 introduction to theoretical computer science
Figure 2.10: The word âBinaryâ in âGrade 1â orâuncontractedâ Unified English Braille. This word isencoded using seven symbols since the first one is amodifier indicating that the first letter is capitalized.
LtS([234,12,5])# 111111001100110001111100000111001101StL(LtS([234,12,5]))# [234, 12, 5]
2.5.5 Representing letters and textWe can represent a letter or symbol by a string, and then if this rep-resentation is prefix-free, we can represent a sequence of symbols bymerely concatenating the representation of each symbol. One suchrepresentation is the ASCII that represents 128 letters and symbols asstrings of 7 bits. Since the ASCII representation is fixed-length, it isautomatically prefix-free (can you see why?). Unicode is representa-tion of (at the time of this writing) about 128,000 symbols as numbers(known as code points) between 0 and 1, 114, 111. There are severaltypes of prefix-free representations of the code points, a popular onebeing UTF-8 that encodes every codepoint into a string of length be-tween 8 and 32.
Example 2.21 â The Braille representation. The Braille system is anotherway to encode letters and other symbols as binary strings. Specifi-cally, in Braille, every letter is encoded as a string in 0, 16, whichis written using indented dots arranged in two columns and threerows, see Fig. 2.10. (Some symbols require more than one six-bitstring to encode, and so Braille uses a more general prefix-freeencoding.)
The Braille system was invented in 1821 by Louis Braille whenhe was just 12 years old (though he continued working on it andimproving it throughout his life). Braille was a French boy wholost his eyesight at the age of 5 as the result of an accident.
Example 2.22 â Representing objects in C (optional). We can use pro-gramming languages to probe how our computing environmentrepresents various values. This is easiest to do in âunsafeâ pro-gramming languages such as C that allow direct access to thememory.
Using a simple C program we have produced the followingrepresentations of various values. One can see that for integers,multiplying by 2 corresponds to a âleft shiftâ inside each byte. Incontrast, for floating point numbers, multiplying by two corre-sponds to adding one to the exponent part of the representation.In the architecture we used, a negative number is represented
computation and representation 109
using the twoâs complement approach. C represents strings in aprefix-free form by ensuring that a zero byte is at their end.
int 2 : 00000010 00000000 00000000 00000000int 4 : 00000100 00000000 00000000 00000000int 513 : 00000001 00000010 00000000 00000000long 513 : 00000001 00000010 00000000 00000000
00000000 00000000 00000000 00000000int -1 : 11111111 11111111 11111111 11111111int -2 : 11111110 11111111 11111111 11111111string Hello: 01001000 01100101 01101100 01101100
01101111 00000000string abcd : 01100001 01100010 01100011 01100100
00000000float 33.0 : 00000000 00000000 00000100 01000010float 66.0 : 00000000 00000000 10000100 01000010float 132.0: 00000000 00000000 00000100 01000011double 132.0: 00000000 00000000 00000000 00000000
00000000 10000000 01100000 010000002.5.6 Representing vectors, matrices, imagesOnce we can represent numbers and lists of numbers, then we can alsorepresent vectors (which are just lists of numbers). Similarly, we canrepresent lists of lists, and thus, in particular, can represent matrices.To represent an image, we can represent the color at each pixel by alist of three numbers corresponding to the intensity of Red, Green andBlue. (We can restrict to three primary colors since most humans onlyhave three types of cones in their retinas; we would have needed 16primary colors to represent colors visible to the Mantis Shrimp.) Thusan image of đ pixels would be represented by a list of đ such length-three lists. A video can be represented as a list of images. Of coursethese representations are rather wasteful and much more compactrepresentations are typically used for images and videos, though thiswill not be our concern in this book.
2.5.7 Representing graphsA graph on đ vertices can be represented as an đ Ă đ adjacency matrixwhose (đ, đ)đĄâ entry is equal to 1 if the edge (đ, đ) is present and isequal to 0 otherwise. That is, we can represent an đ vertex directedgraph đș = (đ , đž) as a string đŽ â 0, 1đ2 such that đŽđ,đ = 1 iff theedge đ đ â đž. We can transform an undirected graph to a directedgraph by replacing every edge đ, đ with both edges đ đ and đ đ
Another representation for graphs is the adjacency list representa-tion. That is, we identify the vertex set đ of a graph with the set [đ]
110 introduction to theoretical computer science
Figure 2.11: Representing the graph đș =(0, 1, 2, 3, 4, (1, 0), (4, 0), (1, 4), (4, 1), (2, 1), (3, 2), (4, 3))in the adjacency matrix and adjacency list representa-tions.
where đ = |đ |, and represent the graph đș = (đ , đž) as a list of đlists, where the đ-th list consists of the out-neighbors of vertex đ. Thedifference between these representations can be significant for someapplications, though for us would typically be immaterial.
2.5.8 Representing lists and nested listsIf we have a way of representing objects from a set đȘ as binary strings,then we can represent lists of these objects by applying a prefix-freetransformation. Moreover, we can use a trick similar to the above tohandle nested lists. The idea is that if we have some representationđž ⶠđȘ â 0, 1â, then we can represent nested lists of items fromđȘ using strings over the five element alphabet ÎŁ = 0,1,[ , ] , , .For example, if đ1 is represented by 0011, đ2 is represented by 10011,and đ3 is represented by 00111, then we can represent the nested list(đ1, (đ2, đ3)) as the string "[0011,[10011,00111]]" over the alphabetÎŁ. By encoding every element of ÎŁ itself as a three-bit string, we cantransform any representation for objects đȘ into a representation thatenables representing (potentially nested) lists of these objects.
2.5.9 NotationWe will typically identify an object with its representation as a string.For example, if đč ⶠ0, 1â â 0, 1â is some function that mapsstrings to strings and đ is an integer, we might make statements suchas âđč(đ) + 1 is primeâ to mean that if we represent đ as a string đ„,then the integer đ represented by the string đč(đ„) satisfies that đ + 1is prime. (You can see how this convention of identifying objects withtheir representation can save us a lot of cumbersome formalism.)Similarly, if đ„, đŠ are some objects and đč is a function that takes stringsas inputs, then by đč(đ„, đŠ) we will mean the result of applying đč to therepresentation of the ordered pair (đ„, đŠ). We use the same notation toinvoke functions on đ-tuples of objects for every đ.
This convention of identifying an object with its representation asa string is one that we humans follow all the time. For example, whenpeople say a statement such as â17 is a prime numberâ, what theyreally mean is that the integer whose decimal representation is thestring â17â, is prime.
When we sayđŽ is an algorithm that computes the multiplication function on natural num-bers.what we really mean is thatđŽ is an algorithm that computes the function đč ⶠ0, 1â â 0, 1â such thatfor every pair đ, đ â â, if đ„ â 0, 1â is a string representing the pair (đ, đ)then đč(đ„) will be a string representing their product đ â đ.
computation and representation 111
2.6 DEFINING COMPUTATIONAL TASKS AS MATHEMATICAL FUNC-TIONS
Abstractly, a computational process is some process that takes an inputwhich is a string of bits and produces an output which is a stringof bits. This transformation of input to output can be done using amodern computer, a person following instructions, the evolution ofsome natural system, or any other means.
In future chapters, we will turn to mathematically defining com-putational processes, but, as we discussed above, at the moment wefocus on computational tasks. That is, we focus on the specification andnot the implementation. Again, at an abstract level, a computationaltask can specify any relation that the output needs to have with the in-put. However, for most of this book, we will focus on the simplest andmost common task of computing a function. Here are some examples:
âą Given (a representation of) two integers đ„, đŠ, compute the productđ„ Ă đŠ. Using our representation above, this corresponds to com-puting a function from 0, 1â to 0, 1â. We have seen that there ismore than one way to solve this computational task, and in fact, westill do not know the best algorithm for this problem.
âą Given (a representation of) an integer đ§ > 1, compute its factoriza-tion; i.e., the list of primes đ1 †⯠†đđ such that đ§ = đ1 ⯠đđ. Thisagain corresponds to computing a function from 0, 1â to 0, 1â.The gaps in our knowledge of the complexity of this problem areeven larger.
âą Given (a representation of) a graph đș and two vertices đ and đĄ,compute the length of the shortest path in đș between đ and đĄ, or dothe same for the longest path (with no repeated vertices) betweenđ and đĄ. Both these tasks correspond to computing a function from0, 1â to 0, 1â, though it turns out that there is a vast difference intheir computational difficulty.
âą Given the code of a Python program, determine whether there is aninput that would force it into an infinite loop. This task correspondsto computing a partial function from 0, 1â to 0, 1 since not everystring corresponds to a syntactically valid Python program. We willsee that we do understand the computational status of this problem,but the answer is quite surprising.
âą Given (a representation of) an image đŒ , decide if đŒ is a photo of acat or a dog. This corresponds to computing some (partial) func-tion from 0, 1â to 0, 1.
112 introduction to theoretical computer science
Figure 2.12: A subset đż â 0, 1â can be identifiedwith the function đč ⶠ0, 1â â 0, 1 such thatđč(đ„) = 1 if đ„ â đż and đč(đ„) = 0 if đ„ â đż. Functionswith a single bit of output are called Boolean functions,while subsets of strings are called languages. Theabove shows that the two are essentially the sameobject, and we can identify the task of decidingmembership in đż (known as deciding a language in theliterature) with the task of computing the function đč .
RRemark 2.23 â Boolean functions and languages. Animportant special case of computational tasks corre-sponds to computing Boolean functions, whose outputis a single bit 0, 1. Computing such functions corre-sponds to answering a YES/NO question, and hencethis task is also known as a decision problem. Given anyfunction đč ⶠ0, 1â â 0, 1 and đ„ â 0, 1â, the taskof computing đč(đ„) corresponds to the task of decidingwhether or not đ„ â đż where đż = đ„ ⶠđč(đ„) = 1 isknown as the language that corresponds to the functionđč . (The language terminology is due to historicalconnections between the theory of computation andformal linguistics as developed by Noam Chomsky.)Hence many texts refer to such a computational taskas deciding a language.
For every particular function đč , there can be several possible algo-rithms to compute đč . We will be interested in questions such as:
âą For a given function đč , can it be the case that there is no algorithm tocompute đč ?
âą If there is an algorithm, what is the best one? Could it be that đč isâeffectively uncomputableâ in the sense that every algorithm forcomputing đč requires a prohibitively large amount of resources?
âą If we cannot answer this question, can we show equivalence be-tween different functions đč and đč âČ in the sense that either they areboth easy (i.e., have fast algorithms) or they are both hard?
âą Can a function being hard to compute ever be a good thing? Can weuse it for applications in areas such as cryptography?
In order to do that, we will need to mathematically define the no-tion of an algorithm, which is what we will do in Chapter 3.
2.6.1 Distinguish functions from programs!You should always watch out for potential confusions between speci-fications and implementations or equivalently between mathematicalfunctions and algorithms/programs. It does not help that program-ming languages (Python included) use the term âfunctionsâ to denote(parts of) programs. This confusion also stems from thousands of yearsof mathematical history, where people typically defined functions bymeans of a way to compute them.
For example, consider the multiplication function on natural num-bers. This is the function MULT ⶠâ Ă â â â that maps a pair (đ„, đŠ)of natural numbers to the number đ„ â đŠ. As we mentioned, it can beimplemented in more than one way:
computation and representation 113
Figure 2.13: A function is a mapping of inputs tooutputs. A program is a set of instructions on howto obtain an output given an input. A programcomputes a function, but it is not the same as a func-tion, popular programming language terminologynotwithstanding.
def mult1(x,y):res = 0while y>0:
res += xy -= 1
return res
def mult2(x,y):a = str(x) # represent x as string in decimal notationb = str(y) # represent y as string in decimal notationres = 0for i in range(len(a)):
for j in range(len(b)):res += int(a[len(a)-i])*int(b[len(b)-
j])*(10**(i+j))return res
print(mult1(12,7))# 84print(mult2(12,7))# 84
Both mult1 and mult2 produce the same output given the samepair of natural number inputs. (Though mult1 will take far longer todo so when the numbers become large.) Hence, even though these aretwo different programs, they compute the same mathematical function.This distinction between a program or algorithm đŽ, and the function đčthat đŽ computes will be absolutely crucial for us in this course (see alsoFig. 2.13).
Big Idea 2 A function is not the same as a program. A programcomputes a function.
Distinguishing functions from programs (or other ways for comput-ing, including circuits and machines) is a crucial theme for this course.For this reason, this is often a running theme in questions that I (andmany other instructors) assign in homework and exams (hint, hint).
RRemark 2.24 â Computation beyond functions (advanced,optional). Functions capture quite a lot of compu-tational tasks, but one can consider more generalsettings as well. For starters, we can and will talkabout partial functions, that are not defined on allinputs. When computing a partial function, we only
114 introduction to theoretical computer science
need to worry about the inputs on which the functionis defined. Another way to say it is that we can designan algorithm for a partial function đč under the as-sumption that someone âpromisedâ us that all inputsđ„ would be such that đč(đ„) is defined (as otherwise,we do not care about the result). Hence such tasks arealso known as promise problems.Another generalization is to consider relations that mayhave more than one possible admissible output. Forexample, consider the task of finding any solution fora given set of equations. A relation đ maps a stringđ„ â 0, 1â into a set of strings đ (đ„) (for example, đ„might describe a set of equations, in which case đ (đ„)would correspond to the set of all solutions to đ„). Wecan also identify a relation đ with the set of pairs ofstrings (đ„, đŠ) where đŠ â đ (đ„). A computational pro-cess solves a relation if for every đ„ â 0, 1â, it outputssome string đŠ â đ (đ„).Later in this book, we will consider even more generaltasks, including interactive tasks, such as finding agood strategy in a game, tasks defined using proba-bilistic notions, and others. However, for much of thisbook, we will focus on the task of computing a func-tion, and often even a Boolean function, that has only asingle bit of output. It turns out that a great deal of thetheory of computation can be studied in the context ofthis task, and the insights learned are applicable in themore general settings.
Chapter Recap
âą We can represent objects we want to compute onusing binary strings.
âą A representation scheme for a set of objects đȘ is aone-to-one map from đȘ to 0, 1â.
âą We can use prefix-free encoding to âboostâ a repre-sentation for a set đȘ into representations of lists ofelements in đȘ.
âą A basic computational task is the task of computinga function đč ⶠ0, 1â â 0, 1â. This task encom-passes not just arithmetical computations suchas multiplication, factoring, etc. but a great manyother tasks arising in areas as diverse as scientificcomputing, artificial intelligence, image processing,data mining and many many more.
âą We will study the question of finding (or at leastgiving bounds on) what the best algorithm forcomputing đč for various interesting functions đč is.
computation and representation 115
2.7 EXERCISES
Exercise 2.1 Which one of these objects can be represented by a binarystring?
a. An integer đ„b. An undirected graph đș.
c. A directed graph đ»d. All of the above.
Exercise 2.2 â Binary representation. a. Prove that the function đđĄđ ⶠâ â0, 1â of the binary representation defined in (2.1) satisfies that forevery đ â â, if đ„ = đđĄđ(đ) then |đ„| = 1 + max(0, âlog2 đâ) andđ„đ = âđ„/2âlog2 đââđâ mod 2.
b. Prove that đđĄđ is a one to one function by coming up with a func-tion đđĄđ ⶠ0, 1â â â such that đđĄđ(đđĄđ(đ)) = đ for everyđ â â.
Exercise 2.3 â More compact than ASCII representation. The ASCII encodingcan be used to encode a string of đ English letters as a 7đ bit binarystring, but in this exercise, we ask about finding a more compact rep-resentation for strings of English lowercase letters.
1. Prove that there exists a representation scheme (đž, đ·) for stringsover the 26-letter alphabet đ, đ, đ, ⊠, đ§ as binary strings suchthat for every đ > 0 and length-đ string đ„ â đ, đ, ⊠, đ§đ, therepresentation đž(đ„) is a binary string of length at most 4.8đ + 1000.In other words, prove that for every đ, there exists a one-to-onefunction đž ⶠđ, đ, ⊠, đ§đ â 0, 1â4.8đ+1000â.
2. Prove that there exists no representation scheme for strings over thealphabet đ, đ, ⊠, đ§ as binary strings such that for every length-đstring đ„ â đ, đ, ⊠, đ§đ, the representation đž(đ„) is a binary string oflength â4.6đ + 1000â. In other words, prove that there exists someđ > 0 such that there is no one-to-one function đž ⶠđ, đ, ⊠, đ§đ â0, 1â4.6đ+1000â.
3. Pythonâs bz2.compress function is a mapping from strings tostrings, which uses the lossless (and hence one to one) bzip2 algo-rithm for compression. After converting to lowercase, and truncat-ing spaces and numbers, the text of Tolstoyâs âWar and Peaceâ con-tains đ = 2, 517, 262. Yet, if we run bz2.compress on the string ofthe text of âWar and Peaceâ we get a string of length đ = 6, 274, 768
116 introduction to theoretical computer science
5 Actually that particular fictional company uses ametric that focuses more on compression speed thenratio, see here and here.
bits, which is only 2.49đ (and in particular much smaller than4.6đ). Explain why this does not contradict your answer to theprevious question.
4. Interestingly, if we try to apply bz2.compress on a random string,we get much worse performance. In my experiments, I got a ratioof about 4.78 between the number of bits in the output and thenumber of characters in the input. However, one could imagine thatone could do better and that there exists a company called âPiedPiperâ with an algorithm that can losslessly compress a string of đrandom lowercase letters to fewer than 4.6đ bits.5 Show that thisis not the case by proving that for every đ > 100 and one to onefunction đžđđđđđ ⶠđ, ⊠, đ§đ â 0, 1â, if we let đ be the randomvariable |đžđđđđđ(đ„)| (i.e., the length of đžđđđđđ(đ„)) for đ„ chosenuniformly at random from the set đ, ⊠, đ§đ, then the expectedvalue of đ is at least 4.6đ.
Exercise 2.4 â Representing graphs: upper bound. Show that there is a stringrepresentation of directed graphs with vertex set [đ] and degree atmost 10 that uses at most 1000đ logđ bits. More formally, show thefollowing: Suppose we define for every đ â â, the set đșđ as the setcontaining all directed graphs (with no self loops) over the vertexset [đ] where every vertex has degree at most 10. Then, prove that forevery sufficiently large đ, there exists a one-to-one function đž ⶠđșđ â0, 1â1000đ logđâ.
Exercise 2.5 â Representing graphs: lower bound. 1. Define đđ to be theset of one-to-one and onto functions mapping [đ] to [đ]. Prove thatthere is a one-to-one mapping from đđ to đș2đ, where đș2đ is the setdefined in Exercise 2.4 above.
2. In this question you will show that one cannot improve the rep-resentation of Exercise 2.4 to length đ(đ logđ). Specifically, provefor every sufficiently large đ â â there is no one-to-one functionđž ⶠđșđ â 0, 1â0.001đ logđâ+1000.
Exercise 2.6 â Multiplying in different representation. Recall that the grade-school algorithm for multiplying two numbers requires đ(đ2) oper-ations. Suppose that instead of using decimal representation, we useone of the following representations đ (đ„) to represent a number đ„between 0 and 10đ â 1. For which one of these representations you canstill multiply the numbers in đ(đ2) operations?
computation and representation 117
a. The standard binary representation: đ”(đ„) = (đ„0, ⊠, đ„đ) wheređ„ = âđđ=0 đ„đ2đ and đ is the largest number s.t. đ„ â„ 2đ.b. The reverse binary representation: đ”(đ„) = (đ„đ, ⊠, đ„0) where đ„đ is
defined as above for đ = 0, ⊠, đ â 1.c. Binary coded decimal representation: đ”(đ„) = (đŠ0, ⊠, đŠđâ1) wheređŠđ â 0, 14 represents the đđĄâ decimal digit of đ„ mapping 0 to 0000,1 to 0001, 2 to 0010, etc. (i.e. 9 maps to 1001)d. All of the above.
Exercise 2.7 Suppose that đ ⶠâ â 0, 1â corresponds to representing anumber đ„ as a string of đ„ 1âs, (e.g., đ (4) = 1111, đ (7) = 1111111, etc.).If đ„, đŠ are numbers between 0 and 10đ â 1, can we still multiply đ„ andđŠ using đ(đ2) operations if we are given them in the representationđ (â )?
Exercise 2.8 Recall that if đč is a one-to-one and onto function mappingelements of a finite set đ into a finite set đ then the sizes of đ and đare the same. Let đ” ⶠâ â 0, 1â be the function such that for everyđ„ â â, đ”(đ„) is the binary representation of đ„.1. Prove that đ„ < 2đ if and only if |đ”(đ„)| †đ.2. Use 1. to compute the size of the set đŠ â 0, 1â ⶠ|đŠ| †đ where |đŠ|
denotes the length of the string đŠ.3. Use 1. and 2. to prove that 2đ â 1 = 1 + 2 + 4 + ⯠+ 2đâ1.
Exercise 2.9 â Prefix-free encoding of tuples. Suppose that đč ⶠâ â 0, 1âis a one-to-one function that is prefix-free in the sense that there is nođ â đ s.t. đč(đ) is a prefix of đč(đ).a. Prove that đč2 ⶠâ Ă â â 0, 1â, defined as đč2(đ, đ) = đč(đ)đč(đ) (i.e.,
the concatenation of đč(đ) and đč(đ)) is a one-to-one function.
b. Prove that đčâ ⶠââ â 0, 1â defined as đčâ(đ1, ⊠, đđ) =đč(đ1) ⯠đč(đđ) is a one-to-one function, where ââ denotes the set ofall finite-length lists of natural numbers.
Exercise 2.10 â More efficient prefix-free transformation. Suppose that đč â¶đ â 0, 1â is some (not necessarily prefix-free) representation of theobjects in the set đ, and đș ⶠâ â 0, 1â is a prefix-free representa-tion of the natural numbers. Define đč âČ(đ) = đș(|đč(đ)|)đč(đ) (i.e., theconcatenation of the representation of the length đč(đ) and đč(đ)).
118 introduction to theoretical computer science
6 Hint: Think recursively how to represent the lengthof the string.
a. Prove that đč âČ is a prefix-free representation of đ.
b. Show that we can transform any representation to a prefix-free oneby a modification that takes a đ bit string into a string of length atmost đ + đ(log đ).
c. Show that we can transform any representation to a prefix-free oneby a modification that takes a đ bit string into a string of length atmost đ + log đ + đ(log log đ).6
Exercise 2.11 â Kraftâs Inequality. Suppose that đ â 0, 1đ is some finiteprefix-free set.
a. For every đ †đ and length-đ string đ„ â đ, let đż(đ„) â 0, 1đ denoteall the length-đ strings whose first đ bits are đ„0, ⊠, đ„đâ1. Prove that(1) |đż(đ„)| = 2đâ|đ„| and (2) If đ„ â đ„âČ then đż(đ„) is disjoint fromđż(đ„âČ).
b. Prove that âđ„âđ 2â|đ„| †1.c. Prove that there is no prefix-free encoding of strings with less than
logarithmic overhead. That is, prove that there is no function PF â¶0, 1â â 0, 1â s.t. |PF(đ„)| †|đ„| + 0.9 log |đ„| for every đ„ â 0, 1âand such that the set PF(đ„) ⶠđ„ â 0, 1â is prefix-free. The factor0.9 is arbitrary; all that matters is that it is less than 1.
Exercise 2.12 â Composition of one-to-one functions. Prove that for everytwo one-to-one functions đč ⶠđ â đ and đș ⶠđ â đ , the functionđ» ⶠđ â đ defined as đ»(đ„) = đș(đč(đ„)) is one to one.
Exercise 2.13 â Natural numbers and strings. 1. We have shown thatthe natural numbers can be represented as strings. Prove thatthe other direction holds as well: that there is a one-to-one mapđđĄđ ⶠ0, 1â â â. (đđĄđ stands for âstrings to numbers.â)
2. Recall that Cantor proved that there is no one-to-one map đ đĄđ â¶â â â. Show that Cantorâs result implies Theorem 2.5.
Exercise 2.14 â Map lists of integers to a number. Recall that for every setđ, the set đâ is defined as the set of all finite sequences of mem-bers of đ (i.e., đâ = (đ„0, ⊠, đ„đâ1) | đ â â , âđâ[đ]đ„đ â đ ).Prove that there is a one-one-map from â€â to â where †is the set of⊠, â3, â2, â1, 0, +1, +2, +3, ⊠of all integers.
computation and representation 119
2.8 BIBLIOGRAPHICAL NOTES
The study of representing data as strings, including issues such ascompression and error corrections falls under the purview of informationtheory, as covered in the classic textbook of Cover and Thomas [CT06].Representations are also studied in the field of data structures design, ascovered in texts such as [Cor+09].
The question of whether to represent integers with the most signif-icant digit first or last is known as Big Endian vs. Little Endian repre-sentation. This terminology comes from Cohenâs [Coh81] entertainingand informative paper about the conflict between adherents of bothschools which he compared to the warring tribes in Jonathan SwiftâsâGulliverâs Travelsâ. The twoâs complement representation of signedintegers was suggested in von Neumannâs classic report [Neu45]that detailed the design approaches for a stored-program computer,though similar representations have been used even earlier in abacusand other mechanical computation devices.
The idea that we should separate the definition or specification ofa function from its implementation or computation might seem âobvi-ous,â but it took quite a lot of time for mathematicians to arrive at thisviewpoint. Historically, a function đč was identified by rules or formu-las showing how to derive the output from the input. As we discussin greater depth in Chapter 9, in the 1800s this somewhat informalnotion of a function started âbreaking at the seams,â and eventuallymathematicians arrived at the more rigorous definition of a functionas an arbitrary assignment of input to outputs. While many functionsmay be described (or computed) by one or more formulas, today wedo not consider that to be an essential property of functions, and alsoallow functions that do not correspond to any âniceâ formula.
We have mentioned that all representations of the real numbersare inherently approximate. Thus an important endeavor is to under-stand what guarantees we can offer on the approximation quality ofthe output of an algorithm, as a function of the approximation qualityof the inputs. This question is known as the question of determiningthe numerical stability of given equations. The Floating Points Guidewebsite contains an extensive description of the floating point repre-sentation, as well the many ways in which it could subtly fail, see alsothe website 0.30000000000000004.com.
Dauben [Dau90] gives a biography of Cantor with emphasis onthe development of his mathematical ideas. [Hal60] is a classic text-book on set theory, including also Cantorâs theorem. Cantorâs Theo-rem is also covered in many texts on discrete mathematics, including[LLM18; LZ19].
120 introduction to theoretical computer science
The adjacency matrix representation of graphs is not merely a con-venient way to map a graph into a binary string, but it turns out thatmany natural notions and operations on matrices are useful for graphsas well. (For example, Googleâs PageRank algorithm relies on thisviewpoint.) The notes of Spielmanâs course are an excellent source forthis area, known as spectral graph theory. We will return to this viewmuch later in this book when we talk about random walks.
IFINITE COMPUTATION
Figure 3.1: Calculating wheels by Charles Babbage.Image taken from the Mark I âoperating manualâ
Figure 3.2: A 1944 Popular Mechanics article on theHarvard Mark I computer.
3Defining computation
âthere is no reason why mental as well as bodily labor should not be economizedby the aid of machineryâ, Charles Babbage, 1852
âIf, unwarned by my example, any man shall undertake and shall succeedin constructing an engine embodying in itself the whole of the executive de-partment of mathematical analysis upon different principles or by simplermechanical means, I have no fear of leaving my reputation in his charge, for healone will be fully able to appreciate the nature of my efforts and the value oftheir results.â, Charles Babbage, 1864
âTo understand a program you must become both the machine and the pro-gram.â, Alan Perlis, 1982
People have been computing for thousands of years, with aidsthat include not just pen and paper, but also abacus, slide rules, vari-ous mechanical devices, and modern electronic computers. A priori,the notion of computation seems to be tied to the particular mech-anism that you use. You might think that the âbestâ algorithm formultiplying numbers will differ if you implement it in Python on amodern laptop than if you use pen and paper. However, as we sawin the introduction (Chapter 0), an algorithm that is asymptoticallybetter would eventually beat a worse one regardless of the underly-ing technology. This gives us hope for a technology independent wayof defining computation. This is what we do in this chapter. We willdefine the notion of computing an output from an input by applying asequence of basic operations (see Fig. 3.3). Using this, we will be ableto precisely define statements such as âfunction đ can be computedby model đâ or âfunction đ can be computed by model đ using đ operationsâ.
This chapter: A non-mathy overviewThe main takeaways from this chapter are:
Compiled on 8.26.2020 18:10
Learning Objectives:âą See that computation can be precisely
modeled.âą Learn the computational model of Boolean
circuits / straight-line programs.âą Equivalence of circuits and straight-line
programs.âą Equivalence of AND/OR/NOT and NAND.âą Examples of computing in the physical world.
124 introduction to theoretical computer science
Figure 3.3: A function mapping strings to stringsspecifies a computational task, i.e., describes what thedesired relation between the input and the outputis. In this chapter we define models for implementingcomputational processes that achieve the desiredrelation, i.e., describe how to compute the outputfrom the input. We will see several examples of suchmodels using both Boolean circuits and straight-lineprogramming languages.
âą We can use logical operations such as AND, OR, and NOT tocompute an output from an input (see Section 3.2).
âą A Boolean circuit is a way to compose the basic logicaloperations to compute a more complex function (see Sec-tion 3.3). We can think of Boolean circuits as both a mathe-matical model (which is based on directed acyclic graphs)as well as physical devices we can construct in the realworld in a variety of ways, including not just silicon-basedsemi-conductors but also mechanical and even biologicalmechanisms (see Section 3.4).
âą We can describe Boolean circuits also as straight-line pro-grams, which are programs that do not have any loopingconstructs (i.e., no while / for/ do .. until etc.), seeSection 3.3.2.
âą It is possible to implement the AND, OR, and NOT oper-ations using the NAND operation (as well as vice versa).This means that circuits with AND/OR/NOT gates cancompute the same functions (i.e., are equivalent in power)to circuits with NAND gates, and we can use either modelto describe computation based on our convenience, seeSection 3.5. To give out a âspoilerâ, we will see in Chap-ter 4 that such circuits can compute all finite functions.
One âbig ideaâ of this chapter is the notion of equivalencebetween models (Big Idea 3). Two computational modelsare equivalent if they can compute the same set of functions.
defining computation 125
Figure 3.4: Text pages from Algebra manuscript withgeometrical solutions to two quadratic equations.Shelfmark: MS. Huntington 214 fol. 004v-005r
Figure 3.5: An explanation for children of the two digitaddition algorithm
Boolean circuits with AND/OR/NOT gates are equivalent tocircuits with NAND gates, but this is just one example of themore general phenomenon that we will see many times inthis book.
3.1 DEFINING COMPUTATION
The name âalgorithmâ is derived from the Latin transliteration ofMuhammad ibn Musa al-Khwarizmiâs name. Al-Khwarizmi was aPersian scholar during the 9th century whose books introduced thewestern world to the decimal positional numeral system, as well as tothe solutions of linear and quadratic equations (see Fig. 3.4). HoweverAl-Khwarizmiâs descriptions of algorithms were rather informal bytodayâs standards. Rather than use âvariablesâ such as đ„, đŠ, he usedconcrete numbers such as 10 and 39, and trusted the reader to beable to extrapolate from these examples, much as algorithms are stilltaught to children today.
Here is how Al-Khwarizmi described the algorithm for solving anequation of the form đ„2 + đđ„ = đ:
[How to solve an equation of the form ] âroots and squares are equal to num-bersâ: For instance âone square , and ten roots of the same, amount to thirty-nine dirhemsâ that is to say, what must be the square which, when increasedby ten of its own root, amounts to thirty-nine? The solution is this: you halvethe number of the roots, which in the present instance yields five. This youmultiply by itself; the product is twenty-five. Add this to thirty-nineâ the sumis sixty-four. Now take the root of this, which is eight, and subtract from it halfthe number of roots, which is five; the remainder is three. This is the root of thesquare which you sought for; the square itself is nine.
For the purposes of this book, we will need a much more preciseway to describe algorithms. Fortunately (or is it unfortunately?), atleast at the moment, computers lag far behind school-age childrenin learning from examples. Hence in the 20th century, people cameup with exact formalisms for describing algorithms, namely program-ming languages. Here is al-Khwarizmiâs quadratic equation solvingalgorithm described in the Python programming language:
from math import sqrt#Pythonspeak to enable use of the sqrt function to compute
square roots.def solve_eq(b,c):
# return solution of x^2 + bx = c following AlKhwarizmi's instructions
# Al Kwarizmi demonstrates this for the case b=10 andc= 39
126 introduction to theoretical computer science
val1 = b / 2.0 # "halve the number of the roots"val2 = val1 * val1 # "this you multiply by itself"val3 = val2 + c # "Add this to thirty-nine"val4 = sqrt(val3) # "take the root of this"val5 = val4 - val1 # "subtract from it half the number
of roots"return val5 # "This is the root of the square which
you sought for"# Test: solve x^2 + 10*x = 39print(solve_eq(10,39))# 3.0
We can define algorithms informally as follows:
Informal definition of an algorithm: An algorithm is a set of instruc-tions for how to compute an output from an input by following a se-quence of âelementary stepsâ.An algorithm đŽ computes a function đč if for every input đ„, if we followthe instructions of đŽ on the input đ„, we obtain the output đč(đ„).In this chapter we will make this informal definition precise using
the model of Boolean Circuits. We will show that Boolean Circuitsare equivalent in power to straight line programs that are written inâultra simpleâ programming languages that do not even have loops.We will also see that the particular choice of elementary operations isimmaterial and many different choices yield models with equivalentpower (see Fig. 3.6). However, it will take us some time to get there.We will start by discussing what are âelementary operationsâ and howwe map a description of an algorithm into an actual physical processthat produces an output from an input in the real world.
Figure 3.6: An overview of the computational modelsdefined in this chapter. We will show several equiv-alent ways to represent a recipe for performing afinite computation. Specifically we will show that wecan model such a computation using either a Booleancircuit or a straight line program, and these two repre-sentations are equivalent to one another. We will alsoshow that we can choose as our basic operations ei-ther the set AND,OR,NOT or the set NAND andthese two choices are equivalent in power. By makingthe choice of whether to use circuits or programs,and whether to use AND,OR,NOT or NAND weobtain four equivalent ways of modeling finite com-putation. Moreover, there are many other choices ofsets of basic operations that are equivalent in power.
defining computation 127
3.2 COMPUTING USING AND, OR, AND NOT.
An algorithm breaks down a complex calculation into a series of sim-pler steps. These steps can be executed in a variety of different ways,including:
âą Writing down symbols on a piece of paper.
âą Modifying the current flowing on electrical wires.
âą Binding a protein to a strand of DNA.
âą Responding to a stimulus by a member of a collection (e.g., a bee ina colony, a trader in a market).
To formally define algorithms, let us try to âerr on the side of sim-plicityâ and model our âbasic stepsâ as truly minimal. For example,here are some very simple functions:
âą OR ⶠ0, 12 â 0, 1 defined as
OR(đ, đ) = â§âšâ©0 đ = đ = 01 otherwise(3.1)
âą AND ⶠ0, 12 â 0, 1 defined as
AND(đ, đ) = â§âšâ©1 đ = đ = 10 otherwise(3.2)
âą NOT ⶠ0, 1 â 0, 1 defined as
NOT(đ) = â§âšâ©0 đ = 11 đ = 0 (3.3)
The functions AND, OR and NOT, are the basic logical operatorsused in logic and many computer systems. In the context of logic, it iscommon to use the notation đ ⧠đ for AND(đ, đ), đ âš đ for OR(đ, đ) andđ and ÂŹđ for NOT(đ), and we will use this notation as well.
Each one of the functions AND,OR,NOT takes either one or twosingle bits as input, and produces a single bit as output. Clearly, itcannot get much more basic than that. However, the power of compu-tation comes from composing such simple building blocks together.
128 introduction to theoretical computer science
Example 3.1 â Majority from đŽđđ·,đđ and đđđ . Consider the func-tion MAJ ⶠ0, 13 â 0, 1 that is defined as follows:
MAJ(đ„) = â§âšâ©1 đ„0 + đ„1 + đ„2 â„ 20 otherwise. (3.4)
That is, for every đ„ â 0, 13, MAJ(đ„) = 1 if and only if the ma-jority (i.e., at least two out of the three) of đ„âs elements are equalto 1. Can you come up with a formula involving AND, OR andNOT to compute MAJ? (It would be useful for you to pause at thispoint and work out the formula for yourself. As a hint, althoughthe NOT operator is needed to compute some functions, you willnot need to use it to compute MAJ.)
Let us first try to rephrase MAJ(đ„) in words: âMAJ(đ„) = 1 if andonly if there exists some pair of distinct elements đ, đ such that bothđ„đ and đ„đ are equal to 1.â In other words it means that MAJ(đ„) = 1iff either both đ„0 = 1 and đ„1 = 1, or both đ„1 = 1 and đ„2 = 1, or bothđ„0 = 1 and đ„2 = 1. Since the OR of three conditions đ0, đ1, đ2 canbe written as OR(đ0,OR(đ1, đ2)), we can now translate this into aformula as follows:
MAJ(đ„0, đ„1, đ„2) = OR (AND(đ„0, đ„1) , OR(AND(đ„1, đ„2) , AND(đ„0, đ„2)) ) .(3.5)
Recall that we can also write đ âš đ for OR(đ, đ) and đ ⧠đ forAND(đ, đ). With this notation, (3.5) can also be written as
MAJ(đ„0, đ„1, đ„2) = ((đ„0 ⧠đ„1) âš (đ„1 ⧠đ„2)) âš (đ„0 ⧠đ„2) . (3.6)
We can also write (3.5) in a âprogramming languageâ form,expressing it as a set of instructions for computing MAJ given thebasic operations AND,OR,NOT:
def MAJ(X[0],X[1],X[2]):firstpair = AND(X[0],X[1])secondpair = AND(X[1],X[2])thirdpair = AND(X[0],X[2])temp = OR(secondpair,thirdpair)return OR(firstpair,temp)
3.2.1 Some properties of AND and ORLike standard addition and multiplication, the functions AND and ORsatisfy the properties of commutativity: đ âš đ = đ âš đ and đ ⧠đ = đ ⧠đand associativity: (đâšđ)âšđ = đâš(đâšđ) and (đâ§đ)â§đ = đâ§(đâ§đ). As in
defining computation 129
the case of addition and multiplication, we often drop the parenthesisand write đâšđâšđ âšđ for ((đâšđ)âšđ)âšđ, and similarly ORâs and ANDâsof more terms. They also satisfy a variant of the distributive law:Solved Exercise 3.1 â Distributive law for AND and OR. Prove that for everyđ, đ, đ â 0, 1, đ ⧠(đ âš đ) = (đ ⧠đ) âš (đ ⧠đ).
Solution:
We can prove this by enumerating over all the 8 possible valuesfor đ, đ, đ â 0, 1 but it also follows from the standard distributivelaw. Suppose that we identify any positive integer with âtrueâ andthe value zero with âfalseâ. Then for every numbers đą, đŁ â â, đą + đŁis positive if and only if đą âš đŁ is true and đą â đŁ is positive if and onlyif đą ⧠đŁ is true. This means that for every đ, đ, đ â 0, 1, the expres-sion đ ⧠(đ âš đ) is true if and only if đ â (đ + đ) is positive, and theexpression (đ ⧠đ) âš (đ ⧠đ) is true if and only if đ â đ + đ â đ is positive,But by the standard distributive law đ â (đ + đ) = đ â đ + đ â đ andhence the former expression is true if and only if the latter one is.
3.2.2 Extended example: Computing XOR from AND, OR, and NOTLet us see how we can obtain a different function from the samebuilding blocks. Define XOR ⶠ0, 12 â 0, 1 to be the functionXOR(đ, đ) = đ + đ mod 2. That is, XOR(0, 0) = XOR(1, 1) = 0 andXOR(1, 0) = XOR(0, 1) = 1. We claim that we can construct XORusing only AND, OR, and NOT.
PAs usual, it is a good exercise to try to work out thealgorithm for XOR using AND, OR and NOT on yourown before reading further.
The following algorithm computes XOR using AND, OR, and NOT:
Algorithm 3.2 â đđđ from đŽđđ·/đđ /đđđ .
Input: đ, đ â 0, 1.Output: đđđ (đ, đ)1: đ€1 â đŽđđ·(đ, đ)2: đ€2 â đđđ (đ€1)3: đ€3 â đđ (đ, đ)4: return đŽđđ·(đ€2, đ€3)
Lemma 3.3 For every đ, đ â 0, 1, on input đ, đ, Algorithm 3.2 outputsđ + đ mod 2.
130 introduction to theoretical computer science
Proof. For every đ, đ, XOR(đ, đ) = 1 if and only if đ is different fromđ. On input đ, đ â 0, 1, Algorithm 3.2 outputs AND(đ€2, đ€3) wheređ€2 = NOT(AND(đ, đ)) and đ€3 = OR(đ, đ).âą If đ = đ = 0 then đ€3 = OR(đ, đ) = 0 and so the output will be 0.âą If đ = đ = 1 then AND(đ, đ) = 1 and so đ€2 = NOT(AND(đ, đ)) = 0
and the output will be 0.âą If đ = 1 and đ = 0 (or vice versa) then both đ€3 = OR(đ, đ) = 1
and đ€1 = AND(đ, đ) = 0, in which case the algorithm will outputOR(NOT(đ€1), đ€3) = 1.
We can also express Algorithm 3.2 using a programming language.Specifically, the following is a Python program that computes the XORfunction:
def AND(a,b): return a*bdef OR(a,b): return 1-(1-a)*(1-b)def NOT(a): return 1-a
def XOR(a,b):w1 = AND(a,b)w2 = NOT(w1)w3 = OR(a,b)return AND(w2,w3)
# Test out the codeprint([f"XOR(a,b)=XOR(a,b)" for a in [0,1] for b in
[0,1]])# ['XOR(0,0)=0', 'XOR(0,1)=1', 'XOR(1,0)=1', 'XOR(1,1)=0']
Solved Exercise 3.2 â Compute đđđ on three bits of input. Let XOR3 â¶0, 13 â 0, 1 be the function defined as XOR3(đ, đ, đ) = đ + đ + đmod 2. That is, XOR3(đ, đ, đ) = 1 if đ+đ+đ is odd, and XOR3(đ, đ, đ) =0 otherwise. Show that you can compute XOR3 using AND, OR, andNOT. You can express it as a formula, use a programming languagesuch as Python, or use a Boolean circuit.
Solution:
Addition modulo two satisfies the same properties of associativ-ity ((đ + đ) + đ = đ + (đ + đ)) and commutativity (đ + đ = đ + đ) asstandard addition. This means that, if we define đ â đ to equal đ + đ
defining computation 131
mod 2, thenXOR3(đ, đ, đ) = (đ â đ) â đ (3.7)
or in other words
XOR3(đ, đ, đ) = XOR(XOR(đ, đ), đ) . (3.8)
Since we know how to compute XOR using AND, OR, andNOT, we can compose this to compute XOR3 using the same build-ing blocks. In Python this corresponds to the following program:
def XOR3(a,b,c):w1 = AND(a,b)w2 = NOT(w1)w3 = OR(a,b)w4 = AND(w2,w3)w5 = AND(w4,c)w6 = NOT(w5)w7 = OR(w4,c)return AND(w6,w7)
# Let's test this outprint([f"XOR3(a,b,c)=XOR3(a,b,c)" for a in [0,1]
for b in [0,1] for c in [0,1]])# ['XOR3(0,0,0)=0', 'XOR3(0,0,1)=1', 'XOR3(0,1,0)=1',
'XOR3(0,1,1)=0', 'XOR3(1,0,0)=1', 'XOR3(1,0,1)=0','XOR3(1,1,0)=0', 'XOR3(1,1,1)=1']
PTry to generalize the above examples to obtain a wayto compute XORđ ⶠ0, 1đ â 0, 1 for every đ us-ing at most 4đ basic steps involving applications of afunction in AND,OR,NOT to outputs or previouslycomputed values.
3.2.3 Informally defining âbasic operationsâ and âalgorithmsâWe have seen that we can obtain at least some examples of interestingfunctions by composing together applications of AND, OR, and NOT.This suggests that we can use AND, OR, and NOT as our âbasic opera-tionsâ, hence obtaining the following definition of an âalgorithmâ:
Semi-formal definition of an algorithm: An algorithm consists of asequence of steps of the form âcompute a new value by applying AND,OR, or NOT to previously computed valuesâ.
132 introduction to theoretical computer science
An algorithm đŽ computes a function đč if for every input đ„ to đč , if wefeed đ„ as input to the algorithm, the value computed in its last step isđč(đ„).There are several concerns that are raised by this definition:
1. First and foremost, this definition is indeed too informal. We do notspecify exactly what each step does, nor what it means to âfeed đ„ asinputâ.
2. Second, the choice of AND, OR or NOT seems rather arbitrary.Why not XOR and MAJ? Why not allow operations like additionand multiplication? What about any other logical constructionssuch if/then or while?
3. Third, do we even know that this definition has anything to dowith actual computing? If someone gave us a description of such analgorithm, could we use it to actually compute the function in thereal world?
PThese concerns will to a large extent guide us in theupcoming chapters. Thus you would be well advisedto re-read the above informal definition and see whatyou think about these issues.
A large part of this book will be devoted to addressing the aboveissues. We will see that:
1. We can make the definition of an algorithm fully formal, and sogive a precise mathematical meaning to statements such as âAlgo-rithm đŽ computes function đâ.
2. While the choice of AND/OR/NOT is arbitrary, and we could justas well have chosen other functions, we will also see this choicedoes not matter much. We will see that we would obtain the samecomputational power if we instead used addition and multiplica-tion, and essentially every other operation that could be reasonablythought of as a basic step.
3. It turns out that we can and do compute such âAND/OR/NOTbased algorithmsâ in the real world. First of all, such an algorithmis clearly well specified, and so can be executed by a human with apen and paper. Second, there are a variety of ways to mechanize thiscomputation. Weâve already seen that we can write Python codethat corresponds to following such a list of instructions. But in factwe can directly implement operations such as AND, OR, and NOT
defining computation 133
Figure 3.7: Standard symbols for the logical operationsor âgatesâ of AND, OR, NOT, as well as the operationNAND discussed in Section 3.5.
Figure 3.8: A circuit with AND, OR and NOT gates forcomputing the XOR function.
via electronic signals using components known as transistors. This ishow modern electronic computers operate.
In the remainder of this chapter, and the rest of this book, we willbegin to answer some of these questions. We will see more examplesof the power of simple operations to compute more complex opera-tions including addition, multiplication, sorting and more. We willalso discuss how to physically implement simple operations such asAND, OR and NOT using a variety of technologies.
3.3 BOOLEAN CIRCUITS
Boolean circuits provide a precise notion of âcomposing basic opera-tions togetherâ. A Boolean circuit (see Fig. 3.9) is composed of gatesand inputs that are connected by wires. The wires carry a signal thatrepresents either the value 0 or 1. Each gate corresponds to either theOR, AND, or NOT operation. An OR gate has two incoming wires,and one or more outgoing wires. If these two incoming wires carrythe signals đ and đ (for đ, đ â 0, 1), then the signal on the outgoingwires will be OR(đ, đ). AND and NOT gates are defined similarly. Theinputs have only outgoing wires. If we set a certain input to a valueđ â 0, 1, then this value is propagated on all the wires outgoingfrom it. We also designate some gates as output gates, and their valuecorresponds to the result of evaluating the circuit. For example, ??gives such a circuit for the XOR function, following Section 3.2.2. Weevaluate an đ-input Boolean circuit đ¶ on an input đ„ â 0, 1đ by plac-ing the bits of đ„ on the inputs, and then propagating the values on thewires until we reach an output, see Fig. 3.9.
RRemark 3.4 â Physical realization of Boolean circuits.Boolean circuits are a mathematical model that does notnecessarily correspond to a physical object, but theycan be implemented physically. In physical imple-mentation of circuits, the signal is often implementedby electric potential, or voltage, on a wire, where forexample voltage above a certain level is interpretedas a logical value of 1, and below a certain level isinterpreted as a logical value of 0. Section 3.4 dis-cusses physical implementation of Boolean circuits(with examples including using electrical signals suchas in silicon-based circuits, but also biological andmechanical implementations as well).
Solved Exercise 3.3 â All equal function. Define ALLEQ ⶠ0, 14 â 0, 1to be the function that on input đ„ â 0, 14 outputs 1 if and only ifđ„0 = đ„1 = đ„2 = đ„3. Give a Boolean circuit for computing ALLEQ.
134 introduction to theoretical computer science
Figure 3.9: A Boolean Circuit consists of gates that areconnected by wires to one another and the inputs. Theleft side depicts a circuit with 2 inputs and 5 gates,one of which is designated the output gate. The rightside depicts the evaluation of this circuit on the inputđ„ â 0, 12 with đ„0 = 1 and đ„1 = 0. The value ofevery gate is obtained by applying the correspondingfunction (AND, OR, or NOT) to values on the wire(s)that enter it. The output of the circuit on a giveninput is the value of the output gate(s). In this case,the circuit computes the XOR function and hence itoutputs 1 on the input 10.
Figure 3.10: A Boolean circuit for computing the allequal function ALLEQ ⶠ0, 14 â 0, 1 that outputs1 on đ„ â 0, 14 if and only if đ„0 = đ„1 = đ„2 = đ„3.
Solution:
Another way to describe the function ALLEQ is that it outputs1 on an input đ„ â 0, 14 if and only if đ„ = 04 or đ„ = 14. We canphrase the condition đ„ = 14 as đ„0 ⧠đ„1 ⧠đ„2 ⧠đ„3 which can becomputed using three AND gates. Similarly we can phrase the con-dition đ„ = 04 as đ„0 ⧠đ„1 ⧠đ„2 ⧠đ„3 which can be computed using fourNOT gates and three AND gates. The output of ALLEQ is the ORof these two conditions, which results in the circuit of 4 NOT gates,6 AND gates, and one OR gate presented in Fig. 3.10.
3.3.1 Boolean circuits: a formal definitionWe defined Boolean circuits informally as obtained by connectingAND, OR, and NOT gates via wires so as to produce an output froman input. However, to be able to prove theorems about the existence ornon-existence of Boolean circuits for computing various functions weneed to:
1. Formally define a Boolean circuit as a mathematical object.
2. Formally define what it means for a circuit đ¶ to compute a functionđ .We now proceed to do so. We will define a Boolean circuit as a
labeled Directed Acyclic Graph (DAG). The vertices of the graph corre-spond to the gates and inputs of the circuit, and the edges of the graphcorrespond to the wires. A wire from an input or gate đą to a gate đŁ inthe circuit corresponds to a directed edge between the correspondingvertices. The inputs are vertices with no incoming edges, while eachgate has the appropriate number of incoming edges based on the func-tion it computes. (That is, AND and OR gates have two in-neighbors,while NOT gates have one in-neighbor.) The formal definition is asfollows (see also Fig. 3.11):
defining computation 135
Figure 3.11: A Boolean Circuit is a labeled directedacyclic graph (DAG). It has đ input vertices, which aremarked with X[0],âŠ, X[đ â 1] and have no incomingedges, and the rest of the vertices are gates. AND,OR, and NOT gates have two, two, and one incomingedges, respectively. If the circuit has đ outputs, thenđ of the gates are known as outputs and are markedwith Y[0],âŠ,Y[đ â 1]. When we evaluate a circuitđ¶ on an input đ„ â 0, 1đ, we start by setting thevalue of the input vertices to đ„0, ⊠, đ„đâ1 and thenpropagate the values, assigning to each gate đ theresult of applying the operation of đ to the values ofđâs in-neighbors. The output of the circuit is the valueassigned to the output gates.
1 Having parallel edges means an AND or OR gateđą can have both its in-neighbors be the same gateđŁ. Since AND(đ, đ) = OR(đ, đ) = đ for every đ â0, 1, such parallel edges donât help in computingnew values in circuits with AND/OR/NOT gates.However, we will see circuits with more general setsof gates later on.
Definition 3.5 â Boolean Circuits. Let đ, đ, đ be positive integers withđ â„ đ. A Boolean circuit with đ inputs, đ outputs, and đ gates, is alabeled directed acyclic graph (DAG) đș = (đ , đž) with đ +đ verticessatisfying the following properties:
âą Exactly đ of the vertices have no in-neighbors. These verticesare known as inputs and are labeled with the đ labels X[0], âŠ,X[đ â 1]. Each input has at least one out-neighbor.
âą The other đ vertices are known as gates. Each gate is labeled withâ§, âš or ÂŹ. Gates labeled with ⧠(AND) or âš (OR) have two in-neighbors. Gates labeled with ÂŹ (NOT) have one in-neighbor.We will allow parallel edges. 1
âą Exactly đ of the gates are also labeled with the đ labels Y[0], âŠ,Y[đ â 1] (in addition to their label â§/âš/ÂŹ). These are known asoutputs.
The size of a Boolean circuit is the number of gates it contains.
PThis is a non-trivial mathematical definition, so it isworth taking the time to read it slowly and carefully.As in all mathematical definitions, we are using aknown mathematical object â a directed acyclic graph(DAG) â to define a new object, a Boolean circuit.This might be a good time to review some of the basic
136 introduction to theoretical computer science
properties of DAGs and in particular the fact that theycan be topologically sorted, see Section 1.6.
If đ¶ is a circuit with đ inputs and đ outputs, and đ„ â 0, 1đ, thenwe can compute the output of đ¶ on the input đ„ in the natural way:assign the input vertices X[0], âŠ, X[đ â 1] the values đ„0, ⊠, đ„đâ1,apply each gate on the values of its in-neighbors, and then output thevalues that correspond to the output vertices. Formally, this is definedas follows:
Definition 3.6 â Computing a function via a Boolean circuit. Let đ¶ be aBoolean circuit with đ inputs and đ outputs. For every đ„ â 0, 1đ,the output of đ¶ on the input đ„, denoted by đ¶(đ„), is defined as theresult of the following process:
We let â ⶠđ â â be the minimal layering of đ¶ (aka topologicalsorting, see Theorem 1.26). We let đż be the maximum layer of â,and for â = 0, 1, ⊠, đż we do the following:
âą For every đŁ in the â-th layer (i.e., đŁ such that â(đŁ) = â) do:
â If đŁ is an input vertex labeled with X[đ] for some đ â [đ], thenwe assign to đŁ the value đ„đ.
â If đŁ is a gate vertex labeled with ⧠and with two in-neighborsđą, đ€ then we assign to đŁ the AND of the values assigned tođą and đ€. (Since đą and đ€ are in-neighbors of đŁ, they are in alower layer than đŁ, and hence their values have already beenassigned.)
â If đŁ is a gate vertex labeled with âš and with two in-neighborsđą, đ€ then we assign to đŁ the OR of the values assigned to đąand đ€.
â If đŁ is a gate vertex labeled with ÂŹ and with one in-neighbor đąthen we assign to đŁ the negation of the value assigned to đą.
âą The result of this process is the value đŠ â 0, 1đ such that forevery đ â [đ], đŠđ is the value assigned to the vertex with labelY[đ].Let đ ⶠ0, 1đ â 0, 1đ. We say that the circuit đ¶ computes đ if
for every đ„ â 0, 1đ, đ¶(đ„) = đ(đ„).R
Remark 3.7 â Boolean circuits nitpicks (optional). Inphrasing Definition 3.5, weâve made some technical
defining computation 137
choices that are not very important, but will be con-venient for us later on. Having parallel edges meansan AND or OR gate đą can have both its in-neighborsbe the same gate đŁ. Since AND(đ, đ) = OR(đ, đ) = đfor every đ â 0, 1, such parallel edges donât help incomputing new values in circuits with AND/OR/NOTgates. However, we will see circuits with more gen-eral sets of gates later on. The condition that everyinput vertex has at least one out-neighbor is also notvery important because we can always add âdummygatesâ that touch these inputs. However, it is conve-nient since it guarantees that (since every gate has atmost two in-neighbors) that the number of inputs in acircuit is never larger than twice its size.
3.3.2 Equivalence of circuits and straight-line programsWe have seen two ways to describe how to compute a function đ usingAND, OR and NOT:
âą A Boolean circuit, defined in Definition 3.5, computes đ by connect-ing via wires AND, OR, and NOT gates to the inputs.
âą We can also describe such a computation using a straight-lineprogram that has lines of the form foo = AND(bar,blah), foo =OR(bar,blah) and foo = NOT(bar)where foo, bar and blah arevariable names. (We call this a straight-line program since it containsno loops or branching (e.g., if/then) statements.)
We now formally define the AON-CIRC programming language(âAONâ stands for AND/OR/NOT; âCIRCâ stands for circuit) whichhas the above operations, and show that it is equivalent to Booleancircuits.
Definition 3.8 â AON-CIRC Programming language. An AON-CIRC pro-gram is a string of lines of the form foo = AND(bar,blah), foo= OR(bar,blah) and foo = NOT(bar)where foo, bar and blahare variable names. 2 Variables of the form X[đ] are known asinput variables, and variables of the form Y[đ] are known as outputvariables. In every line, the variables on the righthand side of theassignment operators must either be input variables or variablesthat have already been assigned a value.
A valid AON-CIRC program đ includes input variables of theform X[0],âŠ,X[đ â 1] and output variables of the form Y[0],âŠ,Y[đ â 1] for some đ, đ â„ 1. If đ is valid AON-CIRC program andđ„ â 0, 1đ, then we define the output of đ on input đ„, denoted byđ(đ„), to be the string đŠ â 0, 1đ corresponding to the values ofthe output variables Y[0] ,âŠ, Y[đ â 1] in the execution of đ where
138 introduction to theoretical computer science
2 We follow the common programming languagesconvention of using names such as foo, bar, baz,blah as stand-ins for generic identifiers. A variableidentifier in our programming language can beany combination of letters, numbers, underscores,and brackets. The appendix contains a full formalspecification of our programming language.
we initialize the input variables X[0],âŠ,X[đ â 1] to the valuesđ„0, ⊠, đ„đâ1.We say that such an AON-CIRC program đ computes a functionđ ⶠ0, 1đ â 0, 1đ if đ(đ„) = đ(đ„) for every đ„ â 0, 1đ.
AON-CIRC is not a practical programming language: it was de-signed for pedagogical purposes only, as a way to model computationas composition of AND, OR, and NOT. However, AON-CIRC can stillbe easily implemented on a computer. The following solved exercisegives an example of an AON-CIRC program.Solved Exercise 3.4 Consider the following function CMP ⶠ0, 14 â0, 1 that on four input bits đ, đ, đ, đ â 0, 1, outputs 1 iff the numberrepresented by (đ, đ) is larger than the number represented by (đ, đ).That is CMP(đ, đ, đ, đ) = 1 iff 2đ + đ > 2đ + đ.
Write an AON-CIRC program to compute CMP.
Solution:
Writing such a program is tedious but not truly hard. To com-pare two numbers we first compare their most significant digit,and then go down to the next digit and so on and so forth. In thiscase where the numbers have just two binary digits, these compar-isons are particularly simple. The number represented by (đ, đ) islarger than the number represented by (đ, đ) if and only if one ofthe following conditions happens:
1. The most significant bit đ of (đ, đ) is larger than the most signifi-cant bit đ of (đ, đ).or
2. The two most significant bits đ and đ are equal, but đ > đ.Another way to express the same condition is the following: the
number (đ, đ) is larger than (đ, đ) iff đ > đ OR (đ â„ đ AND đ > đ).For binary digits đŒ, đœ, the condition đŒ > đœ is simply that đŒ = 1
and đœ = 0 or AND(đŒ,NOT(đœ)) = 1, and the condition đŒ â„ đœ is sim-ply OR(đŒ,NOT(đœ)) = 1. Together these observations can be used togive the following AON-CIRC program to compute CMP:
temp_1 = NOT(X[2])temp_2 = AND(X[0],temp_1)temp_3 = OR(X[0],temp_1)temp_4 = NOT(X[3])
defining computation 139
Figure 3.12: A circuit for computing the CMP function.The evaluation of this circuit on (1, 1, 1, 0) yields theoutput 1, since the number 3 (represented in binaryas 11) is larger than the number 2 (represented inbinary as 10).
temp_5 = AND(X[1],temp_4)temp_6 = AND(temp_5,temp_3)Y[0] = OR(temp_2,temp_6)
We can also present this 8-line program as a circuit with 8 gates,see Fig. 3.12.
It turns out that AON-CIRC programs and Boolean circuits haveexactly the same power:
Theorem 3.9 â Equivalence of circuits and straight-line programs. Letđ ⶠ0, 1đ â 0, 1đ and đ â„ đ be some number. Then đ iscomputable by a Boolean circuit of đ gates if and only if đ is com-putable by an AON-CIRC program of đ lines.
Proof Idea:
The idea is simple - AON-CIRC programs and Boolean circuitsare just different ways of describing the exact same computationalprocess. For example, an AND gate in a Boolean circuit correspondingto computing the AND of two previously-computed values. In a AON-CIRC program this will correspond to the line that stores in a variablethe AND of two previously-computed variables.â
PThis proof of Theorem 3.9 is simple at heart, but allthe details it contains can make it a little cumbersometo read. You might be better off trying to work it outyourself before reading it. Our GitHub repository con-tains a âproof by Pythonâ of Theorem 3.9: implemen-tation of functions circuit2prog and prog2circuitsmapping Boolean circuits to AON-CIRC programs andvice versa.
Proof of Theorem 3.9. Let đ ⶠ0, 1đ â 0, 1đ. Since the theorem is anâif and only ifâ statement, to prove it we need to show both directions:translating an AON-CIRC program that computes đ into a circuit thatcomputes đ , and translating a circuit that computes đ into an AON-CIRC program that does so.
We start with the first direction. Let đ be an đ line AON-CIRC thatcomputes đ . We define a circuit đ¶ as follows: the circuit will have đinputs and đ gates. For every đ â [đ ], if the đ-th line has the form foo= AND(bar,blah) then the đ-th gate in the circuit will be an ANDgate that is connected to gates đ and đ where đ and đ correspond tothe last lines before đ where the variables bar and blah (respectively)
140 introduction to theoretical computer science
Figure 3.13: Two equivalent descriptions of the sameAND/OR/NOT computation as both an AON pro-gram and a Boolean circuit.
where written to. (For example, if đ = 57 and the last line bar waswritten to is 35 and the last line blah was written to is 17 then the twoin-neighbors of gate 57 will be gates 35 and 17.) If either bar or blah isan input variable then we connect the gate to the corresponding inputvertex instead. If foo is an output variable of the form Y[đ] then weadd the same label to the corresponding gate to mark it as an outputgate. We do the analogous operations if the đ-th line involves an ORor a NOT operation (except that we use the corresponding OR or NOTgate, and in the latter case have only one in-neighbor instead of two).For every input đ„ â 0, 1đ, if we run the program đ on đ„, then thevalue written that is computed in the đ-th line is exactly the valuethat will be assigned to the đ-th gate if we evaluate the circuit đ¶ on đ„.Hence đ¶(đ„) = đ(đ„) for every đ„ â 0, 1đ.
For the other direction, let đ¶ be a circuit of đ gates and đ inputs thatcomputes the function đ . We sort the gates according to a topologicalorder and write them as đŁ0, ⊠, đŁđ â1. We now can create a programđ of đ lines as follows. For every đ â [đ ], if đŁđ is an AND gate within-neighbors đŁđ, đŁđ then we will add a line to đ of the form temp_đ= AND(temp_đ,temp_đ), unless one of the vertices is an input vertexor an output gate, in which case we change this to the form X[.] orY[.] appropriately. Because we work in topological ordering, we areguaranteed that the in-neighbors đŁđ and đŁđ correspond to variablesthat have already been assigned a value. We do the same for OR andNOT gates. Once again, one can verify that for every input đ„, thevalue đ(đ„) will equal đ¶(đ„) and hence the program computes thesame function as the circuit. (Note that since đ¶ is a valid circuit, perDefinition 3.5, every input vertex of đ¶ has at least one out-neighborand there are exactly đ output gates labeled 0, ⊠, đ â 1; hence all thevariables X[0], âŠ, X[đ â 1] and Y[0] ,âŠ, Y[đ â 1] will appear in theprogram đ .)
3.4 PHYSICAL IMPLEMENTATIONS OF COMPUTING DEVICES (DI-GRESSION)
Computation is an abstract notion that is distinct from its physicalimplementations. While most modern computing devices are obtainedby mapping logical gates to semiconductor based transistors, overhistory people have computed using a huge variety of mechanisms,including mechanical systems, gas and liquid (known as fluidics),biological and chemical processes, and even living creatures (e.g., seeFig. 3.14 or this video for how crabs or slime mold can be used to docomputations).
defining computation 141
Figure 3.14: Crab-based logic gates from the paperâRobust soldier-crab ball gateâ by Gunji, Nishiyamaand Adamatzky. This is an example of an AND gatethat relies on the tendency of two swarms of crabsarriving from different directions to combine to asingle swarm that continues in the average of thedirections.
Figure 3.15: We can implement the logic of transistorsusing water. The water pressure from the gate closesor opens a faucet between the source and the sink.
In this section we will review some of these implementations, bothso you can get an appreciation of how it is possible to directly translateBoolean circuits to the physical world, without going through the en-tire stack of architecture, operating systems, and compilers, as well asto emphasize that silicon-based processors are by no means the onlyway to perform computation. Indeed, as we will see in Chapter 23,a very exciting recent line of work involves using different media forcomputation that would allow us to take advantage of quantum me-chanical effects to enable different types of algorithms.
Such a cool way to explain logic gates. pic.twitter.com/6Wgu2ZKFCxâ Lionel Page (@page_eco) October 28, 2019
3.4.1 TransistorsA transistor can be thought of as an electric circuit with two inputs,known as the source and the gate and an output, known as the sink.The gate controls whether current flows from the source to the sink. Ina standard transistor, if the gate is âONâ then current can flow from thesource to the sink and if it is âOFFâ then it canât. In a complementarytransistor this is reversed: if the gate is âOFFâ then current can flowfrom the source to the sink and if it is âONâ then it canât.
There are several ways to implement the logic of a transistor. Forexample, we can use faucets to implement it using water pressure(e.g. Fig. 3.15). This might seem as merely a curiosity, but there isa field known as fluidics concerned with implementing logical op-erations using liquids or gasses. Some of the motivations includeoperating in extreme environmental conditions such as in space or abattlefield, where standard electronic equipment would not survive.
The standard implementations of transistors use electrical current.One of the original implementations used vacuum tubes. As its nameimplies, a vacuum tube is a tube containing nothing (i.e., a vacuum)and where a priori electrons could freely flow from the source (awire) to the sink (a plate). However, there is a gate (a grid) betweenthe two, where modulating its voltage can block the flow of electrons.
Early vacuum tubes were roughly the size of lightbulbs (andlooked very much like them too). In the 1950âs they were supplantedby transistors, which implement the same logic using semiconduc-tors which are materials that normally do not conduct electricity butwhose conductivity can be modified and controlled by inserting impu-rities (âdopingâ) and applying an external electric field (this is knownas the field effect). In the 1960âs computers started to be implementedusing integrated circuits which enabled much greater density. In 1965,Gordon Moore predicted that the number of transistors per integratedcircuit would double every year (see Fig. 3.16), and that this wouldlead to âsuch wonders as home computers âor at least terminals con-
142 introduction to theoretical computer science
Figure 3.16: The number of transistors per integratedcircuits from 1959 till 1965 and a prediction that ex-ponential growth will continue for at least anotherdecade. Figure taken from âCramming More Com-ponents onto Integrated Circuitsâ, Gordon Moore,1965
Figure 3.17: Cartoon from Gordon Mooreâs articleâpredictingâ the implications of radically improvingtransistor density.
Figure 3.18: The exponential growth in computingpower over the last 120 years. Graph by Steve Jurvet-son, extending a prior graph of Ray Kurzweil.
Figure 3.19: Implementing logical gates using transis-tors. Figure taken from Rory Manglesâ website.
nected to a central computerâ automatic controls for automobiles,and personal portable communications equipmentâ. Since then, (ad-justed versions of) this so-called âMooreâs lawâ have been runningstrong, though exponential growth cannot be sustained forever, andsome physical limitations are already becoming apparent.
3.4.2 Logical gates from transistorsWe can use transistors to implement various Boolean functions such asAND, OR, and NOT. For each a two-input gate đș ⶠ0, 12 â 0, 1,such an implementation would be a system with two input wires đ„, đŠand one output wire đ§, such that if we identify high voltage with â1âand low voltage with â0â, then the wire đ§ will equal to â1â if and onlyif applying đș to the values of the wires đ„ and đŠ is 1 (see Fig. 3.19 andFig. 3.20). This means that if there exists a AND/OR/NOT circuit tocompute a function đ ⶠ0, 1đ â 0, 1đ, then we can compute đ inthe physical world using transistors as well.
3.4.3 Biological computingComputation can be based on biological or chemical systems. For ex-ample the lac operon produces the enzymes needed to digest lactoseonly if the conditions đ„ ⧠(ÂŹđŠ) hold where đ„ is âlactose is presentâ andđŠ is âglucose is presentâ. Researchers have managed to create transis-tors, and from them logic gates, based on DNA molecules (see alsoFig. 3.21). One motivation for DNA computing is to achieve increasedparallelism or storage density; another is to create âsmart biologicalagentsâ that could perhaps be injected into bodies, replicate them-selves, and fix or kill cells that were damaged by a disease such ascancer. Computing in biological systems is not restricted, of course, toDNA: even larger systems such as flocks of birds can be considered ascomputational processes.
3.4.4 Cellular automata and the game of lifeCellular automata is a model of a system composed of a sequence ofcells, each of which can have a finite state. At each step, a cell updatesits state based on the states of its neighboring cells and some simplerules. As we will discuss later in this book (see Section 8.4), cellularautomata such as Conwayâs âGame of Lifeâ can be used to simulatecomputation gates .
3.4.5 Neural networksOne computation device that we all carry with us is our own brain.Brains have served humanity throughout history, doing computationsthat range from distinguishing prey from predators, through makingscientific discoveries and artistic masterpieces, to composing witty 280character messages. The exact working of the brain is still not fully
defining computation 143
Figure 3.20: Implementing a NAND gate (see Sec-tion 3.5) using transistors.
Figure 3.21: Performance of DNA-based logic gates.Figure taken from paper of Bonnet et al, Science, 2013.
Figure 3.22: An AND gate using a âGame of Lifeâconfiguration. Figure taken from Jean-PhilippeRennardâs paper.
understood, but one common mathematical model for it is a (verylarge) neural network.
A neural network can be thought of as a Boolean circuit that insteadof AND/OR/NOT uses some other gates as the basic basis. For exam-ple, one particular basis we can use are threshold gates. For every vectorđ€ = (đ€0, ⊠, đ€đâ1) of integers and integer đĄ (some or all of whichcould be negative), the threshold function corresponding to đ€, đĄ is thefunction đđ€,đĄ ⶠ0, 1đ â 0, 1 that maps đ„ â 0, 1đ to 1 if and only ifâđâ1đ=0 đ€đđ„đ â„ đĄ. For example, the threshold function đđ€,đĄ correspond-ing to đ€ = (1, 1, 1, 1, 1) and đĄ = 3 is simply the majority function MAJ5on 0, 15. Threshold gates can be thought of as an approximation forneuron cells that make up the core of human and animal brains. To afirst approximation, a neuron has đ inputs and a single output, andthe neurons âfiresâ or âturns onâ its output when those signals passsome threshold.
Many machine learning algorithms use artificial neural networkswhose purpose is not to imitate biology but rather to perform somecomputational tasks, and hence are not restricted to threshold orother biologically-inspired gates. Generally, a neural network is oftendescribed as operating on signals that are real numbers, rather than0/1 values, and where the output of a gate on inputs đ„0, ⊠, đ„đâ1 isobtained by applying đ(âđ đ€đđ„đ) where đ ⶠâ â â is an activationfunction such as rectified linear unit (ReLU), Sigmoid, or many others(see Fig. 3.23). However, for the purposes of our discussion, all ofthe above are equivalent (see also Exercise 3.13). In particular we canreduce the setting of real inputs to binary inputs by representing areal number in the binary basis, and multiplying the weight of the bitcorresponding to the đđĄâ digit by 2đ.3.4.6 A computer made from marbles and pipesWe can implement computation using many other physical media,without any electronic, biological, or chemical components. Manysuggestions for mechanical computers have been put forward, goingback at least to Gottfried Leibnizâ computing machines from the 1670sand Charles Babbageâs 1837 plan for a mechanical âAnalytical Engineâ.As one example, Fig. 3.24 shows a simple implementation of a NAND(negation of AND, see Section 3.5) gate using marbles going throughpipes. We represent a logical value in 0, 1 by a pair of pipes, suchthat there is a marble flowing through exactly one of the pipes. Wecall one of the pipes the â0 pipeâ and the other the â1 pipeâ, and sothe identity of the pipe containing the marble determines the logicalvalue. A NAND gate corresponds to a mechanical object with twopairs of incoming pipes and one pair of outgoing pipes, such that forevery đ, đ â 0, 1, if two marbles are rolling toward the object in the
144 introduction to theoretical computer science
Figure 3.23: Common activation functions used inNeural Networks, including rectified linear units(ReLU), sigmoids, and hyperbolic tangent. All ofthose can be thought of as continuous approximationsto simple the step function. All of these can be usedto compute the NAND gate (see Exercise 3.13). Thisproperty enables neural networks to (approximately)compute any function that can be computed by aBoolean circuit.
Figure 3.24: A physical implementation of a NANDgate using marbles. Each wire in a Boolean circuit ismodeled by a pair of pipes representing the values0 and 1 respectively, and hence a gate has four inputpipes (two for each logical input) and two outputpipes. If one of the input pipes representing the value0 has a marble in it then that marble will flow to theoutput pipe representing the value 1. (The dashedline represents a gadget that will ensure that at mostone marble is allowed to flow onward in the pipe.)If both the input pipes representing the value 1 havemarbles in them, then the first marble will be stuckbut the second one will flow onwards to the outputpipe representing the value 0.
Figure 3.25: A âgadgetâ in a pipe that ensures that atmost one marble can pass through it. The first marblethat passes causes the barrier to lift and block newones.
đ pipe of the first pair and the đ pipe of the second pair, then a marblewill roll out of the object in the NAND(đ, đ)-pipe of the outgoing pair.In fact, there is even a commercially-available educational game thatuses marbles as a basis of computing, see Fig. 3.26.
3.5 THE NAND FUNCTION
The NAND function is another simple function that is extremely use-ful for defining computation. It is the function mapping 0, 12 to0, 1 defined by:
NAND(đ, đ) = â§âšâ©0 đ = đ = 11 otherwise. (3.9)
As its name implies, NAND is the NOT of AND (i.e., NAND(đ, đ) =NOT(AND(đ, đ))), and so we can clearly compute NAND using ANDand NOT. Interestingly, the opposite direction holds as well:
Theorem 3.10 â NAND computes AND,OR,NOT. We can compute AND,OR, and NOT by composing only the NAND function.
Proof. We start with the following observation. For every đ â 0, 1,AND(đ, đ) = đ. Hence, NAND(đ, đ) = NOT(AND(đ, đ)) = NOT(đ).This means that NAND can compute NOT. By the principle of âdou-ble negationâ, AND(đ, đ) = NOT(NOT(AND(đ, đ))), and hencewe can use NAND to compute AND as well. Once we can computeAND and NOT, we can compute OR using âDe Morganâs Lawâ:OR(đ, đ) = NOT(AND(NOT(đ),NOT(đ))) (which can also be writ-ten as đ âš đ = đ ⧠đ) for every đ, đ â 0, 1.
PTheorem 3.10âs proof is very simple, but you shouldmake sure that (i) you understand the statement ofthe theorem, and (ii) you follow its proof. In partic-ular, you should make sure you understand why DeMorganâs law is true.
We can use NAND to compute many other functions, as demon-strated in the following exercise.Solved Exercise 3.5 â Compute majority with NAND. Let MAJ ⶠ0, 13 â0, 1 be the function that on input đ, đ, đ outputs 1 iff đ + đ + đ â„ 2.Show how to compute MAJ using a composition of NANDâs.
defining computation 145
Figure 3.26: The game âTuring Tumbleâ contains animplementation of logical gates using marbles.
Figure 3.27: A circuit with NAND gates to computethe Majority function on three bits
Solution:
Recall that (3.5) states that
MAJ(đ„0, đ„1, đ„2) = OR (AND(đ„0, đ„1) , OR(AND(đ„1, đ„2) , AND(đ„0, đ„2)) ) .(3.10)
We can use Theorem 3.10 to replace all the occurrences of ANDand OR with NANDâs. Specifically, we can use the equivalenceAND(đ, đ) = NOT(NAND(đ, đ)), OR(đ, đ) = NAND(NOT(đ),NOT(đ)),and NOT(đ) = NAND(đ, đ) to replace the righthand side of(3.10) with an expression involving only NAND, yielding thatMAJ(đ, đ, đ) is equivalent to the (somewhat unwieldy) expression
NAND(NAND(NAND(NAND(đ, đ),NAND(đ, đ)),NAND(NAND(đ, đ),NAND(đ, đ)) ),
NAND(đ, đ) ) (3.11)
The same formula can also be expressed as a circuit with NANDgates, see Fig. 3.27.
3.5.1 NAND CircuitsWe define NAND Circuits as circuits in which all the gates are NANDoperations. Such a circuit again corresponds to a directed acyclicgraph (DAG) since all the gates correspond to the same function (i.e.,NAND), we do not even need to label them, and all gates have in-degree exactly two. Despite their simplicity, NAND circuits can bequite powerful.
Example 3.11 â đđŽđđ· circuit for đđđ . Recall the XOR functionwhich maps đ„0, đ„1 â 0, 1 to đ„0 + đ„1 mod 2. We have seen inSection 3.2.2 that we can compute XOR using AND, OR, and NOT,and so by Theorem 3.10 we can compute it using only NANDâs.However, the following is a direct construction of computing XORby a sequence of NAND operations:
1. Let đą = NAND(đ„0, đ„1).2. Let đŁ = NAND(đ„0, đą)3. Let đ€ = NAND(đ„1, đą).4. The XOR of đ„0 and đ„1 is đŠ0 = NAND(đŁ, đ€).
146 introduction to theoretical computer science
Figure 3.28: A circuit with NAND gates to computethe XOR of two bits.
One can verify that this algorithm does indeed compute XORby enumerating all the four choices for đ„0, đ„1 â 0, 1. We can alsorepresent this algorithm graphically as a circuit, see Fig. 3.28.
In fact, we can show the following theorem:
Theorem 3.12 â NAND is a universal operation. For every Boolean circuitđ¶ of đ gates, there exists a NAND circuit đ¶âČ of at most 3đ gates thatcomputes the same function as đ¶.
Proof Idea:
The idea of the proof is to just replace every AND, OR and NOTgate with their NAND implementation following the proof of Theo-rem 3.10.âProof of Theorem 3.12. If đ¶ is a Boolean circuit, then since, as weâveseen in the proof of Theorem 3.10, for every đ, đ â 0, 1âą NOT(đ) = NAND(đ, đ)âą AND(đ, đ) = NAND(NAND(đ, đ),NAND(đ, đ))âą OR(đ, đ) = NAND(NAND(đ, đ),NAND(đ, đ))
we can replace every gate of đ¶ with at most three NAND gates toobtain an equivalent circuit đ¶âČ. The resulting circuit will have at most3đ gates.
Big Idea 3 Two models are equivalent in power if they can be usedto compute the same set of functions.
3.5.2 More examples of NAND circuits (optional)Here are some more sophisticated examples of NAND circuits:
Incrementing integers. Consider the task of computing, given as inputa string đ„ â 0, 1đ that represents a natural number đ â â, therepresentation of đ + 1. That is, we want to compute the functionINCđ ⶠ0, 1đ â 0, 1đ+1 such that for every đ„0, ⊠, đ„đâ1, INCđ(đ„) =đŠ which satisfies âđđ=0 đŠđ2đ = (âđâ1đ=0 đ„đ2đ) + 1. (For simplicity ofnotation, in this example we use the representation where the leastsignificant digit is first rather than last.)
The increment operation can be very informally described as fol-lows: âAdd 1 to the least significant bit and propagate the carryâ. A little
defining computation 147
Figure 3.29: NAND circuit with computing the incre-ment function on 4 bits.
more precisely, in the case of the binary representation, to obtain theincrement of đ„, we scan đ„ from the least significant bit onwards, andflip all 1âs to 0âs until we encounter a bit equal to 0, in which case weflip it to 1 and stop.
Thus we can compute the increment of đ„0, ⊠, đ„đâ1 by doing thefollowing:
Algorithm 3.13 â Compute Increment Function.
Input: đ„0, đ„1, ⊠, đ„đâ1 representing the number âđâ1đ=0 đ„đ â 2đ# we use LSB-first representation
Output: đŠ â 0, 1đ+1 such that âđđ=0 đŠđ â 2đ = âđâ1đ=0 đ„đ â 2đ+11: Let đ0 â 1 # we pretend we have a âcarryâ of 1 initially2: for đ = 0, ⊠, đ â 1 do3: Let đŠđ â đđđ (đ„đ, đđ).4: if đđ = đ„đ = 1 then5: đđ+1 = 16: else7: đđ+1 = 08: end if9: end for
10: Let đŠđ â đđ.Algorithm 3.13 describes precisely how to compute the increment
operation, and can be easily transformed into Python code that per-forms the same computation, but it does not seem to directly yielda NAND circuit to compute this. However, we can transform thisalgorithm line by line to a NAND circuit. For example, since for ev-ery đ, NAND(đ,NOT(đ)) = 1, we can replace the initial statementđ0 = 1 with đ0 = NAND(đ„0,NAND(đ„0, đ„0)). We already knowhow to compute XOR using NAND and so we can use this to im-plement the operation đŠđ â XOR(đ„đ, đđ). Similarly, we can writethe âifâ statement as saying đđ+1 â AND(đđ, đ„đ), or in other wordsđđ+1 â NAND(NAND(đđ, đ„đ),NAND(đđ, đ„đ)). Finally, the assignmentđŠđ = đđ can be written as đŠđ = NAND(NAND(đđ, đđ),NAND(đđ, đđ)).Combining these observations yields for every đ â â, a NAND circuitto compute INCđ. For example, Fig. 3.29 shows what this circuit lookslike for đ = 4.From increment to addition. Once we have the increment operation,we can certainly compute addition by repeatedly incrementing (i.e.,compute đ„+đŠ by performing INC(đ„) đŠ times). However, that would bequite inefficient and unnecessary. With the same idea of keeping trackof carries we can implement the âgrade-schoolâ addition algorithmand compute the function ADDđ ⶠ0, 12đ â 0, 1đ+1 that on
148 introduction to theoretical computer science
input đ„ â 0, 12đ outputs the binary representation of the sum of thenumbers represented by đ„0, ⊠, đ„đâ1 and đ„đ+1, ⊠, đ„đ:
Algorithm 3.14 â Addition using NAND.
Input: đą â 0, 1đ, đŁ â 0, 1đ representing numbers inLSB-first binary representation.
Output: LSB-first binary representation of đ„ + đŠ.1: Let đ0 â 02: for đ = 0, ⊠, đ â 1 do3: Let đŠđ â đąđ + đŁđ mod 24: if đąđ + đŁđ + đđ â„ 2 then5: đđ+1 â 16: else7: đđ+1 â 08: end if9: end for
10: Let đŠđ â đđOnce again, Algorithm 3.14 can be translated into a NAND cir-
cuit. The crucial observation is that the âif/thenâ statement simplycorresponds to đđ+1 â MAJ3(đąđ, đŁđ, đŁđ) and we have seen in SolvedExercise 3.5 that the function MAJ3 ⶠ0, 13 â 0, 1 can be computedusing NANDs.
3.5.3 The NAND-CIRC Programming languageJust like we did for Boolean circuits, we can define a programming-language analog of NAND circuits. It is even simpler than the AON-CIRC language since we only have a single operation. We define theNAND-CIRC Programming Language to be a programming languagewhere every line has the following form:
foo = NAND(bar,blah)
where foo, bar and blah are variable identifiers.
Example 3.15 â Our first NAND-CIRC program. Here is an example of aNAND-CIRC program:
u = NAND(X[0],X[1])v = NAND(X[0],u)w = NAND(X[1],u)Y[0] = NAND(v,w)
defining computation 149
Figure 3.30: A NAND program and the correspondingcircuit. Note how every line in the program corre-sponds to a gate in the circuit.
PDo you know what function this program computes?Hint: you have seen it before.
Formally, just like we did in Definition 3.8 for AON-CIRC, we candefine the notion of computation by a NAND-CIRC program in thenatural way:
Definition 3.16 â Computing by a NAND-CIRC program. Let đ ⶠ0, 1đ â0, 1đ be some function, and let đ be a NAND-CIRC program.We say that đ computes the function đ if:
1. đ has đ input variables X[0], ⊠,X[đâ1] and đ output variablesY[0],âŠ,Y[đ â 1].
2. For every đ„ â 0, 1đ, if we execute đ when we assign toX[0], ⊠,X[đ â 1] the values đ„0, ⊠, đ„đâ1, then at the end ofthe execution, the output variables Y[0],âŠ,Y[đ â 1] have thevalues đŠ0, ⊠, đŠđâ1 where đŠ = đ(đ„).
As before we can show that NAND circuits are equivalent toNAND-CIRC programs (see Fig. 3.30):
Theorem 3.17 â NAND circuits and straight-line program equivalence. Forevery đ ⶠ0, 1đ â 0, 1đ and đ â„ đ, đ is computable by aNAND-CIRC program of đ lines if and only if đ is computable by aNAND circuit of đ gates.
We omit the proof of Theorem 3.17 since it follows along exactlythe same lines as the equivalence of Boolean circuits and AON-CIRCprogram (Theorem 3.9). Given Theorem 3.17 and Theorem 3.12, weknow that we can translate every đ -line AON-CIRC program đ intoan equivalent NAND-CIRC program of at most 3đ lines. In fact, thistranslation can be easily done by replacing every line of the formfoo = AND(bar,blah), foo = OR(bar,blah) or foo = NOT(bar)with the equivalent 1-3 lines that use the NAND operation. Our GitHubrepository contains a âproof by codeâ: a simple Python programAON2NAND that transforms an AON-CIRC into an equivalent NAND-CIRC program.
RRemark 3.18 â Is the NAND-CIRC programming languageTuring Complete? (optional note). You might have heardof a term called âTuring Completeâ that is sometimes
150 introduction to theoretical computer science
used to describe programming languages. (If youhavenât, feel free to ignore the rest of this remark: wedefine this term precisely in Chapter 8.) If so, youmight wonder if the NAND-CIRC programming lan-guage has this property. The answer is no, or perhapsmore accurately, the term âTuring Completenessâ isnot really applicable for the NAND-CIRC program-ming language. The reason is that, by design, theNAND-CIRC programming language can only com-pute finite functions đč ⶠ0, 1đ â 0, 1đ that take afixed number of input bits and produce a fixed num-ber of outputs bits. The term âTuring Completeâ isonly applicable to programming languages for infinitefunctions that can take inputs of arbitrary length. Wewill come back to this distinction later on in this book.
3.6 EQUIVALENCE OF ALL THESE MODELS
If we put together Theorem 3.9, Theorem 3.12, and Theorem 3.17, weobtain the following result:
Theorem 3.19 â Equivalence between models of finite computation. Forevery sufficiently large đ , đ, đ and đ ⶠ0, 1đ â 0, 1đ, thefollowing conditions are all equivalent to one another:
âą đ can be computed by a Boolean circuit (with â§, âš, ÂŹ gates) of atmost đ(đ ) gates.
âą đ can be computed by an AON-CIRC straight-line program of atmost đ(đ ) lines.
âą đ can be computed by a NAND circuit of at most đ(đ ) gates.
âą đ can be computed by a NAND-CIRC straight-line program of atmost đ(đ ) lines.
By âđ(đ )â we mean that the bound is at most đ â đ where đ is a con-stant that is independent of đ. For example, if đ can be computed by aBoolean circuit of đ gates, then it can be computed by a NAND-CIRCprogram of at most 3đ lines, and if đ can be computed by a NANDcircuit of đ gates, then it can be computed by an AON-CIRC programof at most 2đ lines.
Proof Idea:
We omit the formal proof, which is obtained by combining Theo-rem 3.9, Theorem 3.12, and Theorem 3.17. The key observation is thatthe results we have seen allow us to translate a program/circuit thatcomputes đ in one of the above models into a program/circuit that
defining computation 151
computes đ in another model by increasing the lines/gates by at mosta constant factor (in fact this constant factor is at most 3).â
Theorem 3.9 is a special case of a more general result. We can con-sider even more general models of computation, where instead ofAND/OR/NOT or NAND, we use other operations (see Section 3.6.1below). It turns out that Boolean circuits are equivalent in power tosuch models as well. The fact that all these different ways to definecomputation lead to equivalent models shows that we are âon theright trackâ. It justifies the seemingly arbitrary choices that weâvemade of using AND/OR/NOT or NAND as our basic operations,since these choices do not affect the power of our computationalmodel. Equivalence results such as Theorem 3.19 mean that we caneasily translate between Boolean circuits, NAND circuits, NAND-CIRC programs and the like. We will use this ability later on in thisbook, often shifting to the most convenient formulation without mak-ing a big deal about it. Hence we will not worry too much about thedistinction between, for example, Boolean circuits and NAND-CIRCprograms.
In contrast, we will continue to take special care to distinguishbetween circuits/programs and functions (recall Big Idea 2). A func-tion corresponds to a specification of a computational task, and it isa fundamentally different object than a program or a circuit, whichcorresponds to the implementation of the task.
3.6.1 Circuits with other gate setsThere is nothing special about AND/OR/NOT or NAND. For everyset of functions đą = đș0, ⊠, đșđâ1, we can define a notion of circuitsthat use elements of đą as gates, and a notion of a âđą programminglanguageâ where every line involves assigning to a variable foo the re-sult of applying some đșđ â đą to previously defined or input variables.Specifically, we can make the following definition:
Definition 3.20 â General straight-line programs. Let â± = đ0, ⊠, đđĄâ1be a finite collection of Boolean functions, such that đđ ⶠ0, 1đđ â0, 1 for some đđ â â. An â± program is a sequence of lines, each ofwhich assigns to some variable the result of applying some đđ â â±to đđ other variables. As above, we use X[đ] and Y[đ] to denote theinput and output variables.
We say that â± is a universal set of operations (also known as a uni-versal gate set) if there exists a â± program to compute the functionNAND.
152 introduction to theoretical computer science
3 One can also define these functions as taking alength zero input. This makes no difference for thecomputational power of the model.
AON-CIRC programs correspond to đŽđđ·,OR,NOT programs,NAND-CIRC programs corresponds to â± programs for the setâ± that only contains the NAND function, but we can also defineIF,ZERO,ONE programs (see below), or use any other set.
We can also define â± circuits, which will be directed graphs inwhich each gate corresponds to applying a function đđ â â±, and willeach have đđ incoming wires and a single outgoing wire. (If the func-tion đđ is not symmetric, in the sense that the order of its input mattersthen we need to label each wire entering a gate as to which parameterof the function it corresponds to.) As in Theorem 3.9, we can showthat â± circuits and â± programs are equivalent. We have seen that forâ± = AND,OR,NOT, the resulting circuits/programs are equivalentin power to the NAND-CIRC programming language, as we can com-pute NAND using AND/OR/NOT and vice versa. This turns out to bea special case of a general phenomenaâ the universality of NAND andother gate sets â that we will explore more in depth later in this book.
Example 3.21 â IF,ZERO,ONE circuits. Let â± = IF,ZERO,ONEwhere ZERO ⶠ0, 1 â 0 and ONE ⶠ0, 1 â 1 are theconstant zero and one functions, 3 and IF ⶠ0, 13 â 0, 1 is thefunction that on input (đ, đ, đ) outputs đ if đ = 1 and đ otherwise.Then â± is universal.
Indeed, we can demonstrate that IF,ZERO,ONE is universalusing the following formula for NAND:
NAND(đ, đ) = IF(đ, IF(đ,ZERO,ONE),ONE) . (3.12)
There are also some sets â± that are more restricted in power. Forexample it can be shown that if we use only AND or OR gates (with-out NOT) then we do not get an equivalent model of computation.The exercises cover several examples of universal and non-universalgate sets.
3.6.2 Specification vs. implementation (again)As we discussed in Section 2.6.1, one of the most important distinc-tions in this book is that of specification versus implementation or sep-arating âwhatâ from âhowâ (see Fig. 3.31). A function correspondsto the specification of a computational task, that is what output shouldbe produced for every particular input. A program (or circuit, or anyother way to specify algorithms) corresponds to the implementation ofhow to compute the desired output from the input. That is, a programis a set of instructions how to compute the output from the input.Even within the same computational model there can be many differ-ent ways to compute the same function. For example, there is more
defining computation 153
Figure 3.31: It is crucial to distinguish between thespecification of a computational task, namely what isthe function that is to be computed and the implemen-tation of it, namely the algorithm, program, or circuitthat contains the instructions defining how to mapan input to an output. The same function could becomputed in many different ways.
than one NAND-CIRC program that computes the majority function,more than one Boolean circuit to compute the addition function, andso on and so forth.
Confusing specification and implementation (or equivalently func-tions and programs) is a common mistake, and one that is unfortu-nately encouraged by the common programming-language termi-nology of referring to parts of programs as âfunctionsâ. However, inboth the theory and practice of computer science, it is important tomaintain this distinction, and it is particularly important for us in thisbook.
Chapter Recap
âą An algorithm is a recipe for performing a compu-tation as a sequence of âelementaryâ or âsimpleâoperations.
âą One candidate definition for âelementaryâ opera-tions is the set AND, OR and NOT.
âą Another candidate definition for an âelementaryâoperation is the NAND operation. It is an operationthat is easily implementable in the physical worldin a variety of methods including by electronictransistors.
âą We can use NAND to compute many other func-tions, including majority, increment, and others.
âą There are other equivalent choices, including thesets đŽđđ·,OR,NOT and IF,ZERO,ONE.
âą We can formally define the notion of a functionđč ⶠ0, 1đ â 0, 1đ being computable using theNAND-CIRC Programming language.
âą For every set of basic operations, the notions of be-ing computable by a circuit and being computableby a straight-line program are equivalent.
154 introduction to theoretical computer science
4 Hint: Use the fact that MAJ(đ, đ, đ) = đđŽđœ(đ, đ, đ)to prove that every đ ⶠ0, 1đ â 0, 1 computableby a circuit with only MAJ and NOT gates satisfiesđ(0, 0, ⊠, 0) â đ(1, 1, ⊠, 1). Thanks to NathanBrunelle and David Evans for suggesting this exercise.
3.7 EXERCISES
Exercise 3.1 â Compare 4 bit numbers. Give a Boolean circuit(with AND/OR/NOT gates) that computes the functionCMP8 ⶠ0, 18 â 0, 1 such that CMP8(đ0, đ1, đ2, đ3, đ0, đ1, đ2, đ3) = 1if and only if the number represented by đ0đ1đ2đ3 is larger than thenumber represented by đ0đ1đ2đ3.
Exercise 3.2 â Compare đ bit numbers. Prove that there exists a constant đsuch that for every đ there is a Boolean circuit (with AND/OR/NOTgates) đ¶ of at most đ â đ gates that computes the function CMP2đ â¶0, 12đ â 0, 1 such that CMP2đ(đ0 ⯠đđâ1đ0 ⯠đđâ1) = 1 if andonly if the number represented by đ0 ⯠đđâ1 is larger than the numberrepresented by đ0 ⯠đđâ1.
Exercise 3.3 â OR,NOT is universal. Prove that the set OR,NOT is univer-sal, in the sense that one can compute NAND using these gates.
Exercise 3.4 â AND,OR is not universal. Prove that for every đ-bit inputcircuit đ¶ that contains only AND and OR gates, as well as gates thatcompute the constant functions 0 and 1, đ¶ is monotone, in the sensethat if đ„, đ„âČ â 0, 1đ, đ„đ †đ„âČđ for every đ â [đ], then đ¶(đ„) †đ¶(đ„âČ).
Conclude that the set AND,OR, 0, 1 is not universal.
Exercise 3.5 â XOR is not universal. Prove that for every đ-bit input circuitđ¶ that contains only XOR gates, as well as gates that compute theconstant functions 0 and 1, đ¶ is affine or linear modulo two, in the sensethat there exists some đ â 0, 1đ and đ â 0, 1 such that for everyđ„ â 0, 1đ, đ¶(đ„) = âđâ1đ=0 đđđ„đ + đ mod 2.
Conclude that the set XOR, 0, 1 is not universal.
Exercise 3.6 â MAJ,NOT, 1 is universal. Let MAJ ⶠ0, 13 â 0, 1 be themajority function. Prove that MAJ,NOT, 1 is a universal set of gates.
Exercise 3.7 â MAJ,NOT is not universal. Prove that MAJ,NOT is not auniversal set. See footnote for hint.4
Exercise 3.8 â NOR is universal. Let NOR ⶠ0, 12 â 0, 1 defined asNOR(đ, đ) = NOT(OR(đ, đ)). Prove that NOR is a universal set ofgates.
defining computation 155
5 Thanks to Alec Sun and Simon Fischer for commentson this problem.
6 Hint: Use the conditions of Definition 3.5 stipulatingthat every input vertex has at least one out-neighborand there are exactly đ output gates. See also Re-mark 3.7.
Exercise 3.9 â Lookup is universal. Prove that LOOKUP1, 0, 1 is a uni-versal set of gates where 0 and 1 are the constant functions andLOOKUP1 ⶠ0, 13 â 0, 1 satisfies LOOKUP1(đ, đ, đ) equals đ ifđ = 0 and equals đ if đ = 1.
Exercise 3.10 â Bound on universal basis size (challenge). Prove that for ev-ery subset đ” of the functions from 0, 1đ to 0, 1, if đ” is universalthen there is a đ”-circuit of at most đ(1) gates to compute the NANDfunction (you can start by showing that there is a đ” circuit of at mostđ(đ16) gates).5
Exercise 3.11 â Size and inputs / outputs. Prove that for every NAND cir-cuit of size đ with đ inputs and đ outputs, đ â„ minđ/2, đ. Seefootnote for hint.6
Exercise 3.12 â Threshold using NANDs. Prove that there is some constantđ such that for every đ > 1, and integers đ0, ⊠, đđâ1, đ â â2đ, â2đ +1, ⊠, â1, 0, +1, ⊠, 2đ, there is a NAND circuit with at most đđ4 gatesthat computes the threshold function đđ0,âŠ,đđâ1,đ ⶠ0, 1đ â 0, 1 thaton input đ„ â 0, 1đ outputs 1 if and only if âđâ1đ=0 đđđ„đ > đ.
Exercise 3.13 â NANDs from activation functions. We say that a functionđ ⶠâ2 â â is a NAND approximator if it has the following property: forevery đ, đ â â, if min|đ|, |1 â đ| †1/3 and min|đ|, |1 â đ| †1/3 then|đ(đ, đ) â NAND(âđâ, âđâ)| †1/3 where we denote by âđ„â the integerclosest to đ„. That is, if đ, đ are within a distance 1/3 to 0, 1 then wewant đ(đ, đ) to equal the NAND of the values in 0, 1 that are closestto đ and đ respectively. Otherwise, we do not care what the output ofđ is on đ and đ.
In this exercise you will show that you can construct a NAND ap-proximator from many common activation functions used in deepneural networks. As a corollary you will obtain that deep neural net-works can simulate NAND circuits. Since NAND circuits can alsosimulate deep neural networks, these two computational models areequivalent to one another.
1. Show that there is a NAND approximator đ defined as đ(đ, đ) =đż(đ đđżđ(đżâČ(đ, đ))) where đżâČ â¶ â2 â â is an affine function (ofthe form đżâČ(đ, đ) = đŒđ + đœđ + đŸ for some đŒ, đœ, đŸ â â), đż is anaffine function (of the form đż(đŠ) = đŒđŠ + đœ for đŒ, đœ â â), andđ đđżđ ⶠâ â â, is the function defined as đ đđżđ(đ„) = max0, đ„.
156 introduction to theoretical computer science
7 One approach to solve this is using recursion andanalyzing it using the so called âMaster Theoremâ.
8 Hint: Vertices in layers beyond the output can besafely removed without changing the functionality ofthe circuit.
2. Show that there is a NAND approximator đ defined as đ(đ, đ) =đż(đ đđđđđđ(đżâČ(đ, đ))) where đżâČ, đż are affine as above and đ đđđđđđ â¶â â â is the function defined as đ đđđđđđ(đ„) = đđ„/(đđ„ + 1).3. Show that there is a NAND approximator đ defined as đ(đ, đ) =đż(đĄđđâ(đżâČ(đ, đ))) where đżâČ, đż are affine as above and đĄđđâ ⶠâ â â
is the function defined as đĄđđâ(đ„) = (đđ„ â đâđ„)/(đđ„ + đâđ„).4. Prove that for every NAND-circuit đ¶ with đ inputs and one output
that computes a function đ ⶠ0, 1đ â 0, 1, if we replace everygate of đ¶ with a NAND-approximator and then invoke the result-ing circuit on some đ„ â 0, 1đ, the output will be a number đŠ suchthat |đŠ â đ(đ„)| †1/3.
Exercise 3.14 â Majority with NANDs efficiently. Prove that there is someconstant đ such that for every đ > 1, there is a NAND circuit of atmost đ â đ gates that computes the majority function on đ input bitsMAJđ ⶠ0, 1đ â 0, 1. That is MAJđ(đ„) = 1 iff âđâ1đ=0 đ„đ > đ/2. Seefootnote for hint.7
Exercise 3.15 â Output at last layer. Prove that for every đ ⶠ0, 1đ â0, 1, if there is a Boolean circuit đ¶ of đ gates that computes đ thenthere is a Boolean circuit đ¶âČ of at most đ gates such that in the minimallayering of đ¶âČ, the output gate of đ¶âČ is placed in the last layer. Seefootnote for hint.8
3.8 BIOGRAPHICAL NOTES
The excerpt from Al-Khwarizmiâs book is from âThe Algebra of Ben-Musaâ, Fredric Rosen, 1831.
Charles Babbage (1791-1871) was a visionary scientist, mathemati-cian, and inventor (see [Swa02; CM00]). More than a century beforethe invention of modern electronic computers, Babbage realized thatcomputation can be in principle mechanized. His first design for amechanical computer was the difference engine that was designed to dopolynomial interpolation. He then designed the analytical engine whichwas a much more general machine and the first prototype for a pro-grammable general purpose computer. Unfortunately, Babbage wasnever able to complete the design of his prototypes. One of the earliestpeople to realize the engineâs potential and far reaching implicationswas Ada Lovelace (see the notes for Chapter 7).
Boolean algebra was first investigated by Boole and DeMorganin the 1840âs [Boo47; De 47]. The definition of Boolean circuits and
defining computation 157
connection to electrical relay circuits was given in Shannonâs MastersThesis [Sha38]. (Howard Gardener called Shannonâs thesis âpossiblythe most important, and also the most famous, masterâs thesis of the[20th] centuryâ.) Savageâs book [Sav98], like this one, introducesthe theory of computation starting with Boolean circuits as the firstmodel. Juknaâs book [Juk12] contains a modern in-depth exposition ofBoolean circuits, see also [Weg87].
The NAND function was shown to be universal by Sheffer [She13],though this also appears in the earlier work of Peirce, see [Bur78].Whitehead and Russell used NAND as the basis for their logic intheir magnum opus Principia Mathematica [WR12]. In her Ph.D thesis,Ernst [Ern09] investigates empirically the minimal NAND circuitsfor various functions. Nisan and Shockenâs book [NS05] builds acomputing system starting from NAND gates and ending with highlevel programs and games (âNAND to Tetrisâ); see also the websitenandtotetris.org.
4Syntactic sugar, and computing every function
â[In 1951] I had a running compiler and nobody would touch it because,they carefully told me, computers could only do arithmetic; they could not doprograms.â, Grace Murray Hopper, 1986.
âSyntactic sugar causes cancer of the semicolon.â, Alan Perlis, 1982.
The computational models we considered thus far are as âbarebonesâ as they come. For example, our NAND-CIRC âprogramminglanguageâ has only the single operation foo = NAND(bar,blah). Inthis chapter we will see that these simple models are actually equiv-alent to more sophisticated ones. The key observation is that we canimplement more complex features using our basic building blocks,and then use these new features themselves as building blocks foreven more sophisticated features. This is known as âsyntactic sugarâin the field of programming language design since we are not modi-fying the underlying programming model itself, but rather we merelyimplement new features by syntactically transforming a program thatuses such features into one that doesnât.
This chapter provides a âtoolkitâ that can be used to show thatmany functions can be computed by NAND-CIRC programs, andhence also by Boolean circuits. We will also use this toolkit to provea fundamental theorem: every finite function đ ⶠ0, 1đ â 0, 1đcan be computed by a Boolean circuit, see Theorem 4.13 below. Whilethe syntactic sugar toolkit is important in its own right, Theorem 4.13can also be proven directly without using this toolkit. We present thisalternative proof in Section 4.5. See Fig. 4.1 for an outline of the resultsof this chapter.
This chapter: A non-mathy overviewIn this chapter we will see our first major result: every fi-nite function can be computed some Boolean circuit (seeTheorem 4.13 and Big Idea 5). This is sometimes known as
Compiled on 8.26.2020 18:10
Learning Objectives:âą Get comfortable with syntactic sugar or
automatic translation of higher level logic tolow level gates.
âą Learn proof of major result: every finitefunction can be computed by a Booleancircuit.
âą Start thinking quantitatively about numberof lines required for computation.
160 introduction to theoretical computer science
Figure 4.1: An outline of the results of this chapter. InSection 4.1 we give a toolkit of âsyntactic sugarâ trans-formations showing how to implement features suchas programmer-defined functions and conditionalstatements in NAND-CIRC. We use these tools inSection 4.3 to give a NAND-CIRC program (or alter-natively a Boolean circuit) to compute the LOOKUPfunction. We then build on this result to show in Sec-tion 4.4 that NAND-CIRC programs (or equivalently,Boolean circuits) can compute every finite function.An alternative direct proof of the same result is givenin Section 4.5.
the âuniversalityâ of AND, OR, and NOT (and, using theequivalence of Chapter 3, of NAND as well)Despite being an important result, Theorem 4.13 is actuallynot that hard to prove. Section 4.5 presents a relatively sim-ple direct proof of this result. However, in Section 4.1 andSection 4.3 we derive this result using the concept of âsyntac-tic sugarâ (see Big Idea 4). This is an important concept forprogramming languages theory and practice. The idea be-hind âsyntactic sugarâ is that we can extend a programminglanguage by implementing advanced features from its basiccomponents. For example, we can take the AON-CIRC andNAND-CIRC programming languages we saw in Chapter 3,and extend them to achieve features such as user-definedfunctions (e.g., def Foo(...)), condtional statements (e.g.,if blah ...), and more. Once we have these features, itis not that hard to show that we can take the âtruth tableâ(table of all inputs and outputs) of any function, and use thatto create an AON-CIRC or NAND-CIRC program that mapseach input to its corresponding output.We will also get our first glimpse of quantitative measures inthis chapter. While Theorem 4.13 tells us that every func-tion can be computed by some circuit, the number of gatesin this circuit can be exponentially large. (We are not usinghere âexponentiallyâ as some colloquial term for âvery verybigâ but in a very precise mathematical sense, which alsohappens to coincide with being very very big.) It turns outthat some functions (for example, integer addition and multi-
syntactic sugar, and computing every function 161
plication) can be in fact computed using far fewer gates. Wewill explore this issue of âgate complexityâ more deeply inChapter 5 and following chapters.
4.1 SOME EXAMPLES OF SYNTACTIC SUGAR
We now present some examples of âsyntactic sugarâ transformationsthat we can use in constructing straightline programs or circuits. Wefocus on the straight-line programming language view of our computa-tional models, and specifically (for the sake of concreteness) on theNAND-CIRC programming language. This is convenient becausemany of the syntactic sugar transformations we present are easiest tothink about in terms of applying âsearch and replaceâ operations tothe source code of a program. However, by Theorem 3.19, all of ourresults hold equally well for circuits, whether ones using NAND gatesor Boolean circuits that use the AND, OR, and NOT operations. Enu-merating the examples of such syntactic sugar transformations can bea little tedious, but we do it for two reasons:
1. To convince you that despite their seeming simplicity and limita-tions, simple models such as Boolean circuits or the NAND-CIRCprogramming language are actually quite powerful.
2. So you can realize how lucky you are to be taking a theory of com-putation course and not a compilers course⊠:)
4.1.1 User-defined proceduresOne staple of almost any programming language is the ability todefine and then execute procedures or subroutines. (These are oftenknown as functions in some programming languages, but we preferthe name procedures to avoid confusion with the function that a pro-gram computes.) The NAND-CIRC programming language doesnot have this mechanism built in. However, we can achieve the sameeffect using the time honored technique of âcopy and pasteâ. Specifi-cally, we can replace code which defines a procedure such as
def Proc(a,b):proc_codereturn c
some_codef = Proc(d,e)some_more_code
with the following code where we âpasteâ the code of Proc
162 introduction to theoretical computer science
some_codeproc_code'some_more_code
and where proc_code' is obtained by replacing all occurrencesof a with d, b with e, and c with f. When doing that we will need toensure that all other variables appearing in proc_code' donât interferewith other variables. We can always do so by renaming variables tonew names that were not used before. The above reasoning leads tothe proof of the following theorem:
Theorem 4.1 â Procedure definition synctatic sugar. Let NAND-CIRC-PROC be the programming language NAND-CIRC augmentedwith the syntax above for defining procedures. Then for everyNAND-CIRC-PROC program đ , there exists a standard (i.e.,âsugar freeâ) NAND-CIRC program đ âČ that computes the samefunction as đ .
RRemark 4.2 â No recursive procedure. NAND-CIRC-PROC only allows non recursive procedures. In particu-lar, the code of a procedure Proc cannot call Proc butonly use procedures that were defined before it. With-out this restriction, the above âsearch and replaceâprocedure might never terminate and Theorem 4.1would not be true.
Theorem 4.1 can be proven using the transformation above, butsince the formal proof is somewhat long and tedious, we omit it here.
Example 4.3 â Computing Majority from NAND using syntactic sugar. Pro-cedures allow us to express NAND-CIRC programs much morecleanly and succinctly. For example, because we can computeAND, OR, and NOT using NANDs, we can compute the Majorityfunction as follows:
def NOT(a):return NAND(a,a)
def AND(a,b):temp = NAND(a,b)return NOT(temp)
def OR(a,b):temp1 = NOT(a)temp2 = NOT(b)
syntactic sugar, and computing every function 163
return NAND(temp1,temp2)
def MAJ(a,b,c):and1 = AND(a,b)and2 = AND(a,c)and3 = AND(b,c)or1 = OR(and1,and2)return OR(or1,and3)
print(MAJ(0,1,1))# 1
Fig. 4.2 presents the âsugar freeâ NAND-CIRC program (andthe corresponding circuit) that is obtained by âexpanding outâ thisprogram, replacing the calls to procedures with their definitions.
Big Idea 4 Once we show that a computational model đ is equiv-alent to a model that has feature đ , we can assume we have đ whenshowing that a function đ is computable by đ.
Figure 4.2: A standard (i.e., âsugar freeâ) NAND-CIRC program that is obtained by expanding out theprocedure definitions in the program for Majorityof Example 4.3. The corresponding circuit is onthe right. Note that this is not the most efficientNAND circuit/program for majority: we can save onsome gates by âshort cuttingâ steps where a gate đącomputes NAND(đŁ, đŁ) and then a gate đ€ computesNAND(đą, đą) (as indicated by the dashed greenarrows in the above figure).
RRemark 4.4 â Counting lines. While we can use syn-tactic sugar to present NAND-CIRC programs in morereadable ways, we did not change the definition ofthe language itself. Therefore, whenever we say thatsome function đ has an đ -line NAND-CIRC programwe mean a standard âsugar freeâ NAND-CIRC pro-gram, where all syntactic sugar has been expandedout. For example, the program of Example 4.3 is a12-line program for computing the MAJ function,
164 introduction to theoretical computer science
even though it can be written in fewer lines usingNAND-CIRC-PROC.
4.1.2 Proof by Python (optional)We can write a Python program that implements the proof of Theo-rem 4.1. This is a Python program that takes a NAND-CIRC-PROCprogram đ that includes procedure definitions and uses simpleâsearch and replaceâ to transform đ into a standard (i.e., âsugarfreeâ) NAND-CIRC program đ âČ that computes the same functionas đ without using any procedures. The idea is simple: if the programđ contains a definition of a procedure Proc of two arguments x and y,then whenever we see a line of the form foo = Proc(bar,blah), wecan replace this line by:
1. The body of the procedure Proc (replacing all occurrences of x andy with bar and blah respectively).
2. A line foo = exp, where exp is the expression following the re-turn statement in the definition of the procedure Proc.
To make this more robust we add a prefix to the internal variablesused by Proc to ensure they donât conflict with the variables of đ ;for simplicity we ignore this issue in the code below though it can beeasily added.
The code of the Python function desugar below achieves such atransformation.
Fig. 4.2 shows the result of applying desugar to the program of Ex-ample 4.3 that uses syntactic sugar to compute the Majority function.Specifically, we first apply desugar to remove usage of the OR func-tion, then apply it to remove usage of the AND function, and finallyapply it a third time to remove usage of the NOT function.
RRemark 4.5 â Parsing function definitions (optional). Thefunction desugar in Fig. 4.3 assumes that it is giventhe procedure already split up into its name, argu-ments, and body. It is not crucial for our purposes todescribe precisely how to scan a definition and split itup into these components, but in case you are curious,it can be achieved in Python via the following code:
def parse_func(code):"""Parse a function definition into name,
arguments and body"""lines = [l.strip() for l in code.split('\n')]regexp = r'def\s+([a-zA-Z\_0-9]+)\(([\sa-zA-
Z0-9\_,]+)\)\s*:\s*'
syntactic sugar, and computing every function 165
Figure 4.3: Python code for transforming NAND-CIRC-PROC programs into standard sugar free NAND-CIRC programs.
def desugar(code, func_name, func_args,func_body):"""Replaces all occurences of
foo = func_name(func_args)with
func_body[x->a,y->b]foo = [result returned in func_body]
"""# Uses Python regular expressions to simplify the search and replace,# see https://docs.python.org/3/library/re.html and Chapter 9 of the book
# regular expression for capturing a list of variable names separated by commasarglist = ",".join([r"([a-zA-Z0-9\_\[\]]+)" for i in range(len(func_args))])# regular expression for capturing a statement of the form# "variable = func_name(arguments)"regexp = fr'([a-zA-Z0-9\_\[\]]+)\s*=\s*func_name\(arglist\)\s*$'while True:
m = re.search(regexp, code, re.MULTILINE)if not m: breaknewcode = func_body# replace function arguments by the variables from the function invocationfor i in range(len(func_args)):
newcode = newcode.replace(func_args[i], m.group(i+2))# Splice the new code insidenewcode = newcode.replace('return', m.group(1) + " = ")code = code[:m.start()] + newcode + code[m.end()+1:]
return code
166 introduction to theoretical computer science
m = re.match(regexp,lines[0])return m.group(1), m.group(2).split(','),
'\n'.join(lines[1:])
4.1.3 Conditional statementsAnother sorely missing feature in NAND-CIRC is a conditionalstatement such as the if/then constructs that are found in manyprogramming languages. However, using procedures, we can ob-tain an ersatz if/then construct. First we can compute the functionIF ⶠ0, 13 â 0, 1 such that IF(đ, đ, đ) equals đ if đ = 1 and đ if đ = 0.
PBefore reading onward, try to see how you could com-pute the IF function using NANDâs. Once you do that,see how you can use that to emulate if/then types ofconstructs.
The IF function can be implemented from NANDs as follows (seeExercise 4.2):
def IF(cond,a,b):notcond = NAND(cond,cond)temp = NAND(b,notcond)temp1 = NAND(a,cond)return NAND(temp,temp1)
The IF function is also known as a multiplexing function, since đđđđcan be thought of as a switch that controls whether the output is con-nected to đ or đ. Once we have a procedure for computing the IF func-tion, we can implement conditionals in NAND. The idea is that wereplace code of the form
if (condition): assign blah to variable foo
with code of the form
foo = IF(condition, blah, foo)
that assigns to foo its old value when condition equals 0, andassign to foo the value of blah otherwise. More generally we canreplace code of the form
if (cond):a = ...b = ...c = ...
syntactic sugar, and computing every function 167
with code of the form
temp_a = ...temp_b = ...temp_c = ...a = IF(cond,temp_a,a)b = IF(cond,temp_b,b)c = IF(cond,temp_c,c)
Using such transformations, we can prove the following theorem.Once again we omit the (not too insightful) full formal proof, thoughsee Section 4.1.2 for some hints on how to obtain it.
Theorem 4.6 â Conditional statements synctatic sugar. Let NAND-CIRC-IF be the programming language NAND-CIRC augmented withif/then/else statements for allowing code to be conditionallyexecuted based on whether a variable is equal to 0 or 1.
Then for every NAND-CIRC-IF program đ , there exists a stan-dard (i.e., âsugar freeâ) NAND-CIRC program đ âČ that computesthe same function as đ .
4.2 EXTENDED EXAMPLE: ADDITION AND MULTIPLICATION (OP-TIONAL)
Using âsyntactic sugarâ, we can write the integer addition function asfollows:
# Add two n-bit integers# Use LSB first notation for simplicitydef ADD(A,B):
Result = [0]*(n+1)Carry = [0]*(n+1)Carry[0] = zero(A[0])for i in range(n):
Result[i] = XOR(Carry[i],XOR(A[i],B[i]))Carry[i+1] = MAJ(Carry[i],A[i],B[i])
Result[n] = Carry[n]return Result
ADD([1,1,1,0,0],[1,0,0,0,0]);;# [0, 0, 0, 1, 0, 0]
where zero is the constant zero function, and MAJ and XOR corre-spond to the majority and XOR functions respectively. While we usePython syntax for convenience, in this example đ is some fixed integerand so for every such đ, ADD is a finite function that takes as input 2đ
168 introduction to theoretical computer science
1 The value of đ can be improved to 9, see Exercise 4.5.
Figure 4.5: The number of lines in our NAND-CIRCprogram to add two đ bit numbers, as a function ofđ, for đâs between 1 and 100. This is not the mostefficient program for this task, but the important pointis that it has the form đ(đ).
bits and outputs đ + 1 bits. In particular for every đ we can removethe loop construct for i in range(n) by simply repeating the code đtimes, replacing the value of i with 0, 1, 2, ⊠, đ â 1. By expanding outall the features, for every value of đ we can translate the above pro-gram into a standard (âsugar freeâ) NAND-CIRC program. Fig. 4.4depicts what we get for đ = 2.
Figure 4.4: The NAND-CIRC program and corre-sponding NAND circuit for adding two-digit binarynumbers that are obtained by âexpanding outâ all thesyntactic sugar. The program/circuit has 43 lines/-gates which is by no means necessary. It is possibleto add đ bit numbers using 9đ NAND gates, seeExercise 4.5.
By going through the above program carefully and accounting forthe number of gates, we can see that it yields a proof of the followingtheorem (see also Fig. 4.5):
Theorem 4.7 â Addition using NAND-CIRC programs. For every đ â â,let ADDđ ⶠ0, 12đ â 0, 1đ+1 be the function that, givenđ„, đ„âČ â 0, 1đ computes the representation of the sum of the num-bers that đ„ and đ„âČ represent. Then there is a constant đ †30 suchthat for every đ there is a NAND-CIRC program of at most đđ linescomputing ADDđ. 1
Once we have addition, we can use the grade-school algorithm toobtain multiplication as well, thus obtaining the following theorem:
Theorem 4.8 â Multiplication using NAND-CIRC programs. For every đ,let MULTđ ⶠ0, 12đ â 0, 12đ be the function that, givenđ„, đ„âČ â 0, 1đ computes the representation of the product of thenumbers that đ„ and đ„âČ represent. Then there is a constant đ suchthat for every đ, there is a NAND-CIRC program of at most đđ2that computes the function MULTđ.We omit the proof, though in Exercise 4.7 we ask you to supply
a âconstructive proofâ in the form of a program (in your favorite
syntactic sugar, and computing every function 169
programming language) that on input a number đ, outputs the codeof a NAND-CIRC program of at most 1000đ2 lines that computes theMULTđ function. In fact, we can use Karatsubaâs algorithm to showthat there is a NAND-CIRC program of đ(đlog2 3) lines to computeMULTđ (and can get even further asymptotic improvements usingbetter algorithms).
4.3 THE LOOKUP FUNCTION
The LOOKUP function will play an important role in this chapter andlater. It is defined as follows:
Definition 4.9 â Lookup function. For every đ, the lookup function oforder đ, LOOKUPđ ⶠ0, 12đ+đ â 0, 1 is defined as follows: Forevery đ„ â 0, 12đ and đ â 0, 1đ,
LOOKUPđ(đ„, đ) = đ„đ (4.1)
where đ„đ denotes the đđĄâ entry of đ„, using the binary representationto identify đ with a number in 0, ⊠, 2đ â 1.
Figure 4.6: The LOOKUPđ function takes an inputin 0, 12đ+đ, which we denote by đ„, đ (with đ„ â0, 12đ and đ â 0, 1đ). The output is đ„đ: the đ-thcoordinate of đ„, where we identify đ as a numberin [đ] using the binary representation. In the aboveexample đ„ â 0, 116 and đ â 0, 14. Since đ = 0110is the binary representation of the number 6, theoutput of LOOKUP4(đ„, đ) in this case is đ„6 = 1.
See Fig. 4.6 for an illustration of the LOOKUP function. It turnsout that for every đ, we can compute LOOKUPđ using a NAND-CIRCprogram:
Theorem 4.10 â Lookup function. For every đ > 0, there is a NAND-CIRC program that computes the function LOOKUPđ ⶠ0, 12đ+đ â0, 1. Moreover, the number of lines in this program is at most4 â 2đ.An immediate corollary of Theorem 4.10 is that for every đ > 0,
LOOKUPđ can be computed by a Boolean circuit (with AND, OR andNOT gates) of at most 8 â 2đ gates.
4.3.1 Constructing a NAND-CIRC program for LOOKUPWe prove Theorem 4.10 by induction. For the case đ = 1, LOOKUP1maps (đ„0, đ„1, đ) â 0, 13 to đ„đ. In other words, if đ = 0 then it outputs
170 introduction to theoretical computer science
đ„0 and otherwise it outputs đ„1, which (up to reordering variables) isthe same as the IF function presented in Section 4.1.3, which can becomputed by a 4-line NAND-CIRC program.
As a warm-up for the case of general đ, let us consider the caseof đ = 2. Given input đ„ = (đ„0, đ„1, đ„2, đ„3) for LOOKUP2 and anindex đ = (đ0, đ1), if the most significant bit đ0 of the index is 0 thenLOOKUP2(đ„, đ) will equal đ„0 if đ1 = 0 and equal đ„1 if đ1 = 1. Similarly,if the most significant bit đ0 is 1 then LOOKUP2(đ„, đ) will equal đ„2 ifđ1 = 0 and will equal đ„3 if đ1 = 1. Another way to say this is that wecan write LOOKUP2 as follows:
def LOOKUP2(X[0],X[1],X[2],X[3],i[0],i[1]):if i[0]==1:
return LOOKUP1(X[2],X[3],i[1])else:
return LOOKUP1(X[0],X[1],i[1])
or in other words,
def LOOKUP2(X[0],X[1],X[2],X[3],i[0],i[1]):a = LOOKUP1(X[2],X[3],i[1])b = LOOKUP1(X[0],X[1],i[1])return IF( i[0],a,b)
More generally, as shown in the following lemma, we can computeLOOKUPđ using two invocations of LOOKUPđâ1 and one invocationof IF:Lemma 4.11 â Lookup recursion. For every đ â„ 2, LOOKUPđ(đ„0, ⊠, đ„2đâ1, đ0, ⊠, đđâ1)is equal to
IF (đ0,LOOKUPđâ1(đ„2đâ1 , ⊠, đ„2đâ1, đ1, ⊠, đđâ1),LOOKUPđâ1(đ„0, ⊠, đ„2đâ1â1, đ1, ⊠, đđâ1))(4.2)
Proof. If the most significant bit đ0 of đ is zero, then the index đ isin 0, ⊠, 2đâ1 â 1 and hence we can perform the lookup on theâfirst halfâ of đ„ and the result of LOOKUPđ(đ„, đ) will be the same asđ = LOOKUPđâ1(đ„0, ⊠, đ„2đâ1â1, đ1, ⊠, đđâ1). On the other hand, if thismost significant bit đ0 is equal to 1, then the index is in 2đâ1, ⊠, 2đ â1, in which case the result of LOOKUPđ(đ„, đ) is the same as đ =LOOKUPđâ1(đ„2đâ1 , ⊠, đ„2đâ1, đ1, ⊠, đđâ1). Thus we can computeLOOKUPđ(đ„, đ) by first computing đ and đ and then outputtingIF(đđâ1, đ, đ).
Proof of Theorem 4.10 from Lemma 4.11. Now that we have Lemma 4.11,we can complete the proof of Theorem 4.10. We will prove by induc-tion on đ that there is a NAND-CIRC program of at most 4 â (2đ â 1)
syntactic sugar, and computing every function 171
Figure 4.7: The number of lines in our implementationof the LOOKUP_k function as a function of đ (i.e., thelength of the index). The number of lines in ourimplementation is roughly 3 â 2đ.
lines for LOOKUPđ. For đ = 1 this follows by the four line program forIF weâve seen before. For đ > 1, we use the following pseudocode:
a = LOOKUP_(k-1)(X[0],...,X[2^(k-1)-1],i[1],...,i[k-1])b = LOOKUP_(k-1)(X[2^(k-1)],...,Z[2^(k-1)],i[1],...,i[k-
1])return IF(i[0],b,a)
If we let đż(đ) be the number of lines required for LOOKUPđ, thenthe above pseudo-code shows thatđż(đ) †2đż(đ â 1) + 4 . (4.3)
Since under our induction hypothesis đż(đ â 1) †4(2đâ1 â 1), we getthat đż(đ) †2 â 4(2đâ1 â 1) + 4 = 4(2đ â 1) which is what we wantedto prove. See Fig. 4.7 for a plot of the actual number of lines in ourimplementation of LOOKUPđ.4.4 COMPUTING EVERY FUNCTION
At this point we know the following facts about NAND-CIRC pro-grams (and so equivalently about Boolean circuits and our otherequivalent models):
1. They can compute at least some non trivial functions.
2. Coming up with NAND-CIRC programs for various functions is avery tedious task.
Thus I would not blame the reader if they were not particularlylooking forward to a long sequence of examples of functions that canbe computed by NAND-CIRC programs. However, it turns out we arenot going to need this, as we can show in one fell swoop that NAND-CIRC programs can compute every finite function:
Theorem 4.12 â Universality of NAND. There exists some constant đ > 0such that for every đ, đ > 0 and function đ ⶠ0, 1đ â 0, 1đ,there is a NAND-CIRC program with at most đâ đ2đ lines that com-putes the function đ .
By Theorem 3.19, the models of NAND circuits, NAND-CIRC pro-grams, AON-CIRC programs, and Boolean circuits, are all equivalentto one another, and hence Theorem 4.12 holds for all these models. Inparticular, the following theorem is equivalent to Theorem 4.12:
Theorem 4.13 â Universality of Boolean circuits. There exists someconstant đ > 0 such that for every đ, đ > 0 and function
172 introduction to theoretical computer science
2 In case you are curious, this is the function on inputđ â 0, 14 (which we interpret as a number in [16]),that outputs the đ-th digit of đ in the binary basis.
đ ⶠ0, 1đ â 0, 1đ, there is a Boolean circuit with at mostđ â đ2đ gates that computes the function đ .
Big Idea 5 Every finite function can be computed by a largeenough Boolean circuit.
Improved bounds. Though it will not be of great importance to us, itis possible to improve on the proof of Theorem 4.12 and shave an extrafactor of đ, as well as optimize the constant đ, and so prove that forevery đ > 0, đ â â and sufficiently large đ, if đ ⶠ0, 1đ â 0, 1đthen đ can be computed by a NAND circuit of at most (1 + đ) đâ 2đđgates. The proof of this result is beyond the scope of this book, but wedo discuss how to obtain a bound of the form đ( đâ 2đđ ) in Section 4.4.2;see also the biographical notes.
4.4.1 Proof of NANDâs UniversalityTo prove Theorem 4.12, we need to give a NAND circuit, or equiva-lently a NAND-CIRC program, for every possible function. We willrestrict our attention to the case of Boolean functions (i.e., đ = 1).Exercise 4.9 asks you to extend the proof for all values of đ. A func-tion đč ⶠ0, 1đ â 0, 1 can be specified by a table of its values foreach one of the 2đ inputs. For example, the table below describes oneparticular function đș ⶠ0, 14 â 0, 1:2
Table 4.1: An example of a function đș ⶠ0, 14 â 0, 1.Input (đ„) Output (đș(đ„))0000 10001 10010 00011 00100 10101 00110 00111 11000 01001 01010 01011 01100 11101 11110 11111 1
syntactic sugar, and computing every function 173
For every đ„ â 0, 14, đș(đ„) = LOOKUP4(1100100100001111, đ„), andso the following is NAND-CIRC âpseudocodeâ to compute đș usingsyntactic sugar for the LOOKUP_4 procedure.
G0000 = 1G1000 = 1G0100 = 0...G0111 = 1G1111 = 1Y[0] = LOOKUP_4(G0000,G1000,...,G1111,
X[0],X[1],X[2],X[3])
We can translate this pseudocode into an actual NAND-CIRC pro-gram by adding three lines to define variables zero and one that areinitialized to 0 and 1 respectively, and then replacing a statement suchas Gxxx = 0 with Gxxx = NAND(one,one) and a statement such asGxxx = 1 with Gxxx = NAND(zero,zero). The call to LOOKUP_4 willbe replaced by the NAND-CIRC program that computes LOOKUP4,plugging in the appropriate inputs.
There was nothing about the above reasoning that was particular tothe function đș above. Given every function đč ⶠ0, 1đ â 0, 1, wecan write a NAND-CIRC program that does the following:
1. Initialize 2đ variables of the form F00...0 till F11...1 so that forevery đ§ â 0, 1đ, the variable corresponding to đ§ is assigned thevalue đč(đ§).
2. Compute LOOKUPđ on the 2đ variables initialized in the previ-ous step, with the index variable being the input variables X[0],âŠ,X[2đ â 1 ]. That is, just like in the pseudocode for G above, weuse Y[0] = LOOKUP(F00..00,...,F11..1,X[0],..,x[đ â 1])The total number of lines in the resulting program is 3 + 2đ lines for
initializing the variables plus the 4 â 2đ lines that we pay for computingLOOKUPđ. This completes the proof of Theorem 4.12.
RRemark 4.14 â Result in perspective. While Theo-rem 4.12 seems striking at first, in retrospect, it isperhaps not that surprising that every finite functioncan be computed with a NAND-CIRC program. Afterall, a finite function đč ⶠ0, 1đ â 0, 1đ can berepresented by simply the list of its outputs for eachone of the 2đ input values. So it makes sense that wecould write a NAND-CIRC program of similar sizeto compute it. What is more interesting is that somefunctions, such as addition and multiplication, have
174 introduction to theoretical computer science
3 The constant đ in this theorem is at most 10 and infact can be arbitrarily close to 1, see Section 4.8.
Figure 4.8: We can compute đ ⶠ0, 1đ â 0, 1 oninput đ„ = đđ where đ â 0, 1đ and đ â 0, 1đâđby first computing the 2đâđ long string đ(đ) thatcorresponds to all đâs values on inputs that begin withđ, and then outputting the đ-th coordinate of thisstring.
a much more efficient representation: one that onlyrequires đ(đ2) or even fewer lines.
4.4.2 Improving by a factor of đ (optional)By being a little more careful, we can improve the bound of Theo-rem 4.12 and show that every function đč ⶠ0, 1đ â 0, 1đ can becomputed by a NAND-CIRC program of at most đ(đ2đ/đ) lines. Inother words, we can prove the following improved version:
Theorem 4.15 â Universality of NAND circuits, improved bound. There ex-ists a constant đ > 0 such that for every đ, đ > 0 and functionđ ⶠ0, 1đ â 0, 1đ, there is a NAND-CIRC program with at mostđ â đ2đ/đ lines that computes the function đ . 3
Proof. As before, it is enough to prove the case that đ = 1. Hencewe let đ ⶠ0, 1đ â 0, 1, and our goal is to prove that there existsa NAND-CIRC program of đ(2đ/đ) lines (or equivalently a Booleancircuit of đ(2đ/đ) gates) that computes đ .
We let đ = log(đ â 2 logđ) (the reasoning behind this choice willbecome clear later on). We define the function đ ⶠ0, 1đ â 0, 12đâđas follows: đ(đ) = đ(đ0đâđ)đ(đ0đâđâ11) ⯠đ(đ1đâđ) . (4.4)
In other words, if we use the usual binary representation to identifythe numbers 0, ⊠, 2đâđ â 1 with the strings 0, 1đâđ, then for everyđ â 0, 1đ and đ â 0, 1đâđđ(đ)đ = đ(đđ) . (4.5)
(4.5) means that for every đ„ â 0, 1đ, if we write đ„ = đđ withđ â 0, 1đ and đ â 0, 1đâđ then we can compute đ(đ„) by firstcomputing the string đ = đ(đ) of length 2đâđ, and then computingLOOKUPđâđ(đ , đ) to retrieve the element of đ at the position cor-responding to đ (see Fig. 4.8). The cost to compute the LOOKUPđâđis đ(2đâđ) lines/gates and the cost in NAND-CIRC lines (or Booleangates) to compute đ is at mostđđđ đĄ(đ) + đ(2đâđ) , (4.6)
where đđđ đĄ(đ) is the number of operations (i.e., lines of NAND-CIRCprograms or gates in a circuit) needed to compute đ.
To complete the proof we need to give a bound on đđđ đĄ(đ). Since đis a function mapping 0, 1đ to 0, 12đâđ , we can also think of it as acollection of 2đâđ functions đ0, ⊠, đ2đâđâ1 ⶠ0, 1đ â 0, 1, where
syntactic sugar, and computing every function 175
Figure 4.9: If đ0, ⊠, đđâ1 is a collection of functionseach mapping 0, 1đ to 0, 1 such that at most đof them are distinct then for every đ â 0, 1đ, wecan compute all the values đ0(đ), ⊠, đđâ1(đ) usingat most đ(đ â 2đ + đ) operations by first computingthe distinct functions and then copying the resultingvalues.
đđ(đ„) = đ(đ)đ for every đ â 0, 1đ and đ â [2đâđ]. (That is, đđ(đ) isthe đ-th bit of đ(đ).) Naively, we could use Theorem 4.12 to computeeach đđ in đ(2đ) lines, but then the total cost is đ(2đâđ â 2đ) = đ(2đ)which does not save us anything. However, the crucial observationis that there are only 22đ distinct functions mapping 0, 1đ to 0, 1.For example, if đ17 is an identical function to đ67 that means that ifwe already computed đ17(đ) then we can compute đ67(đ) using onlya constant number of operations: simply copy the same value! Ingeneral, if you have a collection of đ functions đ0, ⊠, đđâ1 mapping0, 1đ to 0, 1, of which at most đ are distinct then for every valueđ â 0, 1đ we can compute the đ values đ0(đ), ⊠, đđâ1(đ) using atmost đ(đ â 2đ + đ) operations (see Fig. 4.9).
In our case, because there are at most 22đ distinct functions map-ping 0, 1đ to 0, 1, we can compute the function đ (and hence by(4.5) also đ) using at mostđ(22đ â 2đ + 2đâđ) (4.7)operations. Now all that is left is to plug into (4.7) our choice of đ =log(đ â 2 logđ). By definition, 2đ = đ â 2 logđ, which means that (4.7)can be boundedđ (2đâ2 logđ â (đ â 2 logđ) + 2đâlog(đâ2 logđ)) †(4.8)
đ ( 2đđ2 â đ + 2đđâ2 logđ ) †đ ( 2đđ + 2đ0.5đ ) = đ ( 2đđ ) (4.9)which is what we wanted to prove. (We used above the fact that đ â2 logđ â„ 0.5 logđ for sufficiently large đ.)
Using the connection between NAND-CIRC programs and Booleancircuits, an immediate corollary of Theorem 4.15 is the followingimprovement to Theorem 4.13:
Theorem 4.16 â Universality of Boolean circuits, improved bound. Thereexists some constant đ > 0 such that for every đ, đ > 0 and func-tion đ ⶠ0, 1đ â 0, 1đ, there is a Boolean circuit with at mostđ â đ2đ/đ gates that computes the function đ .
4.5 COMPUTING EVERY FUNCTION: AN ALTERNATIVE PROOF
Theorem 4.13 is a fundamental result in the theory (and practice!) ofcomputation. In this section we present an alternative proof of thisbasic fact that Boolean circuits can compute every finite function. Thisalternative proof gives a somewhat worse quantitative bound on thenumber of gates but it has the advantage of being simpler, working
176 introduction to theoretical computer science
directly with circuits and avoiding the usage of all the syntactic sugarmachinery. (However, that machinery is useful in its own right, andwill find other applications later on.)
Theorem 4.17 â Universality of Boolean circuits (alternative phrasing). Thereexists some constant đ > 0 such that for every đ, đ > 0 and func-tion đ ⶠ0, 1đ â 0, 1đ, there is a Boolean circuit with at mostđ â đ â đ2đ gates that computes the function đ .
Figure 4.10: Given a function đ ⶠ0, 1đ â 0, 1,we let đ„0, đ„1, ⊠, đ„đâ1 â 0, 1đ be the set ofinputs such that đ(đ„đ) = 1, and note that đ †2đ.We can express đ as the OR of đżđ„đ for đ â [đ] wherethe function đżđŒ ⶠ0, 1đ â 0, 1 (for đŒ â 0, 1đ)is defined as follows: đżđŒ(đ„) = 1 iff đ„ = đŒ. We cancompute the OR of đ values using đ two-input ORgates. Therefore if we have a circuit of size đ(đ) tocompute đżđŒ for every đŒ â 0, 1đ, we can compute đusing a circuit of size đ(đ â đ) = đ(đ â 2đ).
Proof Idea:
The idea of the proof is illustrated in Fig. 4.10. As before, it isenough to focus on the case that đ = 1 (the function đ has a sin-gle output), since we can always extend this to the case of đ > 1by looking at the composition of đ circuits each computing a differ-ent output bit of the function đ . We start by showing that for everyđŒ â 0, 1đ, there is an đ(đ) sized circuit that computes the functionđżđŒ ⶠ0, 1đ â 0, 1 defined as follows: đżđŒ(đ„) = 1 iff đ„ = đŒ (that is,đżđŒ outputs 0 on all inputs except the input đŒ). We can then write anyfunction đ ⶠ0, 1đ â 0, 1 as the OR of at most 2đ functions đżđŒ forthe đŒâs on which đ(đŒ) = 1.âProof of Theorem 4.17. We prove the theorem for the case đ = 1. Theresult can be extended for đ > 1 as before (see also Exercise 4.9). Letđ ⶠ0, 1đ â 0, 1. We will prove that there is an đ(đ â 2đ)-sizedBoolean circuit to compute đ in the following steps:
syntactic sugar, and computing every function 177
Figure 4.11: For every string đŒ â 0, 1đ, there is aBoolean circuit of đ(đ) gates to compute the functionđżđŒ ⶠ0, 1đ â 0, 1 such that đżđŒ(đ„) = 1 if andonly if đ„ = đŒ. The circuit is very simple. Given inputđ„0, ⊠, đ„đâ1 we compute the AND of đ§0, ⊠, đ§đâ1where đ§đ = đ„đ if đŒđ = 1 and đ§đ = NOT(đ„đ) if đŒđ = 0.While formally Boolean circuits only have a gate forcomputing the AND of two inputs, we can implementan AND of đ inputs by composing đ two-inputANDs.
1. We show that for every đŒ â 0, 1đ, there is an đ(đ) sized circuitthat computes the function đżđŒ ⶠ0, 1đ â 0, 1, where đżđŒ(đ„) = 1 iffđ„ = đŒ.
2. We then show that this implies the existence of an đ(đ â 2đ)-sizedcircuit that computes đ , by writing đ(đ„) as the OR of đżđŒ(đ„) for allđŒ â 0, 1đ such that đ(đŒ) = 1.We start with Step 1:CLAIM: For đŒ â 0, 1đ, define đżđŒ ⶠ0, 1đ as follows:đżđŒ(đ„) = â§âšâ©1 đ„ = đŒ0 otherwise
. (4.10)
then there is a Boolean circuit using at most 2đ gates that computes đżđŒ.PROOF OF CLAIM: The proof is illustrated in Fig. 4.11. As an
example, consider the function đż011 ⶠ0, 13 â 0, 1. This functionoutputs 1 on đ„ if and only if đ„0 = 0, đ„1 = 1 and đ„2 = 1, and so we canwrite đż011(đ„) = đ„0 ⧠đ„1 ⧠đ„2, which translates into a Boolean circuitwith one NOT gate and two AND gates. More generally, for everyđŒ â 0, 1đ, we can express đżđŒ(đ„) as (đ„0 = đŒ0)â§(đ„1 = đŒ1)â§âŻâ§(đ„đâ1 =đŒđâ1), where if đŒđ = 0 we replace đ„đ = đŒđ with đ„đ and if đŒđ = 1 wereplace đ„đ = đŒđ by simply đ„đ. This yields a circuit that computes đżđŒusing đ AND gates and at most đ NOT gates, so a total of at most 2đgates.
Now for every function đ ⶠ0, 1đ â 0, 1, we can writeđ(đ„) = đżđ„0(đ„) âš đżđ„1(đ„) ⚠⯠⚠đżđ„đâ1(đ„) (4.11)where đ = đ„0, ⊠, đ„đâ1 is the set of inputs on which đ outputs 1.
(To see this, you can verify that the right-hand side of (4.11) evaluatesto 1 on đ„ â 0, 1đ if and only if đ„ is in the set đ.)
Therefore we can compute đ using a Boolean circuit of at most 2đgates for each of the đ functions đżđ„đ and combine that with at most đOR gates, thus obtaining a circuit of at most 2đ â đ + đ gates. Sinceđ â 0, 1đ, its size đ is at most 2đ and hence the total number ofgates in this circuit is đ(đ â 2đ).
4.6 THE CLASS SIZE(đ )We have seen that every function đ ⶠ0, 1đ â 0, 1đ can be com-puted by a circuit of size đ(đ â 2đ), and some functions (such as ad-dition and multiplication) can be computed by much smaller circuits.We define SIZE(đ ) to be the set of functions that can be computed byNAND circuits of at most đ gates (or equivalently, by NAND-CIRCprograms of at most đ lines). Formally, the definition is as follows:
178 introduction to theoretical computer science
4 The restriction that đ, đ †2đ makes no difference;see Exercise 3.11.
Definition 4.18 â Size class of functions. For every đ, đ â 1, ⊠, 2đ ,we let set SIZEđ,đ(đ ) denotes the set of all functions đ ⶠ0, 1đ â0, 1đ such that đ â SIZE(đ ). 4 We denote by SIZEđ(đ ) the setSIZEđ,1(đ ). For every integer đ â„ 1, we let SIZE(đ ) = âȘđ,đâ€2đ SIZEđ,đ(đ )be the set of all functions đ for which there exists a NAND circuitof at most đ gates that compute đ .Fig. 4.12 depicts the set SIZEđ,1(đ ). Note that SIZEđ,đ(đ ) is a set of
functions, not of programs! (Asking if a program or a circuit is a mem-ber of SIZEđ,đ(đ ) is a category error as in the sense of Fig. 4.13.) Aswe discussed in Section 3.6.2 (and Section 2.6.1), the distinction be-tween programs and functions is absolutely crucial. You should alwaysremember that while a program computes a function, it is not equal toa function. In particular, as weâve seen, there can be more than oneprogram to compute the same function.
Figure 4.12: There are 22đ functions mapping 0, 1đto 0, 1, and an infinite number of circuits with đ bitinputs and a single bit of output. Every circuit com-putes one function, but every function can be com-puted by many circuits. We say that đ â SIZEđ,1(đ )if the smallest circuit that computes đ has đ or fewergates. For example XORđ â SIZEđ,1(4đ). Theo-rem 4.12 shows that every function đ is computableby some circuit of at most đ â 2đ/đ gates, and henceSIZEđ,1(đ â 2đ/đ) corresponds to the set of all func-tions from 0, 1đ to 0, 1.
While we defined SIZE(đ ) with respect to NAND gates, wewould get essentially the same class if we defined it with respect toAND/OR/NOT gates:Lemma 4.19 Let SIZEđŽđđđ,đ (đ ) denote the set of all functions đ ⶠ0, 1đ â0, 1đ that can be computed by an AND/OR/NOT Boolean circuit ofat most đ gates. Then,
SIZEđ,đ(đ /2) â SIZEđŽđđđ,đ (đ ) â SIZEđ,đ(3đ ) (4.12)
Proof. If đ can be computed by a NAND circuit of at most đ /2 gates,then by replacing each NAND with the two gates NOT and AND, wecan obtain an AND/OR/NOT Boolean circuit of at most đ gates thatcomputes đ . On the other hand, if đ can be computed by a Boolean
syntactic sugar, and computing every function 179
Figure 4.13: A âcategory errorâ is a question such asâis a cucumber even or odd?â which does not evenmake sense. In this book one type of category erroryou should watch out for is confusing functions andprograms (i.e., confusing specifications and implemen-tations). If đ¶ is a circuit or program, then asking ifđ¶ â SIZEđ,1(đ ) is a category error, since SIZEđ,1(đ ) isa set of functions and not programs or circuits.
AND/OR/NOT circuit of at most đ gates, then by Theorem 3.12 it canbe computed by a NAND circuit of at most 3đ gates.
The results we have seen in this chapter can be phrased as showingthat ADDđ â SIZE2đ,đ+1(100đ) and MULTđ â SIZE2đ,2đ(10000đlog2 3).Theorem 4.12 shows that for some constant đ, SIZEđ,đ(đđ2đ) is equalto the set of all functions from 0, 1đ to 0, 1đ.
RRemark 4.20 â Finite vs infinite functions. A NAND-CIRC program đ can only compute a function witha certain number đ of inputs and a certain numberđ of outputs. Hence, for example, there is no singleNAND-CIRC program that can compute the incre-ment function INC ⶠ0, 1â â 0, 1â that maps astring đ„ (which we identify with a number via thebinary representation) to the string that representsđ„ + 1. Rather for every đ > 0, there is a NAND-CIRCprogram đđ that computes the restriction INCđ ofthe function INC to inputs of length đ. Since it can beshown that for every đ > 0 such a program đđ existsof length at most 10đ, INCđ â SIZE(10đ) for everyđ > 0.If đ ⶠâ â â and đč ⶠ0, 1â â 0, 1â, we willwrite đč â SIZEâ(đ (đ)) (or sometimes slightly abusenotation and write simply đč â SIZE(đ (đ))) to in-dicate that for every đ the restriction đčđ of đč toinputs in 0, 1đ is in SIZE(đ (đ)). Hence we can writeINC â SIZEâ(10đ). We will come back to this issue offinite vs infinite functions later in this course.
Solved Exercise 4.1 â đđŒđđž closed under complement.. In this exercise weprove a certain âclosure propertyâ of the class SIZEđ(đ ). That is, weshow that if đ is in this class then (up to some small additive term) sois the complement of đ , which is the function đ(đ„) = 1 â đ(đ„).
Prove that there is a constant đ such that for every đ ⶠ0, 1đ â0, 1 and đ â â, if đ â SIZEđ(đ ) then 1 â đ â SIZEđ(đ + đ).
Solution:
If đ â SIZE(đ ) then there is an đ -line NAND-CIRC programđ that computes đ . We can rename the variable Y[0] in đ to avariable temp and add the line
Y[0] = NAND(temp,temp)
at the very end to obtain a program đ âČ that computes 1 â đ .
180 introduction to theoretical computer science
Chapter Recap
âą We can define the notion of computing a functionvia a simplified âprogramming languageâ, wherecomputing a function đč in đ steps would corre-spond to having a đ -line NAND-CIRC programthat computes đč .
âą While the NAND-CIRC programming only has oneoperation, other operations such as functions andconditional execution can be implemented using it.
âą Every function đ ⶠ0, 1đ â 0, 1đ can be com-puted by a circuit of at most đ(đ2đ) gates (and infact at most đ(đ2đ/đ) gates).
âą Sometimes (or maybe always?) we can translate anefficient algorithm to compute đ into a circuit thatcomputes đ with a number of gates comparable tothe number of steps in this algorithm.
4.7 EXERCISES
Exercise 4.1 â Pairing. This exercise asks you to give a one-to-one mapfrom â2 to â. This can be useful to implement two-dimensional arraysas âsyntactic sugarâ in programming languages that only have one-dimensional array.
1. Prove that the map đč(đ„, đŠ) = 2đ„3đŠ is a one-to-one map from â2 toâ.
2. Show that there is a one-to-one map đč ⶠâ2 â â such that for everyđ„, đŠ, đč(đ„, đŠ) †100 â maxđ„, đŠ2 + 100.3. For every đ, show that there is a one-to-one map đč ⶠâđ â â such
that for every đ„0, ⊠, đ„đâ1 â â, đč(đ„0, ⊠, đ„đâ1) †100 â (đ„0 + đ„1 + ⊠+đ„đâ1 + 100đ)đ.
Exercise 4.2 â Computing MUX. Prove that the NAND-CIRC program be-low computes the function MUX (or LOOKUP1) where MUX(đ, đ, đ)equals đ if đ = 0 and equals đ if đ = 1:t = NAND(X[2],X[2])u = NAND(X[0],t)v = NAND(X[1],X[2])Y[0] = NAND(u,v)
Exercise 4.3 â At least two / Majority. Give a NAND-CIRC program of atmost 6 lines to compute the function MAJ ⶠ0, 13 â 0, 1 whereMAJ(đ, đ, đ) = 1 iff đ + đ + đ â„ 2.
syntactic sugar, and computing every function 181
5 You can start by transforming đ into a NAND-CIRC-PROC program that uses procedure statements, andthen use the code of Fig. 4.3 to transform the latterinto a âsugar freeâ NAND-CIRC program.
6 Use a âcascadeâ of adding the bits one after theother, starting with the least significant digit, just likein the elementary-school algorithm.
Exercise 4.4 â Conditional statements. In this exercise we will explore The-orem 4.6: transforming NAND-CIRC-IF programs that use code suchas if .. then .. else .. to standard NAND-CIRC programs.
1. Give a âproof by codeâ of Theorem 4.6: a program in a program-ming language of your choice that transforms a NAND-CIRC-IFprogram đ into a âsugar freeâ NAND-CIRC program đ âČ that com-putes the same function. See footnote for hint.5
2. Prove the following statement, which is the heart of Theorem 4.6:suppose that there exists an đ -line NAND-CIRC program to com-pute đ ⶠ0, 1đ â 0, 1 and an đ âČ-line NAND-CIRC programto compute đ ⶠ0, 1đ â 0, 1. Prove that there exist a NAND-CIRC program of at most đ + đ âČ + 10 lines to compute the func-tion â ⶠ0, 1đ+1 â 0, 1 where â(đ„0, ⊠, đ„đâ1, đ„đ) equalsđ(đ„0, ⊠, đ„đâ1) if đ„đ = 0 and equals đ(đ„0, ⊠, đ„đâ1) otherwise.(All programs in this item are standard âsugar-freeâ NAND-CIRCprograms.)
Exercise 4.5 â Half and full adders. 1. A half adder is the function HA â¶0, 12 â¶â 0, 12 that corresponds to adding two binary bits. Thatis, for every đ, đ â 0, 1, HA(đ, đ) = (đ, đ) where 2đ + đ = đ + đ.Prove that there is a NAND circuit of at most five NAND gates thatcomputes HA.
2. A full adder is the function FA ⶠ0, 13 â 0, 12 that takes intwo bits and a âcarryâ bit and outputs their sum. That is, for everyđ, đ, đ â 0, 1, FA(đ, đ, đ) = (đ, đ) such that 2đ + đ = đ + đ + đ.Prove that there is a NAND circuit of at most nine NAND gates thatcomputes FA.
3. Prove that if there is a NAND circuit of đ gates that computes FA,then there is a circuit of đđ gates that computes ADDđ where (asin Theorem 4.7) ADDđ ⶠ0, 12đ â 0, 1đ+1 is the function thatoutputs the addition of two input đ-bit numbers. See footnote forhint.6
4. Show that for every đ there is a NAND-CIRC program to computeADDđ with at most 9đ lines.
Exercise 4.6 â Addition. Write a program using your favorite program-ming language that on input of an integer đ, outputs a NAND-CIRCprogram that computes ADDđ. Can you ensure that the program itoutputs for ADDđ has fewer than 10đ lines?
182 introduction to theoretical computer science
7 Hint: Use Karatsubaâs algorithm.
Exercise 4.7 â Multiplication. Write a program using your favorite pro-gramming language that on input of an integer đ, outputs a NAND-CIRC program that computes MULTđ. Can you ensure that the pro-gram it outputs for MULTđ has fewer than 1000 â đ2 lines?
Exercise 4.8 â Efficient multiplication (challenge). Write a program usingyour favorite programming language that on input of an integer đ,outputs a NAND-CIRC program that computes MULTđ and has atmost 10000đ1.9 lines.7 What is the smallest number of lines you canuse to multiply two 2048 bit numbers?
Exercise 4.9 â Multibit function. In the text Theorem 4.12 is only provenfor the case đ = 1. In this exercise you will extend the proof for everyđ.
Prove that
1. If there is an đ -line NAND-CIRC program to computeđ ⶠ0, 1đ â 0, 1 and an đ âČ-line NAND-CIRC programto compute đ âČ â¶ 0, 1đ â 0, 1 then there is an đ + đ âČ-lineprogram to compute the function đ ⶠ0, 1đ â 0, 12 such thatđ(đ„) = (đ(đ„), đ âČ(đ„)).
2. For every function đ ⶠ0, 1đ â 0, 1đ, there is a NAND-CIRCprogram of at most 10đ â 2đ lines that computes đ . (You can use theđ = 1 case of Theorem 4.12, as well as Item 1.)
Exercise 4.10 â Simplifying using syntactic sugar. Let đ be the followingNAND-CIRC program:
Temp[0] = NAND(X[0],X[0])Temp[1] = NAND(X[1],X[1])Temp[2] = NAND(Temp[0],Temp[1])Temp[3] = NAND(X[2],X[2])Temp[4] = NAND(X[3],X[3])Temp[5] = NAND(Temp[3],Temp[4])Temp[6] = NAND(Temp[2],Temp[2])Temp[7] = NAND(Temp[5],Temp[5])Y[0] = NAND(Temp[6],Temp[7])
1. Write a program đ âČ with at most three lines of code that uses bothNAND as well as the syntactic sugar OR that computes the same func-tion as đ .
syntactic sugar, and computing every function 183
8 You can use the fact that (đ+đ)+đ mod 2 = đ+đ+đmod 2. In particular it means that if you have thelines d = XOR(a,b) and e = XOR(d,c) then e gets thesum modulo 2 of the variable a, b and c.
2. Draw a circuit that computes the same function as đ and uses onlyAND and NOT gates.
In the following exercises you are asked to compare the power ofpairs of programming languages. By âcomparing the powerâ of twoprogramming languages đ and đ we mean determining the relationbetween the set of functions that are computable using programs in đand đ respectively. That is, to answer such a question you need to doboth of the following:
1. Either prove that for every program đ in đ there is a program đ âČin đ that computes the same function as đ , or give an example fora function that is computable by an đ-program but not computableby a đ -program.
and
2. Either prove that for every program đ in đ there is a program đ âČin đ that computes the same function as đ , or give an example for afunction that is computable by a đ -program but not computable byan đ-program.
When you give an example as above of a function that is com-putable in one programming language but not the other, you needto prove that the function you showed is (1) computable in the firstprogramming language and (2) not computable in the second program-ming language.Exercise 4.11 â Compare IF and NAND. Let IF-CIRC be the programminglanguage where we have the following operations foo = 0, foo = 1,foo = IF(cond,yes,no) (that is, we can use the constants 0 and 1,and the IF ⶠ0, 13 â 0, 1 function such that IF(đ, đ, đ) equals đ ifđ = 1 and equals đ if đ = 0). Compare the power of the NAND-CIRCprogramming language and the IF-CIRC programming language.
Exercise 4.12 â Compare XOR and NAND. Let XOR-CIRC be the pro-gramming language where we have the following operations foo= XOR(bar,blah), foo = 1 and bar = 0 (that is, we can use theconstants 0, 1 and the XOR function that maps đ, đ â 0, 12 to đ + đmod 2). Compare the power of the NAND-CIRC programminglanguage and the XOR-CIRC programming language. See footnote forhint.8
Exercise 4.13 â Circuits for majority. Prove that there is some constant đsuch that for every đ > 1, MAJđ â SIZE(đđ) where MAJđ ⶠ0, 1đ â
184 introduction to theoretical computer science
9 One approach to solve this is using recursion and theso-called Master Theorem.
0, 1 is the majority function on đ input bits. That is MAJđ(đ„) = 1 iffâđâ1đ=0 đ„đ > đ/2. See footnote for hint.9
Exercise 4.14 â Circuits for threshold. Prove that there is some constant đsuch that for every đ > 1, and integers đ0, ⊠, đđâ1, đ â â2đ, â2đ +1, ⊠, â1, 0, +1, ⊠, 2đ, there is a NAND circuit with at most đđ gatesthat computes the threshold function đđ0,âŠ,đđâ1,đ ⶠ0, 1đ â 0, 1 thaton input đ„ â 0, 1đ outputs 1 if and only if âđâ1đ=0 đđđ„đ > đ.
4.8 BIBLIOGRAPHICAL NOTES
See Juknaâs and Wegenerâs books [Juk12; Weg87] for much moreextensive discussion on circuits. Shannon showed that every Booleanfunction can be computed by a circuit of exponential size [Sha38]. Theimproved bound of đ â 2đ/đ (with the optimal value of đ for manybases) is due to Lupanov [Lup58]. An exposition of this for the caseof NAND (where đ = 1) is given in Chapter 4 of his book [Lup84].(Thanks to Sasha Golovnev for tracking down this reference!)
The concept of âsyntactic sugarâ is also known as âmacrosâ orâmeta-programmingâ and is sometimes implemented via a prepro-cessor or macro language in a programming language or a text editor.One modern example is the Babel JavaScript syntax transformer, thatconverts JavaScript programs written using the latest features intoa format that older Browsers can accept. It even has a plug-in ar-chitecture, that allows users to add their own syntactic sugar to thelanguage.
5Code as data, data as code
âThe term code script is, of course, too narrow. The chromosomal structuresare at the same time instrumental in bringing about the development theyforeshadow. They are law-code and executive power - or, to use another simile,they are architectâs plan and builderâs craft - in one.â , Erwin Schrödinger,1944.
âA mathematician would hardly call a correspondence between the set of 64triples of four units and a set of twenty other units,âuniversalâ, while suchcorrespondence is, probably, the most fundamental general feature of life onEarthâ, Misha Gromov, 2013
A program is simply a sequence of symbols, each of which can beencoded as a string of 0âs and 1âs using (for example) the ASCII stan-dard. Therefore we can represent every NAND-CIRC program (andhence also every Boolean circuit) as a binary string. This statementseems obvious but it is actually quite profound. It means that we cantreat circuits or NAND-CIRC programs both as instructions to car-rying computation and also as data that could potentially be used asinputs to other computations.
Big Idea 6 A program is a piece of text, and so it can be fed as inputto other programs.
This correspondence between code and data is one of the most fun-damental aspects of computing. It underlies the notion of generalpurpose computers, that are not pre-wired to compute only one task,and also forms the basis of our hope for obtaining general artificialintelligence. This concept finds immense use in all areas of comput-ing, from scripting languages to machine learning, but it is fair to saythat we havenât yet fully mastered it. Many security exploits involvecases such as âbuffer overflowsâ when attackers manage to inject codewhere the system expected only âpassiveâ data (see Fig. 5.1). The re-lation between code and data reaches beyond the realm of electronic
Compiled on 8.26.2020 18:10
Learning Objectives:âą See one of the most important concepts in
computing: duality between code and data.âą Build up comfort in moving between
different representations of programs.âą Follow the construction of a âuniversal circuit
evaluatorâ that can evaluate other circuitsgiven their representation.
âą See major result that complements the resultof the last chapter: some functions require anexponential number of gates to compute.
âą Discussion of Physical extendedChurch-Turing thesis stating that Booleancircuits capture all feasible computation inthe physical world, and its physical andphilosophical implications.
186 introduction to theoretical computer science
Figure 5.1: As illustrated in this xkcd cartoon, manyexploits, including buffer overflow, SQL injections,and more, utilize the blurry line between âactiveprogramsâ and âstatic stringsâ.
computers. For example, DNA can be thought of as both a programand data (in the words of Schrödinger, who wrote before the discov-ery of DNAâs structure a book that inspired Watson and Crick, DNA isboth âarchitectâs plan and builderâs craftâ).
This chapter: A non-mathy overviewIn this chapter, we will begin to explore some of the manyapplications of the correspondence between code and data.We start by using the representation of programs/circuitsas strings to count the number of programs/circuits up to acertain size, and use that to obtain a counterpart to the resultwe proved in Chapter 4. There we proved that every functioncan be computed by a circuit, but that circuit could be expo-nentially large (see Theorem 4.16 for the precise bound) Inthis chapter we will prove that there are some functions forwhich we cannot do better: the smallest circuit that computesthem is exponentially large.We will also use the notion of representing programs/cir-cuits as strings to show the existence of a âuniversal circuitâ- a circuit that can evaluate other circuits. In programminglanguages, this is known as a âmeta circular evaluatorâ - aprogram in a certain programming language that can eval-uate other programs in the same language. These resultsdo have an important restriction: the universal circuit willhave to be of bigger size than the circuits it evaluates. We willshow how to get around this restriction in Chapter 7 wherewe introduce loops and Turing Machines.See Fig. 5.2 for an overview of the results of this chapter.
Figure 5.2: Overview of the results in this chapter.We use the representation of programs/circuits asstrings to derive two main results. First we showthe existence of a universal program/circuit, andin fact (with more work) the existence of such aprogram/circuit whose size is at most polynomial inthe size of the program/circuit it evaluates. We thenuse the string representation to count the numberof programs/circuits of a given size, and use that toestablish that some functions require an exponentialnumber of lines/gates to compute.
code as data, data as code 187
5.1 REPRESENTING PROGRAMS AS STRINGS
We can represent programs or circuits as strings in a myriad of ways.For example, since Boolean circuits are labeled directed acyclic graphs,we can use the adjacency matrix or adjacency list representations forthem. However, since the code of a program is ultimately just a se-quence of letters and symbols, arguably the conceptually simplestrepresentation of a program is as such a sequence. For example, thefollowing NAND-CIRC program đtemp_0 = NAND(X[0],X[1])temp_1 = NAND(X[0],temp_0)temp_2 = NAND(X[1],temp_0)Y[0] = NAND(temp_1,temp_2)
is simply a string of 107 symbols which include lower and uppercase letters, digits, the underscore character _ and equality sign =,punctuation marks such as â(â,â)â,â,â, spaces, and ânew lineâ mark-ers (often denoted as â\nâ or ââ”â). Each such symbol can be encodedas a string of 7 bits using the ASCII encoding, and hence the programđ can be encoded as a string of length 7 â 107 = 749 bits.
Nothing in the above discussion was specific to the program đ , andhence we can use the same reasoning to prove that every NAND-CIRCprogram can be represented as a string in 0, 1â. In fact, we can do abit better. Since the names of the working variables of a NAND-CIRCprogram do not affect its functionality, we can always transform a pro-gram to have the form of đ âČ where all variables apart from the inputsand outputs have the form temp_0, temp_1, temp_2, etc.. Moreover,if the program has đ lines, then we will never need to use an indexlarger than 3đ (since each line involves at most three variables), andsimilarly the indices of the input and output variables will all be atmost 3đ . Since a number between 0 and 3đ can be expressed usingat most âlog10(3đ + 1)â = đ(log đ ) digits, each line in the program(which has the form foo = NAND(bar,blah)), can be representedusing đ(1) + đ(log đ ) = đ(log đ ) symbols, each of which can be rep-resented by 7 bits. Hence an đ line program can be represented as astring of đ(đ log đ ) bits, resulting in the following theorem:
Theorem 5.1 â Representing programs as strings. There is a constant đsuch that for đ â SIZE(đ ), there exists a program đ computing đwhose string representation has length at most đđ log đ .
P
188 introduction to theoretical computer science
1 The implicit constant in the đ(â ) notation is smallerthan 10. That is, for all sufficiently large đ , |SIZE(đ )| <210đ log đ , see Remark 5.4. As discussed in Section 1.7,we use the bound 10 simply because it is a roundnumber.
2 âAstronomicalâ here is an understatement: there aremuch fewer than 2210 stars, or even particles, in theobservable universe.
We omit the formal proof of Theorem 5.1 but pleasemake sure that you understand why it follows fromthe reasoning above.
5.2 COUNTING PROGRAMS, AND LOWER BOUNDS ON THE SIZEOF NAND-CIRC PROGRAMS
One consequence of the representation of programs as strings is thatthe number of programs of certain length is bounded by the numberof strings that represent them. This has consequences for the setsSIZE(đ ) that we defined in Section 4.6.
Theorem 5.2 â Counting programs. For every đ â â,|SIZE(đ )| †2đ(đ log đ ). (5.1)
That is, there are at most 2đ(đ log đ ) functions computed by NAND-CIRC programs of at most đ lines. 1
Proof. We will show a one-to-one map đž from SIZE(đ ) to the set ofstrings of length đđ log đ for some constant đ. This will conclude theproof, since it implies that |SIZE(đ )| is smaller than the size of the setof all strings of length at most â, which equals 1+2+4+âŻ+2â = 2â+1â1by the formula for sums of geometric progressions.
The map đž will simply map đ to the representation of the programcomputing đ . Specifically, we let đž(đ) be the representation of theprogram đ computing đ given by Theorem 5.1. This representationhas size at most đđ log đ , and moreover the map đž is one to one, sinceif đ â đ âČ then every two programs computing đ and đ âČ respectivelymust have different representations.
A function mapping 0, 12 to 0, 1 can be identified with thetable of its four values on the inputs 00, 01, 10, 11. A function mapping0, 13 to 0, 1 can be identified with the table of its eight values onthe inputs 000, 001, 010, 011, 100, 101, 110, 111. More generally, everyfunction đč ⶠ0, 1đ â 0, 1 can be identified with the table of its 2đvalues on the inputs 0, 1đ. Hence the number of functions mapping0, 1đ to 0, 1 is equal to the number of such tables which (sincewe can choose either 0 or 1 for every row) is exactly 22đ . Note thatthis is double exponential in đ, and hence even for small values of đ(e.g., đ = 10) the number of functions from 0, 1đ to 0, 1 is trulyastronomical.2 This has the following important corollary:
code as data, data as code 189
3 The constant đż is at least 0.1 and in fact, can be im-proved to be arbitrarily close to 1/2, see Exercise 5.7.
Theorem 5.3 â Counting argument lower bound. There is a constantđż > 0, such that for every sufficiently large đ, there is a functionđ ⶠ0, 1đ â 0, 1 such that đ â SIZE ( đż2đđ ). That is, the shortestNAND-CIRC program to compute đ requires more than đż â 2đ/đlines. 3
Proof. The proof is simple. If we let đ be the constant such that|SIZE(đ )| †2đđ log đ and đż = 1/đ, then setting đ = đż2đ/đ we see that|SIZE( đż2đđ )| †2đ đż2đđ log đ < 2đđż2đ = 22đ (5.2)
using the fact that since đ < 2đ, log đ < đ and đż = 1/đ. But since|SIZE(đ )| is smaller than the total number of functions mapping đ bitsto 1 bit, there must be at least one such function not in SIZE(đ ), whichis what we needed to prove.
We have seen before that every function mapping 0, 1đ to 0, 1can be computed by an đ(2đ/đ) line program. Theorem 5.3 showsthat this is tight in the sense that some functions do require such anastronomical number of lines to compute.
Big Idea 7 Some functions đ ⶠ0, 1đ â 0, 1 cannot be computedby a Boolean circuit using fewer than exponential (in đ) number ofgates.
In fact, as we explore in the exercises, this is the case for most func-tions. Hence functions that can be computed in a small number oflines (such as addition, multiplication, finding short paths in graphs,or even the EVAL function) are the exception, rather than the rule.
RRemark 5.4 â More efficient representation (advanced,optional). The ASCII representation is not the shortestrepresentation for NAND-CIRC programs. NAND-CIRC programs are equivalent to circuits with NANDgates, which means that a NAND-CIRC program of đ lines, đ inputs, and đ outputs can be represented bya labeled directed graph of đ + đ vertices, of which đhave in-degree zero, and the đ others have in-degreeat most two. Using the adjacency matrix represen-tation for such graphs, we can reduce the implicitconstant in Theorem 5.2 to be arbitrarily close to 5, seeExercise 5.6.
190 introduction to theoretical computer science
Figure 5.4: We prove Theorem 5.5 by coming upwith a list đ0, ⊠, đ2đ of functions such that đ0 is theall zero function, đ2đ is a function (obtained fromTheorem 5.3) outside of SIZE(0.1 â 2đ/đ) and suchthat đđâ1 and đđ differ by one another on at most oneinput. We can show that for every đ, the number ofgates to compute đđ is at most 10đ larger than thenumber of gates to compute đđâ1 and so if we let đbe the smallest number such that đđ â SIZE(đ ), thenđđ â SIZE(đ + 10đ).
5.2.1 Size hierarchy theorem (optional)By Theorem 4.15 the class SIZEđ(10 â 2đ/đ) contains all functionsfrom 0, 1đ to 0, 1, while by Theorem 5.3, there is some functionđ ⶠ0, 1đ â 0, 1 that is not contained in SIZEđ(0.1 â 2đ/đ). In otherwords, for every sufficiently large đ,
SIZEđ (0.1 2đđ ) â SIZEđ (10 2đđ ) . (5.3)
It turns out that we can use Theorem 5.3 to show a more general re-sult: whenever we increase our âbudgetâ of gates we can computenew functions.
Theorem 5.5 â Size Hierarchy Theorem. For every sufficiently large đand 10đ < đ < 0.1 â 2đ/đ,
SIZEđ(đ ) â SIZEđ(đ + 10đ) . (5.4)
Proof Idea:
To prove the theorem we need to find a function đ ⶠ0, 1đ â 0, 1such that đ can be computed by a circuit of đ + 10đ gates but it cannotbe computed by a circuit of đ gates. We will do so by coming up witha sequence of functions đ0, đ1, đ2, ⊠, đđ with the following properties:(1) đ0 can be computed by a circuit of at most 10đ gates, (2) đđ cannotbe computed by a circuit of 0.1 â 2đ/đ gates, and (3) for every đ â0, ⊠, đ, if đđ can be computed by a circuit of size đ , then đđ+1 can becomputed by a circuit of size at most đ +10đ. Together these propertiesimply that if we let đ be the smallest number such that đđ â SIZEđ(đ ),then since đđâ1 â SIZE(đ ) it must hold that đđ â SIZE(đ + 10đ) which iswhat we need to prove. See Fig. 5.4 for an illustration.âProof of Theorem 5.5. Let đâ ⶠ0, 1đ â 0, 1 be the function(whose existence we are guaranteed by Theorem 5.3) such thatđâ â SIZEđ(0.1 â 2đ/đ). We define the functions đ0, đ1, ⊠, đ2đ map-ping 0, 1đ to 0, 1 as follows. For every đ„ â 0, 1đ, if đđđ„(đ„) â0, 1, ⊠, 2đ â 1 is đ„âs order in the lexicographical order then
đđ(đ„) = â§âšâ©đâ(đ„) đđđ„(đ„) < đ0 otherwise. (5.5)
The function đ0 is simply the constant zero function, while thefunction đ2đ is equal to đâ. Moreover, for every đ â [2đ], the functionsđđ and đđ+1 differ on at most one input (i.e., the input đ„ â 0, 1đ suchthat đđđ„(đ„) = đ). Let 10đ < đ < 0.1 â 2đ/đ, and let đ be the first indexsuch that đđ â SIZEđ(đ ). Since đ2đ = đâ â SIZE(0.1 â 2đ/đ) there
code as data, data as code 191
must exist such an index đ, and moreover đ > 0 since the constant zerofunction is a member of SIZEđ(10đ).
By our choice of đ, đđâ1 is a member of SIZEđ(đ ). To complete theproof, we need to show that đđ â SIZEđ(đ + 10đ). Let đ„â be the stringsuch that đđđ„(đ„â) = đ đ â 0, 1 is the value of đâ(đ„â). Then we candefine đđ also as follows
đđ(đ„) = â§âšâ©đ đ„ = đ„âđđ(đ„) đ„ â đ„â (5.6)
or in other wordsđđ(đ„) = đđâ1(đ„) ⧠EQUAL(đ„â, đ„) âš đ ⧠EQUAL(đ„â, đ„) (5.7)
where EQUAL ⶠ0, 12đ â 0, 1 is the function that maps đ„, đ„âČ â0, 1đ to 1 if they are equal and to 0 otherwise. Since (by our choiceof đ), đđâ1 can be computed using at most đ gates and (as can be easilyverified) that EQUAL â SIZEđ(9đ), we can compute đđ using at mostđ + 9đ + đ(1) †đ + 10đ gates which is what we wanted to prove.
Figure 5.5: An illustration of some of what we knowabout the size complexity classes (not to scale!). Thisfigure depicts classes of the form SIZEđ,đ(đ ) but thestate of affairs for other size complexity classes suchas SIZEđ,1(đ ) is similar. We know by Theorem 4.12(with the improvement of Section 4.4.2) that allfunctions mapping đ bits to đ bits can be computedby a circuit of size đ â 2đ for đ †10, while on theother hand the counting lower bound (Theorem 5.3,see also Exercise 5.4) shows that some such functionswill require 0.1 â 2đ, and the size hierarchy theorem(Theorem 5.5) shows the existence of functions inSIZE(đ) ⧔ SIZE(đ ) whenever đ = đ(đ), see alsoExercise 5.5. We also consider some specific examples:addition of two đ/2 bit numbers can be done in đ(đ)lines, while we donât know of such a program formultiplying two đ bit numbers, though we do knowit can be done in đ(đ2) and in fact even better size.In the above, FACTORđ corresponds to the inverseproblem of multiplying- finding the prime factorizationof a given number. At the moment we do not knowof any circuit a polynomial (or even sub-exponential)number of lines that can compute FACTORđ.R
Remark 5.6 â Explicit functions. While the size hierar-chy theorem guarantees that there exists some functionthat can be computed using, for example, đ2 gates, butnot using 100đ gates, we do not know of any explicitexample of such a function. While we suspect thatinteger multiplication is such an example, we do nothave any proof that this is the case.
192 introduction to theoretical computer science
5.3 THE TUPLES REPRESENTATION
ASCII is a fine presentation of programs, but for some applicationsit is useful to have a more concrete representation of NAND-CIRCprograms. In this section we describe a particular choice, that willbe convenient for us later on. A NAND-CIRC program is simply asequence of lines of the form
blah = NAND(baz,boo)
There is of course nothing special about the particular names weuse for variables. Although they would be harder to read, we couldwrite all our programs using only working variables such as temp_0,temp_1 etc. Therefore, our representation for NAND-CIRC programsignores the actual names of the variables, and just associate a numberwith each variable. We encode a line of the program as a triple ofnumbers. If the line has the form foo = NAND(bar,blah) then weencode it with the triple (đ, đ, đ) where đ is the number correspondingto the variable foo and đ and đ are the numbers corresponding to barand blah respectively.
More concretely, we will associate every variable with a numberin the set [đĄ] = 0, 1, ⊠, đĄ â 1. The first đ numbers 0, ⊠, đ â 1correspond to the input variables, the last đ numbers đĄ â đ, ⊠, đĄ â 1correspond to the output variables, and the intermediate numbersđ, ⊠, đĄ â đ â 1 correspond to the remaining âworkspaceâ variables.Formally, we define our representation as follows:
Definition 5.7 â List of tuples representation. Let đ be a NAND-CIRCprogram of đ inputs, đ outputs, and đ lines, and let đĄ be the num-ber of distinct variables used by đ . The list of tuples representationof đ is the triple (đ, đ, đż) where đż is a list of triples of the form(đ, đ, đ) for đ, đ, đ â [đĄ].
We assign a number for variable of đ as follows:
âą For every đ â [đ], the variable X[đ] is assigned the number đ.âą For every đ â [đ], the variable Y[đ] is assigned the numberđĄ â đ + đ.âą Every other variable is assigned a number in đ, đ+1, ⊠, đĄâđâ1 in the order in which the variable appears in the program đ .
The list of tuples representation is our default choice for represent-ing NAND-CIRC programs. Since âlist of tuples representationâ is abit of a mouthful, we will often call it simply âthe representationâ fora program đ . Sometimes, when the number đ of inputs and number
code as data, data as code 193
4 If youâre curious what these few lines are, see ourGitHub repository.
đ of outputs are known from the context, we will simply represent aprogram as the list đż instead of the triple (đ, đ, đż).
Example 5.8 â Representing the XOR program. Our favorite NAND-CIRC program, the program
u = NAND(X[0],X[1])v = NAND(X[0],u)w = NAND(X[1],u)Y[0] = NAND(v,w)
computing the XOR function is represented as the tuple (2, 1, đż)where đż = ((2, 0, 1), (3, 0, 2), (4, 1, 2), (5, 3, 4)). That is, the variablesX[0] and X[1] are given the indices 0 and 1 respectively, the vari-ables u,v,w are given the indices 2, 3, 4 respectively, and the variableY[0] is given the index 5.Transforming a NAND-CIRC program from its representation as
code to the representation as a list of tuples is a fairly straightforwardprogramming exercise, and in particular can be done in a few lines ofPython.4 The list-of-tuples representation loses information such as theparticular names we used for the variables, but this is OK since thesenames do not make a difference to the functionality of the program.
5.3.1 From tuples to stringsIf đ is a program of size đ , then the number đĄ of variables is at most 3đ (as every line touches at most three variables). Hence we can encodeevery variable index in [đĄ] as a string of length â = âlog(3đ )â, by addingleading zeroes as needed. Since this is a fixed-length encoding, it isprefix free, and so we can encode the list đż of đ triples (correspondingto the encoding of the đ lines of the program) as simply the string oflength 3âđ obtained by concatenating all of these encodings.
We define đ(đ ) to be the length of the string representing the list đżcorresponding to a size đ program. By the above we see thatđ(đ ) = 3đ âlog(3đ )â . (5.8)
We can represent đ = (đ, đ, đż) as a string by prepending a prefixfree representation of đ and đ to the list đż. Since đ, đ †3đ (a pro-gram must touch at least once all its input and output variables), thoseprefix free representations can be encoded using strings of lengthđ(log đ ). In particular, every program đ of at most đ lines can be rep-resented by a string of length đ(đ log đ ). Similarly, every circuit đ¶ ofat most đ gates can be represented by a string of length đ(đ log đ ) (forexample by translating đ¶ to the equivalent program đ ).
194 introduction to theoretical computer science
5.4 A NAND-CIRC INTERPRETER IN NAND-CIRC
Since we can represent programs as strings, we can also think of aprogram as an input to a function. In particular, for every naturalnumber đ , đ, đ > 0 we define the function EVALđ ,đ,đ ⶠ0, 1đ(đ )+đ â0, 1đ as follows:
EVALđ ,đ,đ(đđ„) = â§âšâ©đ(đ„) đ â 0, 1đ(đ ) represents a size-đ program đ with đ inputs and đ outputs0đ otherwise(5.9)
where đ(đ ) is defined as in (5.8) and we use the concrete representa-tion scheme described in Section 5.1.
That is, EVALđ ,đ,đ takes as input the concatenation of two strings:a string đ â 0, 1đ(đ ) and a string đ„ â 0, 1đ. If đ is a string thatrepresents a list of triples đż such that (đ, đ, đż) is a list-of-tuples rep-resentation of a size-đ NAND-CIRC program đ , then EVALđ ,đ,đ(đđ„)is equal to the evaluation đ(đ„) of the program đ on the input đ„. Oth-erwise, EVALđ ,đ,đ(đđ„) equals 0đ (this case is not very important: youcan simply think of 0đ as some âjunk valueâ that indicates an error).
Take-away points. The fine details of EVALđ ,đ,đâs definition are notvery crucial. Rather, what you need to remember about EVALđ ,đ,đ isthat:
âą EVALđ ,đ,đ is a finite function taking a string of fixed length as inputand outputting a string of fixed length as output.
âą EVALđ ,đ,đ is a single function, such that computing EVALđ ,đ,đallows to evaluate arbitrary NAND-CIRC programs of a certainlength on arbitrary inputs of the appropriate length.
âą EVALđ ,đ,đ is a function, not a program (recall the discussion in Sec-tion 3.6.2). That is, EVALđ ,đ,đ is a specification of what output is as-sociated with what input. The existence of a program that computesEVALđ ,đ,đ (i.e., an implementation for EVALđ ,đ,đ) is a separatefact, which needs to be established (and which we will do in Theo-rem 5.9, with a more efficient program shown in Theorem 5.10).
One of the first examples of self circularity we will see in this book isthe following theorem, which we can think of as showing a âNAND-CIRC interpreter in NAND-CIRCâ:
Theorem 5.9 â Bounded Universality of NAND-CIRC programs. For everyđ , đ, đ â â with đ â„ đ there is a NAND-CIRC program đđ ,đ,đ thatcomputes the function EVALđ ,đ,đ.
That is, the NAND-CIRC program đđ ,đ,đ takes the descriptionof any other NAND-CIRC program đ (of the right length and input-
code as data, data as code 195
Figure 5.6: A universal circuit đ is a circuit that gets asinput the description of an arbitrary (smaller) circuitđ as a binary string, and an input đ„, and outputs thestring đ(đ„) which is the evaluation of đ on đ„. We canalso think of đ as a straight-line program that gets asinput the code of a straight-line program đ and aninput đ„, and outputs đ(đ„).
s/outputs) and any input đ„, and computes the result of evaluating theprogram đ on the input đ„. Given the equivalence between NAND-CIRC programs and Boolean circuits, we can also think of đđ ,đ,đ asa circuit that takes as input the description of other circuits and theirinputs, and returns their evaluation, see Fig. 5.6. We call this NAND-CIRC program đđ ,đ,đ that computes EVALđ ,đ,đ a bounded universalprogram (or a universal circuit, see Fig. 5.6). âUniversalâ stands forthe fact that this is a single program that can evaluate arbitrary code,where âboundedâ stands for the fact that đđ ,đ,đ only evaluates pro-grams of bounded size. Of course this limitation is inherent for theNAND-CIRC programming language, since a program of đ lines (or,equivalently, a circuit of đ gates) can take at most 2đ inputs. Later, inChapter 7, we will introduce the concept of loops (and the model ofTuring Machines), that allow to escape this limitation.
Proof. Theorem 5.9 is an important result, but it is actually not hard toprove. Specifically, since EVALđ ,đ,đ is a finite function Theorem 5.9 isan immediate corollary of Theorem 4.12, which states that every finitefunction can be computed by some NAND-CIRC program.
PTheorem 5.9 is simple but important. Make sure youunderstand what this theorem means, and why it is acorollary of Theorem 4.12.
5.4.1 Efficient universal programsTheorem 5.9 establishes the existence of a NAND-CIRC programfor computing EVALđ ,đ,đ, but it provides no explicit bound on thesize of this program. Theorem 4.12, which we used to prove Theo-rem 5.9, guarantees the existence of a NAND-CIRC program whosesize can be as large as exponential in the length of its input. This wouldmean that even for moderately small values of đ , đ, đ (for exampleđ = 100, đ = 300, đ = 1), computing EVALđ ,đ,đ might require aNAND program with more lines than there are atoms in the observ-able universe! Fortunately, we can do much better than that. In fact,for every đ , đ, đ there exists a NAND-CIRC program for comput-ing EVALđ ,đ,đ with size that is polynomial in its input length. This isshown in the following theorem.
Theorem 5.10 â Efficient bounded universality of NAND-CIRC programs.
For every đ , đ, đ â â there is a NAND-CIRC program of atmost đ(đ 2 log đ ) lines that computes the function EVALđ ,đ,đ â¶
196 introduction to theoretical computer science
0, 1đ+đ â 0, 1đ defined above (where đ is the number of bitsneeded to represent programs of đ lines).
PIf you havenât done so already, now might be a goodtime to review đ notation in Section 1.4.8. In particu-lar, an equivalent way to state Theorem 5.10 is that itsays that there exists some number đ > 0 such that forevery đ , đ, đ â â, there exists a NAND-CIRC programđ of at most đđ 2 log đ lines that computes the functionEVALđ ,đ,đ.
Unlike Theorem 5.9, Theorem 5.10 is not a trivial corollary of thefact that every finite function can be computed by some circuit. Prov-ing Theorem 5.9 requires us to present a concrete NAND-CIRC pro-gram for computing the function EVALđ ,đ,đ. We will do so in severalstages.1. First, we will describe the algorithm to evaluate EVALđ ,đ,đ in
âpseudo codeâ.
2. Then, we will show how we can write a program to computeEVALđ ,đ,đ in Python. We will not use much about Python, anda reader that has familiarity with programming in any languageshould be able to follow along.
3. Finally, we will show how we can transform this Python programinto a NAND-CIRC program.This approach yields much more than just proving Theorem 5.10:
we will see that it is in fact always possible to transform (loop free)code in high level languages such as Python to NAND-CIRC pro-grams (and hence to Boolean circuits as well).
5.4.2 A NAND-CIRC interpeter in âpseudocodeâTo prove Theorem 5.10 it suffices to give a NAND-CIRC program ofđ(đ 2 log đ ) lines that can evaluate NAND-CIRC programs of đ lines.Let us start by thinking how we would evaluate such programs if wewerenât restricted to only performing NAND operations. That is, let usdescribe informally an algorithm that on input đ, đ, đ , a list of triplesđż, and a string đ„ â 0, 1đ, evaluates the program represented by(đ, đ, đż) on the string đ„.
PIt would be highly worthwhile for you to stop hereand try to solve this problem yourself. For example,you can try thinking how you would write a program
code as data, data as code 197
NANDEVAL(n,m,s,L,x) that computes this function inthe programming language of your choice.
We will now describe such an algorithm. We assume that we haveaccess to a bit array data structure that can store for every đ â [đĄ] abit đđ â 0, 1. Specifically, if Table is a variable holding this datastructure, then we assume we can perform the operations:
âą GET(Table,i)which retrieves the bit corresponding to i in Table.The value of i is assumed to be an integer in [đĄ].
âą Table = UPDATE(Table,i,b)which updates Table so the the bitcorresponding to i is now set to b. The value of i is assumed to bean integer in [đĄ] and b is a bit in 0, 1.Algorithm 5.11 â Eval NAND-CIRC programs.
Input: Numbers đ, đ, đ and 𥠆3đ , as well as a list đż of đ triples of numbers in [đĄ], and a string đ„ â 0, 1đ.
Output: Evaluation of the program represented by(đ, đ, đż) on theInput: đ„ â 0, 1đ.1: Let Vartable be table of size đĄ2: for đ in [đ] do3: Vartable = UPDATE(Vartable,đ,đ„đ)4: end for5: for (đ, đ, đ) in đż do6: đ â GET(Vartable,đ)7: đ â GET(Vartable,đ)8: Vartable = UPDATE(Vartable,đ,NAND(đ,đ))9: end for
10: for đ in [đ] do11: đŠđ â GET(Vartable,đĄ â đ + đ)12: end for13: return đŠ0, ⊠, đŠđâ1
Algorithm 5.11 evaluates the program given to it as input one lineat a time, updating the Vartable table to contain the value of eachvariable. At the end of the execution it outputs the variables at posi-tions đĄ â đ, đĄ â đ + 1, ⊠, đĄ â 1 which correspond to the input variables.
5.4.3 A NAND interpreter in PythonTo make things more concrete, let us see how we implement Algo-rithm 5.11 in the Python programming language. (There is nothingspecial about Python. We could have easily presented a corresponding
198 introduction to theoretical computer science
5 Python does not distinguish between lists andarrays, but allows constant time random access to anindexed elements to both of them. One could arguethat if we allowed programs of truly unboundedlength (e.g., larger than 264) then the price wouldnot be constant but logarithmic in the length of thearray/lists, but the difference between đ(đ ) andđ(đ log đ ) will not be important for our discussions.
function in JavaScript, C, OCaml, or any other programming lan-guage.) We will construct a function NANDEVAL that on input đ, đ, đż, đ„will output the result of evaluating the program represented by(đ, đ, đż) on đ„. To keep things simple, we will not worry about the casethat đż does not represent a valid program of đ inputs and đ outputs.The code is presented in Fig. 5.7.
Accessing an element of the array Vartable at a given index takesa constant number of basic operations. Hence (since đ, đ †đ and𥠆3đ ), the program above will use đ(đ ) basic operations.5
5.4.4 Constructing the NAND-CIRC interpreter in NAND-CIRCWe now turn to describing the proof of Theorem 5.10. To prove thetheorem it is not enough to give a Python program. Rather, we need toshow how we compute the function EVALđ ,đ,đ using a NAND-CIRCprogram. In other words, our job is to transform, for every đ , đ, đ, thePython code of Section 5.4.3 to a NAND-CIRC program đđ ,đ,đ thatcomputes the function EVALđ ,đ,đ.
PBefore reading further, try to think how you could givea âconstructive proofâ of Theorem 5.10. That is, thinkof how you would write, in the programming lan-guage of your choice, a function universal(s,n,m)that on input đ , đ, đ outputs the code for the NAND-CIRC program đđ ,đ,đ such that đđ ,đ,đ computesEVALđ ,đ,đ. There is a subtle but crucial differencebetween this function and the Python NANDEVAL pro-gram described above. Rather than actually evaluatinga given program đ on some input đ€, the functionuniversal should output the code of a NAND-CIRCprogram that computes the map (đ , đ„) ⊠đ(đ„).
Our construction will follow very closely the Python implementa-tion of EVAL above. We will use variables Vartable[0],âŠ,Vartable[2ââ1], where â = âlog 3đ â to store our variables. However, NAND doesnâthave integer-valued variables, so we cannot write code such asVartable[i] for some variable i. However, we can implement thefunction GET(Vartable,i) that outputs the i-th bit of the arrayVartable. Indeed, this is nothing but the function LOOKUPâ that wehave seen in Theorem 4.10!
PPlease make sure that you understand why GET andLOOKUPâ are the same function.
We saw that we can compute LOOKUPâ in time đ(2â) = đ(đ ) forour choice of â.
code as data, data as code 199
Figure 5.7: Code for evaluating a NAND-CIRC program given in the list-of-tuples representation
def NANDEVAL(n,m,L,X):# Evaluate a NAND-CIRC program from list of tuple representation.s = len(L) # num of linest = max(max(a,b,c) for (a,b,c) in L)+1 # max index in L + 1Vartable = [0] * t # initialize array
# helper functionsdef GET(V,i): return V[i]def UPDATE(V,i,b):
V[i]=breturn V
# load input values to Vartable:for i in range(n):
Vartable = UPDATE(Vartable,i,X[i])
# Run the programfor (i,j,k) in L:
a = GET(Vartable,j)b = GET(Vartable,k)c = NAND(a,b)Vartable = UPDATE(Vartable,i,c)
# Return outputs Vartable[t-m], Vartable[t-m+1],....,Vartable[t-1]return [GET(Vartable,t-m+j) for j in range(m)]
# Test on XOR (2 inputs, 1 output)L = ((2, 0, 1), (3, 0, 2), (4, 1, 2), (5, 3, 4))print(NANDEVAL(2,1,L,(0,1))) # XOR(0,1)# [1]print(NANDEVAL(2,1,L,(1,1))) # XOR(1,1)# [0]
200 introduction to theoretical computer science
For every â, let UPDATEâ ⶠ0, 12â+â+1 â 0, 12â correspond to theUPDATE function for arrays of length 2â. That is, on input đ â 0, 12â ,đ â 0, 1â, đ â 0, 1, UPDATEâ(đ , đ, đ) is equal to đ âČ â 0, 12â suchthat đ âČđ = â§âšâ©đđ đ â đđ đ = 1 (5.10)
where we identify the string đ â 0, 1â with a number in 0, ⊠, 2â â 1using the binary representation. We can compute UPDATEâ using anđ(2ââ) = (đ log đ ) line NAND-CIRC program as as follows:
1. For every đ â [2â], there is an đ(â) line NAND-CIRC program tocompute the function EQUALSđ ⶠ0, 1â â 0, 1 that on input đoutputs 1 if and only if đ is equal to (the binary representation of) đ.(We leave verifying this as Exercise 5.2 and Exercise 5.3.)
2. We have seen that we can compute the function IF ⶠ0, 13 â 0, 1such that IF(đ, đ, đ) equals đ if đ = 1 and đ if đ = 0.Together, this means that we can compute UPDATE (using some
âsyntactic sugarâ for bounded length loops) as follows:
def UPDATE_ell(V,i,b):# Get V[0]...V[2^ell-1], i in 0,1^ell, b in 0,1# Return NewV[0],...,NewV[2^ell-1]# updated array with NewV[i]=b and all# else same as Vfor j in range(2**ell): # j = 0,1,2,....,2^ell -1
a = EQUALS_j(i)NewV[j] = IF(a,b,V[j])
return NewV
Since the loop over j in UPDATE is run 2â times, and computingEQUALS_j takes đ(â) lines, the total number of lines to compute UP-DATE is đ(2â â â) = đ(đ log đ ). Once we can compute GET and UPDATE,the rest of the implementation amounts to âbook keepingâ that needsto be done carefully, but is not too insightful, and hence we omit thefull details. Since we run GET and UPDATE đ times, the total number oflines for computing EVALđ ,đ,đ is đ(đ 2) + đ(đ 2 log đ ) = đ(đ 2 log đ ).This completes (up to the omitted details) the proof of Theorem 5.10.
RRemark 5.12 â Improving to quasilinear overhead (ad-vanced optional note). The NAND-CIRC programabove is less efficient than its Python counterpart,since NAND does not offer arrays with efficient ran-dom access. Hence for example the LOOKUP operation
code as data, data as code 201
6 ARM stands for âAdvanced RISC Machineâ whereRISC in turn stands for âReduced instruction setcomputerâ.
on an array of đ bits takes Ω(đ ) lines in NAND eventhough it takes đ(1) steps (or maybe đ(log đ ) steps,depending on how we count) in Python.It turns out that it is possible to improve the boundof Theorem 5.10, and evaluate đ line NAND-CIRCprograms using a NAND-CIRC program of đ(đ log đ )lines. The key is to consider the description of NAND-CIRC programs as circuits, and in particular as di-rected acyclic graphs (DAGs) of bounded in-degree.A universal NAND-CIRC program đđ for đ line pro-grams will correspond to a universal graph đ»đ for suchđ vertex DAGs. We can think of such a graph đđ asfixed âwiringâ for a communication network, thatshould be able to accommodate any arbitrary patternof communication between đ vertices (where this pat-tern corresponds to an đ line NAND-CIRC program).It turns out that such efficient routing networks existthat allow embedding any đ vertex circuit inside a uni-versal graph of size đ(đ log đ ), see the bibliographicalnotes Section 5.9 for more on this issue.
5.5 A PYTHON INTERPRETER IN NAND-CIRC (DISCUSSION)
To prove Theorem 5.10 we essentially translated every line of thePython program for EVAL into an equivalent NAND-CIRC snip-pet. However none of our reasoning was specific to the particu-lar function EVAL. It is possible to translate every Python programinto an equivalent NAND-CIRC program of comparable efficiency.(More concretely, if the Python program takes đ (đ) operations oninputs of length at most đ then there exists NAND-CIRC program ofđ(đ (đ) logđ (đ)) lines that agrees with the Python program on inputsof length đ.) Actually doing so requires taking care of many detailsand is beyond the scope of this book, but let me try to convince youwhy you should believe it is possible in principle.
For starters, one can use CPython (the reference implementationfor Python), to evaluate every Python program using a C program. Wecan combine this with a C compiler to transform a Python programto various flavors of âmachine languageâ. So, to transform a Pythonprogram into an equivalent NAND-CIRC program, it is enough toshow how to transform a machine language program into an equiva-lent NAND-CIRC program. One minimalistic (and hence convenient)family of machine languages is known as the ARM architecture whichpowers many mobile devices including essentially all Android de-vices.6 There are even simpler machine languages, such as the LEGarchitecture for which a backend for the LLVM compiler was imple-mented (and hence can be the target of compiling any of the largeand growing list of languages that this compiler supports). Other ex-
202 introduction to theoretical computer science
amples include the TinyRAM architecture (motivated by interactiveproof systems that we will discuss in Chapter 22) and the teaching-oriented Ridiculously Simple Computer architecture. Going one byone over the instruction sets of such computers and translating themto NAND snippets is no fun, but it is a feasible thing to do. In fact,ultimately this is very similar to the transformation that takes placein converting our high level code to actual silicon gates that are notso different from the operations of a NAND-CIRC program. Indeed,tools such as MyHDL that transform âPython to Siliconâ can be usedto convert a Python program to a NAND-CIRC program.
The NAND-CIRC programming language is just a teaching tool,and by no means do I suggest that writing NAND-CIRC programs, orcompilers to NAND-CIRC, is a practical, useful, or enjoyable activity.What I do want is to make sure you understand why it can be done,and to have the confidence that if your life (or at least your grade)depended on it, then you would be able to do this. Understandinghow programs in high level languages such as Python are eventuallytransformed into concrete low-level representation such as NAND isfundamental to computer science.
The astute reader might notice that the above paragraphs onlyoutlined why it should be possible to find for every particular Python-computable function đ , a particular comparably efficient NAND-CIRCprogram đ that computes đ . But this still seems to fall short of ourgoal of writing a âPython interpreter in NANDâ which would meanthat for every parameter đ, we come up with a single NAND-CIRCprogram UNIVđ such that given a description of a Python programđ , a particular input đ„, and a bound đ on the number of operations(where the lengths of đ and đ„ and the value of đ are all at most đ )returns the result of executing đ on đ„ for at most đ steps. After all,the transformation above takes every Python program into a differentNAND-CIRC program, and so does not yield âone NAND-CIRC pro-gram to rule them allâ that can evaluate every Python program up tosome given complexity. However, we can in fact obtain one NAND-CIRC program to evaluate arbitrary Python programs. The reason isthat there exists a Python interpreter in Python: a Python program đthat takes a bit string, interprets it as Python code, and then runs thatcode. Hence, we only need to show a NAND-CIRC program đ â thatcomputes the same function as the particular Python program đ , andthis will give us a way to evaluate all Python programs.
What we are seeing time and again is the notion of universality orself reference of computation, which is the sense that all reasonably richmodels of computation are expressive enough that they can âsimulatethemselvesâ. The importance of this phenomenon to both the theoryand practice of computing, as well as far beyond it, including the
code as data, data as code 203
foundations of mathematics and basic questions in science, cannot beoverstated.
5.6 THE PHYSICAL EXTENDED CHURCH-TURING THESIS (DISCUS-SION)
Weâve seen that NAND gates (and other Boolean operations) can beimplemented using very different systems in the physical world. Whatabout the reverse direction? Can NAND-CIRC programs simulate anyphysical computer?
We can take a leap of faith and stipulate that Boolean circuits (orequivalently NAND-CIRC programs) do actually encapsulate everycomputation that we can think of. Such a statement (in the realm ofinfinite functions, which weâll encounter in Chapter 7) is typicallyattributed to Alonzo Church and Alan Turing, and in that contextis known as the Church Turing Thesis. As we will discuss in futurelectures, the Church-Turing Thesis is not a mathematical theorem orconjecture. Rather, like theories in physics, the Church-Turing Thesisis about mathematically modeling the real world. In the context offinite functions, we can make the following informal hypothesis orprediction:
âPhysical Extended Church-Turing Thesisâ (PECTT): If a functionđč ⶠ0, 1đ â 0, 1đ can be computed in the physical world using đ amountof âphysical resourcesâ then it can be computed by a Boolean circuit program ofroughly đ gates.
A priori it might seem rather extreme to hypothesize that our mea-ger model of NAND-CIRC programs or Boolean circuits captures allpossible physical computation. But yet, in more than a century ofcomputing technologies, no one has yet built any scalable computingdevice that challenges this hypothesis.
We now discuss the âfine printâ of the PECTT in more detail, aswell as the (so far unsuccessful) challenges that have been raisedagainst it. There is no single universally-agreed-upon formalizationof âroughly đ physical resourcesâ, but we can approximate this notionby considering the size of any physical computing device and thetime it takes to compute the output, and ask that any such device canbe simulated by a Boolean circuit with a number of gates that is apolynomial (with not too large exponent) in the size of the system andthe time it takes it to operate.
In other words, we can phrase the PECTT as stipulating that anyfunction that can be computed by a device that takes a certain volumeđ of space and requires đĄ time to complete the computation, must becomputable by a Boolean circuit with a number of gates đ(đ , đĄ) that ispolynomial in đ and đĄ.
204 introduction to theoretical computer science
The exact form of the function đ(đ , đĄ) is not universally agreedupon but it is generally accepted that if đ ⶠ0, 1đ â 0, 1 is anexponentially hard function, in the sense that it has no NAND-CIRCprogram of fewer than, say, 2đ/2 lines, then a demonstration of a phys-ical device that can compute in the real world đ for moderate inputlengths (e.g., đ = 500) would be a violation of the PECTT.
RRemark 5.13 â Advanced note: making PECTT concrete(advanced, optional). We can attempt a more exactphrasing of the PECTT as follows. Suppose that đ isa physical system that accepts đ binary stimuli andhas a binary output, and can be enclosed in a sphereof volume đ . We say that the system đ computes afunction đ ⶠ0, 1đ â 0, 1 within đĄ seconds if when-ever we set the stimuli to some value đ„ â 0, 1đ, ifwe measure the output after đĄ seconds then we obtainđ(đ„).One can then phrase the PECTT as stipulating that ifthere exists such a system đ that computes đč withinđĄ seconds, then there exists a NAND-CIRC programthat computes đč and has at most đŒ(đ đĄ)2 lines, wheređŒ is some normalization constant. (We can also con-sider variants where we use surface area insteadof volume, or take (đ đĄ) to a different power than 2.However, none of these choices makes a qualitativedifference to the discussion below.) In particular,suppose that đ ⶠ0, 1đ â 0, 1 is a function thatrequires 2đ/(100đ) > 20.8đ lines for any NAND-CIRCprogram (such a function exists by Theorem 5.3).Then the PECTT would imply that either the volumeor the time of a system that computes đč will have tobe at least 20.2đ/âđŒ. Since this quantity grows expo-nentially in đ, it is not hard to set parameters so thateven for moderately large values of đ, such a systemcould not fit in our universe.To fully make the PECTT concrete, we need to decideon the units for measuring time and volume, and thenormalization constant đŒ. One conservative choice isto assume that we could squeeze computation to theabsolute physical limits (which are many orders ofmagnitude beyond current technology). This corre-sponds to setting đŒ = 1 and using the Planck unitsfor volume and time. The Planck length âđ (which is,roughly speaking, the shortest distance that can the-oretically be measured) is roughly 2â120 meters. ThePlanck time đĄđ (which is the time it takes for light totravel one Planck length) is about 2â150 seconds. In theabove setting, if a function đč takes, say, 1KB of input(e.g., roughly 104 bits, which can encode a 100 by 100bitmap image), and requires at least 20.8đ = 20.8â 104NAND lines to compute, then any physical systemthat computes it would require either volume of
code as data, data as code 205
20.2â 104 Planck length cubed, which is more than 21500meters cubed or take at least 20.2â 104 Planck Time units,which is larger than 21500 seconds. To get a sense ofhow big that number is, note that the universe is onlyabout 260 seconds old, and its observable radius isonly roughly 290 meters. The above discussion sug-gests that it is possible to empirically falsify the PECTTby presenting a smaller-than-universe-size system thatcomputes such a function.There are of course several hurdles to refuting thePECTT in this way, one of which is that we canât actu-ally test the system on all possible inputs. However,it turns out that we can get around this issue usingnotions such as interactive proofs and program checkingthat we might encounter later in this book. Another,perhaps more salient problem, is that while we knowmany hard functions exist, at the moment there is nosingle explicit function đč ⶠ0, 1đ â 0, 1 for whichwe can prove an đ(đ) (let alone Ω(2đ/đ)) lower boundon the number of lines that a NAND-CIRC programneeds to compute it.
5.6.1 Attempts at refuting the PECTTOne of the admirable traits of mankind is the refusal to accept limita-tions. In the best case this is manifested by people achieving long-standing âimpossibleâ challenges such as heavier-than-air flight,putting a person on the moon, circumnavigating the globe, or evenresolving Fermatâs Last Theorem. In the worst case it is manifested bypeople continually following the footsteps of previous failures to try todo proven-impossible tasks such as build a perpetual motion machine,trisect an angle with a compass and straightedge, or refute Bellâs in-equality. The Physical Extended Church Turing thesis (in its variousforms) has attracted both types of people. Here are some physicaldevices that have been speculated to achieve computational tasks thatcannot be done by not-too-large NAND-CIRC programs:
âą Spaghetti sort: One of the first lower bounds that Computer Sci-ence students encounter is that sorting đ numbers requires makingΩ(đ logđ) comparisons. The âspaghetti sortâ is a description of aproposed âmechanical computerâ that would do this faster. Theidea is that to sort đ numbers đ„1, ⊠, đ„đ, we could cut đ spaghettinoodles into lengths đ„1, ⊠, đ„đ, and then if we simply hold themtogether in our hand and bring them down to a flat surface, theywill emerge in sorted order. There are a great many reasons whythis is not truly a challenge to the PECTT hypothesis, and I will notruin the readerâs fun in finding them out by her or himself.
206 introduction to theoretical computer science
Figure 5.8: Scott Aaronson tests a candidate device forcomputing Steiner trees using soap bubbles.
7 We were extremely conservative in the suggestedparameters for the PECTT, having assumed that asmany as ââ2đ 10â6 ⌠1061 bits could potentially bestored in a millimeter radius region.
âą Soap bubbles: One function đč ⶠ0, 1đ â 0, 1 that is conjecturedto require a large number of NAND lines to solve is the EuclideanSteiner Tree problem. This is the problem where one is given đpoints in the plane (đ„1, đŠ1), ⊠, (đ„đ, đŠđ) (say with integer coordi-nates ranging from 1 till đ, and hence the list can be representedas a string of đ = đ(đ logđ) size) and some number đŸ. The goalis to figure out whether it is possible to connect all the points byline segments of total length at most đŸ. This function is conjec-tured to be hard because it is NP complete - a concept that weâll en-counter later in this course - and it is in fact reasonable to conjecturethat as đ grows, the number of NAND lines required to computethis function grows exponentially in đ, meaning that the PECTTwould predict that if đ is sufficiently large (such as few hundredsor so) then no physical device could compute đč . Yet, some peopleclaimed that there is in fact a very simple physical device that couldsolve this problem, that can be constructed using some woodenpegs and soap. The idea is that if we take two glass plates, and putđ wooden pegs between them in the locations (đ„1, đŠ1), ⊠, (đ„đ, đŠđ)then bubbles will form whose edges touch those pegs in a way thatwill minimize the total energy which turns out to be a function ofthe total length of the line segments. The problem with this deviceis that nature, just like people, often gets stuck in âlocal optimaâ.That is, the resulting configuration will not be one that achievesthe absolute minimum of the total energy but rather one that canâtbe improved with local changes. Aaronson has carried out actualexperiments (see Fig. 5.8), and saw that while this device oftenis successful for three or four pegs, it starts yielding suboptimalresults once the number of pegs grows beyond that.
âą DNA computing. People have suggested using the properties ofDNA to do hard computational problems. The main advantage ofDNA is the ability to potentially encode a lot of information in arelatively small physical space, as well as compute on this infor-mation in a highly parallel manner. At the time of this writing, itwas demonstrated that one can use DNA to store about 1016 bitsof information in a region of radius about a millimeter, as opposedto about 1010 bits with the best known hard disk technology. Thisdoes not posit a real challenge to the PECTT but does suggest thatone should be conservative about the choice of constant and not as-sume that current hard disk + silicon technologies are the absolutebest possible.7
âą Continuous/real computers. The physical world is often describedusing continuous quantities such as time and space, and people
code as data, data as code 207
have suggested that analog devices might have direct access tocomputing with real-valued quantities and would be inherentlymore powerful than discrete models such as NAND machines.Whether the âtrueâ physical world is continuous or discrete is anopen question. In fact, we do not even know how to precisely phrasethis question, let alone answer it. Yet, regardless of the answer, itseems clear that the effort to measure a continuous quantity growswith the level of accuracy desired, and so there is no âfree lunchâor way to bypass the PECTT using such machines (see also thispaper). Related to that are proposals known as âhypercomputingâor âZenoâs computersâ which attempt to use the continuity of timeby doing the first operation in one second, the second one in half asecond, the third operation in a quarter second and so on.. Thesefail for a similar reason to the one guaranteeing that Achilles willeventually catch the tortoise despite the original Zenoâs paradox.
âą Relativity computer and time travel. The formulation above as-sumed the notion of time, but under the theory of relativity time isin the eye of the observer. One approach to solve hard problems isto leave the computer to run for a lot of time from his perspective,but to ensure that this is actually a short while from our perspective.One approach to do so is for the user to start the computer and thengo for a quick jog at close to the speed of light before checking onits status. Depending on how fast one goes, few seconds from thepoint of view of the user might correspond to centuries in com-puter time (it might even finish updating its Windows operatingsystem!). Of course the catch here is that the energy required fromthe user is proportional to how close one needs to get to the speedof light. A more interesting proposal is to use time travel via closedtimelike curves (CTCs). In this case we could run an arbitrarily longcomputation by doing some calculations, remembering the currentstate, and then travelling back in time to continue where we left off.Indeed, if CTCs exist then weâd probably have to revise the PECTT(though in this case I will simply travel back in time and edit thesenotes, so I can claim I never conjectured it in the first placeâŠ)
âą Humans. Another computing system that has been proposed asa counterexample to the PECTT is a 3 pound computer of about0.1m radius, namely the human brain. Humans can walk around,talk, feel, and do other things that are not commonly done byNAND-CIRC programs, but can they compute partial functionsthat NAND-CIRC programs cannot? There are certainly compu-tational tasks that at the moment humans do better than computers(e.g., play some video games, at the moment), but based on ourcurrent understanding of the brain, humans (or other animals)
208 introduction to theoretical computer science
8 This is a very rough approximation that couldbe wrong to a few orders of magnitude in eitherdirection. For one, there are other structures in thebrain apart from neurons that one might need tosimulate, hence requiring higher overhead. On theother hand, it is by no mean clear that we need tofully clone the brain in order to achieve the samecomputational tasks that it does.
9 There are some well known scientists that haveadvocated that humans have inherent computationaladvantages over computers. See also this.
have no inherent computational advantage over computers. Thebrain has about 1011 neurons, each operating at a speed of about1000 operations per seconds. Hence a rough first approximation isthat a Boolean circuit of about 1014 gates could simulate one secondof a brainâs activity.8 Note that the fact that such a circuit (likely)exists does not mean it is easy to find it. After all, constructing thiscircuit took evolution billions of years. Much of the recent effortsin artificial intelligence research is focused on finding programsthat replicate some of the brainâs capabilities and they take massivecomputational effort to discover, these programs often turn out tobe much smaller than the pessimistic estimates above. For example,at the time of this writing, Googleâs neural network for machinetranslation has about 104 nodes (and can be simulated by a NAND-CIRC program of comparable size). Philosophers, priests and manyothers have since time immemorial argued that there is somethingabout humans that cannot be captured by mechanical devices suchas computers; whether or not that is the case, the evidence is thinthat humans can perform computational tasks that are inherentlyimpossible to achieve by computers of similar complexity.9
âą Quantum computation. The most compelling attack on the Physi-cal Extended Church Turing Thesis comes from the notion of quan-tum computing. The idea was initiated by the observation that sys-tems with strong quantum effects are very hard to simulate on acomputer. Turning this observation on its head, people have pro-posed using such systems to perform computations that we do notknow how to do otherwise. At the time of this writing, scalablequantum computers have not yet been built, but it is a fascinatingpossibility, and one that does not seem to contradict any known lawof nature. We will discuss quantum computing in much more detailin Chapter 23. Modeling quantum computation involves extendingthe model of Boolean circuits into Quantum circuits that have onemore (very special) gate. However, the main takeaway is that whilequantum computing does suggest we need to amend the PECTT,it does not require a complete revision of our worldview. Indeed,almost all of the content of this book remains the same regardless ofwhether the underlying computational model is Boolean circuits orquantum circuits.
RRemark 5.14 â Physical Extended Church-Turing Thesisand Cryptography. While even the precise phrasing ofthe PECTT, let alone understanding its correctness, isstill a subject of active research, some variants of it are
code as data, data as code 209
already implicitly assumed in practice. Governments,companies, and individuals currently rely on cryptog-raphy to protect some of their most precious assets,including state secrets, control of weapon systemsand critical infrastructure, securing commerce, andprotecting the confidentiality of personal information.In applied cryptography, one often encounters state-ments such as âcryptosystem đ provides 128 bits ofsecurityâ. What such a statement really means is that(a) it is conjectured that there is no Boolean circuit(or, equivalently, a NAND-CIRC program) of sizemuch smaller than 2128 that can break đ, and (b) weassume that no other physical mechanism can do bet-ter, and hence it would take roughly a 2128 amount ofâresourcesâ to break đ. We say âconjecturedâ and notâprovedâ because, while we can phrase the statementthat breaking the system cannot be done by an đ -gatecircuit as a precise mathematical conjecture, at themoment we are unable to prove such a statement forany non-trivial cryptosystem. This is related to the Pvs NP question we will discuss in future chapters. Wewill explore Cryptography in Chapter 21.
Chapter Recap
âą We can think of programs both as describing a pro-cess, as well as simply a list of symbols that can beconsidered as data that can be fed as input to otherprograms.
âą We can write a NAND-CIRC program that evalu-ates arbitrary NAND-CIRC programs (or equiv-alently a circuit that evaluates other circuits).Moreover, the efficiency loss in doing so is not toolarge.
âą We can even write a NAND-CIRC program thatevaluates programs in other programming lan-guages such as Python, C, Lisp, Java, Go, etc.
âą By a leap of faith, we could hypothesize that thenumber of gates in the smallest circuit that com-putes a function đ captures roughly the amountof physical resources required to compute đ . Thisstatement is known as the Physical Extended Church-Turing Thesis (PECTT).
âą Boolean circuits (or equivalently AON-CIRC orNAND-CIRC programs) capture a surprisinglywide array of computational models. The strongestcurrently known challenge to the PECTT comesfrom the potential for using quantum mechanicaleffects to speed-up computation, a model known asquantum computers.
210 introduction to theoretical computer science
Figure 5.9: A finite computational task is specified bya function đ ⶠ0, 1đ â 0, 1đ. We can modela computational process using Boolean circuits (ofvarying gate sets) or straight-line program. Everyfunction can be computed by many programs. Wesay that đ â SIZEđ,đ(đ ) if there exists a NANDcircuit of at most đ gates (equivalently a NAND-CIRCprogram of at most đ lines) that computes đ. Everyfunction đ ⶠ0, 1đ â 0, 1đ can be computed bya circuit of đ(đ â 2đ/đ) gates. Many functions suchas multiplication, addition, solving linear equations,computing the shortest path in a graph, and others,can be computed by circuits of much fewer gates.In particular there is an đ(đ 2 log đ )-size circuitthat computes the map đ¶, đ„ ⊠đ¶(đ„) where đ¶ isa string describing a circuit of đ gates. However,the counting argument shows there do exist somefunctions đ ⶠ0, 1đ â 0, 1đ that requireΩ(đ â 2đ/đ) gates to compute.
5.7 RECAP OF PART I: FINITE COMPUTATION
This chapter concludes the first part of this book that deals with finitecomputation (computing functions that map a fixed number of Booleaninputs to a fixed number of Boolean outputs). The main take-awaysfrom Chapter 3, Chapter 4, and Chapter 5 are as follows (see alsoFig. 5.9):
âą We can formally define the notion of a function đ ⶠ0, 1đ â0, 1đ being computable using đ basic operations. Whether theseoperations are AND/OR/NOT, NAND, or some other universalbasis does not make much difference. We can describe such a com-putation either using a circuit or using a straight-line program.
âą We define SIZEđ,đ(đ ) to be the set of functions that are computableby NAND circuits of at most đ gates. This set is equal to the set offunctions computable by a NAND-CIRC program of at most đ linesand up to a constant factor in đ (which we will not care about);this is also the same as the set of functions that are computableby a Boolean circuit of at most đ AND/OR/NOT gates. The classSIZEđ,đ(đ ) is a set of functions, not of programs/circuits.
âą Every function đ ⶠ0, 1đ â 0, 1đ can be computed using acircuit of at most đ(đ â 2đ/đ) gates. Some functions require at leastΩ(đ â 2đ/đ) gates. We define SIZEđ,đ(đ ) to be the set of functionsfrom 0, 1đ to 0, 1đ that can be computed using at most đ gates.
âą We can describe a circuit/program đ as a string. For every đ , thereis a universal circuit/program đđ that can evaluate programs oflength đ given their description as strings. We can use this repre-sentation also to count the number of circuits of at most đ gates and
code as data, data as code 211
hence prove that some functions cannot be computed by circuits ofsmaller-than-exponential size.
âą If there is a circuit of đ gates that computes a function đ , then wecan build a physical device to compute đ using đ basic components(such as transistors). The âPhysical Extended Church-Turing The-sisâ postulates that the reverse direction is true as well: if đ is afunction for which every circuit requires at least đ gates then thatevery physical device to compute đ will require about đ âphysicalresourcesâ. The main challenge to the PECTT is quantum computing,which we will discuss in Chapter 23.
Sneak preview: In the next part we will discuss how to model compu-tational tasks on unbounded inputs, which are specified using functionsđč ⶠ0, 1â â 0, 1â (or đč ⶠ0, 1â â 0, 1) that can take anunbounded number of Boolean inputs.
5.8 EXERCISES
Exercise 5.1 Which one of the following statements is false:
a. There is an đ(đ 3) line NAND-CIRC program that given as inputprogram đ of đ lines in the list-of-tuples representation computesthe output of đ when all its input are equal to 1.
b. There is an đ(đ 3) line NAND-CIRC program that given as inputprogram đ of đ characters encoded as a string of 7đ bits using theASCII encoding, computes the output of đ when all its input areequal to 1.
c. There is an đ(âđ ) line NAND-CIRC program that given as inputprogram đ of đ lines in the list-of-tuples representation computesthe output of đ when all its input are equal to 1.
Exercise 5.2 â Equals function. For every đ â â, show that there is an đ(đ)line NAND-CIRC program that computes the function EQUALSđ â¶0, 12đ â 0, 1 where EQUALS(đ„, đ„âČ) = 1 if and only if đ„ = đ„âČ.
Exercise 5.3 â Equal to constant function. For every đ â â and đ„âČ â 0, 1đ,show that there is an đ(đ) line NAND-CIRC program that computesthe function EQUALSđ„âČ â¶ 0, 1đ â 0, 1 that on input đ„ â 0, 1đoutputs 1 if and only if đ„ = đ„âČ.
Exercise 5.4 â Counting lower bound for multibit functions. Prove that thereexists a number đż > 0 such that for every đ, đ there exists a function
212 introduction to theoretical computer science
10 How many functions from 0, 1đ to 0, 1đ exist?
11 Follow the proof of Theorem 5.5, replacing the useof the counting argument with Exercise 5.4.
12 Using the adjacency list representation, a graphwith đ in-degree zero vertices and đ in-degree twovertices can be represented using roughly 2đ log(đ +đ) †2đ (log đ + đ(1)) bits. The labeling of the đ inputand đ output vertices can be specified by a list of đlabels in [đ] and đ labels in [đ].13 Hint: Use the results of Exercise 5.6 and the fact thatin this regime đ = 1 and đ âȘ đ .
14 Hint: An equivalent way to say this is that youneed to prove that the set of functions that can becomputed using at most 2đ/(1000đ) lines has fewerthan 2â10022đ elements. Can you see why?
15 Note that if đ is big enough, then it is easy torepresent such a pair using đ2 bits, since we canrepresent the program using đ(đ1.1 logđ) bits, andwe can always pad our representation to have exactlyđ2 length.
đ ⶠ0, 1đ â 0, 1đ that requires at least đżđ â 2đ/đ NAND gates tocompute. See footnote for hint.10
Exercise 5.5 â Size hierarchy theorem for multibit functions. Prove that thereexists a number đ¶ such that for every đ, đ and đ+đ < đ < đâ 2đ/(đ¶đ)there exists a function đ â SIZEđ,đ(đ¶ â đ ) ⧔ SIZEđ,đ(đ ). See footnote forhint.11
Exercise 5.6 â Efficient representation of circuits and a tighter counting upper
bound. Use the ideas of Remark 5.4 to show that for every đ > 0 andsufficiently large đ , đ, đ,|SIZEđ,đ(đ )| < 2(2+đ)đ log đ +đ logđ+đ log đ . (5.11)
Conclude that the implicit constant in Theorem 5.2 can be made arbi-trarily close to 5. See footnote for hint.12
Exercise 5.7 â Tighter counting lower bound. Prove that for every đż < 1/2, ifđ is sufficiently large then there exists a function đ ⶠ0, 1đ â 0, 1such that đ â SIZEđ,1 ( đż2đđ ). See footnote for hint.13
Exercise 5.8 â Random functions are hard. Suppose đ > 1000 and that wechoose a function đč ⶠ0, 1đ â 0, 1 at random, choosing for everyđ„ â 0, 1đ the value đč(đ„) to be the result of tossing an independentunbiased coin. Prove that the probability that there is a 2đ/(1000đ)line program that computes đč is at most 2â100.14
Exercise 5.9 The following is a tuple representing a NAND program:(3, 1, ((3, 2, 2), (4, 1, 1), (5, 3, 4), (6, 2, 1), (7, 6, 6), (8, 0, 0), (9, 7, 8), (10, 5, 0), (11, 9, 10))).1. Write a table with the eight values đ(000), đ(001), đ(010), đ(011),đ(100), đ(101), đ(110), đ(111) in this order.
2. Describe what the programs does in words.
Exercise 5.10 â EVAL with XOR. For every sufficiently large đ, let đžđ â¶0, 1đ2 â 0, 1 be the function that takes an đ2-length string thatencodes a pair (đ , đ„) where đ„ â 0, 1đ and đ is a NAND programof đ inputs, a single output, and at most đ1.1 lines, and returns theoutput of đ on đ„.15 That is, đžđ(đ , đ„) = đ(đ„).
Prove that for every sufficiently large đ, there does not exist an XORcircuit đ¶ that computes the function đžđ, where a XOR circuit has theXOR gate as well as the constants 0 and 1 (see Exercise 3.5). That is,
code as data, data as code 213
16 Hint: Use our bound on the number of program-s/circuits of size đ (Theorem 5.2), as well as theChernoff Bound ( Theorem 18.12) and the unionbound.
prove that there is some constant đ0 such that for every đ > đ0 andXOR circuit đ¶ of đ2 inputs and a single output, there exists a pair(đ , đ„) such that đ¶(đ , đ„) â đžđ(đ , đ„).
Exercise 5.11 â Learning circuits (challenge, optional, assumes more background).
(This exercise assumes background in probability theory and/ormachine learning that you might not have at this point. Feel freeto come back to it at a later point and in particular after going overChapter 18.) In this exercise we will use our bound on the number ofcircuits of size đ to show that (if we ignore the cost of computation)every such circuit can be learned from not too many training samples.Specifically, if we find a size-đ circuit that classifies correctly a trainingset of đ(đ log đ ) samples from some distribution đ·, then it is guaran-teed to do well on the whole distribution đ·. Since Boolean circuitsmodel very many physical processes (maybe even all of them, if the(controversial) physical extended Church-Turing thesis is true), thisshows that all such processes could be learned as well (again, ignor-ing the computation cost of finding a classifier that does well on thetraining data).
Let đ· be any probability distribution over 0, 1đ and let đ¶ be aNAND circuit with đ inputs, one output, and size đ â„ đ. Prove thatthere is some constant đ such that with probability at least 0.999 thefollowing holds: if đ = đđ log đ and đ„0, ⊠, đ„đâ1 are chosen indepen-dently from đ·, then for every circuit đ¶âČ such that đ¶âČ(đ„đ) = đ¶(đ„đ) onevery đ â [đ], Prđ„âŒđ·[đ¶âČ(đ„) †đ¶(đ„)] †0.99.
In other words, if đ¶âČ is a so called âempirical risk minimizerâ thatagrees with đ¶ on all the training examples đ„0, ⊠, đ„đâ1, then it willalso agree with đ¶ with high probability for samples drawn from thedistribution đ· (i.e., it âgeneralizesâ, to use Machine-Learning lingo).See footnote for hint.16
5.9 BIBLIOGRAPHICAL NOTES
The EVAL function is usually known as a universal circuit. The imple-mentation we describe is not the most efficient known. Valiant [Val76]first showed a universal circuit of đ(đ logđ) size where đ is the size ofthe input. Universal circuits have seen in recent years new motivationsdue to their applications for cryptography, see [LMS16; GKS17] .
While weâve seen that âmostâ functions mapping đ bits to one bitrequire circuits of exponential size Ω(2đ/đ), we actually do not knowof any explicit function for which we can prove that it requires, say, atleast đ100 or even 100đ size. At the moment, the strongest such lowerbound we know is that there are quite simple and explicit đ-variable
214 introduction to theoretical computer science
functions that require at least (5 â đ(1))đ lines to compute, see thispaper of Iwama et al as well as this more recent work of Kulikov et al.Proving lower bounds for restricted models of circuits is an extremelyinteresting research area, for which Juknaâs book [Juk12] (see alsoWegener [Weg87]) provides a very good introduction and overview. Ilearned of the proof of the size hierarchy theorem (Theorem 5.5) fromSasha Golovnev.
Scott Aaronsonâs blog post on how information is physical is a gooddiscussion on issues related to the physical extended Church-TuringPhysics. Aaronsonâs survey on NP complete problems and physicalreality [Aar05] discusses these issues as well, though it might beeasier to read after we reach Chapter 15 on NP and NP-completeness.
IIUNIFORM COMPUTATION
Figure 6.1: Once you know how to multiply multi-digit numbers, you can do so for every number đof digits, but if you had to describe multiplicationusing Boolean circuits or NAND-CIRC programs,you would need a different program/circuit for everylength đ of the input.
6Functions with Infinite domains, Automata, and Regular ex-pressions
âAn algorithm is a finite answer to an infinite number of questions.â, At-tributed to Stephen Kleene.
The model of Boolean circuits (or equivalently, the NAND-CIRCprogramming language) has one very significant drawback: a Booleancircuit can only compute a finite function đ , and in particular sinceevery gate has two inputs, a size đ circuit can compute on an inputof length at most 2đ . This does not capture our intuitive notion of analgorithm as a single recipe to compute a potentially infinite function.For example, the standard elementary school multiplication algorithmis a single algorithm that multiplies numbers of all lengths, but yetwe cannot express this algorithm as a single circuit, but rather need adifferent circuit (or equivalently, a NAND-CIRC program) for everyinput length (see Fig. 6.1).
In this chapter we extend our definition of computational tasks toconsider functions with the unbounded domain of 0, 1â. We focuson the question of defining what tasks to compute, mostly leavingthe question of how to do so to later chapters, where we will seeTuring Machines and other computational models for computing onunbounded inputs. In this chapter we will see however one examplefor a simple and restricted model of computation - deterministic finiteautomata (DFAs).
This chapter: A non-mathy overviewIn this chapter we discuss functions that as input strings ofarbitary length. Such functions could have outputs that arelong strings as well, but the case of Boolean functions, wherethe output is a single bit, is of particular importance. The taskof computing a Boolean function is equivalent to the task
Compiled on 8.26.2020 18:10
Learning Objectives:âą Define functions on unbounded length inputs,
that cannot be described by a finite size tableof inputs and outputs.
âą Equivalence with the task of decidingmembership in a language.
âą Deterministic finite automatons (optional): Asimple example for a model for unboundedcomputation
âą Equivalence with regular expressions.
218 introduction to theoretical computer science
Figure 6.2: The NAND circuit and NAND-CIRCprogram for computing the XOR of 5 bits. Note howthe circuit for XOR5 merely repeats four times thecircuit to compute the XOR of 2 bits.
of deciding a language. We will also define the restriction ofa function over unbounded length strings to a finite length,and how this restriction can be computed by Boolean circuits.In the second half of this chapter we discuss finite automata,which is a model for computing functions of unboundedlength. This model is not as powerful as Python or othergeneral-purpose programming languages, but can serveas an introduction to these more general models. We alsoshow a beautiful result - the functions computable by finiteautomata are exactly the ones that correspond to regularexpressions. However, the reader can also feel free to skipautomata and go straight to our discussion of Turing Machinesin Chapter 7.
6.1 FUNCTIONS WITH INPUTS OF UNBOUNDED LENGTH
Up until now, we considered the computational task of mappingsome string of length đ into a string of length đ. However, in gen-eral computational tasks can involve inputs of unbounded length.For example, the following Python function computes the functionXOR ⶠ0, 1â â 0, 1, where XOR(đ„) equals 1 iff the number of 1âsin đ„ is odd. (In other words, XOR(đ„) = â|đ„|â1đ=0 đ„đ mod 2 for everyđ„ â 0, 1â.) As simple as it is, the XOR function cannot be com-puted by a Boolean circuit. Rather, for every đ, we can compute XORđ(the restriction of XOR to 0, 1đ) using a different circuit (e.g., seeFig. 6.2).
def XOR(X):'''Takes list X of 0's and 1's
Outputs 1 if the number of 1's is odd and outputs 0otherwise'''result = 0for i in range(len(X)):
result += X[i] % 2return result
Previously in this book we studied the computation of finite func-tions đ ⶠ0, 1đ â 0, 1đ. Such a function đ can always be describedby listing all the 2đ values it takes on inputs đ„ â 0, 1đ. In this chap-ter we consider functions such as XOR that take inputs of unboundedsize. While we can describe XOR using a finite number of symbols(in fact we just did so above), it takes infinitely many possible inputsand so we cannot just write down all of its values. The same is truefor many other functions capturing important computational tasks
functions with infinite domains, automata, and regular expressions 219
including addition, multiplication, sorting, finding paths in graphs,fitting curves to points, and so on and so forth. To contrast with thefinite case, we will sometimes call a function đč ⶠ0, 1â â 0, 1 (orđč ⶠ0, 1â â 0, 1â) infinite. However, this does not mean that đčtaks as input strings of infinite length! It just means that đč can take asinput a string of that can be arbitrarily long, and so we cannot simplywrite down a table of all the outputs of đč on different inputs.
Big Idea 8 A function đč ⶠ0, 1â â 0, 1â specifies the computa-tional task mapping an input đ„ â 0, 1â into the output đč(đ„).
As weâve seen before, restricting attention to functions that usebinary strings as inputs and outputs does not detract from our gener-ality, since other objects, including numbers, lists, matrices, images,videos, and more, can be encoded as binary strings.
As before, it is important to differentiate between specification andimplementation. For example, consider the following function:
TWINP(đ„) = â§âšâ©1 âđââ s.t.đ, đ + 2 are primes and đ > |đ„|0 otherwise(6.1)
This is a mathematically well-defined function. For every đ„,TWINP(đ„) has a unique value which is either 0 or 1. However, atthe moment no one knows of a Python program that computes thisfunction. The Twin prime conjecture posits that for every đ thereexists đ > đ such that both đ and đ + 2 are primes. If this conjectureis true, then đ is easy to compute indeed - the program def T(x):return 1 will do the trick. However, mathematicians have triedunsuccesfully to prove this conjecture since 1849. That said, whetheror not we know how to implement the function TWINP, the definitionabove provides its specification.
6.1.1 Varying inputs and outputsMany of the functions we are interested in take more than one input.For example the function
MULT(đ„, đŠ) = đ„ â đŠ (6.2)that takes the binary representation of a pair of integers đ„, đŠ â â
and outputs the binary representation of their product đ„ â đŠ. How-ever, since we can represent a pair of strings as a single string, we willconsider functions such as MULT as mapping 0, 1â to 0, 1â. Wewill typically not be concerned with low-level details such as the pre-cise way to represent a pair of integers as a string, since essentially allchoices will be equivalent for our purposes.
220 introduction to theoretical computer science
Another example for a function we want to compute is
PALINDROME(đ„) = â§âšâ©1 âđâ[|đ„|]đ„đ = đ„|đ„|âđ0 otherwise(6.3)
PALINDROME has a single bit as output. Functions with a singlebit of output are known as Boolean functions. Boolean functions arecentral to the theory of computation, and we will come discuss themoften in this book. Note that even though Boolean functions have asingle bit of output, their input can be of arbitrary length. Thus theyare still infinite functions, that cannot be described via a finite table ofvalues.
âBooleanizingâ functions. Sometimes it might be convenient to ob-tain a Boolean variant for a non-Boolean function. For example, thefollowing is a Boolean variant of MULT.
BMULT(đ„, đŠ, đ) = â§âšâ©đđĄâ bit of đ„ â đŠ đ < |đ„ â đŠ|0 otherwise(6.4)
If we can compute BMULT via any programming language suchas Python, C, Java, etc. then we can compute MULT as well, and viceversa.Solved Exercise 6.1 â Booleanizing general functions. Show that for everyfunction đč ⶠ0, 1â â 0, 1â there exists a Boolean function BF â¶0, 1â â 0, 1 such that a Python program to compute BF can betransformed into a program to compute đč and vice versa.
Solution:
For every đč ⶠ0, 1â â 0, 1â, we can define
BF(đ„, đ, đ) = â§âšâ©đč(đ„)đ đ < |đč(đ„)|, đ = 01 đ < |đč(đ„)|, đ = 10 đ â„ |đ„| (6.5)
to be the function that on input đ„ â 0, 1â, đ â â, đ â 0, 1 out-puts the đđĄâ bit of đč(đ„) if đ = 0 and đ < |đ„|. If đ = 1 then BF(đ„, đ, đ)outputs 1 iff đ < |đč(đ„)| and hence this allows to compute the lengthof đč(đ„).
Computing BF from đč is straightforward. For the other direc-tion, given a Python function BF that computes BF, we can computeđč as follows:
def F(x):res = []
functions with infinite domains, automata, and regular expressions 221
i = 0while BF(x,i,1):
res.apppend(BF(x,i,0))i += 1
return res
6.1.2 Formal LanguagesFor every Boolean function đč ⶠ0, 1â â 0, 1, we can define theset đżđč = đ„|đč(đ„) = 1 of strings on which đč outputs 1. Such setsare known as languages. This name is rooted in formal language theoryas pursued by linguists such as Noam Chomsky. A formal languageis a subset đż â 0, 1â (or more generally đż â ÎŁâ for some finitealphabet ÎŁ). The membership or decision problem for a language đż, isthe task of determining, given đ„ â 0, 1â, whether or not đ„ â đż. Ifwe can compute the function đč then we can decide membership in thelanguage đżđč and vice versa. Hence, many texts such as [Sip97] referto the task of computing a Boolean function as âdeciding a languageâ.In this book we mostly describe computational task using the functionnotation, which is easier to generalize to computation with more thanone bit of output. However, since the language terminology is sopopular in the literature, we will sometimes mention it.
6.1.3 Restrictions of functionsIf đč ⶠ0, 1â â 0, 1 is a Boolean function and đ â â then the re-striction of đč to inputs of length đ, denoted as đčđ, is the finite functionđ ⶠ0, 1đ â 0, 1 such that đ(đ„) = đč(đ„) for every đ„ â 0, 1đ. Thatis, đčđ is the finite function that is only defined on inputs in 0, 1đ, butagrees with đč on those inputs. Since đčđ is a finite function, it can becomputed by a Boolean circuit, implying the following theorem:
Theorem 6.1 â Circuit collection for every infinite function. Let đč ⶠ0, 1â â0, 1. Then there is a collection đ¶đđâ1,2,⊠of circuits such thatfor every đ > 0, đ¶đ computes the restriction đčđ of đč to inputs oflength đ.
Proof. This is an immediate corollary of the universality of Booleancircuits. Indeed, since đčđ maps 0, 1đ to 0, 1, Theorem 4.15 impliesthat there exists a Boolean circuit đ¶đ to compute it. In fact. the size ofthis circuit is at most đ â 2đ/đ gates for some constant đ †10.
In particular, Theorem 6.1 implies that there exists such as circuitcollection đ¶đ even for the TWINP function we described before,
222 introduction to theoretical computer science
despite the fact that we donât know of any program to compute it. In-deed, this is not that surprising: for every particular đ â â, TWINPđis either the constant zero function or the constant one function, bothof which can be computed by very simple Boolean circuits. Hencea collection of circuits đ¶đ that computes TWINP certainly exists.The difficulty in computing TWINP using Python or any other pro-gramming language arises from the fact that we donât know for eachparticular đ what is the circuit đ¶đ in this collection.
6.2 DETERMINISTIC FINITE AUTOMATA (OPTIONAL)
All our computational models so far - Boolean circuits and straight-line programs - were only applicable for finite functions. In Chapter 7we will present Turing Machines, which are the central models of com-putation for functions of unbounded input length. However, in thissection we will present the more basic model of deterministic finiteautomata (DFA). Automata can serve as a good stepping-stone forTuring machines, though they will not be used much in later parts ofthis book, and so the reader can feel free to skip ahead fo Chapter 7.DFAs turn out to be equivalent in power to regular expressions, whichare a powerful mechanism to specify patterns that is widely used inpractice. Our treatment of automata is quite brief. There are plenty ofresources that help you get more comfortable with DFAâs. In particu-lar, Chapter 1 of Sipserâs book [Sip97] contains an excellent expositionof this material. There are also many websites with online simula-tors for automata, as well as translators from regular expressions toautomata and vice versa (see for example here and here).
At a high level, an algorithm is a recipe for computing an outputfrom an input via a combination of the following steps:
1. Read a bit from the input2. Update the state (working memory)3. Stop and produce an output
For example, recall the Python program that computes the XORfunction:
def XOR(X):'''Takes list X of 0's and 1's
Outputs 1 if the number of 1's is odd and outputs 0otherwise'''result = 0for i in range(len(X)):
result += X[i] % 2return result
functions with infinite domains, automata, and regular expressions 223
Figure 6.3: A deterministic finite automaton thatcomputes the XOR function. It has two states 0 and 1,and when it observes đ it transitions from đŁ to đŁ â đ.
In each step, this program reads a single bit X[i] and updatesits state result based on that bit (flipping result if X[i] is 1 andkeeping it the same otherwise). When its done transversing the input,the program outputs result. In computer science, such a program iscalled a single-pass constant-memory algorithm since it makes a singlepass over the input and its working memory is of finite size. (Indeedin this case result can either be 0 or 1.) Such an algorithm is alsoknown as a Deterministic Finite Automaton or DFA (another name forDFAâs is a finite state machine). We can think of such an algorithm asa âmachineâ that can be in one of đ¶ states, for some constant đ¶. Themachine starts in some initial state, and then reads its input đ„ â 0, 1âone bit at a time. Whenever the machine reads a bit đ â 0, 1, ittransitions into a new state based on đ and its prior state. The outputof the machine is based on the final state. Every single-pass constant-memory algorithm corresponds to such a machine. If an algorithmuses đ bits of memory, then the contents of its memory are a string oflength đ. Since there are 2đ such strings, at any point in the execution,such an algorithm can be in one of 2đ states.
We can specify a DFA of đ¶ states by a list of đ¶ â 2 rules. Each rulewill be of the form âIf the DFA is in state đŁ and the bit read from theinput is đ then the new state is đŁâČâ. At the end of the computationwe will also have a rule of the form âIf the final state is one of thefollowing ⊠then output 1, otherwise output 0â. For example, thePython program above can be represented by a two state automata forcomputing XOR of the following form:
âą Initialize in state 0âą For every state đ â 0, 1 and input bit đ read, if đ = 1 then change
to state 1 â đ , otherwise stay in state đ .âą At the end output 1 iff đ = 1.
We can also describe a đ¶-state DFA as a labeled graph of đ¶ vertices.For every state đ and bit đ, we add a directed edge labeled with đbetween đ and the state đ âČ such that if the DFA is at state đ and reads đthen it transitions to đ âČ. (If the state stays the same then this edge willbe a self loop; similarly, if đ transitions to đ âČ in both the case đ = 0 andđ = 1 then the graph will contain two parallel edges.) We also labelthe set đź of states on which the automate will output 1 at the end ofthe computation. This set is known as the set of accepting states. SeeFig. 6.3 for the graphical representation of the XOR automaton.
Formally, a DFA is specified by (1) the table of the đ¶ â 2 rules, whichcan be represented as a transition function đ that maps a state đ â [đ¶]and bit đ â 0, 1 to the state đ âČ â [đ¶] which the DFA will transition tofrom state đ on input đ and (2) the set đź of accepting states. This leadsto the following definition.
224 introduction to theoretical computer science
Definition 6.2 â Deterministic Finite Automaton. A deterministic finiteautomaton (DFA) with đ¶ states over 0, 1 is a pair (đ , đź) withđ ⶠ[đ¶] Ă 0, 1 â [đ¶] and đź â [đ¶]. The finite function đ is knownas the transition function of the DFA and the set đź is known as theset of accepting states.
Let đč ⶠ0, 1â â 0, 1 be a Boolean function with the infinitedomain 0, 1â. We say that (đ , đź) computes a function đč ⶠ0, 1â â0, 1 if for every đ â â and đ„ â 0, 1đ, if we define đ 0 = 0 andđ đ+1 = đ (đ đ, đ„đ) for every đ â [đ], thenđ đ â đź â đč(đ„) = 1 (6.6)
PMake sure not to confuse the transition function ofan automaton (đ in Definition 6.2), which is a finitefunction specifying the table of ârulesâ which it fol-lows, with the function the automaton computes (đč inDefinition 6.2) which is an infinite function.
RRemark 6.3 â Definitions in other texts. Deterministicfinite automata can be defined in several equivalentways. In particular Sipser [Sip97] defines a DFAs as afive-tuple (đ, ÎŁ, đż, đ0, đč ) where đ is the set of states,ÎŁ is the alphabet, đż is the transition function, đ0 isthe initial state, and đč is the set of accepting states.In this book the set of states is always of the formđ = 0, ⊠, đ¶ â 1 and the initial state is always đ0 = 0,but this makes no difference to the computationalpower of these models. Also, we restrict our attentionto the case that the alphabet ÎŁ is equal to 0, 1.
Solved Exercise 6.2 â DFA for (010)â. Prove that there is a DFA that com-putes the following function đč :
đč(đ„) = â§âšâ©1 3 divides |đ„| and âđâ[|đ„|/3]đ„đđ„đ+1đ„đ+2 = 0100 otherwise(6.7)
Solution:
When asked to construct a deterministic finite automaton, ithelps to start by thinking of it a single-pass constant-memory al-
functions with infinite domains, automata, and regular expressions 225
Figure 6.4: A DFA that outputs 1 only on inputsđ„ â 0, 1â that are a concatenation of zero or morecopies of 010. The state 0 is both the starting stateand the only accepting state. The table denotes thetransition function of đ , which maps the current stateand symbol read to the new symbol.
gorithm (for example, a Python program) and then translate thisprogram into a DFA. Here is a simple Python program for comput-ing đč :
def F(X):'''Return 1 iff X is a concatenation of zero/more
copies of [0,1,0]'''if len(X) % 3 != 0:
return Falseultimate = 0penultimate = 1antepenultimate = 0for idx, b in enumerate(X):
antepenultimate = penultimatepenultimate = ultimateultimate = bif idx % 3 == 2 and ((antepenultimate,
penultimate, ultimate) != (0,1,0)):return False
return True
Since we keep three Boolean variables, the working memory canbe in one of 23 = 8 configurations, and so the program above canbe directly translated into an 8 state DFA. While this is not neededto solve the question, by examining the resulting DFA, we can seethat we can merge some states together and obtain a 4 state au-tomaton, described in Fig. 6.4. See also Fig. 6.5, which depicts theexecution of this DFA on a particular input.
6.2.1 Anatomy of an automaton (finite vs. unbounded)Now that we consider computational tasks with unbounded inputsizes, it is crucially important to distinguish between the componentsof our algorithm that are fixed in size, and the components that growwith the size of the input. For the case of DFAs these are the follow-ing:
Constant size components: Given a DFA đŽ, the following quantities arefixed independent of the input size:
âą The number of states đ¶ in đŽ.
âą The transition function đ (which has 2đ¶ inputs, and so can be speci-fied by a table of 2đ¶ rows, each entry in which is a number in [đ¶]).
226 introduction to theoretical computer science
âą The set đź â [đ¶] of accepting states. There are at most 2đ¶ such states,each of which can be described by a string in 0, 1đ¶ specifiyingwhich state is in đź and which isnât
Together the above means that we can fully describe an automatonusing finitely many symbols. This is a property we require out of anynotion of âalgorithmâ: we should be able to write down a completelyspecification of how it produces an output from an input.
Components of unbounded size: The following quantities relating to aDFA are not bounded by any constant. We stress that these are stillfinite for any given input.
âą The length of the input đ„ â 0, 1â that the DFA is provided with. Itis always finite, but not bounded.
âą The number of steps that the DFA takes can grow with the length ofthe input. Indeed, a DFA makes a single pass on the input and so ittakes exactly |đ„| steps on an input đ„ â 0, 1â.
Figure 6.5: Execution of the DFA of Fig. 6.4. The num-ber of states and the size of the transition functionare bounded, but the input can be arbitrarily long. Ifthe DFA is at state đ and observes the value đ then itmoves to the state đ (đ , đ). At the end of the executionthe DFA accepts iff the final state is in đź.
6.2.2 DFA-computable functionsWe say that a function đč ⶠ0, 1â â 0, 1 is DFA computable if thereexists some DFA that computes đč . In Chapter 4 we saw that everyfinite function is computable by some Boolean circuit. Thus, at thispoint you might expect that every infinite function is computable bysome DFA. However, this is very much not the case. We will soon seesome simple examples of infinite functions that are not computable byDFAs, but for starters, letâs prove that such functions exist.
functions with infinite domains, automata, and regular expressions 227
Theorem 6.4 â DFA-computable functions are countable. Let DFACOMP bethe set of all Boolean functions đč ⶠ0, 1â â 0, 1 such that thereexists a DFA computing đč . Then DFACOMP is countable.
Proof Idea:
Every DFA can be described by a finite length string, which yieldsan onto map from 0, 1â to DFACOMP: namely the function thatmaps a string describing an automaton đŽ to the function that it com-putes.âProof of Theorem 6.4. Every DFA can be described by a finite string,representing the transition function đ and the set of accepting states,and every DFA đŽ computes some function đč ⶠ0, 1â â 0, 1. Thuswe can define the following function đđĄđ·đ¶ ⶠ0, 1â â DFACOMP:
đđĄđ·đ¶(đ) = â§âšâ©đč đ represents automaton đŽ and đč is the function đŽ computesONE otherwise
(6.8)where ONE ⶠ0, 1â â 0, 1 is the constant function that outputs1 on all inputs (and is a member of DFACOMP). Since by definition,every function đč in DFACOMP is computable by some automaton,đđĄđ·đ¶ is an onto function from 0, 1â to DFACOMP, which meansthat DFACOMP is countable (see Section 2.4.2).
Since the set of all Boolean functions is uncountable, we get thefollowing corollary:
Theorem 6.5 â Existence of DFA-uncomputable functions. There exists aBoolean function đč ⶠ0, 1â â 0, 1 that is not computable by anyDFA.
Proof. If every Boolean function đč is computable by some DFA thenDFACOMP equals the set ALL of all Boolean functions, but by Theo-rem 2.12, the latter set is uncountable, contradicting Theorem 6.4.
6.3 REGULAR EXPRESSIONS
Searching for a piece of text is a common task in computing. At itsheart, the search problem is quite simple. We have a collection đ =đ„0, ⊠, đ„đ of strings (e.g., files on a hard-drive, or student records ina database), and the user wants to find out the subset of all the đ„ â đ
228 introduction to theoretical computer science
that are matched by some pattern (e.g., all files whose names end withthe string .txt). In full generality, we can allow the user to specify thepattern by specifying a (computable) function đč ⶠ0, 1â â 0, 1,where đč(đ„) = 1 corresponds to the pattern matching đ„. That is, theuser provides a program đ in some Turing-complete programminglanguage such as Python, and the system will return all the đ„ â đsuch that đ(đ„) = 1. For example, one could search for all text filesthat contain the string important document or perhaps (letting đcorrespond to a neural-network based classifier) all images that con-tain a cat. However, we donât want our system to get into an infiniteloop just trying to evaluate the program đ ! For this reason, typicalsystems for searching files or databases do not allow users to specifythe patterns using full-fledged programming languages. Rather, suchsystems use restricted computational models that on the one hand arerich enough to capture many of the queries needed in practice (e.g., allfilenames ending with .txt, or all phone numbers of the form (617)xxx-xxxx), but on the other hand are restricted enough so the queriescan be evaluated very efficiently on huge files and in particular cannotresult in an infinite loop.
One of the most popular such computational models is regularexpressions. If you ever used an advanced text editor, a command lineshell, or have done any kind of manipulation of text files, then youhave probably come across regular expressions.
A regular expression over some alphabet ÎŁ is obtained by combin-ing elements of ÎŁ with the operation of concatenation, as well as |(corresponding to or) and â (corresponding to repetition zero ormore times). (Common implementations of regular expressions inprogramming languages and shells typically include some extra oper-ations on top of | and â, but these operations can be implemented asâsyntactic sugarâ using the operators | and â.) For example, the fol-lowing regular expression over the alphabet 0, 1 corresponds to theset of all strings đ„ â 0, 1â where every digit is repeated at least twice:(00(0â)|11(1â))â . (6.9)
The following regular expression over the alphabet đ, ⊠, đ§, 0, ⊠, 9corresponds to the set of all strings that consist of a sequence of oneor more of the letters đ-đ followed by a sequence of one or more digits(without a leading zero):
(đ|đ|đ|đ)(đ|đ|đ|đ)â(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)â . (6.10)
Formally, regular expressions are defined by the following recursivedefinition:
functions with infinite domains, automata, and regular expressions 229
Definition 6.6 â Regular expression. A regular expression đ over an al-phabet ÎŁ is a string over ÎŁ âȘ (, ), |, â,â , "" that has one of thefollowing forms:
1. đ = đ where đ â ÎŁ2. đ = (đâČ|đâł) where đâČ, đâł are regular expressions.
3. đ = (đâČ)(đâł) where đâČ, đâł are regular expressions. (We oftendrop the parentheses when there is no danger of confusion andso write this as đâČ đâł.)
4. đ = (đâČ)â where đâČ is a regular expression.
Finally we also allow the following âedge casesâ: đ = â andđ = "". These are the regular expressions corresponding to accept-ing no strings, and accepting only the empty string respectively.
We will drop parenthesis when they can be inferred from thecontext. We also use the convention that OR and concatenation areleft-associative, and we give highest precedence to â, then concate-nation, and then OR. Thus for example we write 00â|11 instead of((0)(0â))|((1)(1)).
Every regular expression đ corresponds to a function Ίđ ⶠΣâ â0, 1 where Ίđ(đ„) = 1 if đ„ matches the regular expression. For exam-ple, if đ = (00|11)â then Ίđ(110011) = 1 but Ίđ(101) = 0 (can you seewhy?).
PThe formal definition of Ίđ is one of those definitionsthat is more cumbersome to write than to grasp. Thusit might be easier for you to first work it out on yourown and then check that your definition matches whatis written below.
Definition 6.7 â Matching a regular expression. Let đ be a regular expres-sion over the alphabet ÎŁ. The function Ίđ ⶠΣâ â 0, 1 is definedas follows:
1. If đ = đ then Ίđ(đ„) = 1 iff đ„ = đ.2. If đ = (đâČ|đâł) then Ίđ(đ„) = ΊđâČ(đ„)âšÎŠđâł(đ„) where âš is the OR op-
erator.
230 introduction to theoretical computer science
3. If đ = (đâČ)(đâł) then Ίđ(đ„) = 1 iff there is some đ„âČ, đ„âł â ÎŁâ suchthat đ„ is the concatenation of đ„âČ and đ„âł and ΊđâČ(đ„âČ) = Ίđâł(đ„âł) =1.
4. If đ = (đâČ)â then Ίđ(đ„) = 1 iff there is some đ â â and someđ„0, ⊠, đ„đâ1 â ÎŁâ such that đ„ is the concatenation đ„0 ⯠đ„đâ1 andΊđâČ(đ„đ) = 1 for every đ â [đ].5. Finally, for the edge cases Ίâ is the constant zero function, andΊ"" is the function that only outputs 1 on the empty string "".
We say that a regular expression đ over ÎŁ matches a string đ„ â ÎŁâif Ίđ(đ„) = 1.
PThe definitions above are not inherently difficult, butare a bit cumbersome. So you should pause here andgo over it again until you understand why it corre-sponds to our intuitive notion of regular expressions.This is important not just for understanding regularexpressions themselves (which are used time andagain in a great many applications) but also for get-ting better at understanding recursive definitions ingeneral.
A Boolean function is called âregularâ if it outputs 1 on preciselythe set of strings that are matched by some regular expression. That is,
Definition 6.8 â Regular functions / languages. Let ÎŁ be a finite set andđč ⶠΣâ â 0, 1 be a Boolean function. We say that đč is regular ifđč = Ίđ for some regular expression đ.Similarly, for every formal language đż â ÎŁâ, we say that đż is reg-
ular if and only if there is a regular expression đ such that đ„ â đż iffđ matches đ„. Example 6.9 â A regular function. Let ÎŁ = đ, đ, đ, đ, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9and đč ⶠΣâ â 0, 1 be the function such that đč(đ„) outputs 1 iffđ„ consists of one or more of the letters đ-đ followed by a sequenceof one or more digits (without a leading zero). Then đč is a regularfunction, since đč = Ίđ wheređ = (đ|đ|đ|đ)(đ|đ|đ|đ)â(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)â
(6.11)is the expression we saw in (6.10).
functions with infinite domains, automata, and regular expressions 231
If we wanted to verify, for example, that Ίđ(đđđ12078) = 1,we can do so by noticing that the expression (đ|đ|đ|đ) matchesthe string đ, (đ|đ|đ|đ)â matches đđ, (0|1|2|3|4|5|6|7|8|9) matches thestring 1, and the expression (0|1|2|3|4|5|6|7|8|9)â matches the string2078. Each one of those boils down to a simpler expression. For ex-ample, the expression (đ|đ|đ|đ)â matches the string đđ because bothof the one-character strings đ and đ are matched by the expressionđ|đ|đ|đ.Regular expression can be defined over any finite alphabet ÎŁ, but
as usual, we will mostly focus our attention on the binary case, whereÎŁ = 0, 1. Most (if not all) of the theoretical and practical generalinsights about regular expressions can be gleaned from studying thebinary case.
6.3.1 Algorithms for matching regular expressionsRegular expressions would not be very useful for search if we couldnot actually evaulate, given a regular expression đ, whether a string đ„is matched by đ. Luckily, there is an algorithm to do so. Specifically,there is an algorithm (think âPython programâ though later we willformalize the notion of algorithms using Turing machines) that oninput a regular expression đ over the alphabet 0, 1 and a string đ„ â0, 1â, outputs 1 iff đ matches đ„ (i.e., outputs Ίđ(đ„)).
Indeed, Definition 6.7 actually specifies a recursive algorithm forcomputing Ίđ. Specifically, each one of our operations -concatenation,OR, and star- can be thought of as reducing the task of testing whetheran expression đ matches a string đ„ to testing whether some sub-expressions of đ match substrings of đ„. Since these sub-expressionsare always shorter than the original expression, this yields a recursivealgorithm for checking if đ matches đ„ which will eventually terminateat the base cases of the expressions that correspond to a single symbolor the empty string.
232 introduction to theoretical computer science
Algorithm 6.10 â Regular expression matching.
Input: Regular expression đ over ÎŁâ, đ„ â ÎŁâOutput: Ίđ(đ„)1: procedure Match(đ,đ„)2: if đ = â then return 0 ;3: if đ„ = "" then return MatchEmpty(()đ) ;4: if đ â ÎŁ then return 1 iff đ„ = đ ;5: if đ = (đâČ|đâł) then return Match(đâČ, đ„) or Match(đâł, đ„)
;6: if đ = (đâČ)(đâł) then7: for đ â [|đ„| + 1] do8: if Match(đâČ, đ„0 ⯠đ„đâ1) and Match(đâł, đ„đ ⯠đ„|đ„|â1)
then return 1 ;9: end for
10: end if11: if đ = (đâČ)â then12: if đâČ = "" then return Match("", đ„) ;13: # ("")â is the same as ""14: for đ â [|đ„|] do15: # đ„0 ⯠đ„đâ1 is shorter than đ„16: if Match(đ, đ„0 ⯠đ„đâ1) and Match(đâČ, đ„đ ⯠đ„|đ„|â1)
then return 1 ;17: end for18: end if19: return 020: end procedure
We assume above that we have a procedure MatchEmpty thaton input a regular expression đ outputs 1 if and only if đ matches theempty string "".
The key observation is that in our recursive definition of regular ex-pressions, whenever đ is made up of one or two expressions đâČ, đâł thenthese two regular expressions are smaller than đ. Eventually (whenthey have size 1) then they must correspond to the non-recursivecase of a single alphabet symbol. Correspondingly, the recursive callsmade in Algorithm 6.10 always correspond to a shorter expression or(in the case of an expression of the form (đâČ)â) a shorter input string.Thus, we can prove the correctness of Algorithm 6.10 on inputs ofthe form (đ, đ„) by induction over min|đ|, |đ„|. The base case is wheneither đ„ = "" or đ is a single alphabet symbol, "" or â . In the casethe expression is of the forrm đ = (đâČ|đâł) or đ = (đâČ)(đâł) then wemake recursive calls with the shorter expressions đâČ, đâł. In the casethe expression is of the form đ = (đâČ)â we make recursive calls with
functions with infinite domains, automata, and regular expressions 233
either a shorter string đ„ and the same expression, or with the shorterexpression đâČ and a string đ„âČ that is equal in length or shorter than đ„.Solved Exercise 6.3 â Match the empty string. Give an algorithm that oninput a regular expression đ, outputs 1 if and only if Ίđ("") = 1.
Solution:
We can obtain such a recursive algorithm by using the followingobservations:
1. An expression of the form "" or (đâČ)â always matches the emptystring.
2. An expression of the form đ, where đ â ÎŁ is an alphabet sym-bol, never matches the empty string.
3. The regular expression â does not match the empty string.
4. An expression of the form đâČ|đâł matches the empty string if andonly if one of đâČ or đâł matches it.
5. An expression of the form (đâČ)(đâł) matches the empty string ifand only if both đâČ and đâł match it.
Given the above observations, we see that the following algo-rithm will check if đ matches the empty string:
procedureMatchEmptyđ lIf đ = â return 0 lendif lIfđ = "" return 1 lendif lIf đ = â or đ â ÎŁ return 0 lendif lIfđ = (đâČ|đâł) return đđđĄđâđžđđđĄđŠ(đâČ) or đđđĄđâđžđđđĄđŠ(đâł) lendifLIf đ = (đâČ)(đâČ) return đđđĄđâđžđđđĄđŠ(đâČ) or đđđĄđâđžđđđĄđŠ(đâł)lendif lIf đ = (đâČ)â return 1 lendif endprocedure
6.4 EFFICIENT MATCHING OF REGULAR EXPRESSIONS (OP-TIONAL)
Algorithm 6.10 is not very efficient. For example, given an expressioninvolving concatenation or the âstarâ operation and a string of lengthđ, it can make đ recursive calls, and hence it can be shown that in theworst case Algorithm 6.10 can take time exponential in the length ofthe input string đ„. Fortunately, it turns out that there is a much moreefficient algorithm that can match regular expressions in linear (i.e.,đ(đ)) time. Since we have not yet covered the topics of time and spacecomplexity, we describe this algorithm in high level terms, withoutmaking the computational model precise, using the colloquial no-tion of đ(đ) running time as is used in introduction to programming
234 introduction to theoretical computer science
courses and whiteboard coding interviews. We will see a formal defi-nition of time complexity in Chapter 13.
Theorem 6.11 â Matching regular expressions in linear time. Let đ be aregular expression. Then there is an đ(đ) time algorithm thatcomputes Ίđ.The implicit constant in the đ(đ) term of Theorem 6.11 depends
on the expression đ. Thus, another way to state Theorem 6.11 is thatfor every expression đ, there is some constant đ and an algorithm đŽthat computes Ίđ on đ-bit inputs using at most đ â đ steps. This makessense, since in practice we often want to compute Ίđ(đ„) for a smallregular expression đ and a large document đ„. Theorem 6.11 tells usthat we can do so with running time that scales linearly with the sizeof the document, even if it has (potentially) worse dependence on thesize of the regular expression.
We prove Theorem 6.11 by obtaining more efficient recursive al-gorithm, that determines whether đ matches a string đ„ â 0, 1đ byreducing this task to determining whether a related expression đâČmatches đ„0, ⊠, đ„đâ2. This will result in an expression for the runningtime of the form đ (đ) = đ (đ â 1) + đ(1) which solves to đ (đ) = đ(đ).
Restrictions of regular expressions. The central definition for the algo-rithm behind Theorem 6.11 is the notion of a restriction of a regularexpression. The idea is that for every regular expression đ and symbolđ in its alphabet, it is possible to define a regular expression đ[đ] suchthat đ[đ] matches a string đ„ if and only if đ matches the string đ„đ. Forexample, if đ is the regular expression 01|(01) â (01) (i.e., one or moreoccurrences of 01) then đ[1] is equal to 0|(01) â 0 and đ[0] will be â .(Can you see why?)
Algorithm 6.12 computes the resriction đ[đ] given a regular ex-pression đ and an alphabet symbol đ. It always terminates, since therecursive calls it makes are always on expressions smaller than theinput expression. Its correctness can be proven by induction on thelength of the regular expression đ, with the base cases being when đ is"", â , or a single alphabet symbol đ .
functions with infinite domains, automata, and regular expressions 235
Algorithm 6.12 â Restricting regular expression.
Input: Regular expression đ over ÎŁ, symbol đ â ÎŁOutput: Regular expression đâČ = đ[đ] such that ΊđâČ(đ„) =Ίđ(đ„đ) for every đ„ â ÎŁâ1: procedure Restrict(đ,đ)2: if đ = "" or đ = â then return â ;3: if đ = đ for đ â ÎŁ then return "" if đ = đ and returnâ otherwise ;4: if đ = (đâČ|đâł) then return (Restrict(đâČ, đ)|Restrict(đâł, đ))
;5: if đ = (đâČ)â then return (đâČ)â(Restrict(đâČ, đ)) ;6: if đ = (đâČ)(đâł) and Ίđâł("") = 0 then return(đâČ)(Restrict(đâł, đ)) ;7: if đ = (đâČ)(đâł) and Ίđâł("") = 1 then return(đâČ)(Restrict(đâł, đ) | Restrict(đâČ, đ)) ;8: end procedure
Using this notion of restriction, we can define the following recur-sive algorithm for regular expression matching:
Algorithm 6.13 â Regular expression matching in linear time.
Input: Regular expression đ over ÎŁâ, đ„ â ÎŁđ where đ â âOutput: Ίđ(đ„)1: procedure FMatch(đ,đ„)2: if đ„ = "" then return MatchEmpty(()đ) ;3: Let đâČ â Restrict(đ, đ„đâ2)4: return FMatch(đâČ, đ„0 ⯠đ„đâ1)5: end procedure
By the definition of a restriction, for every đ â ÎŁ and đ„âČ â ÎŁâ,the expression đ matches đ„âČđ if and only if đ[đ] matches đ„âČ. Hence forevery đ and đ„ â ÎŁđ, Ίđ[đ„đâ1](đ„0 ⯠đ„đâ2) = Ίđ(đ„) and Algorithm 6.13does return the correct answer. The only remaining task is to analyzeits running time. Note that Algorithm 6.13 uses the MatchEmptyprocedure of Solved Exercise 6.3 in the base case that đ„ = "". However,this is OK since the running time of this procedure depends only on đand is independent of the length of the original input.
For simplicity, let us restrict our attention to the case that the al-phabet ÎŁ is equal to 0, 1. Define đ¶(â) to be the maximum numberof operations that Algorithm 6.12 takes when given as input a regularexpression đ over 0, 1 of at most â symbols. The value đ¶(â) can beshown to be polynomial in â, though this is not important for this the-orem, since we only care about the dependence of the time to compute
236 introduction to theoretical computer science
Ίđ(đ„) on the length of đ„ and not about the dependence of this time onthe length of đ.
Algorithm 6.13 is a recursive algorithm that input an expressionđ and a string đ„ â 0, 1đ, does computation of at most đ¶(|đ|) stepsand then calls itself with input some expression đâČ and a string đ„âČ oflength đ â 1. It will terminate after đ steps when it reaches a string oflength 0. So, the running time đ (đ, đ) that it takes for Algorithm 6.13to compute Ίđ for inputs of length đ satisfies the recursive equation:
đ (đ, đ) = maxđ (đ[0], đ â 1), đ (đ[1], đ â 1) + đ¶(|đ|) (6.12)
(In the base case đ = 0, đ (đ, 0) is equal to some constant dependingonly on đ.) To get some intuition for the expression Eq. (6.12), let usopen up the recursion for one level, writing đ (đ, đ) asđ (đ, đ) = maxđ (đ[0][0], đ â 2) + đ¶(|đ[0]|),đ (đ[0][1], đ â 2) + đ¶(|đ[0]|),đ (đ[1][0], đ â 2) + đ¶(|đ[1]|),đ (đ[1][1], đ â 2) + đ¶(|đ[1]|) + đ¶(|đ|) . (6.13)
Continuing this way, we can see that đ (đ, đ) †đ â đ¶(đż) + đ(1)where đż is the largest length of any expression đâČ that we encounteralong the way. Therefore, the following claim suffices to show thatAlgorithm 6.13 runs in đ(đ) time:
Claim: Let đ be a regular expression over 0, 1, then there is a num-ber đż(đ) â â, such that for every sequence of symbols đŒ0, ⊠, đŒđâ1, ifwe define đâČ = đ[đŒ0][đŒ1] ⯠[đŒđâ1] (i.e., restricting đ to đŒ0, and then đŒ1and so on and so forth), then |đâČ| †đż(đ).
Proof of claim: For a regular expression đ over 0, 1 and đŒ â 0, 1đ,we denote by đ[đŒ] the expression đ[đŒ0][đŒ1] ⯠[đŒđâ1] obtained by restrict-ing đ to đŒ0 and then to đŒ1 and so on. We let đ(đ) = đ[đŒ]|đŒ â 0, 1â.We will prove the claim by showing that for every đ, the set đ(đ) is fi-nite, and hence so is the number đż(đ) which is the maximum length ofđâČ for đâČ â đ(đ).We prove this by induction on the structure of đ. If đ is a symbol, theempty string, or the empty set, then this is straightforward to showas the most expressions đ(đ) can contain are the expression itself, "",and â . Otherwise we split to the two cases (i) đ = đâČâ and (ii) đ =đâČđâł, where đâČ, đâł are smaller expressions (and hence by the inductionhypothesis đ(đâČ) and đ(đâł) are finite). In the case (i), if đ = (đâČ)â thenđ[đŒ] is either equal to (đâČ)âđâČ[đŒ] or it is simply the empty set if đâČ[đŒ] = â .Since đâČ[đŒ] is in the set đ(đâČ), the number of distinct expressions inđ(đ) is at most |đ(đâČ)| + 1. In the case (ii), if đ = đâČđâł then all therestrictions of đ to strings đŒ will either have the form đâČđâł[đŒ] or the formđâČđâł[đŒ]|đâČ[đŒâČ] where đŒâČ is some string such that đŒ = đŒâČđŒâł and đâł[đŒâł]
functions with infinite domains, automata, and regular expressions 237
matches the empty string. Since đâł[đŒ] â đ(đâł) and đâČ[đŒâČ] â đ(đâČ), thenumber of the possible distinct expressions of the form đ[đŒ] is at most|đ(đâł)| + |đ(đâł)| â |đ(đâČ)|. This completes the proof of the claim.
The bottom line is that while running Algorithm 6.13 on a regularexpression đ, all the expressions we ever encounter are in the finite setđ(đ), no matter how large the input đ„ is, and so the running time ofAlgorithm 6.13 satisfies the equation đ (đ) = đ (đ â 1) + đ¶âČ for someconstant đ¶âČ depending on đ. This solves to đ(đ) where the implicitconstant in the O notation can (and will) depend on đ but crucially,not on the length of the input đ„.6.4.1 Matching regular expressions using DFAsTheorem 6.11 is already quite impressive, but we can do even better.Specifically, no matter how long the string đ„ is, we can compute Ίđ(đ„)by maintaining only a constant amount of memory and moreovermaking a single pass over đ„. That is, the algorithm will scan the inputđ„ once from start to finish, and then determine whether or not đ„ ismatched by the expression đ. This is important in the common caseof trying to match a short regular expression over a huge file or docu-ment that might not even fit in our computerâs memory. Of course, aswe have seen before, a single-pass constant-memory algorithm is sim-ply a deterministic finite automaton. As we will see in Theorem 6.16, afunction is can be computed by regular expression if and only if it canbe computed by a DFA. We start with showing the âonly ifâ direction:
Theorem 6.14 â DFA for regular expression matching. Let đ be a regularexpression. Then there is an algorithm that on input đ„ â 0, 1âcomputes Ίđ(đ„) while making a single pass over đ„ and maintaininga constant amount of memory.
Proof Idea:
The single-pass constant-memory for checking if a string matchesa regular expression is presented in Algorithm 6.15. The idea is toreplace the recursive algorithm of Algorithm 6.13 with a dynamic pro-gram, using the technique of memoization. If you havenât taken yet analgorithms course, you might not know these techniques. This is OK;while this more efficient algorithm is crucial for the many practicalapplications of regular expressions, it is not of great importance forthis book.â
238 introduction to theoretical computer science
Algorithm 6.15 â Regular expression matching by a DFA.
Input: Regular expression đ over ÎŁâ, đ„ â ÎŁđ where đ â âOutput: Ίđ(đ„)1: procedure DFAMatch(đ,đ„)2: Let đ â đ(đ) be the set đ[đŒ]|đŒ â 0, 1â as defined
in the proof of [reglintimethm]().ref.3: for đâČ â đ do4: Let đŁđâČ â 1 if ΊđâČ("") = 1 and đŁđâČ â 0 otherwise5: end for6: for đ â [đ] do7: Let đđđ đĄđâČ â đŁđâČ for all đâČ â đ8: Let đŁđâČ â đđđ đĄđâČ[đ„đ] for all đâČ â đ9: end for
10: return đŁđ11: end procedure
Proof of Theorem 6.14. Algorithm 6.15 checks if a given string đ„ â ÎŁâis matched by the regular expression đ. For every regular expres-sion đ, this algorithm has a constant number 2|đ(đ)| Boolean vari-ables (đŁđâČ , đđđ đĄđâČ for đâČ â đ(đ)), and it makes a single pass overthe input string. Hence it corresponds to a DFA. We prove its cor-rectness by induction on the length đ of the input. Specifically, wewill argue that before reading the đ-th bit of đ„, the variable đŁđâČ isequal to ΊđâČ(đ„0 ⯠đ„đâ1) for every đâČ â đ(đ). In the case đ = 0 thisholds since we initialize đŁđâČ = ΊđâČ("") for all đâČ â đ(đ). For đ > 0this holds by induction since the inductive hypothesis implies thatđđđ đĄâČđ = ΊđâČ(đ„0 ⯠đ„đâ2) for all đâČ â đ(đ) and by the definition of the setđ(đâČ), for every đâČ â đ(đ) and đ„đâ1 â ÎŁ, đâł = đâČ[đ„đâ1] is in đ(đ) andΊđâČ(đ„0 ⯠đ„đâ1) = Ίđâł(đ„0 ⯠đ„đ).
6.4.2 Equivalence of regular expressions and automataRecall that a Boolean function đč ⶠ0, 1â â 0, 1 is defined to beregular if it is equal to Ίđ for some regular expression đ. (Equivalently,a language đż â 0, 1â is defined to be regular if there is a regularexpression đ such that đ matches đ„ iff đ„ â đż.) The following theorem isthe central result of automata theory:
Theorem 6.16 â DFA and regular expression equivalency. Let đč ⶠ0, 1â â0, 1. Then đč is regular if and only if there exists a DFA (đ , đź) thatcomputes đč .
Proof Idea:
functions with infinite domains, automata, and regular expressions 239
Figure 6.6: A deterministic finite automaton thatcomputes the function Ί(01)â .
Figure 6.7: Given a DFA of đ¶ states, for every đŁ, đ€ â[đ¶] and number đĄ â 0, ⊠, đ¶ we define the functionđč đĄđŁ,đ€ ⶠ0, 1â â 0, 1 to output one on inputđ„ â 0, 1â if and only if when the DFA is initializedin the state đŁ and is given the input đ„, it will reach thestate đ€ while going only through the intermediatestates 0, ⊠, đĄ â 1.
One direction follows from Theorem 6.14, which shows that forevery regular expression đ, the function Ίđ can be computed by a DFA(see for example Fig. 6.6). For the other direction, we show that givena DFA (đ , đź) for every đŁ, đ€ â [đ¶] we can find a regular expression thatwould match đ„ â 0, 1â if and only if the DFA starting in state đŁ, willend up in state đ€ after reading đ„.âProof of Theorem 6.16. Since Theorem 6.14 proves the âonly ifâ direc-tion, we only need to show the âifâ direction. Let đŽ = (đ , đź) be a DFAwith đ¶ states that computes the function đč . We need to show that đč isregular.
For every đŁ, đ€ â [đ¶], we let đčđŁ,đ€ ⶠ0, 1â â 0, 1 be the functionthat maps đ„ â 0, 1â to 1 if and only if the DFA đŽ, starting at thestate đŁ, will reach the state đ€ if it reads the input đ„. We will prove thatđčđŁ,đ€ is regular for every đŁ, đ€. This will prove the theorem, since byDefinition 6.2, đč(đ„) is equal to the OR of đč0,đ€(đ„) for every đ€ â đź.Hence if we have a regular expression for every function of the formđčđŁ,đ€ then (using the | operation) we can obtain a regular expressionfor đč as well.
To give regular expressions for the functions đčđŁ,đ€, we start bydefining the following functions đč đĄđŁ,đ€: for every đŁ, đ€ â [đ¶] and0 †đĄ †đ¶, đč đĄđŁ,đ€(đ„) = 1 if and only if starting from đŁ and observ-ing đ„, the automata reaches đ€ with all intermediate states being in the set[đĄ] = 0, ⊠, đĄ â 1 (see Fig. 6.7). That is, while đŁ, đ€ themselves mightbe outside [đĄ], đč đĄđŁ,đ€(đ„) = 1 if and only if throughout the execution ofthe automaton on the input đ„ (when initiated at đŁ) it never enters anyof the states outside [đĄ] and still ends up at đ€. If đĄ = 0 then [đĄ] is theempty set, and hence đč 0đŁ,đ€(đ„) = 1 if and only if the automaton reachesđ€ from đŁ directly on đ„, without any intermediate state. If đĄ = đ¶ thenall states are in [đĄ], and hence đč đĄđŁ,đ€ = đčđŁ,đ€.
We will prove the theorem by induction on đĄ, showing that đč đĄđŁ,đ€ isregular for every đŁ, đ€ and đĄ. For the base case of đĄ = 0, đč 0đŁ,đ€ is regularfor every đŁ, đ€ since it can be described as one of the expressions "", â ,0, 1 or 0|1. Specifically, if đŁ = đ€ then đč 0đŁ,đ€(đ„) = 1 if and only if đ„ isthe empty string. If đŁ â đ€ then đč 0đŁ,đ€(đ„) = 1 if and only if đ„ consistsof a single symbol đ â 0, 1 and đ (đŁ, đ) = đ€. Therefore in this caseđč 0đŁ,đ€ corresponds to one of the four regular expressions 0|1, 0, 1 or â ,depending on whether đŽ transitions to đ€ from đŁ when it reads either 0or 1, only one of these symbols, or neither.
Inductive step: Now that weâve seen the base case, letâs prove thegeneral case by induction. Assume, via the induction hypothesis, thatfor every đŁâČ, đ€âČ â [đ¶], we have a regular expression đ đĄđŁ,đ€ that computesđč đĄđŁâČ,đ€âČ . We need to prove that đč đĄ+1đŁ,đ€ is regular for every đŁ, đ€. If the
240 introduction to theoretical computer science
automaton arrives from đŁ to đ€ using the intermediate states [đĄ + 1],then it visits the đĄ-th state zero or more times. If the path labeled by đ„causes the automaton to get from đŁ to đ€ without visiting the đĄ-th stateat all, then đ„ is matched by the regular expression đ đĄđŁ,đ€. If the pathlabeled by đ„ causes the automaton to get from đŁ to đ€ while visiting theđĄ-th state đ > 0 times then we can think of this path as:âą First travel from đŁ to đĄ using only intermediate states in [đĄ â 1].âą Then go from đĄ back to itself đ â 1 times using only intermediate
states in [đĄ â 1]âą Then go from đĄ to đ€ using only intermediate states in [đĄ â 1].
Therefore in this case the string đ„ is matched by the regular expres-sion đ đĄđŁ,đĄ(đ đĄđĄ,đĄ)âđ đĄđĄ,đ€. (See also Fig. 6.8.)
Therefore we can compute đč đĄ+1đŁ,đ€ using the regular expressionđ đĄđŁ,đ€ | đ đĄđŁ,đĄ(đ đĄđĄ,đĄ)âđ đĄđĄ,đ€ . (6.14)This completes the proof of the inductive step and hence of the theo-rem.
Figure 6.8: If we have regular expressions đ đĄđŁâČ,đ€âČcorresponding to đč đĄđŁâČ,đ€âČ for every đŁâČ, đ€âČ â [đ¶], we canobtain a regular expression đ đĄ+1đŁ,đ€ corresponding tođč đĄ+1đŁ,đ€ . The key observation is that a path from đŁ to đ€using 0, ⊠, đĄ either does not touch đĄ at all, in whichcase it is captured by the expression đ đĄđŁ,đ€, or it goesfrom đŁ to đĄ, comes back to đĄ zero or more times, andthen goes from đĄ to đ€, in which case it is captured bythe expression đ đĄđŁ,đĄ(đ đĄđĄ,đĄ)âđ đĄđĄ,đ€.
6.4.3 Closure properties of regular expressionsIf đč and đș are regular functions computed by the expressions đ and đrespectively, then the expression đ|đ computes the function đ» = đč âš đșdefined as đ»(đ„) = đč(đ„) âš đș(đ„). Another way to say this is that the setof regular functions is closed under the OR operation. That is, if đč and đșare regular then so is đč âš đș. An important corollary of Theorem 6.16is that this set is also closed under the NOT operation:
functions with infinite domains, automata, and regular expressions 241
Lemma 6.17 â Regular expressions closed under complement. If đč ⶠ0, 1â â0, 1 is regular then so is the function đč , where đč(đ„) = 1 â đč(đ„) forevery đ„ â 0, 1â.Proof. If đč is regular then by Theorem 6.11 it can be computed by aDFA đŽ = (đ , đ) with some đ¶ states. But then the DFA đŽ = (đ , [đ¶]⧔đ)which does the same computation but where flips the set of acceptedstates will compute đč . By Theorem 6.16 this implies that đč is regularas well.
Since đ ⧠đ = đ âš đ, Lemma 6.17 implies that the set of regularfunctions is closed under the AND operation as well. Moreover, sinceOR, NOT and AND are a universal basis, this set is also closed un-der NAND, XOR, and any other finite function. That is, we have thefollowing corollary:
Theorem 6.18 â Closure of regular expressions. Let đ ⶠ0, 1đ â 0, 1 beany finite Boolean function, and let đč0, ⊠, đčđâ1 ⶠ0, 1â â 0, 1 beregular functions. Then the function đș(đ„) = đ(đč0(đ„), đč1(đ„), ⊠, đčđâ1(đ„))is regular.
Proof. This is a direct consequence of the closure of regular functionsunder OR and NOT (and hence AND) and Theorem 4.13, that statesthat every đ can be computed by a Boolean circuit (which is simply acombination of the AND, OR, and NOT operations).
6.5 LIMITATIONS OF REGULAR EXPRESSIONS AND THE PUMPINGLEMMA
The efficiency of regular expression matching makes them very useful.This is the reason why operating systems and text editors often restricttheir search interface to regular expressions and donât allow searchingby specifying an arbitrary function. But this efficiency comes at a cost.As we have seen, regular expressions cannot compute every function.In fact there are some very simple (and useful!) functions that theycannot compute. Here is one example:Lemma 6.19 â Matching parenthesis. Let ÎŁ = âš, â© and MATCHPAREN â¶ÎŁâ â 0, 1 be the function that given a string of parentheses, out-puts 1 if and only if every opening parenthesis is matched by a corre-sponding closed one. Then there is no regular expression over ÎŁ thatcomputes MATCHPAREN.
Lemma 6.19 is a consequence of the following result, which isknown as the pumping lemma:
242 introduction to theoretical computer science
Theorem 6.20 â Pumping Lemma. Let đ be a regular expression oversome alphabet ÎŁ. Then there is some number đ0 such that for ev-ery đ€ â ÎŁâ with |đ€| > đ0 and Ίđ(đ€) = 1, we can write đ€ = đ„đŠđ§ forstrings đ„, đŠ, đ§ â ÎŁâ satisfying the following conditions:
1. |đŠ| â„ 1.2. |đ„đŠ| †đ0.3. Ίđ(đ„đŠđđ§) = 1 for every đ â â.
Figure 6.9: To prove the âpumping lemmaâ we lookat a word đ€ that is much larger than the regularexpression đ that matches it. In such a case, part ofđ€ must be matched by some sub-expression of theform (đâČ)â, since this is the only operator that allowsmatching words longer than the expression. If welook at the âleftmostâ such sub-expression and defineđŠđ to be the string that is matched by it, we obtain thepartition needed for the pumping lemma.
Proof Idea:
The idea behind the proof the following. Let đ0 be twice the num-ber of symbols that are used in the expression đ, then the only waythat there is some đ€ with |đ€| > đ0 and Ίđ(đ€) = 1 is that đ containsthe â (i.e. star) operator and that there is a nonempty substring đŠ ofđ€ that was matched by (đâČ)â for some sub-expression đâČ of đ. We cannow repeat đŠ any number of times and still get a matching string. Seealso Fig. 6.9.â
PThe pumping lemma is a bit cumbersome to state,but one way to remember it is that it simply says thefollowing: âif a string matching a regular expression islong enough, one of its substrings must be matched usingthe â operatorâ.
functions with infinite domains, automata, and regular expressions 243
Proof of Theorem 6.20. To prove the lemma formally, we use inductionon the length of the expression. Like all induction proofs, this is goingto be somewhat lengthy, but at the end of the day it directly followsthe intuition above that somewhere we must have used the star oper-ation. Reading this proof, and in particular understanding how theformal proof below corresponds to the intuitive idea above, is a verygood way to get more comfortable with inductive proofs of this form.
Our inductive hypothesis is that for an đ length expression, đ0 =2đ satisfies the conditions of the lemma. The base case is when theexpression is a single symbol đ â ÎŁ or that the expression is â or"". In all these cases the conditions of the lemma are satisfied simplybecause there đ0 = 2 and there is no string đ„ of length larger than đ0that is matched by the expression.
We now prove the inductive step. Let đ be a regular expressionwith đ > 1 symbols. We set đ0 = 2đ and let đ€ â ÎŁâ be a stringsatisfying |đ€| > đ0. Since đ has more than one symbol, it has one ofthe the forms (a) đâČ|đâł, (b), (đâČ)(đâł), or (c) (đâČ)â where in all thesecases the subexpressions đâČ and đâł have fewer symbols than đ andhence satisfy the induction hypothesis.
In the case (a), every string đ€ matched by đ must be matched byeither đâČ or đâł. If đâČ matches đ€ then, since |đ€| > 2|đâČ|, by the inductionhypothesis there exist đ„, đŠ, đ§ with |đŠ| â„ 1 and |đ„đŠ| †2|đâČ| < đ0 suchthat đâČ (and therefore also đ = đâČ|đâł) matches đ„đŠđđ§ for every đ. Thesame arguments works in the case that đâł matches đ€.
In the case (b), if đ€ is matched by (đâČ)(đâł) then we can write đ€ =đ€âČđ€âł where đâČ matches đ€âČ and đâł matches đ€âł. We split to subcases. If|đ€âČ| > 2|đâČ| then by the induction hypothesis there exist đ„, đŠ, đ§âČ with|đŠ| †1, |đ„đŠ| †2|đâČ| < đ0 such that đ€âČ = đ„đŠđ§âČ and đâČ matches đ„đŠđđ§âČfor every đ â â. This completes the proof since if we set đ§ = đ§âČđ€âłthen we see that đ€ = đ€âČđ€âł = đ„đŠđ§ and đ = (đâČ)(đâł) matches đ„đŠđđ§ forevery đ â â. Otherwise, if |đ€âČ| †2|đâČ| then since |đ€| = |đ€âČ| + |đ€âł| >đ0 = 2(|đâČ| + |đâł|), it must be that |đ€âł| > 2|đâł|. Hence by the inductionhypothesis there exist đ„âČ, đŠ, đ§ such that |đŠ| â„ 1, |đ„âČđŠ| †2|đâł| and đâłmatches đ„âČđŠđđ§ for every đ â â. But now if we set đ„ = đ€âČđ„âČ we see that|đ„đŠ| †|đ€âČ| + |đ„âČđŠ| †2|đâČ| + 2|đâł| = đ0 and on the other hand theexpression đ = (đâČ)(đâł) matches đ„đŠđđ§ = đ€âČđ„âČđŠđđ§ for every đ â â.
In case (c), if đ€ is matched by (đâČ)â then đ€ = đ€0 ⯠đ€đĄ where forevery đ â [đĄ], đ€đ is a nonempty string matched by đâČ. If |đ€0| > 2|đâČ|then we can use the same approach as in the concatenation case above.Otherwise, we simply note that if đ„ is the empty string, đŠ = đ€0, andđ§ = đ€1 ⯠đ€đĄ then |đ„đŠ| †đ0 and đ„đŠđđ§ is matched by (đâČ)â for everyđ â â.
244 introduction to theoretical computer science
RRemark 6.21 â Recursive definitions and inductiveproofs. When an object is recursively defined (as in thecase of regular expressions) then it is natural to proveproperties of such objects by induction. That is, if wewant to prove that all objects of this type have prop-erty đ , then it is natural to use an inductive steps thatsays that if đâČ, đâł, đ⎠etc have property đ then so is anobject đ that is obtained by composing them.
Using the pumping lemma, we can easily prove Lemma 6.19 (i.e.,the non-regularity of the âmatching parenthesisâ function):
Proof of Lemma 6.19. Suppose, towards the sake of contradiction, thatthere is an expression đ such that Ίđ = MATCHPAREN. Let đ0 bethe number obtained from Theorem 6.20 and let đ€ = âšđ0â©đ0 (i.e.,đ0 left parenthesis followed by đ0 right parenthesis). Then we seethat if we write đ€ = đ„đŠđ§ as in Lemma 6.19, the condition |đ„đŠ| †đ0implies that đŠ consists solely of left parenthesis. Hence the stringđ„đŠ2đ§ will contain more left parenthesis than right parenthesis. HenceMATCHPAREN(đ„đŠ2đ§) = 0 but by the pumping lemma Ίđ(đ„đŠ2đ§) = 1,contradicting our assumption that Ίđ = MATCHPAREN.
The pumping lemma is a very useful tool to show that certain func-tions are not computable by a regular expression. However, it is notan âif and only ifâ condition for regularity: there are non regularfunctions that still satisfy the conditions of the pumping lemma. Tounderstand the pumping lemma, it is important to follow the order ofquantifiers in Theorem 6.20. In particular, the number đ0 in the state-ment of Theorem 6.20 depends on the regular expression (in the proofwe chose đ0 to be twice the number of symbols in the expression). So,if we want to use the pumping lemma to rule out the existence of aregular expression đ computing some function đč , we need to be ableto choose an appropriate input đ€ â 0, 1â that can be arbitrarily largeand satisfies đč(đ€) = 1. This makes sense if you think about the intu-ition behind the pumping lemma: we need đ€ to be large enough as toforce the use of the star operator.Solved Exercise 6.4 â Palindromes is not regular. Prove that the followingfunction over the alphabet 0, 1, ; is not regular: PAL(đ€) = 1 if andonly if đ€ = đą; đąđ where đą â 0, 1â and đąđ denotes đą âreversedâ:the string đą|đą|â1 ⯠đą0. (The Palindrome function is most often definedwithout an explicit separator character ;, but the version with such aseparator is a bit cleaner and so we use it here. This does not make
functions with infinite domains, automata, and regular expressions 245
Figure 6.10: A cartoon of a proof using the pumping lemma that a function đč is not regular. The pumping lemma states that if đč is regular then thereexists a number đ0 such that for every large enough đ€ with đč(đ€) = 1, there exists a partition of đ€ to đ€ = đ„đŠđ§ satisfying certain conditions suchthat for every đ â â, đč(đ„đŠđđ§) = 1. You can imagine a pumping-lemma based proof as a game between you and the adversary. Every there existsquantifier corresponds to an object you are free to choose on your own (and base your choice on previously chosen objects). Every for every quantifiercorresponds to an object the adversary can choose arbitrarily (and again based on prior choices) as long as it satisfies the conditions. A valid proofcorresponds to a strategy by which no matter what the adversary does, you can win the game by obtaining a contradiction which would be a choiceof đ that would result in đč(đ„đŠđđ§) = 0, hence violating the conclusion of the pumping lemma.
246 introduction to theoretical computer science
much difference, as one can easily encode the separator as a specialbinary string instead.)
Solution:
We use the pumping lemma. Suppose towards the sake of con-tradiction that there is a regular expression đ computing PAL,and let đ0 be the number obtained by the pumping lemma (The-orem 6.20). Consider the string đ€ = 0đ0 ; 0đ0 . Since the reverseof the all zero string is the all zero string, PAL(đ€) = 1. Now, bythe pumping lemma, if PAL is computed by đ, then we can writeđ€ = đ„đŠđ§ such that |đ„đŠ| †đ0, |đŠ| â„ 1 and PAL(đ„đŠđđ§) = 1 forevery đ â â. In particular, it must hold that PAL(đ„đ§) = 1, but thisis a contradiction, since đ„đ§ = 0đ0â|đŠ|; 0đ0 and so its two parts arenot of the same length and in particular are not the reverse of oneanother.
For yet another example of a pumping-lemma based proof, seeFig. 6.10 which illustrates a cartoon of the proof of the non-regularityof the function đč ⶠ0, 1â â 0, 1 which is defined as đč(đ„) = 1 iffđ„ = 0đ1đ for some đ â â (i.e., đ„ consists of a string of consecutivezeroes, followed by a string of consecutive ones of the same length).
6.6 ANSWERING SEMANTIC QUESTIONS ABOUT REGULAR EX-PRESSIONS
Regular expressions have applications beyond search. For example,regular expressions are often used to define tokens (such as what is avalid variable identifier, or keyword) in the design of parsers, compilersand interpreters for programming languages. There other applicationstoo: for example, in recent years, the world of networking moved fromfixed topologies to âsoftware defined networksâ. Such networks arerouted by programmable switches that can implement policies suchas âif packet is secured by SSL then forward it to A, otherwise for-ward it to Bâ. To represent such policies we need a language that ison one hand sufficiently expressive to capture the policies we wantto implement, but on the other hand sufficiently restrictive so that wecan quickly execute them at network speed and also be able to answerquestions such as âcan C see the packets moved from A to B?â. TheNetKAT network programming language uses a variant of regularexpressions to achieve precisely that. For this application, it is im-portant that we are not able to merely answer whether an expressionđ matches a string đ„ but also answer semantic questions about regu-lar expressions such as âdo expressions đ and đâČ compute the same
functions with infinite domains, automata, and regular expressions 247
function?â and âdoes there exist a string đ„ that is matched by the ex-pression đ?â. The following theorem shows that we can answer thelatter question:
Theorem 6.22 â Emptiness of regular languages is computable. There is analgorithm that given a regular expression đ, outputs 1 if and only ifΊđ is the constant zero function.
Proof Idea:
The idea is that we can directly observe this from the structureof the expression. The only way a regular expression đ computesthe constant zero function is if đ has the form â or is obtained byconcatenating â with other expressions.âProof of Theorem 6.22. Define a regular expression to be âemptyâ if itcomputes the constant zero function. Given a regular expression đ, wecan determine if đ is empty using the following rules:
âą If đ has the form đ or "" then it is not empty.
âą If đ is not empty then đ|đâČ is not empty for every đâČ.âą If đ is not empty then đâ is not empty.
âą If đ and đâČ are both not empty then đ đâČ is not empty.
âą â is empty.
Using these rules it is straightforward to come up with a recursivealgorithm to determine emptiness.
Using Theorem 6.22, we can obtain an algorithm that determineswhether or not two regular expressions đ and đâČ are equivalent, in thesense that they compute the same function.
Theorem 6.23 â Equivalence of regular expressions is computable. LetREGEQ ⶠ0, 1â â 0, 1 be the function that on input (a stringrepresenting) a pair of regular expressions đ, đâČ, REGEQ(đ, đâČ) = 1if and only if Ίđ = ΊđâČ . Then there is an algorithm that computesREGEQ.
Proof Idea:
The idea is to show that given a pair of regular expressions đ andđâČ we can find an expression đâł such that Ίđâł(đ„) = 1 if and only ifΊđ(đ„) â ΊđâČ(đ„). Therefore Ίđâł is the constant zero function if and only
248 introduction to theoretical computer science
if đ and đâČ are equivalent, and thus we can test for emptiness of đâł todetermine equivalence of đ and đâČ.âProof of Theorem 6.23. We will prove Theorem 6.23 from Theorem 6.22.(The two theorems are in fact equivalent: it is easy to prove Theo-rem 6.22 from Theorem 6.23, since checking for emptiness is the sameas checking equivalence with the expression â .) Given two regu-lar expressions đ and đâČ, we will compute an expression đâł such thatΊđâł(đ„) = 1 if and only if Ίđ(đ„) â ΊđâČ(đ„). One can see that đ is equiva-lent to đâČ if and only if đâł is empty.
We start with the observation that for every bit đ, đ â 0, 1, đ â đ ifand only if (đ ⧠đ) âš (đ ⧠đ) . (6.15)
Hence we need to construct đâł such that for every đ„,Ίđâł(đ„) = (Ίđ(đ„) ⧠ΊđâČ(đ„)) âš (Ίđ(đ„) ⧠ΊđâČ(đ„)) . (6.16)To construct the expression đâł, we will show how given any pair of
expressions đ and đâČ, we can construct expressions đ ⧠đâČ and đ thatcompute the functions Ίđ ⧠ΊđâČ and Ίđ respectively. (Computing theexpression for đ âš đâČ is straightforward using the | operation of regularexpressions.)
Specifically, by Lemma 6.17, regular functions are closed undernegation, which means that for every regular expression đ, there is anexpression đ such that Ίđ(đ„) = 1 â Ίđ(đ„) for every đ„ â 0, 1â. Now,for every two expression đ and đâČ, the expressionđ ⧠đâČ = (đ|đâČ) (6.17)
computes the AND of the two expressions. Given these two transfor-mations, we see that for every regular expressions đ and đâČ we can finda regular expression đâł satisfying (6.16) such that đâł is empty if andonly if đ and đâČ are equivalent.
Chapter Recap
âą We model computational tasks on arbitrarily largeinputs using infinite functions đč ⶠ0, 1â â 0, 1â.
âą Such functions take an arbitrarily long (but stillfinite!) string as input, and cannot be described bya finite table of inputs and ouptuts.
âą A function with a single bit of output is known asa Boolean function, and the tasks of computing it isequivalent to deciding a language đż â 0, 1â.
functions with infinite domains, automata, and regular expressions 249
âą Deterministic finite automata (DFAs) are one simplemodel for computing (infinite) Boolean functions.
âą There are some functions that cannot be computedby DFAs.
âą The set of functions computable by DFAs is thesame as the set of languages that can be recognizedby regular expressions.
6.7 EXERCISES
Exercise 6.1 â Closure properties of regular functions. Suppose that đč, đș â¶0, 1â â 0, 1 are regular. For each one of the following defini-tions of the function đ» , either prove that đ» is always regular or give acounterexample for regular đč, đș that would make đ» not regular.
1. đ»(đ„) = đč(đ„) âš đș(đ„).2. đ»(đ„) = đč(đ„) ⧠đș(đ„)3. đ»(đ„) = NAND(đč(đ„), đș(đ„)).4. đ»(đ„) = đč(đ„đ ) where đ„đ is the reverse of đ„: đ„đ = đ„đâ1đ„đâ2 ⯠đ„đ forđ = |đ„|.5. đ»(đ„) = â§âšâ©1 đ„ = đąđŁ s.t. đč(đą) = đș(đŁ) = 10 otherwise
6. đ»(đ„) = â§âšâ©1 đ„ = đąđą s.t. đč(đą) = đș(đą) = 10 otherwise
7. đ»(đ„) = â§âšâ©1 đ„ = đąđąđ s.t. đč(đą) = đș(đą) = 10 otherwise
Exercise 6.2 One among the following two functions that map 0, 1âto 0, 1 can be computed by a regular expression, and the other onecannot. For the one that can be computed by a regular expression,write the expression that does it. For the one that cannot, prove thatthis cannot be done using the pumping lemma.
âą đč(đ„) = 1 if 4 divides â|đ„|â1đ=0 đ„đ and đč(đ„) = 0 otherwise.
âą đș(đ„) = 1 if and only if â|đ„|â1đ=0 đ„đ â„ |đ„|/4 and đș(đ„) = 0 otherwise.
Exercise 6.3 â Non regularity. 1. Prove that the following function đč â¶0, 1â â 0, 1 is not regular. For every đ„ â 0, 1â, đč(đ„) = 1 iff đ„ isof the form đ„ = 13đ for some đ > 0.
250 introduction to theoretical computer science
2. Prove that the following function đč ⶠ0, 1â â 0, 1 is not regular.For every đ„ â 0, 1â, đč(đ„) = 1 iff âđ đ„đ = 3đ for some đ > 0.
6.8 BIBLIOGRAPHICAL NOTES
The relation of regular expressions with finite automata is a beautifultopic, on which we only touch upon in this text. It is covered moreextensively in [Sip97; HMU14; Koz97]. These texts also discuss top-ics such as non deterministic finite automata (NFA) and the relationbetween context-free grammars and pushdown automata.
The automaton of Fig. 6.4 was generated using the FSM simulatorof Ivan Zuzak and Vedrana Jankovic. Our proof of Theorem 6.11 isclosely related to the Myhill-Nerode Theorem. One direction of theMyhill-Nerode theorem can be stated as saying that if đ is a regularexpression then there is at most a finite number of strings đ§0, ⊠, đ§đâ1such that Ίđ[đ§đ] â Ίđ[đ§đ] for every 0 †đ â đ < đ.
Figure 7.1: An algorithm is a finite recipe to computeon arbitrarily long inputs. The components of analgorithm include the instructions to be performed,finite state or âlocal variablesâ, the memory to storethe input and intermediate computations, as well asmechanisms to decide which part of the memory toaccess, and when to repeat instructions and when tohalt.
7Loops and infinity
âThe bounds of arithmetic were however outstepped the moment the idea ofapplying the [punched] cards had occurred; and the Analytical Engine doesnot occupy common ground with mere âcalculating machines.â ⊠In enablingmechanism to combine together general symbols, in successions of unlim-ited variety and extent, a uniting link is established between the operations ofmatter and the abstract mental processes of the most abstract branch of mathe-matical science.â, Ada Augusta, countess of Lovelace, 1843
As the quote of Chapter 6 says, an algorithm is âa finite answer toan infinite number of questionsâ. To express an algorithm we need towrite down a finite set of instructions that will enable us to computeon arbitrarily long inputs. To describe and execute an algorithm weneed the following components (see Fig. 7.1):
âą The finite set of instructions to be performed.
âą Some âlocal variablesâ or finite state used in the execution.
âą A potentially unbounded working memory to store the input aswell as any other values we may require later.
âą While the memory is unbounded, at every single step we can onlyread and write to a finite part of it, and we need a way to addresswhich are the parts we want to read from and write to.
âą If we only have a finite set of instructions but our input can bearbitrarily long, we will need to repeat instructions (i.e., loop back).We need a mechanism to decide when we will loop and when wewill halt.
This chapter: A non-mathy overviewIn this chapter we give a general model of an algorithm,which (unlike Boolean circuits) is not restricted to a fixedinput lengths, and (unlike finite automata) is not restricted to
Compiled on 8.26.2020 18:10
Learning Objectives:âą Learn the model of Turing machines, which
can compute functions of arbitrary inputlengths.
âą See a programming-language description ofTuring machines, using NAND-TM programs,which add loops and arrays to NAND-CIRC.
âą See some basic syntactic sugar andequivalence of variants of Turing machinesand NAND-TM programs.
252 introduction to theoretical computer science
a finite amount of working memory. We will see two ways tomodel algorithms:
âą Turing machines, invented by Alan Turing in 1936, are anhypothetical abstract device that yields a finite descriptionof an algorithm that can handle arbitrarily long inputs.
âą The NAND-TM Programming language extends NAND-CIRC with the notion of loops and arrays to obtain finiteprograms that can compute a function with arbitrarilylong inputs.
It turns out that these two models are equivalent, and in factthey are equivalent to many other computational modelsincluding programming languages such as C, Lisp, Python,JavaScript, etc. This notion, known as Turing equivalenceor Turing completeness, will be discussed in Chapter 8. SeeFig. 7.2 for an overview of the models presented in this chap-ter and Chapter 8.
Figure 7.2: Overview of our models for finite andunbounded computation. In the previous chapterswe study the computation of finite functions, whichare functions đ ⶠ0, 1đ â 0, 1đ for some fixedđ, đ, and modeled computing these functions usingcircuits or straightline programs. In this chapter westudy computing unbounded functions of the formđč ⶠ0, 1â â 0, 1đ or đč ⶠ0, 1â â 0, 1â.We model computing these functions using TuringMachines or (equivalently) NAND-TM programswhich add the notion of loops to the NAND-CIRCprogramming language. In Chapter 8 we will showthat these models are equivalent to many othermodels, including RAM machines, the đ calculus, andall the common programming languages including C,Python, Java, JavaScript, etc.
7.1 TURING MACHINESâComputing is normally done by writing certain symbols on paper. We maysuppose that this paper is divided into squares like a childâs arithmetic book..The behavior of the [human] computer at any moment is determined by thesymbols which he is observing, and of his âstate of mindâ at that moment⊠Wemay suppose that in a simple operation not more than one symbol is altered.â,âWe compare a man in the process of computing ⊠to a machine which is onlycapable of a finite number of configurations⊠The machine is supplied with aâtapeâ (the analogue of paper) ⊠divided into sections (called âsquaresâ) eachcapable of bearing a âsymbolâ â, Alan Turing, 1936
loops and infinity 253
Figure 7.3: Aside from his many other achievements,Alan Turing was an excellent long distance runnerwho just fell shy of making Englandâs olympic team.A fellow runner once asked him why he punishedhimself so much in training. Alan said âI have sucha stressful job that the only way I can get it out of mymind is by running hard; itâs the only way I can getsome release.â
Figure 7.4: Until the advent of electronic computers,the word âcomputerâ was used to describe a personthat performed calculations. Most of these âhumancomputersâ were women, and they were absolutelyessential to many achievements including mappingthe stars, breaking the Enigma cipher, and the NASAspace mission; see also the bibliographical notes.Photo from National Photo Company Collection; seealso [Sob17].
Figure 7.5: Steam-powered Turing Machine mural,painted by CSE grad students at the University ofWashington on the night before spring qualifyingexaminations, 1987. Image from https://www.cs.washington.edu/building/art/SPTM.
âWhat is the difference between a Turing machine and the modern computer?Itâs the same as that between Hillaryâs ascent of Everest and the establishmentof a Hilton hotel on its peak.â , Alan Perlis, 1982.
The âgranddaddyâ of all models of computation is the Turing Ma-chine. Turing machines were defined in 1936 by Alan Turing in anattempt to formally capture all the functions that can be computedby human âcomputersâ (see Fig. 7.4) that follow a well-defined set ofrules, such as the standard algorithms for addition or multiplication.
Turing thought of such a person as having access to as muchâscratch paperâ as they need. For simplicity we can think of thisscratch paper as a one dimensional piece of graph paper (or tape, asit is commonly referred to), which is divided to âcellsâ, where eachâcellâ can hold a single symbol (e.g., one digit or letter, and moregenerally some element of a finite alphabet). At any point in time, theperson can read from and write to a single cell of the paper, and basedon the contents can update his/her finite mental state, and/or move tothe cell immediately to the left or right of the current one.
Turing modeled such a computation by a âmachineâ that maintainsone of đ states. At each point in time the machine reads from its âworktapeâ a single symbol from a finite alphabet ÎŁ and uses that to up-date its state, write to tape, and possibly move to an adjacent cell (seeFig. 7.7). To compute a function đč using this machine, we initialize thetape with the input đ„ â 0, 1â and our goal is to ensure that the tapewill contain the value đč(đ„) at the end of the computation. Specifically,a computation of a Turing Machine đ with đ states and alphabet ÎŁ oninput đ„ â 0, 1â proceeds as follows:
âą Initially the machine is at state 0 (known as the âstarting stateâ)and the tape is initialized to â·, đ„0, ⊠, đ„đâ1,â ,â , âŠ. We use thesymbol â· to denote the beginning of the tape, and the symbol â todenote an empty cell. We will always assume that the alphabet ÎŁ isa (potentially strict) superset of â·,â , 0, 1.
âą The location đ to which the machine points to is set to 0.âą At each step, the machine reads the symbol đ = đ [đ] that is in theđđĄâ location of the tape, and based on this symbol and its state đ
decides on:
â What symbol đâČ to write on the tapeâ Whether to move Left (i.e., đ â đ â 1), Right (i.e., đ â đ + 1), Stay
in place, or Halt the computation.â What is going to be the new state đ â [đ]
âą The set of rules the Turing machine follows is known as its transi-tion function.
254 introduction to theoretical computer science
Figure 7.6: The components of a Turing Machine. Notehow they correspond to the general components ofalgorithms as described in Fig. 7.1.
âą When the machine halts then its output is the binary string ob-tained by reading the tape from the beginning until the head posi-tion, dropping all symbols such as â·, â , etc. that are not either 0 or1.
7.1.1 Extended example: A Turing machine for palindromesLet PAL (for palindromes) be the function that on input đ„ â 0, 1â,outputs 1 if and only if đ„ is an (even length) palindrome, in the sensethat đ„ = đ€0 ⯠đ€đâ1đ€đâ1đ€đâ2 ⯠đ€0 for some đ â â and đ€ â 0, 1đ.
We now show a Turing Machine đ that computes PAL. To specifyđ we need to specify (i) đ âs tape alphabet ÎŁ which should contain atleast the symbols 0,1, â· and â , and (ii) đ âs transition function whichdetermines what action đ takes when it reads a given symbol while itis in a particular state.
In our case, đ will use the alphabet 0, 1, â·,â , Ă and will haveđ = 14 states. Though the states are simply numbers between 0 andđ â 1, for convenience we will give them the following labels:
State Label0 START1 RIGHT_02 RIGHT_13 LOOK_FOR_04 LOOK_FOR_15 RETURN6 REJECT7 ACCEPT8 OUTPUT_09 OUTPUT_110 0_AND_BLANK11 1_AND_BLANK12 BLANK_AND_STOP
We describe the operation of our Turing Machine đ in words:
âą đ starts in state START and will go right, looking for the first sym-bol that is 0 or 1. If we find â before we hit such a symbol then wewill move to the OUTPUT_1 state that we describe below.
âą Once đ finds such a symbol đ â 0, 1, đ deletes đ from the tapeby writing the Ă symbol, it enters either the RIGHT_0 or RIGHT_1mode according to the value of đ and starts moving rightwardsuntil it hits the first â or Ă symbol.
loops and infinity 255
âą Once we find this symbol we go into the state LOOK_FOR_0 orLOOK_FOR_1 depending on whether we were in the state RIGHT_0or RIGHT_1 and make one left move.
âą In the state LOOK_FOR_đ, we check whether the value on the tape isđ. If it is, then we delete it by changing its value to Ă, and move tothe state RETURN. Otherwise, we change to the OUTPUT_0 state.
âą The RETURN state means we go back to the beginning. Specifically,we move leftward until we hit the first symbol that is not 0 or 1, inwhich case we change our state to START.
âą The OUTPUT_đ states mean that we are going to output the value đ.In both these states we go left until we hit â·. Once we do so, wemake a right step, and change to the 1_AND_BLANK or 0_AND_BLANKstates respectively. In the latter states, we write the correspondingvalue, and then move right and change to the BLANK_AND_STOPstate, in which we write â to the tape and halt.
The above description can be turned into a table describing for eachone of the 13 â 5 combination of state and symbol, what the Turingmachine will do when it is in that state and it reads that symbol. Thistable is known as the transition function of the Turing machine.
7.1.2 Turing machines: a formal definition
Figure 7.7: A Turing machine has access to a tape ofunbounded length. At each point in the execution,the machine can read a single symbol of the tape,and based on that and its current state, write a newsymbol, update the tape, decide whether to move left,right, stay, or halt.
The formal definition of Turing machines is as follows:
Definition 7.1 â Turing Machine. A (one tape) Turing machine with đstates and alphabet ÎŁ â 0, 1, â·,â is represented by a transitionfunction đżđ ⶠ[đ] Ă ÎŁ â [đ] Ă ÎŁ Ă L, R, S, H.
256 introduction to theoretical computer science
For every đ„ â 0, 1â, the output of đ on input đ„, denoted byđ(đ„), is the result of the following process:
âą We initialize đ to be the sequence â·, đ„0, đ„1, ⊠, đ„đâ1,â ,â , âŠ,where đ = |đ„|. (That is, đ [0] = â·, đ [đ + 1] = đ„đ for đ â [đ], andđ [đ] = â for đ > đ.)
âą We also initialize đ = 0 and đ = 0.âą We then repeat the following process:
1. Let (đ âČ, đâČ, đ·) = đżđ(đ , đ [đ]).2. Set đ â đ âČ, đ [đ] â đâČ.3. If đ· = R then set đ â đ+1, if đ· = L then set đ â maxđâ1, 0.
(If đ· = S then we keep đ the same.)4. If đ· = H then halt.
âą If the process above halts, then đ âs output, denoted by đ(đ„),is the string đŠ â 0, 1â obtained by concatenating all the sym-bols in 0, 1 in positions đ [0], ⊠, đ [đ] where đ is the final headposition.
âą If The Turing machine does not halt then we denote đ(đ„) = â„.
PYou should make sure you see why this formal def-inition corresponds to our informal description ofa Turing Machine. To get more intuition on TuringMachines, you can explore some of the online avail-able simulators such as Martin Ugarteâs, AnthonyMorphettâs, or Paul Rendellâs.
One should not confuse the transition function đżđ of a Turing ma-chine đ with the function that the machine computes. The transitionfunction đżđ is a finite function, with đ|ÎŁ| inputs and 4đ|ÎŁ| outputs.(Can you see why?) The machine can compute an infinite function đčthat takes as input a string đ„ â 0, 1â of arbitrary length and mightalso produce an arbitrary length string as output.
In our formal definition, we identified the machine đ with its tran-sition function đżđ since the transition function tells us everythingwe need to know about the Turing machine, and hence serves as agood mathematical representation of it. This choice of representa-tion is somewhat arbitrary, and is based on our convention that thestate space is always the numbers 0, ⊠, đ â 1 with 0 as the startingstate. Other texts use different conventions and so their mathematical
loops and infinity 257
definition of a Turing machine might look superficially different, butthese definitions describe the same computational process and hasthe same computational powers. See Section 7.7 for a comparison be-tween Definition 7.1 and the way Turing Machines are defined in textssuch as Sipser [Sip97]. These definitions are equivalent despite theirsuperficial differences.
7.1.3 Computable functionsWe now turn to making one of the most important definitions in thisbook, that of computable functions.
Definition 7.2 â Computable functions. Let đč ⶠ0, 1â â 0, 1â bea (total) function and let đ be a Turing machine. We say that đcomputes đč if for every đ„ â 0, 1â, đ(đ„) = đč(đ„).
We say that a function đč is computable if there exists a Turingmachine đ that computes it.
Defining a function âcomputableâ if and only if it can be computedby a Turing machine might seem ârecklessâ but, as weâll see in Chap-ter 8, it turns out that being computable in the sense of Definition 7.2is equivalent to being computable in essentially any reasonable modelof computation. This is known as the Church-Turing Thesis. (Unlikethe extended Church-Turing Thesis which we discussed in Section 5.6,the Church-Turing thesis itself is widely believed and there are nocandidate devices that attack it.)
Big Idea 9 We can precisely define what it means for a function tobe computable by any possible algorithm.
This is a good point to remind the reader that functions are not thesame as programs:
Functions â Programs . (7.1)A Turing machine (or program) đ can compute some functionđč , but it is not the same as đč . In particular there can be more than
one program to compute the same function. Being computable is aproperty of functions, not of machines.
We will often pay special attention to functions đč ⶠ0, 1â â 0, 1that have a single bit of output. Hence we give a special name for theset of functions of this form that are computable.
Definition 7.3 â The class R. We define R be the set of all computablefunctions đč ⶠ0, 1â â 0, 1.
258 introduction to theoretical computer science
1 A partial function đč from a set đŽ to a set đ” is afunction that is only defined on a subset of đŽ, (seeSection 1.4.3). We can also think of such a function asmapping đŽ to đ” âȘ â„ where â„ is a special âfailureâsymbol such that đč(đ) = â„ indicates the function đčis not defined on đ.
RRemark 7.4 â Functions vs. languages. As discussedin Section 6.1.2, many texts use the terminology ofâlanguagesâ rather than functions to refer to compu-tational tasks. A Turing machine đ decides a languageđż if for every input đ„ â 0, 1â, đ(đ„) outputs 1 ifand only if đ„ â đż. This is equivalent to computingthe Boolean function đč ⶠ0, 1â â 0, 1 defined asđč(đ„) = 1 iff đ„ â đż. A language đż is decidable if thereis a Turing machine đ that decides it. For historicalreasons, some texts also call such a language recursive(which is the reason that the letter R is often usedto denote the set of computable Boolean functions /decidable languages defined in Definition 7.3).In this book we stick to the terminology of functionsrather than languages, but all definitions and resultscan be easily translated back and forth by using theequivalence between the function đč ⶠ0, 1â â 0, 1and the language đż = đ„ â 0, 1â | đč (đ„) = 1.
7.1.4 Infinite loops and partial functionsOne crucial difference between circuits/straight-line programs andTuring machines is the following. Looking at a NAND-CIRC programđ , we can always tell how many inputs and how many outputs it has(by simply looking at the X and Y variables). Furthermore, we areguaranteed that if we invoke đ on any input then some output will beproduced.
In contrast, given any Turing machine đ , we cannot determinea priori the length of the output. In fact, we donât even know if anoutput would be produced at all! For example, it is very easy to comeup with a Turing machine whose transition function never outputs Hand hence never halts.
If a machine đ fails to stop and produce an output on some aninput đ„, then it cannot compute any total function đč , since clearly oninput đ„, đ will fail to output đč(đ„). However, đ can still compute apartial function.1
For example, consider the partial function DIV that on input a pair(đ, đ) of natural numbers, outputs âđ/đâ if đ > 0, and is undefinedotherwise. We can define a Turing machine đ that computes DIV oninput đ, đ by outputting the first đ = 0, 1, 2, ⊠such that đđ â„ đ. If đ > 0and đ = 0 then the machine đ will never halt, but this is OK, sinceDIV is undefined on such inputs. If đ = 0 and đ = 0, the machine đwill output 0, which is also OK, since we donât care about what theprogram outputs on inputs on which DIV is undefined. Formally, wedefine computability of partial functions as follows:
loops and infinity 259
Definition 7.5 â Computable (partial or total) functions. Let đč be either atotal or partial function mapping 0, 1â to 0, 1â and let đ be aTuring machine. We say that đ computes đč if for every đ„ â 0, 1âon which đč is defined, đ(đ„) = đč(đ„). We say that a (partial ortotal) function đč is computable if there is a Turing machine thatcomputes it.
Note that if đč is a total function, then it is defined on every đ„ â0, 1â and hence in this case, Definition 7.5 is identical to Defini-tion 7.2.
RRemark 7.6 â Bot symbol. We often use â„ as our spe-cial âfailure symbolâ. If a Turing machine đ fails tohalt on some input đ„ â 0, 1â then we denote this byđ(đ„) = â„. This does not mean that đ outputs someencoding of the symbol â„ but rather that đ entersinto an infinite loop when given đ„ as input.If a partial function đč is undefined on đ„ then we canalso write đč(đ„) = â„. Therefore one might thinkthat Definition 7.5 can be simplified to requiring thatđ(đ„) = đč(đ„) for every đ„ â 0, 1â, which would implythat for every đ„, đ halts on đ„ if and only if đč is de-fined on đ„. However this is not the case: for a TuringMachine đ to compute a partial function đč it is notnecessary for đ to enter an infinite loop on inputs đ„on which đč is not defined. All that is needed is for đto output đč(đ„) on đ„âs on which đč is defined: on otherinputs it is OK for đ to output an arbitrary value suchas 0, 1, or anything else, or not to halt at all. To borrowa term from the C programming language, on inputs đ„on which đč is not defined, what đ does is âundefinedbehaviorâ.
7.2 TURING MACHINES AS PROGRAMMING LANGUAGES
The name âTuring machineâ, with its âtapeâ and âheadâ evokes aphysical object, while in contrast we think of a program as a pieceof text. But we can think of a Turing machine as a program as well.For example, consider the Turing Machine đ of Section 7.1.1 thatcomputes the function PAL such that PAL(đ„) = 1 iff đ„ is a palindrome.We can also describe this machine as a program using the Python-likepseudocode of the form below
# Gets an array Tape initialized to# [">", x_0 , x_1 , .... , x_(n-1), "", "", ...]# At the end of the execution, Tape[1] is equal to 1# if x is a palindrome and is equal to 0 otherwise
260 introduction to theoretical computer science
2 Most programming languages use arrays of fixedsize, while a Turing machineâs tape is unbounded. Butof course there is no need to store an infinite numberof â symbols. If you want, you can think of the tapeas a list that starts off just long enough to store theinput, but is dynamically grown in size as the Turingmachineâs head explores new positions.
def PAL(Tape):head = 0state = 0 # STARTwhile (state != 12):
if (state == 0 && Tape[head]=='0'):state = 3 # LOOK_FOR_0Tape[head] = 'x'head += 1 # move right
if (state==0 && Tape[head]=='1')state = 4 # LOOK_FOR_1Tape[head] = 'x'head += 1 # move right
... # more if statements here
The particular details of this program are not important. What mat-ters is that we can describe Turing machines as programs. Moreover,note that when translating a Turing machine into a program, the tapebecomes a list or array that can hold values from the finite set ÎŁ.2 Thehead position can be thought of as an integer valued variable that canhold integers of unbounded size. The state is a local register that canhold one of a fixed number of values in [đ].
More generally we can think of every Turing Machine đ as equiva-lent to a program similar to the following:
# Gets an array Tape initialized to# [">", x_0 , x_1 , .... , x_(n-1), "", "", ...]def M(Tape):
state = 0i = 0 # holds head locationwhile (True):
# Move head, modify state, write to tape# based on current state and cell at head# below are just examples for how program looks
for a particular transition functionif Tape[i]=="0" and state==7: #
ÎŽ_M(7,"0")=(19,"1","R")i += 1Tape[i]="1"state = 19
elif Tape[i]==">" and state == 13: #ÎŽ_M(13,">")=(15,"0","S")Tape[i]="0"state = 15
elif ......
loops and infinity 261
elif Tape[i]==">" and state == 29: #ÎŽ_M(29,">")=(.,.,"H")break # Halt
If we wanted to use only Boolean (i.e., 0/1-valued) variables thenwe can encode the state variables using âlog đâ bits. Similarly, wecan represent each element of the alphabet ÎŁ using â = âlog |ÎŁ|â bitsand hence we can replace the ÎŁ-valued array Tape[] with â Boolean-valued arrays Tape0[],âŠ, Tape(â â 1)[].7.2.1 The NAND-TM Programming languageWe now introduce the NAND-TM programming language, which aimsto capture the power of a Turing machine in a programming languageformalism. Just like the difference between Boolean circuits and Tur-ing Machines, the main difference between NAND-TM and NAND-CIRC is that NAND-TM models a single uniform algorithm that cancompute a function that takes inputs of arbitrary lengths. To do so, weextend the NAND-CIRC programming language with two constructs:
âą Loops: NAND-CIRC is a straight-line programming language- aNAND-CIRC program of đ lines takes exactly đ steps of computa-tion and hence in particular cannot even touch more than 3đ vari-ables. Loops allow us to capture in a short program the instructionsfor a computation that can take an arbitrary amount of time.
âą Arrays: A NAND-CIRC program of đ lines touches at most 3đ vari-ables. While we can use variables with names such as Foo_17 orBar[22], they are not true arrays, since the number in the identifieris a constant that is âhardwiredâ into the program.
Figure 7.8: A NAND-TM program has scalar variablesthat can take a Boolean value, array variables thathold a sequence of Boolean values, and a specialindex variable i that can be used to index the arrayvariables. We refer to the i-th value of the arrayvariable Spam using Spam[i]. At each iteration ofthe program the index variable can be incrementedor decremented by one step using the MODANDJMPoperation.
Thus a good way to remember NAND-TM is using the followinginformal equation:
NAND-TM = NAND-CIRC + loops + arrays (7.2)
262 introduction to theoretical computer science
RRemark 7.7 â NAND-CIRC + loops + arrays = every-thing.. As we will see, adding loops and arrays toNAND-CIRC is enough to capture the full power ofall programming languages! Hence we could replaceâNAND-TMâ with any of Python, C, Javascript, OCaml,etc. in the lefthand side of (7.2). But weâre gettingahead of ourselves: this issue will be discussed inChapter 8.
Concretely, the NAND-TM programming language adds the fol-lowing features on top of NAND-CIRC (see Fig. 7.8):
âą We add a special integer valued variable i. All other variables inNAND-TM are Boolean valued (as in NAND-CIRC).
âą Apart from i NAND-TM has two kinds of variables: scalars andarrays. Scalar variables hold one bit (just as in NAND-CIRC). Arrayvariables hold an unbounded number of bits. At any point in thecomputation we can access the array variables at the location in-dexed by i using Foo[i]. We cannot access the arrays at locationsother than the one pointed to by i.
âą We use the convention that arrays always start with a capital letter,and scalar variables (which are never indexed with i) start withlowercase letters. Hence Foo is an array and bar is a scalar variable.
âą The input and output X and Y are now considered arrays with val-ues of zeroes and ones. (There are also two other special arraysX_nonblank and Y_nonblank, see below.)
âą We add a special MODANDJUMP instruction that takes two booleanvariables đ, đ as input and does the following:
â If đ = 1 and đ = 1 then MODANDJUMP(đ, đ) increments i by oneand jumps to the first line of the program.
â If đ = 0 and đ = 1 then MODANDJUMP(đ, đ) decrements i by oneand jumps to the first line of the program. (If i is already equalto 0 then it stays at 0.)
â If đ = 1 and đ = 0 then MODANDJUMP(đ, đ) jumps to the first line ofthe program without modifying i.
â If đ = đ = 0 then MODANDJUMP(đ, đ) halts execution of theprogram.
âą The MODANDJUMP instruction always appears in the last line of aNAND-TM program and nowhere else.
loops and infinity 263
Default values. We need one more convention to handle âdefault val-uesâ. Turing machines have the special symbol â to indicate that tapelocation is âblankâ or âuninitializedâ. In NAND-TM there is no suchsymbol, and all variables are Boolean, containing either 0 or 1. Allvariables and locations of arrays are default to 0 if they have not beeninitialized to another value. To keep track of whether a 0 in an arraycorresponds to a true zero or to an uninitialized cell, a programmercan always add to an array Foo a âcompanion arrayâ Foo_nonblankand set Foo_nonblank[i] to 1 whenever the iâth location is initial-ized. In particular we will use this convention for the input and out-put arrays X and Y. A NAND-TM program has four special arrays X,X_nonblank, Y, and Y_nonblank. When a NAND-TM program is exe-cuted on input đ„ â 0, 1â of length đ, the first đ cells of the array X areinitialized to đ„0, ⊠, đ„đâ1 and the first đ cells of the array X_nonblankare initialized to 1. (All uninitialized cells default to 0.) The output ofa NAND-TM program is the string Y[0], âŠ, Y[đ â 1] where đ is thesmallest integer such that Y_nonblank[đ]= 0. A NAND-TM programgets called with X and X_nonblank initialized to contain the input, andwrites to Y and Y_nonblank to produce the output.
Formally, NAND-TM programs are defined as follows:
Definition 7.8 â NAND-TM programs. A NAND-TM program consists ofa sequence of lines of the form foo = NAND(bar,blah) endingwith a line of the form MODANDJMP(foo,bar), where foo,bar,blahare either scalar variables (sequences of letters, digits, and under-scores) or array variables of the form Foo[i] (starting with capitalletter and indexed by i). The program has the array variables X,X_nonblank, Y, Y_nonblank and the index variable i built in, andcan use additional array and scalar variables.
If đ is a NAND-TM program and đ„ â 0, 1â is an input then anexecution of đ on đ„ is the following process:
1. The arrays X and X_nonblank are initialized by X[đ]= đ„đ andX_nonblank[đ]= 1 for all đ â [|đ„|]. All other variables and cellsare initialized to 0. The index variable i is also initialized to 0.
2. The program is executed line by line, when the last line MODAND-JMP(foo,bar) is executed then we do as follows:
a. If foo= 1 and bar= 0 then jump to the first line without mod-ifying the value of i.
b. If foo= 1 and bar= 1 then increment i by one and jump tothe first line.
264 introduction to theoretical computer science
c. If foo= 0 and bar= 1 then decrement i by one (unless it is al-ready zero) and jump to the first line.
d. If foo= 0 and bar= 0 then halt and output Y[0], âŠ, Y[đ â 1]where đ is the smallest integer such that Y_nonblank[đ]= 0.
7.2.2 Sneak peak: NAND-TM vs Turing machinesAs the name implies, NAND-TM programs are a direct implemen-tation of Turing machines in programming language form. We willshow the equivalence below but you can already see how the compo-nents of Turing machines and NAND-TM programs correspond to oneanother:
Table 7.2: Turing Machine and NAND-TM analogs
Turing Machine NAND-TM programState: single register thattakes values in [đ] Scalar variables: Several variables
such as foo, bar etc.. each takingvalues in 0, 1.
Tape: One tape containingvalues in a finite set ÎŁ.Potentially infinite but đ [đĄ]defaults to â for all locationsđĄ that have not beenaccessed.
Arrays: Several arrays such as Foo,Bar etc.. for each such array Arr andindex đ, the value of Arr at position đis either 0 or 1. The value defaults to0 for position that have not beenwritten to.
Head location: A numberđ â â that encodes theposition of the head.
Index variable: The variable i that canbe used to access the arrays.
Accessing memory: At everystep the Turing machine hasaccess to its local state, butcan only access the tape atthe position of the currenthead location.
Accessing memory: At every step aNAND-TM program has access to allthe scalar variables, but can onlyaccess the arrays at the location i ofthe index variable
Control of location: In eachstep the machine can movethe head location by at mostone position.
Control of index variable: In eachiteration of its main loop theprogram can modify the index i byat most one.
7.2.3 ExamplesWe now present some examples of NAND-TM programs.
loops and infinity 265
Example 7.9 â Increment in NAND-TM. The following is a NAND-TMprogram to compute the increment function. That is, INC ⶠ0, 1â â0, 1â such that for every đ„ â 0, 1đ, INC(đ„) is the đ + 1 bit longstring đŠ such that if đ = âđâ1đ=0 đ„đ â 2đ is the number represented byđ„, then đŠ is the (least-significant digit first) binary representation ofthe number đ + 1.
We start by showing the program using the âsyntactic sugarâweâve seen before of using shorthand for some NAND-CIRC pro-grams we have seen before to compute simple functions such asIF, XOR and AND (as well as the constant one function as well as thefunction COPY that just maps a bit to itself).
carry = IF(started,carry,one(started))started = one(started)Y[i] = XOR(X[i],carry)carry = AND(X[i],carry)Y_nonblank[i] = one(started)MODANDJUMP(X_nonblank[i],X_nonblank[i])
The above is not, strictly speaking, a valid NAND-TM program.If we âopen upâ all of the syntactic sugar, we get the followingâsugar freeâ valid program to compute the same function.
temp_0 = NAND(started,started)temp_1 = NAND(started,temp_0)temp_2 = NAND(started,started)temp_3 = NAND(temp_1,temp_2)temp_4 = NAND(carry,started)carry = NAND(temp_3,temp_4)temp_6 = NAND(started,started)started = NAND(started,temp_6)temp_8 = NAND(X[i],carry)temp_9 = NAND(X[i],temp_8)temp_10 = NAND(carry,temp_8)Y[i] = NAND(temp_9,temp_10)temp_12 = NAND(X[i],carry)carry = NAND(temp_12,temp_12)temp_14 = NAND(started,started)Y_nonblank[i] = NAND(started,temp_14)MODANDJUMP(X_nonblank[i],X_nonblank[i])
266 introduction to theoretical computer science
Example 7.10 â XOR in NAND-TM. The following is a NAND-TM pro-gram to compute the XOR function on inputs of arbitrary length.That is XOR ⶠ0, 1â â 0, 1 such that XOR(đ„) = â|đ„|â1đ=0 đ„đmod 2 for every đ„ â 0, 1â. Once again, we use a certain âsyn-tactic sugarâ. Specifically, we access the arrays X and Y at theirzero-th entry, while NAND-TM only allows access to arrays in thecoordinate of the variable i.
temp_0 = NAND(X[0],X[0])Y_nonblank[0] = NAND(X[0],temp_0)temp_2 = NAND(X[i],Y[0])temp_3 = NAND(X[i],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)MODANDJUMP(X_nonblank[i],X_nonblank[i])
To transform the program above to a valid NAND-TM program,we can transform references such as X[0] and Y[0] to scalar vari-ables x_0 and y_0 (similarly we can transform any reference of theform Foo[17] or Bar[15] to scalars such as foo_17 and bar_15).We then need to add code to load the value of X[0] to x_0 andsimilarly to write to Y[0] the value of y_0, but this is not hard todo. Using the fact that variables are initialized to zero by default,we can create a variable init which will be set to 1 at the end ofthe first iteration and not changed since then. We can then add anarray Atzero and code that will modify Atzero[i] to 1 if init is0 and otherwise leave it as it is. This will ensure that Atzero[i] isequal to 1 if and only if i is set to zero, and allow the program toknow when we are at the zero-th location. Thus we can add codeto read and write to the corresponding scalars x_0, y_0 when weare at the zero-th location, and also code to move i to zero andthen halt at the end. Working this out fully is somewhat tedious,but can be a good exercise.
PWorking out the above two examples can go a longway towards understanding the NAND-TM language.See our GitHub repository for a full specification ofthe NAND-TM language.
loops and infinity 267
7.3 EQUIVALENCE OF TURING MACHINES AND NAND-TM PRO-GRAMS
Given the above discussion, it might not be surprising that Turingmachines turn out to be equivalent to NAND-TM programs. Indeed,we designed the NAND-TM language to have this property. Never-theless, this is an important result, and the first of many other suchequivalence results we will see in this book.
Theorem 7.11 â Turing machines and NAND-TM programs are equivalent. Forevery đč ⶠ0, 1â â 0, 1â, đč is computable by a NAND-TM pro-gram đ if and only if there is a Turing Machine đ that computesđč .
Proof Idea:
To prove such an equivalence theorem, we need to show two di-rections. We need to be able to (1) transform a Turing machine đ toa NAND-TM program đ that computes the same function as đ and(2) transform a NAND-TM program đ into a Turing machine đ thatcomputes the same function as đ .
The idea of the proof is illustrated in Fig. 7.9. To show (1), givena Turing machine đ , we will create a NAND-TM program đ thatwill have an array Tape for the tape of đ and scalar (i.e., non array)variable(s) state for the state of đ . Specifically, since the state of aTuring machine is not in 0, 1 but rather in a larger set [đ], we will useâlog đâ variables state_0 , âŠ, state_âlog đâ â 1 variables to store therepresentation of the state. Similarly, to encode the larger alphabet ÎŁof the tape, we will use âlog |ÎŁ|â arrays Tape_0 , âŠ, Tape_âlog |ÎŁ|â â 1,such that the đđĄâ location of these arrays encodes the đđĄâ symbol in thetape for every tape. Using the fact that every function can be computedby a NAND-CIRC program, we will be able to compute the transitionfunction of đ , replacing moving left and right by decrementing andincrementing i respectively.
We show (2) using very similar ideas. Given a program đ thatuses đ array variables and đ scalar variables, we will create a Turingmachine with about 2đ states to encode the values of scalar variables,and an alphabet of about 2đ so we can encode the arrays using ourtape. (The reason the sizes are only âaboutâ 2đ and 2đ is that we willneed to add some symbols and steps for bookkeeping purposes.) TheTuring Machine đ will simulate each iteration of the program đ byupdating its state and tape accordingly.âProof of Theorem 7.11. We start by proving the âifâ direction of The-orem 7.11. Namely we show that given a Turing machine đ , we can
268 introduction to theoretical computer science
Figure 7.9: Comparing a Turing Machine to a NAND-TM program. Both have an unbounded memorycomponent (the tape for a Turing machine, and the ar-rays for a NAND-TM program), as well as a constantlocal memory (state for a Turing machine, and scalarvariables for a NAND-TM program). Both can onlyaccess at each step one location of the unboundedmemory, this is the âheadâ location for a Turingmachine, and the value of the index variable i for aNAND-TM program.
find a NAND-TM program đđ such that for every input đ„, if đ haltson input đ„ with output đŠ then đđ(đ„) = đŠ. Since our goal is just toshow such a program đđ exists, we donât need to write out the fullcode of đđ line by line, and can take advantage of our various âsyn-tactic sugarâ in describing it.
The key observation is that by Theorem 4.12 we can compute everyfinite function using a NAND-CIRC program. In particular, considerthe transition function đżđ ⶠ[đ] Ă ÎŁ â [đ] Ă ÎŁ Ă L, R of our TuringMachine. We can encode the its components as follows:
âą We encode [đ] using 0, 1â and ÎŁ using 0, 1ââČ , where â = âlog đâand ââČ = âlog |ÎŁ|â.
⹠We encode the set L, R, S, H using 0, 12. We will choose theencode L ⊠01, R ⊠11, S ⊠10, H ⊠00. (This convenientlycorresponds to the semantics of the MODANDJUMP operation.)
Hence we can identify đżđ with a function đ ⶠ0, 1â+ââČ â0, 1â+ââČ+2, mapping strings of length â + ââČ to strings of lengthâ + ââČ + 2. By Theorem 4.12 there exists a finite length NAND-CIRCprogram ComputeM that computes this function đ . The NAND-TMprogram to simulate đ will essentially be the following:
loops and infinity 269
Algorithm 7.12 â NAND-TM program to simulate TM đ .
Input: đ„ â 0, 1âOutput: đ(đ„) if đ halts on đ„. Otherwise go into infinite
loop1: # We use variables state_0 ⊠state_â â 1 to encode đ âs
state2: # We use arrays Tape_0[] ⊠Tape_ââČ â 1[] to encode đ âs
tape3: # We omit the initial and final âbook keepingâ to copy
Input: to Tape and copyOutput: from Tape4: # Use the fact that transition is finite and computable by
NAND-CIRC program:5: state_0 ⊠state_â â 1, Tape_0[i]⊠Tape_ââČ â 1[i],
dir0,dir1 â TRANSITION( state_0 ⊠state_â â 1,Tape_0[i]⊠Tape_ââČ â 1[i], dir0,dir1 )
6: MODANDJMP(dir0,dir1)
Every step of the main loop of the above program perfectly mimicsthe computation of the Turing Machine đ and so the program carriesout exactly the definition of computation by a Turing Machine as perDefinition 7.1.
For the other direction, suppose that đ is a NAND-TM programwith đ lines, â scalar variables, and ââČ array variables. We will showthat there exists a Turing machine đđ with 2â + đ¶ states and alphabetÎŁ of size đ¶âČ + 2ââČ that computes the same functions as đ (where đ¶, đ¶âČare some constants to be determined later).
Specifically, consider the function đ ⶠ0, 1â Ă 0, 1ââČ â 0, 1â Ă0, 1ââČ that on input the contents of đ âs scalar variables and the con-tents of the array variables at location i in the beginning of an itera-tion, outputs all the new values of these variables at the last line of theiteration, right before the MODANDJUMP instruction is executed.
If foo and bar are the two variables that are used as input to theMODANDJUMP instruction, then this means that based on the values ofthese variables we can compute whether i will increase, decrease orstay the same, and whether the program will halt or jump back to thebeginning. Hence a Turing machine can simulate an execution of đ inone iteration using a finite function applied to its alphabet. The overalloperation of the Turing machine will be as follows:
1. The machine đđ encodes the contents of the array variables of đin its tape, and the contents of the scalar variables in (part of) itsstate. Specifically, if đ has â local variables and đĄ arrays, then thestate space of đ will be large enough to encode all 2â assignments
270 introduction to theoretical computer science
to the local variables and the alphabet ÎŁ of đ will be large enoughto encode all 2đĄ assignments for the array variables at each location.The head location corresponds to the index variable i.
2. Recall that every line of the program đ corresponds to reading andwriting either a scalar variable, or an array variable at the locationi. In one iteration of đ the value of i remains fixed, and so themachine đ can simulate this iteration by reading the values ofall array variables at i (which are encoded by the single symbolin the alphabet ÎŁ located at the i-th cell of the tape) , reading thevalues of all scalar variables (which are encoded by the state), andupdating both. The transition function of đ can output L, S, Rdepending on whether the values given to the MODANDJMP operationare 01, 10 or 11 respectively.
3. When the program halts (i.e., MODANDJMP gets 00) then the Turingmachine will enter into a special loop to copy the results of the Yarray into the output and then halt. We can achieve this by adding afew more states.
The above is not a full formal description of a Turing Machine, butour goal is just to show that such a machine exists. One can see thatđđ simulates every step of đ , and hence computes the same functionas đ .
RRemark 7.13 â Running time equivalence (optional). Ifwe examine the proof of Theorem 7.11 then we can seethat every iteration of the loop of a NAND-TM pro-gram corresponds to one step in the execution of theTuring machine. We will come back to this questionof measuring number of computation steps later inthis course. For now the main take away point is thatNAND-TM programs and Turing Machines are essen-tially equivalent in power even when taking runningtime into account.
7.3.1 Specification vs implementation (again)Once you understand the definitions of both NAND-TM programsand Turing Machines, Theorem 7.11 is fairly straightforward. Indeed,NAND-TM programs are not as much a different model from TuringMachines as they are simply a reformulation of the same model usingprogramming language notation. You can think of the difference be-tween a Turing machine and a NAND-TM program as the differencebetween representing a number using decimal or binary notation. In
loops and infinity 271
contrast, the difference between a function đč and a Turing machinethat computes đč is much more profound: it is like the difference be-tween the equation đ„2 + đ„ = 12 and the number 3 that is a solutionfor this equation. For this reason, while we take special care in distin-guishing functions from programs or machines, we will often identifythe two latter concepts. We will move freely between describing analgorithm as a Turing machine or as a NAND-TM program (as wellas some of the other equivalent computational models we will see inChapter 8 and beyond).
Table 7.3: Specification vs Implementation formalisms
Setting Specification ImplementationFinite com-putation
Functions mapping 0, 1đ to0, 1đ Circuits, Straightlineprograms
Infinitecomputa-tion
Functions mapping 0, 1â to0, 1 or to 0, 1â. Algorithms, TuringMachines, Programs
7.4 NAND-TM SYNTACTIC SUGAR
Just like we did with NAND-CIRC in Chapter 4, we can use âsyntacticsugarâ to make NAND-TM programs easier to write. For starters, wecan use all of the syntactic sugar of NAND-CIRC, and so have accessto macro definitions and conditionals (i.e., if/then). But we can gobeyond this and achieve for example:
âą Inner loops such as the while and for operations common to manyprogramming language.
âą Multiple index variables (e.g., not just i but we can add j, k, etc.).
âą Arrays with more than one dimension (e.g., Foo[i][j],Bar[i][j][k] etc.)
In all of these cases (and many others) we can implement the newfeature as mere âsyntactic sugarâ on top of standard NAND-TM,which means that the set of functions computable by NAND-TMwith this feature is the same as the set of functions computable bystandard NAND-TM. Similarly, we can show that the set of functionscomputable by Turing Machines that have more than one tape, ortapes of more dimensions than one, is the same as the set of functionscomputable by standard Turing machines.
272 introduction to theoretical computer science
7.4.1 âGOTOâ and inner loopsWe can implement more advanced looping constructs than the simpleMODANDJUMP. For example, we can implement GOTO. A GOTO statementcorresponds to jumping to a certain line in the execution. For example,if we have code of the form
"start": do fooGOTO("end")
"skip": do bar"end": do blah
then the program will only do foo and blah as when it reaches theline GOTO("end") it will jump to the line labeled with "end". We canachieve the effect of GOTO in NAND-TM using conditionals. In thecode below, we assume that we have a variable pc that can take stringsof some constant length. This can be encoded using a finite numberof Boolean variables pc_0, pc_1, âŠ, pc_đ â 1, and so when we writebelow pc = "label" what we mean is something like pc_0 = 0,pc_1= 1, ⊠(where the bits 0, 1, ⊠correspond to the encoding of the finitestring "label" as a string of length đ). We also assume that we haveaccess to conditional (i.e., if statements), which we can emulate usingsyntactic sugar in the same way as we did in NAND-CIRC.
To emulate a GOTO statement, we will first modify a program P ofthe form
do foodo bardo blah
to have the following form (using syntactic sugar for if):
pc = "line1"if (pc=="line1"):
do foopc = "line2"
if (pc=="line2"):do barpc = "line3"
if (pc=="line3"):do blah
These two programs do the same thing. The variable pc cor-responds to the âprogram counterâ and tells the program whichline to execute next. We can see that if we wanted to emulate aGOTO("line3") then we could simply modify the instruction pc ="line2" to be pc = "line3".
loops and infinity 273
In NAND-CIRC we could only have GOTOs that go forward in thecode, but since in NAND-TM everything is encompassed within alarge outer loop, we can use the same ideas to implement GOTOâs thatcan go backwards, as well as conditional loops.
Other loops. Once we have GOTO, we can emulate all the standard loopconstructs such as while, do .. until or for in NAND-TM as well.For example, we can replace the code
while foo:do blah
do bar
with
"loop":if NOT(foo): GOTO("next")do blahGOTO("loop")
"next":do bar
RRemark 7.14 â GOTOâs in programming languages. TheGOTO statement was a staple of most early program-ming languages, but has largely fallen out of favor andis not included in many modern languages such asPython, Java, Javascript. In 1968, Edsger Dijsktra wrote afamous letter titled âGo to statement considered harm-ful.â (see also Fig. 7.10). The main trouble with GOTOis that it makes analysis of programs more difficultby making it harder to argue about invariants of theprogram.When a program contains a loop of the form:
for j in range(100):do something
do blah
you know that the line of code do blah can only bereached if the loop ended, in which case you knowthat j is equal to 100, and might also be able to argueother properties of the state of the program. In con-trast, if the program might jump to do blah from anyother point in the code, then itâs very hard for you asthe programmer to know what you can rely upon inthis code. As Dijkstra said, such invariants are impor-tant because âour intellectual powers are rather gearedto master static relations and .. our powers to visualizeprocesses evolving in time are relatively poorly developedâ
274 introduction to theoretical computer science
Figure 7.10: XKCDâs take on the GOTO statement.
and so âwe should ⊠do âŠour utmost best to shorten theconceptual gap between the static program and the dynamicprocess.âThat said, GOTO is still a major part of lower level lan-guages where it is used to implement higher levellooping constructs such as while and for loops.For example, even though Java doesnât have a GOTOstatement, the Java Bytecode (which is a lower levelrepresentation of Java) does have such a statement.Similarly, Python bytecode has instructions such asPOP_JUMP_IF_TRUE that implement the GOTO function-ality, and similar instructions are included in manyassembly languages. The way we use GOTO to imple-ment a higher level functionality in NAND-TM isreminiscent of the way these various jump instructionsare used to implement higher level looping constructs.
7.5 UNIFORMITY, AND NAND VS NAND-TM (DISCUSSION)
While NAND-TM adds extra operations over NAND-CIRC, it is notexactly accurate to say that NAND-TM programs or Turing machinesare âmore powerfulâ than NAND-CIRC programs or Boolean circuits.NAND-CIRC programs, having no loops, are simply not applicablefor computing functions with an unbounded number of inputs. Thus,to compute a function đč ⶠ0, 1â â¶â 0, 1â using NAND-CIRC (orequivalently, Boolean circuits) we need a collection of programs/cir-cuits: one for every input length.
The key difference between NAND-CIRC and NAND-TM is thatNAND-TM allows us to express the fact that the algorithm for com-puting parities of length-100 strings is really the same one as the al-gorithm for computing parities of length-5 strings (or similarly thefact that the algorithm for adding đ-bit numbers is the same for everyđ, etc.). That is, one can think of the NAND-TM program for generalparity as the âseedâ out of which we can grow NAND-CIRC programsfor length 10, length 100, or length 1000 parities as needed.
This notion of a single algorithm that can compute functions ofall input lengths is known as uniformity of computation and hencewe think of Turing machines / NAND-TM as uniform model of com-putation, as opposed to Boolean circuits or NAND-CIRC which is anonuniform model, where we have to specify a different program forevery input length.
Looking ahead, we will see that this uniformity leads to anothercrucial difference between Turing machines and circuits. Turing ma-chines can have inputs and outputs that are longer than the descrip-tion of the machine as a string and in particular there exists a Turingmachine that can âself replicateâ in the sense that it can print its own
loops and infinity 275
code. This notion of âself replicationâ, and the related notion of âselfreferenceâ is crucial to many aspects of computation, as well of courseto life itself, whether in the form of digital or biological programs.
For now, what you ought to remember is the following differencesbetween uniform and non uniform computational models:
âą Non uniform computational models: Examples are NAND-CIRCprograms and Boolean circuits. These are models where each indi-vidual program/circuit can compute a finite function đ ⶠ0, 1đ â0, 1đ. We have seen that every finite function can be computed bysome program/circuit. To discuss computation of an infinite func-tion đč ⶠ0, 1â â 0, 1â we need to allow a sequence đđđââ ofprograms/circuits (one for every input length), but this does notcapture the notion of a single algorithm to compute the function đč .
âą Uniform computational models: Examples are Turing machines andNAND-TM programs. These are models where a single program/-machine can take inputs of arbitrary length and hence compute aninfinite function đč ⶠ0, 1â â 0, 1â. The number of steps thata program/machine takes on some input is not a priori boundedin advance and in particular there is a chance that it will enter intoan infinite loop. Unlike the nonuniform case, we have not shownthat every infinite function can be computed by some NAND-TMprogram/Turing Machine. We will come back to this point in Chap-ter 9.
Chapter Recap
âą Turing machines capture the notion of a single al-gorithm that can evaluate functions of every inputlength.
âą They are equivalent to NAND-TM programs, whichadd loops and arrays to NAND-CIRC.
âą Unlike NAND-CIRC or Boolean circuits, the num-ber of steps that a Turing machine takes on a giveninput is not fixed in advance. In fact, a Turing ma-chine or a NAND-TM program can enter into aninfinite loop on certain inputs, and not halt at all.
7.6 EXERCISES
Exercise 7.1 â Explicit NAND TM programming. Produce the code of a(syntactic-sugar free) NAND-TM program đ that computes the (un-bounded input length) Majority function đđđ ⶠ0, 1â â 0, 1 wherefor every đ„ â 0, 1â, đđđ(đ„) = 1 if and only if â|đ„|đ=0 đ„đ > |đ„|/2. Wesay âproduceâ rather than âwriteâ because you do not have to write
276 introduction to theoretical computer science
the code of đ by hand, but rather can use the programming languageof your choice to compute this code.
Exercise 7.2 â Computable functions examples. Prove that the followingfunctions are computable. For all of these functions, you do not haveto fully specify the Turing Machine or the NAND-TM program thatcomputes the function, but rather only prove that such a machine orprogram exists:
1. INC ⶠ0, 1â â 0, 1â which takes as input a representation of anatural number đ and outputs the representation of đ + 1.
2. ADD ⶠ0, 1â â 0, 1â which takes as input a representation ofa pair of natural numbers (đ, đ) and outputs the representation ofđ + đ.
3. MULT ⶠ0, 1â â 0, 1â, which takes a representation of a pair ofnatural numbers (đ, đ) and outputs the representation of đ.
4. SORT ⶠ0, 1â â 0, 1â which takes as input the representation ofa list of natural numbers (đ0, ⊠, đđâ1) and returns its sorted version(đ0, ⊠, đđâ1) such that for every đ â [đ] there is some đ â [đ] withđđ = đđ and đ0 †đ1 †⯠†đđâ1.
Exercise 7.3 â Two index NAND-TM. Define NAND-TMâ to be the variant ofNAND-TM where there are two index variables i and j. Arrays can beindexed by either i or j. The operation MODANDJMP takes four variablesđ, đ, đ, đ and uses the values of đ, đ to decide whether to increment j,decrement j or keep it in the same value (corresponding to 01, 10, and00 respectively). Prove that for every function đč ⶠ0, 1â â 0, 1â, đčis computable by a NAND-TM program if and only if đč is computableby a NAND-TMâ program.
Exercise 7.4 â Two tape Turing machines. Define a two tape Turing machineto be a Turing machine which has two separate tapes and two separateheads. At every step, the transition function gets as input the locationof the cells in the two tapes, and can decide whether to move eachhead independently. Prove that for every function đč ⶠ0, 1â â0, 1â, đč is computable by a standard Turing Machine if and only if đčis computable by a two-tape Turing machine.
Exercise 7.5 â Two dimensional arrays. Define NAND-TMâ â to be the vari-ant of NAND-TM where just like NAND-TMâ defined in Exercise 7.3
loops and infinity 277
3 You can use the sequence R, L,R, R, L, L, R,R,R, L, L, L,âŠ.
there are two index variables i and j, but now the arrays are two di-mensional and so we index an array Foo by Foo[i][j]. Prove that forevery function đč ⶠ0, 1â â 0, 1â, đč is computable by a NAND-TMprogram if and only if đč is computable by a NAND-TMâ â program.
Exercise 7.6 â Two dimensional Turing machines. Define a two-dimensionalTuring machine to be a Turing machine in which the tape is two dimen-sional. At every step the machine can move Up, Down, Left, Right, orStay. Prove that for every function đč ⶠ0, 1â â 0, 1â, đč is com-putable by a standard Turing Machine if and only if đč is computableby a two-dimensional Turing machine.
Exercise 7.7 Prove the following closure properties of the set R definedin Definition 7.3:
1. If đč â R then the function đș(đ„) = 1 â đč(đ„) is in R.
2. If đč, đș â R then the function đ»(đ„) = đč(đ„) âš đș(đ„) is in R.
3. If đč â R then the function đč â in in R where đč â is defined as fol-lows: đč â(đ„) = 1 iff there exist some strings đ€0, ⊠, đ€đâ1 such thatđ„ = đ€0đ€1 ⯠đ€đâ1 and đč(đ€đ) = 1 for every đ â [đ].
4. If đč â R then the function
đș(đ„) = â§âšâ©âđŠâ0,1|đ„|đč(đ„đŠ) = 10 otherwise(7.3)
is in R.
Exercise 7.8 â Oblivious Turing Machines (challenging). Define a Turing Ma-chine đ to be oblivious if its head movements are independent of itsinput. That is, we say that đ is oblivious if there exists an infinitesequence MOVE â L, R, Sâ such that for every đ„ â 0, 1â, themovements of đ when given input đ„ (up until the point it halts, ifsuch point exists) are given by MOVE0,MOVE1,MOVE2, âŠ.
Prove that for every function đč ⶠ0, 1â â 0, 1â, if đč is com-putable then it is computable by an oblivious Turing machine. Seefootnote for hint.3
Exercise 7.9 â Single vs multiple bit. Prove that for every đč ⶠ0, 1â â0, 1â, the function đč is computable if and only if the following func-
278 introduction to theoretical computer science
tion đș ⶠ0, 1â â 0, 1 is computable, where đș is defined as follows:đș(đ„, đ, đ) = â§âšâ©đč(đ„)đ đ < |đč(đ„)|, đ = 01 đ < |đč(đ„)|, đ = 10 đ â„ |đč(đ„)|
Exercise 7.10 â Uncomputability via counting. Recall that R is the set of alltotal functions from 0, 1â to 0, 1 that are computable by a Turingmachine (see Definition 7.3). Prove that R is countable. That is, provethat there exists a one-to-one map đ·đĄđ ⶠR â â. You can use theequivalence between Turing machines and NAND-TM programs.
Exercise 7.11 â Not every function is computable. Prove that the set of alltotal functions from 0, 1â â 0, 1 is not countable. You can use theresults of Section 2.4. (We will see an explicit uncomputable functionin Chapter 9.)
7.7 BIBLIOGRAPHICAL NOTES
Augusta Ada Byron, countess of Lovelace (1815-1852) lived a shortbut turbulent life, though is today most well known for her collabo-ration with Charles Babbage (see [Ste87] for a biography). Ada tookan immense interest in Babbageâs analytical engine, which we men-tioned in Chapter 3. In 1842-3, she translated from Italian a paper ofMenabrea on the engine, adding copious notes (longer than the paperitself). The quote in the chapterâs beginning is taken from Nota A inthis text. Lovelaceâs notes contain several examples of programs for theanalytical engine, and because of this she has been called âthe worldâsfirst computer programmerâ though it is not clear whether they werewritten by Lovelace or Babbage himself [Hol01]. Regardless, Ada wasclearly one of very few people (perhaps the only one outside of Bab-bage himself) to fully appreciate how significant and revolutionarythe idea of mechanizing computation truly is.
The books of Shetterly [She16] and Sobel [Sob17] discuss the his-tory of human computers (who were female, more often than not)and their important contributions to scientific discoveries in astron-omy and space exploration.
Alan Turing was one of the intellectual giants of the 20th century.He was not only the first person to define the notion of computation,but also invented and used some of the worldâs earliest computationaldevices as part of the effort to break the Enigma cipher during WorldWar II, saving millions of lives. Tragically, Turing committed suicidein 1954, following his conviction in 1952 for homosexual acts and acourt-mandated hormonal treatment. In 2009, British prime minister
loops and infinity 279
Gordon Brown made an official public apology to Turing, and in 2013Queen Elizabeth II granted Turing a posthumous pardon. Turingâs lifeis the subject of a great book and a mediocre movie.
Sipserâs text [Sip97] defines a Turing machine as a seven tuple con-sisting of the state space, input alphabet, tape alphabet, transitionfunction, starting state, accepting state, and rejecting state. Superfi-cially this looks like a very different definition than Definition 7.1 butit is simply a different representation of the same concept, just as agraph can be represented in either adjacency list or adjacency matrixform.
One difference is that Sipser considers a general set of states đ thatis not necessarily of the form đ = 0, 1, 2, ⊠, đ â 1 for some naturalnumber đ > 0. Sipser also restricts his attention to Turing machinesthat output only a single bit and therefore designates two special halt-ing states: the â0 halting stateâ (often known as the rejecting state) andthe other as the â1 halting stateâ (often known as the accepting state).Thus instead of writing 0 or 1 on an output tape, the machine will en-ter into one of these states and halt. This again makes no differenceto the computational power, though we prefer to consider the moregeneral model of multi-bit outputs. (Sipser presents the basic task of aTuring machine as that of deciding a language as opposed to computinga function, but these are equivalent, see Remark 7.4.)
Sipser considers also functions with input in ÎŁâ for an arbitraryalphabet ÎŁ (and hence distinguishes between the input alphabet whichhe denotes as ÎŁ and the tape alphabet which he denotes as Î), while werestrict attention to functions with binary strings as input. Again thisis not a major issue, since we can always encode an element of ÎŁ usinga binary string of length logâ|ÎŁ|â. Finally (and this is a very minorpoint) Sipser requires the machine to either move left or right in everystep, without the Stay operation, though staying in place is very easyto emulate by simply moving right and then back left.
Another definition used in the literature is that a Turing machineđ recognizes a language đż if for every đ„ â đż, đ(đ„) = 1 and forevery đ„ â đż, đ(đ„) â 0, â„. A language đż is recursively enumerable ifthere exists a Turing machine đ that recognizes it, and the set of allrecursively enumerable languages is often denoted by RE. We will notuse this terminology in this book.
One of the first programming-language formulations of Turingmachines was given by Wang [Wan57]. Our formulation of NAND-TM is aimed at making the connection with circuits more direct, withthe eventual goal of using it for the Cook-Levin Theorem, as well asresults such as P â P/poly and BPP â P/poly. The website esolangs.orgfeatures a large variety of esoteric Turing-complete programminglanguages. One of the most famous of them is Brainf*ck.
8Equivalent models of computation
âAll problems in computer science can be solved by another level of indirec-tionâ, attributed to David Wheeler.
âBecause we shall later compute with expressions for functions, we need adistinction between functions and forms and a notation for expressing thisdistinction. This distinction and a notation for describing it, from which wedeviate trivially, is given by Church.â, John McCarthy, 1960 (in paperdescribing the LISP programming language)
So far we have defined the notion of computing a function usingTuring machines, which are not a close match to the way computationis done in practice. In this chapter we justify this choice by showingthat the definition of computable functions will remain the sameunder a wide variety of computational models. This notion is knownas Turing completeness or Turing equivalence and is one of the mostfundamental facts of computer science. In fact, a widely believedclaim known as the Church-Turing Thesis holds that every âreasonableâdefinition of computable function is equivalent to being computableby a Turing machine. We discuss the Church-Turing Thesis and thepotential definitions of âreasonableâ in Section 8.8.
Some of the main computational models we discuss in this chapterinclude:
âą RAM Machines: Turing Machines do not correspond to standardcomputing architectures that have Random Access Memory (RAM).The mathematical model of RAM machines is much closer to actualcomputers, but we will see that it is equivalent in power to TuringMachines. We also discuss a programming language variant ofRAM machines, which we call NAND-RAM. The equivalence ofTuring Machines and RAM machines enables demonstrating theTuring Equivalence of many popular programming languages, in-cluding all general-purpose languages used in practice such as C,Python, JavaScript, etc.
Compiled on 8.26.2020 18:10
Learning Objectives:⹠Learn about RAM machines and the λ
calculus.âą Equivalence between these and other models
and Turing machines.âą Cellular automata and configurations of
Turing machines.âą Understand the Church-Turing thesis.
282 introduction to theoretical computer science
âą Cellular Automata: Many natural and artificial systems can bemodeled as collections of simple components, each evolving ac-cording to simple rules based on its state and the state of its imme-diate neighbors. One well-known such example is Conwayâs Gameof Life. To prove that cellular automata are equivalent to Turingmachines we introduce the tool of configurations of Turing Ma-chines. These have other applications, and in particular are used inChapter 11 to prove Gödelâs Incompleteness Theorem: a central resultin mathematics.
âą đ calculus: The đ calculus is a model for expressing computationthat originates from the 1930âs, though it is closely connected tofunctional programming languages widely used today. Showingthe equivalence of đ calculus to Turing Machines involves a beauti-ful technique to eliminate recursion known as the âY Combinatorâ.
This chapter: A non-mathy overviewIn this chapter we study equivalence between models. Twocomputational models are equivalent (also known as Turingequivalent) if they can compute the same set of functions. Forexample, we have seen that Turing Machines and NAND-TMprograms are equivalent since we can transform every Tur-ing Machine into a NAND-TM program that computes thesame function, and similarly can transform every NAND-TM program into a Turing Machine that computes the samefunction.In this chapter we show this extends far beyond Turing Ma-chines. The techniques we develop allow us to show thatall general-purpose programming languages (i.e., Python,C, Java, etc.) are Turing Complete, in the sense that they cansimulate Turing Machines and hence compute all functionsthat can be computed by a TM. We will also show the otherdirection- Turing Machines can be used to simulate a pro-gram in any of these languages and hence compute anyfunction computable by them. This means that all these pro-gramming language are Turing equivalent: they are equivalentin power to Turing Machines and to each other. This is apowerful principle, which underlies behind the vast reach ofComputer Science. Moreover, it enables us to âhave our cakeand eat it tooâ- since all these models are equivalent, we canchoose the model of our convenience for the task at hand.To achieve this equivalence, we define a new computationalmodel known as RAM machines. RAM Machines capture the
equivalent models of computation 283
architecture of modern computers more closely than TuringMachines, but are still computationally equivalent to Turingmachines.Finally, we will show that Turing equivalence extends farbeyond traditional programming languages. We will seethat cellular automata which are a mathematical model of ex-tremely simple natural systems is also Turing equivalent, andalso see the Turing equivalence of the đ calculus - a logicalsystem for expressing functions that is the basis for functionalprogramming languages such as Lisp, OCaml, and more.See Fig. 8.1 for an overview of the results of this chapter.
Figure 8.1: Some Turing-equivalent models. All ofthese are equivalent in power to Turing Machines(or equivalently NAND-TM programs) in the sensethat they can compute exactly the same class offunctions. All of these are models for computinginfinite functions that take inputs of unboundedlength. In contrast, Boolean circuits / NAND-CIRCprograms can only compute finite functions and henceare not Turing complete.
8.1 RAM MACHINES AND NAND-RAM
One of the limitations of Turing Machines (and NAND-TM programs)is that we can only access one location of our arrays/tape at a time. Ifthe head is at position 22 in the tape and we want to access the 957-thposition then it will take us at least 923 steps to get there. In contrast,almost every programming language has a formalism for directlyaccessing memory locations. Actual physical computers also provideso called Random Access Memory (RAM) which can be thought of as alarge array Memory, such that given an index đ (i.e., memory address,or a pointer), we can read from and write to the đđĄâ location of Memory.(âRandom access memoryâ is quite a misnomer since it has nothing todo with probability, but since it is a standard term in both the theoryand practice of computing, we will use it as well.)
The computational model that models access to such a memory isthe RAM machine (sometimes also known as the Word RAM model),as depicted in Fig. 8.2. The memory of a RAM machine is an arrayof unbounded size where each cell can store a single word, whichwe think of as a string in 0, 1đ€ and also (equivalently) as a num-ber in [2đ€]. For example, many modern computing architectures
284 introduction to theoretical computer science
Figure 8.2: A RAMMachine contains a finite number oflocal registers, each of which holds an integer, and anunbounded memory array. It can perform arithmeticoperations on its register as well as load to a register đthe contents of the memory at the address indexed bythe number in register đâČ.
Figure 8.3: Different aspects of RAM machines andTuring machines. RAM machines can store integersin their local registers, and can read and write totheir memory at a location specified by a register.In contrast, Turing machines can only access theirmemory in the head location, which moves at mostone position to the right or left in each step.
use 64 bit words, in which every memory location holds a string in0, 164 which can also be thought of as a number between 0 and264 â 1 = 18, 446, 744, 073, 709, 551, 615. The parameter đ€ is knownas the word size. In practice often đ€ is a fixed number such as 64, butwhen doing theory we model đ€ as a parameter that can depend onthe input length or number of steps. (You can think of 2đ€ as roughlycorresponding to the largest memory address that we use in the com-putation.) In addition to the memory array, a RAM machine alsocontains a constant number of registers đ0, ⊠, đđâ1, each of which canalso contain a single word.
The operations a RAM machine can carry out include:âą Data movement: Load data from a certain cell in memory into
a register or store the contents of a register into a certain cell ofmemory. RAM machine can directly access any cell of memorywithout having to move the âheadâ (as Turing machines do) to thatlocation. That is, in one step a RAM machine can load into registerđđ the contents of the memory cell indexed by register đđ, or storeinto the memory cell indexed by register đđ the contents of registerđđ.
âą Computation: RAM machines can carry out computation on regis-ters such as arithmetic operations, logical operations, and compar-isons.
âą Control flow: As in the case of Turing machines, the choice of whatinstruction to perform next can depend on the state of the RAMmachine, which is captured by the contents of its register.We will not give a formal definition of RAM Machines, though the
bibliographical notes section (Section 8.10) contains sources for suchdefinitions. Just as the NAND-TM programming language modelsTuring machines, we can also define a NAND-RAM programming lan-guage that models RAM machines. The NAND-RAM programminglanguage extends NAND-TM by adding the following features:âą The variables of NAND-RAM are allowed to be (non negative)
integer valued rather than only Boolean as is the case in NAND-TM. That is, a scalar variable foo holds a non negative integer in â(rather than only a bit in 0, 1), and an array variable Bar holdsan array of integers. As in the case of RAM machines, we will notallow integers of unbounded size. Concretely, each variable holdsa number between 0 and đ â 1, where đ is the number of stepsthat have been executed by the program so far. (You can ignorethis restriction for now: if we want to hold larger numbers, wecan simply execute dummy instructions; it will be useful in laterchapters.)
equivalent models of computation 285
Figure 8.4: Overview of the steps in the proof of The-orem 8.1 simulating NANDRAM with NANDTM.We first use the inner loop syntactic sugar of Sec-tion 7.4.1 to enable loading an integer from an arrayto the index variable i of NANDTM. Once we can dothat, we can simulate indexed access in NANDTM. Wethen use an embedding of â2 in â to simulate twodimensional bit arrays in NANDTM. Finally, we usethe binary representation to encode one-dimensionalarrays of integers as two dimensional arrays of bitshence completing the simulation of NANDRAM withNANDTM.
âą We allow indexed access to arrays. If foo is a scalar and Bar is anarray, then Bar[foo] refers to the location of Bar indexed by thevalue of foo. (Note that this means we donât need to have a specialindex variable i anymore.)
âą As is often the case in programming languages, we will assumethat for Boolean operations such as NAND, a zero valued integer isconsidered as false, and a nonzero valued integer is considered astrue.
âą In addition to NAND, NAND-RAM also includes all the basic arith-metic operations of addition, subtraction, multiplication, (integer)division, as well as comparisons (equal, greater than, less than,etc..).
âą NAND-RAM includes conditional statements if/then as part ofthe language.
âą NAND-RAM contains looping constructs such as while and do aspart of the language.
A full description of the NAND-RAM programming language isin the appendix. However, the most important fact you need to knowabout NAND-RAM is that you actually donât need to know muchabout NAND-RAM at all, since it is equivalent in power to Turingmachines:
Theorem 8.1 â Turing Machines (aka NAND-TM programs) and RAM ma-
chines (aka NAND-RAM programs) are equivalent. For every functionđč ⶠ0, 1â â 0, 1â, đč is computable by a NAND-TM program ifand only if đč is computable by a NAND-RAM program.
Since NAND-TM programs are equivalent to Turing machines, andNAND-RAM programs are equivalent to RAM machines, Theorem 8.1shows that all these four models are equivalent to one another.
Proof Idea:
Clearly NAND-RAM is only more powerful than NAND-TM, andso if a function đč is computable by a NAND-TM program then it canbe computed by a NAND-RAM program. The challenging direction isto transform a NAND-RAM program đ to an equivalent NAND-TMprogram đ. To describe the proof in full we will need to cover the fullformal specification of the NAND-RAM language, and show how wecan implement every one of its features as syntactic sugar on top ofNAND-TM.
This can be done but going over all the operations in detail is rathertedious. Hence we will focus on describing the main ideas behind this
286 introduction to theoretical computer science
transformation. (See also Fig. 8.4.) NAND-RAM generalizes NAND-TM in two main ways: (a) adding indexed access to the arrays (ie..,Foo[bar] syntax) and (b) moving from Boolean valued variables tointeger valued ones. The transformation has two steps:
1. Indexed access of bit arrays: We start by showing how to handle (a).Namely, we show how we can implement in NAND-TM the op-eration Setindex(Bar) such that if Bar is an array that encodessome integer đ, then after executing Setindex(Bar) the value ofi will equal to đ. This will allow us to simulate syntax of the formFoo[Bar] by Setindex(Bar) followed by Foo[i].
2. Two dimensional bit arrays: We then show how we can use âsyntacticsugarâ to augment NAND-TM with two dimensional arrays. That is,have two indices i and j and two dimensional arrays, such that we canuse the syntax Foo[i][j] to access the (i,j)-th location of Foo.
3. Arrays of integers: Finally we will encode a one dimensional arrayArr of integers by a two dimensional Arrbin of bits. The idea issimple: if đđ,0, ⊠, đđ,â is a binary (prefix-free) representation ofArr[đ], then Arrbin[đ][đ] will be equal to đđ,đ.Once we have arrays of integers, we can use our usual syntactic
sugar for functions, GOTO etc. to implement the arithmetic and controlflow operations of NAND-RAM.â
The above approach is not the only way to obtain a proof of Theo-rem 8.1, see for example Exercise 8.1
RRemark 8.2 â RAM machines / NAND-RAM and assemblylanguage (optional). RAM machines correspond quiteclosely to actual microprocessors such as those in theIntel x86 series that also contains a large primary mem-ory and a constant number of small registers. This is ofcourse no accident: RAM machines aim at modelingmore closely than Turing machines the architecture ofactual computing systems, which largely follows theso called von Neumann architecture as described inthe report [Neu45]. As a result, NAND-RAM is sim-ilar in its general outline to assembly languages suchas x86 or NIPS. These assembly languages all haveinstructions to (1) move data from registers to mem-ory, (2) perform arithmetic or logical computationson registers, and (3) conditional execution and loops(âifâ and âgotoâ, commonly known as âbranchesâ andâjumpsâ in the context of assembly languages).The main difference between RAM machines andactual microprocessors (and correspondingly between
equivalent models of computation 287
NAND-RAM and assembly languages) is that actualmicroprocessors have a fixed word size đ€ so that allregisters and memory cells hold numbers in [2đ€] (orequivalently strings in 0, 1đ€). This number đ€ canvary among different processors, but common valuesare either 32 or 64. As a theoretical model, RAM ma-chines do not have this limitation, but we rather let đ€be the logarithm of our running time (which roughlycorresponds to its value in practice as well). Actualmicroprocessors also have a fixed number of registers(e.g., 14 general purpose registers in x86-64) but thisdoes not make a big difference with RAM machines.It can be shown that RAM machines with as few astwo registers are as powerful as full-fledged RAM ma-chines that have an arbitrarily large constant numberof registers.Of course actual microprocessors have many featuresnot shared with RAM machines as well, includingparallelism, memory hierarchies, and many others.However, RAM machines do capture actual comput-ers to a first approximation and so (as we will see), therunning time of an algorithm on a RAM machine (e.g.,đ(đ) vs đ(đ2)) is strongly correlated with its practicalefficiency.
8.2 THE GORY DETAILS (OPTIONAL)
We do not show the full formal proof of Theorem 8.1 but focus on themost important parts: implementing indexed access, and simulatingtwo dimensional arrays with one dimensional ones. Even these arealready quite tedious to describe, as will not be surprising to anyonethat has ever written a compiler. Hence you can feel free to merelyskim this section. The important point is not for you to know all de-tails by heart but to be convinced that in principle it is possible totransform a NAND-RAM program to an equivalent NAND-TM pro-gram, and even be convinced that, with sufficient time and effort, youcould do it if you wanted to.
8.2.1 Indexed access in NAND-TMIn NAND-TM we can only access our arrays in the position of the in-dex variable i, while NAND-RAM has integer-valued variables andcan use them for indexed access to arrays, of the form Foo[bar]. To im-plement indexed access in NAND-TM, we will encode integers in ourarrays using some prefix-free representation (see Section 2.5.2)), andthen have a procedure Setindex(Bar) that sets i to the value encodedby Bar. We can simulate the effect of Foo[Bar] using Setindex(Bar)followed by Foo[i].
Implementing Setindex(Bar) can be achieved as follows:
288 introduction to theoretical computer science
1. We initialize an array Arzero such that Atzero[0]= 1 andAtzero[đ]= 0 for all đ > 0. (This can be easily done in NAND-TMas all uninitialized variables default to zero.)
2. Set i to zero, by decrementing it until we reach the point whereAtzero[i]= 1.
3. Let Temp be an array encoding the number 0.4. We use GOTO to simulate an inner loop of of the form: while Temp â
Bar, increment Temp.5. At the end of the loop, i is equal to the value encoded by Bar.
In NAND-TM code (using some syntactic sugar), we can imple-ment the above operations as follows:# assume Atzero is an array such that Atzero[0]=1# and Atzero[j]=0 for all j>0
# set i to 0.LABEL("zero_idx")dir0 = zerodir1 = one# corresponds to i <- i-1GOTO("zero_idx",NOT(Atzero[i]))...# zero out temp#(code below assumes a specific prefix-free encoding in
which 10 is the "end marker")Temp[0] = 1Temp[1] = 0# set i to Bar, assume we know how to increment, compareLABEL("increment_temp")cond = EQUAL(Temp,Bar)dir0 = onedir1 = one# corresponds to i <- i+1INC(Temp)GOTO("increment_temp",cond)# if we reach this point, i is number encoded by Bar...# final instruction of programMODANDJUMP(dir0,dir1)
8.2.2 Two dimensional arrays in NAND-TMTo implement two dimensional arrays, we want to embed them in aone dimensional array. The idea is that we come up with a one to one
equivalent models of computation 289
Figure 8.5: Illustration of the map đđđđđ(đ„, đŠ) =12 (đ„ + đŠ)(đ„ + đŠ + 1) + đ„ for đ„, đŠ â [10], one cansee that for every distinct pairs (đ„, đŠ) and (đ„âČ, đŠâČ),đđđđđ(đ„, đŠ) â đđđđđ(đ„âČ, đŠâČ).
function đđđđđ ⶠâ Ă â â â, and so embed the location (đ, đ) of thetwo dimensional array Two in the location đđđđđ(đ, đ) of the array One.
Since the set â Ă â seems âmuch biggerâ than the set â, a priori itmight not be clear that such a one to one mapping exists. However,once you think about it more, it is not that hard to construct. For ex-ample, you could ask a child to use scissors and glue to transform a10â by 10â piece of paper into a 1â by 100â strip. This is essentiallya one to one map from [10] Ă [10] to [100]. We can generalize this toobtain a one to one map from [đ] Ă [đ] to [đ2] and more generally a oneto one map from â Ă â to â. Specifically, the following map đđđđđwould do (see Fig. 8.5):đđđđđ(đ„, đŠ) = 12 (đ„ + đŠ)(đ„ + đŠ + 1) + đ„ . (8.1)
Exercise 8.3 asks you to prove that đđđđđ is indeed one to one, aswell as computable by a NAND-TM program. (The latter can be doneby simply following the grade-school algorithms for multiplication,addition, and division.) This means that we can replace code of theform Two[Foo][Bar] = something (i.e., access the two dimensionalarray Two at the integers encoded by the one dimensional arrays Fooand Bar) by code of the form:
Blah = embed(Foo,Bar)Setindex(Blah)Two[i] = something
8.2.3 All the restOnce we have two dimensional arrays and indexed access, simulatingNAND-RAM with NAND-TM is just a matter of implementing thestandard algorithms for arithmetic operations and comparisons inNAND-TM. While this is cumbersome, it is not difficult, and the endresult is to show that every NAND-RAM program đ can be simulatedby an equivalent NAND-TM program đ, thus completing the proof ofTheorem 8.1.
RRemark 8.3 â Recursion in NAND-RAM (advanced). Oneconcept that appears in many programming languagesbut we did not include in NAND-RAM programs isrecursion. However, recursion (and function calls ingeneral) can be implemented in NAND-RAM usingthe stack data structure. A stack is a data structure con-taining a sequence of elements, where we can âpushâelements into it and âpopâ them from it in âfirst in lastoutâ order.We can implement a stack using an array of integersStack and a scalar variable stackpointer that will
290 introduction to theoretical computer science
Figure 8.6: A punched card corresponding to a Fortranstatement.
1 Some programming language have fixed (even ifextremely large) bounds on the amount of memorythey can access, which formally prevent them frombeing applicable to computing infinite functions andhence simulating Turing machines. We ignore suchissues in this discussion and assume access to somestorage device without a fixed upper bound on itscapacity.
be the number of items in the stack. We implementpush(foo) by
Stack[stackpointer]=foostackpointer += one
and implement bar = pop() by
bar = Stack[stackpointer]stackpointer -= one
We implement a function call to đč by pushing thearguments for đč into the stack. The code of đč willâpopâ the arguments from the stack, perform the com-putation (which might involve making recursive ornon recursive calls) and then âpushâ its return valueinto the stack. Because of the âfirst in last outâ na-ture of a stack, we do not return control to the callingprocedure until all the recursive calls are done.The fact that we can implement recursion using a non-recursive language is not surprising. Indeed, machinelanguages typically do not have recursion (or functioncalls in general), and hence a compiler implementsfunction calls using a stack and GOTO. You can findonline tutorials on how recursion is implementedvia stack in your favorite programming language,whether itâs Python , JavaScript, or Lisp/Scheme.
8.3 TURING EQUIVALENCE (DISCUSSION)
Any of the standard programming language such as C, Java, Python,Pascal, Fortran have very similar operations to NAND-RAM. (In-deed, ultimately they can all be executed by machines which have afixed number of registers and a large memory array.) Hence usingTheorem 8.1, we can simulate any program in such a programminglanguage by a NAND-TM program. In the other direction, it is a fairlyeasy programming exercise to write an interpreter for NAND-TM inany of the above programming languages. Hence we can also simulateNAND-TM programs (and so by Theorem 7.11, Turing machines) us-ing these programming languages. This property of being equivalentin power to Turing Machines / NAND-TM is called Turing Equivalent(or sometimes Turing Complete). Thus all programming languages weare familiar with are Turing equivalent.1
8.3.1 The âBest of both worldsâ paradigmThe equivalence between Turing Machines and RAM machines allowsus to choose the most convenient language for the task at hand:
âą When we want to prove a theorem about all programs/algorithms,we can use Turing machines (or NAND-TM) since they are sim-
equivalent models of computation 291
Figure 8.7: By having the two equivalent languagesNAND-TM and NAND-RAM, we can âhave our cakeand eat it tooâ, using NAND-TM when we want toprove that programs canât do something, and usingNAND-RAM or other high level languages when wewant to prove that programs can do something.
pler and easier to analyze. In particular, if we want to show thata certain function can not be computed, then we will use Turingmachines.
âą When we want to show that a function can be computed we can useRAM machines or NAND-RAM, because they are easier to pro-gram in and correspond more closely to high level programminglanguages we are used to. In fact, we will often describe NAND-RAM programs in an informal manner, trusting that the readercan fill in the details and translate the high level description to theprecise program. (This is just like the way people typically use in-formal or âpseudocodeâ descriptions of algorithms, trusting thattheir audience will know to translate these descriptions to code ifneeded.)
Our usage of Turing Machines / NAND-TM and RAM Machines/ NAND-RAM is very similar to the way people use in practice highand low level programming languages. When one wants to producea device that executes programs, it is convenient to do so for verysimple and âlow levelâ programming language. When one wants todescribe an algorithm, it is convenient to use as high level a formalismas possible.
Big Idea 10 Using equivalence results such as those betweenTuring and RAM machines, we can âhave our cake and eat it tooâ.We can use a simpler model such as Turing machines when we wantto prove something canât be done, and use a feature-rich model such asRAM machines when we want to prove something can be done.
8.3.2 Letâs talk about abstractionsâThe programmer is in the unique position that ⊠he has to be ableto think in terms of conceptual hierarchies that are much deeper thana single mind ever needed to face before.â, Edsger Dijkstra, âOn thecruelty of really teaching computing scienceâ, 1988.
At some point in any theory of computation course, the instructorand students need to have the talk. That is, we need to discuss the levelof abstraction in describing algorithms. In algorithms courses, onetypically describes algorithms in English, assuming readers can âfillin the detailsâ and would be able to convert such an algorithm into animplementation if needed. For example, Algorithm 8.4 is a high leveldescription of the breadth first search algorithm.
292 introduction to theoretical computer science
Algorithm 8.4 â Breadth First Search.
Input: Graph đș, vertices đą, đŁOutput: âconnectedâ when đą is connected to đŁ in đș, âdis-
connectedâ1: Initialize empty queue đ.2: Put đą in đ3: while đ is not empty do4: Remove top vertex đ€ from đ5: if đ€ = đŁ then return âconnectedâ6: end if7: Mark đ€8: Add all unmarked neighbors of đ€ to đ.9: end while
10: return âdisconnectedâ
If we wanted to give more details on how to implement breadthfirst search in a programming language such as Python or C (orNAND-RAM / NAND-TM for that matter), we would describe howwe implement the queue data structure using an array, and similarlyhow we would use arrays to mark vertices. We call such an âinterme-diate levelâ description an implementation level or pseudocode descrip-tion. Finally, if we want to describe the implementation precisely, wewould give the full code of the program (or another fully precise rep-resentation, such as in the form of a list of tuples). We call this a formalor low level description.
Figure 8.8: We can describe an algorithm at differentlevels of granularity/detail and precision. At thehighest level we just write the idea in words, omittingall details on representation and implementation.In the intermediate level (also known as implemen-tation or pseudocode) we give enough details of theimplementation that would allow someone to de-rive it, though we still fall short of providing the fullcode. The lowest level is where the actual code ormathematical description is fully spelled out. Thesedifferent levels of detail all have their uses, and mov-ing between them is one of the most important skillsfor a computer scientist.
While we started off by describing NAND-CIRC, NAND-TM, andNAND-RAM programs at the full formal level, as we progress in this
equivalent models of computation 293
book we will move to implementation and high level description.After all, our goal is not to use these models for actual computation,but rather to analyze the general phenomenon of computation. Thatsaid, if you donât understand how the high level description translatesto an actual implementation, going âdown to the metalâ is often anexcellent exercise. One of the most important skills for a computerscientist is the ability to move up and down hierarchies of abstractions.
A similar distinction applies to the notion of representation of objectsas strings. Sometimes, to be precise, we give a low level specificationof exactly how an object maps into a binary string. For example, wemight describe an encoding of đ vertex graphs as length đ2 binarystrings, by saying that we map a graph đș over the vertices [đ] to astring đ„ â 0, 1đ2 such that the đ â đ + đ-th coordinate of đ„ is 1 if andonly if the edge đ đ is present in đș. We can also use an intermediate orimplementation level description, by simply saying that we represent agraph using the adjacency matrix representation.
Finally, because we are translating between the various represen-tations of graphs (and objects in general) can be done via a NAND-RAM (and hence a NAND-TM) program, when talking in a high levelwe also suppress discussion of representation altogether. For example,the fact that graph connectivity is a computable function is true re-gardless of whether we represent graphs as adjacency lists, adjacencymatrices, list of edge-pairs, and so on and so forth. Hence, in caseswhere the precise representation doesnât make a difference, we wouldoften talk about our algorithms as taking as input an object đ (thatcan be a graph, a vector, a program, etc.) without specifying how đ isencoded as a string.
Defining âAlgorithmsâ. Up until now we have used the term âalgo-rithmâ informally. However, Turing Machines and the range of equiv-alent models yield a way to precisely and formally define algorithms.Hence whenever we refer to an algorithm in this book, we will meanthat it is an instance of one of the Turing equivalent models, such asTuring machines, NAND-TM, RAM machines, etc. Because of theequivalence of all these models, in many contexts, it will not matterwhich of these we use.
8.3.3 Turing completeness and equivalence, a formal definition (optional)A computational model is some way to define what it means for a pro-gram (which is represented by a string) to compute a (partial) func-tion. A computational model âł is Turing complete, if we can map ev-ery Turing machine (or equivalently NAND-TM program) đ into aprogram đ for âł that computes the same function as đ. It is Turingequivalent if the other direction holds as well (i.e., we can map everyprogram in âł to a Turing machine that computes the same function).
294 introduction to theoretical computer science
We can define this notion formally as follows. (This formal definitionis not crucial for the remainder of this book so feel to skip it as longas you understand the general concept of Turing equivalence; Thisnotion is sometimes referred to in the literature as Gödel numberingor admissible numbering.)
Definition 8.5 â Turing completeness and equivalence (optional). Let â± bethe set of all partial functions from 0, 1â to 0, 1â. A computa-tional model is a map Ⳡⶠ0, 1â â â±.
We say that a program đ â 0, 1â âł-computes a function đč â â±if âł(đ) = đč .
A computational model âł is Turing complete if there is a com-putable map ENCODEⳠⶠ0, 1â â 0, 1â for every Turingmachine đ (represented as a string), âł(ENCODEâł(đ)) is equalto the partial function computed by đ .
A computational model âł is Turing equivalent if it is Tur-ing complete and there exists a computable map DECODEâł â¶0, 1â â 0, 1â such that or every string đ â 0, 1â, đ =DECODEâł(đ ) is a string representation of a Turing machine thatcomputes the function âł(đ).Some examples of Turing equivalent models (some of which we
have already seen, and some are discussed below) include:
⹠Turing machines⹠NAND-TM programs⹠NAND-RAM programs⹠λ calculus⹠Game of life (mapping programs and inputs/outputs to starting
and ending configurations)âą Programming languages such as Python/C/Javascript/OCamlâŠ
(allowing for unbounded storage)
8.4 CELLULAR AUTOMATA
Many physical systems can be described as consisting of a large num-ber of elementary components that interact with one another. Oneway to model such systems is using cellular automata. This is a systemthat consists of a large (or even infinite) number of cells. Each cellonly has a constant number of possible states. At each time step, a cellupdates to a new state by applying some simple rule to the state ofitself and its neighbors.
A canonical example of a cellular automaton is Conwayâs Gameof Life. In this automata the cells are arranged in an infinite two di-mensional grid. Each cell has only two states: âdeadâ (which we can
equivalent models of computation 295
Figure 8.9: Rules for Conwayâs Game of Life. Imagefrom this blog post.
encode as 0 and identify with â ) or âaliveâ (which we can encodeas 1). The next state of a cell depends on its previous state and thestates of its 8 vertical, horizontal and diagonal neighbors (see Fig. 8.9).A dead cell becomes alive only if exactly three of its neighbors arealive. A live cell continues to live if it has two or three live neighbors.Even though the number of cells is potentially infinite, we can en-code the state using a finite-length string by only keeping track of thelive cells. If we initialize the system in a configuration with a finitenumber of live cells, then the number of live cells will stay finite in allfuture steps. The Wikipedia page for the Game of Life contains somebeautiful figures and animations of configurations that produce veryinteresting evolutions.
Figure 8.10: In a two dimensional cellular automatonevery cell is in position đ, đ for some integers đ, đ â â€.The state of a cell is some value đŽđ,đ â ÎŁ for somefinite alphabet ÎŁ. At a given time step, the state of thecell is adjusted according to some function applied tothe state of (đ, đ) and all its neighbors (đ ± 1, đ ± 1).In a one dimensional cellular automaton every cell is inposition đ â †and the state đŽđ of đ at the next timestep depends on its current state and the state of itstwo neighbors đ â 1 and đ + 1.
Since the cells in the game of life are are arranged in an infinite two-dimensional grid, it is an example of a two dimensional cellular automa-ton. We can also consider the even simpler setting of a one dimensionalcellular automaton, where the cells are arranged in an infinite line, seeFig. 8.10. It turns out that even this simple model is enough to achieve
296 introduction to theoretical computer science
Figure 8.11: A Game-of-Life configuration simulatinga Turing Machine. Figure by Paul Rendell.
Turing-completeness. We will now formally define one-dimensionalcellular automata and then prove their Turing completeness.
Definition 8.6 â One dimensional cellular automata. Let ÎŁ be a finite setcontaining the symbol â . A one dimensional cellular automation overalphabet ÎŁ is described by a transition rule đ ⶠΣ3 â ÎŁ, whichsatisfies đ(â ,â ,â ) = â .
A configuration of the automaton đ is a function đŽ ⶠ†â ÎŁ. Ifan automaton with rule đ is in configuration đŽ, then its next config-uration, denoted by đŽâČ = NEXTđ(đŽ), is the function đŽâČ such thatđŽâČ(đ) = đ(đŽ(đ â 1), đŽ(đ), đŽ(đ + 1)) for every đ â â€. In other words,the next state of the automaton đ at point đ obtained by applyingthe rule đ to the values of đŽ at đ and its two neighbors.
Finite configuration. We say that a configuration of an automaton đis finite if there is only some finite number of indices đ0, ⊠, đđâ1 in â€such that đŽ(đđ) â â . (That is, for every đ â đ0, ⊠, đđâ1, đŽ(đ) = â .)Such a configuration can be represented using a finite string thatencodes the indices đ0, ⊠, đđâ1 and the values đŽ(đ0), ⊠, đŽ(đđâ1). Sinceđ (â ,â ,â ) = â , if đŽ is a finite configuration then NEXTđ(đŽ) is finiteas well. We will only be interested in studying cellular automata thatare initialized in finite configurations, and hence remain in a finiteconfiguration throughout their evolution.
8.4.1 One dimensional cellular automata are Turing completeWe can write a program (for example using NAND-RAM) that sim-ulates the evolution of any cellular automaton from an initial finiteconfiguration by simply storing the values of the cells with state notequal to â and repeatedly applying the rule đ. Hence cellular au-tomata can be simulated by Turing Machines. What is more surprisingthat the other direction holds as well. For example, as simple as itsrules seem, we can simulate a Turing machine using the game of life(see Fig. 8.11).
In fact, even one dimensional cellular automata can be Turing com-plete:
Theorem 8.7 â One dimensional automata are Turing complete. For everyTuring machine đ , there is a one dimensional cellular automatonthat can simulate đ on every input đ„.To make the notion of âsimulating a Turing machineâ more precise
we will need to define configurations of Turing machines. We willdo so in Section 8.4.2 below, but at a high level a configuration of aTuring machine is a string that encodes its full state at a given step in
equivalent models of computation 297
its computation. That is, the contents of all (non empty) cells of itstape, its current state, as well as the head position.
The key idea in the proof of Theorem 8.7 is that at every point inthe computation of a Turing machine đ , the only cell in đ âs tape thatcan change is the one where the head is located, and the value thiscell changes to is a function of its current state and the finite state ofđ . This observation allows us to encode the configuration of a Turingmachine đ as a finite configuration of a cellular automaton đ, andensure that a one-step evolution of this encoded configuration underthe rules of đ corresponds to one step in the execution of the Turingmachine đ .
8.4.2 Configurations of Turing machines and the next-step functionTo turn the above ideas into a rigorous proof (and even statement!)of Theorem 8.7 we will need to precisely define the notion of config-urations of Turing machines. This notion will be useful for us in laterchapters as well.
Figure 8.12: A configuration of a Turing machine đwith alphabet ÎŁ and state space [đ] encodes the stateof đ at a particular step in its execution as a string đŒover the alphabet ÎŁ = ÎŁ Ă (â Ă [đ]). The string isof length đĄ where đĄ is such that đâs tape contains â inall positions đĄ and larger and đâs head is in a positionsmaller than đĄ. If đâs head is in the đ-th position, thenfor đ â đ, đŒđ encodes the value of the đ-th cell of đâstape, while đŒđ encodes both this value as well as thecurrent state of đ . If the machine writes the value đ ,changes state to đĄ, and moves right, then in the nextconfiguration will contain at position đ the value (đ, â )and at position đ + 1 the value (đŒđ+1, đĄ).
Definition 8.8 â Configuration of Turing Machines.. Let đ be a Turing ma-chine with tape alphabet ÎŁ and state space [đ]. A configuration of đis a string đŒ â ÎŁâ where ÎŁ = ÎŁ Ă (â âȘ [đ]) that satisfies that thereis exactly one coordinate đ for which đŒđ = (đ, đ ) for some đ â ÎŁ andđ â [đ]. For all other coordinates đ, đŒđ = (đâČ, â ) for some đâČ â ÎŁ.
A configuration đŒ â ÎŁâ of đ corresponds to the following stateof its execution:
âą đ âs tape contains đŒđ,0 for all đ < |đŒ| and contains â for all po-sitions that are at least |đŒ|, where we let đŒđ,0 be the value đ suchthat đŒđ = (đ, đĄ) with đ â ÎŁ and đĄ â â âȘ [đ]. (In other words,
298 introduction to theoretical computer science
since đŒđ is a pair of an alphabet symbol đ and either a state in [đ]or the symbol â , đŒđ,0 is the first component đ of this pair.)
âą đ âs head is in the unique position đ for which đŒđ has the form(đ, đ ) for đ â [đ], and đ âs state is equal to đ .P
Definition 8.8 below has some technical details, butis not actually that deep or complicated. Try to take amoment to stop and think how you would encode as astring the state of a Turing machine at a given point inan execution.Think what are all the components that you need toknow in order to be able to continue the executionfrom this point onwards, and what is a simple wayto encode them using a list of finite symbols. In par-ticular, with an eye towards our future applications,try to think of an encoding which will make it as sim-ple as possible to map a configuration at step đĄ to theconfiguration at step đĄ + 1.
Definition 8.8 is a little cumbersome, but ultimately a configurationis simply a string that encodes a snapshot of the Turing machine at agiven point in the execution. (In operating-systems lingo, it is a âcoredumpâ.) Such a snapshot needs to encode the following components:
1. The current head position.
2. The full contents of the large scale memory, that is the tape.
3. The contents of the âlocal registersâ, that is the state of the ma-chine.
The precise details of how we encode a configuration are not impor-tant, but we do want to record the following simple fact:Lemma 8.9 Let đ be a Turing machine and let NEXTđ ⶠΣâ â ÎŁâbe the function that maps a configuration of đ to the configurationat the next step of the execution. Then for every đ â â, the value ofNEXTđ(đŒ)đ only depends on the coordinates đŒđâ1, đŒđ, đŒđ+1.
(For simplicity of notation, above we use the convention that if đis âout of boundsâ, such as đ < 0 or đ > |đŒ|, then we assume thatđŒđ = (â , â ).) We leave proving Lemma 8.9 as Exercise 8.7. The ideabehind the proof is simple: if the head is neither in position đ norpositions đ â 1 and đ + 1, then the next-step configuration at đ will bethe same as it was before. Otherwise, we can âread offâ the state of theTuring machine and the value of the tape at the head location from the
equivalent models of computation 299
Figure 8.13: Evolution of a one dimensional automata.Each row in the figure corresponds to the configura-tion. The initial configuration corresponds to the toprow and contains only a single âliveâ cell. This figurecorresponds to the âRule 110â automaton of StephenWolfram which is Turing Complete. Figure takenfrom Wolfram MathWorld.
configuration at đ or one of its neighbors and use that to update whatthe new state at đ should be. Completing the full proof is not hard,but doing it is a great way to ensure that you are comfortable with thedefinition of configurations.
Completing the proof of Theorem 8.7. We can now restate Theorem 8.7more formally, and complete its proof:
Theorem 8.10 â One dimensional automata are Turing complete (formal state-
ment). For every Turing Machine đ , if we denote by ÎŁ the alphabetof its configuration strings, then there is a one-dimensional cellularautomaton đ over the alphabet ÎŁâ such that(NEXTđ(đŒ)) = NEXTđ (đŒ) (8.2)
for every configuration đŒ â ÎŁâ of đ (again using the conventionthat we consider đŒđ = â if đ is âout of bounds).
Proof. We consider the element (â , â ) of ÎŁ to correspond to the â element of the automaton đ. In this case, by Lemma 8.9, the functionNEXTđ that maps a configuration of đ into the next one is in fact avalid rule for a one dimensional automata.
The automaton arising from the proof of Theorem 8.10 has a largealphabet, and furthermore one whose size that depends on the ma-chine đ that is being simulated. It turns out that one can obtain anautomaton with an alphabet of fixed size that is independent of theprogram being simulated, and in fact the alphabet of the automatoncan be the minimal set 0, 1! See Fig. 8.13 for an example of such anTuring-complete automaton.
RRemark 8.11 â Configurations of NAND-TM programs.We can use the same approach as Definition 8.8 todefine configurations of a NAND-TM program. Such aconfiguration will need to encode:
1. The current value of the variable i.2. For every scalar variable foo, the value of foo.3. For every array variable Bar, the value Bar[đ] for
every đ â 0, ⊠, đĄ â 1 where đĄ â 1 is the largestvalue that the index variable i ever achieved in thecomputation.
300 introduction to theoretical computer science
8.5 LAMBDA CALCULUS AND FUNCTIONAL PROGRAMMING LAN-GUAGES
The λ calculus is another way to define computable functions. It wasproposed by Alonzo Church in the 1930âs around the same time asAlan Turingâs proposal of the Turing Machine. Interestingly, whileTuring Machines are not used for practical computation, the λ calculushas inspired functional programming languages such as LISP, ML andHaskell, and indirectly the development of many other programminglanguages as well. In this section we will present the λ calculus andshow that its power is equivalent to NAND-TM programs (and hencealso to Turing machines). Our Github repository contains a Jupyternotebook with a Python implementation of the λ calculus that you canexperiment with to get a better feel for this topic.
The λ operator. At the core of the λ calculus is a way to define âanony-mousâ functions. For example, instead of giving a name đ to a func-tion and defining it as đ(đ„) = đ„ Ă đ„ (8.3)
we can write it as đđ„.đ„ Ă đ„ (8.4)and so (đđ„.đ„ Ă đ„)(7) = 49. That is, you can think of đđ„.đđ„đ(đ„),
where đđ„đ is some expression as a way of specifying the anonymousfunction đ„ ⊠đđ„đ(đ„). Anonymous functions, using either đđ„.đ(đ„), đ„ âŠđ(đ„) or other closely related notation, appear in many programminglanguages. For example, in Python we can define the squaring functionusing lambda x: x*x while in JavaScript we can use x => x*x or(x) => x*x. In Scheme we would define it as (lambda (x) (* x x)).Clearly, the name of the argument to a function doesnât matter, and sođđŠ.đŠ Ă đŠ is the same as đđ„.đ„ Ă đ„, as both correspond to the squaringfunction.
Dropping parenthesis. To reduce notational clutter, when writingđ calculus expressions we often drop the parentheses for functionevaluation. Hence instead of writing đ(đ„) for the result of applyingthe function đ to the input đ„, we can also write this as simply đ đ„.Therefore we can write (đđ„.đ„ Ă đ„)7 = 49. In this chapter, we will useboth the đ(đ„) and đ đ„ notations for function application. Functionevaluations are associative and bind from left to right, and hence đ đ âis the same as (đđ)â.8.5.1 Applying functions to functionsA key feature of the λ calculus is that functions are âfirst-class objectsâin the sense that we can use functions as arguments to other functions.
equivalent models of computation 301
For example, can you guess what number is the following expressionequal to? (((đđ.(đđŠ.(đ (đ đŠ))))(đđ„.đ„ Ă đ„)) 3) (8.5)
PThe expression (8.5) might seem daunting, but beforeyou look at the solution below, try to break it apartto its components, and evaluate each component at atime. Working out this example would go a long waytoward understanding the λ calculus.
Letâs evaluate (8.5) one step at a time. As nice as it is for the λcalculus to allow anonymous functions, adding names can be veryhelpful for understanding complicated expressions. So, let us writeđč = đđ.(đđŠ.(đ(đđŠ))) and đ = đđ„.đ„ Ă đ„.
Therefore (8.5) becomes ((đč đ) 3) . (8.6)
On input a function đ , đč outputs the function đđŠ.(đ(đ đŠ)), or inother words đčđ is the function đŠ ⊠đ(đ(đŠ)). Our function đ is simplyđ(đ„) = đ„2 and so (đčđ) is the function that maps đŠ to (đŠ2)2 = đŠ4. Hence((đčđ)3) = 34 = 81.Solved Exercise 8.1 What number does the following expression equalto? ((đđ„.(đđŠ.đ„)) 2) 9) . (8.7)
Solution:đđŠ.đ„ is the function that on input đŠ ignores its input and outputsđ„. Hence (đđ„.(đđŠ.đ„))2 yields the function đŠ ⊠2 (or, using đ nota-tion, the function đđŠ.2). Hence (8.7) is equivalent to (đđŠ.2)9 = 2.
8.5.2 Obtaining multi-argument functions via CurryingIn a λ expression of the form đđ„.đ, the expression đ can itself involvethe λ operator. Thus for example the functionđđ„.(đđŠ.đ„ + đŠ) (8.8)
maps đ„ to the function đŠ ⊠đ„ + đŠ.In particular, if we invoke the function (8.8) on đ to obtain some
function đ , and then invoke đ on đ, we obtain the value đ + đ. We
302 introduction to theoretical computer science
Figure 8.14: In the âcurryingâ transformation, we cancreate the effect of a two parameter function đ(đ„, đŠ)with the λ expression đđ„.(đđŠ.đ(đ„, đŠ)) which on inputđ„ outputs a one-parameter function đđ„ that has đ„âhardwiredâ into it and such that đđ„(đŠ) = đ(đ„, đŠ).This can be illustrated by a circuit diagram; seeChelsea Vossâs site.
can see that the one-argument function (8.8) corresponding to đ âŠ(đ ⊠đ + đ) can also be thought of as the two-argument function(đ, đ) ⊠đ + đ. Generally, we can use the λ expression đđ„.(đđŠ.đ(đ„, đŠ))to simulate the effect of a two argument function (đ„, đŠ) ⊠đ(đ„, đŠ). Thistechnique is known as Currying. We will use the shorthand đđ„, đŠ.đfor đđ„.(đđŠ.đ). If đ = đđ„.(đđŠ.đ) then (đđ)đ corresponds to applying đđand then invoking the resulting function on đ, obtaining the result ofreplacing in đ the occurrences of đ„ with đ and occurrences of đ withđŠ. By our rules of associativity, this is the same as (đđđ) which weâllsometimes also write as đ(đ, đ).8.5.3 Formal description of the λ calculusWe now provide a formal description of the λ calculus. We start withâbasic expressionsâ that contain a single variable such as đ„ or đŠ andbuild more complex expressions of the form (đ đâČ) and đđ„.đ where đ, đâČare expressions and đ„ is a variable idenifier. Formally λ expressionsare defined as follows:
Definition 8.12 â λ expression.. A λ expression is either a single variableidentifier or an expression đ of the one of the following forms:
âą Application: đ = (đâČ đâł), where đâČ and đâł are λ expressions.
âą Abstraction: đ = đđ„.(đâČ) where đâČ is a λ expression.
Definition 8.12 is a recursive definition since we defined the conceptof λ expressions in terms of itself. This might seem confusing at first,but in fact you have known recursive definitions since you were anelementary school student. Consider how we define an arithmeticexpression: it is an expression that is either just a number, or has one ofthe forms (đ + đâČ), (đ â đâČ), (đ Ă đâČ), or (đ Ă· đâČ), where đ and đâČ are otherarithmetic expressions.
Free and bound variables. Variables in a λ expression can either befree or bound to a đ operator (in the sense of Section 1.4.7). In a single-variable λ expression đŁđđ, the variable đŁđđ is free. The set of free andbound variables in an application expression đ = (đâČ đâł) is the sameas that of the underlying expressions đâČ and đâł. In an abstraction ex-pression đ = đđŁđđ.(đâČ), all free occurences of đŁđđ in đâČ are bound tothe đ operator of đ. If you find the notion of free and bound variablesconfusing, you can avoid all these issues by using unique identifiersfor all variables.
Precedence and parenthesis. We will use the following rules to allowus to drop some parenthesis. Function application associates fromleft to right, and so đđâ is the same as (đđ)â. Function application hasa higher precedence than the λ operator, and so đđ„.đđđ„ is the same
equivalent models of computation 303
as đđ„.((đđ)đ„). This is similar to how we use the precedence rules inarithmetic operations to allow us to use fewer parenthesis and so writethe expression (7Ă3)+2 as 7Ă3+2. As mentioned in Section 8.5.2, wealso use the shorthand đđ„, đŠ.đ for đđ„.(đđŠ.đ) and the shorthand đ(đ„, đŠ)for (đ đ„) đŠ. This plays nicely with the âCurryingâ transformation ofsimulating multi-input functions using λ expressions.
Equivalence of λ expressions. As we have seen in Solved Exercise 8.1,the rule that (đđ„.đđ„đ)đđ„đâČ is equivalent to đđ„đ[đ„ â đđ„đâČ] enables usto modify λ expressions and obtain simpler equivalent form for them.Another rule that we can use is that the parameter does not matterand hence for example đđŠ.đŠ is the same as đđ§.đ§. Together these rulesdefine the notion of equivalence of λ expressions:
Definition 8.13 â Equivalence of λ expressions. Two λ expressions areequivalent if they can be made into the same expression by repeatedapplications of the following rules:
1. Evaluation (aka đœ reduction): The expression (đđ„.đđ„đ)đđ„đâČ isequivalent to đđ„đ[đ„ â đđ„đâČ].
2. Variable renaming (aka đŒ conversion): The expression đđ„.đđ„đis equivalent to đđŠ.đđ„đ[đ„ â đŠ].
If đđ„đ is a λ expression of the form đđ„.đđ„đâČ then it naturally corre-sponds to the function that maps any input đ§ to đđ„đâČ[đ„ â đ§]. Hencethe λ calculus naturally implies a computational model. Since in the λcalculus the inputs can themselves be functions, we need to decide inwhat order we evaluate an expression such as(đđ„.đ)(đđŠ.đđ§) . (8.9)There are two natural conventions for this:
âą Call by name (aka âlazy evaluationâ): We evaluate (8.9) by first plug-ging in the righthand expression (đđŠ.đđ§) as input to the lefthandside function, obtaining đ[đ„ â (đđŠ.đđ§)] and then continue fromthere.
âą Call by value (aka âeager evaluationâ): We evaluate (8.9) by firstevaluating the righthand side and obtaining â = đ[đŠ â đ§], and thenplugging this into the lefthandside to obtain đ[đ„ â â].Because the λ calculus has only pure functions, that do not have
âside effectsâ, in many cases the order does not matter. In fact, it canbe shown that if we obtain a definite irreducible expression (for ex-ample, a number) in both strategies, then it will be the same one.
304 introduction to theoretical computer science
However, for concreteness we will always use the âcall by nameâ (i.e.,lazy evaluation) order. (The same choice is made in the programminglanguage Haskell, though many other programming languages useeager evaluation.) Formally, the evaluation of a λ expression usingâcall by nameâ is captured by the following process:
Definition 8.14 â Simplification of λ expressions. Let đ be a λ expres-sion. The simplification of đ is the result of the following recursiveprocess:
1. If đ is a single variable đ„ then the simplification of đ is đ.2. If đ has the form đ = đđ„.đâČ then the simplification of đ is đđ„.đ âČ
where đ âČ is the simplification of đâČ.3. (Evaluation / đœ reduction.) If đ has the form đ = (đđ„.đâČ đâł) then
the simplification of đ is the simplification of đâČ[đ„ â đâł], whichdenotes replacing all copies of đ„ in đâČ bound to the đ operatorwith đâł
4. (Renaming / đŒ conversion.) The canonical simplification of đ isobtained by taking the simplification of đ and renaming the vari-ables so that the first bound variable in the expression is đŁ0, thesecond one is đŁ1, and so on and so forth.
We say that two λ expressions đ and đâČ are equivalent, denoted byđ â đâČ, if they have the same canonical simplification.
Solved Exercise 8.2 â Equivalence of λ expressions. Prove that the followingtwo expressions đ and đ are equivalent:
đ = đđ„.đ„ (8.10)
đ = (đđ.(đđ.đ))(đđ§.đ§đ§) (8.11)
Solution:
The canonical simplification of đ is simply đđŁ0.đŁ0. To do thecanonical simplification of đ we first use đœ reduction to plug inđđ§.đ§đ§ instead of đ in (đđ.đ) but since đ is not used in this function atall, we simply obtained đđ.đ which simplifies to đđŁ0.đŁ0 as well.
equivalent models of computation 305
2 In Lisp, the PAIR, HEAD and TAIL functions aretraditionally called cons, car and cdr.
8.5.4 Infinite loops in the λ calculusLike Turing machines and NAND-TM programs, the simplificationprocess in the λ calculus can also enter into an infinite loop. For exam-ple, consider the λ expressionđđ„.đ„đ„ đđ„.đ„đ„ (8.12)
If we try to simplify (8.12) by invoking the lefthand function onthe righthand one, then we get another copy of (8.12) and hence thisnever ends. There are examples where the order of evaluation canmatter for whether or not an expression can be simplified, see Exer-cise 8.9.
8.6 THE âENHANCEDâ Î CALCULUS
We now discuss the λ calculus as a computational model. We willstart by describing an âenhancedâ version of the λ calculus that con-tains some âsuperfluous featuresâ but is easier to wrap your headaround. We will first show how the enhanced λ calculus is equiva-lent to Turing machines in computational power. Then we will showhow all the features of âenhanced λ calculusâ can be implemented asâsyntactic sugarâ on top of the âpureâ (i.e., non enhanced) λ calculus.Hence the pure λ calculus is equivalent in power to Turing machines(and hence also to RAM machines and all other Turing-equivalentmodels).
The enhanced λ calculus includes the following set of objects andoperations:
âą Boolean constants and IF function: There are λ expressions 0, 1and IF that satisfy the following conditions: for every λ expressionđ and đ , IF 1 đ đ = đ and IF 0 đ đ = đ . That is, IF is the function thatgiven three arguments đ, đ, đ outputs đ if đ = 1 and đ if đ = 0.
âą Pairs: There is a λ expression PAIR which we will think of as thepairing function. For every λ expressions đ, đ , PAIR đ đ is thepair âšđ, đâ© that contains đ as its first member and đ as its secondmember. We also have λ expressions HEAD and TAIL that extractthe first and second member of a pair respectively. Hence, for everyλ expressions đ, đ , HEAD (PAIR đ đ) = đ and TAIL (PAIR đ đ) = đ .2
âą Lists and strings: There is λ expression NIL that corresponds tothe empty list, which we also denote by âšâ„â©. Using PAIR and NILwe construct lists. The idea is that if đż is a đ element list of theform âšđ1, đ2, ⊠, đđ, â„â© then for every λ expression đ0 we can obtainthe đ + 1 element list âšđ0, đ1, đ2, ⊠, đđ, â„â© using the expressionPAIR đ0 đż. For example, for every three λ expressions đ, đ, đ, the
306 introduction to theoretical computer science
following corresponds to the three element list âšđ, đ, đ, â„â©:PAIR đ (PAIR đ (PAIR đ NIL)) . (8.13)
The λ expression ISEMPTY returns 1 on NIL and returns 0 on everyother list. A string is simply a list of bits.
âą List operations: The enhanced λ calculus also contains thelist-processing functions MAP, REDUCE, and FILTER. Givena list đż = âšđ„0, ⊠, đ„đâ1, â„â© and a function đ , MAP đż đ ap-plies đ on every member of the list to obtain the new listđżâČ = âšđ(đ„0), ⊠, đ(đ„đâ1), â„â©. Given a list đż as above and anexpression đ whose output is either 0 or 1, FILTER đż đ returns thelist âšđ„đâ©đđ„đ=1 containing all the elements of đż for which đ outputs1. The function REDUCE applies a âcombiningâ operation to alist. For example, REDUCE đż + 0 will return the sum of all theelements in the list đż. More generally, REDUCE takes a list đż, anoperation đ (which we think of as taking two arguments) and a λexpression đ§ (which we think of as the âneutral elementâ for theoperation đ , such as 0 for addition and 1 for multiplication). Theoutput is defined via
REDUCE đż đ đ§ = â§âšâ©đ§ đż = NILđ (HEAD đż) (REDUCE (TAIL đż) đ đ§) otherwise.
(8.14)See Fig. 8.16 for an illustration of the three list-processing operations.
âą Recursion: Finally, we want to be able to execute recursive func-tions. Since in λ calculus functions are anonymous, we canât writea definition of the form đ(đ„) = đđđâ where đđđâ includes calls tođ . Instead we use functions đ that take an additional input đđ as aparameter. The operator RECURSE will take such a function đ asinput and return a ârecursive versionâ of đ where all the calls to đđare replaced by recursive calls to this function. That is, if we have afunction đč taking two parameters đđ and đ„, then RECURSE đč willbe the function đ taking one parameter đ„ such that đ(đ„) = đč(đ, đ„)for every đ„.
Solved Exercise 8.3 â Compute NAND using λ calculus. Give a λ expressionđ such that đ đ„ đŠ = NAND(đ„, đŠ) for every đ„, đŠ â 0, 1.
equivalent models of computation 307
Solution:
The NAND of đ„, đŠ is equal to 1 unless đ„ = đŠ = 1. Hence we canwrite đ = đđ„, đŠ.IF(đ„, IF(đŠ, 0, 1), 1) (8.15)
Solved Exercise 8.4 â Compute XOR using λ calculus. Give a λ expressionXOR such that for every list đż = âšđ„0, ⊠, đ„đâ1, â„â© where đ„đ â 0, 1 forđ â [đ], XORđż evaluates to â đ„đ mod 2.
Solution:
First, we note that we can compute XOR of two bits as follows:
NOT = đđ.IF(đ, 0, 1) (8.16)
andXOR2 = đđ, đ.IF(đ,NOT(đ), đ) (8.17)
(We are using here a bit of syntactic sugar to describe the func-tions. To obtain the λ expression for XOR we will simply replacethe expression (8.16) in (8.17).) Now recursively we can define theXOR of a list as follows:
XOR(đż) = â§âšâ©0 đż is emptyXOR2(HEAD(đż),XOR(TAIL(đż))) otherwise
(8.18)This means that XOR is equal to
RECURSE (đđđ, đż.IF(ISEMPTY(đż), 0,XOR2(HEAD đż , đđ(TAIL đż)))) .(8.19)
That is, XOR is obtained by applying the RECURSE operatorto the function that on inputs đđ, đż, returns 0 if ISEMPTY(đż) andotherwise returns XOR2 applied to HEAD(đż) and to đđ(TAIL(đż)).
We could have also computed XOR using the REDUCE opera-tion, we leave working this out as an exercise to the reader.
8.6.1 Computing a function in the enhanced λ calculusAn enhanced λ expression is obtained by composing the objects abovewith the application and abstraction rules. The result of simplifying a λ
308 introduction to theoretical computer science
Figure 8.15: A list âšđ„0, đ„1, đ„2â© in the λ calculus is con-structed from the tail up, building the pair âšđ„2,NILâ©,then the pair âšđ„1, âšđ„2,NILâ©â© and finally the pairâšđ„0, âšđ„1, âšđ„2,NILâ©â©â©. That is, a list is a pair wherethe first element of the pair is the first element of thelist and the second element is the rest of the list. Thefigure on the left renders this âpairs inside pairsâconstruction, though it is often easier to think of a listas a âchainâ, as in the figure on the right, where thesecond element of each pair is thought of as a link,pointer or reference to the remainder of the list.
Figure 8.16: Illustration of the MAP, FILTER andREDUCE operations.
expression is an equivalent expression, and hence if two expressionshave the same simplification then they are equivalent.
Definition 8.15 â Computing a function via λ calculus. Let đč ⶠ0, 1â â0, 1âWe say that đđ„đ computes đč if for every đ„ â 0, 1â,đđ„đâšđ„0, ⊠, đ„đâ1, â„â© â âšđŠ0, ⊠, đŠđâ1, â„â© (8.20)where đ = |đ„|, đŠ = đč(đ„), and đ = |đŠ|, and the notion of equiva-
lence is defined as per Definition 8.14.
8.6.2 Enhanced λ calculus is Turing-completeThe basic operations of the enhanced λ calculus more or less amountto the Lisp or Scheme programming languages. Given that, it is per-haps not surprising that the enhanced λ-calculus is equivalent toTuring machines:
Theorem 8.16 â Lambda calculus and NAND-TM. For every functionđč ⶠ0, 1â â 0, 1â, đč is computable in the enhanced λ calculus ifand only if it is computable by a Turing machine.
equivalent models of computation 309
Proof Idea:
To prove the theorem, we need to show that (1) if đč is computableby a λ calculus expression then it is computable by a Turing machine,and (2) if đč is computable by a Turing machine, then it is computableby an enhanced λ calculus expression.
Showing (1) is fairly straightforward. Applying the simplificationrules to a λ expression basically amounts to âsearch and replaceâwhich we can implement easily in, say, NAND-RAM, or for thatmatter Python (both of which are equivalent to Turing machines inpower). Showing (2) essentially amounts to simulating a Turing ma-chine (or writing a NAND-TM interpreter) in a functional program-ming language such as LISP or Scheme. We give the details below buthow this can be done is a good exercise in mastering some functionalprogramming techniques that are useful in their own right.âProof of Theorem 8.16. We only sketch the proof. The âifâ directionis simple. As mentioned above, evaluating λ expressions basicallyamounts to âsearch and replaceâ. It is also a fairly straightforwardprogramming exercise to implement all the above basic operations inan imperative language such as Python or C, and using the same ideaswe can do so in NAND-RAM as well, which we can then transform toa NAND-TM program.
For the âonly ifâ direction we need to simulate a Turing machineusing a λ expression. We will do so by first showing that showingfor every Turing machine đ a λ expression to compute the next-stepfunction NEXTđ ⶠΣâ â ÎŁâ that maps a configuration of đ to the nextone (see Section 8.4.2).
A configuration of đ is a string đŒ â ÎŁâ for a finite set ÎŁ. We canencode every symbol đ â ÎŁ by a finite string 0, 1â, and so we willencode a configuration đŒ in the λ calculus as a list âšđŒ0, đŒ1, ⊠, đŒđâ1, â„â©where đŒđ is an â-length string (i.e., an â-length list of 0âs and 1âs) en-coding a symbol in ÎŁ.
By Lemma 8.9, for every đŒ â ÎŁâ, NEXTđ(đŒ)đ is equal tođ(đŒđâ1, đŒđ, đŒđ+1) for some finite function đ ⶠΣ3 â ÎŁ. Using ourencoding of ÎŁ as 0, 1â, we can also think of đ as mapping 0, 13â to0, 1â. By Solved Exercise 8.3, we can compute the NAND function,and hence every finite function, including đ, using the λ calculus.Using this insight, we can compute NEXTđ using the λ calculus asfollows. Given a list đż encoding the configuration đŒ0 ⯠đŒđâ1, wedefine the lists đżđđđđŁ and đżđđđ„đĄ encoding the configuration đŒ shiftedby one step to the right and left respectively. The next configurationđŒâČ is defined as đŒâČđ = đ(đżđđđđŁ[đ], đż[đ], đżđđđ„đĄ[đ]) where we let đżâČ[đ] denote
310 introduction to theoretical computer science
the đ-th element of đżâČ. This can be computed by recursion (and henceusing the enhanced λ calculusâ RECURSE operator) as follows:
Algorithm 8.17 â đđžđđđ using the λ calculus.
Input: List đż = âšđŒ0, đŒ1, ⊠, đŒđâ1, â„â© encoding a configura-tion đŒ.
Output: List đżâČ encoding đđžđđđ(đŒ)1: procedure ComputeNext(đżđđđđŁ, đż, đżđđđ„đĄ)2: if thenđŒđđžđđđ đ đżđđđđŁ3: return đđŒđż4: end if5: đ â đ»đžđŽđ· đżđđđđŁ6: if thenđŒđđžđđđ đ đż7: đ â â # Encoding of â in 0, 1â8: else9: đ â đ»đžđŽđ· đż
10: end if11: if thenđŒđđžđđđ đ đżđđđ„đĄ12: đ â â 13: else14: đ â đ»đžđŽđ· đżđđđ„đĄ15: end if16: return đđŽđŒđ đ(đ, đ, đ) ComputeNext(đ đŽđŒđż đżđđđđŁ , đ đŽđŒđż đż , đ đŽđŒđż đżđđđ„đĄ)17: end procedure18: đżđđđđŁ â đđŽđŒđ â đż # đżđđđđŁ = âšâ , đŒ0, ⊠, đŒđâ1, â„â©19: đżđđđ„đĄ â đ đŽđŒđż đż # đżđđđ„đĄ = âšđŒ1, ⊠, đŒđâ1, â„20: return ComputeNext(đżđđđđŁ, đż, đżđđđ„đĄ)
Once we can compute NEXTđ , we can simulate the execution ofđ on input đ„ using the following recursion. Define FINAL(đŒ) to bethe final configuration of đ when initialized at configuration đŒ. Thefunction FINAL can be defined recursively as follows:
FINAL(đŒ) = â§âšâ©đŒ đŒ is halting configurationNEXTđ(đŒ) otherwise
. (8.21)
Checking whether a configuration is halting (i.e., whether it isone in which the transition function would output Halt) can be easilyimplemented in the đ calculus, and hence we can use the RECURSEto compute FINAL. If we let đŒ0 be the initial configuration of đ oninput đ„ then we can obtain the output đ(đ„) from FINAL(đŒ0), hencecompleting the proof.
equivalent models of computation 311
8.7 FROM ENHANCED TO PURE Î CALCULUS
While the collection of âbasicâ functions we allowed for the enhancedλ calculus is smaller than whatâs provided by most Lisp dialects, com-ing from NAND-TM it still seems a little âbloatedâ. Can we make dowith less? In other words, can we find a subset of these basic opera-tions that can implement the rest?
It turns out that there is in fact a proper subset of the operations ofthe enhanced λ calculus that can be used to implement the rest. Thatsubset is the empty set. That is, we can implement all the operationsabove using the λ formalism only, even without using 0âs and 1âs. Itâsλâs all the way down!
PThis is a good point to pause and think howyou would implement these operations your-self. For example, start by thinking how youcould implement MAP using REDUCE, andthen REDUCE using RECURSE combined with0, 1, IF,PAIR,HEAD,TAIL,NIL, ISEMPTY. You canalso PAIR, HEAD and TAIL based on 0, 1, IF. The mostchallenging part is to implement RECURSE using onlythe operations of the pure λ calculus.
Theorem 8.18 â Enhanced λ calculus equivalent to pure λ calculus.. Thereare λ expressions that implement the functions 0,1,IF,PAIR, HEAD,TAIL, NIL, ISEMPTY, MAP, REDUCE, and RECURSE.
The idea behind Theorem 8.18 is that we encode 0 and 1 them-selves as λ expressions, and build things up from there. This is knownas Church encoding, as it was originated by Church in his effort toshow that the λ calculus can be a basis for all computation. We willnot write the full formal proof of Theorem 8.18 but outline the ideasinvolved in it:
âą We define 0 to be the function that on two inputs đ„, đŠ outputs đŠ,and 1 to be the function that on two inputs đ„, đŠ outputs đ„. We useCurrying to achieve the effect of two-input functions and hence0 = đđ„.đđŠ.đŠ and 1 = đđ„.đđŠ.đ„. (This representation scheme is thecommon convention for representing false and true but there aremany other alternative representations for 0 and 1 that would haveworked just as well.)
âą The above implementation makes the IF function trivial:IF(đđđđ, đ, đ) is simply đđđđ đ đ since 0đđ = đ and 1đđ = đ. Wecan write IF = đđ„.đ„ to achieve IF(đđđđ, đ, đ) = (((IFđđđđ)đ)đ) =đđđđ đ đ.
312 introduction to theoretical computer science
âą To encode a pair (đ„, đŠ) we will produce a function đđ„,đŠ that has đ„and đŠ âin its bellyâ and satisfies đđ„,đŠđ = đđ„đŠ for every function đ.That is, PAIR = đđ„, đŠ. (đđ.đđ„đŠ). We can extract the first element ofa pair đ by writing đ1 and the second element by writing đ0, and soHEAD = đđ.đ1 and TAIL = đđ.đ0.
âą We define NIL to be the function that ignores its input and alwaysoutputs 1. That is, NIL = đđ„.1. The ISEMPTY function checks,given an input đ, whether we get 1 if we apply đ to the functionđ§đđđ = đđ„, đŠ.0 that ignores both its inputs and always outputs 0. Forevery valid pair of the form đ = PAIRđ„đŠ, đđ§đđđ = đđ„đŠ = 0 whileNILđ§đđđ = 1. Formally, ISEMPTY = đđ.đ(đđ„, đŠ.0).
RRemark 8.19 â Church numerals (optional). There isnothing special about Boolean values. You can usesimilar tricks to implement natural numbers usingλ terms. The standard way to do so is to representthe number đ by the function ITERđ that on input afunction đ outputs the function đ„ ⊠đ(đ(⯠đ(đ„))) (đtimes). That is, we represent the natural number 1 asđđ.đ , the number 2 as đđ.(đđ„.đ(đđ„)), the number 3 asđđ.(đđ„.đ(đ(đđ„))), and so on and so forth. (Note thatthis is not the same representation we used for 1 inthe Boolean context: this is fine; we already know thatthe same object can be represented in more than oneway.) The number 0 is represented by the functionthat maps any function đ to the identity function đđ„.đ„.(That is, 0 = đđ.(đđ„.đ„).)In this representation, we can compute PLUS(đ, đ)as đđ.đđ„.(đđ)((đđ)đ„) and TIMES(đ, đ) as đđ.đ(đđ).Subtraction and division are trickier, but can beachieved using recursion. (Working this out is a greatexercise.)
8.7.1 List processingNow we come to a bigger hurdle, which is how to implementMAP, FILTER, REDUCE and RECURSE in the pure λ calculus. Itturns out that we can build MAP and FILTER from REDUCE, andREDUCE from RECURSE. For example MAP(đż, đ) is the same asREDUCE(đż, đ) where đ is the operation that on input đ„ and đŠ, outputsPAIR(đ(đ„),NIL) if đŠ is NIL and otherwise outputs PAIR(đ(đ„), đŠ). (Ileave checking this as a (recommended!) exercise for you, the reader.)
We can define REDUCE(đż, đ) recursively, by settingREDUCE(NIL, đ) = NIL and stipulating that given a non-empty list đż, which we can think of as a pair (âđđđ, đđđ đĄ),
equivalent models of computation 313
REDUCE(đż, đ) = đ(âđđđ,REDUCE(đđđ đĄ, đ))). Thus, we mighttry to write a recursive λ expression for REDUCE as follows
REDUCE = đđż, đ.IF(ISEMPTY(đż),NIL, đHEAD(đż)REDUCE(TAIL(đż), đ)) .(8.22)
The only fly in this ointment is that the λ calculus does not have thenotion of recursion, and so this is an invalid definition. But of coursewe can use our RECURSE operator to solve this problem. We willreplace the recursive call to âREDUCEâ with a call to a function đđthat is given as an extra argument, and then apply RECURSE to this.Thus REDUCE = RECURSE đđŠđ đžđ·đđ¶đž where
đđŠđ đžđ·đđ¶đž = đđđ, đż, đ.IF(ISEMPTY(đż),NIL, đHEAD(đż)đđ(TAIL(đż), đ)) .(8.23)
8.7.2 The Y combinator, or recursion without recursionEq. (8.23) means that implementing MAP, FILTER, and REDUCE boilsdown to implementing the RECURSE operator in the pure λ calculus.This is what we do now.
How can we implement recursion without recursion? We willillustrate this using a simple example - the XOR function. As shown inSolved Exercise 8.4, we can write the XOR function of a list recursivelyas follows:
XOR(đż) = â§âšâ©0 đż is emptyXOR2(HEAD(đż),XOR(TAIL(đż))) otherwise
(8.24)
where XOR2 ⶠ0, 12 â 0, 1 is the XOR on two bits. In Python wewould write this asdef xor2(a,b): return 1-b if a else bdef head(L): return L[0]def tail(L): return L[1:]
def xor(L): return xor2(head(L),xor(tail(L))) if L else 0
print(xor([0,1,1,0,0,1]))# 1
Now, how could we eliminate this recursive call? The main idea isthat since functions can take other functions as input, it is perfectlylegal in Python (and the λ calculus of course) to give a function itselfas input. So, our idea is to try to come up with a non recursive functiontempxor that takes two inputs: a function and a list, and such thattempxor(tempxor,L)will output the XOR of L!
314 introduction to theoretical computer science
PAt this point you might want to stop and try to im-plement this on your own in Python or any otherprogramming language of your choice (as long as itallows functions as inputs).
Our first attempt might be to simply use the idea of replacing therecursive call by me. Letâs define this function as myxor
def myxor(me,L): return xor2(head(L),me(tail(L))) if Lelse 0
Letâs test this out:
myxor(myxor,[1,0,1])
If you do this, you will get the following complaint from the inter-preter:
TypeError: myxor() missing 1 required positional argu-ment
The problem is that myxor expects two inputs- a function and alist- while in the call to me we only provided a list. To correct this, wemodify the call to also provide the function itself:
def tempxor(me,L): return xor2(head(L),me(me,tail(L))) ifL else 0
Note the call me(me,..) in the definition of tempxor: given a func-tion me as input, tempxor will actually call the function me with itselfas the first input. If we test this out now, we see that we actually getthe right result!
tempxor(tempxor,[1,0,1])# 0tempxor(tempxor,[1,0,1,1])# 1
and so we can define xor(L) as simply return tem-pxor(tempxor,L).
The approach above is not specific to XOR. Given a recursive func-tion f that takes an input x, we can obtain a non recursive version asfollows:
1. Create the function myf that takes a pair of inputs me and x, andreplaces recursive calls to f with calls to me.
2. Create the function tempf that converts calls in myf of the formme(x) to calls of the form me(me,x).
equivalent models of computation 315
3 Because of specific issues of Python syntax, in thisimplementation we use f * g for applying f to grather than fg, and use λx(exp) rather than λx.expfor abstraction. We also use _0 and _1 for the λ termsfor 0 and 1 so as not to confuse with the Pythonconstants.
3. The function f(x) will be defined as tempf(tempf,x)
Here is the way we implement the RECURSE operator in Python. Itwill take a function myf as above, and replace it with a function g suchthat g(x)=myf(g,x) for every x.
def RECURSE(myf):def tempf(me,x): return myf(lambda y: me(me,y),x)
return lambda x: tempf(tempf,x)
xor = RECURSE(myxor)
print(xor([0,1,1,0,0,1]))# 1
print(xor([1,1,0,0,1,1,1,1]))# 0
From Python to the calculus. In the λ calculus, a two input functionđ that takes a pair of inputs đđ, đŠ is written as đđđ.(đđŠ.đ). So thefunction đŠ ⊠đđ(đđ, đŠ) is simply written as đđ đđ and similarlythe function đ„ ⊠đĄđđđđ(đĄđđđđ, đ„) is simply đĄđđđđ đĄđđđđ . (Canyou see why?) Therefore the function tempf defined above can bewritten as λ me. myf(me me). This means that if we denote the inputof RECURSE by đ , then RECURSE đđŠđ = đĄđđđđ đĄđđđđ where đĄđđđđ =đđ.đ(đ đ) or in other words
RECURSE = đđ.((đđ.đ(đ đ)) (đđ.đ(đ đ))) (8.25)
The online appendix contains an implementation of the λ calcu-lus using Python. Here is an implementation of the recursive XORfunction from that appendix:3
# XOR of two bitsXOR2 = λ(a,b)(IF(a,IF(b,_0,_1),b))
# Recursive XOR with recursive calls replaced by mparameter
myXOR = λ(m,l)(IF(ISEMPTY(l),_0,XOR2(HEAD(l),m(TAIL(l)))))
# Recurse operator (aka Y combinator)RECURSE = λf((λm(f(m*m)))(λm(f(m*m))))
# XOR function
316 introduction to theoretical computer science
XOR = RECURSE(myXOR)
#TESTING:
XOR(PAIR(_1,NIL)) # List [1]# equals 1
XOR(PAIR(_1,PAIR(_0,PAIR(_1,NIL)))) # List [1,0,1]# equals 0
RRemark 8.20 â The Y combinator. The RECURSE opera-tor above is better known as the Y combinator.It is one of a family of a fixed point operators that givena lambda expression đč , find a fixed point đ of đč suchthat đ = đčđ . If you think about it, XOR is the fixedpoint of đđŠđđđ above. XOR is the function suchthat for every đ„, if plug in XOR as the first argumentof đđŠđđđ then we get back XOR, or in other wordsXOR = đđŠđđđ XOR. Hence finding a fixed point forđđŠđđđ is the same as applying RECURSE to it.
8.8 THE CHURCH-TURING THESIS (DISCUSSION)â[In 1934], Church had been speculating, and finally definitely proposed, thatthe λ-definable functions are all the effectively calculable functions âŠ. WhenChurch proposed this thesis, I sat down to disprove it ⊠but, quickly realizingthat [my approach failed], I became overnight a supporter of the thesis.â,Stephen Kleene, 1979.
â[The thesis is] not so much a definition or to an axiom but ⊠a natural law.â,Emil Post, 1936.
We have defined functions to be computable if they can be computedby a NAND-TM program, and weâve seen that the definition wouldremain the same if we replaced NAND-TM programs by Python pro-grams, Turing machines, λ calculus, cellular automata, and manyother computational models. The Church-Turing thesis is that this isthe only sensible definition of âcomputableâ functions. Unlike theâPhysical Extended Church-Turing Thesisâ (PECTT) which we sawbefore, the Church-Turing thesis does not make a concrete physicalprediction that can be experimentally tested, but it certainly motivatespredictions such as the PECTT. One can think of the Church-TuringThesis as either advocating a definitional choice, making some pre-diction about all potential computing devices, or suggesting somelaws of nature that constrain the natural world. In Scott Aaronsonâs
equivalent models of computation 317
words, âwhatever it is, the Church-Turing thesis can only be regardedas extremely successfulâ. No candidate computing device (includingquantum computers, and also much less reasonable models such asthe hypothetical âclosed time curveâ computers we mentioned before)has so far mounted a serious challenge to the Church-Turing thesis.These devices might potentially make some computations more effi-cient, but they do not change the difference between what is finitelycomputable and what is not. (The extended Church-Turing thesis,which we discuss in Section 13.3, stipulates that Turing machines cap-ture also the limit of what can be efficiently computable. Just like itsphysical version, quantum computing presents the main challenge tothis thesis.)
8.8.1 Different models of computationWe can summarize the models we have seen in the following table:
Table 8.1: Different models for computing finite functions andfunctions with arbitrary input length.
Computationalproblems Type of model ExamplesFinite functionsđ ⶠ0, 1đ â 0, 1đ Non uniform
computation(algorithmdepends on inputlength)
Boolean circuits,NAND circuits,straight-line programs(e.g., NAND-CIRC)
Functions withunbounded inputsđč ⶠ0, 1â â 0, 1â
Sequential accessto memory
Turing machines,NAND-TM programs
â Indexed access /RAM
RAM machines,NAND-RAM, modernprogramminglanguages
â Other Lambda calculus,cellular automata
Later on in Chapter 17 we will study memory bounded computa-tion. It turns out that NAND-TM programs with a constant amountof memory are equivalent to the model of finite automata (the adjec-tives âdeterministicâ or ânondeterministicâ are sometimes added aswell, this model is also known as finite state machines) which in turncaptures the notion of regular languages (those that can be described byregular expressions), which is a concept we will see in Chapter 10.
318 introduction to theoretical computer science
Chapter Recap
⹠While we defined computable functions usingTuring machines, we could just as well have doneso using many other models, including not justNAND-TM programs but also RAM machines,NAND-RAM, the λ-calculus, cellular automata andmany other models.
âą Very simple models turn out to be âTuring com-pleteâ in the sense that they can simulate arbitrarilycomplex computation.
8.9 EXERCISES
Exercise 8.1 â Alternative proof for TM/RAM equivalence. Let SEARCH â¶0, 1â â 0, 1â be the following function. The input is a pair(đż, đ) where đ â 0, 1â, đż is an encoding of a list of key value pairs(đ0, đŁ1), ⊠, (đđâ1, đŁđâ1) where đ0, ⊠, đđâ1, đŁ0, ⊠, đŁđâ1 are binarystrings. The output is đŁđ for the smallest đ such that đđ = đ, if such đexists, and otherwise the empty string.
1. Prove that SEARCH is computable by a Turing machine.
2. Let UPDATE(đż, đ, đŁ) be the function whose input is a list đż of pairs,and whose output is the list đżâČ obtained by prepending the pair(đ, đŁ) to the beginning of đż. Prove that UPDATE is computable by aTuring machine.
3. Suppose we encode the configuration of a NAND-RAM programby a list đż of key/value pairs where the key is either the name ofa scalar variable foo or of the form Bar[<num>] for some num-ber <num> and it contains all the nonzero values of variables. LetNEXT(đż) be the function that maps a configuration of a NAND-RAM program at one step to the configuration in the next step.Prove that NEXT is computable by a Turing machine (you donâthave to implement each one of the arithmetic operations: it isenough to implement addition and multiplication).
4. Prove that for every đč ⶠ0, 1â â 0, 1â that is computable by aNAND-RAM program, đč is computable by a Turing machine.
Exercise 8.2 â NAND-TM lookup. This exercise shows part of the proof thatNAND-TM can simulate NAND-RAM. Produce the code of a NAND-TM program that computes the function LOOKUP ⶠ0, 1â â 0, 1that is defined as follows. On input đđ(đ)đ„, where đđ(đ) denotes aprefix-free encoding of an integer đ, LOOKUP(đđ(đ)đ„) = đ„đ if đ < |đ„|
equivalent models of computation 319
4 You donât have to give a full description of a Turingmachine: use our âhave the cake and eat it tooâparadigm to show the existence of such a machine byarguing from more powerful equivalent models.
5 Same hint as Exercise 8.5 applies. Note that forshowing that LONGPATH is computable you donâthave to give an efficient algorithm.
and LOOKUP(đđ(đ)đ„) = 0 otherwise. (We donât care what LOOKUPoutputs on inputs that are not of this form.) You can choose anyprefix-free encoding of your choice, and also can use your favoriteprogramming language to produce this code.
Exercise 8.3 â Pairing. Let đđđđđ ⶠâ2 â â be the function defined asđđđđđ(đ„0, đ„1) = 12 (đ„0 + đ„1)(đ„0 + đ„1 + 1) + đ„1.1. Prove that for every đ„0, đ„1 â â, đđđđđ(đ„0, đ„1) is indeed a natural
number.
2. Prove that đđđđđ is one-to-one
3. Construct a NAND-TM program đ such that for every đ„0, đ„1 â â,đ(đđ(đ„0)đđ(đ„1)) = đđ(đđđđđ(đ„0, đ„1)), where đđ is the prefix-freeencoding map defined above. You can use the syntactic sugar forinner loops, conditionals, and incrementing/decrementing thecounter.
4. Construct NAND-TM programs đ0, đ1 such that for for everyđ„0, đ„1 â â and đ â đ , đđ(đđ(đđđđđ(đ„0, đ„1))) = đđ(đ„đ). You canuse the syntactic sugar for inner loops, conditionals, and increment-ing/decrementing the counter.
Exercise 8.4 â Shortest Path. Let SHORTPATH ⶠ0, 1â â 0, 1âbe the function that on input a string encoding a triple (đș, đą, đŁ) out-puts a string encoding â if đą and đŁ are disconnected in đș or a stringencoding the length đ of the shortest path from đą to đŁ. Prove thatSHORTPATH is computable by a Turing machine. See footnote forhint.4
Exercise 8.5 â Longest Path. Let LONGPATH ⶠ0, 1â â 0, 1â bethe function that on input a string encoding a triple (đș, đą, đŁ) outputsa string encoding â if đą and đŁ are disconnected in đș or a string en-coding the length đ of the longest simple path from đą to đŁ. Prove thatLONGPATH is computable by a Turing machine. See footnote forhint.5
Exercise 8.6 â Shortest path λ expression. Let SHORTPATH be as inExercise 8.4. Prove that there exists a đ expression that computesSHORTPATH. You can use Exercise 8.4
Exercise 8.7 â Next-step function is local. Prove Lemma 8.9 and use it tocomplete the proof of Theorem 8.7.
320 introduction to theoretical computer science
6 Hint: You can reduce the number of variables afunction takes by âpairing them upâ. That is, define aλ expression PAIR such that for every đ„, đŠ PAIRđ„đŠ issome function đ such that đ0 = đ„ and đ1 = đŠ. Thenuse PAIR to iteratively reduce the number of variablesused.
7 Use structural induction on the expression đ.
8 The name đ§đđ is a common name for this operation,for example in Python. It should not be confused withthe zip compression file format.
9 Use MAP and REDUCE (and potentially FILTER).You might also find the function đ§đđ of Exercise 8.10useful.
10 Try to set up a procedure such that if array Leftcontains an encoding of a λ expression đđ„.đ andarray Right contains an encoding of another λ expres-sion đâČ, then the array Result will contain đ[đ„ â đâČ].
Exercise 8.8 â λ calculus requires at most three variables. Prove that for ev-ery λ-expression đ with no free variables there is an equivalent λ-expression đ that only uses the variables đ„,đŠ, and đ§.6
Exercise 8.9 â Evaluation order example in λ calculus. 1. Let đ =đđ„.7 ((đđ„.đ„đ„)(đđ„.đ„đ„)). Prove that the simplification process of đends in a definite number if we use the âcall by nameâ evaluationorder while it never ends if we use the âcall by valueâ order.
2. (bonus, challenging) Let đ be any λ expression. Prove that if thesimplification process ends in a definite number if we use the âcallby valueâ order then it also ends in such a number if we use theâcall by nameâ order. See footnote for hint.7
Exercise 8.10 â Zip function. Give an enhanced λ calculus expression tocompute the function đ§đđ that on input a pair of lists đŒ and đż of thesame length đ, outputs a list of đ pairs đ such that the đ-th elementof đ (which we denote by đđ) is the pair (đŒđ, đżđ). Thus đ§đđ âzipstogetherâ these two lists of elements into a single list of pairs.8
Exercise 8.11 â Next-step function without đ đžđ¶đđ đđž. Let đ be a Turingmachine. Give an enhanced λ calculus expression to compute thenext-step function NEXTđ of đ (as in the proof of Theorem 8.16)without using RECURSE. See footnote for hint.9
Exercise 8.12 â λ calculus to NAND-TM compiler (challenging). Give a programin the programming language of your choice that takes as input a λexpression đ and outputs a NAND-TM program đ that computes thesame function as đ. For partial credit you can use the GOTO and allNAND-CIRC syntactic sugar in your output program. You can useany encoding of λ expressions as binary string that is convenient foryou. See footnote for hint.10
Exercise 8.13 â At least two in đ calculus. Let 1 = đđ„, đŠ.đ„ and 0 = đđ„, đŠ.đŠ asbefore. Define
ALT = đđ, đ, đ.(đ(đ1(đ10))(đđ0)) (8.26)
Prove that ALT is a đ expression that computes the at least two func-tion. That is, for every đ, đ, đ â 0, 1 (as encoded above) ALTđđđ = 1if and only at least two of đ, đ, đ are equal to 1.
equivalent models of computation 321
Exercise 8.14 â Locality of next-step function. This question will help youget a better sense of the notion of locality of the next step function of Tur-ing Machines. This locality plays an important role in results such asthe Turing completeness of đ calculus and one dimensional cellularautomata, as well as results such as Godelâs Incompleteness Theoremand the Cook Levin theorem that we will see later in this course. De-fine STRINGS to be the a programming language that has the followingsemantics:
âą A STRINGS program đ has a single string variable str that is boththe input and the output of đ. The program has no loops and noother variables, but rather consists of a sequence of conditionalsearch and replace operations that modify str.
âą The operations of a STRINGS program are:
â REPLACE(pattern1,pattern2)where pattern1 and pattern2are fixed strings. This replaces the first occurrence of pattern1in str with pattern2
â if search(pattern) code executes code if pattern is asubstring of str. The code code can itself include nested ifâs.(One can also add an else ... to execute if pattern is nota substring of condf).
â the returned value is str
âą A STRING program đ computes a function đč ⶠ0, 1â â 0, 1â iffor every đ„ â 0, 1â, if we initialize str to đ„ and then execute thesequence of instructions in đ, then at the end of the execution strequals đč(đ„).For example, the following is a STRINGS program that computes
the function đč ⶠ0, 1â â 0, 1â such that for every đ„ â 0, 1â, if đ„contains a substring of the form đŠ = 11đđ11 where đ, đ â 0, 1, thenđč(đ„) = đ„âČ where đ„âČ is obtained by replacing the first occurrence of đŠ inđ„ with 00.if search('110011')
replace('110011','00') else if search('110111')
replace('110111','00') else if search('111011')
replace('111011','00') else if search('111111')
replace('1111111','00')
322 introduction to theoretical computer science
Prove that for every Turing Machine program đ , there exists aSTRINGS program đ that computes the NEXTđ function that mapsevery string encoding a valid configuration of đ to the string encodingthe configuration of the next step of đ âs computation. (We donâtcare what the function will do on strings that do not encode a validconfiguration.) You donât have to write the STRINGS program fully,but you do need to give a convincing argument that such a programexists.
8.10 BIBLIOGRAPHICAL NOTES
Chapters 7 in the wonderful book of Moore and Mertens [MM11]contains a great exposition much of this material. .
The RAM model can be very useful in studying the concrete com-plexity of practical algorithms. Its theoretical study was initiated in[CR73]. However, the exact set of operations that are allowed in theRAM model and their costs vary between texts and contexts. Oneneeds to be careful in making such definitions, especially if the wordsize grows, as was already shown by Shamir [Sha79]. Chapter 3 inSavageâs book [Sav98] contains a more formal description of RAMmachines, see also the paper [Hag98]. A study of RAM algorithmsthat are independent of the input size (known as the âtransdichoto-mous RAM modelâ) was initiated by [FW93]
The models of computation we considered so far are inherentlysequential, but these days much computation happens in parallel,whether using multi-core processors or in massively parallel dis-tributed computation in data centers or over the Internet. Parallelcomputing is important in practice, but it does not really make muchdifference for the question of what can and canât be computed. Afterall, if a computation can be performed using đ machines in đĄ time,then it can be computed by a single machine in time đđĄ.
The λ-calculus was described by Church in [Chu41]. Pierceâs book[Pie02] is a canonical textbook, see also [Bar84]. The âCurrying tech-niqueâ is named after the logician Haskell Curry (the Haskell pro-gramming language is named after Haskell Curry as well). Curryhimself attributed this concept to Moses Schönfinkel, though for somereason the term âSchönfinkelingâ never caught on.
Unlike most programming languages, the pure λ-calculus doesnâthave the notion of types. Every object in the λ calculus can also bethought of as a λ expression and hence as a function that takes oneinput and returns one output. All functions take one input and re-turn one output, and if you feed a function an input of a form it didnâtexpect, it still evaluates the λ expression via âsearch and replaceâ,replacing all instances of its parameter with copies of the input expres-
equivalent models of computation 323
sion you fed it. Typed variants of the λ calculus are objects of intenseresearch, and are strongly related to type systems for programminglanguage and computer-verifiable proof systems, see [Pie02]. Some ofthe typed variants of the λ calculus do not have infinite loops, whichmakes them very useful as ways of enabling static analysis of pro-grams as well as computer-verifiable proofs. We will come back to thispoint in Chapter 10 and Chapter 22.
Tao has proposed showing the Turing completeness of fluid dy-namics (a âwater computerâ) as a way of settling the question of thebehavior of the Navier-Stokes equations, see this popular article.
9Universality and uncomputability
âA function of a variable quantity is an analytic expression composed in anyway whatsoever of the variable quantity and numbers or constant quantities.â,Leonhard Euler, 1748.
âThe importance of the universal machine is clear. We do not need to have aninfinity of different machines doing different jobs. ⊠The engineering problemof producing various machines for various jobs is replaced by the office work ofâprogrammingâ the universal machineâ, Alan Turing, 1948
One of the most significant results we showed for Boolean circuits(or equivalently, straight-line programs) is the notion of universality:there is a single circuit that can evaluate all other circuits. However,this result came with a significant caveat. To evaluate a circuit of đ gates, the universal circuit needed to use a number of gates largerthan đ . It turns out that uniform models such as Turing machines orNAND-TM programs allow us to âbreak out of this cycleâ and obtaina truly universal Turing machine đ that can evaluate all other machines,including machines that are more complex (e.g., more states) than đitself. (Similarly, there is a Universal NAND-TM program đ âČ that canevaluate all NAND-TM programs, including programs that have morelines than đ âČ.)
It is no exaggeration to say that the existence of such a universalprogram/machine underlies the information technology revolutionthat began in the latter half of the 20th century (and is still ongoing).Up to that point in history, people have produced various special-purpose calculating devices such as the abacus, the slide ruler, andmachines that compute various trigonometric series. But as Turing(who was perhaps the one to see most clearly the ramifications ofuniversality) observed, a general purpose computer is much more pow-erful. Once we build a device that can compute the single universalfunction, we have the ability, via software, to extend it to do arbitrarycomputations. For example, if we want to simulate a new Turing ma-chine đ , we do not need to build a new physical machine, but rather
Compiled on 8.26.2020 18:10
Learning Objectives:âą The universal machine/program - âone
program to rule them allââą A fundamental result in computer science and
mathematics: the existence of uncomputablefunctions.
âą The halting problem: the canonical example ofan uncomputable function.
âą Introduction to the technique of reductions.âą Riceâs Theorem: A âmeta toolâ for
uncomputability results, and a starting pointfor much of the research on compilers,programming languages, and softwareverification.
326 introduction to theoretical computer science
can represent đ as a string (i.e., using code) and then input đ to theuniversal machine đ .
Beyond the practical applications, the existence of a universal algo-rithm also has surprising theoretical ramifications, and in particularcan be used to show the existence of uncomputable functions, upend-ing the intuitions of mathematicians over the centuries from Eulerto Hilbert. In this chapter we will prove the existence of the univer-sal program, and also show its implications for uncomputability, seeFig. 9.1
This chapter: A non-mathy overviewIn this chapter we will see two of the most important resultsin Computer Science:
1. The existence of a universal Turing machine: a single algo-rithm that can evaluate all other algorithms,
2. The existence of uncomputable functions: functions (includ-ing the famous âHalting problemâ) that cannot be computedby any algorithm.
Along the way, we develop the technique of reductions as away to show hardness of computing a function. A reductiongives a way to compute a certain function using âwishfulthinkingâ and assuming that another function can be com-puted. Reductions are of course widely used in program-ming - we often obtain an algorithm for one task by usinganother task as a âblack boxâ subroutine. However we willuse it in the âcontra positiveâ: rather than using a reductionto show that the former task is âeasyâ, we use them to showthat the latter task is âhardâ. Donât worry if you find thisconfusing - reductions are initially confusing - but they can bemastered with time and practice.
9.1 UNIVERSALITY OR A META-CIRCULAR EVALUATOR
We start by proving the existence of a universal Turing machine. This isa single Turing machine đ that can evaluate arbitrary Turing machinesđ on arbitrary inputs đ„, including machines đ that can have morestates and larger alphabet than đ itself. In particular, đ can even beused to evaluate itself! This notion of self reference will appear time andagain in this book, and as we will see, leads to several counter-intuitivephenomena in computing.
universality and uncomputability 327
Figure 9.1: In this chapter we will show the existenceof a universal Turing machine and then use this to de-rive first the existence of some uncomputable function.We then use this to derive the uncomputability ofTuringâs famous âhalting problemâ (i.e., the HALTfunction), from which we a host of other uncom-putability results follow. We also introduce reductions,which allow us to use the uncomputability of afunction đč to derive the uncomputability of a newfunction đș.
Figure 9.2: A Universal Turing Machine is a singleTuring Machine đ that can evaluate, given input the(description as a string of) arbitrary Turing machineđ and input đ„, the output of đ on đ„. In contrast tothe universal circuit depicted in Fig. 5.6, the machineđ can be much more complex (e.g., more states ortape alphabet symbols) than đ .
Theorem 9.1 â Universal Turing Machine. There exists a Turing machineđ such that on every string đ which represents a Turing machine,and đ„ â 0, 1â, đ(đ, đ„) = đ(đ„).
That is, if the machine đ halts on đ„ and outputs some đŠ â0, 1â then đ(đ, đ„) = đŠ, and if đ does not halt on đ„ (i.e.,đ(đ„) = â„) then đ(đ, đ„) = â„.
Big Idea 11 There is a âuniversalâ algorithm that can evaluatearbitrary algorithms on arbitrary inputs.
Proof Idea:
Once you understand what the theorem says, it is not that hard toprove. The desired program đ is an interpreter for Turing machines.That is, đ gets a representation of the machine đ (think of it as sourcecode), and some input đ„, and needs to simulate the execution of đ onđ„.
Think of how you would code đ in your favorite programminglanguage. First, you would need to decide on some representationscheme for đ . For example, you can use an array or a dictionaryto encode đ âs transition function. Then you would use some datastructure, such as a list, to store the contents of đ âs tape. Now you cansimulate đ step by step, updating the data structure as you go along.The interpreter will continue the simulation until the machine halts.
Once you do that, translating this interpreter from your favoriteprogramming language to a Turing machine can be done just as wehave seen in Chapter 8. The end result is whatâs known as a âmeta-
328 introduction to theoretical computer science
circular evaluatorâ: an interpreter for a programming language in thesame one. This is a concept that has a long history in computer sciencestarting from the original universal Turing machine. See also Fig. 9.3.â9.1.1 Proving the existence of a universal Turing MachineTo prove (and even properly state) Theorem 9.1, we need to fix somerepresentation for Turing machines as strings. One potential choicefor such a representation is to use the equivalence between Turingmachines and NAND-TM programs and hence represent a Turingmachine đ using the ASCII encoding of the source code of the corre-sponding NAND-TM program đ . However, we will use a more directencoding.
Definition 9.2 â String representation of Turing Machine. Let đ be a Turingmachine with đ states and a size â alphabet ÎŁ = đ0, ⊠, đââ1 (weuse the convention đ0 = 0,đ1 = 1, đ2 = â , đ3 = â·). We representđ as the triple (đ, â, đ ) where đ is the table of values for đżđ :đ = (đżđ(0, đ0), đżđ(0, đ1), ⊠, đżđ(đ â 1, đââ1)) , (9.1)
where each value đżđ(đ , đ) is a triple (đ âČ, đâČ, đ) with đ âČ â [đ],đâČ â ÎŁ and đ a number 0, 1, 2, 3 encoding one of L, R, S, H. Thussuch a machine đ is encoded by a list of 2 + 3đ â â natural num-bers. The string representation of đ is obtained by concatenatingprefix free representation of all these integers. If a string đŒ â 0, 1âdoes not represent a list of integers in the form above, then we treatit as representing the trivial Turing machine with one state thatimmediately halts on every input.
RRemark 9.3 â Take away points of representation. Thedetails of the representation scheme of Turing ma-chines as strings are immaterial for almost all applica-tions. What you need to remember are the followingpoints:
1. We can represent every Turing machine as a string.2. Given the string representation of a Turing ma-
chine đ and an input đ„, we can simulate đ âsexecution on the input đ„. (This is the content ofTheorem 9.1.)
An additional minor issue is that for convenience wemake the assumption that every string represents someTuring machine. This is very easy to ensure by justmapping strings that would otherwise not represent a
universality and uncomputability 329
Turing machine into some fixed trivial machine. Thisassumption is not very important, but does make afew results (such as Riceâs Theorem: Theorem 9.15) alittle less cumbersome to state.
Using this representation, we can formally prove Theorem 9.1.
Proof of Theorem 9.1. We will only sketch the proof, giving the majorideas. First, we observe that we can easily write a Python programthat, on input a representation (đ, â, đ ) of a Turing machine đ andan input đ„, evaluates đ on đ. Here is the code of this program forconcreteness, though you can feel free to skip it if you are not familiarwith (or interested in) Python:
# constantsdef EVAL(ÎŽ,x):
'''Evaluate TM given by transition table ÎŽon input x'''Tape = [""] + [a for a in x]i = 0; s = 0 # i = head pos, s = statewhile True:
s, Tape[i], d = Ύ[(s,Tape[i])]if d == "H": breakif d == "L": i = max(i-1,0)if d == "R": i += 1if i>= len(Tape): Tape.append('Ί')
j = 1; Y = [] # produce outputwhile Tape[j] != 'Ί':
Y.append(Tape[j])j += 1
return Y
On input a transition table đż this program will simulate the cor-responding machine đ step by step, at each point maintaining theinvariant that the array Tape contains the contents of đ âs tape, andthe variable s contains đ âs current state.
The above does not prove the theorem as stated, since we needto show a Turing machine that computes EVAL rather than a Pythonprogram. With enough effort, we can translate this Python codeline by line to a Turing machine. However, to prove the theorem wedonât need to do this, but can use our âeat the cake and have it tooâparadigm. That is, while we need to evaluate a Turing machine, inwriting the code for the interpreter we are allowed to use a richermodel such as NAND-RAM since it is equivalent in power to Turingmachines per Theorem 8.1).
330 introduction to theoretical computer science
Translating the above Python code to NAND-RAM is truly straight-forward. The only issue is that NAND-RAM doesnât have the dictio-nary data structure built in, which we have used above to store thetransition function ÎŽ. However, we can represent a dictionary đ· ofthe form đđđŠ0 ⶠđŁđđ0, ⊠, đđđŠđâ1 ⶠđŁđđđâ1 as simply a list of pairs.To compute đ·[đ] we can scan over all the pairs until we find one ofthe form (đ, đŁ) in which case we return đŁ. Similarly we scan the listto update the dictionary with a new value, either modifying it or ap-pending the pair (đđđŠ, đŁđđ) at the end.
RRemark 9.4 â Efficiency of the simulation. The argu-ment in the proof of Theorem 9.1 is a very inefficientway to implement the dictionary data structure inpractice, but it suffices for the purpose of proving thetheorem. Reading and writing to a dictionary of đvalues in this implementation takes Ω(đ) steps, butit is in fact possible to do this in đ(logđ) steps usinga search tree data structure or even đ(1) (for âtypicalâinstances) using a hash table. NAND-RAM and RAMmachines correspond to the architecture of modernelectronic computers, and so we can implement hashtables and search trees in NAND-RAM just as they areimplemented in other programming languages.
The construction above yields a universal Turing Machine with avery large number of states. However, since universal Turing machineshave such a philosophical and technical importance, researchers haveattempted to find the smallest possible universal Turing machines, seeSection 9.7.
9.1.2 Implications of universality (discussion)There is more than one Turing machine đ that satisfies the condi-tions of Theorem 9.1, but the existence of even a single such machineis already extremely fundamental to both the theory and practice ofcomputer science. Theorem 9.1âs impact reaches beyond the particu-lar model of Turing machines. Because we can simulate every TuringMachine by a NAND-TM program and vice versa, Theorem 9.1 im-mediately implies there exists a universal NAND-TM program đđsuch that đđ(đ , đ„) = đ(đ„) for every NAND-TM program đ . We canalso âmix and matchâ models. For example since we can simulateevery NAND-RAM program by a Turing machine, and every TuringMachine by the đ calculus, Theorem 9.1 implies that there exists a đexpression đ such that for every NAND-RAM program đ and input đ„on which đ(đ„) = đŠ, if we encode (đ , đ„) as a đ-expression đ (using the
universality and uncomputability 331
Figure 9.3: a) A particularly elegant example of aâmeta-circular evaluatorâ comes from John Mc-Carthyâs 1960 paper, where he defined the Lispprogramming language and gave a Lisp function thatevaluates an arbitrary Lisp program (see above). Lispwas not initially intended as a practical program-ming language and this example was merely meantas an illustration that the Lisp universal function ismore elegant than the universal Turing machine. Itwas McCarthyâs graduate student Steve Russell whosuggested that it can be implemented. As McCarthylater recalled, âI said to him, ho, ho, youâre confusingtheory with practice, this eval is intended for reading, notfor computing. But he went ahead and did it. That is, hecompiled the eval in my paper into IBM 704 machine code,fixing a bug, and then advertised this as a Lisp interpreter,which it certainly wasâ. b) A self-replicating C programfrom the classic essay of Thompson [Tho84].
đ-calculus encoding of strings as lists of 0âs and 1âs) then (đ đ) eval-uates to an encoding of đŠ. More generally we can say that for everyđł and đŽ in the set Turing Machines, RAM Machines, NAND-TM,NAND-RAM, đ-calculus, JavaScript, Python, ⊠of Turing equivalentmodels, there exists a program/machine in đł that computes the map(đ , đ„) ⊠đ(đ„) for every program/machine đ â đŽ.
The idea of a âuniversal programâ is of course not limited to theory.For example compilers for programming languages are often used tocompile themselves, as well as programs more complicated than thecompiler. (An extreme example of this is Fabrice Bellardâs ObfuscatedTiny C Compiler which is a C program of 2048 bytes that can compilea large subset of the C programming language, and in particular cancompile itself.) This is also related to the fact that it is possible to writea program that can print its own source code, see Fig. 9.3. There areuniversal Turing machines known that require a very small numberof states or alphabet symbols, and in particular there is a universalTuring machine (with respect to a particular choice of representingTuring machines as strings) whose tape alphabet is â·,â , 0, 1 andhas fewer than 25 states (see Section 9.7).
9.2 IS EVERY FUNCTION COMPUTABLE?
In Theorem 4.12, we saw that NAND-CIRC programs can computeevery finite function đ ⶠ0, 1đ â 0, 1. Therefore a natural guess isthat NAND-TM programs (or equivalently, Turing Machines) couldcompute every infinite function đč ⶠ0, 1â â 0, 1. However, thisturns out to be false. That is, there exists a function đč ⶠ0, 1â â 0, 1that is uncomputable!
332 introduction to theoretical computer science
The existence of uncomputable functions is quite surprising. Ourintuitive notion of a âfunctionâ (and the notion most mathematicianshad until the 20th century) is that a function đ defines some implicitor explicit way of computing the output đ(đ„) from the input đ„. Thenotion of an âuncomputable functionâ thus seems to be a contradic-tion in terms, but yet the following theorem shows that such creaturesdo exist:
Theorem 9.5 â Uncomputable functions. There exists a function đč â â¶0, 1â â 0, 1 that is not computable by any Turing machine.
Proof Idea:
The idea behind the proof follows quite closely Cantorâs proof thatthe reals are uncountable (Theorem 2.5), and in fact the theorem canalso be obtained fairly directly from that result (see Exercise 7.11).However, it is instructive to see the direct proof. The idea is to con-struct đč â in a way that will ensure that every possible machine đ willin fact fail to compute đč â. We do so by defining đč â(đ„) to equal 0 if đ„describes a Turing machine đ which satisfies đ(đ„) = 1 and definingđč â(đ„) = 1 otherwise. By construction, if đ is any Turing machine andđ„ is the string describing it, then đč â(đ„) â đ(đ„) and therefore đ doesnot compute đč â.âProof of Theorem 9.5. The proof is illustrated in Fig. 9.4. We start bydefining the following function đș ⶠ0, 1â â 0, 1:
For every string đ„ â 0, 1â, if đ„ satisfies (1) đ„ is a valid repre-sentation of some Turing machine đ (per the representation schemeabove) and (2) when the program đ is executed on the input đ„ ithalts and produces an output, then we define đș(đ„) as the first bit ofthis output. Otherwise (i.e., if đ„ is not a valid representation of a Tur-ing machine, or the machine đđ„ never halts on đ„) we define đș(đ„) = 0.We define đč â(đ„) = 1 â đș(đ„).
We claim that there is no Turing machine that computes đč â. In-deed, suppose, towards the sake of contradiction, there exists a ma-chine đ that computes đč â, and let đ„ be the binary string that rep-resents the machine đ . On one hand, since by our assumption đcomputes đč â, on input đ„ the machine đ halts and outputs đč â(đ„). Onthe other hand, by the definition of đč â, since đ„ is the representationof the machine đ , đč â(đ„) = 1 â đș(đ„) = 1 â đ(đ„), hence yielding acontradiction.
universality and uncomputability 333
Figure 9.4: We construct an uncomputable functionby defining for every two strings đ„, đŠ the value1 â đđŠ(đ„) which equals 0 if the machine describedby đŠ outputs 1 on đ„, and 1 otherwise. We then defineđč â(đ„) to be the âdiagonalâ of this table, namelyđč â(đ„) = 1 â đđ„(đ„) for every đ„. The function đč âis uncomputable, because if it was computable bysome machine whose string description is đ„â then wewould get that đđ„â (đ„â) = đč(đ„â) = 1 â đđ„â (đ„â).
Big Idea 12 There are some functions that can not be computed byany algorithm.
PThe proof of Theorem 9.5 is short but subtle. I suggestthat you pause here and go back to read it again andthink about it - this is a proof that is worth reading atleast twice if not three or four times. It is not often thecase that a few lines of mathematical reasoning estab-lish a deeply profound fact - that there are problemswe simply cannot solve.
The type of argument used to prove Theorem 9.5 is known as di-agonalization since it can be described as defining a function basedon the diagonal entries of a table as in Fig. 9.4. The proof can bethought of as an infinite version of the counting argument we usedfor showing lower bound for NAND-CIRC programs in Theorem 5.3.Namely, we show that itâs not possible to compute all functions from0, 1â â 0, 1 by Turing machines simply because there are morefunctions like that then there are Turing machines.
As mentioned in Remark 7.4, many texts use the âlanguageâ ter-minology and so will call a set đż â 0, 1â an undecidable or nonrecursive language if the function đč ⶠ0, 1â â¶â 0, 1 such thatđč(đ„) = 1 â đ„ â đż is uncomputable.
334 introduction to theoretical computer science
9.3 THE HALTING PROBLEM
Theorem 9.5 shows that there is some function that cannot be com-puted. But is this function the equivalent of the âtree that falls in theforest with no one hearing itâ? That is, perhaps it is a function thatno one actually wants to compute. It turns out that there are naturaluncomputable functions:
Theorem 9.6 â Uncomputability of Halting function. Let HALT ⶠ0, 1â â0, 1 be the function such that for every string đ â 0, 1â,HALT(đ, đ„) = 1 if Turing machine đ halts on the input đ„ andHALT(đ, đ„) = 0 otherwise. Then HALT is not computable.
Before turning to prove Theorem 9.6, we note that HALT is a verynatural function to want to compute. For example, one can think ofHALT as a special case of the task of managing an âApp storeâ. Thatis, given the code of some application, the gatekeeper for the storeneeds to decide if this code is safe enough to allow in the store or not.At a minimum, it seems that we should verify that the code would notgo into an infinite loop.Proof Idea:
One way to think about this proof is as follows:
Uncomputability of đč â + Universality = Uncomputability of HALT(9.2)
That is, we will use the universal Turing machine that computes EVALto derive the uncomputability of HALT from the uncomputability ofđč â shown in Theorem 9.5. Specifically, the proof will be by contra-diction. That is, we will assume towards a contradiction that HALT iscomputable, and use that assumption, together with the universal Tur-ing machine of Theorem 9.1, to derive that đč â is computable, whichwill contradict Theorem 9.5.â
Big Idea 13 If a function đč is uncomputable we can show thatanother function đ» is uncomputable by giving a way to reduce the taskof computing đč to computing đ» .
Proof of Theorem 9.6. The proof will use the previously establishedresult Theorem 9.5. Recall that Theorem 9.5 shows that the followingfunction đč â ⶠ0, 1â â 0, 1 is uncomputable:
đč â(đ„) = â§âšâ©1 đ„(đ„) = 00 otherwise(9.3)
universality and uncomputability 335
where đ„(đ„) denotes the output of the Turing machine described by thestring đ„ on the input đ„ (with the usual convention that đ„(đ„) = â„ if thiscomputation does not halt).
We will show that the uncomputability of đč â implies the uncom-putability of HALT. Specifically, we will assume, towards a contra-diction, that there exists a Turing machine đ that can compute theHALT function, and use that to obtain a Turing machine đ âČ that com-putes the function đč â. (This is known as a proof by reduction, since wereduce the task of computing đč â to the task of computing HALT. Bythe contrapositive, this means the uncomputability of đč â implies theuncomputability of HALT.)
Indeed, suppose that đ is a Turing machine that computes HALT.Algorithm 9.7 describes a Turing Machine đ âČ that computes đč â. (Weuse âhigh levelâ description of Turing machines, appealing to theâhave your cake and eat it tooâ paradigm, see Big Idea 10.)
Algorithm 9.7 â đč â to đ»đŽđżđ reduction.
Input: đ„ â 0, 1âOutput: đč â(đ„)1: # Assume T.M. đđ»đŽđżđ computes đ»đŽđżđ2: Let đ§ â đđ»đŽđżđ (đ„, đ„). # Assume đ§ = đ»đŽđżđ (đ„, đ„).3: if đ§ = 0 then4: return 05: end if6: Let đŠ â đ(đ„, đ„) # đ universal TM, i.e., đŠ = đ„(đ„)7: if đŠ = 0 then8: return 19: end if
10: return 0We claim that Algorithm 9.7 computes the function đč â. In-
deed, suppose that đ„(đ„) = 0 (and hence đč â(đ„) = 1). In thiscase, HALT(đ„, đ„) = 1 and hence, under our assumption thatđ(đ„, đ„) = HALT(đ„, đ„), the value đ§ will equal 1, and hence Al-gorithm 9.7 will set đŠ = đ„(đ„) = 0, and output the correct value1.
Suppose otherwise that đ„(đ„) â 0 (and hence đč â(đ„) = 0). In thiscase there are two possibilities:
âą Case 1: The machine described by đ„ does not halt on the input đ„.In this case, HALT(đ„, đ„) = 0. Since we assume that đ computesHALT it means that on input đ„, đ„, the machine đ must halt andoutput the value 0. This means that Algorithm 9.7 will set đ§ = 0and output 0.
336 introduction to theoretical computer science
1 This argument has also been connected to theissues of consciousness and free will. I am personallyskeptical of its relevance to these issues. Perhaps thereasoning is that humans have the ability to solve thehalting problem but they exercise their free will andconsciousness by choosing not to do so.
âą Case 2: The machine described by đ„ halts on the input đ„ and out-puts some đŠâČ â 0. In this case, since HALT(đ„, đ„) = 1, under ourassumptions, Algorithm 9.7 will set đŠ = đŠâČ â 0 and so output 0.We see that in all cases, đ âČ(đ„) = đč â(đ„), which contradicts the
fact that đč â is uncomputable. Hence we reach a contradiction to ouroriginal assumption that đ computes HALT.
POnce again, this is a proof thatâs worth reading morethan once. The uncomputability of the halting prob-lem is one of the fundamental theorems of computerscience, and is the starting point for much of the in-vestigations we will see later. An excellent way to geta better understanding of Theorem 9.6 is to go overSection 9.3.2, which presents an alternative proof ofthe same result.
9.3.1 Is the Halting problem really hard? (discussion)Many peopleâs first instinct when they see the proof of Theorem 9.6is to not believe it. That is, most people do believe the mathematicalstatement, but intuitively it doesnât seem that the Halting problem isreally that hard. After all, being uncomputable only means that HALTcannot be computed by a Turing machine.
But programmers seem to solve HALT all the time by informally orformally arguing that their programs halt. Itâs true that their programsare written in C or Python, as opposed to Turing machines, but thatmakes no difference: we can easily translate back and forth betweenthis model and any other programming language.
While every programmer encounters at some point an infinite loop,is there really no way to solve the halting problem? Some peopleargue that they personally can, if they think hard enough, determinewhether any concrete program that they are given will halt or not.Some have even argued that humans in general have the ability todo that, and hence humans have inherently superior intelligence tocomputers or anything else modeled by Turing machines.1
The best answer we have so far is that there truly is no way to solveHALT, whether using Macs, PCs, quantum computers, humans, orany other combination of electronic, mechanical, and biological de-vices. Indeed this assertion is the content of the Church-Turing Thesis.This of course does not mean that for every possible program đ , itis hard to decide if đ enters an infinite loop. Some programs donâteven have loops at all (and hence trivially halt), and there are manyother far less trivial examples of programs that we can certify to never
universality and uncomputability 337
Figure 9.5: SMBCâs take on solving the Halting prob-lem.
enter an infinite loop (or programs that we know for sure that willenter such a loop). However, there is no general procedure that woulddetermine for an arbitrary program đ whether it halts or not. More-over, there are some very simple programs for which no one knowswhether they halt or not. For example, the following Python programwill halt if and only if Goldbachâs conjecture is false:
def isprime(p):return all(p % i for i in range(2,p-1))
def Goldbach(n):return any( (isprime(p) and isprime(n-p))
for p in range(2,n-1))
n = 4while True:
if not Goldbach(n): breakn+= 2
Given that Goldbachâs Conjecture has been open since 1742, it isunclear that humans have any magical ability to say whether this (orother similar programs) will halt or not.
9.3.2 A direct proof of the uncomputability of HALT (optional)It turns out that we can combine the ideas of the proofs of Theo-rem 9.5 and Theorem 9.6 to obtain a short proof of the latter theorem,that does not appeal to the uncomputability of đč â. This short proofappeared in print in a 1965 letter to the editor of Christopher Strachey:
To the Editor, The Computer Journal.An Impossible ProgramSir,A well-known piece of folk-lore among programmers holds that it isimpossible to write a program which can examine any other programand tell, in every case, if it will terminate or get into a closed loop whenit is run. I have never actually seen a proof of this in print, and thoughAlan Turing once gave me a verbal proof (in a railway carriage on theway to a Conference at the NPL in 1953), I unfortunately and promptlyforgot the details. This left me with an uneasy feeling that the proofmust be long or complicated, but in fact it is so short and simple that itmay be of interest to casual readers. The version below uses CPL, butnot in any essential way.Suppose T[R] is a Boolean function taking a routine (or program) Rwith no formal or free variables as its arguments and that for all R,T[R] = True if R terminates if run and that T[R] = False if R does notterminate.Consider the routine P defined as follows
338 introduction to theoretical computer science
rec routine P§L: if T[P] go to LReturn §
If T[P] = True the routine P will loop, and it will only terminate ifT[P] = False. In each case âT[P]â has exactly the wrong value, and thiscontradiction shows that the function T cannot exist.Yours faithfully,C. StracheyChurchill College, Cambridge
PTry to stop and extract the argument for provingTheorem 9.6 from the letter above.
Since CPL is not as common today, let us reproduce this proof. Theidea is the following: suppose for the sake of contradiction that thereexists a program T such that T(f,x) equals True iff f halts on inputx. (Stracheyâs letter considers the no-input variant of HALT, but asweâll see, this is an immaterial distinction.) Then we can construct aprogram P and an input x such that T(P,x) gives the wrong answer.The idea is that on input x, the program P will do the following: runT(x,x), and if the answer is True then go into an infinite loop, andotherwise halt. Now you can see that T(P,P) will give the wronganswer: if P halts when it gets its own code as input, then T(P,P) issupposed to be True, but then P(P) will go into an infinite loop. Andif P does not halt, then T(P,P) is supposed to be False but then P(P)will halt. We can also code this up in Python:
def CantSolveMe(T):"""Gets function T that claims to solve HALT.Returns a pair (P,x) of code and input on whichT(P,x) â HALT(x)"""def fool(x):
if T(x,x):while True: pass
return "I halted"
return (fool,fool)
For example, consider the following Naive Python program T thatguesses that a given function does not halt if its input contains whileor for
universality and uncomputability 339
def T(f,x):"""Crude halting tester - decides it doesn't halt if it
contains a loop."""import inspectsource = inspect.getsource(f)if source.find("while"): return Falseif source.find("for"): return Falsereturn True
If we now set (f,x) = CantSolveMe(T), then T(f,x)=False butf(x) does in fact halt. This is of course not specific to this particular T:for every program T, if we run (f,x) = CantSolveMe(T) then weâllget an input on which T gives the wrong answer to HALT.
9.4 REDUCTIONS
The Halting problem turns out to be a linchpin of uncomputability, inthe sense that Theorem 9.6 has been used to show the uncomputabil-ity of a great many interesting functions. We will see several examplesof such results in this chapter and the exercises, but there are manymore such results (see Fig. 9.6).
Figure 9.6: Some uncomputability results. An arrowfrom problem X to problem Y means that we use theuncomputability of X to prove the uncomputabilityof Y by reducing computing X to computing Y.All of these results except for the MRDP Theoremappear in either the text or exercises. The HaltingProblem HALT serves as our starting point for allthese uncomputability results as well as many others.
The idea behind such uncomputability results is conceptually sim-ple but can at first be quite confusing. If we know that HALT is un-computable, and we want to show that some other function BLAH isuncomputable, then we can do so via a contrapositive argument (i.e.,proof by contradiction). That is, we show that if there exists a Turingmachine that computes BLAH then there exists a Turing machine thatcomputes HALT. (Indeed, this is exactly how we showed that HALTitself is uncomputable, by reducing this fact to the uncomputability ofthe function đč â from Theorem 9.5.)
340 introduction to theoretical computer science
For example, to prove that BLAH is uncomputable, we could showthat there is a computable function đ ⶠ0, 1â â 0, 1â such that forevery pair đ and đ„, HALT(đ, đ„) = BLAH(đ (đ, đ„)). The existence ofsuch a function đ implies that if BLAH was computable then HALTwould be computable as well, hence leading to a contradiction! Theconfusing part about reductions is that we are assuming somethingwe believe is false (that BLAH has an algorithm) to derive somethingthat we know is false (that HALT has an algorithm). Michael Sipserdescribes such results as having the form âIf pigs could whistle thenhorses could flyâ.
A reduction-based proof has two components. For starters, sincewe need đ to be computable, we should describe the algorithm tocompute it. The algorithm to compute đ is known as a reduction sincethe transformation đ modifies an input to HALT to an input to BLAH,and hence reduces the task of computing HALT to the task of comput-ing BLAH. The second component of a reduction-based proof is theanalysis of the algorithm đ : namely a proof that đ does indeed satisfythe desired properties.
Reduction-based proofs are just like other proofs by contradiction,but the fact that they involve hypothetical algorithms that donât reallyexist tends to make reductions quite confusing. The one silver liningis that at the end of the day the notion of reductions is mathematicallyquite simple, and so itâs not that bad even if you have to go back tofirst principles every time you need to remember what is the directionthat a reduction should go in.
RRemark 9.8 â Reductions are algorithms. A reductionis an algorithm, which means that, as discussed inRemark 0.3, a reduction has three components:
âą Specification (what): In the case of a reductionfrom HALT to BLAH, the specification is that func-tion đ ⶠ0, 1â â 0, 1â should satisfy thatHALT(đ, đ„) = BLAH(đ (đ, đ„)) for every Tur-ing machine đ and input đ„. In general, to reducea function đč to đș, the reduction should satisfyđč(đ€) = đș(đ (đ€)) for every input đ€ to đč .
âą Implementation (how): The algorithmâs descrip-tion: the precise instructions how to transform aninput đ€ to the output đ (đ€).
âą Analysis (why): A proof that the algorithm meetsthe specification. In particular, in a reductionfrom đč to đș this is a proof that for every inputđ€, the output đŠ of the algorithm satisfies thatđč(đ€) = đș(đŠ).
universality and uncomputability 341
9.4.1 Example: Halting on the zero problemHere is a concrete example for a proof by reduction. We define thefunction HALTONZERO ⶠ0, 1â â 0, 1 as follows. Given anystring đ , HALTONZERO(đ) = 1 if and only if đ describes a Turingmachine that halts when it is given the string 0 as input. A prioriHALTONZERO seems like a potentially easier function to computethan the full-fledged HALT function, and so we could perhaps hopethat it is not uncomputable. Alas, the following theorem shows thatthis is not the case:
Theorem 9.9 â Halting without input. HALTONZERO is uncomputable.
PThe proof of Theorem 9.9 is below, but before readingit you might want to pause for a couple of minutesand think how you would prove it yourself. In partic-ular, try to think of what a reduction from HALT toHALTONZERO would look like. Doing so is an excel-lent way to get some initial comfort with the notionof proofs by reduction, which a technique we will beusing time and again in this book.
Figure 9.7: To prove Theorem 9.9, we show thatHALTONZERO is uncomputable by giving a reductionfrom the task of computing HALT to the task of com-puting HALTONZERO. This shows that if there was ahypothetical algorithm đŽ computing HALTONZERO,then there would be an algorithm đ” computingHALT, contradicting Theorem 9.6. Since neither đŽ norđ” actually exists, this is an example of an implicationof the form âif pigs could whistle then horses couldflyâ.
Proof of Theorem 9.9. The proof is by reduction from HALT, seeFig. 9.7. We will assume, towards the sake of contradiction, thatHALTONZERO is computable by some algorithm đŽ, and use thishypothetical algorithm đŽ to construct an algorithm đ” to computeHALT, hence obtaining a contradiction to Theorem 9.6. (As discussed
342 introduction to theoretical computer science
in Big Idea 10, following our âhave your cake and eat it tooâ paradigm,we just use the generic name âalgorithmâ rather than worryingwhether we model them as Turing machines, NAND-TM programs,NAND-RAM, etc.; this makes no difference since all these models areequivalent to one another.)
Since this is our first proof by reduction from the Halting prob-lem, we will spell it out in more details than usual. Such a proof byreduction consists of two steps:
1. Description of the reduction: We will describe the operation of ouralgorithm đ”, and how it makes âfunction callsâ to the hypotheticalalgorithm đŽ.
2. Analysis of the reduction: We will then prove that under the hypoth-esis that Algorithm đŽ computes HALTONZERO, Algorithm đ” willcompute HALT.
Algorithm 9.10 â đ»đŽđżđ to đ»đŽđżđ đđđđžđ đ reduction.
Input: Turing machine đ and string đ„.Output: Turing machine đ âČ such that đ halts on đ„ iff đ âČ
halts on zero1: procedure đđ,đ„(đ€) # Description of the T.M. đđ,đ„2: return đžđ đŽđż(đ, đ„) # Ignore the
Input: đ€, evaluate đ on đ„.3: end procedure4: return đđ,đ„ # We do not execute đđ,đ„: only return its
description
Our Algorithm đ” works as follows: on input đ, đ„, it runs Algo-rithm 9.10 to obtain a Turing Machine đ âČ, and then returns đŽ(đ âČ).The machine đ âČ ignores its input đ§ and simply runs đ on đ„.
In pseudocode, the program đđ,đ„ will look something like thefollowing:
def N(z):M = r'.......'# a string constant containing desc. of Mx = r'.......'# a string constant containing xreturn eval(M,x)# note that we ignore the input z
That is, if we think of đđ,đ„ as a program, then it is a program thatcontains đ and đ„ as âhardwired constantsâ, and given any input đ§, itsimply ignores the input and always returns the result of evaluating
universality and uncomputability 343
đ on đ„. The algorithm đ” does not actually execute the machine đđ,đ„.đ” merely writes down the description of đđ,đ„ as a string (just as wedid above) and feeds this string as input to đŽ.
The above completes the description of the reduction. The analysis isobtained by proving the following claim:
Claim: For every strings đ, đ„, đ§, the machine đđ,đ„ constructed byAlgorithm đ” in Step 1 satisfies that đđ,đ„ halts on đ§ if and only if theprogram described by đ halts on the input đ„.
Proof of Claim: Since đđ,đ„ ignores its input and evaluates đ on đ„using the universal Turing machine, it will halt on đ§ if and only if đhalts on đ„.
In particular if we instantiate this claim with the input đ§ = 0 tođđ,đ„, we see that HALTONZERO(đđ,đ„) = HALT(đ, đ„). Thus ifthe hypothetical algorithm đŽ satisfies đŽ(đ) = HALTONZERO(đ)for every đ then the algorithm đ” we construct satisfies đ”(đ, đ„) =HALT(đ, đ„) for every đ, đ„, contradicting the uncomputability ofHALT.
RRemark 9.11 â The hardwiring technique. In the proof ofTheorem 9.9 we used the technique of âhardwiringâan input đ„ to a program/machine đ . That is, modify-ing a program đ that it uses âhardwired constantsâfor some of all of its input. This technique is quitecommon in reductions and elsewhere, and we willoften use it again in this course.
9.5 RICEâS THEOREM AND THE IMPOSSIBILITY OF GENERALSOFTWARE VERIFICATION
The uncomputability of the Halting problem turns out to be a specialcase of a much more general phenomenon. Namely, that we cannotcertify semantic properties of general purpose programs. âSemantic prop-ertiesâ mean properties of the function that the program computes, asopposed to properties that depend on the particular syntax used bythe program.
An example for a semantic property of a program đ is the propertythat whenever đ is given an input string with an even number of 1âs,it outputs 0. Another example is the property that đ will always haltwhenever the input ends with a 1. In contrast, the property that a Cprogram contains a comment before every function declaration is nota semantic property, since it depends on the actual source code asopposed to the input/output relation.
344 introduction to theoretical computer science
Checking semantic properties of programs is of great interest, as itcorresponds to checking whether a program conforms to a specifica-tion. Alas it turns out that such properties are in general uncomputable.We have already seen some examples of uncomputable semantic func-tions, namely HALT and HALTONZERO, but these are just the âtip ofthe icebergâ. We start by observing one more such example:
Theorem 9.12 â Computing all zero function. Let ZEROFUNC ⶠ0, 1â â0, 1 be the function such that for every đ â 0, 1â, ZEROFUNC(đ) =1 if and only if đ represents a Turing machine such that đ outputs0 on every input đ„ â 0, 1â. Then ZEROFUNC is uncomputable.
PDespite the similarity in their names, ZEROFUNC andHALTONZERO are two different functions. For exam-ple, if đ is a Turing machine that on input đ„ â 0, 1â,halts and outputs the OR of all of đ„âs coordinates, thenHALTONZERO(đ) = 1 (since đ does halt on theinput 0) but ZEROFUNC(đ) = 0 (since đ does notcompute the constant zero function).
Proof of Theorem 9.12. The proof is by reduction to HALTONZERO.Suppose, towards the sake of contradiction, that there was an algo-rithm đŽ such that đŽ(đ) = ZEROFUNC(đ) for every đ â 0, 1â.Then we will construct an algorithm đ” that solves HALTONZERO,contradicting Theorem 9.9.
Given a Turing machine đ (which is the input to HALTONZERO),our Algorithm đ” does the following:
1. Construct a Turing Machine đ which on input đ„ â 0, 1â, firstruns đ(0) and then outputs 0.
2. Return đŽ(đ).Now if đ halts on the input 0 then the Turing machine đ com-
putes the constant zero function, and hence under our assumptionthat đŽ computes ZEROFUNC, đŽ(đ) = 1. If đ does not halt on theinput 0, then the Turing machine đ will not halt on any input, andso in particular will not compute the constant zero function. Henceunder our assumption that đŽ computes ZEROFUNC, đŽ(đ) = 0.We see that in both cases, ZEROFUNC(đ) = HALTONZERO(đ)and hence the value that Algorithm đ” returns in step 2 is equal toHALTONZERO(đ) which is what we needed to prove.
Another result along similar lines is the following:
universality and uncomputability 345
Theorem 9.13 â Uncomputability of verifying parity. The following func-tion is uncomputable
COMPUTES-PARITY(đ ) = â§âšâ©1 đ computes the parity function0 otherwise(9.4)
PWe leave the proof of Theorem 9.13 as an exercise(Exercise 9.6). I strongly encourage you to stop hereand try to solve this exercise.
9.5.1 Riceâs TheoremTheorem 9.13 can be generalized far beyond the parity function. Infact, this generalization rules out verifying any type of semantic spec-ification on programs. We define a semantic specification on programsto be some property that does not depend on the code of the programbut just on the function that the program computes.
For example, consider the following two C programs
int First(int n) if (n<0) return 0;return 2*n;
int Second(int n) int i = 0;int j = 0if (n<0) return 0;while (j<n)
i = i + 2;j= j + 1;
return i;
First and Second are two distinct C programs, but they computethe same function. A semantic property, would be either true for bothprograms or false for both programs, since it depends on the functionthe programs compute and not on their code. An example for a se-mantic property that both First and Second satisfy is the following:âThe program đ computes a function đ mapping integers to integers satisfy-ing that đ(đ) â„ đ for every input đâ.
346 introduction to theoretical computer science
A property is not semantic if it depends on the source code ratherthan the input/output behavior. For example, properties such as âtheprogram contains the variable kâ or âthe program uses the while op-erationâ are not semantic. Such properties can be true for one of theprograms and false for others. Formally, we define semantic proper-ties as follows:
Definition 9.14 â Semantic properties. A pair of Turing machinesđ and đ âČ are functionally equivalent if for every đ„ â 0, 1â,đ(đ„) = đ âČ(đ„). (In particular, đ(đ„) = â„ iff đ âČ(đ„) = â„ for allđ„.)A function đč ⶠ0, 1â â 0, 1 is semantic if for every pair of
strings đ, đ âČ that represent functionally equivalent Turing ma-chines, đč(đ) = đč(đ âČ). (Recall that we assume that every stringrepresents some Turing machine, see Remark 9.3)
There are two trivial examples of semantic functions: the constantone function and the constant zero function. For example, if đ is theconstant zero function (i.e., đ(đ) = 0 for every đ) then clearlyđč(đ) = đč(đ âČ) for every pair of Turing machines đ and đ âČ that arefunctionally equivalent đ and đ âČ. Here is a non-trivial exampleSolved Exercise 9.1 â đđžđ đđčđđđ¶ is semantic. Prove that the functionZEROFUNC is semantic.
Solution:
Recall that ZEROFUNC(đ) = 1 if and only if đ(đ„) = 0 forevery đ„ â 0, 1â. If đ and đ âČ are functionally equivalent, then forevery đ„, đ(đ„) = đ âČ(đ„). Hence ZEROFUNC(đ) = 1 if and only ifZEROFUNC(đ âČ) = 1.
Often the properties of programs that we are most interested incomputing are the semantic ones, since we want to understand theprogramsâ functionality. Unfortunately, Riceâs Theorem tells us thatthese properties are all uncomputable:
Theorem 9.15 â Riceâs Theorem. Let đč ⶠ0, 1â â 0, 1. If đč is seman-tic and non-trivial then it is uncomputable.
Proof Idea:
The idea behind the proof is to show that every semantic non-trivial function đč is at least as hard to compute as HALTONZERO.This will conclude the proof since by Theorem 9.9, HALTONZEROis uncomputable. If a function đč is non trivial then there are two
universality and uncomputability 347
machines đ0 and đ1 such that đč(đ0) = 0 and đč(đ1) = 1. So,the goal would be to take a machine đ and find a way to map it intoa machine đ = đ (đ), such that (i) if đ halts on zero then đ isfunctionally equivalent to đ1 and (ii) if đ does not halt on zero thenđ is functionally equivalent đ0.
Because đč is semantic, if we achieved this, then we would be guar-anteed that HALTONZERO(đ) = đč(đ (đ)), and hence would showthat if đč was computable, then HALTONZERO would be computableas well, contradicting Theorem 9.9.âProof of Theorem 9.15. We will not give the proof in full formality, butrather illustrate the proof idea by restricting our attention to a particu-lar semantic function đč . However, the same techniques generalize toall possible semantic functions. Define MONOTONE ⶠ0, 1â â 0, 1as follows: MONOTONE(đ) = 1 if there does not exist đ â â andtwo inputs đ„, đ„âČ â 0, 1đ such that for every đ â [đ] đ„đ †đ„âČđ but đ(đ„)outputs 1 and đ(đ„âČ) = 0. That is, MONOTONE(đ) = 1 if itâs notpossible to find an input đ„ such that flipping some bits of đ„ from 0 to1 will change đ âs output in the other direction from 1 to 0. We willprove that MONOTONE is uncomputable, but the proof will easilygeneralize to any semantic function.
We start by noting that MONOTONE is neither the constant zeronor the constant one function:
âą The machine INF that simply goes into an infinite loop on everyinput satisfies MONOTONE(INF) = 1, since INF is not definedanywhere and so in particular there are no two inputs đ„, đ„âČ wheređ„đ †đ„âČđ for every đ but INF(đ„) = 0 and INF(đ„âČ) = 1.
⹠The machine PAR that computes the XOR or parity of its input, isnot monotone (e.g., PAR(1, 1, 0, 0, ⊠, 0) = 0 but PAR(1, 0, 0, ⊠, 0) =0) and hence MONOTONE(PAR) = 0.(Note that INF and PAR are machines and not functions.)We will now give a reduction from HALTONZERO to
MONOTONE. That is, we assume towards a contradiction thatthere exists an algorithm đŽ that computes MONOTONE and we willbuild an algorithm đ” that computes HALTONZERO. Our algorithm đ”will work as follows:
Algorithm đ”:Input: String đ describing a Turing machine. (Goal: ComputeHALTONZERO(đ))Assumption: Access to Algorithm đŽ to compute MONOTONE.Operation:
348 introduction to theoretical computer science
1. Construct the following machine đ : âOn input đ§ â 0, 1â do: (a)Run đ(0), (b) Return PAR(đ§)â.
2. Return 1 â đŽ(đ).To complete the proof we need to show that đ” outputs the cor-
rect answer, under our assumption that đŽ computes MONOTONE.In other words, we need to show that HALTONZERO(đ) = 1 âđđđđđ đđđž(đ). Suppose that đ does not halt on zero. In thiscase the program đ constructed by Algorithm đ” enters into an in-finite loop in step (a) and will never reach step (b). Hence in thiscase đ is functionally equivalent to INF. (The machine đ is notthe same machine as INF: its description or code is different. But itdoes have the same input/output behavior (in this case) of neverhalting on any input. Also, while the program đ will go into an in-finite loop on every input, Algorithm đ” never actually runs đ : itonly produces its code and feeds it to đŽ. Hence Algorithm đ” willnot enter into an infinite loop even in this case.) Thus in this case,MONOTONE(đ) = MONOTONE(INF) = 1.
If đ does halt on zero, then step (a) in đ will eventually concludeand đ âs output will be determined by step (b), where it simply out-puts the parity of its input. Hence in this case, đ computes the non-monotone parity function (i.e., is functionally equivalent to PAR), andso we get that MONOTONE(đ) = MONOTONE(PAR) = 0. In bothcases, MONOTONE(đ) = 1 â đ»đŽđżđ đđđđžđ đ(đ), which is whatwe wanted to prove.
An examination of this proof shows that we did not use anythingabout MONOTONE beyond the fact that it is semantic and non-trivial.For every semantic non-trivial đč , we can use the same proof, replacingPAR and INF with two machines đ0 and đ1 such that đč(đ0) = 0 andđč(đ1) = 1. Such machines must exist if đč is non trivial.
RRemark 9.16 â Semantic is not the same as uncom-putable. Riceâs Theorem is so powerful and such apopular way of proving uncomputability that peo-ple sometimes get confused and think that it is theonly way to prove uncomputability. In particular, acommon misconception is that if a function đč is notsemantic then it is computable. This is not at all thecase.For example, consider the following functionHALTNOYALE ⶠ0, 1â â 0, 1. This is a functionthat on input a string that represents a NAND-TMprogram đ , outputs 1 if and only if both (i) đ haltson the input 0, and (ii) the program đ does not con-tain a variable with the identifier Yale. The function
universality and uncomputability 349
HALTNOYALE is clearly not semantic, as it will out-put two different values when given as input one ofthe following two functionally equivalent programs:
Yale[0] = NAND(X[0],X[0])Y[0] = NAND(X[0],Yale[0])
and
Harvard[0] = NAND(X[0],X[0])Y[0] = NAND(X[0],Harvard[0])
However, HALTNOYALE is uncomputable since everyprogram đ can be transformed into an equivalent(and in fact improved :)) program đ âČ that does notcontain the variable Yale. Hence if we could computeHALTNOYALE then determine halting on zero forNAND-TM programs (and hence for Turing machinesas well).Moreover, as we will see in Chapter 11, there are un-computable functions whose inputs are not programs,and hence for which the adjective âsemanticâ is notapplicable.Properties such as âthe program contains the variableYaleâ are sometimes known as syntactic properties.The terms âsemanticâ and âsyntacticâ are used be-yond the realm of programming languages: a famousexample of a syntactically correct but semanticallymeaningless sentence in English is Chomskyâs âColor-less green ideas sleep furiously.â However, formallydefining âsyntactic propertiesâ is rather subtle and wewill not use this terminology in this book, sticking tothe terms âsemanticâ and ânon semanticâ only.
9.5.2 Halting and Riceâs Theorem for other Turing-complete modelsAs we saw before, many natural computational models turn out to beequivalent to one another, in the sense that we can transform a âpro-gramâ of one model (such as a đ expression, or a game-of-life config-urations) into another model (such as a NAND-TM program). Thisequivalence implies that we can translate the uncomputability of theHalting problem for NAND-TM programs into uncomputability forHalting in other models. For example:
Theorem 9.17 â NAND-TM Machine Halting. Let NANDTMHALT â¶0, 1â â 0, 1 be the function that on input strings đ â0, 1â and đ„ â 0, 1â outputs 1 if the NAND-TM program de-scribed by đ halts on the input đ„ and outputs 0 otherwise. ThenNANDTMHALT is uncomputable.
350 introduction to theoretical computer science
POnce again, this is a good point for you to stop and tryto prove the result yourself before reading the proofbelow.
Proof. We have seen in Theorem 7.11 that for every Turing machineđ , there is an equivalent NAND-TM program đđ such that for ev-ery đ„, đđ(đ„) = đ(đ„). In particular this means that HALT(đ) =NANDTMHALT(đđ).
The transformation đ ⊠đđ that is obtained from the proofof Theorem 7.11 is constructive. That is, the proof yields a way tocompute the map đ ⊠đđ . This means that this proof yields areduction from task of computing HALT to the task of computingNANDTMHALT, which means that since HALT is uncomputable,neither is NANDTMHALT.
The same proof carries over to other computational models such asthe đ calculus, two dimensional (or even one-dimensional) automata etc.Hence for example, there is no algorithm to decide if a đ expressionevaluates the identity function, and no algorithm to decide whetheran initial configuration of the game of life will result in eventuallycoloring the cell (0, 0) black or not.
Indeed, we can generalize Riceâs Theorem to all these models. Forexample, if đč ⶠ0, 1â â 0, 1 is a non-trivial function such thatđč(đ) = đč(đ âČ) for every functionally equivalent NAND-TM programsđ , đ âČ then đč is uncomputable, and the same holds for NAND-RAMprograms, đ-expressions, and all other Turing complete models (asdefined in Definition 8.5), see also Exercise 9.12.
9.5.3 Is software verification doomed? (discussion)Programs are increasingly being used for mission critical purposes,whether itâs running our banking system, flying planes, or monitoringnuclear reactors. If we canât even give a certification algorithm thata program correctly computes the parity function, how can we everbe assured that a program does what it is supposed to do? The keyinsight is that while it is impossible to certify that a general programconforms with a specification, it is possible to write a program inthe first place in a way that will make it easier to certify. As a trivialexample, if you write a program without loops, then you can certifythat it halts. Also, while it might not be possible to certify that anarbitrary program computes the parity function, it is quite possible towrite a particular program đ for which we can mathematically provethat đ computes the parity. In fact, writing programs or algorithms
universality and uncomputability 351
and providing proofs for their correctness is what we do all the time inalgorithms research.
The field of software verification is concerned with verifying thatgiven programs satisfy certain conditions. These conditions can bethat the program computes a certain function, that it never writesinto a dangerous memory location, that is respects certain invari-ants, and others. While the general tasks of verifying this may beuncomputable, researchers have managed to do so for many inter-esting cases, especially if the program is written in the first place ina formalism or programming language that makes verification eas-ier. That said, verification, especially of large and complex programs,remains a highly challenging task in practice as well, and the num-ber of programs that have been formally proven correct is still quitesmall. Moreover, even phrasing the right theorem to prove (i.e., thespecification) if often a highly non-trivial endeavor.
Figure 9.8: The set R of computable Boolean functions(Definition 7.3) is a proper subset of the set of allfunctions mapping 0, 1â to 0, 1. In this chapterwe saw a few examples of elements in the latter setthat are not in the former.
Chapter Recap
âą There is a universal Turing machine (or NAND-TMprogram) đ such that on input a description of aTuring machine đ and some input đ„, đ(đ, đ„) haltsand outputs đ(đ„) if (and only if) đ halts on inputđ„. Unlike in the case of finite computation (i.e.,NAND-CIRC programs / circuits), the input tothe program đ can be a machine đ that has morestates than đ itself.
âą Unlike the finite case, there are actually functionsthat are inherently uncomputable in the sense thatthey cannot be computed by any Turing machine.
âą These include not only some âdegenerateâ or âeso-tericâ functions but also functions that people have
352 introduction to theoretical computer science
2 A machine with alphabet ÎŁ can have at most |ÎŁ|đchoices for the contents of the first đ locations ofits tape. What happens if the machine repeats apreviously seen configuration, in the sense that thetape contents, the head location, and the current state,are all identical to what they were in some previousstate of the execution?
deeply care about and conjectured that could becomputed.
âą If the Church-Turing thesis holds then a functionđč that is uncomputable according to our definitioncannot be computed by any means in our physicalworld.
9.6 EXERCISES
Exercise 9.1 â NAND-RAM Halt. Let NANDRAMHALT ⶠ0, 1â â 0, 1be the function such that on input (đ , đ„) where đ represents a NAND-RAM program, NANDRAMHALT(đ , đ„) = 1 iff đ halts on the input đ„.Prove that NANDRAMHALT is uncomputable.
Exercise 9.2 â Timed halting. Let TIMEDHALT ⶠ0, 1â â 0, 1 bethe function that on input (a string representing) a triple (đ, đ„, đ ),TIMEDHALT(đ, đ„, đ ) = 1 iff the Turing machine đ , on input đ„,halts within at most đ steps (where a step is defined as one sequenceof reading a symbol from the tape, updating the state, writing a newsymbol and (potentially) moving the head).
Prove that TIMEDHALT is computable.
Exercise 9.3 â Space halting (challenging). Let SPACEHALT ⶠ0, 1â â0, 1 be the function that on input (a string representing) a triple(đ, đ„, đ ), SPACEHALT(đ, đ„, đ ) = 1 iff the Turing machine đ , oninput đ„, halts before its head reached the đ -th location of its tape. (Wedonât care how many steps đ makes, as long as the head stays insidelocations 0, ⊠, đ â 1.)
Prove that SPACEHALT is computable. See footnote for hint2
Exercise 9.4 â Computable compositions. Suppose that đč ⶠ0, 1â â 0, 1and đș ⶠ0, 1â â 0, 1 are computable functions. For each one of thefollowing functions đ» , either prove that đ» is necessarily computable orgive an example of a pair đč and đș of computable functions such thatđ» will not be computable. Prove your assertions.
1. đ»(đ„) = 1 iff đč(đ„) = 1 OR đș(đ„) = 1.2. đ»(đ„) = 1 iff there exist two nonempty strings đą, đŁ â 0, 1â such
that đ„ = đąđŁ (i.e., đ„ is the concatenation of đą and đŁ), đč(đą) = 1 andđș(đŁ) = 1.3. đ»(đ„) = 1 iff there exist a list đą0, ⊠, đąđĄâ1 of non empty strings such
that stringsđč(đąđ) = 1 for every đ â [đĄ] and đ„ = đą0đą1 ⯠đąđĄâ1.
universality and uncomputability 353
3 Hint: You can use Riceâs Theorem.
4 Hint: While it cannot be applied directly, with alittle âmassagingâ you can prove this using RiceâsTheorem.
4. đ»(đ„) = 1 iff đ„ is a valid string representation of a NAND++program đ such that for every đ§ â 0, 1â, on input đ§ the programđ outputs đč(đ§).
5. đ»(đ„) = 1 iff đ„ is a valid string representation of a NAND++program đ such that on input đ„ the program đ outputs đč(đ„).
6. đ»(đ„) = 1 iff đ„ is a valid string representation of a NAND++program đ such that on input đ„, đ outputs đč(đ„) after executing atmost 100 â |đ„|2 lines.
Exercise 9.5 Prove that the following function FINITE ⶠ0, 1â â 0, 1is uncomputable. On input đ â 0, 1â, we define FINITE(đ ) = 1if and only if đ is a string that represents a NAND++ program suchthat there only a finite number of inputs đ„ â 0, 1â s.t. đ(đ„) = 1.3
Exercise 9.6 â Computing parity. Prove Theorem 9.13 without using RiceâsTheorem.
Exercise 9.7 â TM Equivalence. Let EQ ⶠ0, 1â â¶â 0, 1 be the func-tion defined as follows: given a string representing a pair (đ, đ âČ)of Turing machines, EQ(đ, đ âČ) = 1 iff đ and đ âČ are functionallyequivalent as per Definition 9.14. Prove that EQ is uncomputable.
Note that you cannot use Riceâs Theorem directly, as this theoremonly deals with functions that take a single Turing machine as input,and EQ takes two machines.
Exercise 9.8 For each of the following two functions, say whether it iscomputable or not:
1. Given a NAND-TM program đ , an input đ„, and a number đ, whenwe run đ on đ„, does the index variable i ever reach đ?
2. Given a NAND-TM program đ , an input đ„, and a number đ, whenwe run đ on đ„, does đ ever write to an array at index đ?
Exercise 9.9 Let đč ⶠ0, 1â â 0, 1 be the function that is defined asfollows. On input a string đ that represents a NAND-RAM programand a String đ that represents a Turing machine, đč(đ , đ) = 1 if andonly if there exists some input đ„ such đ halts on đ„ but đ does not halton đ„. Prove that đč is uncomputable. See footnote for hint.4
354 introduction to theoretical computer science
5 HALT has this property.
6 You can either use the diagonalization method toprove this directly or show that the set of all recur-sively enumerable functions is countable.
7 HALT has this property: show that if both HALTand đ»đŽđżđ were recursively enumerable then HALTwould be in fact computable.
8 Show that any đș satisfying (b) must be semantic.
Exercise 9.10 â Recursively enumerable. Define a function đč ⶠ0, 1â â¶â0, 1 to be recursively enumerable if there exists a Turing machine đsuch that such that for every đ„ â 0, 1â, if đč(đ„) = 1 then đ(đ„) = 1,and if đč(đ„) = 0 then đ(đ„) = â„. (i.e., if đč(đ„) = 0 then đ does not halton đ„.)1. Prove that every computable đč is also recursively enumerable.
2. Prove that there exists đč that is not computable but is recursivelyenumerable. See footnote for hint.5
3. Prove that there exists a function đč ⶠ0, 1â â 0, 1 such that đč isnot recursively enumerable. See footnote for hint.6
4. Prove that there exists a function đč ⶠ0, 1â â 0, 1 such thatđč is recursively enumerable but the function đč defined as đč(đ„) =1 â đč(đ„) is not recursively enumerable. See footnote for hint.7
Exercise 9.11 â Riceâs Theorem: standard form. In this exercise we willprove Riceâs Theorem in the form that it is typically stated in the litera-ture.
For a Turing machine đ , define đż(đ) â 0, 1â to be the set of allđ„ â 0, 1â such that đ halts on the input đ„ and outputs 1. (The setđż(đ) is known in the literature as the language recognized by đ . Notethat đ might either output a value other than 1 or not halt at all oninputs đ„ â đż(đ). )1. Prove that for every Turing Machine đ , if we define đčđ ⶠ0, 1â â0, 1 to be the function such that đčđ(đ„) = 1 iff đ„ â đż(đ) then đčđ
is recursively enumerable as defined in Exercise 9.10.
2. Use Theorem 9.15 to prove that for every đș ⶠ0, 1â â 0, 1, if (a)đș is neither the constant zero nor the constant one function, and(b) for every đ, đ âČ such that đż(đ) = đż(đ âČ), đș(đ) = đș(đ âČ),then đș is uncomputable. See footnote for hint.8
Exercise 9.12 â Riceâs Theorem for general Turing-equivalent models (optional).
Let â± be the set of all partial functions from 0, 1â to 0, 1 and âł â¶0, 1â â â± be a Turing-equivalent model as defined in Definition 8.5.We define a function đč ⶠ0, 1â â 0, 1 to be âł-semantic if thereexists some đą ⶠⱠâ 0, 1 such that đč(đ) = đą(âł(đ)) for everyđ â 0, 1â.
Prove that for every âł-semantic đč ⶠ0, 1â â 0, 1 that is neitherthe constant one nor the constant zero function, đč is uncomputable.
universality and uncomputability 355
9 You will not need to use very specific propertiesof the TOWER function in this exercise. For exam-ple, NBB(đ) also grows faster than the Ackermanfunction.
Exercise 9.13 â Busy Beaver. In this question we define the NAND-TM variant of the busy beaver function (see Aaronsonâs 1999 essay,2017 blog post and 2020 survey [Aar20]; see also Taoâs highly recom-mended presentation on how civilizationâs scientific progress can bemeasured by the quantities we can grasp).
1. Let đđ”đ” ⶠ0, 1â â â be defined as follows. For every string đ â0, 1â, if đ represents a NAND-TM program such that when đ isexecuted on the input 0 then it halts within đ steps then đđ”đ”(đ ) =đ . Otherwise (if đ does not represent a NAND-TM program, or itis a program that does not halt on 0), đđ”đ”(đ ) = 0. Prove that đđ”đ”is uncomputable.
2. Let TOWER(đ) denote the number 2â â â 2âđ times(that is, a âtower of pow-
ers of twoâ of height đ). To get a sense of how fast this functiongrows, TOWER(1) = 2, TOWER(2) = 22 = 4, TOWER(3) = 222 =16, TOWER(4) = 216 = 65536 and TOWER(5) = 265536 whichis about 1020000. TOWER(6) is already a number that is too big towrite even in scientific notation. Define NBB ⶠâ â â (for âNAND-TM Busy Beaverâ) to be the function NBB(đ) = maxđâ0,1đ đđ”đ”(đ )where đđ”đ” is as defined in Question 6.1. Prove that NBB growsfaster than TOWER, in the sense that TOWER(đ) = đ(NBB(đ)). Seefootnote for hint9
9.7 BIBLIOGRAPHICAL NOTES
The cartoon of the Halting problem in Fig. 9.1 and taken from CharlesCooperâs website.
Section 7.2 in [MM11] gives a highly recommended overview ofuncomputability. Gödel, Escher, Bach [Hof99] is a classic popularscience book that touches on uncomputability, and unprovability, andspecifically Gödelâs Theorem that we will see in Chapter 11. See alsothe recent book by Holt [Hol18].
The history of the definition of a function is intertwined with thedevelopment of mathematics as a field. For many years, a functionwas identified (as per Eulerâs quote above) with the means to calcu-late the output from the input. In the 1800âs, with the invention ofthe Fourier series and with the systematic study of continuity anddifferentiability, people have started looking at more general kinds offunctions, but the modern definition of a function as an arbitrary map-ping was not yet universally accepted. For example, in 1899 Poincarewrote âwe have seen a mass of bizarre functions which appear to be forcedto resemble as little as possible honest functions which serve some purpose.
356 introduction to theoretical computer science
⊠they are invented on purpose to show that our ancestorâs reasoning was atfault, and we shall never get anything more than that out of themâ. Some ofthis fascinating history is discussed in [Gra83; Kle91; LĂŒt02; Gra05].
The existence of a universal Turing machine, and the uncomputabil-ity of HALT was first shown by Turing in his seminal paper [Tur37],though closely related results were shown by Church a year before.These works built on Gödelâs 1931 incompleteness theorem that we willdiscuss in Chapter 11.
Some universal Turing Machines with a small alphabet and numberof states are given in [Rog96], including a single-tape universal Turingmachine with the binary alphabet and with less than 25 states; seealso the survey [WN09]. Adam Yedidia has written software to helpin producing Turing machines with a small number of states. This isrelated to the recreational pastime of âCode Golfingâ which is aboutsolving a certain computational task using the as short as possibleprogram. Finding âhighly complexâ small Turing machine is alsorelated to the âBusy Beaverâ problem, see Exercise 9.13 and the survey[Aar20].
The diagonalization argument used to prove uncomputability of đč âis derived from Cantorâs argument for the uncountability of the realsdiscussed in Chapter 2.
Christopher Strachey was an English computer scientist and theinventor of the CPL programming language. He was also an earlyartificial intelligence visionary, programming a computer to playCheckers and even write love letters in the early 1950âs, see this NewYorker article and this website.
Riceâs Theorem was proven in [Ric53]. It is typically stated in aform somewhat different than what we used, see Exercise 9.11.
We do not discuss in the chapter the concept of recursively enumer-able languages, but it is covered briefly in Exercise 9.10. As usual, weuse function, as opposed to language, notation.
The cartoon of the Halting problem in Fig. 9.1 is copyright 2019Charles F. Cooper.
10Restricted computational models
âHappy families are all alike; every unhappy family is unhappy in its ownwayâ, Leo Tolstoy (opening of the book âAnna Kareninaâ).
We have seen that many models of computation are Turing equiva-lent, including Turing machines, NAND-TM/NAND-RAM programs,standard programming languages such as C/Python/Javascript, aswell as other models such as the đ calculus and even the game of life.The flip side of this is that for all these models, Riceâs theorem (The-orem 9.15) holds as well, which means that any semantic property ofprograms in such a model is uncomputable.
The uncomputability of halting and other semantic specificationproblems for Turing equivalent models motivates restricted com-putational models that are (a) powerful enough to capture a set offunctions useful for certain applications but (b) weak enough that wecan still solve semantic specification problems on them. In this chapterwe discuss several such examples.
Big Idea 14 We can use restricted computational models to bypasslimitations such as uncomputability of the Halting problem and RiceâsTheorem. Such models can compute only a restricted subclass offunctions, but allow to answer at least some semantic questions onprograms.
10.1 TURING COMPLETENESS AS A BUG
We have seen that seemingly simple computational models or sys-tems can turn out to be Turing complete. The following webpage listsseveral examples of formalisms that âaccidentallyâ turned out to Tur-ing complete, including supposedly limited languages such as the Cpreprocessor, CSS, (certain variants of) SQL, sendmail configuration,as well as games such as Minecraft, Super Mario, and the card game
Compiled on 8.26.2020 18:10
Learning Objectives:âą See that Turing completeness is not always a
good thing.âą Another example of an always-halting
formalism: context-free grammars and simplytyped đ calculus.
âą The pumping lemma for non context-freefunctions.
âą Examples of computable and uncomputablesemantic properties of regular expressions andcontext-free grammars.
358 introduction to theoretical computer science
Figure 10.1: Some restricted computational models.We have already seen two equivalent restrictedmodels of computation: regular expressions anddeterministic finite automata. We show a morepowerful model: context-free grammars. We alsopresent tools to demonstrate that some functions cannot be computed in these models.
âMagic: The Gatheringâ. Turing completeness is not always a goodthing, as it means that such formalisms can give rise to arbitrarilycomplex behavior. For example, the postscript format (a precursor ofPDF) is a Turing-complete programming language meant to describedocuments for printing. The expressive power of postscript can allowfor short descriptions of very complex images, but it also gave rise tosome nasty surprises, such as the attacks described in this page rang-ing from using infinite loops as a denial of service attack, to accessingthe printerâs file system.
Example 10.1 â The DAO Hack. An interesting recent example of thepitfalls of Turing-completeness arose in the context of the cryp-tocurrency Ethereum. The distinguishing feature of this currencyis the ability to design âsmart contractsâ using an expressive (andin particular Turing-complete) programming language. In ourcurrent âhuman operatedâ economy, Alice and Bob might sign acontract to agree that if condition X happens then they will jointlyinvest in Charlieâs company. Ethereum allows Alice and Bob tocreate a joint venture where Alice and Bob pool their funds to-gether into an account that will be governed by some program đthat decides under what conditions it disburses funds from it. Forexample, one could imagine a piece of code that interacts betweenAlice, Bob, and some program running on Bobâs car that allowsAlice to rent out Bobâs car without any human intervention oroverhead.
Specifically Ethereum uses the Turing-complete programminglanguage solidity which has a syntax similar to JavaScript. Theflagship of Ethereum was an experiment known as The âDecen-tralized Autonomous Organizationâ or The DAO. The idea wasto create a smart contract that would create an autonomously rundecentralized venture capital fund, without human managers,where shareholders could decide on investment opportunities. The
restricted computational models 359
DAO was at the time the biggest crowdfunding success in history.At its height the DAO was worth 150 million dollars, which wasmore than ten percent of the total Ethereum market. Investing inthe DAO (or entering any other âsmart contractâ) amounts to pro-viding your funds to be run by a computer program. i.e., âcodeis lawâ, or to use the words the DAO described itself: âThe DAOis borne from immutable, unstoppable, and irrefutable computer codeâ.Unfortunately, it turns out that (as we saw in Chapter 9) under-standing the behavior of computer programs is quite a hard thingto do. A hacker (or perhaps, some would say, a savvy investor)was able to fashion an input that caused the DAO code to enterinto an infinite recursive loop in which it continuously transferredfunds into the hackerâs account, thereby cleaning out about 60 mil-lion dollars out of the DAO. While this transaction was âlegalâ inthe sense that it complied with the code of the smart contract, itwas obviously not what the humans who wrote this code had inmind. The Ethereum community struggled with the response tothis attack. Some tried the âRobin Hoodâ approach of using thesame loophole to drain the DAO funds into a secure account, butit only had limited success. Eventually, the Ethereum communitydecided that the code can be mutable, stoppable, and refutable.Specifically, the Ethereum maintainers and miners agreed on aâhard forkâ (also known as a âbailoutâ) to revert history to be-fore the hackerâs transaction occurred. Some community membersstrongly opposed this decision, and so an alternative currencycalled Ethereum Classic was created that preserved the originalhistory.
10.2 CONTEXT FREE GRAMMARS
If you have ever written a program, youâve experienced a syntax error.You probably also had the experience of your program entering intoan infinite loop. What is less likely is that the compiler or interpreterentered an infinite loop while trying to figure out if your program hasa syntax error.
When a person designs a programming language, they need todetermine its syntax. That is, the designer decides which strings corre-sponds to valid programs, and which ones do not (i.e., which stringscontain a syntax error). To ensure that a compiler or interpreter al-ways halts when checking for syntax errors, language designers typi-cally do not use a general Turing-complete mechanism to express theirsyntax. Rather they use a restricted computational model. One of themost popular choices for such models is context free grammars.
360 introduction to theoretical computer science
To explain context free grammars, let us begin with a canonical ex-ample. Consider the function ARITH ⶠΣâ â 0, 1 that takes as inputa string đ„ over the alphabet ÎŁ = (, ), +, â, Ă, Ă·, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9and returns 1 if and only if the string đ„ represents a valid arithmeticexpression. Intuitively, we build expressions by applying an opera-tion such as +,â,Ă or Ă· to smaller expressions, or enclosing them inparenthesis, where the âbase caseâ corresponds to expressions that aresimply numbers. More precisely, we can make the following defini-tions:
âą A digit is one of the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.âą A number is a sequence of digits. (For simplicity we drop the condi-
tion that the sequence does not have a leading zero, though it is nothard to encode it in a context-free grammar as well.)
âą An operation is one of +, â, Ă, Ă·âą An expression has either the form ânumberâ, the form âsub-
expression1 operation sub-expression2â, or the form â(sub-expression1)â, where âsub-expression1â and âsub-expression2â arethemselves expressions. (Note that this is a recursive definition.)
A context free grammar (CFG) is a formal way of specifying suchconditions. A CFG consists of a set of rules that tell us how to generatestrings from smaller components. In the above example, one of therules is âif đđ„đ1 and đđ„đ2 are valid expressions, then đđ„đ1 Ă đđ„đ2 isalso a valid expressionâ; we can also write this rule using the short-hand đđ„đđđđ đ đđđ â đđ„đđđđ đ đđđ Ă đđ„đđđđ đ đđđ. As in the above ex-ample, the rules of a context-free grammar are often recursive: the ruleđđ„đđđđ đ đđđ â đđ„đđđđ đ đđđ Ă đđ„đđđđ đ đđđ defines valid expressions interms of itself. We now formally define context-free grammars:
Definition 10.2 â Context Free Grammar. Let ÎŁ be some finite set. Acontext free grammar (CFG) over ÎŁ is a triple (đ , đ , đ ) such that:
âą đ , known as the variables, is a set disjoint from ÎŁ.
âą đ â đ is known as the initial variable.
âą đ is a set of rules. Each rule is a pair (đŁ, đ§) with đŁ â đ andđ§ â (ÎŁ âȘ đ )â. We often write the rule (đŁ, đ§) as đŁ â đ§ and say thatthe string đ§ can be derived from the variable đŁ.
restricted computational models 361
Example 10.3 â Context free grammar for arithmetic expressions. Theexample above of well-formed arithmetic expressions can be cap-tured formally by the following context free grammar:
âą The alphabet ÎŁ is (, ), +, â, Ă, Ă·, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9âą The variables are đ = đđ„đđđđ đ đđđ , đđąđđđđ , đđđđđĄ , đđđđđđĄđđđ.âą The rules are the set đ containing the following 19 rules:
â The 4 rules đđđđđđĄđđđ â +, đđđđđđĄđđđ â â, đđđđđđĄđđđ â Ă,and đđđđđđĄđđđ â Ă·.
â The 10 rules đđđđđĄ â 0,âŠ, đđđđđĄ â 9.â The rule đđąđđđđ â đđđđđĄ.â The rule đđąđđđđ â đđđđđĄ đđąđđđđ.â The rule đđ„đđđđ đ đđđ â đđąđđđđ.â The rule đđ„đđđđ đ đđđ â đđ„đđđđ đ đđđ đđđđđđĄđđđ đđ„đđđđ đ đđđ.â The rule đđ„đđđđ đ đđđ â (đđ„đđđđ đ đđđ).
âą The starting variable is đđ„đđđđ đ đđđPeople use many different notations to write context free grammars.
One of the most common notations is the BackusâNaur form. In thisnotation we write a rule of the form đŁ â đ (where đŁ is a variable and đis a string) in the form <v> := a. If we have several rules of the formđŁ ⊠đ, đŁ ⊠đ, and đŁ ⊠đ then we can combine them as <v> := a|b|c.(In words we say that đŁ can derive either đ, đ, or đ.) For example, theBackus-Naur description for the context free grammar of Example 10.3is the following (using ASCII equivalents for operations):
operation := +|-|*|/digit := 0|1|2|3|4|5|6|7|8|9number := digit|digit numberexpression := number|expression operation
expression|(expression)Another example of a context free grammar is the âmatching paren-
thesisâ grammar, which can be represented in Backus-Naur as fol-lows:
match := ""|match match|(match)
A string over the alphabet (,) can be generated from this gram-mar (where match is the starting expression and "" corresponds to theempty string) if and only if it consists of a matching set of parenthesis.
362 introduction to theoretical computer science
1 As in the case of Definition 6.7 we can also uselanguage rather than function notation and say that alanguage đż â ÎŁâ is context free if the function đč suchthat đč(đ„) = 1 iff đ„ â đż is context free.
In contrast, by Lemma 6.19 there is no regular expression that matchesa string đ„ if and only if đ„ contains a valid sequence of matching paren-thesis.
10.2.1 Context-free grammars as a computational modelWe can think of a context-free grammar over the alphabet ÎŁ as defin-ing a function that maps every string đ„ in ÎŁâ to 1 or 0 depending onwhether đ„ can be generated by the rules of the grammars. We nowmake this definition formally.
Definition 10.4 â Deriving a string from a grammar. If đș = (đ , đ , đ ) is acontext-free grammar over ÎŁ, then for two strings đŒ, đœ â (ÎŁ âȘ đ )âwe say that đœ can be derived in one step from đŒ, denoted by đŒ âđș đœ,if we can obtain đœ from đŒ by applying one of the rules of đș. That is,we obtain đœ by replacing in đŒ one occurrence of the variable đŁ withthe string đ§, where đŁ â đ§ is a rule of đș.
We say that đœ can be derived from đŒ, denoted by đŒ ââđș đœ, if itcan be derived by some finite number đ of steps. That is, if thereare đŒ1, ⊠, đŒđâ1 â (ÎŁ âȘ đ )â, so that đŒ âđș đŒ1 âđș đŒ2 âđș ⯠âđșđŒđâ1 âđș đœ.
We say that đ„ â ÎŁâ is matched by đș = (đ , đ , đ ) if đ„ can be de-rived from the starting variable đ (i.e., if đ ââđș đ„). We define thefunction computed by (đ , đ , đ ) to be the map Ίđ ,đ ,đ ⶠΣâ â 0, 1such that Ίđ ,đ ,đ (đ„) = 1 iff đ„ is matched by (đ , đ , đ ). A functionđč ⶠΣâ â 0, 1 is context free if đč = Ίđ ,đ ,đ for some CFG (đ , đ , đ ).1
A priori it might not be clear that the map Ίđ ,đ ,đ is computable,but it turns out that this is the case.
Theorem 10.5 â Context-free grammars always halt. For every CFG(đ , đ , đ ) over 0, 1, the function Ίđ ,đ ,đ ⶠ0, 1â â 0, 1 iscomputable.
As usual we restrict attention to grammars over 0, 1 although theproof extends to any finite alphabet ÎŁ.
Proof. We only sketch the proof. We start with the observation we canconvert every CFG to an equivalent version of Chomsky normal form,where all rules either have the form đą â đŁđ€ for variables đą, đŁ, đ€ or theform đą â đ for a variable đą and symbol đ â ÎŁ, plus potentially therule đ â "" where đ is the starting variable.
The idea behind such a transformation is to simply add new vari-ables as needed, and so for example we can translate a rule such asđŁ â đąđđ€ into the three rules đŁ â đąđ, đ â đĄđ€ and đĄ â đ.
restricted computational models 363
Using the Chomsky Normal form we get a natural recursive algo-rithm for computing whether đ ââđș đ„ for a given grammar đș andstring đ„. We simply try all possible guesses for the first rule đ â đąđŁthat is used in such a derivation, and then all possible ways to par-tition đ„ as a concatenation đ„ = đ„âČđ„âł. If we guessed the rule and thepartition correctly, then this reduces our task to checking whetherđą ââđș đ„âČ and đŁ ââđș đ„âł, which (as it involves shorter strings) canbe done recursively. The base cases are when đ„ is empty or a singlesymbol, and can be easily handled.
RRemark 10.6 â Parse trees. While we focus on thetask of deciding whether a CFG matches a string, thealgorithm to compute Ίđ ,đ ,đ actually gives more in-formation than that. That is, on input a string đ„, ifΊđ ,đ ,đ (đ„) = 1 then the algorithm yields the sequenceof rules that one can apply from the starting vertex đ to obtain the final string đ„. We can think of these rulesas determining a tree with đ being the root vertex andthe sinks (or leaves) corresponding to the substringsof đ„ that are obtained by the rules that do not have avariable in their second element. This tree is knownas the parse tree of đ„, and often yields very usefulinformation about the structure of đ„.Often the first step in a compiler or interpreter for aprogramming language is a parser that transforms thesource into the parse tree (also known as the abstractsyntax tree). There are also tools that can automati-cally convert a description of a context-free grammarsinto a parser algorithm that computes the parse tree ofa given string. (Indeed, the above recursive algorithmcan be used to achieve this, but there are much moreefficient versions, especially for grammars that haveparticular forms, and programming language design-ers often try to ensure their languages have these moreefficient grammars.)
10.2.2 The power of context free grammarsContext free grammars can capture every regular expression:
Theorem 10.7 â Context free grammars and regular expressions. Let đ be aregular expression over 0, 1, then there is a CFG (đ , đ , đ ) over0, 1 such that Ίđ ,đ ,đ = Ίđ.
Proof. We prove the theorem by induction on the length of đ. If đ isan expression of one bit length, then đ = 0 or đ = 1, in which casewe leave it to the reader to verify that there is a (trivial) CFG that
364 introduction to theoretical computer science
computes it. Otherwise, we fall into one of the following case: case1: đ = đâČđâł, case 2: đ = đâČ|đâł or case 3: đ = (đâČ)â where in all casesđâČ, đâł are shorter regular expressions. By the induction hypothesishave grammars (đ âČ, đ âČ, đ âČ) and (đ âł, đ âł, đ âł) that compute ΊđâČ and Ίđâłrespectively. By renaming of variables, we can also assume withoutloss of generality that đ âČ and đ âł are disjoint.
In case 1, we can define the new grammar as follows: we add a newstarting variable đ â đ âȘ đ âČ and the rule đ ⊠đ âČđ âł. In case 2, we candefine the new grammar as follows: we add a new starting variableđ â đ âȘ đ âČ and the rules đ ⊠đ âČ and đ ⊠đ âł. Case 3 will be theonly one that uses recursion. As before we add a new starting variableđ â đ âȘ đ âČ, but now add the rules đ ⊠"" (i.e., the empty string) andalso add, for every rule of the form (đ âČ, đŒ) â đ âČ, the rule đ ⊠đ đŒ to đ .
We leave it to the reader as (a very good!) exercise to verify that inall three cases the grammars we produce capture the same function asthe original expression.
It turns out that CFGâs are strictly more powerful than regularexpressions. In particular, as weâve seen, the âmatching parenthesisâfunction MATCHPAREN can be computed by a context free grammar,whereas, as shown in Lemma 6.19, it cannot be computed by regularexpressions. Here is another example:Solved Exercise 10.1 â Context free grammar for palindromes. Let PAL â¶0, 1, ; â â 0, 1 be the function defined in Solved Exercise 6.4 wherePAL(đ€) = 1 iff đ€ has the form đą; đąđ . Then PAL can be computed by acontext-free grammar
Solution:
A simple grammar computing PAL can be described usingBackusâNaur notation:
start := ; | 0 start 0 | 1 start 1
One can prove by induction that this grammar generates exactlythe strings đ€ such that PAL(đ€) = 1.
A more interesting example is computing the strings of the formđą; đŁ that are not palindromes:Solved Exercise 10.2 â Non palindromes. Prove that there is a context freegrammar that computes NPAL ⶠ0, 1, ; â â 0, 1 where NPAL(đ€) =1 if đ€ = đą; đŁ but đŁ â đąđ .
restricted computational models 365
Solution:
Using BackusâNaur notation we can describe such a grammar asfollows
palindrome := ; | 0 palindrome 0 | 1 palindrome 1different := 0 palindrome 1 | 1 palindrome 0start := different | 0 start | 1 start | start
0 | start 1In words, this means that we can characterize a string đ€ such
that NPAL(đ€) = 1 as having the following formđ€ = đŒđđą; đąđ đâČđœ (10.1)where đŒ, đœ, đą are arbitrary strings and đ â đâČ. Hence we can
generate such a string by first generating a palindrome đą; đąđ (palindrome variable), then adding 0 on either the left or right and1 on the opposite side to get something that is not a palindrome(different variable), and then we can add arbitrary number of 0âsand 1âs on either end (the start variable).
10.2.3 Limitations of context-free grammars (optional)Even though context-free grammars are more powerful than regularexpressions, there are some simple languages that are not capturedby context free grammars. One tool to show this is the context-freegrammar analog of the âpumping lemmaâ (Theorem 6.20):
Theorem 10.8 â Context-free pumping lemma. Let (đ , đ , đ ) be a CFGover ÎŁ, then there is some numbers đ0, đ1 â â such that for everyđ„ â ÎŁâ with |đ„| > đ0, if Ίđ ,đ ,đ (đ„) = 1 then đ„ = đđđđđ such that|đ| + |đ| + |đ| †đ1, |đ| + |đ| â„ 1, and Ίđ ,đ ,đ (đđđđđđđ) = 1 for everyđ â â.
PThe context-free pumping lemma is even more cum-bersome to state than its regular analog, but you canremember it as saying the following: âIf a long enoughstring is matched by a grammar, there must be a variablethat is repeated in the derivation.â
Proof of Theorem 10.8. We only sketch the proof. The idea is that ifthe total number of symbols in the rules of the grammar is đ0, thenthe only way to get |đ„| > đ0 with Ίđ ,đ ,đ (đ„) = 1 is to use recursion.That is, there must be some variable đŁ â đ such that we are able to
366 introduction to theoretical computer science
derive from đŁ the value đđŁđ for some strings đ, đ â ÎŁâ, and then furtheron derive from đŁ some string đ â ÎŁâ such that đđđ is a substring ofđ„ (in other words, đ„ = đđđđđ for some đ, đ â 0, 1â). If we takethe variable đŁ satisfying this requirement with a minimum numberof derivation steps, then we can ensure that |đđđ| is at most someconstant depending on đ0 and we can set đ1 to be that constant (đ1 =10 â |đ | â đ0 will do, since we will not need more than |đ | applicationsof rules, and each such application can grow the string by at most đ0symbols).
Thus by the definition of the grammar, we can repeat the derivationto replace the substring đđđ in đ„ with đđđđđ for every đ â â whileretaining the property that the output of Ίđ ,đ ,đ is still one. Since đđđis a substring of đ„, we can write đ„ = đđđđđ and are guaranteed thatđđđđđđđ is matched by the grammar for every đ.
Using Theorem 10.8 one can show that even the simple functionđč ⶠ0, 1â â 0, 1 defined as follows:đč(đ„) = â§âšâ©1 đ„ = đ€đ€ for some đ€ â 0, 1â0 otherwise(10.2)
is not context free. (In contrast, the function đș ⶠ0, 1â â 0, 1defined as đș(đ„) = 1 iff đ„ = đ€0đ€1 ⯠đ€đâ1đ€đâ1đ€đâ2 ⯠đ€0 for someđ€ â 0, 1â and đ = |đ€| is context free, can you see why?.)Solved Exercise 10.3 â Equality is not context-free. Let EQ ⶠ0, 1, ; â â0, 1 be the function such that EQ(đ„) = 1 if and only if đ„ = đą; đą forsome đą â 0, 1â. Then EQ is not context free.
Solution:
We use the context-free pumping lemma. Suppose towards thesake of contradiction that there is a grammar đș that computes EQ,and let đ0 be the constant obtained from Theorem 10.8.
Consider the string đ„ = 1đ00đ0 ; 1đ00đ0 , and write it as đ„ = đđđđđas per Theorem 10.8, with |đđđ| †đ0 and with |đ| + |đ| â„ 1. By The-orem 10.8, it should hold that EQ(đđđ) = 1. However, by case anal-ysis this can be shown to be a contradiction.
Firstly, unless đ is on the left side of the ; separator and đ is onthe right side, dropping đ and đ will definitely make the two partsdifferent. But if it is the case that đ is on the left side and đ is on theright side, then by the condition that |đđđ| †đ0 we know that đ is astring of only zeros and đ is a string of only ones. If we drop đ andđ then since one of them is non empty, we get that there are either
restricted computational models 367
less zeroes on the left side than on the right side, or there are lessones on the right side than on the left side. In either case, we getthat EQ(đđđ) = 0, obtaining the desired contradiction.
10.3 SEMANTIC PROPERTIES OF CONTEXT FREE LANGUAGES
As in the case of regular expressions, the limitations of context freegrammars do provide some advantages. For example, emptiness ofcontext free grammars is decidable:
Theorem 10.9 â Emptiness for CFGâs is decidable. There is an algorithmthat on input a context-free grammar đș, outputs 1 if and only if Ίđșis the constant zero function.
Proof Idea:
The proof is easier to see if we transform the grammar to ChomskyNormal Form as in Theorem 10.5. Given a grammar đș, we can recur-sively define a non-terminal variable đŁ to be non empty if there is eithera rule of the form đŁ â đ, or there is a rule of the form đŁ â đąđ€ whereboth đą and đ€ are non empty. Then the grammar is non empty if andonly if the starting variable đ is non-empty.âProof of Theorem 10.9. We assume that the grammar đș in ChomskyNormal Form as in Theorem 10.5. We consider the following proce-dure for marking variables as ânon emptyâ:
1. We start by marking all variables đŁ that are involved in a rule of theform đŁ â đ as non empty.
2. We then continue to mark đŁ as non empty if it is involved in a ruleof the form đŁ â đąđ€ where đą, đ€ have been marked before.
We continue this way until we cannot mark any more variables. Wethen declare that the grammar is empty if and only if đ has not beenmarked. To see why this is a valid algorithm, note that if a variable đŁhas been marked as ânon emptyâ then there is some string đŒ â ÎŁâ thatcan be derived from đŁ. On the other hand, if đŁ has not been marked,then every sequence of derivations from đŁ will always have a variablethat has not been replaced by alphabet symbols. Hence in particularΊđș is the all zero function if and only if the starting variable đ is notmarked ânon emptyâ.
368 introduction to theoretical computer science
10.3.1 Uncomputability of context-free grammar equivalence (optional)By analogy to regular expressions, one might have hoped to get analgorithm for deciding whether two given context free grammarsare equivalent. Alas, no such luck. It turns out that the equivalenceproblem for context free grammars is uncomputable. This is a directcorollary of the following theorem:
Theorem 10.10 â Fullness of CFGâs is uncomputable. For every set ÎŁ, letCFGFULLÎŁ be the function that on input a context-free grammar đșover ÎŁ, outputs 1 if and only if đș computes the constant 1 function.Then there is some finite ÎŁ such that CFGFULLÎŁ is uncomputable.
Theorem 10.10 immediately implies that equivalence for context-free grammars is uncomputable, since computing âfullnessâ of agrammar đș over some alphabet ÎŁ = đ0, ⊠, đđâ1 corresponds tochecking whether đș is equivalent to the grammar đ â ""|đ đ0| ⯠|đ đđâ1.Note that Theorem 10.10 and Theorem 10.9 together imply thatcontext-free grammars, unlike regular expressions, are not closedunder complement. (Can you see why?) Since we can encode everyelement of ÎŁ using âlog |ÎŁ|â bits (and this finite encoding can be easilycarried out within a grammar) Theorem 10.10 implies that fullness isalso uncomputable for grammars over the binary alphabet.
Proof Idea:
We prove the theorem by reducing from the Halting problem. Todo that we use the notion of configurations of NAND-TM programs, asdefined in Definition 8.8. Recall that a configuration of a program đ is abinary string đ that encodes all the information about the program inthe current iteration.
We define ÎŁ to be 0, 1 plus some separator characters and defineINVALIDđ ⶠΣâ â 0, 1 to be the function that maps every string đż âÎŁâ to 1 if and only if đż does not encode a sequence of configurationsthat correspond to a valid halting history of the computation of đ onthe empty input.
The heart of the proof is to show that INVALIDđ is context-free.Once we do that, we see that đ halts on the empty input if and only ifINVALIDđ (đż) = 1 for every đż. To show that, we will encode the listin a special way that makes it amenable to deciding via a context-freegrammar. Specifically we will reverse all the odd-numbered strings.âProof of Theorem 10.10. We only sketch the proof. We will show that ifwe can compute CFGFULL then we can solve HALTONZERO, whichhas been proven uncomputable in Theorem 9.9. Let đ be an input
restricted computational models 369
Turing machine for HALTONZERO. We will use the notion of configu-rations of a Turing machine, as defined in Definition 8.8.
Recall that a configuration of Turing machine đ and input đ„ cap-tures the full state of đ at some point of the computation. The partic-ular details of configurations are not so important, but what you needto remember is that:
âą A configuration can be encoded by a binary string đ â 0, 1â.âą The initial configuration of đ on the input 0 is some fixed string.
âą A halting configuration will have the value a certain state (which canbe easily âread offâ from it) set to 1.
âą If đ is a configuration at some step đ of the computation, we denoteby NEXTđ(đ) as the configuration at the next step. NEXTđ(đ) isa string that agrees with đ on all but a constant number of coor-dinates (those encoding the position corresponding to the headposition and the two adjacent ones). On those coordinates, thevalue of NEXTđ(đ) can be computed by some finite function.
We will let the alphabet ÎŁ = 0, 1 âȘ â, #. A computation his-tory of đ on the input 0 is a string đż â ÎŁ that corresponds to a listâđ0#đ1âđ2#đ3 ⯠đđĄâ2âđđĄâ1# (i.e., â comes before an even numberedblock, and # comes before an odd numbered one) such that if đ iseven then đđ is the string encoding the configuration of đ on input 0at the beginning of its đ-th iteration, and if đ is odd then it is the sameexcept the string is reversed. (That is, for odd đ, đđđŁ(đđ) encodes theconfiguration of đ on input 0 at the beginning of its đ-th iteration.)Reversing the odd-numbered blocks is a technical trick to ensure thatthe function INVALIDđ we define below is context free.
We now define INVALIDđ ⶠΣâ â 0, 1 as follows:
INVALIDđ(đż) = â§âšâ©0 đż is a valid computation history of đ on 01 otherwise(10.3)
We will show the following claim:CLAIM: INVALIDđ is context-free.The claim implies the theorem. Since đ halts on 0 if and only if
there exists a valid computation history, INVALIDđ is the constantone function if and only if đ does not halt on 0. In particular, thisallows us to reduce determining whether đ halts on 0 to determiningwhether the grammar đșđ corresponding to INVALIDđ is full.
We now turn to the proof of the claim. We will not show all thedetails, but the main point INVALIDđ(đż) = 1 if at least one of thefollowing three conditions hold:
370 introduction to theoretical computer science
1. đż is not of the right format, i.e. not of the form âšbinary-stringâ©#âšbinary-stringâ©ââšbinary-stringâ©# âŻ.
2. đż contains a substring of the form âđ#đâČâ such thatđâČ â đđđŁ(NEXTđ (đ))3. đż contains a substring of the form #đâđâČ# such thatđâČ â NEXTđ (đđđŁ(đ))
Since context-free functions are closed under the OR operation, theclaim will follow if we show that we can verify conditions 1, 2 and 3via a context-free grammar.
For condition 1 this is very simple: checking that đż is of the correctformat can be done using a regular expression. Since regular expres-sions are closed under negation, this means that checking that đż is notof this format can also be done by a regular expression and hence by acontext-free grammar.
For conditions 2 and 3, this follows via very similar reasoning tothat showing that the function đč such that đč(đą#đŁ) = 1 iff đą â đđđŁ(đŁ)is context-free, see Solved Exercise 10.2. After all, the NEXTđ functiononly modifies its input in a constant number of places. We leave fillingout the details as an exercise to the reader. Since INVALIDđ(đż) = 1if and only if đż satisfies one of the conditions 1., 2. or 3., and all threeconditions can be tested for via a context-free grammar, this completesthe proof of the claim and hence the theorem.
10.4 SUMMARY OF SEMANTIC PROPERTIES FOR REGULAR EX-PRESSIONS AND CONTEXT-FREE GRAMMARS
To summarize, we can often trade expressiveness of the model foramenability to analysis. If we consider computational models that arenot Turing complete, then we are sometimes able to bypass Riceâs The-orem and answer certain semantic questions about programs in suchmodels. Here is a summary of some of what is known about semanticquestions for the different models we have seen.
Table 10.1: Computability of semantic properties
Model Halting Emptiness EquivalenceRegular expressions Computable Computable ComputableContext free grammars Computable Computable UncomputableTuring-complete models UncomputableUncomputable Uncomputable
restricted computational models 371
Chapter Recap
âą The uncomputability of the Halting problem forgeneral models motivates the definition of re-stricted computational models.
âą In some restricted models we can answer semanticquestions such as: does a given program terminate,or do two programs compute the same function?
âą Regular expressions are a restricted model of com-putation that is often useful to capture tasks ofstring matching. We can test efficiently whetheran expression matches a string, as well as answerquestions such as Halting and Equivalence.
âą Context free grammars is a stronger, yet still not Tur-ing complete, model of computation. The haltingproblem for context free grammars is computable,but equivalence is not computable.
10.5 EXERCISES
Exercise 10.1 â Closure properties of context-free functions. Suppose thatđč, đș ⶠ0, 1â â 0, 1 are context free. For each one of the followingdefinitions of the function đ» , either prove that đ» is always contextfree or give a counterexample for regular đč, đș that would make đ» notcontext free.
1. đ»(đ„) = đč(đ„) âš đș(đ„).2. đ»(đ„) = đč(đ„) ⧠đș(đ„)3. đ»(đ„) = NAND(đč(đ„), đș(đ„)).4. đ»(đ„) = đč(đ„đ ) where đ„đ is the reverse of đ„: đ„đ = đ„đâ1đ„đâ2 ⯠đ„đ forđ = |đ„|.5. đ»(đ„) = â§âšâ©1 đ„ = đąđŁ s.t. đč(đą) = đș(đŁ) = 10 otherwise
6. đ»(đ„) = â§âšâ©1 đ„ = đąđą s.t. đč(đą) = đș(đą) = 10 otherwise
7. đ»(đ„) = â§âšâ©1 đ„ = đąđąđ s.t. đč(đą) = đș(đą) = 10 otherwise
Exercise 10.2 Prove that the function đč ⶠ0, 1â â 0, 1 such thatđč(đ„) = 1 if and only if |đ„| is a power of two is not context free.
372 introduction to theoretical computer science
2 Try to see if you can âembedâ in some way a func-tion that looks similar to MATCHPAREN in SYN, soyou can use a similar proof. Of course for a functionto be non-regular, it does not need to utilize literalparentheses symbols.
Exercise 10.3 â Syntax for programming languages. Consider the followingsyntax of a âprogramming languageâ whose source can be writtenusing the ASCII character set:
âą Variables are obtained by a sequence of letters, numbers and under-scores, but canât start with a number.
âą A statement has either the form foo = bar; where foo and bar arevariables, or the form IF (foo) BEGIN ... ENDwhere ... is listof one or more statements, potentially separated by newlines.
A program in our language is simply a sequence of statements (pos-sibly separated by newlines or spaces).
1. Let VAR ⶠ0, 1â â 0, 1 be the function that given a stringđ„ â 0, 1â, outputs 1 if and only if đ„ corresponds to an ASCIIencoding of a valid variable identifier. Prove that VAR is regular.
2. Let SYN ⶠ0, 1â â 0, 1 be the function that given a stringđ â 0, 1â, outputs 1 if and only if đ is an ASCII encoding of a validprogram in our language. Prove that SYN is context free. (You donot have to specify the full formal grammar for SYN, but you needto show that such a grammar exists.)
3. Prove that SYN is not regular. See footnote for hint2
10.6 BIBLIOGRAPHICAL NOTES
As in the case of regular expressions, there are many resources avail-able that cover context-free grammar in great detail. Chapter 2 of[Sip97] contains many examples of context-free grammars and theirproperties. There are also websites such as Grammophone where youcan input grammars, and see what strings they generate, as well assome of the properties that they satisfy.
The adjective âcontext freeâ is used for CFGâs because a rule ofthe form đŁ ⊠đ means that we can always replace đŁ with the stringđ, no matter what is the context in which đŁ appears. More generally,we might want to consider cases where the replacement rules dependon the context. This gives rise to the notion of general (aka âType 0â)grammars that allow rules of the form đ â đ where both đ and đ arestrings over (đ âȘ ÎŁ)â. The idea is that if, for example, we wanted toenforce the condition that we only apply some rule such as đŁ ⊠0đ€1when đŁ is surrounded by three zeroes on both sides, then we could doso by adding a rule of the form 000đŁ000 ⊠0000đ€1000 (and of coursewe can add much more general conditions). Alas, this generality
restricted computational models 373
comes at a cost - general grammars are Turing complete and hencetheir halting problem is uncomputable. That is, there is no algorithmđŽ that can determine for every general grammar đș and a string đ„,whether or not the grammar đș generates đ„.
The Chomsky Hierarchy is a hierarchy of grammars from the leastrestrictive (most powerful) Type 0 grammars, which correspond torecursively enumerable languages (see Exercise 9.10) to the most re-strictive Type 3 grammars, which correspond to regular languages.Context-free languages correspond to Type 2 grammars. Type 1 gram-mars are context sensitive grammars. These are more powerful thancontext-free grammars but still less powerful than Turing machines.In particular functions/languages corresponding to context-sensitivegrammars are always computable, and in fact can be computed by alinear bounded automatons which are non-deterministic algorithmsthat take đ(đ) space. For this reason, the class of functions/languagescorresponding to context-sensitive grammars is also known as thecomplexity class NSPACEđ(đ); we discuss space-bounded com-plexity in Chapter 17). While Riceâs Theorem implies that we cannotcompute any non-trivial semantic property of Type 0 grammars, thesituation is more complex for other types of grammars: some seman-tic properties can be determined and some cannot, depending on thegrammarâs place in the hierarchy.
11Is every theorem provable?
âTake any definite unsolved problem, such as ⊠the existence of an infinitenumber of prime numbers of the form 2đ + 1. However unapproachable theseproblems may seem to us and however helpless we stand before them, we have,nevertheless, the firm conviction that their solution must follow by a finitenumber of purely logical processesâŠâââŠThis conviction of the solvability of every mathematical problem is a pow-erful incentive to the worker. We hear within us the perpetual call: There is theproblem. Seek its solution. You can find it by pure reason, for in mathematicsthere is no ignorabimus.â, David Hilbert, 1900.
âThe meaning of a statement is its method of verification.â, Moritz Schlick,1938 (aka âThe verification principleâ of logical positivism)
The problems shown uncomputable in Chapter 9, while naturaland important, still intimately involved NAND-TM programs or othercomputing mechanisms in their definitions. One could perhaps hopethat as long as we steer clear of functions whose inputs are themselvesprograms, we can avoid the âcurse of uncomputabilityâ. Alas, we haveno such luck.
In this chapter we will see an example of a natural and seeminglyâcomputation freeâ problem that nevertheless turns out to be uncom-putable: solving Diophantine equations. As a corollary, we will seeone of the most striking results of 20th century mathematics: GödelâsIncompleteness Theorem, which showed that there are some mathemat-ical statements (in fact, in number theory) that are inherently unprov-able. We will actually start with the latter result, and then show theformer.
11.1 HILBERTâS PROGRAM AND GĂDELâS INCOMPLETENESSTHEOREM
âAnd what are these âŠvanishing increments? They are neither finite quanti-ties, nor quantities infinitely small, nor yet nothing. May we not call them theghosts of departed quantities?â, George Berkeley, Bishop of Cloyne, 1734.
Compiled on 8.26.2020 18:10
Learning Objectives:âą More examples of uncomputable functions
that are not as tied to computation.âą Gödelâs incompleteness theorem - a result
that shook the world of mathematics in theearly 20th century.
376 introduction to theoretical computer science
Figure 11.1: Outline of the results of this chapter. Oneversion of Gödelâs Incompleteness Theorem is animmediate consequence of the uncomputability of theHalting problem. To obtain the theorem as originallystated (for statements about the integers) we firstprove that the QMS problem of determining truthof quantified statements involving both integers andstrings is uncomputable. We do so using the notion ofTuring Machine configurations but there are alternativeapproaches to do so as well, see Remark 11.14.
The 1700âs and 1800âs were a time of great discoveries in mathe-matics but also of several crises. The discovery of calculus by Newtonand Leibnitz in the late 1600âs ushered a golden age of problem solv-ing. Many longstanding challenges succumbed to the new tools thatwere discovered, and mathematicians got ever better at doing sometruly impressive calculations. However, the rigorous foundationsbehind these calculations left much to be desired. Mathematiciansmanipulated infinitesimal quantities and infinite series cavalierly, andwhile most of the time they ended up with the correct results, therewere a few strange examples (such as trying to calculate the valueof the infinite series 1 â 1 + 1 â 1 + 1 + âŠ) which seemed to giveout different answers depending on the method of calculation. Thisled to a growing sense of unease in the foundations of the subjectwhich was addressed in the works of mathematicians such as Cauchy,Weierstrass, and Riemann, who eventually placed analysis on firmerfoundations, giving rise to the đâs and đżâs that students taking honorscalculus grapple with to this day.
In the beginning of the 20th century, there was an effort to replicatethis effort, in greater rigor, to all parts of mathematics. The hope wasto show that all the true results of mathematics can be obtained bystarting with a number of axioms, and deriving theorems from themusing logical rules of inference. This effort was known as the Hilbertprogram, named after the influential mathematician David Hilbert.
Alas, it turns out the results weâve seen dealt a devastating blow tothis program, as was shown by Kurt Gödel in 1931:
Theorem 11.1 â Gödelâs Incompleteness Theorem: informal version. Forevery sound proof system đ for sufficiently rich mathematicalstatements, there is a mathematical statement that is true but is notprovable in đ .
is every theorem provable? 377
1 This happens to be a false statement.
2 It is unknown whether this statement is true or false.
11.1.1 Defining âProof SystemsâBefore proving Theorem 11.1, we need to define âproof systemsâ andeven formally define the notion of a âmathematical statementâ. Ingeometry and other areas of mathematics, proof systems are oftendefined by starting with some basic assumptions or axioms and thenderiving more statements by using inference rules such as the famousModus Ponens, but what axioms shall we use? What rules? We willuse an extremely general notion of proof systems, not even restrictingourselves to ones that have the form of axioms and inference.
Mathematical statements. At the highest level, a mathematical statementis simply a piece of text, which we can think of as a string đ„ â 0, 1â.Mathematical statements contain assertions whose truth does notdepend on any empirical fact, but rather only on properties of abstractobjects. For example, the following is a mathematical statement:1
âThe number 2,696,635,869,504,783,333,238,805,675,613, 588,278,597,832,162,617,892,474,670,798,113is primeâ.
Mathematical statements do not have to involve numbers. Theycan assert properties of any other mathematical object including sets,strings, functions, graphs and yes, even programs. Thus, another exam-ple of a mathematical statement is the following:2
The following Python function halts on every positive integer n
def f(n):if n==1: return 1return f(3*n+1) if n % 2 else f(n//2)
Proof systems. A proof for a statement đ„ â 0, 1â is another piece oftext đ€ â 0, 1â that certifies the truth of the statement asserted in đ„.The conditions for a valid proof system are:
1. (Effectiveness) Given a statement đ„ and a proof đ€, there is an algo-rithm to verify whether or not đ€ is a valid proof for đ„. (For exam-ple, by going line by line and checking that each line follows fromthe preceding ones using one of the allowed inference rules.)
2. (Soundness) If there is a valid proof đ€ for đ„ then đ„ is true.
These are quite minimal requirements for a proof system. Require-ment 2 (soundness) is the very definition of a proof system: youshouldnât be able to prove things that are not true. Requirement 1is also essential. If there is no set of rules (i.e., an algorithm) to checkthat a proof is valid then in what sense is it a proof system? We could
378 introduction to theoretical computer science
replace it with a system where the âproofâ for a statement đ„ is âtrustme: itâs trueâ.
We formally define proof systems as an algorithm đ wheređ (đ„, đ€) = 1 holds if the string đ€ is a valid proof for the statement đ„.Even if đ„ is true, the string đ€ does not have to be a valid proof for it(there are plenty of wrong proofs for true statements such as 4=2+2)but if đ€ is a valid proof for đ„ then đ„ must be true.
Definition 11.2 â Proof systems. Let đŻ â 0, 1â be some set (whichwe consider the âtrueâ statements). A proof system for đŻ is an algo-rithm đ that satisfies:
1. (Effectiveness) For every đ„, đ€ â 0, 1â, đ (đ„, đ€) halts with an out-put of either 0 or 1.
2. (Soundness) For every đ„ â đŻ and đ€ â 0, 1â, đ (đ„, đ€) = 0.A true statement đ„ â đŻ is unprovable (with respect to đ ) if for
every đ€ â 0, 1â, đ (đ„, đ€) = 0. We say that đ is complete if theredoes not exist a true statement đ„ that is unprovable with respect tođ .
Big Idea 15 A proof is just a string of text whose meaning is givenby a verification algorithm.
11.2 GĂDELâS INCOMPLETENESS THEOREM: COMPUTATIONALVARIANT
Our first formalization of Theorem 11.1 involves statements aboutTuring machines. We let â be the set of strings đ„ â 0, 1â that havethe form âTuring machine đ halts on the zero inputâ.
Theorem 11.3 â Gödelâs Incompleteness Theorem: computational variant.
There does not exist a complete proof system for â.
Proof Idea:
If we had such a complete and sound proof system then we couldsolve the HALTONZERO problem. On input a Turing machine đ ,we would search all purported proofs đ€ and halt as soon as we finda proof of either âđ halts on zeroâ or âđ does not halt on zeroâ. Ifthe system is sound and complete then we will eventually find such aproof, and it will provide us with the correct output.â
is every theorem provable? 379
Proof of Theorem 11.3. Assume for the sake of contradiction that therewas such a proof system đ . We will use đ to build an algorithm đŽthat computes HALTONZERO, hence contradicting Theorem 9.9. Ouralgorithm đŽ will will work as follows:
Algorithm 11.4 â Halting from proofs.
Input: Turing Machine đOutput: 1 đ if halts on theInput: 0; 0 otherwise.1: for đ = 1, 2, 3, ⊠do2: for đ€ â 0, 1đ do3: if đ (âđ halts on 0â, đ€) = 1 then4: return 15: end if6: if đ (âđ does not halt on 0â, đ€) = 1 then7: return 08: end if9: end for
10: end for
If đ halts on 0 then under our assumption there exists đ€ thatproves this fact, and so when Algorithm đŽ reaches đ = |đ€| we willeventually find this đ€ and output 1, unless we already halted be-fore. But we cannot halt before and output a wrong answer becauseit would contradict the soundness of the proof system. Similarly, thisshows that if đ does not halt on 0 then (since we assume there is aproof of this fact too) our algorithm đŽ will eventually halt and output0.
RRemark 11.5 â The Gödel statement (optional). One canextract from the proof of Theorem 11.3 a procedurethat for every proof system đ , yields a true statementđ„â that cannot be proven in đ . But Gödelâs proofgave a very explicit description of such a statement đ„âwhich is closely related to the âLiarâs paradoxâ. Thatis, Gödelâs statement đ„â was designed to be true if andonly if âđ€â0,1â đ (đ„, đ€) = 0. In other words, it satisfiedthe following property
đ„â is true â đ„â does not have a proof in đ (11.1)
One can see that if đ„â is true, then it does not have aproof, but it is false then (assuming the proof systemis sound) then it cannot have a proof, and hence đ„â
380 introduction to theoretical computer science
must be both true and unprovable. One might wonderhow is it possible to come up with an đ„â that satisfiesa condition such as (11.1) where the same string đ„âappears on both the righthand side and the lefthandside of the equation. The idea is that the proof of The-orem 11.3 yields a way to transform every statement đ„into a statement đč(đ„) that is true if and only if đ„ doesnot have a proof in đ . Thus đ„â needs to be a fixed pointof đč : a sentence such that đ„â = đč(đ„â). It turns out thatwe can always find such a fixed point of đč . Weâve al-ready seen this phenomenon in the đ calculus, wherethe đ combinator maps every đč into a fixed point đ đčof đč . This is very related to the idea of programs thatcan print their own code. Indeed, Scott Aaronson likesto describe Gödelâs statement as follows:
The following sentence repeated twice, the sec-ond time in quotes, is not provable in the formalsystem đ . âThe following sentence repeatedtwice, the second time in quotes, is not provablein the formal system đ .â
In the argument above we actually showed that đ„â istrue, under the assumption that đ is sound. Since đ„âis true and does not have a proof in đ , this means thatwe cannot carry the above argument in the system đ ,which means that đ cannot prove its own soundness(or even consistency: that there is no proof of both astatement and its negation). Using this idea, itâs nothard to get Gödelâs second incompleteness theorem,which says that every sufficiently rich đ cannot proveits own consistency. That is, if we formalize the state-ment đâ that is true if and only if đ is consistent (i.e.,đ cannot prove both a statement and the statementâsnegation), then đâ cannot be proven in đ .
11.3 QUANTIFIED INTEGER STATEMENTS
There is something âunsatisfyingâ about Theorem 11.3. Sure, it showsthere are statements that are unprovable, but they donât feel like ârealâstatements about math. After all, they talk about programs rather thannumbers, matrices, or derivatives, or whatever it is they teach in mathcourses. It turns out that we can get an analogous result for statementssuch as âthere are no positive integers đ„ and đŠ such that đ„2 â 2 =đŠ7â, or âthere are positive integers đ„, đŠ, đ§ such that đ„2 + đŠ6 = đ§11âthat only talk about natural numbers. It doesnât get much more ârealmathâ than this. Indeed, the 19th century mathematician LeopoldKronecker famously said that âGod made the integers, all else is thework of man.â (By the way, the status of the above two statements isunknown.)
is every theorem provable? 381
To make this more precise, let us define the notion of quantifiedinteger statements:
Definition 11.6 â Quantified integer statements. A quantified integer state-ment is a well-formed statement with no unbound variables involv-ing integers, variables, the operators >, <, Ă, +, â, =, the logicaloperations ÂŹ (NOT), ⧠(AND), and âš (OR), as well as quantifiersof the form âđ„ââ and âđŠââ where đ„, đŠ are variable names.
We often care deeply about determining the truth of quantifiedinteger statements. For example, the statement that Fermatâs LastTheorem is true for đ = 3 can be phrased as the quantified integerstatement
ÂŹâđâââđâââđââ(đ > 0)â§(đ > 0)â§(đ > 0)â§(đ Ă đ Ă đ + đ Ă đ Ă đ = đ Ă đ Ă đ) .(11.2)
The twin prime conjecture, that states that there is an infinite num-ber of numbers đ such that both đ and đ + 2 are primes can be phrasedas the quantified integer statementâđâââđââ(đ > đ) ⧠PRIME(đ) ⧠PRIME(đ + 2) (11.3)
where we replace an instance of PRIME(đ) with the statement (đ >1) ⧠âđâââđââ(đ = 1) âš (đ = đ) âš ÂŹ(đ Ă đ = đ).The claim (mentioned in Hilbertâs quote above) that are infinitely
many primes of the form đ = 2đ + 1 can be phrased as follows:
âđâââđââ(đ > đ) ⧠PRIME(đ)â§(âđââ(đ â 2 ⧠PRIME(đ)) â ÂŹDIVIDES(đ, đ â 1)) (11.4)
where DIVIDES(đ, đ) is the statement âđââđ Ă đ = đ. In English, thiscorresponds to the claim that for every đ there is some đ > đ such thatall of đ â 1âs prime factors are equal to 2.
RRemark 11.7 â Syntactic sugar for quantified integerstatements. To make our statements more readable,we often use syntactic sugar and so write đ„ â đŠ asshorthand for ÂŹ(đ„ = đŠ), and so on. Similarly, theâimplication operatorâ đ â đ is âsyntactic sugarâ orshorthand for ÂŹđ âš đ, and the âif and only if operatorâđ â is shorthand for (đ â đ) ⧠(đ â đ). We willalso allow ourselves the use of âmacrosâ: plugging inone quantified integer statement in another, as we didwith DIVIDES and PRIME above.
382 introduction to theoretical computer science
Much of number theory is concerned with determining the truthof quantified integer statements. Since our experience has been that,given enough time (which could sometimes be several centuries) hu-manity has managed to do so for the statements that it cared enoughabout, one could (as Hilbert did) hope that eventually we would beable to prove or disprove all such statements. Alas, this turns out to beimpossible:
Theorem 11.8 â Gödelâs Incompleteness Theorem for quantified integer state-
ments. Let đ ⶠ0, 1â â 0, 1 a computable purported verificationprocedure for quantified integer statements. Then either:
âą đ is not sound: There exists a false statement đ„ and a stringđ€ â 0, 1â such that đ (đ„, đ€) = 1.or
âą đ is not complete: There exists a true statement đ„ such that forevery đ€ â 0, 1â, đ (đ„, đ€) = 0.
Theorem 11.8 is a direct corollary of the following result, justas Theorem 11.3 was a direct corollary of the uncomputability ofHALTONZERO:
Theorem 11.9 â Uncomputability of quantified integer statements. LetQIS ⶠ0, 1â â 0, 1 be the function that given a (string rep-resentation of) a quantified integer statement outputs 1 if it is trueand 0 if it is false. Then QIS is uncomputable.
Since a quantified integer statement is simply a sequence of sym-bols, we can easily represent it as a string. For simplicity we will as-sume that every string represents some quantified integer statement,by mapping strings that do not correspond to such a statement to anarbitrary statement such as âđ„ââđ„ = 1.
PPlease stop here and make sure you understandwhy the uncomputability of QIS (i.e., Theorem 11.9)means that there is no sound and complete proofsystem for proving quantified integer statements (i.e.,Theorem 11.8). This follows in the same way thatTheorem 11.3 followed from the uncomputability ofHALTONZERO, but working out the details is a greatexercise (see Exercise 11.1)
In the rest of this chapter, we will show the proof of Theorem 11.8,following the outline illustrated in Fig. 11.1.
is every theorem provable? 383
Figure 11.2: Diophantine equations such as findinga positive integer solution to the equation đ(đ +đ)(đ + đ) + đ(đ + đ)(đ + đ) + đ(đ + đ)(đ + đ) =4(đ + đ)(đ + đ)(đ + đ) (depicted more compactlyand whimsically above) can be surprisingly difficult.There are many equations for which we do not knowif they have a solution, and there is no algorithm tosolve them in general. The smallest solution for thisequation has 80 digits! See this Quora post for moreinformation, including the credits for this image.3 This is a special case of whatâs known as âFermatâsLast Theoremâ which states that đđ + đđ = đđ has nosolution in integers for đ > 2. This was conjectured in1637 by Pierre de Fermat but only proven by AndrewWiles in 1991. The case đ = 11 (along with all otherso called âregular prime exponentsâ) was establishedby Kummer in 1850.
11.4 DIOPHANTINE EQUATIONS AND THE MRDP THEOREM
Many of the functions people wanted to compute over the years in-volved solving equations. These have a much longer history thanmechanical computers. The Babylonians already knew how to solvesome quadratic equations in 2000BC, and the formula for all quadrat-ics appears in the Bakhshali Manuscript that was composed in Indiaaround the 3rd century. During the Renaissance, Italian mathemati-cians discovered generalization of these formulas for cubic and quar-tic (degrees 3 and 4) equations. Many of the greatest minds of the17th and 18th century, including Euler, Lagrange, Leibniz and Gaussworked on the problem of finding such a formula for quintic equationsto no avail, until in the 19th century Ruffini, Abel and Galois showedthat no such formula exists, along the way giving birth to group theory.
However, the fact that there is no closed-form formula doesnot mean we can not solve such equations. People have beensolving higher degree equations numerically for ages. The Chinesemanuscript Jiuzhang Suanshu from the first century mentions suchapproaches. Solving polynomial equations is by no means restrictedonly to ancient history or to studentsâ homework. The gradientdescent method is the workhorse powering many of the machinelearning tools that have revolutionized Computer Science over the lastseveral years.
But there are some equations that we simply do not know how tosolve by any means. For example, it took more than 200 years until peo-ple succeeded in proving that the equation đ11 + đ11 = đ11 has nosolution in integers.3 The notorious difficulty of so called Diophantineequations (i.e., finding integer roots of a polynomial) motivated themathematician David Hilbert in 1900 to include the question of find-ing a general procedure for solving such equations in his famous listof twenty-three open problems for mathematics of the 20th century. Idonât think Hilbert doubted that such a procedure exists. After all, thewhole history of mathematics up to this point involved the discoveryof ever more powerful methods, and even impossibility results suchas the inability to trisect an angle with a straightedge and compass, orthe non-existence of an algebraic formula for quintic equations, merelypointed out to the need to use more general methods.
Alas, this turned out not to be the case for Diophantine equations.In 1970, Yuri Matiyasevich, building on a decades long line of work byMartin Davis, Hilary Putnam and Julia Robinson, showed that there issimply no method to solve such equations in general:
Theorem 11.10 â MRDP Theorem. Let DIO ⶠ0, 1â â 0, 1 be thefunction that takes as input a string describing a 100-variable poly-
384 introduction to theoretical computer science
nomial with integer coefficients đ(đ„0, ⊠, đ„99) and outputs 1 if andonly if there exists đ§0, ⊠, đ§99 â â s.t. đ(đ§0, ⊠, đ§99) = 0.
Then DIO is uncomputable.
As usual, we assume some standard way to express numbers andtext as binary strings. The constant 100 is of course arbitrary; the prob-lem is known to be uncomputable even for polynomials of degreefour and at most 58 variables. In fact the number of variables can bereduced to nine, at the expense of the polynomial having a larger (butstill constant) degree. See Jonesâs paper for more about this issue.
RRemark 11.11 â Active code vs static data. The diffi-culty in finding a way to distinguish between âcodeâsuch as NAND-TM programs, and âstatic contentâsuch as polynomials is just another manifestation ofthe phenomenon that code is the same as data. Whilea fool-proof solution for distinguishing between thetwo is inherently impossible, finding heuristics that doa reasonable job keeps many firewall and anti-virusmanufacturers very busy (and finding ways to bypassthese tools keeps many hackers busy as well).
11.5 HARDNESS OF QUANTIFIED INTEGER STATEMENTS
We will not prove the MRDP Theorem (Theorem 11.10). However, aswe mentioned, we will prove the uncomputability of QIS (i.e., Theo-rem 11.9), which is a special case of the MRDP Theorem. The reasonis that a Diophantine equation is a special case of a quantified integerstatement where the only quantifier is â. This means that deciding thetruth of quantified integer statements is a potentially harder problemthan solving Diophantine equations, and so it is potentially easier toprove that QIS is uncomputable.
PIf you find the last sentence confusing, it is worth-while to reread it until you are sure you follow itslogic. We are so accustomed to trying to find solu-tions for problems that it can sometimes be hard tofollow the arguments for showing that problems areuncomputable.
Our proof of the uncomputability of QIS (i.e. Theorem 11.9) will, asusual, go by reduction from the Halting problem, but we will do so intwo steps:
is every theorem provable? 385
1. We will first use a reduction from the Halting problem to show thatdeciding the truth of quantified mixed statements is uncomputable.Quantified mixed statements involve both strings and integers.Since quantified mixed statements are a more general concept thanquantified integer statements, it is easier to prove the uncomputabil-ity of deciding their truth.
2. We will then reduce the problem of quantified mixed statements toquantifier integer statements.
11.5.1 Step 1: Quantified mixed statements and computation historiesWe define quantified mixed statements as statements involving not justintegers and the usual arithmetic operators, but also string variables aswell.
Definition 11.12 â Quantified mixed statements. A quantified mixed state-ment is a well-formed statement with no unbound variables involv-ing integers, variables, the operators >, <, Ă, +, â, =, the logicaloperations ÂŹ (NOT), ⧠(AND), and âš (OR), as well as quanti-fiers of the form âđ„ââ, âđâ0,1â , âđŠââ, âđâ0,1â where đ„, đŠ, đ, đ arevariable names. These also include the operator |đ| which returnsthe length of a string valued variable đ, as well as the operator đđwhere đ is a string-valued variable and đ is an integer valued ex-pression which is true if đ is smaller than the length of đ and the đđĄâcoordinate of đ is 1, and is false otherwise.
For example, the true statement that for every string đ there is astring đ that corresponds to đ in reverse order can be phrased as thefollowing quantified mixed statementâđâ0,1ââđâ0,1â(|đ| = |đ|) ⧠(âđââđ < |đ| â (đđ â đ|đ|âđ)) . (11.5)
Quantified mixed statements are more general than quantifiedinteger statements, and so the following theorem is potentially easierto prove than Theorem 11.9:
Theorem 11.13 â Uncomputability of quantified mixed statements. LetQMS ⶠ0, 1â â 0, 1 be the function that given a (string rep-resentation of) a quantified mixed statement outputs 1 if it is trueand 0 if it is false. Then QMS is uncomputable.
Proof Idea:
The idea behind the proof is similar to that used in showing thatone-dimensional cellular automata are Turing complete (Theorem 8.7)as well as showing that equivalence (or even âfullnessâ) of contextfree grammars is uncomputable (Theorem 10.10). We use the notion
386 introduction to theoretical computer science
of a configuration of a NAND-TM program as in Definition 8.8. Sucha configuration can be thought of as a string đŒ over some large-but-finite alphabet ÎŁ describing its current state, including the valuesof all arrays, scalars, and the index variable i. It can be shown thatif đŒ is the configuration at a certain step of the execution and đœ isthe configuration at the next step, then đœđ = đŒđ for all đ outside ofđ â 1, đ, đ + 1 where đ is the value of i. In particular, every value đœđ issimply a function of đŒđâ1,đ,đ+1. Using these observations we can writea quantified mixed statement NEXT(đŒ, đœ) that will be true if and only ifđœ is the configuration encoding the next step after đŒ. Since a programđ halts on input đ„ if and only if there is a sequence of configurationsđŒ0, ⊠, đŒđĄâ1 (known as a computation history) starting with the initialconfiguration with input đ„ and ending in a halting configuration, wecan define a quantified mixed statement to determine if there is sucha statement by taking a universal quantifier over all strings đ» (forhistory) that encode a tuple (đŒ0, đŒ1, ⊠, đŒđĄâ1) and then checking thatđŒ0 and đŒđĄâ1 are valid starting and halting configurations, and thatNEXT(đŒđ, đŒđ+1) is true for every đ â 0, ⊠, đĄ â 2.âProof of Theorem 11.13. The proof is obtained by a reduction from theHalting problem. Specifically, we will use the notion of a configura-tion of a Turing Machines (Definition 8.8) that we have seen in thecontext of proving that one dimensional cellular automata are Turingcomplete. We need the following facts about configurations:
âą For every Turing Machine đ , there is a finite alphabet ÎŁ, and aconfiguration of đ is a string đŒ â ÎŁâ.
âą A configuration đŒ encodes all the state of the program at a particu-lar iteration, including the array, scalar, and index variables.
âą If đŒ is a configuration, then đœ = NEXTđ (đŒ) denotes the configura-tion of the computation after one more iteration. đœ is a string over ÎŁof length either |đŒ| or |đŒ| + 1, and every coordinate of đœ is a functionof just three coordinates in đŒ. That is, for every đ â 0, ⊠, |đœ| â 1,đœđ = MAPđ (đŒđâ1, đŒđ, đŒđ+1) where MAPđ ⶠΣ3 â ÎŁ is some functiondepending on đ .
âą There are simple conditions to check whether a string đŒ is a validstarting configuration corresponding to an input đ„, as well as tocheck whether a string đŒ is a halting configuration. In particularthese conditions can be phrased as quantified mixed statements.
âą A program đ halts on input đ„ if and only if there exists a sequenceof configurations đ» = (đŒ0, đŒ1, ⊠, đŒđ â1) such that (i) đŒ0 is a valid
is every theorem provable? 387
starting configuration of đ with input đ„, (ii) đŒđ â1 is a valid haltingconfiguration of đ , and (iii) đŒđ+1 = NEXTđ (đŒđ) for every đ â0, ⊠, đ â 2.We can encode such a sequence đ» of configuration as a binary
string. For concreteness, we let â = âlog(|ÎŁ| + 1)â and encode eachsymbol đ in ÎŁ âȘ "; " by a string in 0, 1â. We use â;â as a âseparatorâsymbol, and so encode đ» = (đŒ0, đŒ1, ⊠, đŒđ â1) as the concatenationof the encodings of each configuration, using â;â to separate the en-coding of đŒđ and đŒđ+1 for every đ â [đ ]. In particular for every TuringMachine đ , đ halts on the input 0 if and only if the following state-ment đđ is true
âđ»â0,1âđ» encodes halting configuration sequence starting with input 0 .(11.6)
If we can encode the statement đđ as a mixed-integer statementthen, since đđ is true if and only if HALTONZERO(đ) = 1, thiswould reduce the task of computing HALTONZERO to computingMIS, and hence imply (using Theorem 9.9 ) that MIS is uncomputable,completing the proof. Indeed, đđ can be encoded as a mixed-integerstatement for the following reasons:
1. Let đŒ, đœ â 0, 1â be two strings that encode configurations of đ .We can define a quantified mixed predicate NEXT(đŒ, đœ) that is trueif and only if đœ = NEXTđ(đœ) (i.e., đœ encodes the configurationobtained by proceeding from đŒ in one computational step). IndeedNEXT(đŒ, đœ) is true if for every đ â 0, ⊠, |đœ| which is a multipleof â, đœđ,âŠ,đ+ââ1 = MAPđ(đŒđââ,âŻ,đ+2ââ1) where MAPđ ⶠ0, 13â â0, 1â is the finite function above (identifying elements of ÎŁ withtheir encoding in 0, 1â). Since MAPđ is a finite function, we canexpress it using the logical operations AND,OR, NOT (for exampleby computing MAPđ with NANDâs).
2. Using the above we can now write the condition that for everysubstring of đ» that has the form đŒENC(; )đœ with đŒ, đœ â 0, 1âand ENC(; ) being the encoding of the separator â;â, it holds thatNEXT(đŒ, đœ) is true.
3. Finally, if đŒ0 is a binary string encoding the initial configuration ofđ on input 0, checking that the first |đŒ0| bits of đ» equal đŒ0 can beexpressed using AND,OR, and NOTâs. Similarly checking that thelast configuration encoded by đ» corresponds to a state in which đwill halt can also be expressed as a quantified statement.
Together the above yields a computable procedure that maps everyTuring Machine đ into a quantified mixed statement đđ such that
388 introduction to theoretical computer science
HALTONZERO(đ) = 1 if and only if QMS(đđ) = 1. This reducescomputing HALTONZERO to computing QMS, and hence the uncom-putability of HALTONZERO implies the uncomputability of QMS.
RRemark 11.14 â Alternative proofs. There are sev-eral other ways to show that QMS is uncomputable.For example, we can express the condition that a 1-dimensional cellular automaton eventually writes aâ1â to a given cell from a given initial configurationas a quantified mixed statement over a string encod-ing the history of all configurations. We can then usethe fact that cellular automatons can simulate Tur-ing machines (Theorem 8.7) to reduce the haltingproblem to QMS. We can also use other well knownuncomputable problems such as tiling or the post cor-respondence problem. Exercise 11.5 and Exercise 11.6explore two alternative proofs of Theorem 11.13.
11.5.2 Step 2: Reducing mixed statements to integer statementsWe now show how to prove Theorem 11.9 using Theorem 11.13. Theidea is again a proof by reduction. We will show a transformation ofevery quantifier mixed statement đ into a quantified integer statementđ that does not use string-valued variables such that đ is true if andonly if đ is true.
To remove string-valued variables from a statement, we encodeevery string by a pair integer. We will show that we can encode astring đ„ â 0, 1â by a pair of numbers (đ, đ) â â s.t.
âą đ = |đ„|âą There is a quantified integer statement COORD(đ, đ) that for everyđ < đ, will be true if đ„đ = 1 and will be false otherwise.
This will mean that we can replace a âfor allâ quantifier over stringssuch as âđ„â0,1â with a pair of quantifiers over integers of the formâđâââđââ (and similarly replace an existential quantifier of the formâđ„â0,1â with a pair of quantifiers âđâââđââ) . We can then replace allcalls to |đ„| by đ and all calls to đ„đ by COORD(đ, đ). This means thatif we are able to define COORD via a quantified integer statement,then we obtain a proof of Theorem 11.9, since we can use it to mapevery mixed quantified statement đ to an equivalent quantified inte-ger statement đ such that đ is true if and only if đ is true, and henceQMS(đ) = QIS(đ). Such a procedure implies that the task of comput-ing QMS reduces to the task of computing QIS, which means that theuncomputability of QMS implies the uncomputability of QIS.
is every theorem provable? 389
The above shows that proof of Theorem 11.9 all boils down to find-ing the right encoding of strings as integers, and the right way toimplement COORD as a quantified integer statement. To achieve thiswe use the following technical result :Lemma 11.15 â Constructible prime sequence. There is a sequence of primenumbers đ0 < đ1 < đ2 < ⯠such that there is a quantified integerstatement PSEQ(đ, đ) that is true if and only if đ = đđ.
Using Lemma 11.15 we can encode a đ„ â 0, 1â by the numbers(đ, đ) where đ = âđ„đ=1 đđ and đ = |đ„|. We can then define thestatement COORD(đ, đ) as
COORD(đ, đ) = âđââPSEQ(đ, đ) ⧠DIVIDES(đ, đ) (11.7)
where DIVIDES(đ, đ), as before, is defined as âđââđ Ă đ = đ. Note thatindeed if đ, đ encodes the string đ„ â 0, 1â, then for every đ < đ,COORD(đ, đ) = đ„đ, since đđ divides đ if and only if đ„đ = 1.
Thus all that is left to conclude the proof of Theorem 11.9 is toprove Lemma 11.15, which we now proceed to do.
Proof. The sequence of prime numbers we consider is the following:We fix đ¶ to be a sufficiently large constant (đ¶ = 2234 will do) anddefine đđ to be the smallest prime number that is in the interval [(đ +đ¶)3 + 1, (đ + đ¶ + 1)3 â 1]. It is known that there exists such a primenumber for every đ â â. Given this, the definition of PSEQ(đ, đ) issimple:(đ > (đ+đ¶)Ă(đ+đ¶)Ă(đ+đ¶))â§(đ < (đ+đ¶+1)Ă(đ+đ¶+1)Ă(đ+đ¶+1))â§(âđâČÂŹPRIME(đâČ) âš (đâČ â€ đ) âš (đâČ â„ đ)) ,
(11.8)We leave it to the reader to verify that PSEQ(đ, đ) is true iff đ = đđ.
To sum up we have shown that for every quantified mixed state-ment đ, we can compute a quantified integer statement đ such thatQMS(đ) = 1 if and only if QIS(đ) = 1. Hence the uncomputabilityof QMS (Theorem 11.13) implies the uncomputability of QIS, com-pleting the proof of Theorem 11.9, and so also the proof of GödelâsIncompleteness Theorem for quantified integer statements (Theo-rem 11.8).
Chapter Recap
âą Uncomputable functions include also functionsthat seem to have nothing to do with NAND-TMprograms or other computational models suchas determining the satisfiability of Diophantineequations.
390 introduction to theoretical computer science
4 Hint: think of đ„ as saying âTuring Machine đ haltson input đąâ and đ€ being a proof that is the number ofsteps that it will take for this to happen. Can you findan always-halting đ that will verify such statements?
Figure 11.3: In the puzzle problem, the input can bethought of as a finite collection ÎŁ of types of puz-zle pieces and the goal is to find out whether or notfind a way to arrange pieces from these types in arectangle. Formally, we model the input as a pair offunctions đđđĄđââ, đđđĄđâ ⶠΣ2 â 0, 1 thatsuch that đđđĄđââ(đđđđĄ, đđđâđĄ) = 1 (respectivelyđđđĄđâ(đąđ, đđđ€đ) = 1 ) if the pair of pieces arecompatible when placed in their respective posi-tions. We assume ÎŁ contains a special symbol â corresponding to having no piece, and an arrange-ment of puzzle pieces by an (đ â 2) Ă (đ â 2)rectangle is modeled by a string đ„ â ÎŁđâ đ whoseâouter coordinatesâ â are â and such that for everyđ â [đ â 1], đ â [đ â 1], đđđĄđâ(đ„đ,đ, đ„đ+1,đ) = 1 andđđđĄđââ(đ„đ,đ, đ„đ,đ+1) = 1.
âą This also implies that for any sound proof system(and in particular every finite axiomatic system) đ,there are interesting statements đ (namely of theform âđč(đ„) = 0â for an uncomputable functionđč) such that đ is not able to prove either đ or itsnegation.
11.6 EXERCISES
Exercise 11.1 â Gödelâs Theorem from uncomputability of đđŒđ. Prove Theo-rem 11.8 using Theorem 11.9.
Exercise 11.2 â Proof systems and uncomputability. Let FINDPROOF â¶0, 1â â 0, 1 be the following function. On input a Turing machineđ (which we think of as the verifying algorithm for a proof system)and a string đ„ â 0, 1â, FINDPROOF(đ , đ„) = 1 if and only if thereexists đ€ â 0, 1â such that đ (đ„, đ€) = 1.1. Prove that FINDPROOF is uncomputable.
2. Prove that there exists a Turing machine đ such that đ haltson every input đ„, đŁ but the function FINDPROOFđ defined asFINDPROOFđ (đ„) = FINDPROOF(đ , đ„) is uncomputable. Seefootnote for hint.4
Exercise 11.3 â Expression for floor. Let FSQRT(đ, đ) = âđââ((đ Ă đ) >đ) âš (đ †đ). Prove that FSQRT(đ, đ) is true if and only if đ = ââđâ.
Exercise 11.4 â axiomatic proof systems. For every representation of logicalstatements as strings, we can define an axiomatic proof system toconsist of a finite set of strings đŽ and a finite set of rules đŒ0, ⊠, đŒđâ1with đŒđ ⶠ(0, 1â)đđ â 0, 1â such that a proof (đ 1, ⊠, đ đ) that đ đis true is valid if for every đ, either đ đ â đŽ or is some đ â [đ] andare đ1, ⊠, đđđ < đ such that đ đ = đŒđ(đ đ1 , ⊠, đđđ). A system is sound ifwhenever there is no false đ such that there is a proof that đ is true.Prove that for every uncomputable function đč ⶠ0, 1â â 0, 1and every sound axiomatic proof system đ (that is characterized by afinite number of axioms and inference rules), there is some input đ„ forwhich the proof system đ is not able to prove neither that đč(đ„) = 0nor that đč(đ„) â 0.
Exercise 11.5 â Post Corrrespondence Problem. In the Post CorrespondenceProblem the input is a set đ = (đŒ0, đœ0), ⊠, (đœđâ1, đœđâ1) where each
is every theorem provable? 391
đŒđ and đœđ is a string in 0, 1â. We say that PCP(đ) = 1 if and only ifthere exists a list (đŒ0, đœ0), ⊠, (đŒđâ1, đœđâ1) of pairs in đ such thatđŒ0đŒ1 ⯠đŒđâ1 = đœ0đœ1 ⯠đœđâ1 . (11.9)
(We can think of each pair (đŒ, đœ) â đ as a âdomino tileâ and the ques-tion is whether we can stack a list of such tiles so that the top and thebottom yield the same string.) It can be shown that the PCP is uncom-putable by a fairly straightforward though somewhat tedious proof(see for example the Wikipedia page for the Post CorrespondenceProblem or Section 5.2 in [Sip97]).
Use this fact to provide a direct proof that QMS is uncomputable byshowing that there exists a computable map đ ⶠ0, 1â â 0, 1â suchthat PCP(đ) = QMS(đ (đ)) for every string đ encoding an instance ofthe post correspondence problem.
Exercise 11.6 â Uncomputability of puzzle. Let PUZZLE ⶠ0, 1â â 0, 1 bethe problem of determining, given a finite collection of types of âpuz-zle piecesâ, whether it is possible to put them together in a rectangle,see Fig. 11.3. Formally, we think of such a collection as a finite set ÎŁ(see Fig. 11.3). We model the criteria as to which pieces âfit togetherâby a pair of finite function đđđĄđâ, đđđĄđââ ⶠΣ2 â 0, 1 such that apiece đ fits above a piece đ if and only if đđđĄđâ(đ, đ) = 1 and a pieceđ fits to the left of a piece đ if and only if đđđĄđââ(đ, đ) = 1. To modelthe âstraight edgeâ pieces that can be placed next to a âblank spotâwe assume that ÎŁ contains the symbol â and the matching functionsare defined accordingly. A square tiling of ÎŁ is an đ Ă đ long stringđ„ â ÎŁđđ, such that for every đ â 1, ⊠, đ â 2 and đ â 1, ⊠, đ â 2,đđđĄđâ(đ„đ,đ, đ„đâ1,đ, đ„đ+1,đ, đ„đ,đâ1, đ„đ,đ+1) = 1 (i.e., every âinternal pieveâfits in with the pieces adjacent to it). We also require all of the âouterpiecesâ (i.e., đ„đ,đ where đ â 0, đ â 1 of đ â 0, đ â 1) are âblankâor equal to â . The function PUZZLE takes as input a string describingthe set ÎŁ and the function đđđĄđâ and outputs 1 if and only if there issome square tiling of ÎŁ: some not all blank string đ„ â ÎŁđđ satisfyingthe above condition.
1. Prove that PUZZLE is uncomputable.
2. Give a reduction from PUZZLE to QMS.
Exercise 11.7 â MRDP exercise. The MRDP theorem states that theproblem of determining, given a đ-variable polynomial đ with integercoefficients, whether there exists integers đ„0, ⊠, đ„đâ1 such thatđ(đ„0, ⊠, đ„đâ1) = 0 is uncomputable. Consider the following quadratic
392 introduction to theoretical computer science
5 You can replace the equation đŠ = đ„4 with the pairof equations đŠ = đ§2 and đ§ = đ„2. Also, you canreplace the equation đ€ = đ„6 with the three equationsđ€ = đŠđą, đŠ = đ„4 and đą = đ„2.
6 You will not need to use very specific properties ofthe TOWER function in this exercise. For example,NBB(đ) also grows faster than the Ackerman func-tion. You might find Aaronsonâs blog post on thesame topic to be quite interesting, and relevant to thisbook at large. If you like it then you might also enjoythis piece by Terence Tao.
integer equation problem: the input is a list of polynomials đ0, ⊠, đđâ1over đ variables with integer coefficients, where each of the polynomi-als is of degree at most two (i.e., it is a quadratic function). The goalis to determine whether there exist integers đ„0, ⊠, đ„đâ1 that solve theequations đ0(đ„) = ⯠= đđâ1(đ„) = 0.
Use the MRDP Theorem to prove that this problem is uncom-putable. That is, show that the function QUADINTEQ ⶠ0, 1â â0, 1 is uncomputable, where this function gets as input a string de-scribing the polynomials đ0, ⊠, đđâ1 (each with integer coefficientsand degree at most two), and outputs 1 if and only if there existsđ„0, ⊠, đ„đâ1 â †such that for every đ â [đ], đđ(đ„0, ⊠, đ„đâ1) = 0. Seefootnote for hint5
Exercise 11.8 â The Busy Beaver problem. In this question we define theNAND-TM variant of the busy beaver function.
1. We define the function đ ⶠ0, 1â â â as follows: for everystring đ â 0, 1â, if đ represents a NAND-TM program such thatwhen đ is executed on the input 0 (i.e., the string of length 1 that issimply 0), a total of đ lines are executed before the program halts,then đ (đ) = đ . Otherwise (if đ does not represent a NAND-TMprogram, or it is a program that does not halt on 0), đ (đ) = 0.Prove that đ is uncomputable.
2. Let TOWER(đ) denote the number 222.
.
.
2âđ times(that is, a âtower of pow-
ers of twoâ of height đ). To get a sense of how fast this functiongrows, TOWER(1) = 2, TOWER(2) = 22 = 4, TOWER(3) = 222 =16, TOWER(4) = 216 = 65536 and TOWER(5) = 265536 whichis about 1020000. TOWER(6) is already a number that is too big towrite even in scientific notation. Define NBB ⶠâ â â (for âNAND-TM Busy Beaverâ) to be the function NBB(đ) = maxđâ0,1đ đ (đ)where đ ⶠâ â â is the function defined in Item 1. Prove thatNBB grows faster than TOWER, in the sense that TOWER(đ) =đ(NBB(đ)) (i.e., for every đ > 0, there exists đ0 such that for everyđ > đ0, TOWER(đ) < đ â NBB(đ).).6
11.7 BIBLIOGRAPHICAL NOTES
As mentioned before, Gödel, Escher, Bach [Hof99] is a highly recom-mended book covering Gödelâs Theorem. A classic popular sciencebook about Fermatâs Last Theorem is [Sin97].
Cantorâs are used for both Turing and Gödelâs theorems. In a twistof fate, using techniques originating from the works of Gödel and Tur-
is every theorem provable? 393
ing, Paul Cohen showed in 1963 that Cantorâs Continuum Hypothesisis independent of the axioms of set theory, which means that neitherit nor its negation is provable from these axioms and hence in somesense can be considered as âneither true nor falseâ (see [Coh08]). TheContinuum Hypothesis is the conjecture that for every subset đ of â,either there is a one-to-one and onto map between đ and â or thereis a one-to-one and onto map between đ and â. It was conjecturedby Cantor and listed by Hilbert in 1900 as one of the most importantproblems in mathematics. See also the non-conventional survey ofShelah [She03]. See here for recent progress on a related question.
Thanks to Alex Lombardi for pointing out an embarrassing mistakein the description of Fermatâs Last Theorem. (I said that it was openfor exponent 11 before Wilesâ work.)
IIIEFFICIENT ALGORITHMS
12Efficient computation: An informal introduction
âThe problem of distinguishing prime numbers from composite and of resolvingthe latter into their prime factors is ⊠one of the most important and usefulin arithmetic ⊠Nevertheless we must confess that all methods ⊠are eitherrestricted to very special cases or are so laborious ⊠they try the patience ofeven the practiced calculator ⊠and do not apply at all to larger numbers.â,Carl Friedrich Gauss, 1798
âFor practical purposes, the difference between algebraic and exponential orderis often more crucial than the difference between finite and non-finite.â, JackEdmunds, âPaths, Trees, and Flowersâ, 1963
âWhat is the most efficient way to sort a million 32-bit integers?â, EricSchmidt to Barack Obama, 2008âI think the bubble sort would be the wrong way to go.â, Barack Obama.
So far we have been concerned with which functions are computableand which ones are not. In this chapter we look at the finer questionof the time that it takes to compute functions, as a function of their inputlength. Time complexity is extremely important to both the theory andpractice of computing, but in introductory courses, coding interviews,and software development, terms such as âđ(đ) running timeâ are of-ten used in an informal way. People donât have a precise definition ofwhat a linear-time algorithm is, but rather assume that âtheyâll knowit when they see itâ. In this book we will define running time pre-cisely, using the mathematical models of computation we developedin the previous chapters. This will allow us to ask (and sometimesanswer) questions such as:
âą âIs there a function that can be computed in đ(đ2) time but not inđ(đ) time?â
âą âAre there natural problems for which the best algorithm (and notjust the best known) requires 2Ω(đ) time?â
Compiled on 8.26.2020 18:10
Learning Objectives:âą Describe at a high level some interesting
computational problems.âą The difference between polynomial and
exponential time.âą Examples of techniques for obtaining efficient
algorithmsâą Examples of how seemingly small differences
in problems can potentially make hugedifferences in their computational complexity.
398 introduction to theoretical computer science
Big Idea 16 The running time of an algorithm is not a number, it isa function of the length of the input.
This chapter: A non-mathy overviewIn this chapter, we informally survey examples of compu-tational problems. For some of these problems we knowefficient (i.e., đ(đđ)-time for a small constant đ) algorithms,and for others the best known algorithms are exponential.We present these examples to get a feel as to the kinds ofproblems that lie on each side of this divide and also see howsometimes seemingly minor changes in problem formula-tion can make the (known) complexity of a problem âjumpâfrom polynomial to exponential. We do not formally definethe notion of running time in this chapter, but use the sameâI know it when I see itâ notion of an đ(đ) or đ(đ2) timealgorithms as the one youâve seen in introduction to com-puter science courses. We will see the precise definition ofrunning time (using Turing machines and RAM machines /NAND-RAM) in Chapter 13.
While the difference between đ(đ) and đ(đ2) time can be crucial inpractice, in this book we focus on the even bigger difference betweenpolynomial and exponential running time. As we will see, the differencebetween polynomial versus exponential time is typically insensitive tothe choice of the particular computational model, a polynomial-timealgorithm is still polynomial whether you use Turing machines, RAMmachines, or parallel cluster as your model of computation, and sim-ilarly an exponential-time algorithm will remain exponential in all ofthese platforms. One of the interesting phenomena of computing isthat there is often a kind of a âthreshold phenomenonâ or âzero-onelawâ for running time. Many natural problems can either be solvedin polynomial running time with a not-too-large exponent (e.g., some-thing like đ(đ2) or đ(đ3)), or require exponential (e.g., at least 2Ω(đ)or 2Ω(âđ)) time to solve. The reasons for this phenomenon are still notfully understood, but some light on it is shed by the concept of NPcompleteness, which we will see in Chapter 15.
This chapter is merely a tiny sample of the landscape of computa-tional problems and efficient algorithms. If you want to explore thefield of algorithms and data structures more deeply (which I verymuch hope you do!), the bibliographical notes contain references tosome excellent texts, some of which are available freely on the web.
efficient computation: an informal introduction 399
RRemark 12.1 â Relations between parts of this book.Part I of this book contained a quantitative study ofcomputation of finite functions. We asked what arethe resources (in terms of gates of Boolean circuits orlines in straight-line programs) required to computevarious finite functions.Part II of the book contained a qualitative study ofcomputation of infinite functions (i.e., functions ofunbounded input length). In that part we asked thequalitative question of whether or not a function is com-putable at all, regardless of the number of operations.Part III of the book, beginning with this chapter,merges the two approaches and contains a quantitativestudy of computation of infinite functions. In this partwe ask how do resources for computing a functionscale with the length of the input. In Chapter 13 wedefine the notion of running time, and the class P offunctions that can be computed using a number ofsteps that scales polynomially with the input length.In Section 13.6 we will relate this class to the modelsof Boolean circuits and straightline programs that westudied in Part I.
12.1 PROBLEMS ON GRAPHS
In this chapter we discuss several examples of important computa-tional problems. Many of the problems will involve graphs. We havealready encountered graphs before (see Section 1.4.4) but now quicklyrecall the basic notation. A graph đș consists of a set of vertices đ andedges đž where each edge is a pair of vertices. We typically denote byđ the number of vertices (and in fact often consider graphs where theset of vertices đ equals the set [đ] of the integers between 0 and đ â 1).In a directed graph, an edge is an ordered pair (đą, đŁ), which we some-times denote as đą đŁ. In an undirected graph, an edge is an unorderedpair (or simply a set) đą, đŁ which we sometimes denote as đą đŁ orđą ⌠đŁ. An equivalent viewpoint is that an undirected graph corre-sponds to a directed graph satisfying the property that whenever theedge đą đŁ is present then so is the edge đŁ đą. In this chapter we restrictour attention to graphs that are undirected and simple (i.e., containingno parallel edges or self-loops). Graphs can be represented either inthe adjacency list or adjacency matrix representation. We can transformbetween these two representations using đ(đ2) operations, and hencefor our purposes we will mostly consider them as equivalent.
Graphs are so ubiquitous in computer science and other sciencesbecause they can be used to model a great many of the data that weencounter. These are not just the âobviousâ data such as the road
400 introduction to theoretical computer science
Figure 12.1: Some examples of graphs found on theInternet.
1 A queue is a data structure for storing a list of el-ements in âFirst In First Out (FIFO)â order. Eachâpopâ operation removes an element from the queuein the order that they were âpushedâ into it; see theWikipedia page.
network (which can be thought of as a graph of whose vertices arelocations with edges corresponding to road segments), or the web(which can be thought of as a graph whose vertices are web pageswith edges corresponding to links), or social networks (which canbe thought of as a graph whose vertices are people and the edgescorrespond to friend relation). Graphs can also denote correlations indata (e.g., graph of observations of features with edges correspondingto features that tend to appear together), causal relations (e.g., generegulatory networks, where a gene is connected to gene products itderives), or the state space of a system (e.g., graph of configurationsof a physical system, with edges corresponding to states that can bereached from one another in one step).
12.1.1 Finding the shortest path in a graphThe shortest path problem is the task of finding, given a graph đș =(đ , đž) and two vertices đ , đĄ â đ , the length of the shortest pathbetween đ and đĄ (if such a path exists). That is, we want to find thesmallest number đ such that there are vertices đŁ0, đŁ1, ⊠, đŁđ with đŁ0 = đ ,đŁđ = đĄ and for every đ â 0, ⊠, đ â 1 an edge between đŁđ and đŁđ+1. For-mally, we define MINPATH ⶠ0, 1â â 0, 1â to be the function thaton input a triple (đș, đ , đĄ) (represented as a string) outputs the numberđ which is the length of the shortest path in đș between đ and đĄ or astring representing no path if no such path exists. (In practice peopleoften want to also find the actual path and not just its length; it turnsout that the algorithms to compute the length of the path often yieldthe actual path itself as a byproduct, and so everything we say aboutthe task of computing the length also applies to the task of finding thepath.)
If each vertex has at least two neighbors then there can be an expo-nential number of paths from đ to đĄ, but fortunately we do not have toenumerate them all to find the shortest path. We can find the short-est path using a breadth first search (BFS), enumerating đ âs neigh-bors, and then neighborsâ neighbors, etc.. in order. If we maintainthe neighbors in a list we can perform a BFS in đ(đ2) time, while us-ing a queue we can do this in đ(đ) time.1 Dijkstraâs algorithm is awell-known generalization of BFS to weighted graphs. More formally,the algorithm for computing the function MINPATH is described inAlgorithm 12.2.
efficient computation: an informal introduction 401
Algorithm 12.2 â Shortest path via BFS.
Input: Graph đș = (đ , đž) and vertices đ , đĄ â đ . Assumeđ = [đ].Output: Length đ of shortest path from đ to đĄ or â if no
such path exists.1: Let đ· be length-đ array.2: Set đ·[đ ] = 0 and đ·[đ] = â for all đ â [đ] ⧔ đ .3: Initialize queue đ to contain đ .4: while đ non empty do5: Pop đŁ from đ6: if đŁ = đĄ then7: return đ·[đŁ]8: end if9: for đą neighbor of đŁ with đ·[đą] = â do
10: Set đ·[đą] â đ·[đŁ] + 111: Add đą to đ.12: end for13: end while14: return â
Since we only add to the queue vertices đ€ with đ·[đ€] = â (andthen immediately set đ·[đ€] to an actual number), we never push tothe queue a vertex more than once, and hence the algorithm makes atmost đ âpushâ and âpopâ operations. For each vertex đŁ, the numberof times we run the inner loop is equal to the degree of đŁ and hencethe total running time is proportional to the sum of all degrees whichequals twice the number đ of edges. Algorithm 12.2 returns the cor-rect answer since the vertices are added to the queue in the order oftheir distance from đ , and hence we will reach đĄ after we have exploredall the vertices that are closer to đ than đĄ.
RRemark 12.3 â On data structures. If youâve ever takenan algorithms course, you have probably encounteredmany data structures such as lists, arrays, queues,stacks, heaps, search trees, hash tables and manymore. Data structures are extremely important in com-puter science, and each one of those offers differenttradeoffs between overhead in storage, operationssupported, cost in time for each operation, and more.For example, if we store đ items in a list, we will needa linear (i.e., đ(đ) time) scan to retrieve an element,while we achieve the same operation in đ(1) time ifwe used a hash table. However, when we only careabout polynomial-time algorithms, such factors ofđ(đ) in the running time will not make much differ-
402 introduction to theoretical computer science
Figure 12.2: A knightâs tour can be thought of as amaximally long path on the graph corresponding toa chessboard where we put an edge between any twosquares that can be reached by one step via a legalknight move.
ence. Similarly, if we donât care about the differencebetween đ(đ) and đ(đ2), then it doesnât matter if werepresent graphs as adjacency lists or adjacency matri-ces. Hence we will often describe our algorithms at avery high level, without specifying the particular datastructures that are used to implement them. How-ever, it will always be clear that there exists some datastructure that is sufficient for our purposes.
12.1.2 Finding the longest path in a graphThe longest path problem is the task of finding the length of the longestsimple (i.e., non intersecting) path between a given pair of verticesđ and đĄ in a given graph đș. If the graph is a road network, then thelongest path might seem less motivated than the shortest path (unlessyou are the kind of person that always prefers the âscenic routeâ).But graphs can and are used to model a variety of phenomena, and inmany such cases finding the longest path (and some of its variants)can be very useful. In particular, finding the longest path is a gener-alization of the famous Hamiltonian path problem which asks for amaximally long simple path (i.e., path that visits all đ vertices once)between đ and đĄ, as well as the notorious traveling salesman problem(TSP) of finding (in a weighted graph) a path visiting all vertices ofcost at most đ€. TSP is a classical optimization problem, with appli-cations ranging from planning and logistics to DNA sequencing andastronomy.
Surprisingly, while we can find the shortest path in đ(đ) time,there is no known algorithm for the longest path problem that signif-icantly improves on the trivial âexhaustive searchâ or âbrute forceâalgorithm that enumerates all the exponentially many possibilitiesfor such paths. Specifically, the best known algorithms for the longestpath problem take đ(đđ) time for some constant đ > 1. (At the mo-ment the best record is đ ⌠1.65 or so; even obtaining an đ(2đ) timebound is not that simple, see Exercise 12.1.)
12.1.3 Finding the minimum cut in a graphGiven a graph đș = (đ , đž), a cut of đș is a subset đ â đ such that đis neither empty nor is it all of đ . The edges cut by đ are those edgeswhere one of their endpoints is in đ and the other is in đ = đ ⧔ đ. Wedenote this set of edges by đž(đ, đ). If đ , đĄ â đ are a pair of verticesthen an đ , đĄ cut is a cut such that đ â đ and đĄ â đ (see Fig. 12.3).The minimum đ , đĄ cut problem is the task of finding, given đ and đĄ, theminimum number đ such that there is an đ , đĄ cut cutting đ edges (theproblem is also sometimes phrased as finding the set that achievesthis minimum; it turns out that algorithms to compute the numberoften yield the set as well). Formally, we define MINCUT ⶠ0, 1â â
efficient computation: an informal introduction 403
Figure 12.3: A cut in a graph đș = (đ , đž) is simply asubset đ of its vertices. The edges that are cut by đare all those whose one endpoint is in đ and the otherone is in đ = đ ⧔ đ. The cut edges are colored red inthis figure.
0, 1â to be the function that on input a string representing a triple(đș = (đ , đž), đ , đĄ) of a graph and two vertices, outputs the minimumnumber đ such that there exists a set đ â đ with đ â đ, đĄ â đ and|đž(đ, đ)| = đ.
Computing minimum đ , đĄ cuts is useful in many applications sinceminimum cuts often correspond to bottlenecks. For example, in a com-munication or railroad network the minimum cut between đ and đĄcorresponds to the smallest number of edges that, if dropped, willdisconnect đ from đĄ. (This was actually the original motivation for thisproblem; see Section 12.6.) Similar applications arise in schedulingand planning. In the setting of image segmentation, one can define agraph whose vertices are pixels and whose edges correspond to neigh-boring pixels of distinct colors. If we want to separate the foregroundfrom the background then we can pick (or guess) a foreground pixel đ and background pixel đĄ and ask for a minimum cut between them.
The naive algorithm for computing MINCUT will check all 2đ pos-sible subsets of an đ-vertex graph, but it turns out we can do muchbetter than that. As weâve seen in this book time and again, there ismore than one algorithm to compute the same function, and someof those algorithms might be more efficient than others. Luckily theminimum cut problem is one of those cases. In particular, as we willsee in the next section, there are algorithms that compute MINCUT intime which is polynomial in the number of vertices.
12.1.4 Min-Cut Max-Flow and Linear programmingWe can obtain a polynomial-time algorithm for computing MINCUTusing the Max-Flow Min-Cut Theorem. This theorem says that theminimum cut between đ and đĄ equals the maximum amount of flowwe can send from đ to đĄ, if every edge has unit capacity. Specifically,imagine that every edge of the graph corresponded to a pipe thatcould carry one unit of fluid per one unit of time (say 1 liter of waterper second). The maximum đ , đĄ flow is the maximum units of waterthat we could transfer from đ to đĄ over these pipes. If there is an đ , đĄcut of đ edges, then the maximum flow is at most đ. The reason isthat such a cut đ acts as a âbottleneckâ since at most đ units can flowfrom đ to its complement at any given unit of time. This means thatthe maximum đ , đĄ flow is always at most the value of the minimumđ , đĄ cut. The surprising and non-trivial content of the Max-Flow Min-Cut Theorem is that the maximum flow is also at least the value of theminimum cut, and hence computing the cut is the same as computingthe flow.
The Max-Flow Min-Cut Theorem reduces the task of computing aminimum cut to the task of computing a maximum flow. However, thisstill does not show how to compute such a flow. The Ford-Fulkerson
404 introduction to theoretical computer science
Algorithm is a direct way to compute a flow using incremental im-provements. But computing flows in polynomial time is also a specialcase of a much more general tool known as linear programming.
A flow on a graph đș of đ edges can be modeled as a vector đ„ â âđwhere for every edge đ, đ„đ corresponds to the amount of water pertime-unit that flows on đ. We think of an edge đ as an ordered pair(đą, đŁ) (we can choose the order arbitrarily) and let đ„đ be the amountof flow that goes from đą to đŁ. (If the flow is in the other direction thenwe make đ„đ negative.) Since every edge has capacity one, we knowthat â1 †đ„đ †1 for every edge đ. A valid flow has the property thatthe amount of water leaving the source đ is the same as the amountentering the sink đĄ, and that for every other vertex đŁ, the amount ofwater entering and leaving đŁ is the same.
Mathematically, we can write these conditions as follows:âđâđ đ„đ + âđâđĄ đ„đ = 0âđâđŁ đ„đ = 0 âđŁâđ ⧔đ ,đĄâ1 †đ„đ †1 âđâđž(12.1)
where for every vertex đŁ, summing over đ â đŁ means summing over allthe edges that touch đŁ.
The maximum flow problem can be thought of as the task of max-imizing âđâđ đ„đ over all the vectors đ„ â âđ that satisfy the aboveconditions (12.1). Maximizing a linear function â(đ„) over the set ofđ„ â âđ that satisfy certain linear equalities and inequalities is knownas linear programming. Luckily, there are polynomial-time algorithmsfor solving linear programming, and hence we can solve the maxi-mum flow (and so, equivalently, minimum cut) problem in polyno-mial time. In fact, there are much better algorithms for maximum-flow/minimum-cut, even for weighted directed graphs, with currentlythe record standing at đ(minđ10/7, đâđ) time.Solved Exercise 12.1 â Global minimum cut. Given a graph đș = (đ , đž),define the global minimum cut of đș to be the minimum over all đ â đwith đ â â and đ â đ of the number of edges cut by đ. Prove thatthere is a polynomial-time algorithm to compute the global minimumcut of a graph.
Solution:
By the above we know that there is a polynomial-time algorithmđŽ that on input (đș, đ , đĄ) finds the minimum đ , đĄ cut in the graph
efficient computation: an informal introduction 405
Figure 12.4: In a convex function đ (left figure), forevery đ„ and đŠ and đ â [0, 1] it holds that đ(đđ„ + (1 âđ)đŠ) †đâ đ(đ„)+(1âđ)â đ(đŠ). In particular this meansthat every local minimum of đ is also a global minimum.In contrast in a non convex function there can be manylocal minima.
Figure 12.5: In the high dimensional case, if đ is aconvex function (left figure) the global minimumis the only local minimum, and we can find it bya local-search algorithm which can be thought ofas dropping a marble and letting it âslide downâuntil it reaches the global minimum. In contrast, anon-convex function (right figure) might have anexponential number of local minima in which anylocal-search algorithm could get stuck.
đș. Using đŽ, we can obtain an algorithm đ” that on input a graph đșcomputes the global minimum cut as follows:
1. For every distinct pair đ , đĄ â đ , Algorithms đ” sets đđ ,đĄ âđŽ(đș, đ , đĄ).2. đ” returns the minimum of đđ ,đĄ over all distinct pairs đ , đĄ
The running time of đ” will be đ(đ2) times the running time of đŽand hence polynomial time. Moreover, if the the global minimumcut is đ, then when đ” reaches an iteration with đ â đ and đĄ â đ itwill obtain the value of this cut, and hence the value output by đ”will be the value of the global minimum cut.
The above is our first example of a reduction in the context ofpolynomial-time algorithms. Namely, we reduced the task of com-puting the global minimum cut to the task of computing minimumđ , đĄ cuts.
12.1.5 Finding the maximum cut in a graphThe maximum cut problem is the task of finding, given an input graphđș = (đ , đž), the subset đ â đ that maximizes the number of edgescut by đ. (We can also define an đ , đĄ-cut variant of the maximum cutlike we did for minimum cut; the two variants have similar complexitybut the global maximum cut is more common in the literature.) Likeits cousin the minimum cut problem, the maximum cut problem isalso very well motivated. For example, maximum cut arises in VLSIdesign, and also has some surprising relation to analyzing the Isingmodel in statistical physics.
Surprisingly, while (as weâve seen) there is a polynomial-time al-gorithm for the minimum cut problem, there is no known algorithmsolving maximum cut much faster than the trivial âbrute forceâ algo-rithm that tries all 2đ possibilities for the set đ.
12.1.6 A note on convexityThere is an underlying reason for the sometimes radical differencebetween the difficulty of maximizing and minimizing a function overa domain. If đ· â âđ, then a function đ ⶠđ· â đ is convex if for everyđ„, đŠ â đ· and đ â [0, 1] đ(đđ„ + (1 â đ)đŠ) †đđ(đ„) + (1 â đ)đ(đŠ). Thatis, đ applied to the đ-weighted midpoint between đ„ and đŠ is smallerthan the đ-weighted average value of đ . If đ· itself is convex (whichmeans that if đ„, đŠ are in đ· then so is the line segment between them),then this means that if đ„ is a local minimum of đ then it is also a globalminimum. The reason is that if đ(đŠ) < đ(đ„) then every point đ§ =đđ„ + (1 â đ)đŠ on the line segment between đ„ and đŠ will satisfy đ(đ§) â€đđ(đ„) + (1 â đ)đ(đŠ) < đ(đ„) and hence in particular đ„ cannot be a local
406 introduction to theoretical computer science
minimum. Intuitively, local minima of functions are much easier tofind than global ones: after all, any âlocal searchâ algorithm that keepsfinding a nearby point on which the value is lower, will eventuallyarrive at a local minima. One example of such a local search algorithmis gradient descent which takes a sequence of small steps, each one inthe direction that would reduce the value by the most amount basedon the current derivative.
Indeed, under certain technical conditions, we can often efficientlyfind the minimum of convex functions over a convex domain, andthis is the reason why problems such as minimum cut and shortestpath are easy to solve. On the other hand, maximizing a convex func-tion over a convex domain (or equivalently, minimizing a concavefunction) can often be a hard computational task. A linear functionis both convex and concave, which is the reason that both the maxi-mization and minimization problems for linear functions can be doneefficiently.
The minimum cut problem is not a priori a convex minimizationtask, because the set of potential cuts is discrete and not continuous.However, it turns out that we can embed it in a continuous and con-vex set via the (linear) maximum flow problem. The âmax flow mincutâ theorem ensures that this embedding is âtightâ in the sense thatthe minimum âfractional cutâ that we obtain through the maximum-flow linear program will be the same as the true minimum cut. Un-fortunately, we donât know of such a tight embedding in the setting ofthe maximum cut problem.
Convexity arises time and again in the context of efficient computa-tion. For example, one of the basic tasks in machine learning is empir-ical risk minimization. This is the task of finding a classifier for a givenset of training examples. That is, the input is a list of labeled examples(đ„0, đŠ0), ⊠, (đ„đâ1, đŠđâ1), where each đ„đ â 0, 1đ and đŠđ â 0, 1,and the goal is to find a classifier â ⶠ0, 1đ â 0, 1 (or sometimesâ ⶠ0, 1đ â â) that minimizes the number of errors. More generally,we want to find â that minimizesđâ1âđ=0 đż(đŠđ, â(đ„đ)) (12.2)
where đż is some loss function measuring how far is the predicted la-bel â(đ„đ) from the true label đŠđ. When đż is the square loss functionđż(đŠ, đŠâČ) = (đŠ â đŠâČ)2 and â is a linear function, empirical risk mini-mization corresponds to the well-known convex minimization task oflinear regression. In other cases, when the task is non convex, there canbe many global or local minima. That said, even if we donât find theglobal (or even a local) minima, this continuous embedding can stillhelp us. In particular, when running a local improvement algorithm
efficient computation: an informal introduction 407
such as Gradient Descent, we might still find a function â that is âuse-fulâ in the sense of having a small error on future examples from thesame distribution.
12.2 BEYOND GRAPHS
Not all computational problems arise from graphs. We now list someother examples of computational problems that are of great interest.
12.2.1 SATA propositional formula đ involves đ variables đ„1, ⊠, đ„đ and the logicaloperators AND (â§), OR (âš), and NOT (ÂŹ, also denoted as â ). We saythat such a formula is in conjunctive normal form (CNF for short) if it isan AND of ORs of variables or their negations (we call a term of theform đ„đ or đ„đ a literal). For example, this is a CNF formula(đ„7 âš đ„22 âš đ„15) ⧠(đ„37 âš đ„22) ⧠(đ„55 âš đ„7) (12.3)
The satisfiability problem is the task of determining, given a CNFformula đ, whether or not there exists a satisfying assignment for đ. Asatisfying assignment for đ is a string đ„ â 0, 1đ such that đ evalu-ates to True if we assign its variables the values of đ„. The SAT problemmight seem as an abstract question of interest only in logic but in factSAT is of huge interest in industrial optimization, with applicationsincluding manufacturing planning, circuit synthesis, software verifica-tion, air-traffic control, scheduling sports tournaments, and more.
2SAT. We say that a formula is a đ-CNF it is an AND of ORs whereeach OR involves exactly đ literals. The đ-SAT problem is the restric-tion of the satisfiability problem for the case that the input formula isa đ-CNF. In particular, the 2SAT problem is to find out, given a 2-CNFformula đ, whether there is an assignment đ„ â 0, 1đ that satisfiesđ, in the sense that it makes it evaluate to 1 or âTrueâ. The trivial,brute-force, algorithm for 2SAT will enumerate all the 2đ assignmentsđ„ â 0, 1đ but fortunately we can do much better. The key is thatwe can think of every constraint of the form âđ âš âđ (where âđ, âđ areliterals, corresponding to variables or their negations) as an implicationâđ â âđ, since it corresponds to the constraints that if the literal ââČđ = âđis true then it must be the case that âđ is true as well. Hence we canthink of đ as a directed graph between the 2đ literals, with an edgefrom âđ to âđ corresponding to an implication from the former to thelatter. It can be shown that đ is unsatisfiable if and only if there is avariable đ„đ such that there is a directed path from đ„đ to đ„đ as well asa directed path from đ„đ to đ„đ (see Exercise 12.2). This reduces 2SATto the (efficiently solvable) problem of determining connectivity indirected graphs.
408 introduction to theoretical computer science
3SAT. The 3SAT problem is the task of determining satisfiabilityfor 3CNFs. One might think that changing from two to three wouldnot make that much of a difference for complexity. One would bewrong. Despite much effort, we do not know of a significantly betterthan brute force algorithm for 3SAT (the best known algorithms takeroughly 1.3đ steps).
Interestingly, a similar issue arises time and again in computation,where the difference between two and three often corresponds tothe difference between tractable and intractable. We do not fully un-derstand the reasons for this phenomenon, though the notion of NPcompleteness we will see later does offer a partial explanation. It maybe related to the fact that optimizing a polynomial often amounts toequations on its derivative. The derivative of a quadratic polynomial islinear, while the derivative of a cubic is quadratic, and, as we will see,the difference between solving linear and quadratic equations can bequite profound.
12.2.2 Solving linear equationsOne of the most useful problems that people have been solving timeand again is solving đ linear equations in đ variables. That is, solveequations of the formđ0,0đ„0 + đ0,1đ„1 + ⯠+ đ0,đâ1đ„đâ1 = đ0đ1,0đ„0 + đ1,1đ„1 + ⯠+ đ1,đâ1đ„đâ1 = đ1âź+ âź + âź + âź =âźđđâ1,0đ„0 + đđâ1,1đ„1 + ⯠+ đđâ1,đâ1đ„đâ1 = đđâ1
(12.4)
where đđ,đđ,đâ[đ] and đđđâ[đ] are real (or rational) numbers. Morecompactly, we can write this as the equations đŽđ„ = đ where đŽ is anđ Ă đ matrix, and we think of đ„, đ are column vectors in âđ.
The standard Gaussian elimination algorithm can be used to solvesuch equations in polynomial time (i.e., determine if they have a so-lution, and if so, to find it). As we discussed above, if we are willingto allow some loss in precision, we even have algorithms that handlelinear inequalities, also known as linear programming. In contrast, ifwe insist on integer solutions, the task of solving for linear equalitiesor inequalities is known as integer programming, and the best knownalgorithms are exponential time in the worst case.
RRemark 12.4 â Bit complexity of numbers. Whenever wediscuss problems whose inputs correspond to num-bers, the input length corresponds to how many bitsare needed to describe the number (or, as is equiv-alent up to a constant factor, the number of digits
efficient computation: an informal introduction 409
in base 10, 16 or any other constant). The differencebetween the length of the input and the magnitudeof the number itself can be of course quite profound.For example, most people would agree that there isa huge difference between having a billion (i.e. 109)dollars and having nine dollars. Similarly there is ahuge difference between an algorithm that takes đsteps on an đ-bit number and an algorithm that takes2đ steps.One example is the problem (discussed below) offinding the prime factors of a given integer đ . Thenatural algorithm is to search for such a factor by try-ing all numbers from 1 to đ , but that would take đsteps which is exponential in the input length, whichis the number of bits needed to describe đ . (The run-ning time of this algorithm can be easily improved toroughly
âđ , but this is still exponential (i.e., 2đ/2) inthe number đ of bits to describe đ .) It is an importantand long open question whether there is such an algo-rithm that runs in time polynomial in the input length(i.e., polynomial in logđ).
12.2.3 Solving quadratic equationsSuppose that we want to solve not just linear but also equations in-volving quadratic terms of the form đđ,đ,đđ„đđ„đ. That is, suppose thatwe are given a set of quadratic polynomials đ1, ⊠, đđ and considerthe equations đđ(đ„) = 0. To avoid issues with bit representations,we will always assume that the equations contain the constraintsđ„2đ â đ„đ = 0đâ[đ]. Since only 0 and 1 satisfy the equation đ2 â đ = 0,this assumption means that we can restrict attention to solutions in0, 1đ. Solving quadratic equations in several variables is a classicaland extremely well motivated problem. This is the generalization ofthe classical case of single-variable quadratic equations that gener-ations of high school students grapple with. It also generalizes thequadratic assignment problem, introduced in the 1950âs as a way tooptimize assignment of economic activities. Once again, we do notknow a much better algorithm for this problem than the one that enu-merates over all the 2đ possibilities.
12.3 MORE ADVANCED EXAMPLES
We now list a few more examples of interesting problems that are alittle more advanced but are of significant interest in areas such asphysics, economics, number theory, and cryptography.
12.3.1 Determinant of a matrixThe determinant of a đ Ă đ matrix đŽ, denoted by det(đŽ), is an ex-tremely important quantity in linear algebra. For example, it is known
410 introduction to theoretical computer science
that det(đŽ) â 0 if and only if đŽ is nonsingular, which means that ithas an inverse đŽâ1, and hence we can always uniquely solve equationsof the form đŽđ„ = đ where đ„ and đ are đ-dimensional vectors. Moregenerally, the determinant can be thought of as a quantitative measureas to what extent đŽ is far from being singular. If the rows of đŽ are âal-mostâ linearly dependent (for example, if the third row is very closeto being a linear combination of the first two rows) then the determi-nant will be small, while if they are far from it (for example, if they areare orthogonal to one another, then the determinant will be large). Inparticular, for every matrix đŽ, the absolute value of the determinantof đŽ is at most the product of the norms (i.e., square root of sum ofsquares of entries) of the rows, with equality if and only if the rowsare orthogonal to one another.
The determinant can be defined in several ways. One way to definethe determinant of an đ Ă đ matrix đŽ is:
det(đŽ) = âđâđđ sign(đ) âđâ[đ] đŽđ,đ(đ) (12.5)
where đđ is the set of all permutations from [đ] to [đ] and the sign ofa permutation đ is equal to â1 raised to the power of the number ofinversions in đ (pairs đ, đ such that đ > đ but đ(đ) < đ(đ)).
This definition suggests that computing det(đŽ) might requiresumming over |đđ| terms which would take exponential time since|đđ| = đ! > 2đ. However, there are other ways to compute the de-terminant. For example, it is known that det is the only function thatsatisfies the following conditions:
1. det(AB) = det(đŽ)det(đ”) for every square matrices đŽ, đ”.
2. For every đ Ă đ triangular matrix đ with diagonal entriesđ0, ⊠, đđâ1, det(đ ) = âđđ=0 đđ. In particular det(đŒ) = 1 where đŒ isthe identity matrix. (A triangular matrix is one in which either allentries below the diagonal, or all entries above the diagonal, arezero.)
3. det(đ) = â1 where đ is a âswap matrixâ that corresponds toswapping two rows or two columns of đŒ . That is, there are two
coordinates đ, đ such that for every đ, đ, đđ,đ = â§âšâ©1 đ = đ , đ â đ, đ1 đ, đ = đ, đ0 otherwise
.
Using these rules and the Gaussian elimination algorithm, it ispossible to tell whether đŽ is singular or not, and in the latter case, de-compose đŽ as a product of a polynomial number of swap matricesand triangular matrices. (Indeed one can verify that the row opera-tions in Gaussian elimination corresponds to either multiplying by a
efficient computation: an informal introduction 411
swap matrix or by a triangular matrix.) Hence we can compute thedeterminant for an đ Ă đ matrix using a polynomial time of arithmeticoperations.
12.3.2 Permanent of a matrixGiven an đ Ă đ matrix đŽ, the permanent of đŽ is defined as
perm(đŽ) = âđâđđ âđâ[đ] đŽđ,đ(đ) . (12.6)
That is, perm(đŽ) is defined analogously to the determinant in (12.5)except that we drop the term sign(đ). The permanent of a matrix is anatural quantity, and has been studied in several contexts includingcombinatorics and graph theory. It also arises in physics where it canbe used to describe the quantum state of multiple Boson particles (seehere and here).
Permanent modulo 2. If the entries of đŽ are integers, then we can de-fine the Boolean function đđđđ2 which outputs on input a matrix đŽthe result of the permanent of đŽ modulo 2. It turns out that we cancompute đđđđ2(đŽ) in polynomial time. The key is that modulo 2, âđ„and +đ„ are the same quantity and hence, since the only differencebetween (12.5) and (12.6) is that some terms are multiplied by â1,det(đŽ) mod 2 = perm(đŽ) mod 2 for every đŽ.
Permanent modulo 3. Emboldened by our good fortune above, wemight hope to be able to compute the permanent modulo any prime đand perhaps in full generality. Alas, we have no such luck. In a similarâtwo to threeâ type of a phenomenon, we do not know of a muchbetter than brute force algorithm to even compute the permanentmodulo 3.12.3.3 Finding a zero-sum equilibriumA zero sum game is a game between two players where the payoff forone is the same as the penalty for the other. That is, whatever the firstplayer gains, the second player loses. As much as we want to avoidthem, zero sum games do arise in life, and the one good thing aboutthem is that at least we can compute the optimal strategy.
A zero sum game can be specified by an đ Ă đ matrix đŽ, where ifplayer 1 chooses action đ and player 2 chooses action đ then player onegets đŽđ,đ and player 2 loses the same amount. The famous Min MaxTheorem by John von Neumann states that if we allow probabilistic orâmixedâ strategies (where a player does not choose a single action butrather a distribution over actions) then it does not matter who playsfirst and the end result will be the same. Mathematically the min maxtheorem is that if we let Îđ be the set of probability distributions over
412 introduction to theoretical computer science
[đ] (i.e., non-negative columns vectors in âđ whose entries sum to 1)then
maxđââđ minđââđ đâ€đŽđ = minđââđ maxđââđ đâ€đŽđ (12.7)
The min-max theorem turns out to be a corollary of linear pro-gramming duality, and indeed the value of (12.7) can be computedefficiently by a linear program.
12.3.4 Finding a Nash equilibriumFortunately, not all real-world games are zero sum, and we do havemore general games, where the payoff of one player does not neces-sarily equal the loss of the other. John Nash won the Nobel prize forshowing that there is a notion of equilibrium for such games as well.In many economic texts it is taken as an article of faith that whenactual agents are involved in such a game then they reach a Nashequilibrium. However, unlike zero sum games, we do not know ofan efficient algorithm for finding a Nash equilibrium given the de-scription of a general (non zero sum) game. In particular this meansthat, despite economistsâ intuitions, there are games for which naturalstrategies will take an exponential number of steps to converge to anequilibrium.
12.3.5 Primality testingAnother classical computational problem, that has been of interestsince the ancient Greeks, is to determine whether a given numberđ is prime or composite. Clearly we can do so by trying to divide itwith all the numbers in 2, ⊠, đ â 1, but this would take at least đsteps which is exponential in its bit complexity đ = logđ . We canreduce these đ steps to
âđ by observing that if đ is a composite ofthe form đ = PQ then either đ or đ is smaller than
âđ . But this isstill quite terrible. If đ is a 1024 bit integer,
âđ is about 2512, and sorunning this algorithm on such an input would take much more thanthe lifetime of the universe.
Luckily, it turns out we can do radically better. In the 1970âs, Ra-bin and Miller gave probabilistic algorithms to determine whether agiven number đ is prime or composite in time đđđđŠ(đ) for đ = logđ .We will discuss the probabilistic model of computation later in thiscourse. In 2002, Agrawal, Kayal, and Saxena found a deterministicđđđđŠ(đ) time algorithm for this problem. This is surely a developmentthat mathematicians from Archimedes till Gauss would have foundexciting.
efficient computation: an informal introduction 413
Figure 12.6: The current computational status ofseveral interesting problems. For all of them we eitherknow a polynomial-time algorithm or the knownalgorithms require at least 2đđ for some đ > 0. Infact for all except the factoring problem, we eitherknow an đ(đ3) time algorithm or the best knownalgorithm require at least 2Ω(đ) time where đ is anatural parameter such that there is a brute forcealgorithm taking roughly 2đ or đ! time. Whether thisâcliffâ between the easy and hard problem is a realphenomenon or a reflection of our ignorance is still anopen question.
12.3.6 Integer factoringGiven that we can efficiently determine whether a number đ is primeor composite, we could expect that in the latter case we could also ef-ficiently find the factorization of đ . Alas, no such algorithm is known.In a surprising and exciting turn of events, the non existence of such analgorithm has been used as a basis for encryptions, and indeed it un-derlies much of the security of the world wide web. We will return tothe factoring problem later in this course. We remark that we do knowmuch better than brute force algorithms for this problem. While thebrute force algorithms would require 2Ω(đ) time to factor an đ-bit inte-ger, there are known algorithms running in time roughly 2đ(âđ) andalso algorithms that are widely believed (though not fully rigorouslyanalyzed) to run in time roughly 2đ(đ1/3). (By âroughlyâ we mean thatwe neglect factors that are polylogarithmic in đ.)12.4 OUR CURRENT KNOWLEDGE
The difference between an exponential and polynomial time algo-rithm might seem merely âquantitativeâ but it is in fact extremelysignificant. As weâve already seen, the brute force exponential timealgorithm runs out of steam very very fast, and as Edmonds says, inpractice there might not be much difference between a problem wherethe best algorithm is exponential and a problem that is not solvableat all. Thus the efficient algorithms we mentioned above are widelyused and power many computer science applications. Moreover, apolynomial-time algorithm often arises out of significant insight tothe problem at hand, whether it is the âmax-flow min-cutâ result, thesolvability of the determinant, or the group theoretic structure thatenables primality testing. Such insight can be useful regardless of itscomputational implications.
At the moment we do not know whether the âhardâ problems aretruly hard, or whether it is merely because we havenât yet found theright algorithms for them. However, we will now see that there areproblems that do inherently require exponential time. We just donâtknow if any of the examples above fall into that category.
Chapter Recap
âą There are many natural problems that havepolynomial-time algorithms, and other naturalproblems that weâd love to solve, but for which thebest known algorithms are exponential.
âą Often a polynomial time algorithm relies on dis-covering some hidden structure in the problem, orfinding a surprising equivalent formulation for it.
414 introduction to theoretical computer science
2 Hint: Use dynamic programming to compute forevery đ , đĄ â [đ] and đ â [đ] the value đ(đ , đĄ, đ)which equals 1 if there is a simple path from đ to đĄthat uses exactly the vertices in đ. Do this iterativelyfor đâs of growing sizes.
âą There are many interesting problems where thereis an exponential gap between the best known algo-rithm and the best algorithm that we can rule out.Closing this gap is one of the main open questionsof theoretical computer science.
12.5 EXERCISES
Exercise 12.1 â exponential time algorithm for longest path. The naive algo-rithm for computing the longest path in a given graph could takemore than đ! steps. Give a đđđđŠ(đ)2đ time algorithm for the longestpath problem in đ vertex graphs.2
Exercise 12.2 â 2SAT algorithm. For every 2CNF đ, define the graph đșđon 2đ vertices corresponding to the literals đ„1, ⊠, đ„đ, đ„1, ⊠, đ„đ, suchthat there is an edge âđ âđ iff the constraint âđ âš âđ is in đ. Prove that đis unsatisfiable if and only if there is some đ such that there is a pathfrom đ„đ to đ„đ and from đ„đ to đ„đ in đșđ. Show how to use this to solve2SAT in polynomial time.
Exercise 12.3 â Reductions for showing algorithms. The following fact istrue: there is a polynomial-time algorithm BIP that on input a graphđș = (đ , đž) outputs 1 if and only if the graph is bipartite: there is apartition of đ to disjoint parts đ and đ such that every edge (đą, đŁ) â đžsatisfies either đą â đ and đŁ â đ or đą â đ and đŁ â đ. Use thisfact to prove that there is a polynomial-time algorithm to computethat following function CLIQUEPARTITION that on input a graphđș = (đ , đž) outputs 1 if and only if there is a partition of đ the graphinto two parts đ and đ such that both đ and đ are cliques: for everypair of distinct vertices đą, đŁ â đ, the edge (đą, đŁ) is in đž and similarly forevery pair of distinct vertices đą, đŁ â đ , the edge (đą, đŁ) is in đž.
12.6 BIBLIOGRAPHICAL NOTES
The classic undergraduate introduction to algorithms text is[Cor+09]. Two texts that are less âencyclopedicâ are Kleinberg andTardos [KT06], and Dasgupta, Papadimitriou and Vazirani [DPV08].Jeff Ericksonâs book is an excellent algorithms text that is freelyavailable online.
The origins of the minimum cut problem date to the Cold War.Specifically, Ford and Fulkerson discovered their max-flow/min-cutalgorithm in 1955 as a way to find out the minimum amount of train
efficient computation: an informal introduction 415
tracks that would need to be blown up to disconnect Russia from therest of Europe. See the survey [Sch05] for more.
Some algorithms for the longest path problem are given in [Wil09;Bjo14].
12.7 FURTHER EXPLORATIONS
Some topics related to this chapter that might be accessible to ad-vanced students include: (to be completed)
13Modeling running time
âWhen the measure of the problem-size is reasonable and when thesizes assume values arbitrarily large, an asymptotic estimate of ⊠the or-der of difficulty of [an] algorithm .. is theoretically important. It cannotbe rigged by making the algorithm artificially difficult for smaller sizesâ,Jack Edmonds, âPaths, Trees, and Flowersâ, 1963
Max Newman: It is all very well to say that a machine could ⊠do this orthat, but ⊠what about the time it would take to do it?Alan Turing: To my mind this time factor is the one question which willinvolve all the real technical difficulty.BBC radio panel on âCan automatic Calculating Machines Be Said toThink?â, 1952
In Chapter 12 we saw examples of efficient algorithms, and madesome claims about their running time, but did not give a mathemati-cally precise definition for this concept. We do so in this chapter, usingthe models of Turing machines and RAM machines (or equivalentlyNAND-TM and NAND-RAM) we have seen before. The runningtime of an algorithm is not a fixed number since any non-trivial algo-rithm will take longer to run on longer inputs. Thus, what we wantto measure is the dependence between the number of steps the algo-rithms takes and the length of the input. In particular we care aboutthe distinction between algorithms that take at most polynomial time(i.e., đ(đđ) time for some constant đ) and problems for which everyalgorithm requires at least exponential time (i.e., Ω(2đđ) for some đ). Asmentioned in Edmondâs quote in Chapter 12, the difference betweenthese two can sometimes be as important as the difference betweenbeing computable and uncomputable.
This chapter: A non-mathy overviewIn this chapter we formally define what it means for a func-tion to be computable in a certain number of steps. As dis-cussed in Chapter 12, running time is not a number, rather
Compiled on 8.26.2020 18:10
Learning Objectives:âą Formally modeling running time, and in
particular notions such as đ(đ) or đ(đ3) timealgorithms.
âą The classes P and EXP modelling polynomialand exponential time respectively.
âą The time hierarchy theorem, that in particularsays that for every đ â„ 1 there are functionswe can compute in đ(đđ+1) time but can notcompute in đ(đđ) time.
âą The class P/poly of non uniform computationand the result that P â P/poly
418 introduction to theoretical computer science
Figure 13.1: Overview of the results of this chapter.
what we care about is the scaling behevaiour of the numberof steps as the input size grows. We can use either Turingmachines or RAM machines to give such a formal definition- it turns out that this doesnât make a difference at the reso-lution we care about. We make several important definitionsand prove some important theorems in this chapter. We willdefine the main time complexity classes we use in this book,and also show the Time Hierarchy Theorem which states thatgiven more resources (more time steps per input size) we cancompute more functions.
To put this in more âmathyâ language, in this chapter we definewhat it means for a function đč ⶠ0, 1â â 0, 1â to be computablein time đ (đ) steps, where đ is some function mapping the length đof the input to the number of computation steps allowed. Using thisdefinition we will do the following (see also Fig. 13.1):
âą We define the class P of Boolean functions that can be computedin polynomial time and the class EXP of functions that can be com-puted in exponential time. Note that P â EXP if we can compute afunction in polynomial time, we can certainly compute it in expo-nential time.
âą We show that the times to compute a function using a Turing Ma-chine and using a RAM machine (or NAND-RAM program) arepolynomially related. In particular this means that the classes P andEXP are identical regardless of whether they are defined usingTuring Machines or RAM machines / NAND-RAM programs.
modeling running time 419
âą We give an efficient universal NAND-RAM program and use this toestablish the time hierarchy theorem that in particular implies that P isa strict subset of EXP.
âą We relate the notions defined here to the non uniform models ofBoolean circuits and NAND-CIRC programs defined in Chapter 3.We define P/poly to be the class of functions that can be computedby a sequence of polynomial-sized circuits. We prove that P â P/polyand that P/poly contains uncomputable functions.
13.1 FORMALLY DEFINING RUNNING TIME
Our models of computation such Turing Machines, NAND-TM andNAND-RAM programs and others all operate by executing a se-quence of instructions on an input one step at a time. We can definethe running time of an algorithm đ in one of these models by measur-ing the number of steps đ takes on input đ„ as a function of the length|đ„| of the input. We start by defining running time with respect to Tur-ing Machines:
Definition 13.1 â Running time (Turing Machines). Let đ ⶠâ â â be somefunction mapping natural numbers to natural numbers. We saythat a function đč ⶠ0, 1â â 0, 1â is computable in đ (đ) Turing-Machine time (TM-time for short) if there exists a Turing Machine đsuch that for every sufficiently large đ and every đ„ â 0, 1đ, whengiven input đ„, the machine đ halts after executing at most đ (đ)steps and outputs đč(đ„).
We define TIMETM(đ (đ)) to be the set of Boolean functions(functions mapping 0, 1â to 0, 1) that are computable in đ (đ)TM time.
Big Idea 17 For a function đč ⶠ0, 1â â 0, 1 and đ â¶â â â, we can formally define what it means for đč to be computablein time at most đ (đ) where đ is the size of the input.
PDefinition 13.1 is not very complicated but is one ofthe most important definitions of this book. As usual,TIMETM(đ (đ)) is a class of functions, not of machines. Ifđ is a Turing Machine then a statement such as âđis a member of TIMETM(đ2)â does not make sense.The concept of TM-time as defined here is sometimesknown as âsingle-tape Turing machine timeâ in theliterature, since some texts consider Turing machineswith more than one working tape.
420 introduction to theoretical computer science
Figure 13.2: Comparing đ (đ) = 10đ3 with đ âČ(đ) =2đ (on the right figure the Y axis is in log scale).Since for every large enough đ, đ âČ(đ) â„ đ (đ),TIMETM(đ (đ)) â TIMETM(đ âČ(đ)).
The relaxation of considering only âsufficiently largeâ đâs is notvery important but it is convenient since it allows us to avoid dealingexplicitly with un-interesting âedge casesâ. We will mostly anyway beinterested in determining running time only up to constant and evenpolynomial factors.
While the notion of being computable within a certain running timecan be defined for every function, the class TIMETM(đ (đ)) is a classof Boolean functions that have a single bit of output. This choice is notvery important, but is made for simplicity and convenience later on.In fact, every non-Boolean function has a computationally equivalentBoolean variant, see Exercise 13.3.Solved Exercise 13.1 â Example of time bounds. Prove that TIMETM(10â đ3) âTIMETM(2đ).
Solution:
The proof is illustrated in Fig. 13.2. Suppose that đč â TIMETM(10â đ3) and hence there some number đ0 and a machine đ such thatfor every đ > đ0, and đ„ â 0, 1â, đ(đ„) outputs đč(đ„) within atmost 10 â đ3 steps. Since 10 â đ3 = đ(2đ), there is some numberđ1 such that for every đ > đ1, 10 â đ3 < 2đ. Hence for everyđ > maxđ0, đ1, đ(đ„) will output đč(đ„) within at most 2đ steps,just demonstrating that đč â TIMETM(2đ).
13.1.1 Polynomial and Exponential TimeUnlike the notion of computability, the exact running time can be afunction of the model we use. However, it turns out that if we onlycare about âcoarse enoughâ resolution (as will most often be the case)then the choice of the model, whether Turing Machines, RAM ma-chines, NAND-TM/NAND-RAM programs, or C/Python programs,does not matter. This is known as the extended Church-Turing Thesis.Specifically we will mostly care about the difference between polyno-mial and exponential time.
The two main time complexity classes we will be interested in arethe following:
âą Polynomial time: A function đč ⶠ0, 1â â 0, 1 is computable inpolynomial time if it is in the class P = âȘđâ1,2,3,âŠTIMETM(đđ). Thatis, đč â P if there is an algorithm to compute đč that runs in time atmost polynomial (i.e., at most đđ for some constant đ) in the lengthof the input.
âą Exponential time: A function đč ⶠ0, 1â â 0, 1 is computable inexponential time if it is in the class EXP = âȘđâ1,2,3,âŠTIMETM(2đđ).
modeling running time 421
That is, đč â EXP if there is an algorithm to compute đč that runs intime at most exponential (i.e., at most 2đđ for some constant đ) in thelength of the input.
In other words, these are defined as follows:
Definition 13.2 â P and EXP. Let đč ⶠ0, 1â â 0, 1. We say thatđč â P if there is a polynomial đ ⶠâ â â and a Turing Machineđ such that for every đ„ â 0, 1â, when given input đ„, the Turingmachine halts within at most đ(|đ„|) steps and outputs đč(đ„).
We say that đč â EXP if there is a polynomial đ ⶠâ â â anda Turing Machine đ such that for every đ„ â 0, 1â, when giveninput đ„, đ halts within at most 2đ(|đ„|) steps and outputs đč(đ„).
PPlease take the time to make sure you understandthese definitions. In particular, sometimes studentsthink of the class EXP as corresponding to functionsthat are not in P. However, this is not the case. If đč isin EXP then it can be computed in exponential time.This does not mean that it cannot be computed inpolynomial time as well.
Solved Exercise 13.2 â Differerent definitions of P. Prove that P as defined inDefinition 13.2 is equal to âȘđâ1,2,3,âŠTIMETM(đđ)
Solution:
To show these two sets are equal we need to show that P ââȘđâ1,2,3,âŠTIMETM(đđ) and âȘđâ1,2,3,âŠTIMETM(đđ) â P. We startwith the former inclusion. Suppose that đč â P. Then there is somepolynomial đ ⶠâ â â and a Turing machine đ such that đcomputes đč and đ halts on every input đ„ within at most đ(|đ„|)steps. We can write the polynomial đ ⶠâ â â in the formđ(đ) = âđđ=0 đđđđ where đ0, ⊠, đđ â â, and we assume that đđis nonzero (or otherwise we just let đ correspond to the largestnumber such that đđ is nonzero). The degree if đ the number đ.Since đđ = đ(đđ+1), no matter what is the coefficient đđ, for largeenough đ, đ(đ) < đđ+1 which means that the Turing machine đwill halt on inputs of length đ within fewer than đđ+1 steps, andhence đč â TIMETM(đđ+1) â âȘđâ1,2,3,âŠTIMETM(đđ).
For the second inclusion, suppose that đč â âȘđâ1,2,3,âŠTIMETM(đđ).Then there is some positive đ â â such that đč â TIMETM(đđ) whichmeans that there is a Turing Machine đ and some number đ0 such
422 introduction to theoretical computer science
Figure 13.3: Some examples of problems that areknown to be in P and problems that are known tobe in EXP but not known whether or not they arein P. Since both P and EXP are classes of Booleanfunctions, in this figure we always refer to the Boolean(i.e., Yes/No) variant of the problems.
that đ computes đč and for every đ > đ0, đ halts on length đinputs within at most đđ steps. Let đ0 be the maximum numberof steps that đ takes on inputs of length at most đ0. Then if wedefine the polynomial đ(đ) = đđ + đ0 then we see that đ halts onevery input đ„ within at most đ(|đ„|) steps and hence the existence ofđ demonstrates that đč â P.
Since exponential time is much larger than polynomial time, P âEXP. All of the problems we listed in Chapter 12 are in EXP, but asweâve seen, for some of them there are much better algorithms thatdemonstrate that they are in fact in the smaller class P.
P EXP (but not known to be in P)Shortest path Longest PathMin cut Max cut2SAT 3SATLinear eqs Quad. eqsZerosum NashDeterminant PermanentPrimality Factoring
Table : A table of the examples from Chapter 12. All these problemsare in EXP but only the ones on the left column are currently known tobe in P as well (i.e., they have a polynomial-time algorithm). See alsoFig. 13.3.
RRemark 13.3 â Boolean versions of problems. Manyof the problems defined in Chapter 12 correspond tonon Boolean functions (functions with more than onebit of output) while P and EXP are sets of Booleanfunctions. However, for every non-Boolean functionđč we can always define a computationally-equivalentBoolean function đș by letting đș(đ„, đ) be the đ-th bitof đč(đ„) (see Exercise 13.3). Hence the table above,as well as Fig. 13.3, refer to the computationally-equivalent Boolean variants of these problems.
13.2 MODELING RUNNING TIME USING RAM MACHINES / NAND-RAM
Turing Machines are a clean theoretical model of computation, butdo not closely correspond to real-world computing architectures. Thediscrepancy between Turing Machines and actual computers doesnot matter much when we consider the question of which functions
modeling running time 423
are computable, but can make a difference in the context of efficiency.Even a basic staple of undergraduate algorithms such as âmerge sortâcannot be implemented on a Turing Machine in đ(đ logđ) time (seeSection 13.8). RAM machines (or equivalently, NAND-RAM programs)match more closely actual computing architecture and what we meanwhen we say đ(đ) or đ(đ logđ) algorithms in algorithms coursesor whiteboard coding interviews. We can define running time withrespect to NAND-RAM programs just as we did for Turing Machines.
Definition 13.4 â Running time (RAM). Let đ ⶠâ â â be some func-tion mapping natural numbers to natural numbers. We say thata function đč ⶠ0, 1â â 0, 1â is computable in đ (đ) RAM time(RAM-time for short) if there exists a NAND-RAM program đ suchthat for every sufficiently large đ and every đ„ â 0, 1đ, when giveninput đ„, the program đ halts after executing at most đ (đ) lines andoutputs đč(đ„).
We define TIMERAM(đ (đ)) to be the set of Boolean functions(functions mapping 0, 1â to 0, 1) that are computable in đ (đ)RAM time.
Because NAND-RAM programs correspond more closely to ournatural notions of running time, we will use NAND-RAM as ourâdefaultâ model of running time, and hence use TIME(đ (đ)) (withoutany subscript) to denote TIMERAM(đ (đ)). However, it turns out thatas long as we only care about the difference between exponential andpolynomial time, this does not make much difference. The reason isthat Turing Machines can simulate NAND-RAM programs with atmost a polynomial overhead (see also Fig. 13.4):
Theorem 13.5 â Relating RAM and Turing machines. Let đ ⶠâ â â be afunction such that đ (đ) â„ đ for every đ and the map đ ⊠đ (đ) canbe computed by a Turing machine in time đ(đ (đ)). Then
TIMETM(đ (đ)) â TIMERAM(10 â đ (đ)) â TIMETM(đ (đ)4) . (13.1)
PThe technical details of Theorem 13.5, such as the con-dition that đ ⊠đ (đ) is computable in đ(đ (đ)) timeor the constants 10 and 4 in (13.1) (which are not tightand can be improved), are not very important. In par-ticular, all non pathological time bound functions weencounter in practice such as đ (đ) = đ, đ (đ) = đ logđ,đ (đ) = 2đ etc. will satisfy the conditions of Theo-rem 13.5, see also Remark 13.6.
424 introduction to theoretical computer science
Figure 13.4: The proof of Theorem 13.5 shows thatwe can simulate đ steps of a Turing Machine with đsteps of a NAND-RAM program, and can simulateđ steps of a NAND-RAM program with đ(đ 4)steps of a Turing Machine. Hence TIMETM(đ (đ)) âTIMERAM(10 â đ (đ)) â TIMETM(đ (đ)4).
The main message of the theorem is Turing Machinesand RAM machines are âroughly equivalentâ in thesense that one can simulate the other with polyno-mial overhead. Similarly, while the proof involvessome technical details, itâs not very deep or hard, andmerely follows the simulation of RAM machines withTuring Machines we saw in Theorem 8.1 with morecareful âbook keepingâ.
For example, by instantiating Theorem 13.5 with đ (đ) = đđ andusing the fact that 10đđ = đ(đđ+1), we see that TIMETM(đđ) âTIMERAM(đđ+1) â TIMETM(đ4đ+4) which means that (by Solved Ex-ercise 13.2)
P = âȘđ=1,2,âŠTIMETM(đđ) = âȘđ=1,2,âŠTIMERAM(đđ) . (13.2)That is, we could have equally well defined P as the class of functionscomputable by NAND-RAM programs (instead of Turing Machines)that run in time polynomial in the length of the input. Similarly, byinstantiating Theorem 13.5 with đ (đ) = 2đđ we see that the class EXPcan also be defined as the set of functions computable by NAND-RAMprograms in time at most 2đ(đ) where đ is some polynomial. Similarequivalence results are known for many models including cellularautomata, C/Python/Javascript programs, parallel computers, and agreat many other models, which justifies the choice of P as capturinga technology-independent notion of tractability. (See Section 13.3for more discussion of this issue.) This equivalence between Turingmachines and NAND-RAM (as well as other models) allows us topick our favorite model depending on the task at hand (i.e., âhave ourcake and eat it tooâ) even when we study questions of efficiency, aslong as we only care about the gap between polynomial and exponentialtime. When we want to design an algorithm, we can use the extrapower and convenience afforded by NAND-RAM. When we wantto analyze a program or prove a negative result, we can restrict ourattention to Turing machines.
Big Idea 18 All âreasonableâ computational models are equiv-alent if we only care about the distinction between polynomial andexponential.
The adjective âreasonableâ above refers to all scalable computa-tional models that have been implemented, with the possible excep-tion of quantum computers, see Section 13.3 and Chapter 23.Proof Idea:
The direction TIMETM(đ (đ)) â TIMERAM(10 â đ (đ)) is not hard toshow, since a NAND-RAM program đ can simulate a Turing Machine
modeling running time 425
đ with constant overhead by storing the transition table of đ inan array (as is done in the proof of Theorem 9.1). Simulating everystep of the Turing machine can be done in a constant number đ ofsteps of RAM, and it can be shown this constant đ is smaller than10. Thus the heart of the theorem is to prove that TIMERAM(đ (đ)) âTIMETM(đ (đ)4). This proof closely follows the proof of Theorem 8.1,where we have shown that every function đč that is computable bya NAND-RAM program đ is computable by a Turing Machine (orequivalently a NAND-TM program) đ . To prove Theorem 13.5, wefollow the exact same proof but just check that the overhead of thesimulation of đ by đ is polynomial. The proof has many details, butis not deep. It is therefore much more important that you understandthe statement of this theorem than its proof.âProof of Theorem 13.5. We only focus on the non-trivial directionTIMERAM(đ (đ)) â TIMETM(đ (đ)4). Let đč â TIMERAM(đ (đ)). đč canbe computed in time đ (đ) by some NAND-RAM program đ and weneed to show that it can also be computed in time đ (đ)4 by a TuringMachine đ . This will follow from showing that đč can be computedin time đ (đ)4 by a NAND-TM program, since for every NAND-TMprogram đ there is a Turing Machine đ simulating it such that eachiteration of đ corresponds to a single step of đ .
As mentioned above, we follow the proof of Theorem 8.1 (simula-tion of NAND-RAM programs using NAND-TM programs) and usethe exact same simulation, but with a more careful accounting of thenumber of steps that the simulation costs. Recall, that the simulationof NAND-RAM works by âpeeling offâ features of NAND-RAM oneby one, until we are left with NAND-TM.
We will not provide the full details but will present the main ideasused in showing that every feature of NAND-RAM can be simulatedby NAND-TM with at most a polynomial overhead:
1. Recall that every NAND-RAM variable or array element can con-tain an integer between 0 and đ where đ is the number of lines thathave been executed so far. Therefore if đ is a NAND-RAM pro-gram that computes đč in đ (đ) time, then on inputs of length đ, allintegers used by đ are of magnitude at most đ (đ). This means thatthe largest value i can ever reach is at most đ (đ) and so each one ofđ âs variables can be thought of as an array of at most đ (đ) indices,each of which holds a natural number of magnitude at most đ (đ).We let â = âlogđ (đ)â be the number of bits needed to encode suchnumbers. (We can start off the simulation by computing đ (đ) andâ.)
426 introduction to theoretical computer science
2. We can encode a NAND-RAM array of length †đ (đ) containingnumbers in 0, ⊠, đ (đ) â 1 as an Boolean (i.e., NAND-TM) arrayof đ (đ)â = đ(đ (đ) logđ (đ)) bits, which we can also think of asa two dimensional array as we did in the proof of Theorem 8.1. Weencode a NAND-RAM scalar containing a number in 0, ⊠, đ (đ) â1 simply by a shorter NAND-TM array of â bits.
3. We can simulate the two dimensional arrays using one-dimensional arrays of length đ (đ)â = đ(đ (đ) logđ (đ). All thearithmetic operations on integers use the grade-school algorithms,that take time that is polynomial in the number â of bits of theintegers, which is đđđđŠ(logđ (đ)) in our case. Hence we can simulateđ (đ) steps of NAND-RAM with đ(đ (đ)đđđđŠ(logđ (đ)) steps of amodel that uses random access memory but only Boolean-valuedone-dimensional arrays.
4. The most expensive step is to translate from random access mem-ory to the sequential memory model of NAND-TM/Turing Ma-chines. As we did in the proof of Theorem 8.1 (see Section 8.2), wecan simulate accessing an array Foo at some location encoded in anarray Bar by:
a. Copying Bar to some temporary array Tempb. Having an array Index which is initially all zeros except 1 at the
first location.c. Repeating the following until Temp encodes the number 0:
(Number of repetitions is at most đ (đ).)âą Decrease the number encoded temp by 1. (Take number of steps
polynomial in â = âlogđ (đ)â.)âą Decrease i until it is equal to 0. (Take đ(đ (đ) steps.)âą Scan Index until we reach the point in which it equals 1 and
then change this 1 to 0 and go one step further and write 1 inthis location. (Takes đ(đ (đ)) steps.)
d. When we are done we know that if we scan Index until we reachthe point in which Index[i]= 1 then i contains the value thatwas encoded by Bar (Takes đ(đ (đ) steps.)
The total cost for each such operation is đ(đ (đ)2+đ (đ)đđđđŠ(logđ (đ))) =đ(đ (đ)2) steps.In sum, we simulate a single step of NAND-RAM usingđ(đ (đ)2đđđđŠ(logđ (đ))) steps of NAND-TM, and hence the total
simulation time is đ(đ (đ)3đđđđŠ(logđ (đ))) which is smaller than đ (đ)4for sufficiently large đ.
modeling running time 427
RRemark 13.6 â Nice time bounds. When consideringgeneral time bounds such we need to make sure torule out some âpathologicalâ cases such as functions đthat donât give enough time for the algorithm to readthe input, or functions where the time bound itself isuncomputable. We say that a function đ ⶠâ â â isa nice time bound function (or nice function for short)if for every đ â â, đ (đ) â„ đ (i.e., đ allows enoughtime to read the input), for every đâČ â„ đ, đ (đâČ) â„ đ (đ)(i.e., đ allows more time on longer inputs), and themap đč(đ„) = 1đ (|đ„|) (i.e., mapping a string of lengthđ to a sequence of đ (đ) ones) can be computed by aNAND-RAM program in đ(đ (đ)) time.All the ânormalâ time complexity bounds we en-counter in applications such as đ (đ) = 100đ,đ (đ) = đ2 logđ,đ (đ) = 2âđ, etc. are âniceâ.Hence from now on we will only care about theclass TIME(đ (đ)) when đ is a âniceâ function. Thecomputability condition is in particular typically easilysatisfied. For example, for arithmetic functions suchas đ (đ) = đ3, we can typically compute the binaryrepresentation of đ (đ) in time polynomial in the num-ber of bits of đ (đ) and hence poly-logarithmic in đ (đ).Hence the time to write the string 1đ (đ) in such caseswill be đ (đ) + đđđđŠ(logđ (đ)) = đ(đ (đ)).
13.3 EXTENDED CHURCH-TURING THESIS (DISCUSSION)
Theorem 13.5 shows that the computational models of Turing Machinesand RAMMachines / NAND-RAM programs are equivalent up to poly-nomial factors in the running time. Other examples of polynomiallyequivalent models include:
âą All standard programming languages, including C/Python/-JavaScript/Lisp/etc.
âą The đ calculus (see also Section 13.8).
âą Cellular automata
âą Parallel computers
âą Biological computing devices such as DNA-based computers.
The Extended Church Turing Thesis is the statement that this is truefor all physically realizable computing models. In other words, theextended Church Turing thesis says that for every scalable computingdevice đ¶ (which has a finite description but can be in principle usedto run computation on arbitrarily large inputs), there is some con-stant đ such that for every function đč ⶠ0, 1â â 0, 1 that đ¶ can
428 introduction to theoretical computer science
compute on đ length inputs using an đ(đ) amount of physical re-sources, đč is in TIME(đ(đ)đ). This is a strengthening of the (âplainâ)Church-Turing Thesis, discussed in Section 8.8, which states that theset of computable functions is the same for all physically realizablemodels, but without requiring the overhead in the simulation betweendifferent models to be at most polynomial.
All the current constructions of scalable computational models andprogramming languages conform to the Extended Church-TuringThesis, in the sense that they can be simulated with polynomial over-head by Turing Machines (and hence also by NAND-TM or NAND-RAM programs). Consequently, the classes P and EXP are robust tothe choice of model, and we can use the programming language ofour choice, or high level descriptions of an algorithm, to determinewhether or not a problem is in P.
Like the Church-Turing thesis itself, the extended Church-Turingthesis is in the asymptotic setting and does not directly yield an ex-perimentally testable prediction. However, it can be instantiated withmore concrete bounds on the overhead, yielding experimentally-testable predictions such as the Physical Extended Church-Turing Thesiswe mentioned in Section 5.6.
In the last hundred+ years of studying and mechanizing com-putation, no one has yet constructed a scalable computing devicethat violates the extended Church Turing Thesis. However, quan-tum computing, if realized, will pose a serious challenge to the ex-tended Church-Turing Thesis (see Chapter 23). However, even ifthe promises of quantum computing are fully realized, the extendedChurch-Turing thesis is âmorallyâ correct, in the sense that, while wedo need to adapt the thesis to account for the possibility of quantumcomputing, its broad outline remains unchanged. We are still ableto model computation mathematically, we can still treat programsas strings and have a universal program, we still have time hierarchyand uncomputability results, and there is still no reason to doubt the(âplainâ) Church-Turing thesis. Moreover, the prospect of quantumcomputing does not seem to make a difference for the time complexityof many (though not all!) of the concrete problems that we care about.In particular, as far as we know, out of all the example problems men-tioned in Chapter 12 the complexity of only oneâ integer factoringâis affected by modifying our model to include quantum computers aswell.
modeling running time 429
Figure 13.5: The universal NAND-RAM programđ simulates an input NAND-RAM program đby storing all of đ âs variables inside a single arrayVars of đ . If đ has đĄ variables, then the array Varsis divided into blocks of length đĄ, where the đ-thcoordinate of the đ-th block contains the đ-th elementof the đ-th array of đ . If the đ-th variable of đ isscalar, then we just store its value in the zeroth blockof Vars.
13.4 EFFICIENT UNIVERSAL MACHINE: A NAND-RAM INTER-PRETER IN NAND-RAM
We have seen in Theorem 9.1 the âuniversal Turing Machineâ. Exam-ining that proof, and combining it with Theorem 13.5 , we can see thatthe program đ has a polynomial overhead, in the sense that it can sim-ulate đ steps of a given NAND-TM (or NAND-RAM) program đ onan input đ„ in đ(đ 4) steps. But in fact, by directly simulating NAND-RAM programs we can do better with only a constant multiplicativeoverhead. That is, there is a universal NAND-RAM program đ such thatfor every NAND-RAM program đ , đ simulates đ steps of đ usingonly đ(đ ) steps. (The implicit constant in the đ notation can dependon the program đ but does not depend on the length of the input.)
Theorem 13.7 â Efficient universality of NAND-RAM. There exists a NAND-RAM program đ satisfying the following:
1. (đ is a universal NAND-RAM program.) For every NAND-RAMprogram đ and input đ„, đ(đ , đ„) = đ(đ„) where by đ(đ , đ„) wedenote the output of đ on a string encoding the pair (đ , đ„).
2. (đ is efficient.) There are some constants đ, đ such that for ev-ery NAND-RAM program đ , if đ halts on input đ„ after mostđ steps, then đ(đ , đ„) halts after at most đ¶ â đ steps where𶠆đ|đ |đ.
PAs in the case of Theorem 13.5, the proof of Theo-rem 13.7 is not very deep and so it is more importantto understand its statement. Specifically, if you under-stand how you would go about writing an interpreterfor NAND-RAM using a modern programming lan-guage such as Python, then you know everything youneed to know about the proof of this theorem.
Proof of Theorem 13.7. To present a universal NAND-RAM programin full we would need to describe a precise representation scheme,as well as the full NAND-RAM instructions for the program. Whilethis can be done, it is more important to focus on the main ideas, andso we just sketch the proof here. A specification of NAND-RAM isgiven in the appendix, and for the purposes of this simulation, we cansimply use the representation of the code NAND-RAM as an ASCIIstring.
430 introduction to theoretical computer science
The program đ gets as input a NAND-RAM program đ and aninput đ„ and simulates đ one step at a time. To do so, đ does the fol-lowing:
1. đ maintains variables program_counter, and number_steps for thecurrent line to be executed and the number of steps executed so far.
2. đ initially scans the code of đ to find the number đĄ of unique vari-able names that đ uses. It will translate each variable name into anumber between 0 and đĄ â 1 and use an array Program to store đ âscode where for every line â, Program[â] will store the â-th line of đwhere the variable names have been translated to numbers. (Moreconcretely, we will use a constant number of arrays to separatelyencode the operation used in this line, and the variable names andindices of the operands.)
3. đ maintains a single array Vars that contains all the values of đ âsvariables. We divide Vars into blocks of length đĄ. If đ is a num-ber corresponding to an array variable Foo of đ , then we storeFoo[0] in Vars[đ ], we store Foo[1] in Var_values[đĄ + đ ], Foo[2]in Vars[2đĄ + đ ] and so on and so forth (see Fig. 13.5). Generally,ifthe đ -th variable of đ is a scalar variable, then its value will bestored in location Vars[đ ]. If it is an array variable then the value ofits đ-th element will be stored in location Vars[đĄ â đ + đ ].
4. To simulate a single step of đ , the program đ recovers from Pro-gram the line corresponding to program_counter and executes it.Since NAND-RAM has a constant number of arithmetic operations,we can implement the logic of which operation to execute using asequence of a constant number of if-then-elseâs. Retrieving fromVars the values of the operands of each instruction can be doneusing a constant number of arithmetic operations.
The setup stages take only a constant (depending on |đ | but noton the input đ„) number of steps. Once we are done with the setup, tosimulate a single step of đ , we just need to retrieve the correspondingline and do a constant number of âif elsesâ and accesses to Vars tosimulate it. Hence the total running time to simulate đ steps of theprogram đ is at most đ(đ ) when suppressing constants that dependon the program đ .
13.4.1 Timed Universal Turing MachineOne corollary of the efficient universal machine is the following.Given any Turing Machine đ , input đ„, and âstep budgetâ đ , we cansimulate the execution of đ for đ steps in time that is polynomial in
modeling running time 431
Figure 13.6: The timed universal Turing Machine takesas input a Turing machine đ , an input đ„, and a timebound đ , and outputs đ(đ„) if đ halts within atmost đ steps. Theorem 13.8 states that there is such amachine that runs in time polynomial in đ .
đ . Formally, we define a function TIMEDEVAL that takes the threeparameters đ , đ„, and the time budget, and outputs đ(đ„) if đ haltswithin at most đ steps, and outputs 0 otherwise. The timed univer-sal Turing Machine computes TIMEDEVAL in polynomial time (seeFig. 13.6). (Since we measure time as a function of the input length,we define TIMEDEVAL as taking the input đ represented in unary: astring of đ ones.)
Theorem 13.8 â Timed Universal Turing Machine. Let TIMEDEVAL â¶0, 1â â 0, 1â be the function defined as
TIMEDEVAL(đ, đ„, 1đ ) = â§âšâ©đ(đ„) đ halts within †đ steps on đ„0 otherwise.
(13.3)Then TIMEDEVAL â P.
Proof. We only sketch the proof since the result follows fairly directlyfrom Theorem 13.5 and Theorem 13.7. By Theorem 13.5 to show thatTIMEDEVAL â P, it suffices to give a polynomial-time NAND-RAMprogram to compute TIMEDEVAL.
Such a program can be obtained as follows. Given a Turing Ma-chine đ , by Theorem 13.5 we can transform it in time polynomial inits description into a functionally-equivalent NAND-RAM programđ such that the execution of đ on đ steps can be simulated by theexecution of đ on đ â đ steps. We can then run the universal NAND-RAM machine of Theorem 13.7 to simulate đ for đ â đ steps, usingđ(đ ) time, and output 0 if the execution did not halt within this bud-get. This shows that TIMEDEVAL can be computed by a NAND-RAMprogram in time polynomial in |đ| and linear in đ , which meansTIMEDEVAL â P.
13.5 THE TIME HIERARCHY THEOREM
Some functions are uncomputable, but are there functions that canbe computed, but only at an exorbitant cost? For example, is there afunction that can be computed in time 2đ, but can not be computed intime 20.9đ? It turns out that the answer is Yes:
Theorem 13.9 â Time Hierarchy Theorem. For every nice function đ â¶â â â, there is a function đč ⶠ0, 1â â 0, 1 in TIME(đ (đ) logđ)⧔TIME(đ (đ)).There is nothing special about logđ, and we could have used any
other efficiently computable function that tends to infinity with đ.
432 introduction to theoretical computer science
Big Idea 19 If we have more time, we can compute more functions.
RRemark 13.10 â Simpler corollary of the time hierarchytheorem. The generality of the time hierarchy theoremcan make its proof a little hard to read. It might beeasier to follow the proof if you first try to prove byyourself the easier statement P â EXP.You can do so by showing that the following functionđč ⶠ0, 1â â¶â 0, 1 is in EXP ⧔ P: for every TuringMachine đ and input đ„, đč(đ, đ„) = 1 if and only ifđ halts on đ„ within at most |đ„|log |đ„| steps. One canshow that đč â TIME(đđ(logđ)) â EXP using theuniversal Turing machine (or the efficient universalNAND-RAM program of Theorem 13.7). On the otherharnd, we can use similar ideas to those used to showthe uncomputability of HALT in Section 9.3.2 to provethat đč â P.
Figure 13.7: The Time Hierarchy Theorem (Theo-rem 13.9) states that all of these classes are distinct.
Proof Idea:
In the proof of Theorem 9.6 (the uncomputability of the Haltingproblem), we have shown that the function HALT cannot be com-puted in any finite time. An examination of the proof shows that itgives something stronger. Namely, the proof shows that if we fix ourcomputational budget to be đ steps, then not only we canât distinguishbetween programs that halt and those that do not, but cannot evendistinguish between programs that halt within at most đ âČ steps andthose that take more than that (where đ âČ is some number dependingon đ ). Therefore, the proof of Theorem 13.9 follows the ideas of the
modeling running time 433
uncomputability of the halting problem, but again with a more carefulaccounting of the running time.âProof of Theorem 13.9. Our proof is inspired by the proof of the un-computability of the halting problem. Specifically, for every functionđ as in the theoremâs statement, we define the Bounded Halting func-tion HALTđ as follows. The input to HALTđ is a pair (đ , đ„) such that|đ | †log log |đ„| encodes some NAND-RAM program. We define
HALTđ (đ , đ„) = â§âšâ©1, đ halts on đ„ within †100 â đ (|đ | + |đ„|) steps0, otherwise.
(13.4)(The constant 100 and the function log logđ are rather arbitrary, andare chosen for convenience in this proof.)
Theorem 13.9 is an immediate consequence of the following twoclaims:
Claim 1: HALTđ â TIME(đ (đ) â logđ)andClaim 2: HALTđ â TIME(đ (đ)).Please make sure you understand why indeed the theorem follows
directly from the combination of these two claims. We now turn toproving them.
Proof of claim 1: We can easily check in linear time whether aninput has the form đ , đ„ where |đ | †log log |đ„|. Since đ (â ) is a nicefunction, we can evaluate it in đ(đ (đ)) time. Thus, we can computeHALTđ (đ , đ„) as follows:
1. Compute đ0 = đ (|đ | + |đ„|) in đ(đ0) steps.
2. Use the universal NAND-RAM program of Theorem 13.7 to simu-late 100â đ0 steps of đ on the input đ„ using at most đđđđŠ(|đ |)đ0 steps.(Recall that we use đđđđŠ(â) to denote a quantity that is bounded byđâđ for some constants đ, đ.)
3. If đ halts within these 100 â đ0 steps then output 1, else output 0.The length of the input is đ = |đ | + |đ„|. Since |đ„| †đ and(log log |đ„|)đ = đ(log |đ„|) for every đ, the running time will beđ(đ (|đ | + |đ„|) logđ) and hence the above algorithm demonstrates that
HALTđ â TIME(đ (đ) â logđ), completing the proof of Claim 1.Proof of claim 2: This proof is the heart of Theorem 13.9, and is
very reminiscent of the proof that HALT is not computable. Assume,for the sake of contradiction, that there is some NAND-RAM programđ â that computes HALTđ (đ , đ„) within đ (|đ | + |đ„|) steps. We are going
434 introduction to theoretical computer science
to show a contradiction by creating a program đ and showing thatunder our assumptions, if đ runs for less than đ (đ) steps when given(a padded version of) its own code as input then it actually runs formore than đ (đ) steps and vice versa. (It is worth re-reading the lastsentence twice or thrice to make sure you understand this logic. It isvery similar to the direct proof of the uncomputability of the haltingproblem where we obtained a contradiction by using an assumedâhalting solverâ to construct a program that, given its own code asinput, halts if and only if it does not halt.)
We will define đâ to be the program that on input a string đ§ doesthe following:
1. If đ§ does not have the form đ§ = đ1đ where đ represents a NAND-RAM program and |đ | < 0.1 log logđ then return 0. (Recall that1đ denotes the string of đ ones.)
2. Compute đ = đ â(đ , đ§) (at a cost of at most đ (|đ | + |đ§|) steps, underour assumptions).
3. If đ = 1 then đâ goes into an infinite loop, otherwise it halts.
Let â be the length description of đâ as a string, and let đ be largerthan 221000â . We will reach a contradiction by splitting into cases ac-cording to whether or not HALTđ (đâ, đâ1đ) equals 0 or 1.
On the one hand, if HALTđ (đâ, đâ1đ) = 1, then under our as-sumption that đ â computes HALTđ , đâ will go into an infinite loopon input đ§ = đâ1đ, and hence in particular đâ does not halt within100đ (|đâ| + đ) steps on the input đ§. But this contradicts our assump-tion that HALTđ (đâ, đâ1đ) = 1.
This means that it must hold that HALTđ (đâ, đâ1đ) = 0. Butin this case, since we assume đ â computes HALTđ , đâ does not doanything in phase 3 of its computation, and so the only computa-tion costs come in phases 1 and 2 of the computation. It is not hardto verify that Phase 1 can be done in linear and in fact less than 5|đ§|steps. Phase 2 involves executing đ â, which under our assumptionrequires đ (|đâ| + đ) steps. In total we can perform both phases inless than 10đ (|đâ| + đ) in steps, which by definition means thatHALTđ (đâ, đâ1đ) = 1, but this is of course a contradiction. Thiscompletes the proof of Claim 2 and hence of Theorem 13.9.
Solved Exercise 13.3 â P vs EXP. Prove that P â EXP.
Solution:
modeling running time 435
Figure 13.8: Some complexity classes and some of thefunctions we know (or conjecture) to be contained inthem.
We show why this statement follows from the time hierarchytheorem, but it can be an instructive exercise to prove it directly,see Remark 13.10. We need to show that there exists đč â EXP ⧔ P.Let đ (đ) = đlogđ and đ âČ(đ) = đlogđ/2. Both are nice functions.Since đ (đ)/đ âČ(đ) = đ(logđ), by Theorem 13.9 there exists someđč in TIME(đ âČ(đ)) â TIME(đ (đ)). Since for sufficiently large đ,2đ > đlogđ, đč â TIME(2đ) â EXP. On the other hand, đč â P.Indeed, suppose otherwise that there was a constant đ > 0 anda Turing Machine computing đč on đ-length input in at most đđsteps for all sufficiently large đ. Then since for đ large enoughđđ < đlogđ/2, it would have followed that đč â TIME(đlogđ/2)contradicting our choice of đč .
The time hierarchy theorem tells us that there are functions we cancompute in đ(đ2) time but not đ(đ), in 2đ time, but not 2âđ, etc.. Inparticular there are most definitely functions that we can compute intime 2đ but not đ(đ). We have seen that we have no shortage of natu-ral functions for which the best known algorithm requires roughly 2đtime, and that many people have invested significant effort in tryingto improve that. However, unlike in the finite vs. infinite case, for allof the examples above at the moment we do not know how to ruleout even an đ(đ) time algorithm. We will however see that there is asingle unproven conjecture that would imply such a result for most ofthese problems.
The time hierarchy theorem relies on the existence of an efficientuniversal NAND-RAM program, as proven in Theorem 13.7. Forother models such as Turing Machines we have similar time hierarchyresults showing that there are functions computable in time đ (đ) andnot in time đ (đ)/đ(đ) where đ(đ) corresponds to the overhead in thecorresponding universal machine.
13.6 NON UNIFORM COMPUTATION
We have now seen two measures of âcomputation costâ for functions.In Section 4.6 we defined the complexity of computing finite functionsusing circuits / straightline programs. Specifically, for a finite functionđ ⶠ0, 1đ â 0, 1 and number đ â â, đ â SIZE(đ ) if there is a circuitof at most đ NAND gates (or equivalently a đ -line NAND-CIRCprogram) that computes đ. To relate this to the classes TIME(đ (đ))defined in this chapter we first need to extend the class SIZE(đ (đ))from finite functions to functions with unbounded input length.
Definition 13.11 â Non uniform computation. Let đč ⶠ0, 1â â 0, 1and đ ⶠâ â â be a nice time bound. For every đ â â, define
436 introduction to theoretical computer science
Figure 13.9: We can think of an infinite functionđč ⶠ0, 1â â 0, 1 as a collection of finite functionsđč0, đč1, đč2, ⊠where đčđ ⶠ0, 1đ â 0, 1 is therestriction of đč to inputs of length đ. We say đč is inP/poly if for every đ, the function đčđ is computableby a polynomial size NAND-CIRC program, orequivalently, a polynomial sized Boolean circuit.
đčđ ⶠ0, 1đ â 0, 1 to be the restriction of đč to inputs of size đ.That is, đčđ is the function mapping 0, 1đ to 0, 1 such that forevery đ„ â 0, 1đ, đčđ(đ„) = đč(đ„).
We say that đč is non-uniformly computable in at most đ (đ) size, de-noted by đč â SIZE(đ (đ)) if there exists a sequence (đ¶0, đ¶1, đ¶2, âŠ)of NAND circuits such that:
âą For every đ â â, đ¶đ computes the function đčđâą For every sufficiently large đ, đ¶đ has at most đ (đ) gates.
The non uniform analog to the class P is the class P/poly defined as
P/poly = âȘđââSIZE(đđ) . (13.5)There is a big difference between non uniform computation and uni-form complexity classes such as TIME(đ (đ)) or P. The conditionđč â P means that there is a single Turing machine đ that computesđč on all inputs in polynomial time. The condition đč â P/poly onlymeans that for every input length đ there can be a different circuit đ¶đthat computes đč using polynomially many gates on inputs of theselengths. As we will see, đč â P/poly does not necessarily imply thatđč â P. However, the other direction is true:
Theorem 13.12 â Nonuniform computation contains uniform computa-
tion. There is some đ â â s.t. for every nice đ ⶠâ â â andđč ⶠ0, 1â â 0, 1,TIME(đ (đ)) â SIZE(đ (đ)đ) . (13.6)
In particular, Theorem 13.12 shows that for every đ, TIME(đđ) âSIZE(đđđ) and hence P â P/poly.Proof Idea:
The idea behind the proof is to âunroll the loopâ. Specifically, wewill use the programming language variants of non-uniform and uni-form computation: namely NAND-CIRC and NAND-TM. The maindifference between the two is that NAND-TM has loops. However, forevery fixed đ, if we know that a NAND-TM program runs in at mostđ (đ) steps, then we can replace its loop by simply âcopying and past-ingâ its code đ (đ) times, similar to how in Python we can replace codesuch as
for i in range(4):print(i)
with the âloop freeâ code
modeling running time 437
print(0)print(1)print(2)print(3)
To make this idea into an actual proof we need to tackle one tech-nical difficulty, and this is to ensure that the NAND-TM program isoblivious in the sense that the value of the index variable i in the đ-thiteration of the loop will depend only on đ and not on the contents ofthe input. We make a digression to do just that in Section 13.6.1 andthen complete the proof of Theorem 13.12.â13.6.1 Oblivious NAND-TM programsOur approach for proving Theorem 13.12 involves âunrolling theloopâ. For example, consider the following NAND-TM to compute theXOR function on inputs of arbitrary length:
temp_0 = NAND(X[0],X[0])Y_nonblank[0] = NAND(X[0],temp_0)temp_2 = NAND(X[i],Y[0])temp_3 = NAND(X[i],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)MODANDJUMP(X_nonblank[i],X_nonblank[i])
Setting (as an example) đ = 3, we can attempt to translate thisNAND-TM program into a NAND-CIRC program for computingXOR3 ⶠ0, 13 â 0, 1 by simply âcopying and pastingâ the loopthree times (dropping the MODANDJMP line):
temp_0 = NAND(X[0],X[0])Y_nonblank[0] = NAND(X[0],temp_0)temp_2 = NAND(X[i],Y[0])temp_3 = NAND(X[i],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)temp_0 = NAND(X[0],X[0])Y_nonblank[0] = NAND(X[0],temp_0)temp_2 = NAND(X[i],Y[0])temp_3 = NAND(X[i],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)temp_0 = NAND(X[0],X[0])Y_nonblank[0] = NAND(X[0],temp_0)
438 introduction to theoretical computer science
Figure 13.10: A NAND circuit for XOR3 obtained byâunrolling the loopâ of the NAND-TM program forcomputing XOR three times.
temp_2 = NAND(X[i],Y[0])temp_3 = NAND(X[i],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)
However, the above is still not a valid NAND-CIRC program sinceit contains references to the special variable i. To make it into a validNAND-CIRC program, we replace references to i in the first iterationwith 0, references in the second iteration with 1, and references in thethird iteration with 2. (We also create a variable zero and use it for thefirst time any variable is instantiated, as well as remove assignments tonon-output variables that are never used later on.) The resulting pro-gram is a standard âloop free and index freeâ NAND-CIRC programthat computes XOR3 (see also Fig. 13.10):
temp_0 = NAND(X[0],X[0])one = NAND(X[0],temp_0)zero = NAND(one,one)temp_2 = NAND(X[0],zero)temp_3 = NAND(X[0],temp_2)temp_4 = NAND(zero,temp_2)Y[0] = NAND(temp_3,temp_4)temp_2 = NAND(X[1],Y[0])temp_3 = NAND(X[1],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)temp_2 = NAND(X[2],Y[0])temp_3 = NAND(X[2],temp_2)temp_4 = NAND(Y[0],temp_2)Y[0] = NAND(temp_3,temp_4)
Key to this transformation was the fact that in our original NAND-TM program for XOR, regardless of whether the input is 011, 100, orany other string, the index variable i is guaranteed to equal 0 in thefirst iteration, 1 in the second iteration, 2 in the third iteration, and soon and so forth. The particular sequence 0, 1, 2, ⊠is immaterial: thecrucial property is that the NAND-TM program for XOR is obliviousin the sense that the value of the index i in the đ-th iteration dependsonly on đ and does not depend on the particular choice of the input.Luckily, it is possible to transform every NAND-TM program intoa functionally equivalent oblivious program with at most quadraticoverhead. (Similarly we can transform any Turing machine into afunctionally equivalent oblivious Turing machine, see Exercise 13.6.)
modeling running time 439
Figure 13.11: We simulate a đ (đ)-time NAND-TMprogram đ âČ with an oblivious NAND-TM program đby adding special arrays Atstart and Atend to markpositions 0 and đ â 1 respectively. The program đwill simply âsweepâ its arrays from right to left andback again. If the original program đ âČ would havemoved i in a different direction then we wait đ(đ )steps until we reach the same point back again, and sođ runs in đ(đ (đ)2) time.
Theorem 13.13 â Making NAND-TM oblivious. Let đ ⶠâ â â be a nicefunction and let đč â TIMETM(đ (đ)). Then there is a NAND-TMprogram đ that computes đč in đ(đ (đ)2) steps and satisfying thefollowing. For every đ â â there is a sequence đ0, đ1, ⊠, đđâ1 suchthat for every đ„ â 0, 1đ, if đ is executed on input đ„ then in theđ-th iteration the variable i is equal to đđ.In other words, Theorem 13.13 implies that if we can compute đč inđ (đ) steps, then we can compute it in đ(đ (đ)2) steps with a programđ in which the position of i in the đ-th iteration depends only on đ
and the length of the input, and not on the contents of the input. Sucha program can be easily translated into a NAND-CIRC program ofđ(đ (đ)2) lines by âunrolling the loopâ.Proof Idea:
We can translate any NAND-TM program đ âČ into an obliviousprogram đ by making đ âsweepâ its arrays. That is, the index i inđ will always move all the way from position 0 to position đ (đ) â 1and back again. We can then simulate the program đ âČ with at mostđ (đ) overhead: if đ âČ wants to move i left when we are in a rightwardsweep then we simply wait the at most 2đ (đ) steps until the next timewe are back in the same position while sweeping to the left.âProof of Theorem 13.13. Let đ âČ be a NAND-TM program computing đčin đ (đ) steps. We construct an oblivious NAND-TM program đ forcomputing đč as follows (see also Fig. 13.11).
1. On input đ„, đ will compute đ = đ (|đ„|) and set up arrays Atstartand Atend satisfying Atstart[0]= 1 and Atstart[đ]= 0 for đ > 0and Atend[đ â 1]= 1 and Atend[i]= 0 for all đ â đ â 1. We can dothis because đ is a nice function. Note that since this computationdoes not depend on đ„ but only on its length, it is oblivious.
2. đ will also have a special array Marker initialized to all zeroes.
3. The index variable of đ will change direction of movement tothe right whenever Atstart[i]= 1 and to the left wheneverAtend[i]= 1.
4. The program đ simulates the execution of đ âČ. However, if theMODANDJMP instruction in đ âČ attempts to move to the right when đis moving left (or vice versa) then đ will set Marker[i] to 1 andenter into a special âwaiting modeâ. In this mode đ will wait untilthe next time in which Marker[i]= 1 (at the next sweep) at whichpoints đ zeroes Marker[i] and continues with the simulation. In
440 introduction to theoretical computer science
Figure 13.12: The function UNROLL takes as input aTuring Machine đ , an input length parameter đ, astep budget parameter đ , and outputs a circuit đ¶ ofsize đđđđŠ(đ ) that takes đ bits of inputs and outputsđ(đ„) if đ halts on đ„ within at most đ steps.
the worst case this will take 2đ (đ) steps (if đ has to go all the wayfrom one end to the other and back again.)
5. We also modify đ to ensure it ends the computation after simu-lating exactly đ (đ) steps of đ âČ, adding âdummy stepsâ if đ âČ endsearly.
We see that đ simulates the execution of đ âČ with an overhead ofđ(đ (đ)) steps of đ per one step of đ âČ, hence completing the proof.
Theorem 13.13 implies Theorem 13.12. Indeed, if đ is a đ-line obliv-ious NAND-TM program computing đč in time đ (đ) then for every đwe can obtain a NAND-CIRC program of (đ â 1) â đ (đ) lines by simplymaking đ (đ) copies of đ (dropping the final MODANDJMP line). In theđ-th copy we replace all references of the form Foo[i] to foo_đđ wheređđ is the value of i in the đ-th iteration.
13.6.2 âUnrolling the loopâ: algorithmic transformation of Turing Machinesto circuits
The proof of Theorem 13.12 is algorithmic, in the sense that the proofyields a polynomial-time algorithm that given a Turing Machine đand parameters đ and đ, produces a circuit of đ(đ 2) gates that agreeswith đ on all inputs đ„ â 0, 1đ (as long as đ runs for less than đsteps these inputs.) We record this fact in the following theorem, sinceit will be useful for us later on:
Theorem 13.14 â Turing-machine to circuit compiler. There is algorithmUNROLL such that for every Turing Machine đ and numbers đ, đ ,UNROLL(đ, 1đ , 1đ) runs for đđđđŠ(|đ|, đ , đ) steps and outputs aNAND circuit đ¶ with đ inputs, đ(đ 2) gates, and one output, suchthat đ¶(đ„) = â§âšâ©đŠ đ halts in †đ steps and outputs đŠ0 otherwise
. (13.7)
Proof. We only sketch the proof since it follows by directly translat-ing the proof of Theorem 13.12 into an algorithm together with thesimulation of Turing machines by NAND-TM programs (see alsoFig. 13.13). Specifically, UNROLL does the following:
1. Transform the Turing Machine đ into an equivalent NAND-TMprogram đ .
2. Transform the NAND-TM program đ into an equivalent obliviousprogram đ âČ following the proof of Theorem 13.13. The program đ âČtakes đ âČ = đ(đ 2) steps to simulate đ steps of đ .
modeling running time 441
3. âUnroll the loopâ of đ âČ by obtaining a NAND-CIRC program ofđ(đ âČ) lines (or equivalently a NAND circuit with đ(đ 2) gates)corresponding to the execution of đ âČ iterations of đ âČ.
Figure 13.13: We can transform a Turing Machine đ ,input length parameter đ, and time bound đ into anđ(đ 2) sized NAND circuit that agrees with đ on allinputs đ„ â 0, 1đ on which đ halts in at most đsteps. The transformation is obtained by first usingthe equivalence of Turing Machines and NAND-TM programs đ , then turning đ into an equivalentoblivious NAND-TM program đ âČ via Theorem 13.13,then âunrollingâ đ(đ 2) iterations of the loop ofđ âČ to obtain an đ(đ 2) line NAND-CIRC programthat agrees with đ âČ on length đ inputs, and finallytranslating this program into an equivalent circuit.
Big Idea 20 By âunrolling the loopâ we can transform an al-gorithm that takes đ (đ) steps to compute đč into a circuit that usesđđđđŠ(đ (đ)) gates to compute the restriction of đč to 0, 1đ.
PReviewing the transformations described in Fig. 13.13,as well as solving the following two exercises is a greatway to get more comfort with non-uniform complexityand in particular with P/poly and its relation to P.
Solved Exercise 13.4 â Alternative characterization of P. Prove that for everyđč ⶠ0, 1â â 0, 1, đč â P if and only if there is a polynomial-time Turing Machine đ such that for every đ â â, đ(1đ) outputs adescription of an đ input circuit đ¶đ that computes the restriction đčđof đč to inputs in 0, 1đ.
Solution:
We start with the âifâ direction. Suppose that there is a polynomial-time Turing Machine đ that on input 1đ outputs a circuit đ¶đ that
442 introduction to theoretical computer science
computes đčđ. Then the following is a polynomial-time TuringMachine đ âČ to compute đč . On input đ„ â 0, 1â, đ âČ will:
1. Let đ = |đ„| and compute đ¶đ = đ(1đ).2. Return the evaluation of đ¶đ on đ„.
Since we can evaluate a Boolean circuit on an input in poly-nomial time, đ âČ runs in polynomial time and computes đč(đ„) onevery input đ„.
For the âonly ifâ direction, if đ âČ is a Turing Machine that com-putes đč in polynomial-time, then (applying the equivalence of Tur-ing Machines and NAND-TM as well as Theorem 13.13) there isalso an oblivious NAND-TM program đ that computes đč in timeđ(đ) for some polynomial đ. We can now define đ to be the TuringMachine that on input 1đ outputs the NAND circuit obtained byâunrolling the loopâ of đ for đ(đ) iterations. The resulting NANDcircuit computes đčđ and has đ(đ(đ)) gates. It can also be trans-formed to a Boolean circuit with đ(đ(đ)) AND/OR/NOT gates.
Solved Exercise 13.5 â P/poly characterization by advice. Let đč ⶠ0, 1â â0, 1. Then đč â P/poly if and only if there exists a polynomial đ ⶠâ ââ, a polynomial-time Turing Machine đ and a sequence đđđââ ofstrings, such that for every đ â â:
âą |đđ| †đ(đ)âą For every đ„ â 0, 1đ, đ(đđ, đ„) = đč(đ„).
Solution:
We only sketch the proof. For the âonly ifâ direction, if đč âP/poly then we can use for đđ simply the description of the cor-responding circuit đ¶đ and for đ the program that computes inpolynomial time the evaluation of a circuit on its input.
For the âifâ direction, we can use the same âunrolling the loopâtechnique of Theorem 13.12 to show that if đ is a polynomial-timeNAND-TM program, then for every đ â â, the map đ„ ⊠đ(đđ, đ„)can be computed by a polynomial size NAND-CIRC program đđ.
13.6.3 Can uniform algorithms simulate non uniform ones?Theorem 13.12 shows that every function in TIME(đ (đ)) is inSIZE(đđđđŠ(đ (đ))). One can ask if there is an inverse relation. Supposethat đč is such that đčđ has a âshortâ NAND-CIRC program for everyđ. Can we say that it must be in TIME(đ (đ)) for some âsmallâ đ ? The
modeling running time 443
answer is an emphatic no. Not only is P/poly not contained in P, in factP/poly contains functions that are uncomputable!
Theorem 13.15 â P/poly contains uncomputable functions. There exists anuncomputable function đč ⶠ0, 1â â 0, 1 such that đč â P/poly.
Proof Idea:
Since P/poly corresponds to non uniform computation, a functionđč is in P/poly if for every đ â â, the restriction đčđ to inputs of lengthđ has a small circuit/program, even if the circuits for different valuesof đ are completely different from one another. In particular, if đč hasthe property that for every equal-length inputs đ„ and đ„âČ, đč(đ„) =đč(đ„âČ) then this means that đčđ is either the constant function zeroor the constant function one for every đ â â. Since the constantfunction has a (very!) small circuit, such a function đč will alwaysbe in P/poly (indeed even in smaller classes). Yet by a reduction fromthe Halting problem, we can obtain a function with this property thatis uncomputable.âProof of Theorem 13.15. Consider the following âunary halting func-tionâ UH ⶠ0, 1â â 0, 1 defined as follows. We let đ ⶠâ â 0, 1âbe the function that on input đ â â, outputs the string that corre-sponds to the binary representation of the number đ without the mostsignificant 1 digit. Note that đ is onto. For every đ„ â 0, 1, we de-fine UH(đ„) = HALTONZERO(đ(|đ„|)). That is, if đ is the length of đ„,then UH(đ„) = 1 if and only if the string đ(đ) encodes a NAND-TMprogram that halts on the input 0.
UH is uncomputable, since otherwise we could computeHALTONZERO by transforming the input program đ into the integerđ such that đ = đ(đ) and then running UH(1đ) (i.e., UH on the stringof đ ones). On the other hand, for every đ, UHđ(đ„) is either equalto 0 for all inputs đ„ or equal to 1 on all inputs đ„, and hence can becomputed by a NAND-CIRC program of a constant number of lines.
The issue here is of course uniformity. For a function đč ⶠ0, 1â â0, 1, if đč is in TIME(đ (đ)) then we have a single algorithm thatcan compute đčđ for every đ. On the other hand, đčđ might be inSIZE(đ (đ)) for every đ using a completely different algorithm for ev-ery input length. For this reason we typically use P/poly not as a modelof efficient computation but rather as a way to model inefficient compu-tation. For example, in cryptography people often define an encryp-tion scheme to be secure if breaking it for a key of length đ requires
444 introduction to theoretical computer science
more than a polynomial number of NAND lines. Since P â P/poly,this in particular precludes a polynomial time algorithm for doing so,but there are technical reasons why working in a non uniform modelmakes more sense in cryptography. It also allows to talk about se-curity in non asymptotic terms such as a scheme having â128 bits ofsecurityâ.
While it can sometimes be a real issue, in many natural settings thedifference between uniform and non-uniform computation does notseem so important. In particular, in all the examples of problems notknown to be in P we discussed before: longest path, 3SAT, factoring,etc., these problems are also not known to be in P/poly either. Thus,for ânaturalâ functions, if you pretend that TIME(đ (đ)) is roughly thesame as SIZE(đ (đ)), you will be right more often than wrong.
Figure 13.14: Relations between P, EXP, and P/poly. Itis known that P â EXP, P â P/poly and that P/polycontains uncomputable functions (which in particularare outside of EXP). It is not known whether or notEXP â P/poly though it is believed that EXP â P/poly.
13.6.4 Uniform vs. Nonuniform computation: A recapTo summarize, the two models of computation we have described sofar are:âą Uniform models: Turing machines, NAND-TM programs, RAM ma-
chines, NAND-RAM programs, C/JavaScript/Python, etc. These mod-els include loops and unbounded memory hence a single programcan compute a function with unbounded input length.
âą Non-uniform models: Boolean Circuits or straightline programs haveno loops and can only compute finite functions. The time to executethem is exactly the number of lines or gates they contain.For a function đč ⶠ0, 1â â 0, 1 and some nice time boundđ ⶠâ â â, we know that:
âą If đč is uniformly computable in time đ (đ) then there is a sequenceof circuits đ¶1, đ¶2, ⊠where đ¶đ has đđđđŠ(đ (đ)) gates and computesđčđ (i.e., restriction of đč to 0, 1đ) for every đ.
âą The reverse direction is not necessarily true - there are examples offunctions đč ⶠ0, 1đ â 0, 1 such that đčđ can be computed byeven a constant size circuit but đč is uncomputable.
modeling running time 445
This means that non uniform complexity is more useful to establishhardness of a function than its easiness.
Chapter Recap
âą We can define the time complexity of a functionusing NAND-TM programs, and similarly to thenotion of computability, this appears to capture theinherent complexity of the function.
âą There are many natural problems that havepolynomial-time algorithms, and other naturalproblems that weâd love to solve, but for which thebest known algorithms are exponential.
âą The definition of polynomial time, and hence theclass P, is robust to the choice of model, whetherit is Turing machines, NAND-TM, NAND-RAM,modern programming languages, and many othermodels.
âą The time hierarchy theorem shows that there aresome problems that can be solved in exponential,but not in polynomial time. However, we do notknow if that is the case for the natural examplesthat we described in this lecture.
âą By âunrolling the loopâ we can show that everyfunction computable in time đ (đ) can be computedby a sequence of NAND-CIRC programs (one forevery input length) each of size at most đđđđŠ(đ (đ))
13.7 EXERCISES
Exercise 13.1 â Equivalence of different definitions of P and EXP.. Provethat the classes P and EXP defined in Definition 13.2 are equal toâȘđâ1,2,3,âŠTIME(đđ) and âȘđâ1,2,3,âŠTIME(2đđ) respectively. (Ifđ1, đ2, đ3, ⊠is a collection of sets then the set đ = âȘđâ1,2,3,âŠđđ isthe set of all elements đ such that there exists some đ â 1, 2, 3, âŠwhere đ â đđ.)
Exercise 13.2 â Robustness to representation. Theorem 13.5 shows that theclasses P and EXP are robust with respect to variations in the choiceof the computational model. This exercise shows that these classesare also robust with respect to our choice of the representation of theinput.
Specifically, let đč be a function mapping graphs to 0, 1, and letđč âČ, đč Ⳡⶠ0, 1â â 0, 1 be the functions defined as follows. For everyđ„ â 0, 1â:
446 introduction to theoretical computer science
1 Hint: This is the Turing Machine analog of Theo-rem 13.13. We replace one step of the original TM đâČcomputing đč with a âsweepâ of the obliviouss TM đin which it goes đ steps to the right and then đ stepsto the left.
âą đč âČ(đ„) = 1 iff đ„ represents a graph đș via the adjacency matrixrepresentation such that đč(đș) = 1.
âą đč âł(đ„) = 1 iff đ„ represents a graph đș via the adjacency list represen-tation such that đč(đș) = 1.Prove that đč âČ â P iff đč âł â P.More generally, for every function đč ⶠ0, 1â â 0, 1, the answer
to the question of whether đč â P (or whether đč â EXP) is unchangedby switching representations, as long as transforming one represen-tation to the other can be done in polynomial time (which essentiallyholds for all reasonable representations).
Exercise 13.3 â Boolean functions. For every function đč ⶠ0, 1â â 0, 1â,define đ”đđđ(đč) to be the function mapping 0, 1â to 0, 1 such thaton input a (string representation of a) triple (đ„, đ, đ) with đ„ â 0, 1â,đ â â and đ â 0, 1,
đ”đđđ(đč)(đ„, đ, đ) = â§âšâ©đč(đ„)đ đ = 0, đ < |đč(đ„)|1 đ = 1, đ < |đč(đ„)|0 otherwise
(13.8)
where đč(đ„)đ is the đ-th bit of the string đč(đ„).Prove that đč â P if and only if đ”đđđ(đč) â P.
Exercise 13.4 â Composition of polynomial time. Prove that ifđč, đș ⶠ0, 1â â 0, 1â are in P then their composition đč â đș,which is the function đ» s.t. đ»(đ„) = đč(đș(đ„)), is also in P.
Exercise 13.5 â Non composition of exponential time. Prove that there is someđč, đș ⶠ0, 1â â 0, 1â s.t. đč, đș â EXP but đč â đș is not in EXP.
Exercise 13.6 â Oblivious Turing Machines. We say that a Turing machine đis oblivious if there is some function đ ⶠâ Ă â â †such that for everyinput đ„ of length đ, and đĄ â â it holds that:
âą If đ takes more than đĄ steps to halt on the input đ„, then in the đĄ-th step đ âs head will be in the position đ (đ, đĄ). (Note that thisposition depends only on the length of đ„ and not its contents.)
âą If đ halts before the đĄ-th step then đ (đ, đĄ) = â1.Prove that if đč â P then there exists an oblivious Turing machine đ
that computes đč in polynomial time. See footnote for hint.1
modeling running time 447
Exercise 13.7 Let EDGE ⶠ0, 1â â 0, 1 be the function such that oninput a string representing a triple (đż, đ, đ), where đż is the adjacencylist representation of an đ vertex graph đș, and đ and đ are numbers in[đ], EDGE(đż, đ, đ) = 1 if the edge đ, đ is present in the graph. EDGEoutputs 0 on all other inputs.
1. Prove that EDGE â P.
2. Let PLANARMATRIX ⶠ0, 1â â 0, 1 be the function thaton input an adjacency matrix đŽ outputs 1 if and only if the graphrepresented by đŽ is planar (that is, can be drawn on the plane with-out edges crossing one another). For this question, you can usewithout proof the fact that PLANARMATRIX â P. Prove thatPLANARLIST â P where PLANARLIST ⶠ0, 1â â 0, 1 is thefunction that on input an adjacency list đż outputs 1 if and only if đżrepresents a planar graph.
Exercise 13.8 â Evaluate NAND circuits. Let NANDEVAL ⶠ0, 1â â0, 1 be the function such that for every string representing a pair(đ, đ„) where đ is an đ-input 1-output NAND-CIRC (not NAND-TM!)program and đ„ â 0, 1đ, NANDEVAL(đ, đ„) = đ(đ„). On all otherinputs NANDEVAL outputs 0.
Prove that NANDEVAL â P.
Exercise 13.9 â Find hard function. Let NANDHARD ⶠ0, 1â â 0, 1be the function such that on input a string representing a pair (đ, đ )where
âą đ â 0, 12đ for some đ â ââą đ â â
NANDHARD(đ, đ ) = 1 if there is no NAND-CIRC program đof at most đ lines that computes the function đč ⶠ0, 1đ â 0, 1whose truth table is the string đ . That is, NANDHARD(đ, đ ) = 1if for every NAND-CIRC program đ of at most đ lines, there existssome đ„ â 0, 1đ such that đ(đ„) â đđ„ where đđ„ denote the đ„-thecoordinate of đ , using the binary representation to identify 0, 1đwith the numbers 0, ⊠, 2đ â 1.1. Prove that NANDHARD â EXP.
2. (Challenge) Prove that there is an algorithm FINDHARD suchthat if đ is sufficiently large, then FINDHARD(1đ) runs in time22đ(đ) and outputs a string đ â 0, 12đ that is the truth table ofa function that is not contained in SIZE(2đ/(1000đ)). (In other
448 introduction to theoretical computer science
2 Hint: Use Item 1, the existence of functions requir-ing exponentially hard NAND programs, and the factthat there are only finitely many functions mapping0, 1đ to 0, 1.
words, if đ is the string output by FINDHARD(1đ) then if we letđč ⶠ0, 1đ â 0, 1 be the function such that đč(đ„) outputs the đ„-thcoordinate of đ , then đč â SIZE(2đ/(1000đ)).2
Exercise 13.10 Suppose that you are in charge of scheduling courses incomputer science in University X. In University X, computer sciencestudents wake up late, and have to work on their startups in the af-ternoon, and take long weekends with their investors. So you onlyhave two possible slots: you can schedule a course either Monday-Wednesday 11am-1pm or Tuesday-Thursday 11am-1pm.
Let SCHEDULE ⶠ0, 1â â 0, 1 be the function that takes as inputa list of courses đż and a list of conflicts đ¶ (i.e., list of pairs of coursesthat cannot share the same time slot) and outputs 1 if and only if thereis a âconflict freeâ scheduling of the courses in đż, where no pair in đ¶ isscheduled in the same time slot.
More precisely, the list đż is a list of strings (đ0, ⊠, đđâ1) and the listđ¶ is a list of pairs of the form (đđ, đđ). SCHEDULE(đż, đ¶) = 1 if andonly if there exists a partition of đ0, ⊠, đđâ1 into two parts so that thereis no pair (đđ, đđ) â đ¶ such that both đđ and đđ are in the same part.
Prove that SCHEDULE â P. As usual, you do not have to providethe full code to show that this is the case, and can describe operationsas a high level, as well as appeal to any data structures or other resultsmentioned in the book or in lecture. Note that to show that a functionđč is in P you need to both (1) present an algorithm đŽ that computesđč in polynomial time, (2) prove that đŽ does indeed run in polynomialtime, and does indeed compute the correct answer.
Try to think whether or not your algorithm extends to the casewhere there are three possible time slots.
13.8 BIBLIOGRAPHICAL NOTES
Because we are interested in the maximum number of steps for inputsof a given length, running-time as we defined it is often known asworst case complexity. The minimum number of steps (or âbest caseâcomplexity) to compute a function on length đ inputs is typically nota meaningful quantity since essentially every natural problem willhave some trivially easy instances. However, the average case complexity(i.e., complexity on a âtypicalâ or ârandomâ input) is an interestingconcept which weâll return to when we discuss cryptography. Thatsaid, worst-case complexity is the most standard and basic of thecomplexity measures, and will be our focus in most of this book.
Some lower bounds for single-tape Turing machines are given in[Maa85].
modeling running time 449
For defining efficiency in the đ calculus, one needs to be carefulabout the order of application of the reduction steps, which can matterfor computational efficiency, see for example this paper.
The notation P/poly is used for historical reasons. It was introducedby Karp and Lipton, who considered this class as corresponding tofunctions that can be computed by polynomial-time Turing Machinesthat are given for any input length đ an advice string of length polyno-mial in đ.
14Polynomial-time reductions
Consider some of the problems we have encountered in Chapter 12:
1. The 3SAT problem: deciding whether a given 3CNF formula has asatisfying assignment.
2. Finding the longest path in a graph.
3. Finding the maximum cut in a graph.
4. Solving quadratic equations over đ variables đ„0, ⊠, đ„đâ1 â â.
All of these problems have the following properties:
âą These are important problems, and people have spent significanteffort on trying to find better algorithms for them.
âą Each one of these is a search problem, whereby we search for asolution that is âgoodâ in some easy to define sense (e.g., a longpath, a satisfying assignment, etc.).
âą Each of these problems has a trivial exponential time algorithm thatinvolve enumerating all possible solutions.
âą At the moment, for all these problems the best known algorithm isnot much faster than the trivial one in the worst case.
In this chapter and in Chapter 15 we will see that, despite theirapparent differences, we can relate the computational complexity ofthese and many other problems. In fact, it turns out that the prob-lems above are computationally equivalent, in the sense that solving oneof them immediately implies solving the others. This phenomenon,known as NP completeness, is one of the surprising discoveries of the-oretical computer science, and we will see that it has far-reachingramifications.
Compiled on 8.26.2020 18:10
Learning Objectives:âą Introduce the notion of polynomial-time
reductions as a way to relate the complexity ofproblems to one another.
âą See several examples of such reductions.âą 3SAT as a basic starting point for reductions.
452 introduction to theoretical computer science
This chapter: A non-mathy overviewThis chapter introduces the concept of a polynomial time re-duction which is a central object in computational complexityand this book in particular. A polynomial-time reduction is away to reduce the task of solving one problem to another. Theway we use reductions in complexity is to argue that if thefirst problem is hard to solve efficiently, then the second mustalso be hard. We see several examples for reductions in thischapter, and reductions will be the basis for the theory of NPcompleteness that we will develop in Chapter 15.
Figure 14.1: In this chapter we show that if the 3SATproblem cannot be solved in polynomial time, thenneither can the QUADEQ, LONGESTPATH, ISETand MAXCUT problems. We do this by using thereduction paradigm showing for example âif pigs couldwhistleâ (i.e., if we had an efficient algorithm forQUADEQ) then âhorses could flyâ (i.e., we wouldhave an efficient algorithm for 3SAT.)
In this chapter we will see that for each one of the problems of find-ing a longest path in a graph, solving quadratic equations, and findingthe maximum cut, if there exists a polynomial-time algorithm for thisproblem then there exists a polynomial-time algorithm for the 3SATproblem as well. In other words, we will reduce the task of solving3SAT to each one of the above tasks. Another way to interpret theseresults is that if there does not exist a polynomial-time algorithm for3SAT then there does not exist a polynomial-time algorithm for theseother problems as well. In Chapter 15 we will see evidence (thoughno proof!) that all of the above problems do not have polynomial-timealgorithms and hence are inherently intractable.
14.1 FORMAL DEFINITIONS OF PROBLEMS
For reasons of technical convenience rather than anything substantial,we concern ourselves with decision problems (i.e., Yes/No questions) orin other words Boolean (i.e., one-bit output) functions. We model the
polynomial-time reductions 453
problems above as functions mapping 0, 1â to 0, 1 in the followingway:
3SAT. The 3SAT problem can be phrased as the function 3SAT â¶0, 1â â 0, 1 that takes as input a 3CNF formula đ (i.e., a formulaof the form đ¶0 ⧠⯠⧠đ¶đâ1 where each đ¶đ is the OR of three variablesor their negation) and maps đ to 1 if there exists some assignment tothe variables of đ that causes it to evalute to true, and to 0 otherwise.For example
3SAT ("(đ„0 âš đ„1 âš đ„2) ⧠(đ„1 âš đ„2 âš đ„3) ⧠(đ„0 âš đ„2 âš đ„3)") = 1 (14.1)
since the assignment đ„ = 1101 satisfies the input formula. In theabove we assume some representation of formulas as strings, anddefine the function to output 0 if its input is not a valid representation;we use the same convention for all the other functions below.
Quadratic equations. The quadratic equations problem corresponds to thefunction QUADEQ ⶠ0, 1â â 0, 1 that maps a set of quadraticequations đž to 1 if there is an assignment đ„ that satisfies all equations,and to 0 otherwise.
Longest path. The longest path problem corresponds to the functionLONGPATH ⶠ0, 1â â 0, 1 that maps a graph đș and a number đto 1 if there is a simple path in đș of length at least đ, and maps (đș, đ)to 0 otherwise. The longest path problem is a generalization of thewell-known Hamiltonian Path Problem of determining whether a pathof length đ exists in a given đ vertex graph.
Maximum cut. The maximum cut problem corresponds to the functionMAXCUT ⶠ0, 1â â 0, 1 that maps a graph đș and a number đ to1 if there is a cut in đș that cuts at least đ edges, and maps (đș, đ) to 0otherwise.
All of the problems above are in EXP but it is not known whetheror not they are in P. However, we will see in this chapter that if eitherQUADEQ , LONGPATH or MAXCUT are in P, then so is 3SAT.14.2 POLYNOMIAL-TIME REDUCTIONS
Suppose that đč, đș ⶠ0, 1â â 0, 1 are two Boolean functions. Apolynomial-time reduction (or sometimes just âreductionâ for short) fromđč to đș is a way to show that đč is âno harderâ than đș, in the sensethat a polynomial-time algorithm for đș implies a polynomial-timealgorithm for đč .
454 introduction to theoretical computer science
Figure 14.2: If đč â€đ đș then we can transform apolynomial-time algorithm đ” that computes đșinto a polynomial-time algorithm đŽ that computesđč . To compute đč(đ„) we can run the reduction đ guaranteed by the fact that đč â€đ đș to obtain đŠ =đč(đ„) and then run our algorithm đ” for đș to computeđș(đŠ).
Definition 14.1 â Polynomial-time reductions. Let đč, đș ⶠ0, 1â â 0, 1.We say that đč reduces to đș, denoted by đč â€đ đș if there is apolynomial-time computable đ ⶠ0, 1â â 0, 1â such thatfor every đ„ â 0, 1â, đč(đ„) = đș(đ (đ„)) . (14.2)
We say that đč and đș have equivalent complexity if đč â€đ đș and đș â€đđč .
The following exercise justifies our intuition that đč â€đ đș signifiesthat âđč is no harder than đș.Solved Exercise 14.1 â Reductions and P. Prove that if đč â€đ đș and đș â Pthen đč â P.
PAs usual, solving this exercise on your own is an excel-lent way to make sure you understand Definition 14.1.
Solution:
Suppose there was an algorithm đ” that compute đș in time đ(đ)where đ is its input size. Then, (14.2) directly gives an algorithmđŽ to compute đč (see Fig. 14.2). Indeed, on input đ„ â 0, 1â, Al-gorithm đŽ will run the polynomial-time reduction đ to obtainđŠ = đ (đ„) and then return đ”(đŠ). By (14.2), đș(đ (đ„)) = đč(đ„) andhence Algorithm đŽ will indeed compute đč .
We now show that đŽ runs in polynomial time. By assumption, đ can be computed in time đ(đ) for some polynomial đ. In particular,this means that |đŠ| †đ(|đ„|) (as just writing down đŠ takes |đŠ| steps).Computing đ”(đŠ) will take at most đ(|đŠ|) †đ(đ(|đ„|)) steps. Thusthe total running time of đŽ on inputs of length đ is at most the timeto compute đŠ, which is bounded by đ(đ), and the time to computeđ”(đŠ), which is bounded by đ(đ(đ)), and since the composition oftwo polynomials is a polynomial, đŽ runs in polynomial time.
Big Idea 21 A reduction đč â€đ đș shows that đč is âno harder thanđșâ or equivalently that đș is âno easier than đčâ.
14.2.1 Whistling pigs and flying horsesA reduction from đč to đș can be used for two purposes:
polynomial-time reductions 455
âą If we already know an algorithm for đș and đč â€đ đș then we canuse the reduction to obtain an algorithm for đč . This is a widelyused tool in algorithm design. For example in Section 12.1.4 we sawhow the Min-Cut Max-Flow theorem allows to reduce the task ofcomputing a minimum cut in a graph to the task of computing amaximum flow in it.
âą If we have proven (or have evidence) that there exists no polynomial-time algorithm for đč and đč â€đ đș then the existence of this reductionallows us to conclude that there exists no polynomial-time algo-rithm for đș. This is the âif pigs could whistle then horses couldflyâ interpretation weâve seen in Section 9.4. We show that if therewas an hypothetical efficient algorithm for đș (a âwhistling pigâ)then since đč â€đ đș then there would be an efficient algorithm forđč (a âflying horseâ). In this book we often use reductions for thissecond purpose, although the lines between the two is sometimesblurry (see the bibliographical notes in Section 14.9).
The most crucial difference between the notion in Definition 14.1and the reductions we saw in the context of uncomputability (e.g.,in Section 9.4) is that for relating time complexity of problems, weneed the reduction to be computable in polynomial time, as opposed tomerely computable. Definition 14.1 also restricts reductions to have avery specific format. That is, to show that đč â€đ đș, rather than allow-ing a general algorithm for đč that uses a âmagic boxâ that computesđș, we only allow an algorithm that computes đč(đ„) by outputtingđș(đ (đ„)). This restricted form is convenient for us, but people havedefined and used more general reductions as well (see Section 14.9).
In this chapter we use reductions to relate the computational com-plexity of the problems mentioned above: 3SAT, Quadratic Equations,Maximum Cut, and Longest Path, as well as a few others. We willreduce 3SAT to the latter problems, demonstrating that solving anyone of them efficiently will result in an efficient algorithm for 3SAT.In Chapter 15 we show the other direction: reducing each one of theseproblems to 3SAT in one fell swoop.
Transitivity of reductions. Since we think of đč â€đ đș as saying that(as far as polynomial-time computation is concerned) đč is âeasier orequal in difficulty toâ đș, we would expect that if đč â€đ đș and đș â€đ đ» ,then it would hold that đč â€đ đ» . Indeed this is the case:Solved Exercise 14.2 â Transitivity of polynomial-time reductions. For everyđč, đș, đ» ⶠ0, 1â â 0, 1, if đč â€đ đș and đș â€đ đ» then đč â€đ đ» .
456 introduction to theoretical computer science
1 If you are familiar with matrix notation you maynote that such equations can be written as đŽđ„ = bwhere đŽ is an đ Ă đ matrix with entries in 0/1 andb â âđ.
Solution:
If đč â€đ đș and đș â€đ đ» then there exist polynomial-time com-putable functions đ 1 and đ 2 mapping 0, 1â to 0, 1â such thatfor every đ„ â 0, 1â, đč(đ„) = đș(đ 1(đ„)) and for every đŠ â 0, 1â,đș(đŠ) = đ»(đ 2(đŠ)). Combining these two equalities, we see thatfor every đ„ â 0, 1â, đč(đ„) = đ»(đ 2(đ 1(đ„))) and so to show thatđč â€đ đ» , it is sufficient to show that the map đ„ ⊠đ 2(đ 1(đ„)) iscomputable in polynomial time. But if there are some constants đ, đsuch that đ 1(đ„) is computable in time |đ„|đ and đ 2(đŠ) is computablein time |đŠ|đ then đ 2(đ 1(đ„)) is computable in time (|đ„|đ)đ = |đ„|đđwhich is polynomial.
14.3 REDUCING 3SAT TO ZERO ONE AND QUADRATIC EQUATIONS
We now show our first example of a reduction. The Zero-One Lin-ear Equations problem corresponds to the function 01EQ ⶠ0, 1â â0, 1 whose input is a collection đž of linear equations in variablesđ„0, ⊠, đ„đâ1, and the output is 1 iff there is an assignment đ„ â 0, 1đof 0/1 values to the variables that satisfies all the equations. For exam-ple, if the input đž is a string encoding the set of equationsđ„0 + đ„1 + đ„2 = 2đ„0 + đ„2 = 1đ„1 + đ„2 = 2 (14.3)
then 01EQ(đž) = 1 since the assignment đ„ = 011 satisfies all threeequations. We specifically restrict attention to linear equations invariables đ„0, ⊠, đ„đâ1 in which every equation has the form âđâđ đ„đ =đ where đ â [đ] and đ â â.1
If we asked the question of whether there is a solution đ„ â âđ ofreal numbers to đž, then this can be solved using the famous Gaussianelimination algorithm in polynomial time. However, there is no knownefficient algorithm to solve 01EQ. Indeed, such an algorithm wouldimply an algorithm for 3SAT as shown by the following theorem:
Theorem 14.2 â Hardness of 01đžđ. 3SAT â€đ 01EQProof Idea:
A constraint đ„2 âš đ„5 âš đ„7 can be written as đ„2 + (1 â đ„5) + đ„7 â„ 1.This is a linear inequality but since the sum on the left-hand side isat most three, we can also turn it into an equality by adding two newvariables đŠ, đ§ and writing it as đ„2 + (1 â đ„5) + đ„7 + đŠ + đ§ = 3. (We willuse fresh such variables đŠ, đ§ for every constraint.) Finally, for everyvariable đ„đ we can add a variable đ„âČđ corresponding to its negation by
polynomial-time reductions 457
adding the equation đ„đ + đ„âČđ = 1, hence mapping the original constraintđ„2 âš đ„5 âš đ„7 to đ„2 + đ„âČ5 + đ„7 + đŠ + đ§ = 3. The main takeawaytechnique from this reduction is the idea of adding auxiliary variablesto replace an equation such as đ„1 + đ„2 + đ„3 â„ 1 that is not quite in theform we want with the equivalent (for 0/1 valued variables) equationđ„1 + đ„2 + đ„3 + đą + đŁ = 3 which is in the form we want.â
Figure 14.3: Left: Python code implementing thereduction of 3SAT to 01EQ. Right: Example output ofthe reduction. Code is in our repository.
Proof of Theorem 14.2. To prove the theorem we need to:
1. Describe an algorithm đ for mapping an input đ for 3SAT into aninput đž for 01EQ.
2. Prove that the algorithm runs in polynomial time.
3. Prove that 01EQ(đ (đ)) = 3SAT(đ) for every 3CNF formula đ.
We now proceed to do just that. Since this is our first reduction, wewill spell out this proof in detail. However it straightforwardly followsthe proof idea.
458 introduction to theoretical computer science
Algorithm 14.3 â 3đđŽđ to 01đžđ reduction.
Input: 3CNF formula đ with đ variables đ„0, ⊠, đ„đâ1 and đclauses.
Output: Set đž of linear equations over 0/1 such that3đđŽđ (đ) = 1 -iff 01đžđ(đž) = 1.1: Let đžâs variables be đ„0, ⊠, đ„đâ1, đ„âČ0, ⊠, đ„âČđâ1, đŠ0, ⊠, đŠđâ1,đ§0, ⊠, đ§đâ1.2: for đ â [đ] do3: add to đž the equation đ„đ + đ„âČđ = 14: end for5: for jâ[m] do6: Let đ-th clause be đ€0 âš đ€1 âš đ€2 where đ€0, đ€1, đ€2 are
literals.7: for đ â [3] do8: if đ€đ is variable đ„đ then9: set đĄđ â đ„đ
10: end if11: if đ€đ is negation ÂŹđ„đ then12: set đĄđ â đ„âČđ13: end if14: end for15: Add to đž the equation đĄ0 + đĄ1 + đĄ2 + đŠđ + đ§đ = 3.16: end for17: return đž
The reduction is described in Algorithm 14.3, see also Fig. 14.3. Ifthe input formula has đ variable and đ steps, Algorithm 14.3 creates aset đž of đ+đ equations over 2đ+2đ variables. Algorithm 14.3 makesan initial loop of đ steps (each taking constant time) and then anotherloop of đ steps (each taking constant time) to create the equations,and hence it runs in polynomial time.
Let đ be the function computed by Algorithm 14.3. The heart ofthe proof is to show that for every 3CNF đ, 01EQ(đ (đ)) = 3SAT(đ).We split the proof into two parts. The first part, traditionally knownas the completeness property, is to show that if 3SAT(đ) = 1 thenđ1EQ(đ (đ)) = 1. The second part, traditionally known as the sound-ness property, is to show that if 01EQ(đ (đ)) = 1 then 3SAT(đ) = 1.(The names âcompletenessâ and âsoundnessâ derive viewing a so-lution to đ (đ) as a âproofâ that đ is satisfiable, in which case theseconditions corresponds to completeness and soundness as definedin Section 11.1.1. However, if you find the names confusing you cansimply think of completeness as the â1-instance maps to 1-instanceâ
polynomial-time reductions 459
property and soundness as the â0-instance maps to 0-instanceâ prop-erty.)
We complete the proof by showing both parts:
âą Completeness: Suppose that 3SAT(đ) = 1, which means thatthere is an assignment đ„ â 0, 1đ that satisfies đ. If we use theassignment đ„0, ⊠, đ„đâ1 and 1 â đ„0, ⊠, 1 â đ„đâ1 for the first 2đvariables of đž = đ (đ) then we will satisfy all equations of the formđ„đ + đ„âČđ = 1. Moreover, for every đ â [đ], if đĄ0 + đĄ1 + đĄ2 + đŠđ + đ§đ =3(â) is the equation arising from the đth clause of đ (with đĄ0, đĄ1, đĄ2being variables of the form đ„đ or đ„âČđ depending on the literals of theclause) then our assignment to the first 2đ variables ensures thatđĄ0 + đĄ1 + đĄ2 â„ 1 (since đ„ satisfied đ) and hence we can assign valuesto đŠđ and đ§đ that will ensure that the equation (â) is satisfied. Hencein this case đž = đ (đ) is satisfied, meaning that 01EQ(đ (đ)) = 1.
âą Soundness: Suppose that 01EQ(đ (đ)) = 1, which means that theset of equations đž = đ (đ) has a satisfying assignment đ„0, ⊠, đ„đâ1,đ„âČ0, ⊠, đ„âČđâ1, đŠ0, ⊠, đŠđâ1, đ§0, ⊠, đ§đâ1. Then, since the equationscontain the condition đ„đ + đ„âČđ = 1, for every đ â [đ], đ„âČđ is the negationof đ„đ, and morover, for every đ â [đ], if đ¶ has the form đ€0 âš đ€1 âš đ€2is the đ-th clause of đ¶, then the corresponding assignment đ„ willensure that đ€0 + đ€1 + đ€2 â„ 1, implying that đ¶ is satisfied. Hence inthis case 3SAT(đ) = 1.
14.3.1 Quadratic equationsNow that we reduced 3SAT to 01EQ, we can use this to reduce 3SATto the quadratic equations problem. This is the function QUADEQ inwhich the input is a list of đ-variate polynomials đ0, ⊠, đđâ1 ⶠâđ â âthat are all of degree at most two (i.e., they are quadratic) and withinteger coefficients. (The latter condition is for convenience and canbe achieved by scaling.) We define QUADEQ(đ0, ⊠, đđâ1) to equal 1if and only if there is a solution đ„ â âđ to the equations đ0(đ„) = 0,đ1(đ„) = 0, âŠ, đđâ1(đ„) = 0.
For example, the following is a set of quadratic equations over thevariables đ„0, đ„1, đ„2: đ„20 â đ„0 = 0đ„21 â đ„1 = 0đ„22 â đ„2 = 01 â đ„0 â đ„1 + đ„0đ„1 = 0 (14.4)
You can verify that đ„ â â3 satisfies this set of equations if and only ifđ„ â 0, 13 and đ„0 âš đ„1 = 1.
460 introduction to theoretical computer science
Theorem 14.4 â Hardness of quadratic equations.3SAT â€đ QUADEQ (14.5)
Proof Idea:
Using the transitivity of reductions (Solved Exercise 14.2), it isenough to show that 01EQ â€đ QUADEQ, but this follows since wecan phrase the equation đ„đ â 0, 1 as the quadratic constraint đ„2đ âđ„đ = 0. The takeaway technique of this reduction is that we can usenonlinearity to force continuous variables (e.g., variables taking valuesin â) to be discrete (e.g., take values in 0, 1).âProof of Theorem 14.4. By Theorem 14.2 and Solved Exercise 14.2,it is sufficient to prove that 01EQ â€đ QUADEQ. Let đž be an in-stance of 01EQ with variables đ„0, ⊠, đ„đâ1. We map đž to the set ofquadratic equations đžâČ that is obtained by taking the linear equationsin đž and adding to them the đ quadratic equations đ„2đ â đ„đ = 0 forall đ â [đ]. (See Algorithm 14.5.) The map đž ⊠đžâČ can be com-puted in polynomial time. We claim that 01EQ(đž) = 1 if and onlyif QUADEQ(đžâČ) = 1. Indeed, the only difference between the twoinstances is that:
âą In the 01EQ instance đž, the equations are over variables đ„0, ⊠, đ„đâ1in 0, 1.
âą In the QUADEQ instance đžâČ, the equations are over variablesđ„0, ⊠, đ„đâ1 â â but we have the extra constraints đ„2đ â đ„đ = 0for all đ â [đ].Since for every đ â â, đ2 â đ = 0 if and only if đ â 0, 1, the two
sets of equations are equivalent and 01EQ(đž) = QUADEQ(đžâČ) whichis what we wanted to prove.
polynomial-time reductions 461
Algorithm 14.5 â 3đđŽđ to 01đžđ reduction.
Input: Set đž of linear equations over đ variables đ„0, ⊠, đ„đâ1.Output: Set đžâČ of quadratic eqations ovar đ variablesđ€0, ⊠, đ€đâ1 such that there is an 0/1 assignmentđ„ â 0, 1đ1: satisfying the equations of đž -iff there is an assignmentđ€ â âđ satisfying the equations of đžâČ.2: That is, đ1đžđ(đž) = đđđŽđ·đžđ(đžâČ).3: Let đ â đ.4: Variables of đžâČ are set to be same variable đ„0, ⊠, đ„đâ1
as đž.5: for every equation đ â đž do6: Add đ to đžâČ7: end for8: for đ â [đ] do9: Add to đžâČ the equation đ„2đ â đ„đ = 0.
10: end for11: return đžâČ
14.4 THE INDEPENDENT SET PROBLEM
For a graph đș = (đ , đž), an independent set (also known as a stableset) is a subset đ â đ such that there are no edges with both end-points in đ (in other words, đž(đ, đ) = â ). Every âsingletonâ (setconsisting of a single vertex) is trivially an independent set, but find-ing larger independent sets can be challenging. The maximum indepen-dent set problem (henceforth simply âindependent setâ) is the task offinding the largest independent set in the graph. The independent setproblem is naturally related to scheduling problems: if we put an edgebetween two conflicting tasks, then an independent set correspondsto a set of tasks that can all be scheduled together without conflicts.The independent set problem has been studied in a variety of settings,including for example in the case of algorithms for finding structure inprotein-protein interaction graphs.
As mentioned in Section 14.1, we think of the independent set prob-lem as the function ISET ⶠ0, 1â â 0, 1 that on input a graph đșand a number đ outputs 1 if and only if the graph đș contains an in-dependent set of size at least đ. We now reduce 3SAT to Independentset.
Theorem 14.6 â Hardness of Independent Set. 3SAT â€đ ISET.
Proof Idea:
462 introduction to theoretical computer science
Figure 14.4: An example of the reduction of 3SATto ISET for the case the original input formula isđ = (đ„0 âšđ„1 âšđ„2)â§(đ„0 âšđ„1 âšđ„2)â§(đ„1 âšđ„2 âšđ„3). Wemap each clause of đ to a triangle of three vertices,each tagged above with âđ„đ = 0â or âđ„đ = 1âdepending on the value of đ„đ that would satisfy theparticular literal. We put an edge between every twoliterals that are conflicting (i.e., tagged with âđ„đ = 0âand âđ„đ = 1â respectively).
The idea is that finding a satisfying assignment to a 3SAT formulacorresponds to satisfying many local constraints without creatingany conflicts. One can think of âđ„17 = 0â and âđ„17 = 1â as twoconflicting events, and of the constraints đ„17 âš đ„5 âš đ„9 as creatinga conflict between the events âđ„17 = 0â, âđ„5 = 1â and âđ„9 = 0â,saying that these three cannot simultaneously co-occur. Using theseideas, we can we can think of solving a 3SAT problem as trying toschedule non conflicting events, though the devil is, as usual, in thedetails. The takeaway technique here is to map each clause of theoriginal formula into a gadget which is a small subgraph (or moregenerally âsubinstanceâ) satisfying some convenient properties. Wewill see these âgadgetsâ used time and again in the construction ofpolynomial-time reductions.â
Algorithm 14.7 â 3đđŽđ to đŒđ reduction.
Input: 3đđŽđ formula đ with đ variables and đ clauses.Output: Graph đș = (đ , đž) and number đ, such that đș
has an independent set of size đ -iff đ has a satisfyingassignment.
1: That is, 3đđŽđ (đ) = đŒđđžđ (đș, đ),2: Initialize đ â â , đž â â 3: for every clause đ¶ = đŠ âš đŠâČ âš đŠâł of đ do4: Add three vertices (đ¶, đŠ), (đ¶, đŠâČ), (đ¶, đŠâł) to đ5: Add edges (đ¶, đŠ), (đ¶, đŠâČ), (đ¶, đŠâČ), (đ¶, đŠâł),(đ¶, đŠâł), (đ¶, đŠ) to đž.6: end for7: for every distinct clauses đ¶, đ¶âČ in đ do8: for every đ â [đ] do9: if đ¶ contains literal đ„đ and đ¶âČ contains literal đ„đ
then10: Add edge (đ¶, đ„đ), (đ¶, đ„đ) to đž11: end if12: end for13: end for14: return đș = (đ , đž)
Proof of Theorem 14.6. Given a 3SAT formula đ on đ variables andwith đ clauses, we will create a graph đș with 3đ vertices as follows.(See Algorithm 14.7, see also Fig. 14.4 for an example and Fig. 14.5 forPython code.)
âą A clause đ¶ in đ has the form đ¶ = đŠ âš đŠâČ âš đŠâł where đŠ, đŠâČ, đŠâł areliterals (variables or their negation). For each such clause đ¶, we will
polynomial-time reductions 463
add three vertices to đș, and label them (đ¶, đŠ), (đ¶, đŠâČ), and (đ¶, đŠâł)respectively. We will also add the three edges between all pairs ofthese vertices, so they form a triangle. Since there are đ clauses in đ,the graph đș will have 3đ vertices.
âą In addition to the above edges, we also add an edge between everypair vertices of the form (đ¶, đŠ) and (đ¶âČ, đŠâČ) where đŠ and đŠâČ are con-flicting literals. That is, we add an edge between (đ¶, đŠ) and (đ¶âČ, đŠâČ)if there is an đ such that đŠ = đ„đ and đŠâČ = đ„đ or vice versa.
The algorithm constructing of đș based on đ takes polynomial timesince it involves two loops, the first taking đ(đ) steps and the secondtaking đ(đ2đ) steps (see Algorithm 14.7). Hence to prove the theo-rem we need to show that đ is satisfiable if and only if đș contains anindependent set of đ vertices. We now show both directions of thisequivalence:
Part 1: Completeness. The âcompletenessâ direction is to show thatif đ has a satisfying assignment đ„â, then đș has an independent set đâof đ vertices. Let us now show this.
Indeed, suppose that đ has a satisfying assignment đ„â â 0, 1đ.Then for every clause đ¶ = đŠ âš đŠâČ âš đŠâł of đ, one of the literals đŠ, đŠâČ, đŠâłmust evaluate to true under the assignment đ„â (as otherwise it wouldnot satisfy đ). We let đ be a set of đ vertices that is obtained by choos-ing for every clause đ¶ one vertex of the form (đ¶, đŠ) such that đŠ eval-uates to true under đ„â. (If there is more than one such vertex for thesame đ¶, we arbitrarily choose one of them.)
We claim that đ is an independent set. Indeed, suppose otherwisethat there was a pair of vertices (đ¶, đŠ) and (đ¶âČ, đŠâČ) in đ that have anedge between them. Since we picked one vertex out of each trianglecorresponding to a clause, it must be that đ¶ â đ¶âČ. Hence the onlyway that there is an edge between (đ¶, đŠ) and (đ¶, đŠâČ) is if đŠ and đŠâČ areconflicting literals (i.e. đŠ = đ„đ and đŠâČ = đ„đ for some đ). But then theycanât both evaluate to true under the assignment đ„â, which contradictsthe way we constructed the set đ. This completes the proof of thecompleteness condition.
Part 2: Soundness. The âsoundnessâ direction is to show that ifđș has an independent set đâ of đ vertices, then đ has a satisfyingassignment đ„â â 0, 1đ. Let us now show this.
Indeed, suppose that đș has an independent set đâ with đ vertices.We will define an assignment đ„â â 0, 1đ for the variables of đ asfollows. For every đ â [đ], we set đ„âđ according to the following rules:
âą If đâ contains a vertex of the form (đ¶, đ„đ) then we set đ„âđ = 1.âą If đâ contains a vertex of the form (đ¶, đ„đ) then we set đ„âđ = 0.
464 introduction to theoretical computer science
âą If đâ does not contain a vertex of either of these forms, then it doesnot matter which value we give to đ„âđ , but for concreteness weâll setđ„âđ = 0.The first observation is that đ„â is indeed well defined, in the sense
that the rules above do not conflict with one another, and ask to set đ„âđto be both 0 and 1. This follows from the fact that đâ is an independentset and hence if it contains a vertex of the form (đ¶, đ„đ) then it cannotcontain a vertex of the form (đ¶âČ, đ„đ).
We now claim that đ„â is a satisfying assignment for đ. Indeed, sinceđâ is an independent set, it cannot have more than one vertex insideeach one of the đ triangles (đ¶, đŠ), (đ¶, đŠâČ), (đ¶, đŠâł) corresponding to aclause of đ. Hence since |đâ| = đ, it must have exactly one vertex ineach such triangle. For every clause đ¶ of đ, if (đ¶, đŠ) is the vertex inđâ in the triangle corresponding to đ¶, then by the way we defined đ„â,the literal đŠ must evaluate to true, which means that đ„â satisfies thisclause. Therefore đ„â satisfies all clauses of đ, which is the definition ofa satisfying assignment.
This completes the proof of Theorem 14.6
Figure 14.5: The reduction of 3SAT to Independent Set.On the righthand side is Python code that implementsthis reduction. On the lefthand side is a sampleoutput of the reduction. We use black for the âtriangleedgesâ and red for the âconflict edgesâ. Note that thesatisfying assignment đ„â = 0110 corresponds to theindependent set (0, ÂŹđ„3), (1, ÂŹđ„0), (2, đ„2).
14.5 SOME EXERCISES AND ANATOMY OF A REDUCTION.
Reductions can be confusing and working out exercises is a great wayto gain more comfort with them. Here is one such example. As usual,I recommend you try it out yourself before looking at the solution.Solved Exercise 14.3 â Vertex cover. A vertex cover in a graph đș = (đ , đž)is a subset đ â đ of vertices such that every edge touches at least onevertex of đ (see ??). The vertex cover problem is the task to determine,given a graph đș and a number đ, whether there exists a vertex coverin the graph with at most đ vertices. Formally, this is the functionVC ⶠ0, 1â â 0, 1 such that for every đș = (đ , đž) and đ â â,
polynomial-time reductions 465
Figure 14.6: A vertex cover in a graph is a subset ofvertices that touches all edges. In this 7-vertex graph,the 3 filled vertices are a vertex cover.
VC(đș, đ) = 1 if and only if there exists a vertex cover đ â đ such that|đ| †đ.Prove that 3SAT â€đ VC.
Solution:
The key observation is that if đ â đ is a vertex cover thattouches all vertices, then there is no edge đ such that both đ âs end-points are in the set đ = đ ⧔ đ, and vice versa. In other words,đ is a vertex cover if and only if đ is an independent set. Sincethe size of đ is |đ | â |đ|, we see that the polynomial-time mapđ (đș, đ) = (đș, đ â đ) (where đ is the number of vertices of đș)satisfies that VC(đ (đș, đ)) = ISET(đș, đ) which means that it is areduction from independent set to vertex cover.
Solved Exercise 14.4 â Clique is equivalent to independent set. The maximumclique problem corresponds to the function CLIQUE ⶠ0, 1â â 0, 1such that for a graph đș and a number đ, CLIQUE(đș, đ) = 1 iff thereis a đ subset of đ vertices such that for every distinct đą, đŁ â đ, the edgeđą, đŁ is in đș. Such a set is known as a clique.
Prove that CLIQUE â€đ ISET and ISET â€đ CLIQUE.
Solution:
If đș = (đ , đž) is a graph, we denote by đș its complement whichis the graph on the same vertices đ and such that for every distinctđą, đŁ â đ , the edge đą, đŁ is present in đș if and only if this edge isnot present in đș.
This means that for every set đ, đ is an independent set in đș ifand only if đ is a clique in đș. Therefore for every đ, ISET(đș, đ) =CLIQUE(đș, đ). Since the map đș ⊠đș can be computed efficiently,this yields a reduction ISET â€đ CLIQUE. Moreover, since đș = đșthis yields a reduction in the other direction as well.
14.5.1 Dominating setIn the two examples above, the reduction was almost âtrivialâ: thereduction from independent set to vertex cover merely changes thenumber đ to đ â đ, and the reduction from independent set to cliqueflips edges to non-edges and vice versa. The following exercise re-quires a somewhat more interesting reduction.Solved Exercise 14.5 â Dominating set. A dominating set in a graph đș =(đ , đž) is a subset đ â đ of vertices such that for every đą â đ ⧔ đ is aneighbor in đș of some đ â đ (see Fig. 14.7). The dominating set problem
466 introduction to theoretical computer science
Figure 14.7: A dominating set is a subset đ of verticessuch that every vertex in the graph is either in đ or aneighbor of đ. The figure above are two copies of thesame graph. The red vertices on the left are a vertexcover that is not a dominating set. The blue verticeson the right are a dominating set that is not a vertexcover.
is the task, given a graph đș = (đ , đž) and number đ, of determiningwhether there exists a dominating set đ â đ with |đ| †đ. Formally,this is the function DS ⶠ0, 1â â 0, 1 such that DS(đș, đ) = 1 iffthere is a dominating set in đș of at most đ vertices.
Prove that ISET â€đ DS.
Solution:
Since we know that ISET â€đ VC, using transitivity, it is enoughto show that VC â€đ DS. As Fig. 14.7 shows, a dominating set isnot the same thing as a vertex cover. However, we can still relatethe two problems. The idea is to map a graph đș into a graph đ»such that a vertex cover in đș would translate into a dominating setin đ» and vice versa. We do so by including in đ» all the verticesand edges of đș, but for every edge đą, đŁ of đș we also add to đ» anew vertex đ€đą,đŁ and connect it to both đą and đŁ. Let â be the numberof isolated vertices in đș. The idea behind the proof is that we cantransform a vertex cover đ of đ vertices in đș into a dominating setof đ + â vertices in đ» by adding to đ all the isolated vertices, andmoreover we can transform every đ + â sized dominating set in đ»into a vertex cover in đș. We now give the details.
Description of the algorithm. Given an instance (đș, đ) for thevertex cover problem, we will map đș into an instance (đ», đâČ) forthe dominating set problem as follows (see Fig. 14.8 for Pythonimplementation):
Algorithm 14.8 â đ đ¶ to đ·đ reduction.
Input: Graph đș = (đ , đž) and number đ.Output: Graph đ» = (đ âČ, đžâČ) and number đâČ, such thatđș has a vertex cover of size đ -iff đ» has a dominating
set of size đâČ1: That is, đ·đ(đ», đâČ) = đŒđđžđ (đș, đ),2: Initialize đ âČ â đ , đžâČ â đž3: for every edge đą, đŁ â đž do4: Add vertex đ€đą,đŁ to đ âČ5: Add edges đą, đ€đą,đŁ, đŁ, đ€đą,đŁ to đžâČ.6: end for7: Let â â number of isolated vertices in đș8: return (đ» = (đ âČ, đžâČ) , đ + â)
Algorithm 14.8 runs in polynomial time, since the loop takesđ(đ) steps where đ is the number of edges, with each step can beimplemented in constant or at most linear time (depending on the
polynomial-time reductions 467
representation of the graph đ»). Counting the number of isolatedvertices in an đ vertex graph đș can be done in time đ(đ2) if đș isrepresented in the adjacency matrix representation and đ(đ) timeif it is represented in the adjacency list representation. Regardlessthe algorithm runs in polynomial time.
To complete the proof we need to prove that for every đș, đ, ifđ», đâČ is the output of Algorithm 14.8 on input (đș, đ), thenDS(đ», đâČ) = VC(đș, đ). We split the proof into two parts. The
completeness part is that if VC(đș, đ) = 1 then DS(đ», đâČ) = 1. Thesoundness part is that if DS(đ», đâČ) = 1 then VC(đș, đ) = 1.
Completeness. Suppose that VC(đș, đ) = 1. Then there is a ver-tex cover đ â đ of at most đ vertices. Let đŒ be the set of isolatedvertices in đș and â be their number. Then |đ âȘ đŒ| †đ + â. We claimthat đ âȘ đŒ is a dominating set in đ»âČ. Indeed for every vertex đŁ of đ»âČthere are three cases:
âą Case 1: đŁ is an isolated vertex of đș. In this case đŁ is in đ âȘ đŒ .âą Case 2: đŁ is a non-isolated vertex of đș and hence there is an edgeđą, đŁ of đș for some đą. In this case since đ is a vertex cover, one
of đą, đŁ has to be in đ, and hence either đŁ or a neighbor of đŁ has tobe in đ â đ âȘ đŒ .
âą Case 3: đŁ is of the form đ€đą,đąâČ for some two neighbors đą, đąâČ in đș.But then since đ is a vertex cover, one of đą, đąâČ has to be in đ andhence đ contains a neighbor of đŁ.We conclude that đ âȘ đŒ is a dominating set of size at mostđâČ = đ + â in đ»âČ and hence under the assumption that VC(đș.đ) = 1,
DS(đ»âČ, đâČ) = 1.Soundness. Suppose that DS(đ», đâČ) = 1. Then there is a domi-
nating set đ· of size at most đâČ = đ + â in đ» . For every edge đą, đŁ inthe graph đș, if đ· contains the vertex đ€đą,đŁ then we remove this ver-tex and add đą in its place. The only two neighbors of đ€đą,đŁ are đą andđŁ, and since đą is a neighbor of both đ€đą,đŁ and of đŁ, replacing đ€đą,đŁwith đŁ maintains the property that it is a dominating set. Moreover,this change cannot increase the size of đ·. Thus following this mod-ification, we can assume that đ· is a dominating set of at most đ + âvertices that does not contain any vertices of the form đ€đą,đŁ.
Let đŒ be the set of isolated vertices in đș. These vertices are alsoisolated in đ» and hence must be included in đ· (an isolated ver-tex must be in any dominating set, since it has no neighbors). Welet đ = đ· ⧔ đŒ . Then |đ| †đŒ . We claim that đ is a vertex coverin đș. Indeed, for every edge đą, đŁ of đș, either the vertex đ€đą,đŁ or
468 introduction to theoretical computer science
one of its neighbors must be in đ by the dominating set property.But since we ensured đ doesnât contain any of the vertices of theform đ€đą,đŁ, it must be the case that either đą or đŁ is in đ. This showsthat đ is a vertex cover of đș of size at most đ, hence proving thatVC(đș, đ) = 1.
A corollary of Algorithm 14.8 and the other reduction we have seenso far is that if DS â P (i.e., dominating set has a polynomial-time al-gorithm) then 3SAT â P (i.e., 3SAT has a polynomial-time algorithm).By the contra-positive, if 3SAT does not have a polynomial-time algo-rithm then neither does dominating set.
Figure 14.8: Python implementation of the reductionfrom vertex cover to dominating set, together with anexample of an input graph and the resulting outputgraph. This reduction allows to transform a hypothet-ical polynomial-time algorithm for dominating set (aâwhistling pigâ) into a hypothetical polynomial-timealgorithm for vertex-cover (a âflying horseâ).
14.5.2 Anatomy of a reduction
Figure 14.9: The four components of a reduction,illustrated for the particular reduction of vertex coverto dominating set. A reduction from problem đč toproblem đș is an algorithm that maps an input đ„ for đčinto an input đ (đ„) for đș. To show that the reductionis correct we need to show the properties of efficiency:algorithm đ runs in polynomial time, completeness:if đč(đ„) = 1 then đș(đ (đ„)) = 1, and soundness: ifđč(đ (đ„)) = 1 then đș(đ„) = 1.
The reduction of Solved Exercise 14.5 gives a good illustration ofthe anatomy of a reduction. A reduction consists of four parts:âą Algorithm description: This is the description of how the algorithm
maps an input into the output. For example, in Solved Exercise 14.5this is the description of how we map an instance (đș, đ) of thevertex cover problem into an instance (đ», đâČ) of the dominating setproblem.
polynomial-time reductions 469
âą Algorithm analysis: It is not enough to describe how the algorithmworks but we need to also explain why it works. In particular weneed to provide an analysis explaining why the reduction is bothefficient (i.e., runs in polynomial time) and correct (satisfies thatđș(đ (đ„)) = đč(đ„) for every đ„). Specifically, the components ofanalysis of a reduction đ include:
â Efficiency: We need to show that đ runs in polynomial time. Inmost reductions we encounter this part is straightforward, as thereductions we typically use involve a constant number of nestedloops, each involving a constant number of operations. For ex-ample, the reduction of Solved Exercise 14.5 just enumerates overthe edges and vertices of the input graph.
â Completeness: In a reduction đ demonstrating đč â€đ đș, thecompleteness condition is the condition that for every đ„ â 0, 1â,if đč(đ„) = 1 then đș(đ (đ„)) = 1. Typically we construct thereduction to ensure that this holds, by giving a way to map aâcertificate/solutionâ certifying that đč(đ„) = 1 into a solutioncertifying that đș(đ (đ„)) = 1. For example, in Solved Exercise 14.5we constructed the graph đ» such that for every vertex cover đin đș, the set đ âȘ đŒ (where đŒ is the isolated vertices) would be adominating set in đ» .
â Soundness: This is the condition that if đč(đ„) = 0 thenđș(đ (đ„)) = 0 or (taking the contrapositive) if đș(đ (đ„)) = 1 thenđč(đ„) = 1. This is sometimes straightforward but can often beharder to show than the completeness condition, and in moreadvanced reductions (such as the reduction SAT â€đ ISETof Theorem 14.6) demonstrating soundness is the main partof the analysis. For example, in Solved Exercise 14.5 to showsoundness we needed to show that for every dominating set đ· inthe graph đ» , there exists a vertex cover đ of size at most |đ·| â âin the graph đș (where â is the number of isolated vertices).This was challenging since the dominating set đ· might not benecessarily the one we âhad in mindâ. In particular, in the proofabove we needed to modify đ· to ensure that it does not containvertices of the form đ€đą,đŁ, and it was important to show that thismodification still maintains the property that đ· is a dominatingset, and also does not make it bigger.
Whenever you need to provide a reduction, you should make surethat your description has all these components. While it is sometimestempting to weave together the description of the reduction and itsanalysis, it is usually clearer if you separate the two, and also breakdown the analysis to its three components of efficiency, completeness,and soundness.
470 introduction to theoretical computer science
14.6 REDUCING INDEPENDENT SET TO MAXIMUM CUT
We now show that the independent set problem reduces to the maxi-mum cut (or âmax cutâ) problem, modeled as the function MAXCUTthat on input a pair (đș, đ) outputs 1 iff đș contains a cut of at least đedges. Since both are graph problems, a reduction from independentset to max cut maps one graph into the other, but as we will see theoutput graph does not have to have the same vertices or edges as theinput graph.
Theorem 14.9 â Hardness of Max Cut. ISET â€đ MAXCUT
Proof Idea:
We will map a graph đș into a graph đ» such that a large indepen-dent set in đș becomes a partition cutting many edges in đ» . We canthink of a cut in đ» as coloring each vertex either âblueâ or âredâ. Wewill add a special âsourceâ vertex đ â, connect it to all other vertices,and assume without loss of generality that it is colored blue. Hencethe more vertices we color red, the more edges from đ â we cut. Now,for every edge đą, đŁ in the original graph đș we will add a special âgad-getâ which will be a small subgraph that involves đą,đŁ, the source đ â,and two other additional vertices. We design the gadget in a way sothat if the red vertices are not an independent set in đș then the cor-responding cut in đ» will be âpenalizedâ in the sense that it wouldnot cut as many edges. Once we set for ourselves this objective, it isnot hard to find a gadget that achieves itâ see the proof below. Onceagain the takeaway technique is to use (this time a slightly moreclever) gadget.â
Figure 14.10: In the reduction of ISET to MAXCUTwe map an đ-vertex đ-edge graph đș into theđ + 2đ + 1 vertex and đ + 5đ edge graph đ» asfollows. The graph đ» contains a special âsourceâvertex đ â,đ vertices đŁ0, ⊠, đŁđâ1, and 2đ ver-tices đ00, đ10, ⊠, đ0đâ1, đ1đâ1 with each pair cor-responding to an edge of đș. We put an edge be-tween đ â and đŁđ for every đ â [đ], and if the đĄ-thedge of đș was (đŁđ, đŁđ) then we add the five edges(đ â, đ0đĄ ), (đ â, đ1đĄ ), (đŁđ, đ0đĄ ), (đŁđ, đ1đĄ ), (đ0đĄ , đ1đĄ ). The intentis that if cut at most one of đŁđ, đŁđ from đ â then weâllbe able to cut 4 out of these five edges, while if wecut both đŁđ and đŁđ from đ â then weâll be able to cut atmost three of them.
Proof of Theorem 14.9. We will transform a graph đș of đ vertices and đedges into a graph đ» of đ + 1 + 2đ vertices and đ + 5đ edges in thefollowing way (see also Fig. 14.10). The graph đ» contains all vertices
polynomial-time reductions 471
Figure 14.11: In the reduction of independent setto max cut, for every đĄ â [đ], we have a âgadgetâcorresponding to the đĄ-th edge đ = đŁđ, đŁđ in theoriginal graph. If we think of the side of the cutcontaining the special source vertex đ â as âwhiteâ andthe other side as âblueâ, then the leftmost and centerfigures show that if đŁđ and đŁđ are not both blue thenwe can cut four edges from the gadget. In contrast,by enumerating all possibilities one can verify thatif both đą and đŁ are blue, then no matter how wecolor the intermediate vertices đ0đĄ , đ1đĄ , we will cut atmost three edges from the gadget. The figure abovecontains only the gadget edges and ignores the edgesconnecting đ â to the vertices đŁ0, ⊠, đŁđâ1.
of đș (though not the edges between them!) and in addition đ» alsohas:
* A special vertex đ â that is connected to all the vertices of đș* For every edge đ = đą, đŁ â đž(đș), two vertices đ0, đ1 such that đ0
is connected to đą and đ1 is connected to đŁ, and moreover we add theedges đ0, đ1, đ0, đ â, đ1, đ â to đ» .
Theorem 14.9 will follow by showing that đș contains an indepen-dent set of size at least đ if and only if đ» has a cut cutting at leastđ + 4đ edges. We now prove both directions of this equivalence:
Part 1: Completeness. If đŒ is an independent đ-sized set in đș, thenwe can define đ to be a cut in đ» of the following form: we let đ con-tain all the vertices of đŒ and for every edge đ = đą, đŁ â đž(đș), if đą â đŒand đŁ â đŒ then we add đ1 to đ; if đą â đŒ and đŁ â đŒ then we add đ0 tođ; and if đą â đŒ and đŁ â đŒ then we add both đ0 and đ1 to đ. (We donâtneed to worry about the case that both đą and đŁ are in đŒ since it is anindependent set.) We can verify that in all cases the number of edgesfrom đ to its complement in the gadget corresponding to đ will be four(see Fig. 14.11). Since đ â is not in đ, we also have đ edges from đ â to đŒ ,for a total of đ + 4đ edges.
Part 2: Soundness. Suppose that đ is a cut in đ» that cuts at leastđ¶ = đ + 4đ edges. We can assume that đ â is not in đ (otherwise wecan âflipâ đ to its complement đ, since this does not change the sizeof the cut). Now let đŒ be the set of vertices in đ that correspond to theoriginal vertices of đș. If đŒ was an independent set of size đ then wewould be done. This might not always be the case but we will see thatif đŒ is not an independent set then itâs also larger than đ. Specifically,we define đđđ = |đž(đŒ, đŒ)| be the set of edges in đș that are containedin đŒ and let đđđąđĄ = đ â đđđ (i.e., if đŒ is an independent set thenđđđ = 0 and đđđąđĄ = đ). By the properties of our gadget we knowthat for every edge đą, đŁ of đș, we can cut at most three edges whenboth đą and đŁ are in đ, and at most four edges otherwise. Hence thenumber đ¶ of edges cut by đ satisfies 𶠆|đŒ| + 3đđđ + 4đđđąđĄ =|đŒ| + 3đđđ + 4(đ â đđđ) = |đŒ| + 4đ â đđđ. Since đ¶ = đ + 4đ weget that |đŒ| â đđđ â„ đ. Now we can transform đŒ into an independentset đŒâČ by going over every one of the đđđ edges that are inside đŒ andremoving one of the endpoints of the edge from it. The resulting set đŒâČis an independent set in the graph đș of size |đŒ| â đđđ â„ đ and so thisconcludes the proof of the soundness condition.
472 introduction to theoretical computer science
Figure 14.12: The reduction of independent set tomax cut. On the righthand side is Python codeimplementing the reduction. On the lefthand side isan example output of the reduction where we applyit to the independent set instance that is obtained byrunning the reduction of Theorem 14.6 on the 3CNFformula (đ„0 âšđ„3 âšđ„2)â§(đ„0 âšđ„1 âšđ„2)â§(đ„1 âšđ„2 âšđ„3).
Figure 14.13: We can transform a 3SAT formula đ intoa graph đș such that the longest path in the graph đșwould correspond to a satisfying assignment in đ. Inthis graph, the black colored part corresponds to thevariables of đ and the blue colored part correspondsto the vertices. A sufficiently long path would have tofirst âsnakeâ through the black part, for each variablechoosing either the âupper pathâ (correspondingto assigning it the value True) or the âlower pathâ(corresponding to assigning it the value False). Thento achieve maximum length the path would traversethrough the blue part, where to go between twovertices corresponding to a clause such as đ„17 âš đ„32 âšđ„57, the corresponding vertices would have to havebeen not traversed before.
Figure 14.14: The graph above with the longest pathmarked on it, the part of the path corresponding tovariables is in green and part corresponding to theclauses is in pink.
14.7 REDUCING 3SAT TO LONGEST PATH
Note: This section is still a little messy; feel free to skip it or just readit without going into the proof details. The proof appears in Section7.5 in Sipserâs book.
One of the most basic algorithms in Computer Science is Dijkstraâsalgorithm to find the shortest path between two vertices. We now showthat in contrast, an efficient algorithm for the longest path problemwould imply a polynomial-time algorithm for 3SAT.
Theorem 14.10 â Hardness of longest path.3SAT â€đ LONGPATH (14.6)
Proof Idea:
To prove Theorem 14.10 need to show how to transform a 3CNFformula đ into a graph đș and two vertices đ , đĄ such that đș has a pathof length at least đ if and only if đ is satisfiable. The idea of the reduc-tion is sketched in Fig. 14.13 and Fig. 14.14. We will construct a graphthat contains a potentially long âsnaking pathâ that corresponds toall variables in the formula. We will add a âgadgetâ correspondingto each clause of đ in a way that we would only be able to use thegadgets if we have a satisfying assignment.âdef TSAT2LONGPATH(Ï):
"""Reduce 3SAT to LONGPATH"""def var(v): # return variable and True/False depending
if positive or negatedreturn int(v[2:]),False if v[0]=="ÂŹ" else
int(v[1:]),Truen = numvars(Ï)clauses = getclauses(Ï)m = len(clauses)G =Graph()
polynomial-time reductions 473
G.edge("start","start_0")for i in range(n): # add 2 length-m paths per variable
G.edge(f"start_i",f"v_i_0_T")G.edge(f"start_i",f"v_i_0_F")for j in range(m-1):
G.edge(f"v_i_j_T",f"v_i_j+1_T")G.edge(f"v_i_j_F",f"v_i_j+1_F")
G.edge(f"v_i_m-1_T",f"end_i")G.edge(f"v_i_m-1_F",f"end_i")if i<n-1:
G.edge(f"end_i",f"start_i+1")G.edge(f"end_n-1","start_clauses")for j,C in enumerate(clauses): # add gadget for each
clausefor v in enumerate(C):
i,sign = var(v[1])s = "F" if sign else "T"G.edge(f"C_j_in",f"v_i_j_s")G.edge(f"v_i_j_s",f"C_j_out")
if j<m-1:G.edge(f"C_j_out",f"C_j+1_in")
G.edge("start_clauses","C_0_in")G.edge(f"C_m-1_out","end")return G, 1+n*(m+1)+1+2*m+1
Proof of Theorem 14.10. We build a graph đș that âsnakesâ from đ to đĄ asfollows. After đ we add a sequence of đ long loops. Each loop has anâupper pathâ and a âlower pathâ. A simple path cannot take both theupper path and the lower path, and so it will need to take exactly oneof them to reach đ from đĄ.
Our intention is that a path in the graph will correspond to an as-signment đ„ â 0, 1đ in the sense that taking the upper path in the đđĄâloop corresponds to assigning đ„đ = 1 and taking the lower path cor-responds to assigning đ„đ = 0. When we are done snaking through allthe đ loops corresponding to the variables to reach đĄ we need to passthrough đ âobstaclesâ: for each clause đ we will have a small gad-get consisting of a pair of vertices đ đ, đĄđ that have three paths betweenthem. For example, if the đđĄâ clause had the form đ„17 âš đ„55 âš đ„72 thenone path would go through a vertex in the lower loop correspondingto đ„17, one path would go through a vertex in the upper loop corre-sponding to đ„55 and the third would go through the lower loop cor-responding to đ„72. We see that if we went in the first stage accordingto a satisfying assignment then we will be able to find a free vertex totravel from đ đ to đĄđ. We link đĄ1 to đ 2, đĄ2 to đ 3, etc and link đĄđ to đĄ. Thus
474 introduction to theoretical computer science
Figure 14.15: The result of applying the reduction of3SAT to LONGPATH to the formula (đ„0 âš ÂŹđ„3 âš đ„2) â§(ÂŹđ„0 âš đ„1 âš ÂŹđ„2) ⧠(đ„1 âš đ„2 âš ÂŹđ„3).
a satisfying assignment would correspond to a path from đ to đĄ thatgoes through one path in each loop corresponding to the variables,and one path in each loop corresponding to the clauses. We can makethe loop corresponding to the variables long enough so that we musttake the entire path in each loop in order to have a fighting chance ofgetting a path as long as the one corresponds to a satisfying assign-ment. But if we do that, then the only way if we are able to reach đĄ isif the paths we took corresponded to a satisfying assignment, sinceotherwise we will have one clause đ where we cannot reach đĄđ from đ đwithout using a vertex we already used before.
14.7.1 Summary of relationsWe have shown that there are a number of functions đč for which wecan prove a statement of the form âIf đč â P then 3SAT â Pâ. Hencecoming up with a polynomial-time algorithm for even one of theseproblems will entail a polynomial-time algorithm for 3SAT (see forexample Fig. 14.16). In Chapter 15 we will show the inverse direction(âIf 3SAT â P then đč â Pâ) for these functions, hence allowing us toconclude that they have equivalent complexity to 3SAT.
Figure 14.16: So far we have shown that P â EXPand that several problems we care about such as3SAT and MAXCUT are in EXP but it is not knownwhether or not they are in EXP. However, since3SAT â€đ MAXCUT we can rule out the possiblitythat MAXCUT â P but 3SAT â P. The relation ofP/poly to the class EXP is not known. We know thatEXP does not contain P/poly since the latter evencontains uncomputable functions, but we do notknow whether ot not EXP â P/poly (though it isbelieved that this is not the case and in particular thatboth 3SAT and MAXCUT are not in P/poly). Chapter Recap
âą The computational complexity of many seeminglyunrelated computational problems can be relatedto one another through the use of reductions.
âą If đč â€đ đș then a polynomial-time algorithmfor đș can be transformed into a polynomial-timealgorithm for đč .
âą Equivalently, if đč â€đ đș and đč does not have apolynomial-time algorithm then neither does đș.
âą Weâve developed many techniques to show that3SAT â€đ đč for interesting functions đč . Sometimeswe can do so by using transitivity of reductions: if3SAT â€đ đș and đș â€đ đč then 3SAT â€đ đč .
polynomial-time reductions 475
14.8 EXERCISES
14.9 BIBLIOGRAPHICAL NOTES
Several notions of reductions are defined in the literature. The notiondefined in Definition 14.1 is often known as a mapping reduction, manyto one reduction or a Karp reduction.
The maximal (as opposed to maximum) independent set is the taskof finding a âlocal maximumâ of an independent set: an independentset đ such that one cannot add a vertex to it without losing the in-dependence property (such a set is known as a vertex cover). Unlikefinding a maximum independent set, finding a maximal independentset can be done efficiently by a greedy algorithm, but this local maxi-mum can be much smaller than the global maximum.
Reduction of independent set to max cut taken from these notes.Image of Hamiltonian Path through Dodecahedron by ChristophSommer.
We have mentioned that the line between reductions used for algo-rithm design and showing hardness is sometimes blurry. An excellentexample for this is the area of SAT Solvers (see [Gom+08]). In thisfield people use algorithms for SAT (that take exponential time in theworst case but often are much faster on many instances in practice)together with reductions of the form đč â€đ SAT to derive algorithmsfor other functions đč of interest.
15NP, NP completeness, and the Cook-Levin Theorem
âIn this paper we give theorems that suggest, but do not imply, that theseproblems, as well as many others, will remain intractable perpetuallyâ, RichardKarp, 1972
âSad to say, but it will be many more years, if ever before we really understandthe Mystical Power of Twoness⊠2-SAT is easy, 3-SAT is hard, 2-dimensionalmatching is easy, 3-dimensional matching is hard. Why? oh, Why?â EugeneLawler
So far we have shown that 3SAT is no harder than Quadratic Equa-tions, Independent Set, Maximum Cut, and Longest Path. But to showthat these problems are computationally equivalent we need to give re-ductions in the other direction, reducing each one of these problems to3SAT as well. It turns out we can reduce all three problems to 3SAT inone fell swoop.
In fact, this result extends far beyond these particular problems. Allof the problems we discussed in Chapter 14, and a great many otherproblems, share the same commonality: they are all search problems,where the goal is to decide, given an instance đ„, whether there existsa solution đŠ that satisfies some condition that can be verified in poly-nomial time. For example, in 3SAT, the instance is a formula and thesolution is an assignment to the variable; in Max-Cut the instance is agraph and the solution is a cut in the graph; and so on and so forth. Itturns out that every such search problem can be reduced to 3SAT.
15.1 THE CLASS NPTo make the above precise, we will make the following mathematicaldefinition. We define the class NP to contain all Boolean functions thatcorrespond to a search problem of the form above. That is, a Booleanfunction đč is in NP if đč has the form that on input a string đ„, đč(đ„) = 1if and only if there exists a âsolutionâ string đ€ such that the pair (đ„, đ€)satisfies some polynomial-time checkable condition. Formally, NP isdefined as follows:
Compiled on 8.26.2020 18:10
Learning Objectives:âą Introduce the class NP capturing a great many
important computational problemsâą NP-completeness: evidence that a problem
might be intractable.âą The P vs NP problem.
478 introduction to theoretical computer science
Figure 15.1: Overview of the results of this chapter.We define NP to contain all decision problems forwhich a solution can be efficiently verified. The mainresult of this chapter is the Cook Levin Theorem (The-orem 15.6) which states that 3SAT has a polynomial-time algorithm if and only if every problem in NPhas a polynomial-time algorithm. Another way tostate this theorem is that 3SAT is NP complete. Wewill prove the Cook-Levin theorem by defining thetwo intermediate problems NANDSAT and 3NAND,proving that NANDSAT is NP complete, and thenproving that NANDSAT â€đ 3NAND â€đ 3SAT.
Figure 15.2: The class NP corresponds to problemswhere solutions can be efficiently verified. That is, thisis the class of functions đč such that đč(đ„) = 1 if thereis a âsolutionâ đ€ of length polynomial in |đ„| that canbe verified by a polynomial-time algorithm đ .
Definition 15.1 â NP. We say that đč ⶠ0, 1â â 0, 1 is in NP if thereexists some integer đ > 0 and đ ⶠ0, 1â â 0, 1 such that đ â Pand for every đ„ â 0, 1đ,đč(đ„) = 1 â âđ€â0,1đđ s.t. đ (đ„đ€) = 1 . (15.1)
In other words, for đč to be in NP, there needs to exist somepolynomial-time computable verification function đ , such that ifđč(đ„) = 1 then there must exist đ€ (of length polynomial in |đ„|) suchthat đ (đ„đ€) = 1, and if đč(đ„) = 0 then for every such đ€, đ (đ„đ€) = 0.Since the existence of this string đ€ certifies that đč(đ„) = 1, đ€ is oftenreferred to as a certificate, witness, or proof that đč(đ„) = 1.
See also Fig. 15.2 for an illustration of Definition 15.1. The nameNP stands for ânondeterministic polynomial timeâ and is used forhistorical reasons; see the bibiographical notes. The string đ€ in (15.1)is sometimes known as a solution, certificate, or witness for the instanceđ„.Solved Exercise 15.1 â Alternative definition of NP. Show that the conditionthat |đ€| = |đ„|đ in Definition 15.1 can be replaced by the conditionthat |đ€| †đ(|đ„|) for some polynomial đ. That is, prove that for everyđč ⶠ0, 1â â 0, 1, đč â NP if and only if there is a polynomial-time Turing machine đ and a polynomial đ ⶠâ â â such that forevery đ„ â 0, 1â đč(đ„) = 1 if and only if there exists đ€ â 0, 1â with|đ€| †đ(|đ„|) such that đ (đ„, đ€) = 1.
np, np completeness, and the cook-levin theorem 479
Solution:
The âonly ifâ direction (namely that if đč â NP then there is analgorithm đ and a polynomial đ as above) follows immediatelyfrom Definition 15.1 by letting đ(đ) = đđ. For the âifâ direc-tion, the idea is that if a string đ€ is of size at most đ(đ) for degreeđ polynomial đ, then there is some đ0 such that for all đ > đ0,|đ€| < đđ+1. Hence we can encode đ€ by a string of exactly lengthđđ+1 by padding it with 1 and an appropriate number of zeroes.Hence if there is an algorithm đ and polynomial đ as above, thenwe can define an algorithm đ âČ that does the following on inputđ„, đ€âČ with |đ„| = đ and |đ€âČ| = đđ:âą If đ †đ0 then đ âČ(đ„, đ€âČ) ignores đ€âČ and enumerates over all đ€
of length at most đ(đ) and outputs 1 if there exists đ€ such thatđ (đ„, đ€) = 1. (Since đ < đ0, this only takes a constant number ofsteps.)
âą If đ > đ0 then đ âČ(đ„, đ€âČ) âstrips outâ the padding by droppingall the rightmost zeroes from đ€ until it reaches out the first 1(which it drops as well) and obtains a string đ€. If |đ€| †đ(đ)then đ âČ outputs đ (đ„, đ€).Since đ runs in polynomial time, đ âČ runs in polynomial time
as well, and by definition for every đ„, there exists đ€âČ â 0, 1|đ„|đsuch that đ âČ(đ„đ€âČ) = 1 if and only if there exists đ€ â 0, 1â with|đ€| †đ(|đ„|) such that đ (đ„đ€) = 1.
The definition of NP means that for every đč â NP and stringđ„ â 0, 1â, đč(đ„) = 1 if and only if there is a short and efficientlyverifiable proof of this fact. That is, we can think of the function đ inDefinition 15.1 as a verifier algorithm, similar to what weâve seen inSection 11.1. The verifier checks whether a given string đ€ â 0, 1â is avalid proof for the statement âđč(đ„) = 1â. Essentially all proof systemsconsidered in mathematics involve line-by-line checks that can be car-ried out in polynomial time. Thus the heart of NP is asking for state-ments that have short (i.e., polynomial in the size of the statements)proof. Indeed, as we will see in Chapter 16, Kurt Gödel phrased thequestion of whether NP = P as asking whether âthe mental work ofa mathematician [in proving theorems] could be completely replacedby a machineâ.
RRemark 15.2 â NP not (necessarily) closed under com-plement. Definition 15.1 is asymmetric in the sense that
480 introduction to theoretical computer science
there is a difference between an output of 1 and anoutput of 0. You should make sure you understandwhy this definition does not guarantee that if đč â NPthen the function 1 â đč (i.e., the map đ„ ⊠1 â đč(đ„)) isin NP as well.In fact, it is believed that there do exist functions đčsuch that đč â NP but 1 â đč â NP. For example, asshown below, 3SAT â NP, but the function 3SAT thaton input a 3CNF formula đ outputs 1 if and only if đis not satisfiable is not known (nor believed) to be inNP. This is in contrast to the class P which does satisfythat if đč â P then 1 â đč is in P as well.
15.1.1 Examples of functions in NPWe now present some examples of functions that are in the class NP.We start with the canonical example of the 3SAT function.
Example 15.3 â 3đđŽđ â NP. 3SAT is in NP since for every â-variable formula đ, 3SAT(đ) = 1 if and only if there exists asatisfying assignment đ„ â 0, 1â such that đ(đ„) = 1, and wecan check this condition in polynomial time.
The above reasoning explains why 3SAT is in NP, but since thisis our first example, we will now belabor the point and expand outin full formality the precise representation of the witness đ€ and thealgorithm đ that demonstrate that 3SAT is in NP. Since demon-strating that functions are in NP is fairly straightforward, in futurecases we will not use as much detail, and the reader can also feelfree to skip the rest of this example.
Using Solved Exercise 15.1, it is OK if witness is of size at mostpolynomial in the input length đ, rather than of precisely size đđfor some integer đ > 0. Specifically, we can represent a 3CNFformula đ with đ variables and đ clauses as a string of lengthđ = đ(đ log đ), since every one of the đ clauses involves threevariables and their negation, and the identity of each variable canbe represented using âlog2 đâ. We assume that every variable par-ticipates in some clause (as otherwise it can be ignored) and hencethat đ â„ đ, which in particular means that the input length đ is atleast as large as đ and đ.
We can represent an assignment to the đ variables using a đ-length string đ€. The following algorithm checks whether a given đ€satisfies the formula đ:
np, np completeness, and the cook-levin theorem 481
Algorithm 15.4 â Verifier for 3đđŽđ .
Input: 3CNF formula đ on đ variables and with đclauses, string đ€ â 0, 1đ
Output: 1 iff đ€ satisfies đ1: for đ â [đ] do2: Let â1 âš â2 âš âđ be the đ-th clause of đ3: if đ€ violates all three literals then4: return 05: end if6: end for7: return 1
Algorithm 15.4 takes đ(đ) time to enumerate over all clauses,and will return 1 if and only if đŠ satisfies all the clauses.
Here are some more examples for problems in NP. For each oneof these problems we merely sketch how the witness is representedand why it is efficiently checkable, but working out the details can be agood way to get more comfortable with Definition 15.1:
âą QUADEQ is in NP since for every â-variable instance of quadraticequations đž, QUADEQ(đž) = 1 if and only if there exists an assign-ment đ„ â 0, 1â that satisfies đž. We can check the condition thatđ„ satisfies đž in polynomial time by enumerating over all the equa-tions in đž, and for each such equation đ, plug in the values of đ„ andverify that đ is satisfied.
âą ISET is in NP since for every graph đș and integer đ, ISET(đș, đ) =1 if and only if there exists a set đ of đ vertices that contains nopair of neighbors in đș. We can check the condition that đ is anindependent set of size â„ đ in polynomial time by first checkingthat |đ| â„ đ and then enumerating over all edges đą, đŁ in đș, andfor each such edge verify that either đą â đ or đŁ â đ.
âą LONGPATH is in NP since for every graph đș and integer đ,LONGPATH(đș, đ) = 1 if and only if there exists a simple path đin đș that is of length at least đ. We can check the condition that đis a simple path of length đ in polynomial time by checking that ithas the form (đŁ0, đŁ1, ⊠, đŁđ) where each đŁđ is a vertex in đș, no đŁđ isrepeated, and for every đ â [đ], the edge đŁđ, đŁđ+1 is present in thegraph.
âą MAXCUT is in NP since for every graph đș and integer đ,MAXCUT(đș, đ) = 1 if and only if there exists a cut (đ, đ) in đș thatcuts at least đ edges. We can check that condition that (đ, đ) is acut of value at least đ in polynomial time by checking that đ is a
482 introduction to theoretical computer science
subset of đșâs vertices and enumerating over all the edges đą, đŁ ofđș, counting those edges such that đą â đ and đŁ â đ or vice versa.
15.1.2 Basic facts about NPThe definition of NP is one of the most important definitions of thisbook, and is worth while taking the time to digest and internalize. Thefollowing solved exercises establish some basic properties of this class.As usual, I highly recommend that you try to work out the solutionsyourself.Solved Exercise 15.2 â Verifying is no harder than solving. Prove that P â NP.
Solution:
Suppose that đč â P. Define the following function đ : đ (đ„0đ) =1 iff đ = |đ„| and đč(đ„) = 1. (đ outputs 0 on all other inputs.) Sinceđč â P we can clearly compute đ in polynomial time as well.Let đ„ â 0, 1đ be some string. If đč(đ„) = 1 then đ (đ„0đ) = 1. On
the other hand, if đč(đ„) = 0 then for every đ€ â 0, 1đ, đ (đ„đ€) = 0.Therefore, setting đ = đ = 1, we see that đ satisfies (15.1), and es-tablishes that đč â NP.
RRemark 15.5 â NP does not mean non-polynomial!.People sometimes think that NP stands for ânon poly-nomial timeâ. As Solved Exercise 15.2 shows, this isfar from the truth, and in fact every polynomial-timecomputable function is in NP as well.If đč is in NP it certainly does not mean that đč is hardto compute (though it does not, as far as we know,necessarily mean that itâs easy to compute either).Rather, it means that đč is easy to verify, in the technicalsense of Definition 15.1.
Solved Exercise 15.3 â NP is in exponential time. Prove that NP â EXP.
Solution:
Suppose that đč â NP and let đ be the polynomial-time com-putable function that satisfies (15.1) and đ the correspondingconstant. Then given every đ„ â 0, 1đ, we can check whetherđč(đ„) = 1 in time đđđđŠ(đ) â 2đđ = đ(2đđ+1) by enumerating overall the 2đđ strings đ€ â 0, 1đđ and checking whether đ (đ„đ€) = 1,in which case we return 1. If đ (đ„đ€) = 0 for every such đ€ then wereturn 0. By construction, the algorithm above will run in time at
np, np completeness, and the cook-levin theorem 483
most exponential in its input length and by the definition of NP itwill return đč(đ„) for every đ„.
Solved Exercise 15.2 and Solved Exercise 15.3 together imply that
P â NP â EXP . (15.2)
The time hierarchy theorem (Theorem 13.9) implies that P â EXPand hence at least one of the two inclusions P â NP or NP â EXPis strict. It is believed that both of them are in fact strict inclusions.That is, it is believed that there are functions in NP that cannot becomputed in polynomial time (this is the P â NP conjecture) andthat there are functions đč in EXP for which we cannot even effi-ciently certify that đč(đ„) = 1 for a given input đ„. One function đčthat is believed to lie in EXP ⧔ NP is the function 3SAT defined as3SAT(đ) = 1 â 3SAT(đ) for every 3CNF formula đ. The conjecturethat 3SAT â NP is known as the âNP â co â NPâ conjecture. Itimplies the P â NP conjecture (see Exercise 15.2).
We have previously informally equated the notion of đč â€đ đș withđč being âno harder than đșâ and in particular have seen in SolvedExercise 14.1 that if đș â P and đč â€đ đș, then đč â P as well. Thefollowing exercise shows that if đč â€đ đș then it is also âno harder toverifyâ than đș. That is, regardless of whether or not it is in P, if đș hasthe property that solutions to it can be efficiently verified, then so doesđč .Solved Exercise 15.4 â Reductions and NP. Let đč, đș ⶠ0, 1â â 0, 1.Show that if đč â€đ đș and đș â NP then đč â NP.
Solution:
Suppose that đș is in NP and in particular there exists đ and đ âP such that for every đŠ â 0, 1â, đș(đŠ) = 1 â âđ€â0,1|đŠ|đ đ (đŠđ€) = 1.Suppose also that đč â€đ đș and so in particular there is a đđ-time computable function đ such that đč(đ„) = đș(đ (đ„)) for allđ„ â 0, 1â. Define đ âČ to be a Turing Machine that on input a pair(đ„, đ€) computes đŠ = đ (đ„) and returns 1 if and only if |đ€| = |đŠ|đand đ (đŠđ€) = 1. Then đ âČ runs in polynomial time, and for everyđ„ â 0, 1â, đč(đ„) = 1 iff there exists đ€ of size |đ (đ„)|đ which is atmost polynomial in |đ„| such that đ âČ(đ„, đ€) = 1, hence demonstratingthat đč â NP.
484 introduction to theoretical computer science
15.2 FROM NP TO 3SAT: THE COOK-LEVIN THEOREM
We have seen several examples of problems for which we do not knowif their best algorithm is polynomial or exponential, but we can showthat they are in NP. That is, we donât know if they are easy to solve, butwe do know that it is easy to verify a given solution. There are many,many, many, more examples of interesting functions we would like tocompute that are easily shown to be in NP. What is quite amazing isthat if we can solve 3SAT then we can solve all of them!
The following is one of the most fundamental theorems in Com-puter Science:
Theorem 15.6 â Cook-Levin Theorem. For every đč â NP, đč â€đ 3SAT.We will soon show the proof of Theorem 15.6, but note that it im-
mediately implies that QUADEQ, LONGPATH, and MAXCUT allreduce to 3SAT. Combining it with the reductions weâve seen in Chap-ter 14, it implies that all these problems are equivalent! For example,to reduce QUADEQ to LONGPATH, we can first reduce QUADEQ to3SAT using Theorem 15.6 and use the reduction weâve seen in Theo-rem 14.10 from 3SAT to LONGPATH. That is, since QUADEQ â NP,Theorem 15.6 implies that QUADEQ â€đ 3SAT, and Theorem 14.10implies that 3SAT â€đ LONGPATH, which by the transitivity of reduc-tions (Solved Exercise 14.2) means that QUADEQ â€đ LONGPATH.Similarly, since LONGPATH â NP, we can use Theorem 15.6 andTheorem 14.4 to show that LONGPATH â€đ 3SAT â€đ QUADEQ,concluding that LONGPATH and QUADEQ are computationallyequivalent.
There is of course nothing special about QUADEQ and LONGPATHhere: by combining (15.6) with the reductions we saw, we see that justlike 3SAT, every đč â NP reduces to LONGPATH, and the same is truefor QUADEQ and MAXCUT. All these problems are in some senseâthe hardest in NPâ since an efficient algorithm for any one of themwould imply an efficient algorithm for all the problems in NP. Thismotivates the following definition:
Definition 15.7 â NP-hardness and NP-completeness. Let đș ⶠ0, 1â â0, 1. We say that đș is NP hard if for every đč â NP, đč â€đ đș.We say that đș is NP complete if đș is NP hard and đș â NP.
The Cook-Levin Theorem (Theorem 15.6) can be rephrased assaying that 3SAT is NP hard, and since it is also in NP, this means that3SAT is NP complete. Together with the reductions of Chapter 14,Theorem 15.6 shows that despite their superficial differences, 3SAT,quadratic equations, longest path, independent set, and maximum
np, np completeness, and the cook-levin theorem 485
Figure 15.3: The world if P â NP (left) and P = NP(right). In the former case the set of NP-completeproblems is disjoint from P and Ladnerâs theoremshows that there exist problems that are neither inP nor are NP-complete. (There are remarkably fewnatural candidates for such problems, with someprominent examples being decision variants ofproblems such as integer factoring, lattice shortestvector, and finding Nash equilibria.) In the latter casethat P = NP the notion of NP-completeness loses itsmeaning, as essentially all functions in P (save for thetrivial constant zero and constant one functions) areNP-complete.
Figure 15.4: A rough illustration of the (conjectured)status of problems in exponential time. Darker colorscorrespond to higher running time, and the circle inthe middle is the problems in P. NP is a (conjecturedto be proper) superclass of P and the NP-completeproblems (or NPC for short) are the âhardestâ prob-lems in NP, in the sense that a solution for one ofthem implies a solution for all other problems in NP.It is conjectured that all the NP-complete problemsrequire at least exp(đđ) time to solve for a constantđ > 0, and many require exp(Ω(đ)) time. The per-manent is not believed to be contained in NP thoughit is NP-hard, which means that a polynomial-timealgorithm for it implies that P = NP.
cut, are all NP-complete. Many thousands of additional problemshave been shown to be NP-complete, arising from all the sciences,mathematics, economics, engineering and many other fields. (For afew examples, see this Wikipedia page and this website.)
Big Idea 22 If a single NP-complete has a polynomial-time algo-rithm, then there is such an algorithm for every decision problem thatcorresponds to the existence of an efficiently-verifiable solution.
15.2.1 What does this mean?As weâve seen in Solved Exercise 15.2, P â NP. The most famous con-jecture in Computer Science is that this containment is strict. That is,it is widely conjectured that P â NP. One way to refute the conjec-ture that P â NP is to give a polynomial-time algorithm for even asingle one of the NP-complete problems such as 3SAT, Max Cut, orthe thousands of others that have been studied in all fields of humanendeavors. The fact that these problems have been studied by so manypeople, and yet not a single polynomial-time algorithm for any ofthem has been found, supports that conjecture that indeed P â NP. Infact, for many of these problems (including all the ones we mentionedabove), we donât even know of a 2đ(đ)-time algorithm! However, to thefrustration of computer scientists, we have not yet been able to provethat P â NP or even rule out the existence of an đ(đ)-time algorithmfor 3SAT. Resolving whether or not P = NP is known as the P vs NPproblem. A million-dollar prize has been offered for the solution ofthis problem, a popular book has been written, and every year a newpaper comes out claiming a proof of P = NP or P â NP, only to witherunder scrutiny.
One of the mysteries of computation is that people have observed acertain empirical âzero-one lawâ or âdichotomyâ in the computationalcomplexity of natural problems, in the sense that many natural prob-lems are either in P (often in TIME(đ(đ)) or TIME(đ(đ2))), or theyare are NP hard. This is related to the fact that for most natural prob-lems, the best known algorithm is either exponential or polynomial,with not too many examples where the best running time is somestrange intermediate complexity such as 22âlogđ . However, it is be-lieved that there exist problems in NP that are neither in P nor are NP-complete, and in fact a result known as âLadnerâs Theoremâ showsthat if P â NP then this is indeed the case (see also Exercise 15.1 andFig. 15.3).
486 introduction to theoretical computer science
15.2.2 The Cook-Levin Theorem: Proof outlineWe will now prove the Cook-Levin Theorem, which is the underpin-ning to a great web of reductions from 3SAT to thousands of problemsacross many great fields. Some problems that have been shown to beNP-complete include: minimum-energy protein folding, minimumsurface-area foam configuration, map coloring, optimal Nash equi-librium, quantum state entanglement, minimum supersequence ofa genome, minimum codeword problem, shortest vector in a lattice,minimum genus knots, positive Diophantine equations, integer pro-gramming, and many many more. The worst-case complexity of allthese problems is (up to polynomial factors) equivalent to that of3SAT, and through the Cook-Levin Theorem, to all problems in NP.
To prove Theorem 15.6 we need to show that đč â€đ 3SAT for everyđč â NP. We will do so in three stages. We define two intermediateproblems: NANDSAT and 3NAND. We will shortly show the def-initions of these two problems, but Theorem 15.6 will follow fromcombining the following three results:
1. NANDSAT is NP hard (Lemma 15.8).
2. NANDSAT â€đ 3NAND (Lemma 15.9).
3. 3NAND â€đ 3SAT (Lemma 15.10).
By the transitivity of reductions, it will follow that for every đč âNP, đč â€đ NANDSAT â€đ 3NAND â€đ 3SAT (15.3)
hence establishing Theorem 15.6.We will prove these three results Lemma 15.8, Lemma 15.9 and
Lemma 15.10 one by one, providing the requisite definitions as we goalong.
15.3 THE NANDSAT PROBLEM, AND WHY IT IS NP HARD
The function NANDSAT ⶠ0, 1â â 0, 1 is defined as follows:
âą The input to NANDSAT is a string đ representing a NAND-CIRCprogram (or equivalently, a circuit with NAND gates).
âą The output of NANDSAT on input đ is 1 if and only if there exists astring đ€ â 0, 1đ (where đ is the number of inputs to đ) such thatđ(đ€) = 1.
Solved Exercise 15.5 â đđŽđđ·đđŽđ â NP. Prove that NANDSAT â NP.
np, np completeness, and the cook-levin theorem 487
Solution:
We have seen that the circuit (or straightline program) evalua-tion problem can be computed in polynomial time. Specifically,given a NAND-CIRC program đ of đ lines and đ inputs, andđ€ â 0, 1đ, we can evaluate đ on the input đ€ in time which ispolynomial in đ and hence verify whether or not đ(đ€) = 1.
We now prove that NANDSAT is NP hard.Lemma 15.8 NANDSAT is NP hard.
Proof Idea:
The proof closely follows the proof that P â P/poly (Theorem 13.12, see also Section 13.6.2). Specifically, if đč â NP then there is a poly-nomial time Turing machine đ and positive integer đ such that forevery đ„ â 0, 1đ, đč(đ„) = 1 iff there is some đ€ â 0, 1đđ such thatđ(đ„đ€) = 1. The proof that P â P/poly gave us way (via âunrolling theloopâ) to come up in polynomial time with a Boolean circuit đ¶ on đđinputs that computes the function đ€ ⊠đ(đ„đ€). We can then translateđ¶ into an equivalent NAND circuit (or NAND-CIRC program) đ. Wesee that there is a string đ€ â 0, 1đđ such that đ(đ€) = 1 if and only ifthere is such đ€ satisfying đ(đ„đ€) = 1 which (by definition) happens ifand only if đč(đ„) = 1. Hence the translation of đ„ into the circuit đ is areduction showing đč â€đ NANDSAT.â
PThe proof is a little bit technical but ultimately followsquite directly from the definition of NP, as well as theability to âunroll the loopâ of NAND-TM programs asdiscussed in Section 13.6.2. If you find it confusing, tryto pause here and think how you would implementin your favorite programming language the functionunroll which on input a NAND-TM program đand numbers đ , đ outputs an đ-input NAND-CIRCprogram đ of đ(|đ |) lines such that for every inputđ§ â 0, 1đ, if đ halts on đ§ within at most đ steps andoutputs đŠ, then đ(đ§) = đŠ.
Proof of Lemma 15.8. Let đč â NP. To prove Lemma 15.8 we need togive a polynomial-time computable function that will map every đ„â â0, 1â to a NAND-CIRC program đ such that đč(đ„) = NANDSAT(đ).
Let đ„â â 0, 1â be such a string and let đ = |đ„â| be its length. ByDefinition 15.1 there exists đ â P and positive đâ such that đč(đ„â) = 1if and only if there exists đ€ â 0, 1đđ satisfying đ (đ„âđ€) = 1.
488 introduction to theoretical computer science
Let đ = đđ. Since đ â P there is some NAND-TM program đ â thatcomputes đ on inputs of the form đ„đ€ with đ„ â 0, 1đ and đ€ â 0, 1đin at most (đ + đ)đ time for some constant đ. Using our âunrollingthe loop NAND-TM to NAND compilerâ of Theorem 13.14, we canobtain a NAND-CIRC program đâČ that has đ + đ inputs and at mostđ((đ + đ)2đ) lines such that đâČ(đ„đ€) = đ â(đ„đ€) for every đ„ â 0, 1đand đ€ â 0, 1đ.
We can then use a simple âhardwiringâ technique, reminiscent ofRemark 9.11 to map đâČ into a circuit/NAND-CIRC program đ on đinputs such that đ(đ€) = đâČ(đ„âđ€) for every đ€ â 0, 1đ.
CLAIM: There is a polynomial-time algorithm that on input aNAND-CIRC program đâČ on đ + đ inputs and đ„â â 0, 1đ, outputsa NAND-CIRC program đ such that for every đ€ â 0, 1đ, đ(đ€) =đâČ(đ„âđ€).
PROOF OF CLAIM: We can do so by adding a few lines to ensurethat the variables zero and one are 0 and 1 respectively, and thensimply replacing any reference in đâČ to an input đ„đ with đ â [đ] thecorresponding value based on đ„âđ . See Fig. 15.5 for an implementationof this reduction in Python.
Our final reduction maps an input đ„â, into the NAND-CIRC pro-gram đ obtained above. By the above discussion, this reduction runsin polynomial time. Since we know that đč(đ„â) = 1 if and only if thereexists đ€ â 0, 1đ such that đ â(đ„âđ€) = 1, this means that đč(đ„â) = 1 ifand only if NANDSAT(đ) = 1, which is what we wanted to prove.
Figure 15.5: Given an đ -line NAND-CIRC programđ that has đ + đ inputs and some đ„â â 0, 1đ,we can transform đ into a đ + 3 line NAND-CIRCprogram đâČ that computes the map đ€ ⊠đ(đ„âđ€)for đ€ â 0, 1đ by simply adding code to computethe zero and one constants, replacing all references toX[đ] with either zero or one depending on the valueof đ„âđ , and then replacing the remaining referencesto X[đ] with X[đ â đ]. Above is Python code thatimplements this transformation, as well as an exampleof its execution on a simple program.
15.4 THE 3NAND PROBLEM
The 3NAND problem is defined as follows:
âą The input is a logical formula Κ on a set of variables đ§0, ⊠, đ§đâ1which is an AND of constraints of the form đ§đ = NAND(đ§đ, đ§đ).
np, np completeness, and the cook-levin theorem 489
âą The output is 1 if and only if there is an input đ§ â 0, 1đ thatsatisfies all of the constraints.
For example, the following is a 3NAND formula with 5 variablesand 3 constraints:
Κ = (đ§3 = NAND(đ§0, đ§2))â§(đ§1 = NAND(đ§0, đ§2))â§(đ§4 = NAND(đ§3, đ§1)) .(15.4)
In this case 3NAND(Κ) = 1 since the assignment đ§ = 01010 satisfiesit. Given a 3NAND formula Κ on đ variables and an assignment đ§ â0, 1đ, we can check in polynomial time whether Κ(đ§) = 1, and hence3NAND â NP. We now prove that 3NAND is NP hard:Lemma 15.9 NANDSAT â€đ 3NAND.
Proof Idea:
To prove Lemma 15.9 we need to give a polynomial-time map fromevery NAND-CIRC program đ to a 3NAND formula Κ such that thereexists đ€ such that đ(đ€) = 1 if and only if there exists đ§ satisfying Κ.For every line đ of đ, we define a corresponding variable đ§đ of Κ. Ifthe line đ has the form foo = NAND(bar,blah) then we will add theclause đ§đ = NAND(đ§đ, đ§đ) where đ and đ are the last lines in which barand blah were written to. We will also set variables correspondingto the input variables, as well as add a clause to ensure that the finaloutput is 1. The resulting reduction can be implemented in about adozen lines of Python, see Fig. 15.6.â
Figure 15.6: Python code to reduce an instance đ ofNANDSAT to an instance Κ of 3NAND. In the exam-ple above we transform the NAND-CIRC programxor5 which has 5 input variables and 16 lines, intoa 3NAND formula Κ that has 24 variables and 20clauses. Since xor5 outputs 1 on the input 1, 0, 0, 1, 1,there exists an assignment đ§ â 0, 124 to Κ such that(đ§0, đ§1, đ§2, đ§3, đ§4) = (1, 0, 0, 1, 1) and Κ evaluates totrue on đ§.
490 introduction to theoretical computer science
Proof of Lemma 15.9. To prove Lemma 15.9 we need to give a reductionfrom NANDSAT to 3NAND. Let đ be a NAND-CIRC program withđ inputs, one output, and đ lines. We can assume without loss ofgenerality that đ contains the variables one and zero as usual.
We map đ to a 3NAND formula Κ as follows:
⹠Κ has đ + đ variables đ§0, ⊠, đ§đ+đâ1.âą The first đ variables đ§0, ⊠, đ§đâ1 will corresponds to the inputs of đ.
The next đ variables đ§đ, ⊠, đ§đ+đâ1 will correspond to the đ linesof đ.
âą For every â â đ, đ + 1, ⊠, đ + đ, if the â â đ-th line of the programđ is foo = NAND(bar,blah) then we add to Κ the constraint đ§â =NAND(đ§đ, đ§đ) where đ â đ and đ â đ correspond to the last linesin which the variables bar and blah (respectively) were written to.If one or both of bar and blah was not written to before then weuse đ§â0 instead of the corresponding value đ§đ or đ§đ in the constraint,where â0 â đ is the line in which zero is assigned a value. If one orboth of bar and blah is an input variable X[i] then we use đ§đ in theconstraint.
âą Let ââ be the last line in which the output y_0 is assigned a value.Then we add the constraint đ§ââ = NAND(đ§â0 , đ§â0) where â0 â đ is asabove the last line in which zero is assigned a value. Note that thisis effectively the constraint đ§ââ = NAND(0, 0) = 1.To complete the proof we need to show that there exists đ€ â 0, 1đ
s.t. đ(đ€) = 1 if and only if there exists đ§ â 0, 1đ+đ that satisfies allconstraints in Κ. We now show both sides of this equivalence.
Part I: Completeness. Suppose that there is đ€ â 0, 1đ s.t. đ(đ€) =1. Let đ§ â 0, 1đ+đ be defined as follows: for đ â [đ], đ§đ = đ€đ andfor đ â đ, đ + 1, ⊠, đ + đ đ§đ equals the value that is assigned inthe (đ â đ)-th line of đ when executed on đ€. Then by constructionđ§ satisfies all of the constraints of Κ (including the constraint thatđ§ââ = NAND(0, 0) = 1 since đ(đ€) = 1.)
Part II: Soundness. Suppose that there exists đ§ â 0, 1đ+đ satisfy-ing Κ. Soundness will follow by showing that đ(đ§0, ⊠, đ§đâ1) = 1 (andhence in particular there exists đ€ â 0, 1đ, namely đ€ = đ§0 ⯠đ§đâ1,such that đ(đ€) = 1). To do this we will prove the following claim(â): for every â â [đ], đ§â+đ equals the value assigned in the â-th stepof the execution of the program đ on đ§0, ⊠, đ§đâ1. Note that because đ§satisfies the constraints of Κ, (â) is sufficient to prove the soundnesscondition since these constraints imply that the last value assigned tothe variable y_0 in the execution of đ on đ§0 ⯠đ€đâ1 is equal to 1. Toprove (â) suppose, towards a contradiction, that it is false, and let â be
np, np completeness, and the cook-levin theorem 491
Figure 15.7: A 3NAND instance that is obtained bytaking a NAND-TM program for computing theAND function, unrolling it to obtain a NANDSATinstance, and then composing it with the reduction ofLemma 15.9.
the smallest number such that đ§â+đ is not equal to the value assignedin the â-th step of the execution of đ on đ§0, ⊠, đ§đâ1. But since đ§ sat-isfies the constraints of Κ, we get that đ§â+đ = NAND(đ§đ, đ§đ) where(by the assumption above that â is smallest with this property) thesevalues do correspond to the values last assigned to the variables on therighthand side of the assignment operator in the â-th line of the pro-gram. But this means that the value assigned in the â-th step is indeedsimply the NAND of đ§đ and đ§đ, contradicting our assumption on thechoice of â.
15.5 FROM 3NAND TO 3SATThe final step in the proof of Theorem 15.6 is the following:Lemma 15.10 3NAND â€đ 3SAT.Proof Idea:
To prove Lemma 15.10 we need to map a 3NAND formula đ intoa 3SAT formula đ such that đ is satisfiable if and only if đ is. Theidea is that we can transform every NAND constraint of the formđ = NAND(đ, đ) into the AND of ORs involving the variables đ, đ, đand their negations, where each of the ORs contains at most threeterms. The construction is fairly straightforward, and the details aregiven below.â
PIt is a good exercise for you to try to find a 3CNF for-mula đ on three variables đ, đ, đ such that đ(đ, đ, đ) istrue if and only if đ = NAND(đ, đ). Once you do so, tryto see why this implies a reduction from 3NAND to3SAT, and hence completes the proof of Lemma 15.10
Figure 15.8: Code and example output for the reduc-tion given in Lemma 15.10 of 3NAND to 3SAT.
492 introduction to theoretical computer science
1 The resulting formula will have some of the ORâsinvolving only two variables. If we wanted to insist oneach formula involving three distinct variables we canalways add a âdummy variableâ đ§đ+đ and include itin all the ORâs involving only two variables, and add aconstraint requiring this dummy variable to be zero.
Proof of Lemma 15.10. The constraintđ§đ = NAND(đ§đ, đ§đ) (15.5)is satisfied if đ§đ = 1 whenever (đ§đ, đ§đ) â (1, 1). By going through allcases, we can verify that (15.5) is equivalent to the constraint(đ§đ âš đ§đ âš đ§đ) ⧠(đ§đ âš đ§đ) ⧠(đ§đ âš đ§đ) . (15.6)
Indeed if đ§đ = đ§đ = 1 then the first constraint of Eq. (15.6) is onlytrue if đ§đ = 0. On the other hand, if either of đ§đ or đ§đ equals 0 then un-less đ§đ = 1 either the second or third constraints will fail. This meansthat, given any 3NAND formula đ over đ variables đ§0, ⊠, đ§đâ1, we canobtain a 3SAT formula đ over the same variables by replacing every3NAND constraint of đ with three 3OR constraints as in Eq. (15.6).1Because of the equivalence of (15.5) and (15.6), the formula đ sat-isfies that đ(đ§0, ⊠, đ§đâ1) = đ(đ§0, ⊠, đ§đâ1) for every assignmentđ§0, ⊠, đ§đâ1 â 0, 1đ to the variables. In particular đ is satisfiable ifand only if đ is, thus completing the proof.
Figure 15.9: An instance of the independent set problemobtained by applying the reductions NANDSAT â€đ3NAND â€đ 3SAT â€đ ISAT starting with the xor5NAND-CIRC program.
15.6 WRAPPING UP
We have shown that for every function đč in NP, đč â€đ NANDSAT â€đ3NAND â€đ 3SAT, and so 3SAT is NP-hard. Since in Chapter 14 wesaw that 3SAT â€đ QUADEQ, 3SAT â€đ ISET, 3SAT â€đ MAXCUTand 3SAT â€đ LONGPATH, all these problems are NP-hard as well.Finally, since all the aforementioned problems are in NP, they areall in fact NP-complete and have equivalent complexity. There arethousands of other natural problems that are NP-complete as well.Finding a polynomial-time algorithm for any one of them will imply apolynomial-time algorithm for all of them.
np, np completeness, and the cook-levin theorem 493
Figure 15.10: We believe that P â NP and all NPcomplete problems lie outside of P, but we cannotrule out the possiblity that P = NP. However, wecan rule out the possiblity that some NP-completeproblems are in P and other do not, since we knowthat if even one NP-complete problem is in P thenP = NP. The relation between P/poly and NP isnot known though it can be shown that if one NP-complete problem is in P/poly then NP â P/poly.
2 Hint: Use the function đč that on input a formula đand a string of the form 1đĄ, outputs 1 if and only if đis satisfiable and đĄ = |đ|log |đ|.3 Hint: Prove and then use the fact that P is closedunder complement.
Chapter Recap
âą Many of the problems for which we donât knowpolynomial-time algorithms are NP-complete,which means that finding a polynomial-time algo-rithm for one of them would imply a polynomial-time algorithm for all of them.
âą It is conjectured that NP â P which means thatwe believe that polynomial-time algorithms forthese problems are not merely unknown but arenonexistent.
âą While an NP-hardness result means for examplethat a full-fledged âtextbookâ solution to a problemsuch as MAX-CUT that is as clean and general asthe algorithm for MIN-CUT probably does notexist, it does not mean that we need to give upwhenever we see a MAX-CUT instance. Later inthis course we will discuss several strategies to dealwith NP-hardness, including average-case complexityand approximation algorithms.
15.7 EXERCISES
Exercise 15.1 â Poor manâs Ladnerâs Theorem. Prove that if there is nođđ(log2 đ) time algorithm for 3SAT then there is some đč â NP suchthat đč â P and đč is not NP complete.2
Exercise 15.2 â NP â co â NP â NP â P. Let 3SAT be the functionthat on input a 3CNF formula đ return 1 â 3SAT(đ). Prove that if3SAT â NP then P â NP. See footnote for hint.3
Exercise 15.3 Define WSAT to be the following function: the input is aCNF formula đ where each clause is the OR of one to three variables(without negations), and a number đ â â. For example, the followingformula can be used for a valid input to WSAT: đ = (đ„5 âš đ„2 âš đ„1) â§(đ„1 âš đ„3 âš đ„0) ⧠(đ„2 âš đ„4 âš đ„0). The output WSAT(đ, đ) = 1 if andonly if there exists a satisfying assignment to đ in which exactly đof the variables get the value 1. For example for the formula above
494 introduction to theoretical computer science
WSAT(đ, 2) = 1 since the assignment (1, 1, 0, 0, 0, 0) satisfies all theclauses. However WSAT(đ, 1) = 0 since there is no single variableappearing in all clauses.
Prove that WSAT is NP-complete.
Exercise 15.4 In the employee recruiting problem we are given a list ofpotential employees, each of which has some subset of đ potentialskills, and a number đ. We need to assemble a team of đ employeessuch that for every skill there would be one member of the team withthis skill.
For example, if Alice has the skills âC programmingâ, âNANDprogrammingâ and âSolving Differential Equationsâ, Bob has theskills âC programmingâ and âSolving Differential Equationsâ, andCharlie has the skills âNAND programmingâ and âCoffee Brewingâ,then if we want a team of two people that covers all the four skills, wewould hire Alice and Charlie.
Define the function EMP s.t. on input the skills đż of all potentialemployees (in the form of a sequence đż of đ lists đż1, ⊠, đżđ, eachcontaining distinct numbers between 0 and đ), and a number đ,EMP(đż, đ) = 1 if and only if there is a subset đ of đ potential em-ployees such that for every skill đ in [đ], there is an employee in đ thathas the skill đ.
Prove that EMP is NP complete.
Exercise 15.5 â Balanced max cut. Prove that the âbalanced variantâ ofthe maximum cut problem is NP-complete, where this is defined asBMC ⶠ0, 1â â 0, 1 where for every graph đș = (đ , đž) and đ â â,BMC(đș, đ) = 1 if and only if there exists a cut đ in đș cutting at least đedges such that |đ| = |đ |/2.
Exercise 15.6 â Regular expression intersection. Let MANYREGS be the fol-lowing function: On input a list of regular expressions đđ„đ0, ⊠, expđ(represented as strings in some standard way), output 1 if and only ifthere is a single string đ„ â 0, 1â that matches all of them. Prove thatMANYREGS is NP-hard.
15.8 BIBLIOGRAPHICAL NOTES
Aaronsonâs 120 page survey [Aar16] is a beautiful and extensive ex-position to the P vs NP problem, its importance and status. See alsoas well as Chapter 3 in Wigdersonâs excellent book [Wig19]. Johnson[Joh12] gives a survey of the historical development of the theory ofNP completeness. The following web page keeps a catalog of failed
np, np completeness, and the cook-levin theorem 495
attempts at settling P vs NP. At the time of this writing, it lists about110 papers claiming to resolve the question, of which about 60 claim toprove that P = NP and about 50 claim to prove that P â NP.
Eugene Lawlerâs quote on the âmystical power of twonessâ wastaken from the wonderful book âThe Nature of Computationâ byMoore and Mertens. See also this memorial essay on Lawler byLenstra.
1 Paul ErdĆs (1913-1996) was one of the most prolificmathematicians of all times. Though he was an athe-ist, ErdĆs often referred to âThe Bookâ in which Godkeeps the most elegant proof of each mathematicaltheorem.
2 The đ-th Ramsey number, denoted as đ (đ, đ), is thesmallest number đ such that for every graph đș on đvertices, both đș and its complement contain a đ-sizedindependent set. If P = NP then we can computeđ (đ, đ) in time polynomial in 2đ, while otherwise itcan potentially take closer to 222đ steps.
16What if P equals NP?
âYou donât have to believe in God, but you should believe in The Book.â, PaulErdĆs, 1985.1
âNo more half measures, Walterâ, Mike Ehrmantraut in âBreaking Badâ,2010.
âThe evidence in favor of [P â NP] and [ its algebraic counterpart ] is sooverwhelming, and the consequences of their failure are so grotesque, that theirstatus may perhaps be compared to that of physical laws rather than that ofordinary mathematical conjectures.â, Volker Strassen, laudation for LeslieValiant, 1986.
âSuppose aliens invade the earth and threaten to obliterate it in a yearâs timeunless human beings can find the [fifth Ramsey number]. We could marshalthe worldâs best minds and fastest computers, and within a year we could prob-ably calculate the value. If the aliens demanded the [sixth Ramsey number],however, we would have no choice but to launch a preemptive attack.â, PaulErdĆs, as quoted by Graham and Spencer, 1990.2
We have mentioned that the question of whether P = NP, whichis equivalent to whether there is a polynomial-time algorithm for3SAT, is the great open question of Computer Science. But why is it soimportant? In this chapter, we will try to figure out the implications ofsuch an algorithm.
First, let us get one qualm out of the way. Sometimes people say,âWhat if P = NP but the best algorithm for 3SAT takes đ1000 time?â Well,đ1000 is much larger than, say, 20.001âđ for any input smaller than 250,as large as a harddrive as you will encounter, and so another way tophrase this question is to say âwhat if the complexity of 3SAT is ex-ponential for all inputs that we will ever encounter, but then growsmuch smaller than that?â To me this sounds like the computer scienceequivalent of asking, âwhat if the laws of physics change completelyonce they are out of the range of our telescopes?â. Sure, this is a validpossibility, but wondering about it does not sound like the most pro-ductive use of our time.
Compiled on 8.26.2020 18:10
Learning Objectives:âą Explore the consequences of P = NPâą Search-to-decision reduction: transform
algorithms that solve decision version tosearch version for NP-complete problems.
âą Optimization and learning problemsâą Quantifier elimination and solving problems
in the polynomial hierarchy.âą What is the evidence for P = NP vs P â NP?
498 introduction to theoretical computer science
So, as the saying goes, weâll keep an open mind, but not so openthat our brains fall out, and assume from now on that:
âą There is a mathematical god,
and
âą She does not âbeat around the bushâ â or take âhalf measuresâ.
What we mean by this is that we will consider two extreme scenar-ios:
âą 3SAT is very easy: 3SAT has an đ(đ) or đ(đ2) time algorithm witha not too huge constant (say smaller than 106.)
âą 3SAT is very hard: 3SAT is exponentially hard and cannot besolved faster than 2đđ for some not too tiny đ > 0 (say at least10â6). We can even make the stronger assumption that for everysufficiently large đ, the restriction of 3SAT to inputs of length đcannot be computed by a circuit of fewer than 2đđ gates.
At the time of writing, the fastest known algorithm for 3SAT re-quires more than 20.35đ to solve đ variable formulas, while we do noteven know how to rule out the possibility that we can compute 3SATusing 10đ gates. To put it in perspective, for the case đ = 1000 ourlower and upper bounds for the computational costs are apart bya factor of about 10100. As far as we know, it could be the case that1000-variable 3SAT can be solved in a millisecond on a first-generationiPhone, and it can also be the case that such instances require morethan the age of the universe to solve on the worldâs fastest supercom-puter.
So far, most of our evidence points to the latter possibility of 3SATbeing exponentially hard, but we have not ruled out the former possi-bility either. In this chapter we will explore some of the consequencesof the â3SAT easyâ scenario.
16.1 SEARCH-TO-DECISION REDUCTION
A priori, having a fast algorithm for 3SAT might not seem so impres-sive. Sure, such an algorithm allows us to decide the satisfiability ofnot just 3CNF formulas but also of quadratic equations, as well as findout whether there is a long path in a graph, and solve many other de-cision problems. But this is not typically what we want to do. Itâs notenough to know if a formula is satisfiable: we want to discover theactual satisfying assignment. Similarly, itâs not enough to find out if agraph has a long path: we want to actually find the path.
It turns out that if we can solve these decision problems, we cansolve the corresponding search problems as well:
what if p equals np? 499
Theorem 16.1 â Search vs Decision. Suppose that P = NP. Thenfor every polynomial-time algorithm đ and đ, đ â â,there is apolynomial-time algorithm FINDđ such that for every đ„ â 0, 1đ,if there exists đŠ â 0, 1đđđ satisfying đ (đ„đŠ) = 1, then FINDđ (đ„)finds some string đŠâČ satisfying this condition.
PTo understand what the statement of Theo-rem 16.1 means, let us look at the special case ofthe MAXCUT problem. It is not hard to see that thereis a polynomial-time algorithm VERIFYCUT such thatVERIFYCUT(đș, đ, đ) = 1 if and only if đ is a subsetof đșâs vertices that cuts at least đ edges. Theorem 16.1implies that if P = NP then there is a polynomial-timealgorithm FINDCUT that on input đș, đ outputs a setđ such that VERIFYCUT(đș, đ, đ) = 1 if such a setexists. This means that if P = NP, by trying all valuesof đ we can find in polynomial time a maximum cutin any given graph. We can use a similar argument toshow that if P = NP then we can find a satisfying as-signment for every satisfiable 3CNF formula, find thelongest path in a graph, solve integer programming,and so and so forth.
Proof Idea:
The idea behind the proof of Theorem 16.1 is simple; let usdemonstrate it for the special case of 3SAT. (In fact, this case is notso âspecialââ since 3SAT is NP-complete, we can reduce the task ofsolving the search problem for MAXCUT or any other problem inNP to the task of solving it for 3SAT.) Suppose that P = NP and weare given a satisfiable 3CNF formula đ, and we now want to find asatisfying assignment đŠ for đ. Define 3SAT0(đ) to output 1 if there isa satisfying assignment đŠ for đ such that its first bit is 0, and similarlydefine 3SAT1(đ) = 1 if there is a satisfying assignment đŠ with đŠ0 = 1.The key observation is that both 3SAT0 and 3SAT1 are in NP, and so ifP = NP then we can compute them in polynomial time as well. Thuswe can use this to find the first bit of the satisfying assignment. Wecan continue in this way to recover all the bits.âProof of Theorem 16.1. Let đ be some polynomial time algorithm andđ, đ â â some constants. Define the function STARTSWITHđ asfollows: For every đ„ â 0, 1â and đ§ â 0, 1â, STARTSWITHđ (đ„, đ§) =1 if and only if there exists some đŠ â 0, 1đđđâ|đ§| (where đ = |đ„|) suchthat đ (đ„đ§đŠ) = 1. That is, STARTSWITHđ (đ„, đ§) outputs 1 if there issome string đ€ of length đ|đ„|đ such that đ (đ„, đ€) = 1 and the first |đ§|
500 introduction to theoretical computer science
bits of đ€ are đ§0, ⊠, đ§ââ1. Since, given đ„, đŠ, đ§ as above, we can check inpolynomial time if đ (đ„đ§đŠ) = 1, the function STARTSWITHđ is in NPand hence if P = NP we can compute it in polynomial time.
Now for every such polynomial-time đ and đ, đ â â, we can imple-ment FINDđ (đ„) as follows:
Algorithm 16.2 â đčđŒđđ·đ : Search to decision reduction.
Input: đ„ â 0, 1đOutput: đ§ â 0, 1đđđ s.t. đ (đ„đ§) = 1, if such đ§ exists. Other-
wise output the empty string.1: Initially đ§0 = đ§1 = ⯠= đ§đđđâ1 = 0.2: for â = 0, ⊠, đđđ â 1 do3: Let đ0 â đđ đŽđ đ đđđŒđ đ»đ (đ„đ§0 ⯠đ§ââ10).4: Let đ1 â đđ đŽđ đ đđđŒđ đ»đ (đ„đ§0 ⯠đ§ââ11).5: if đ0 = đ1 = 0 then6: return ââ7: # Canât extend đ„đ§0 ⊠đ§ââ1 to input đ accepts8: end if9: if đ0 = 1 then
10: đ§â â 011: # Can extend đ„đ§0 ⊠đ„ââ1 with 0 to accepting input12: else13: đ§â â 114: # Can extend đ„đ§0 ⊠đ„ââ1 with 1 to accepting input15: end if16: end for17: return đ§0, ⊠, đ§đđđâ1
To analyze Algorithm 16.2, note that it makes 2đđđ invocations toSTARTSWITHđ and hence if the latter is polynomial-time, then so isAlgorithm 16.2 Now suppose that đ„ is such that there exists some đŠsatisfying đ (đ„đŠ) = 1. We claim that at every step â = 0, ⊠, đđđ â 1, wemaintain the invariant that there exists đŠ â 0, 1đđđ whose first â bitsare đ§ s.t. đ (đ„đŠ) = 1. Note that this claim implies the theorem, since inparticular it means that for â = đđđ â 1, đ§ satisfies đ (đ„đ§) = 1.
We prove the claim by induction. For â = 0, this holds vacuously.Now for every â > 0, if the call STARTSWITHđ (đ„đ§0 ⯠đ§ââ10)returns 1, then we are guaranteed the invariant by definition ofSTARTSWITHđ . Now under our inductive hypothesis, there isđŠâ, ⊠, đŠđđđâ1 such that đ(đ„đ§0, ⊠, đ§ââ1đŠâ, ⊠, đŠđđđâ1) = 1. If the call toSTARTSWITHđ (đ„đ§0 ⯠đ§ââ10) returns 0 then it must be the case thatđŠâ = 1, and hence when we set đ§â = 1 we maintain the invariant.
what if p equals np? 501
16.2 OPTIMIZATION
Theorem 16.1 allows us to find solutions for NP problems if P = NP,but it is not immediately clear that we can find the optimal solution.For example, suppose that P = NP, and you are given a graph đș. Canyou find the longest simple path in đș in polynomial time?
PThis is actually an excellent question for you to at-tempt on your own. That is, assuming P = NP, givea polynomial-time algorithm that on input a graph đș,outputs a maximally long simple path in the graph đș.
The answer is Yes. The idea is simple: if P = NP then we can findout in polynomial time if an đ-vertex graph đș contains a simple pathof length đ, and moreover, by Theorem 16.1, if đș does contain such apath, then we can find it. (Can you see why?) If đș does not contain asimple path of length đ, then we will check if it contains a simple pathof length đ â 1, and continue in this way to find the largest đ such thatđș contains a simple path of length đ.
The above reasoning was not specifically tailored to finding pathsin graphs. In fact, it can be vastly generalized to proving the followingresult:
Theorem 16.3 â Optimization from P = NP. Suppose that P = NP. Thenfor every polynomial-time computable function đ ⶠ0, 1â â â(identifying đ(đ„) with natural numbers via the binary representa-tion) there is a polynomial-time algorithm OPT such that on inputđ„ â 0, 1â,
OPT(đ„, 1đ) = maxđŠâ0,1đ đ(đ„, đŠ) . (16.1)
Moreover under the same assumption, there is a polynomial-time algorithm FINDOPT such that for every đ„ â 0, 1â, FINDOPT(đ„, 1đ)outputs đŠâ â 0, 1â such that đ(đ„, đŠâ) = maxđŠâ0,1đ đ(đ„, đŠ).
PThe statement of Theorem 16.3 is a bit cumbersome.To understand it, think how it would subsume theexample above of a polynomial time algorithm forfinding the maximum length path in a graph. Inthis case the function đ would be the map that oninput a pair đ„, đŠ outputs 0 if the pair (đ„, đŠ) does notrepresent some graph and a simple path inside thegraph respectively; otherwise đ(đ„, đŠ) would equalthe length of the path đŠ in the graph đ„. Since a pathin an đ vertex graph can be represented by at most
502 introduction to theoretical computer science
đ logđ bits, for every đ„ representing a graph of đ ver-tices, finding maxđŠâ0,1đ logđ đ(đ„, đŠ) corresponds tofinding the length of the maximum simple path in thegraph corresponding to đ„, and finding the string đŠâthat achieves this maximum corresponds to actuallyfinding the path.
Proof Idea:
The proof follows by generalizing our ideas from the longest pathexample above. Let đ be as in the theorem statement. If P = NP thenfor every for every string đ„ â 0, 1â and number đ, we can test inin đđđđŠ(|đ„|, đ) time whether there exists đŠ such that đ(đ„, đŠ) â„ đ, orin other words test whether maxđŠâ0,1đ đ(đ„, đŠ) â„ đ. If đ(đ„, đŠ) is aninteger between 0 and đđđđŠ(|đ„| + |đŠ|) (as is the case in the example oflongest path) then we can just try out all possibilities for đ to find themaximum number đ for which maxđŠ đ(đ„, đŠ) â„ đ. Otherwise, we canuse binary search to hone down on the right value. Once we do so, wecan use search-to-decision to actually find the string đŠâ that achievesthe maximum.âProof of Theorem 16.3. For every đ as in the theorem statement, we candefine the Boolean function đč ⶠ0, 1â â 0, 1 as follows.
đč(đ„, 1đ, đ) = â§âšâ©1 âđŠâ0,1đđ(đ„, đŠ) â„ đ0 otherwise(16.2)
Since đ is computable in polynomial time, đč is in NP, and so underour assumption that P = NP, đč itself can be computed in polynomialtime. Now, for every đ„ and đ, we can compute the largest đ such thatđč(đ„, 1đ, đ) = 1 by a binary search. Specifically, we will do this asfollows:
1. We maintain two numbers đ, đ such that we are guaranteed thatđ †maxđŠâ0,1đ đ(đ„, đŠ) < đ.2. Initially we set đ = 0 and đ = 2đ (đ) where đ (đ) is the running time
of đ . (A function with đ (đ) running time canât output more thanđ (đ) bits and so canât output a number larger than 2đ (đ).)3. At each point in time, we compute the midpoint đ = â(đ + đ)/2â)
and let đŠ = đč(1đ, đ).a. If đŠ = 1 then we set đ = đ and leave đ as it is.b. If đŠ = 0 then we set đ = đ and leave đ as it is.
4. We then go back to step 3, until đ †đ + 1.
what if p equals np? 503
Since |đ â đ| shrinks by a factor of 2, within log2 2đ (đ) = đ (đ)steps, we will get to the point at which đ †đ + 1, and then we cansimply output đ. Once we find the maximum value of đ such thatđč(đ„, 1đ, đ) = 1, we can use the search to decision reduction of Theo-rem 16.1 to obtain the actual value đŠâ â 0, 1đ such that đ(đ„, đŠâ) = đ.
Example 16.4 â Integer programming. One application for Theo-rem 16.3 is in solving optimization problems. For example, the taskof linear programming is to find đŠ â âđ that maximizes some linearobjective âđâ1đ=0 đđđŠđ subject to the constraint that đŠ satisfies linearinequalities of the form âđâ1đ=0 đđđŠđ †đ. As we discussed in Sec-tion 12.1.3, there is a known polynomial-time algorithm for linearprogramming. However, if we want to place additional constraintson đŠ, such as requiring the coordinates of đŠ to be integer or 0/1valued then the best-known algorithms run in exponential time inthe worst case. However, if P = NP then Theorem 16.3 tells usthat we would be able to solve all problems of this form in poly-nomial time. For every string đ„ that describes a set of constraintsand objective, we will define a function đ such that if đŠ satisfiesthe constraints of đ„ then đ(đ„, đŠ) is the value of the objective, andotherwise we set đ(đ„, đŠ) = âđ where đ is some large number. Wecan then use Theorem 16.3 to compute the đŠ that maximizes đ(đ„, đŠ)and that will give us the assignment for the variables that satisfiesour constraints and maximizes the objective. (If the computationresults in đŠ such that đ(đ„, đŠ) = âđ then we can double đ and tryagain; if the true maximum objective is achieved by some stringđŠâ, then eventually đ will be large enough so that âđ would besmaller than the objective achieved by đŠâ, and hence when we runprocedure of Theorem 16.3 we would get a value larger than âđ .)
RRemark 16.5 â Need for binary search.. In many exam-ples, such as the case of finding longest path, we donâtneed to use the binary search step in Theorem 16.3,and can simply enumerate over all possible values forđ until we find the correct one. One example wherewe do need to use this binary search step is in the caseof the problem of finding a maximum length path ina weighted graph. This is the problem where đș is aweighted graph, and every edge of đș is given a weightwhich is a number between 0 and 2đ. Theorem 16.3shows that we can find the maximum-weight simplepath in đș (i.e., simple path maximizing the sum of
504 introduction to theoretical computer science
3 This is often known as Empirical Risk Minimization.
the weights of its edges) in time polynomial in thenumber of vertices and in đ.Beyond just this example there is a vast field of math-ematical optimization that studies problems of thesame form as in Theorem 16.3. In the context of opti-mization, đ„ typically denotes a set of constraints oversome variables (that can be Boolean, integer, or realvalued), đŠ encodes an assignment to these variables,and đ(đ„, đŠ) is the value of some objective function thatwe want to maximize. Given that we donât knowefficient algorithms for NP complete problems, re-searchers in optimization research study special casesof functions đ (such as linear programming andsemidefinite programming) where it is possible tooptimize the value efficiently. Optimization is widelyused in a great many scientific areas including: ma-chine learning, engineering, economics and operationsresearch.
16.2.1 Example: Supervised learningOne classical optimization task is supervised learning. In supervisedlearning we are given a list of examples đ„0, đ„1, ⊠, đ„đâ1 (where wecan think of each đ„đ as a string in 0, 1đ for some đ) and the la-bels for them đŠ0, ⊠, đŠđâ1 (which we will think of simply bits, i.e.,đŠđ â 0, 1). For example, we can think of the đ„đâs as images of ei-ther dogs or cats, for which đŠđ = 1 in the former case and đŠđ = 0in the latter case. Our goal is to come up with a hypothesis or predic-tor â ⶠ0, 1đ â 0, 1 such that if we are given a new example đ„that has an (unknown to us) label đŠ, then with high probability âwill predict the label. That is, with high probability it will hold thatâ(đ„) = đŠ. The idea in supervised learning is to use the Occamâs Ra-zor principle: the simplest hypothesis that explains the data is likelyto be correct. There are several ways to model this, but one popularapproach is to pick some fairly simple function đ» ⶠ0, 1đ+đ â 0, 1.We think of the first đ inputs as the parameters and the last đ inputsas the example data. (For example, we can think of the first đ inputsof đ» as specifying the weights and connections for some neural net-work that will then be applied on the latter đ inputs.) We can thenphrase the supervised learning problem as finding, given a set of la-beled examples đ = (đ„0, đŠ0), ⊠, (đ„đâ1, đŠđâ1), the set of parametersđ0, ⊠, đđâ1 â 0, 1 that minimizes the number of errors made by thepredictor đ„ ⊠đ»(đ, đ„).3
In other words, we can define for every set đ as above the functionđčđ ⶠ0, 1đ â [đ] such that đčđ(đ) = â(đ„,đŠ)âđ |đ»(đ, đ„) â đŠ|. Now,finding the value đ that minimizes đčđ(đ) is equivalent to solving thesupervised learning problem with respect to đ» . For every polynomial-
what if p equals np? 505
time computable đ» ⶠ0, 1đ+đ â 0, 1, the task of minimizingđčđ(đ) can be âmassagedâ to fit the form of Theorem 16.3 and hence ifP = NP, then we can solve the supervised learning problem in greatgenerality. In fact, this observation extends to essentially any learn-ing model, and allows for finding the optimal predictors given theminimum number of examples. (This is in contrast to many currentlearning algorithms, which often rely on having access to an extremelylarge number of examplesâ far beyond the minimum needed, andin particular far beyond the number of examples humans use for thesame tasks.)
16.2.2 Example: Breaking cryptosystemsWe will discuss cryptography later in this course, but it turns out thatif P = NP then almost every cryptosystem can be efficiently bro-ken. One approach is to treat finding an encryption key as an in-stance of a supervised learning problem. If there is an encryptionscheme that maps a âplaintextâ message đ and a key đ to a âcipher-textâ đ, then given examples of ciphertext/plaintext pairs of theform (đ0, đ0), ⊠, (đđâ1, đđâ1), our goal is to find the key đ such thatđž(đ, đđ) = đđ where đž is the encryption algorithm. While you mightthink getting such âlabeled examplesâ is unrealistic, it turns out (asmany amateur home-brew crypto designers learn the hard way) thatthis is actually quite common in real-life scenarios, and that it is alsopossible to relax the assumption to having more minimal prior infor-mation about the plaintext (e.g., that it is English text). We defer amore formal treatment to Chapter 21.
16.3 FINDING MATHEMATICAL PROOFS
In the context of Gödelâs Theorem, we discussed the notion of a proofsystem (see Section 11.1). Generally speaking, a proof system can bethought of as an algorithm đ ⶠ0, 1â â 0, 1 (known as the verifier)such that given a statement đ„ â 0, 1â and a candidate proof đ€ â 0, 1â,đ (đ„, đ€) = 1 if and only if đ€ encodes a valid proof for the statement đ„.Any type of proof system that is used in mathematics for geometry,number theory, analysis, etc., is an instance of this form. In fact, stan-dard mathematical proof systems have an even simpler form wherethe proof đ€ encodes a sequence of lines đ€0, ⊠, đ€đ (each of which isitself a binary string) such that each line đ€đ is either an axiom or fol-lows from some prior lines through an application of some inferencerule. For example, Peanoâs axioms encode a set of axioms and rulesfor the natural numbers, and one can use them to formalize proofsin number theory. Also, there are some even stronger axiomatic sys-tems, the most popular one being ZermeloâFraenkel with the Axiomof Choice or ZFC for short. Thus, although mathematicians typically
506 introduction to theoretical computer science
4 The undecidability of Entscheidungsproblem refersto the uncomputability of the function that maps astatement in first order logic to 1 if and only if thatstatement has a proof.
write their papers in natural language, proofs of number theoristscan typically be translated to ZFC or similar systems, and so in par-ticular the existence of an đ-page proof for a statement đ„ implies thatthere exists a string đ€ of length đđđđŠ(đ) (in fact often đ(đ) or đ(đ2))that encodes the proof in such a system. Moreover, because verify-ing a proof simply involves going over each line and checking that itdoes indeed follow from the prior lines, it is fairly easy to do that inđ(|đ€|) or đ(|đ€|2) (where as usual |đ€| denotes the length of the proofđ€). This means that for every reasonable proof system đ , the follow-ing function SHORTPROOFđ ⶠ0, 1â â 0, 1 is in NP, wherefor every input of the form đ„1đ, SHORTPROOFđ (đ„, 1đ) = 1 if andonly if there exists đ€ â 0, 1â with |đ€| †đ s.t. đ (đ„đ€) = 1. Thatis, SHORTPROOFđ (đ„, 1đ) = 1 if there is a proof (in the system đ )of length at most đ bits that đ„ is true. Thus, if P = NP, then despiteGödelâs Incompleteness Theorems, we can still automate mathematicsin the sense of finding proofs that are not too long for every statementthat has one. (Frankly speaking, if the shortest proof for some state-ment requires a terabyte, then human mathematicians wonât ever findthis proof either.) For this reason, Gödel himself felt that the questionof whether SHORTPROOFđ has a polynomial time algorithm is ofgreat interest. As Gödel wrote in a letter to John von Neumann in 1956(before the concept of NP or even âpolynomial timeâ was formallydefined):
One can obviously easily construct a Turing machine, which for everyformula đč in first order predicate logic and every natural number đ, al-lows one to decide if there is a proof of đč of length đ (length = numberof symbols). Let đ(đč , đ) be the number of steps the machine requiresfor this and let đ(đ) = maxđč đ(đč , đ). The question is how fast đ(đ)grows for an optimal machine. One can show that đ â„ đ â đ [for someconstant đ > 0]. If there really were a machine with đ(đ) ⌠đ â đ (oreven ⌠đ â đ2), this would have consequences of the greatest importance.Namely, it would obviously mean that in spite of the undecidabilityof the Entscheidungsproblem,4 the mental work of a mathematicianconcerning Yes-or-No questions could be completely replaced by a ma-chine. After all, one would simply have to choose the natural numberđ so large that when the machine does not deliver a result, it makes nosense to think more about the problem.
For many reasonable proof systems (including the one that Gödelreferred to), SHORTPROOFđ is in fact NP-complete, and so Gödel canbe thought of as the first person to formulate the P vs NP question.Unfortunately, the letter was only discovered in 1988.
what if p equals np? 507
5 Since NAND-CIRC programs are equivalent toBoolean circuits, the search problem corresponding to(16.6) known as the circuit minimization problem andis widely studied in Engineering. You can skip aheadto Section 16.4.1 to see a particularly compellingapplication of this.
16.4 QUANTIFIER ELIMINATION (ADVANCED)
If P = NP then we can solve all NP search and optimization problems inpolynomial time. But can we do more? It turns out that the answer isthat Yes we can!
An NP decision problem can be thought of as the task of deciding,given some string đ„ â 0, 1â the truth of a statement of the formâđŠâ0,1đ(|đ„|)đ (đ„đŠ) = 1 (16.3)
for some polynomial-time algorithm đ and polynomial đ ⶠâ â â.That is, we are trying to determine, given some string đ„, whetherthere exists a string đŠ such that đ„ and đŠ satisfy some polynomial-timecheckable condition đ . For example, in the independent set problem,the string đ„ represents a graph đș and a number đ, the string đŠ repre-sents some subset đ of đșâs vertices, and the condition that we check iswhether |đ| â„ đ and there is no edge đą, đŁ in đș such that both đą â đand đŁ â đ.
We can consider more general statements such as checking, given astring đ„ â 0, 1â, the truth of a statement of the formâđŠâ0,1đ0(|đ„|)âđ§â0,1đ1(|đ„|)đ (đ„đŠđ§) = 1 , (16.4)
which in words corresponds to checking, given some string đ„, whetherthere exists a string đŠ such that for every string đ§, the triple (đ„, đŠ, đ§) sat-isfy some polynomial-time checkable condition. We can also considermore levels of quantifiers such as checking the truth of the statementâđŠâ0,1đ0(|đ„|)âđ§â0,1đ1(|đ„|)âđ€â0,1đ2(|đ„|)đ (đ„đŠđ§đ€) = 1 (16.5)
and so on and so forth.For example, given an đ-input NAND-CIRC program đ , we might
want to find the smallest NAND-CIRC program đ âČ that computes thesame function as đ . The question of whether there is such a đ âČ thatcan be described by a string of at most đ bits can be phrased asâđ âČâ0,1đ âđ„â0,1đđ(đ„) = đ âČ(đ„) (16.6)which has the form (16.4).5 Another example of a statement involvingđ levels of quantifiers would be to check, given a chess position đ„,whether there is a strategy that guarantees that White wins within đsteps. For example is đ = 3 we would want to check if given the boardposition đ„, there exists a move đŠ for White such that for every move đ§ forBlack there exists a move đ€ for White that ends in a a checkmate.
It turns out that if P = NP then we can solve these kinds of prob-lems as well:
508 introduction to theoretical computer science
6 For the ease of notation, we assume that all thestrings we quantify over have the same length đ =đ(đ), but using simple padding one can show thatthis captures the general case of strings of differentpolynomial lengths.
Theorem 16.6 â Polynomial hierarchy collapse. If P = NP then for everyđ â â, polynomial đ ⶠâ â â and polynomial-time algorithmđ , there is a polynomial-time algorithm SOLVEđ ,đ that on inputđ„ â 0, 1đ returns 1 if and only ifâđŠ0â0,1đâđŠ1â0,1đ ⯠đŹđŠđâ1â0,1đđ (đ„đŠ0đŠ1 ⯠đŠđâ1) = 1 (16.7)
where đ = đ(đ) and đŹ is either â or â depending on whether đ isodd or even, respectively. 6
Proof Idea:
To understand the idea behind the proof, consider the special casewhere we want to decide, given đ„ â 0, 1đ, whether for every đŠ â0, 1đ there exists đ§ â 0, 1đ such that đ (đ„đŠđ§) = 1. Consider thefunction đč such that đč(đ„đŠ) = 1 if there exists đ§ â 0, 1đ such thatđ (đ„đŠđ§) = 1. Since đ runs in polynomial-time đč â NP and hence ifP = NP, then there is an algorithm đ âČ that on input đ„, đŠ outputs 1 ifand only if there exists đ§ â 0, 1đ such that đ (đ„đŠđ§) = 1. Now wecan see that the original statement we consider is true if and only if forevery đŠ â 0, 1đ, đ âČ(đ„đŠ) = 1, which means it is false if and only ifthe following condition (â) holds: there exists some đŠ â 0, 1đ suchthat đ âČ(đ„đŠ) = 0. But for every đ„ â 0, 1đ, the question of whetherthe condition (â) is itself in NP (as we assumed đ âČ can be computedin polynomial time) and hence under the assumption that P = NPwe can determine in polynomial time whether the condition (â), andhence our original statement, is true.âProof of Theorem 16.6. We prove the theorem by induction. We assumethat there is a polynomial-time algorithm SOLVEđ ,đâ1 that can solvethe problem (16.7) for đ â 1 and use that to solve the problem for đ.For đ = 1, SOLVEđ ,đâ1(đ„) = 1 iff đ (đ„) = 1 which is a polynomial-timecomputation since đ runs in polynomial time. For every đ„, đŠ0, definethe statement đđ„,đŠ0 to be the following:
đđ„,đŠ0 = âđŠ1â0,1đâđŠ2â0,1đ ⯠đŹđŠđâ1â0,1đđ (đ„đŠ0đŠ1 ⯠đŠđâ1) = 1 (16.8)
By the definition of SOLVEđ ,đ, for every đ„ â 0, 1đ, our goal isthat SOLVEđ ,đ(đ„) = 1 if and only if there exists đŠ0 â 0, 1đ such thatđđ„,đŠ0 is true.
The negation of đđ„,đŠ0 is the statement
đđ„,đŠ0 = âđŠ1â0,1đâđŠ2â0,1đ ⯠đŹđŠđâ1â0,1đđ (đ„đŠ0đŠ1 ⯠đŠđâ1) = 0 (16.9)
what if p equals np? 509
7 We do not know whether such loss is inherent.As far as we can tell, itâs possible that the quantifiedboolean formula problem has a linear-time algorithm.We will, however, see later in this course that itsatisfies a notion known as PSPACE-hardness that iseven stronger than NP-hardness.
where đŹ is â if đŹ was â and đŹ is â otherwise. (Please stop and verifythat you understand why this is true, this is a generalization of the factthat if Κ is some logical condition then the negation of âđŠâđ§Îš(đŠ, đ§) isâđŠâđ§ÂŹÎš(đŠ, đ§).)
The crucial observation is that đđ„,đŠ0 is exactly a statement of theform we consider with đ â 1 quantifiers instead of đ, and hence byour inductive hypothesis there is some polynomial time algorithmđ that on input đ„đŠ0 outputs 1 if and only if đđ„,đŠ0 is true. If we let đbe the algorithm that on input đ„, đŠ0 outputs 1 â đ(đ„đŠ0) then we seethat đ outputs 1 if and only if đđ„,đŠ0 is true. Hence we can rephrase theoriginal statement (16.7) as follows:âđŠ0â0,1đđ(đ„đŠ0) = 1 (16.10)
but since đ is a polynomial-time algorithm, Eq. (16.10) is clearly astatement in NP and hence under our assumption that P = NP there isa polynomial time algorithm that on input đ„ â 0, 1đ, will determineif (16.10) is true and so also if the original statement (16.7) is true.
The algorithm of Theorem 16.6 can also solve the search problemas well: find the value đŠ0 that certifies the truth of (16.7). We notethat while this algorithm is in polynomial time, the exponent of thispolynomial blows up quite fast. If the original NANDSAT algorithmrequired Ω(đ2) time, solving đ levels of quantifiers would require timeΩ(đ2đ).716.4.1 Application: self improving algorithm for 3SATSuppose that we found a polynomial-time algorithm đŽ for 3SAT thatis âgood but not greatâ. For example, maybe our algorithm runs intime đđ2 for some not too small constant đ. However, itâs possiblethat the best possible SAT algorithm is actually much more efficientthan that. Perhaps, as we guessed before, there is a circuit đ¶â of atmost 106đ gates that computes 3SAT on đ variables, and we simplyhavenât discovered it yet. We can use Theorem 16.6 to âbootstrapâ ouroriginal âgood but not greatâ 3SAT algorithm to discover the optimalone. The idea is that we can phrase the question of whether thereexists a size đ circuit that computes 3SAT for all length đ inputs asfollows: there exists a size †đ circuit đ¶ such that for every formula đdescribed by a string of length at most đ, if đ¶(đ) = 1 then there existsan assignment đ„ to the variables of đ that satisfies it. One can see thatthis is a statement of the form (16.5) and hence if P = NP we can solveit in polynomial time as well. We can therefore imagine investing hugecomputational resources in running đŽ one time to discover the circuitđ¶â and then using đ¶â for all further computation.
510 introduction to theoretical computer science
16.5 APPROXIMATING COUNTING PROBLEMS AND POSTERIORSAMPLING (ADVANCED, OPTIONAL)
Given a Boolean circuit đ¶, if P = NP then we can find an input đ„ (ifone exists) such that đ¶(đ„) = 1. But what if there is more than one đ„like that? Clearly we canât efficiently output all such đ„âs; there mightbe exponentially many. But we can get an arbitrarily good multiplica-tive approximation (i.e., a 1±đ factor for arbitrarily small đ > 0) for thenumber of such đ„âs, as well as output a (nearly) uniform member ofthis set. The details are beyond the scope of this book, but this result isformally stated in the following theorem (whose proof is omitted).
Theorem 16.7 â Approximate counting if P = NP. Let đ ⶠ0, 1â â 0, 1be some polynomial-time algorithm, and suppose that P = NP.Then there exists an algorithm COUNTđ that on input đ„, 1đ, đ,runs in time polynomial in |đ„|, đ, 1/đ and outputs a number in[2đ + 1] satisfying(1âđ)COUNTđ (đ„, đ, đ) †âŁđŠ â 0, 1đ ⶠđ (đ„đŠ) = 1⣠†(1+đ)COUNTđ (đ„, đ, đ) .
(16.11)
In other words, the algorithm COUNTđ gives an approximationup to a factor of 1 ± đ for the number of witnesses for đ„ with respectto the verifying algorithm đ . Once again, to understand this theoremit can be useful to see how it implies that if P = NP then there is apolynomial-time algorithm that given a graph đș and a number đ,can compute a number đŸ that is within a 1 ± 0.01 factor equal to thenumber of simple paths in đș of length đ. (That is, đŸ is between 0.99 to1.01 times the number of such paths.)
Posterior sampling and probabilistic programming. The algorithm for count-ing can also be extended to sampling from a given posterior distri-bution. That is, if đ¶ ⶠ0, 1đ â 0, 1đ is a Boolean circuit andđŠ â 0, 1đ, then if P = NP we can sample from (a close approx-imation of) the distribution of uniform đ„ â 0, 1đ conditioned onđ¶(đ„) = đŠ. This task is known as posterior sampling and is crucial forBayesian data analysis. These days it is known how to achieve pos-terior sampling only for circuits đ¶ of very special form, and even inthese cases more often than not we do have guarantees on the qualityof the sampling algorithm. The field of making inferences by samplingfrom posterior distribution specified by circuits or programs is knownas probabilistic programming.
what if p equals np? 511
8 One interesting theory is that P = NP and evolutionhas already discovered this algorithm, which we arealready using without realizing it. At the moment,there seems to be very little evidence for such a sce-nario. In fact, we have some partial results in theother direction showing that, regardless of whetherP = NP, many types of âlocal searchâ or âevolution-aryâ algorithms require exponential time to solve3SAT and other NP-hard problems.
16.6 WHAT DOES ALL OF THIS IMPLY?
So, what will happen if we have a 106đ algorithm for 3SAT? We havementioned that NP-hard problems arise in many contexts, and indeedscientists, engineers, programmers and others routinely encountersuch problems in their daily work. A better 3SAT algorithm will prob-ably make their lives easier, but that is the wrong place to look forthe most foundational consequences. Indeed, while the invention ofelectronic computers did of course make it easier to do calculationsthat people were already doing with mechanical devices and pen andpaper, the main applications computers are used for today were noteven imagined before their invention.
An exponentially faster algorithm for all NP problems would beno less radical an improvement (and indeed, in some sense wouldbe more) than the computer itself, and it is as hard for us to imaginewhat it would imply as it was for Babbage to envision todayâs world.For starters, such an algorithm would completely change the way weprogram computers. Since we could automatically find the âbestâ(in any measure we chose) program that achieves a certain task, wewould not need to define how to achieve a task, but only specify testsas to what would be a good solution, and could also ensure that aprogram satisfies an exponential number of tests without actuallyrunning them.
The possibility that P = NP is often described as âautomatingcreativityâ. There is something to that analogy, as we often think ofa creative solution as one that is hard to discover but that, once theâsparkâ hits, is easy to verify. But there is also an element of hubristo that statement, implying that the most impressive consequence ofsuch an algorithmic breakthrough will be that computers would suc-ceed in doing something that humans already do today. Nevertheless,artificial intelligence, like many other fields, will clearly be greatlyimpacted by an efficient 3SAT algorithm. For example, it is clearlymuch easier to find a better Chess-playing algorithm when, given anyalgorithm đ , you can find the smallest algorithm đ âČ that plays Chessbetter than đ . Moreover, as we mentioned above, much of machinelearning (and statistical reasoning in general) is about finding âsim-pleâ concepts that explain the observed data, and if NP = P, we couldsearch for such concepts automatically for any notion of âsimplicityâwe see fit. In fact, we could even âskip the middle manâ and do anautomatic search for the learning algorithm with smallest general-ization error. Ultimately the field of Artificial Intelligence is abouttrying to âshortcutâ billions of years of evolution to obtain artificialprograms that match (or beat) the performance of natural ones, and afast algorithm for NP would provide the ultimate shortcut.8
512 introduction to theoretical computer science
More generally, a faster algorithm for NP problems would be im-mensely useful in any field where one is faced with computational orquantitative problemsâ which is basically all fields of science, math,and engineering. This will not only help with concrete problems suchas designing a better bridge, or finding a better drug, but also withaddressing basic mysteries such as trying to find scientific theories orâlaws of natureâ. In a fascinating talk, physicist Nima Arkani-Hameddiscusses the effort of finding scientific theories in much the same lan-guage as one would describe solving an NP problem, for which thesolution is easy to verify or seems âinevitableâ, once found, but thatrequires searching through a huge landscape of possibilities to reach,and that often can get âstuckâ at local optima:
âthe laws of nature have this amazing feeling of inevitability⊠which is associ-ated with local perfection.â
âThe classical picture of the world is the top of a local mountain in the space ofideas. And you go up to the top and it looks amazing up there and absolutelyincredible. And you learn that there is a taller mountain out there. Find it,Mount QuantumâŠ. theyâre not smoothly connected ⊠youâve got to make ajump to go from classical to quantum ⊠This also tells you why we have suchmajor challenges in trying to extend our understanding of physics. We donâthave these knobs, and little wheels, and twiddles that we can turn. We have tolearn how to make these jumps. And it is a tall order. And thatâs why things aredifficult.â
Finding an efficient algorithm for NP amounts to always being ableto search through an exponential space and find not just the âlocalâmountain, but the tallest peak.
But perhaps more than any computational speedups, a fast algo-rithm for NP problems would bring about a new type of understanding.In many of the areas where NP-completeness arises, it is not as mucha barrier for solving computational problems as it is a barrier for ob-taining âclosed-form formulasâ or other types of more constructivedescriptions of the behavior of natural, biological, social and other sys-tems. A better algorithm for NP, even if it is âmerelyâ 2âđ-time, seemsto require obtaining a new way to understand these types of systems,whether it is characterizing Nash equilibria, spin-glass configurations,entangled quantum states, or any of the other questions where NP iscurrently a barrier for analytical understanding. Such new insightswould be very fruitful regardless of their computational utility.
Big Idea 23 If P = NP, we can efficiently solve a fantastic numberof decision, search, optimization, counting, and sampling problemsfrom all areas of human endeavors.
what if p equals np? 513
9 This inefficiency is not necessarily inherent. Laterin this course we may discuss results in program-checking, interactive proofs, and average-case com-plexity, that can be used for efficient verification ofproofs of related statements. In contrast, the ineffi-ciency of verifying failure of all programs could wellbe inherent.
16.7 CAN P â NP BE NEITHER TRUE NOR FALSE?
The Continuum Hypothesis is a conjecture made by Georg Cantor in1878, positing the non-existence of a certain type of infinite cardinality.(One way to phrase it is that for every infinite subset đ of the realnumbers â, either there is a one-to-one and onto function đ ⶠđ â âor there is a one-to-one and onto function đ ⶠđ â â.) This wasconsidered one of the most important open problems in set theory,and settling its truth or falseness was the first problem put forward byHilbert in the 1900 address we mentioned before. However, using thetheories developed by Gödel and Turing, in 1963 Paul Cohen provedthat both the Continuum Hypothesis and its negation are consistentwith the standard axioms of set theory (i.e., the Zermelo-Fraenkelaxioms + the Axiom of choice, or âZFCâ for short). Formally, whathe proved is that if ZFC is consistent, then so is ZFC when we assumeeither the continuum hypothesis or its negation.
Today, many (though not all) mathematicians interpret this resultas saying that the Continuum Hypothesis is neither true nor false, butrather is an axiomatic choice that we are free to make one way or theother. Could the same hold for P â NP?
In short, the answer is No. For example, suppose that we are try-ing to decide between the â3SAT is easyâ conjecture (there is an 106đtime algorithm for 3SAT) and the â3SAT is hardâ conjecture (for ev-ery đ, any NAND-CIRC program that solves đ variable 3SAT takes210â6đ lines). Then, since for đ = 108, 210â6đ > 106đ, this boils downto the finite question of deciding whether or not there is a 1013-lineNAND-CIRC program deciding 3SAT on formulas with 108 variables.If there is such a program then there is a finite proof of its existence,namely the approximately 1TB file describing the program, and forwhich the verification is the (finite in principle though infeasible inpractice) process of checking that it succeeds on all inputs.9 If thereisnât such a program, then there is also a finite proof of that, thoughany such proof would take longer since we would need to enumer-ate over all programs as well. Ultimately, since it boils down to a finitestatement about bits and numbers; either the statement or its negationmust follow from the standard axioms of arithmetic in a finite numberof arithmetic steps. Thus, we cannot justify our ignorance in distin-guishing between the â3SAT easyâ and â3SAT hardâ cases by claimingthat this might be an inherently ill-defined question. Similar reason-ing (with different numbers) applies to other variants of the P vs NPquestion. We note that in the case that 3SAT is hard, it may well bethat there is no short proof of this fact using the standard axioms, andthis is a question that people have been studying in various restrictedforms of proof systems.
514 introduction to theoretical computer science
10 Actually, the computational difficulty of problems ineconomics such as finding optimal (or any) equilibriais quite subtle. Some variants of such problems areNP-hard, while others have a certain âintermediateâcomplexity.
11 Talk more about coping with NP hardness. Maintwo approaches are heuristics such as SAT solvers thatsucceed on some instances, and proxy measures suchas mathematical relaxations that instead of solvingproblem đ (e.g., an integer program) solve programđâČ (e.g., a linear program) that is related to that.Maybe give compressed sensing as an example, andleast square minimization as a proxy for maximumapostoriori probability.
16.8 IS P = NP âIN PRACTICEâ?
The fact that a problem is NP-hard means that we believe there is noefficient algorithm that solve it in the worst case. It does not, however,mean that every single instance of the problem is hard. For exam-ple, if all the clauses in a 3SAT instance đ contain the same variableđ„đ (possibly in negated form), then by guessing a value to đ„đ we canreduce đ to a 2SAT instance which can then be efficiently solved. Gen-eralizations of this simple idea are used in âSAT solversâ, which arealgorithms that have solved certain specific interesting SAT formulaswith thousands of variables, despite the fact that we believe SAT tobe exponentially hard in the worst case. Similarly, a lot of problemsarising in economics and machine learning are NP-hard.10 And yetvendors and customers manage to figure out market-clearing prices(as economists like to point out, there is milk on the shelves) and micesucceed in distinguishing cats from dogs. Hence people (and ma-chines) seem to regularly succeed in solving interesting instances ofNP-hard problems, typically by using some combination of guessingwhile making local improvements.
It is also true that there are many interesting instances of NP-hardproblems that we do not currently know how to solve. Across all ap-plication areas, whether it is scientific computing, optimization, con-trol or more, people often encounter hard instances of NP problemson which our current algorithms fail. In fact, as we will see, all of ourdigital security infrastructure relies on the fact that some concrete andeasy-to-generate instances of, say, 3SAT (or, equivalently, any otherNP-hard problem) are exponentially hard to solve.
Thus it would be wrong to say that NP is easy âin practiceâ, norwould it be correct to take NP-hardness as the âfinal wordâ on thecomplexity of a problem, particularly when we have more informa-tion about how any given instance is generated. Understanding boththe âtypical complexityâ of NP problems, as well as the power andlimitations of certain heuristics (such as various local-search based al-gorithms) is a very active area of research. We will see more on thesetopics later in this course.
11
16.9 WHAT IF P â NP?
So, P = NP would give us all kinds of fantastical outcomes. But westrongly suspect that P â NP, and moreover that there is no much-better-than-brute-force algorithm for 3SAT. If indeed that is the case, isit all bad news?
One might think that impossibility results, telling you that youcannot do something, is the kind of cloud that does not have a silver
what if p equals np? 515
lining. But in fact, as we already alluded to before, it does. A hard(in a sufficiently strong sense) problem in NP can be used to create acode that cannot be broken, a task that for thousands of years has beenthe dream of not just spies but of many scientists and mathematiciansover the generations. But the complexity viewpoint turned out toyield much more than simple codes, achieving tasks that people hadpreviously not even dared to dream of. These include the notion ofpublic key cryptography, allowing two people to communicate securelywithout ever having exchanged a secret key; electronic cash, allowingprivate and secure transaction without a central authority; and securemultiparty computation, enabling parties to compute a joint function onprivate inputs without revealing any extra information about it. Also,as we will see, computational hardness can be used to replace the roleof randomness in many settings.
Furthermore, while it is often convenient to pretend that computa-tional problems are simply handed to us, and that our job as computerscientists is to find the most efficient algorithm for them, this is nothow things work in most computing applications. Typically even for-mulating the problem to solve is a highly non-trivial task. When wediscover that the problem we want to solve is NP-hard, this might be auseful sign that we used the wrong formulation for it.
Beyond all these, the quest to understand computational hardnessâ including the discoveries of lower bounds for restricted compu-tational models, as well as new types of reductions (such as thosearising from âprobabilistically checkable proofsâ) â has already hadsurprising positive applications to problems in algorithm design, aswell as in coding for both communication and storage. This is notsurprising since, as we mentioned before, from group theory to thetheory of relativity, the pursuit of impossibility results has often beenone of the most fruitful enterprises of mankind.
Chapter Recap
âą The question of whether P = NP is one of themost important and fascinating questions of com-puter science and science at large, touching on allfields of the natural and social sciences, as well asmathematics and engineering.
âą Our current evidence and understanding supportsthe âSAT hardâ scenario that there is no much-better-than-brute-force algorithm for 3SAT or manyother NP-hard problems.
âą We are very far from proving this, however. Re-searchers have studied proving lower bounds onthe number of gates to compute explicit functionsin restricted forms of circuits, and have made some
516 introduction to theoretical computer science
advances in this effort, along the way generatingmathematical tools that have found other uses.However, we have made essentially no headwayin proving lower bounds for general models ofcomputation such as Boolean circuits and Turingmachines. Indeed, we currently do not even knowhow to rule out the possibility that for every đ â â,SAT restricted to đ-length inputs has a Booleancircuit of less than 10đ gates (even though thereexist đ-input functions that require at least 2đ/(10đ)gates to compute).
âą Understanding how to cope with this compu-tational intractability, and even benefit from it,comprises much of the research in theoreticalcomputer science.
16.10 EXERCISES
16.11 BIBLIOGRAPHICAL NOTES
As mentioned before, Aaronsonâs survey [Aar16] is a great expositionof the P vs NP problem. Another recommended survey by Aaronsonis [Aar05] which discusses the question of whether NP completeproblems could be computed by any physical means.
The paper [BU11] discusses some results about problems in thepolynomial hierarchy.
17Space bounded computation
PLAN: Example of space bounded algorithms, importance of pre-serving space. The classes L and PSPACE, space hierarchy theorem,PSPACE=NPSPACE, constant space = regular languages.
17.1 EXERCISES
17.2 BIBLIOGRAPHICAL NOTES
Compiled on 8.26.2020 18:10
IVRANDOMIZED COMPUTATION
18Probability Theory 101
âGod doesnât play dice with the universeâ, Albert Einstein
âEinstein was doubly wrong ⊠not only does God definitely play dice, but Hesometimes confuses us by throwing them where they canât be seen.â, StephenHawking
â âThe probability of winning a battle has no place in our theory because itdoes not belong to any [random experiment]. Probability cannot be appliedto this problem any more than the physical concept of work can be applied tothe âworkâ done by an actor reciting his part.â, Richard Von Mises, 1928(paraphrased)
âI am unable to see why âobjectivityâ requires us to interpret every probabilityas a frequency in some random experiment; particularly when in most problemsprobabilities are frequencies only in an imaginary universe invented just for thepurpose of allowing a frequency interpretation.â, E.T. Jaynes, 1976
Before we show how to use randomness in algorithms, let us do aquick review of some basic notions in probability theory. This is notmeant to replace a course on probability theory, and if you have notseen this material before, I highly recommend you look at additionalresources to get up to speed. Fortunately, we will not need many ofthe advanced notions of probability theory, but, as we will see, eventhe so-called âsimpleâ setting of tossing đ coins can lead to very subtleand interesting issues.
18.1 RANDOM COINS
The nature of randomness and probability is a topic of great philo-sophical, scientific and mathematical depth. Is there actual random-ness in the world, or does it proceed in a deterministic clockwork fash-ion from some initial conditions set at the beginning of time? Doesprobability refer to our uncertainty of beliefs, or to the frequency ofoccurrences in repeated experiments? How can we define probabilityover infinite sets?
Compiled on 8.26.2020 18:10
Learning Objectives:âą Review the basic notion of probability theory
that we will use.âą Sample spaces, and in particular the space0, 1đâą Events, probabilities of unions and
intersections.âą Random variables and their expectation,
variance, and standard deviation.âą Independence and correlation for both events
and random variables.âą Markov, Chebyshev and Chernoff tail bounds
(bounding the probability that a randomvariable will deviate from its expectation).
522 introduction to theoretical computer science
Figure 18.1: The probabilistic experiment of tossingthree coins corresponds to making 2 Ă 2 Ă 2 = 8choices, each with equal probability. In this example,the blue set corresponds to the event đŽ = đ„ â0, 13 | đ„0 = 0 where the first coin toss is equalto 0, and the pink set corresponds to the event đ” =đ„ â 0, 13 | đ„1 = 1 where the second coin toss isequal to 1 (with their intersection having a purplishcolor). As we can see, each of these events contains 4elements (out of 8 total) and so has probability 1/2.The intersection of đŽ and đ” contains two elements,and so the probability that both of these events occuris 2/8 = 1/4.
Figure 18.2: The event that if we toss three coinsđ„0, đ„1, đ„2 â 0, 1 then the sum of the đ„đâs is evenhas probability 1/2 since it corresponds to exactly 4out of the 8 possible strings of length 3.
These are all important questions that have been studied and de-bated by scientists, mathematicians, statisticians and philosophers.Fortunately, we will not need to deal directly with these questionshere. We will be mostly interested in the setting of tossing đ random,unbiased and independent coins. Below we define the basic proba-bilistic objects of events and random variables when restricted to thissetting. These can be defined for much more general probabilistic ex-periments or sample spaces, and later on we will briefly discuss howthis can be done. However, the đ-coin case is sufficient for almosteverything weâll need in this course.
If instead of âheadsâ and âtailsâ we encode the sides of each coinby âzeroâ and âoneâ, we can encode the result of tossing đ coins asa string in 0, 1đ. Each particular outcome đ„ â 0, 1đ is obtainedwith probability 2âđ. For example, if we toss three coins, then weobtain each of the 8 outcomes 000, 001, 010, 011, 100, 101, 110, 111with probability 2â3 = 1/8 (see also Fig. 18.1). We can describe theexperiment of tossing đ coins as choosing a string đ„ uniformly atrandom from 0, 1đ, and hence weâll use the shorthand đ„ ⌠0, 1đfor đ„ that is chosen according to this experiment.
An event is simply a subset đŽ of 0, 1đ. The probability of đŽ, de-noted by Prđ„âŒ0,1đ [đŽ] (or Pr[đŽ] for short, when the sample space isunderstood from the context), is the probability that an đ„ chosen uni-formly at random will be contained in đŽ. Note that this is the same as|đŽ|/2đ (where |đŽ| as usual denotes the number of elements in the setđŽ). For example, the probability that đ„ has an even number of onesis Pr[đŽ] where đŽ = đ„ ⶠâđâ1đ=0 đ„đ = 0 mod 2. In the case đ = 3,đŽ = 000, 011, 101, 110, and hence Pr[đŽ] = 48 = 12 (see Fig. 18.2). Itturns out this is true for every đ:Lemma 18.1 For every đ > 0,
Prđ„âŒ0,1đ[đâ1âđ=0 đ„đ is even ] = 1/2 (18.1)
PTo test your intuition on probability, try to stop hereand prove the lemma on your own.
Proof of Lemma 18.1. We prove the lemma by induction on đ. For thecase đ = 1 it is clear since đ„ = 0 is even and đ„ = 1 is odd, and hencethe probability that đ„ â 0, 1 is even is 1/2. Let đ > 1. We assumeby induction that the lemma is true for đ â 1 and we will prove itfor đ. We split the set 0, 1đ into four disjoint sets đž0, đž1, đ0, đ1,where for đ â 0, 1, đžđ is defined as the set of đ„ â 0, 1đ such that
probability theory 101 523
đ„0 ⯠đ„đâ2 has even number of ones and đ„đâ1 = đ and similarly đđ isthe set of đ„ â 0, 1đ such that đ„0 ⯠đ„đâ2 has odd number of ones andđ„đâ1 = đ. Since đž0 is obtained by simply extending đ â 1-length stringwith even number of ones by the digit 0, the size of đž0 is simply thenumber of such đâ1-length strings which by the induction hypothesisis 2đâ1/2 = 2đâ2. The same reasoning applies for đž1, đ0, and đ1.Hence each one of the four sets đž0, đž1, đ0, đ1 is of size 2đâ2. Sinceđ„ â 0, 1đ has an even number of ones if and only if đ„ â đž0 âȘ đ1(i.e., either the first đ â 1 coordinates sum up to an even number andthe final coordinate is 0 or the first đ â 1 coordinates sum up to an oddnumber and the final coordinate is 1), we get that the probability thatđ„ satisfies this property is
|đž0âȘđ1|2đ = 2đâ2 + 2đâ22đ = 12 , (18.2)
using the fact that đž0 and đ1 are disjoint and hence |đž0 âȘ đ1| =|đž0| + |đ1|.
We can also use the intersection (â©) and union (âȘ) operators totalk about the probability of both event đŽ and event đ” happening, orthe probability of event đŽ or event đ” happening. For example, theprobability đ that đ„ has an even number of ones and đ„0 = 1 is the sameas Pr[đŽ â© đ”] where đŽ = đ„ â 0, 1đ ⶠâđâ1đ=0 đ„đ = 0 mod 2 andđ” = đ„ â 0, 1đ ⶠđ„0 = 1. This probability is equal to 1/4 forđ > 1. (It is a great exercise for you to pause here and verify that youunderstand why this is the case.)
Because intersection corresponds to considering the logical ANDof the conditions that two events happen, while union correspondsto considering the logical OR, we will sometimes use the ⧠and âšoperators instead of â© and âȘ, and so write this probability đ = Pr[đŽ â©đ”] defined above also as
Prđ„âŒ0,1đ [âđ đ„đ = 0 mod 2 ⧠đ„0 = 1] . (18.3)
If đŽ â 0, 1đ is an event, then đŽ = 0, 1đ ⧔ đŽ corresponds to theevent that đŽ does not happen. Since |đŽ| = 2đ â |đŽ|, we get that
Pr[đŽ] = |đŽ|2đ = 2đâ|đŽ|2đ = 1 â |đŽ|2đ = 1 â Pr[đŽ] (18.4)
This makes sense: since đŽ happens if and only if đŽ does not happen,the probability of đŽ should be one minus the probability of đŽ.
R
524 introduction to theoretical computer science
Remark 18.2 â Remember the sample space. While theabove definition might seem very simple and almosttrivial, the human mind seems not to have evolved forprobabilistic reasoning, and it is surprising how oftenpeople can get even the simplest settings of probabilitywrong. One way to make sure you donât get confusedwhen trying to calculate probability statements isto always ask yourself the following two questions:(1) Do I understand what is the sample space thatthis probability is taken over?, and (2) Do I under-stand what is the definition of the event that we areanalyzing?.For example, suppose that I were to randomize seatingin my course, and then it turned out that studentssitting in row 7 performed better on the final: howsurprising should we find this? If we started out withthe hypothesis that there is something special aboutthe number 7 and chose it ahead of time, then theevent that we are discussing is the event đŽ that stu-dents sitting in number 7 had better performance onthe final, and we might find it surprising. However, ifwe first looked at the results and then chose the rowwhose average performance is best, then the eventwe are discussing is the event đ” that there exists somerow where the performance is higher than the over-all average. đ” is a superset of đŽ, and its probability(even if there is no correlation between sitting andperformance) can be quite significant.
18.1.1 Random variablesEvents correspond to Yes/No questions, but often we want to analyzefiner questions. For example, if we make a bet at the roulette wheel,we donât want to just analyze whether we won or lost, but also howmuch weâve gained. A (real valued) random variable is simply a wayto associate a number with the result of a probabilistic experiment.Formally, a random variable is a function đ ⶠ0, 1đ â â that mapsevery outcome đ„ â 0, 1đ to an element đ(đ„) â â. For example, thefunction SUM ⶠ0, 1đ â â that maps đ„ to the sum of its coordinates(i.e., to âđâ1đ=0 đ„đ) is a random variable.
The expectation of a random variable đ, denoted by đŒ[đ], is theaverage value that this number takes, taken over all draws from theprobabilistic experiment. In other words, the expectation of đ is de-fined as follows: đŒ[đ] = âđ„â0,1đ 2âđđ(đ„) . (18.5)
If đ and đ are random variables, then we can define đ + đ assimply the random variable that maps a point đ„ â 0, 1đ to đ(đ„) +đ (đ„). One basic and very useful property of the expectation is that itis linear:
probability theory 101 525
Lemma 18.3 â Linearity of expectation.đŒ[đ + đ ] = đŒ[đ] + đŒ[đ ] (18.6)
Proof. đŒ[đ + đ ] = âđ„â0,1đ 2âđ (đ(đ„) + đ (đ„)) =âđ„â0,1đ 2âđđ(đ„) + âđ„â0,1đ 2âđđ (đ„) =đŒ[đ] + đŒ[đ ] (18.7)
Similarly, đŒ[đđ] = đ đŒ[đ] for every đ â â.Solved Exercise 18.1 â Expectation of sum. Let đ ⶠ0, 1đ â â be therandom variable that maps đ„ â 0, 1đ to đ„0 + đ„1 + ⊠+ đ„đâ1. Provethat đŒ[đ] = đ/2.
Solution:
We can solve this using the linearity of expectation. We can de-fine random variables đ0, đ1, ⊠, đđâ1 such that đđ(đ„) = đ„đ. Sinceeach đ„đ equals 1 with probability 1/2 and 0 with probability 1/2,đŒ[đđ] = 1/2. Since đ = âđâ1đ=0 đđ, by the linearity of expectationđŒ[đ] = đŒ[đ0] + đŒ[đ1] + ⯠+ đŒ[đđâ1] = đ2 . (18.8)
PIf you have not seen discrete probability before, pleasego over this argument again until you are sure youfollow it; it is a prototypical simple example of thetype of reasoning we will employ again and again inthis course.
If đŽ is an event, then 1đŽ is the random variable such that 1đŽ(đ„)equals 1 if đ„ â đŽ, and 1đŽ(đ„) = 0 otherwise. Note that Pr[đŽ] = đŒ[1đŽ](can you see why?). Using this and the linearity of expectation, wecan show one of the most useful bounds in probability theory:Lemma 18.4 â Union bound. For every two events đŽ, đ”, Pr[đŽ âȘ đ”] â€Pr[đŽ] + Pr[đ”]
PBefore looking at the proof, try to see why the unionbound makes intuitive sense. We can also proveit directly from the definition of probabilities and
526 introduction to theoretical computer science
Figure 18.3: The union bound tells us that the proba-bility of đŽ or đ” happening is at most the sum of theindividual probabilities. We can see it by noting thatfor every two sets |đŽ âȘ đ”| †|đŽ| + |đ”| (with equalityonly if đŽ and đ” have no intersection).
the cardinality of sets, together with the equation|đŽ âȘ đ”| †|đŽ| + |đ”|. Can you see why the latterequation is true? (See also Fig. 18.3.)
Proof of Lemma 18.4. For every đ„, the variable 1đŽâȘđ”(đ„) †1đŽ(đ„)+1đ”(đ„).Hence, Pr[đŽâȘđ”] = đŒ[1đŽâȘđ”] †đŒ[1đŽ+1đ”] = đŒ[1đŽ]+đŒ[1đ”] = Pr[đŽ]+Pr[đ”].
The way we often use this in theoretical computer science is toargue that, for example, if there is a list of 100 bad events that can hap-pen, and each one of them happens with probability at most 1/10000,then with probability at least 1 â 100/10000 = 0.99, no bad eventhappens.
18.1.2 Distributions over stringsWhile most of the time we think of random variables as havingas output a real number, we sometimes consider random vari-ables whose output is a string. That is, we can think of a mapđ ⶠ0, 1đ â 0, 1â and consider the ârandom variableâ đ suchthat for every đŠ â 0, 1â, the probability that đ outputs đŠ is equalto 12đ |đ„ â 0, 1đ | đ (đ„) = đŠ|. To avoid confusion, we will typicallyrefer to such string-valued random variables as distributions overstrings. So, a distribution đ over strings 0, 1â can be thought of asa finite collection of strings đŠ0, ⊠, đŠđâ1 â 0, 1â and probabilitiesđ0, ⊠, đđâ1 (which are non-negative numbers summing up to one),so that Pr[đ = đŠđ] = đđ.
Two distributions đ and đ âČ are identical if they assign the sameprobability to every string. For example, consider the following twofunctions đ , đ âČ â¶ 0, 12 â 0, 12. For every đ„ â 0, 12, we defineđ (đ„) = đ„ and đ âČ(đ„) = đ„0(đ„0 â đ„1) where â is the XOR operations.Although these are two different functions, they induce the samedistribution over 0, 12 when invoked on a uniform input. The distri-bution đ (đ„) for đ„ ⌠0, 12 is of course the uniform distribution over0, 12. On the other hand đ âČ is simply the map 00 ⊠00, 01 ⊠01,10 ⊠11, 11 ⊠10 which is a permutation of đ .
18.1.3 More general sample spacesWhile throughout most of this book we assume that the underlyingprobabilistic experiment corresponds to tossing đ independent coins,all the claims we make easily generalize to sampling đ„ from a moregeneral finite or countable set đ (and not-so-easily generalizes touncountable sets đ as well). A probability distribution over a finite setđ is simply a function đ ⶠđ â [0, 1] such that âđ„âđ đ(đ„) = 1. Wethink of this as the experiment where we obtain every đ„ â đ with
probability theory 101 527
Figure 18.4: Two events đŽ and đ” are independent ifPr[đŽ â© đ”] = Pr[đŽ] â Pr[đ”]. In the two figures above,the empty đ„ Ă đ„ square is the sample space, and đŽand đ” are two events in this sample space. In the leftfigure, đŽ and đ” are independent, while in the rightfigure they are negatively correlated, since đ” is lesslikely to occur if we condition on đŽ (and vice versa).Mathematically, one can see this by noticing that inthe left figure the areas of đŽ and đ” respectively aređ â đ„ and đ â đ„, and so their probabilities are đâ đ„đ„2 = đđ„and đâ đ„đ„2 = đđ„ respectively, while the area of đŽ â© đ” isđ â đ which corresponds to the probability đâ đđ„2 . In theright figure, the area of the triangle đ” is đâ đ„2 whichcorresponds to a probability of đ2đ„ , but the area ofđŽ â© đ” is đâČâ đ2 for some đâČ < đ. This means that theprobability of đŽ â© đ” is đâČâ đ2đ„2 < đ2đ„ â đđ„ , or in other wordsPr[đŽ â© đ”] < Pr[đŽ] â Pr[đ”].
probability đ(đ„), and sometimes denote this as đ„ ⌠đ. In particular,tossing đ random coins corresponds to the probability distributionđ ⶠ0, 1đ â [0, 1] defined as đ(đ„) = 2âđ for every đ„ â 0, 1đ. Anevent đŽ is a subset of đ, and the probability of đŽ, which we denote byPrđ[đŽ], is âđ„âđŽ đ(đ„). A random variable is a function đ ⶠđ â â, wherethe probability that đ = đŠ is equal to âđ„âđ s.t. đ(đ„)=đŠ đ(đ„).18.2 CORRELATIONS AND INDEPENDENCE
One of the most delicate but important concepts in probability is thenotion of independence (and the opposing notion of correlations). Subtlecorrelations are often behind surprises and errors in probability andstatistical analysis, and several mistaken predictions have been blamedon miscalculating the correlations between, say, housing prices inFlorida and Arizona, or voter preferences in Ohio and Michigan. Seealso Joe Blitzsteinâs aptly named talk âConditioning is the Soul ofStatisticsâ. (Another thorny issue is of course the difference betweencorrelation and causation. Luckily, this is another point we donât need toworry about in our clean setting of tossing đ coins.)
Two events đŽ and đ” are independent if the fact that đŽ happensmakes đ” neither more nor less likely to happen. For example, if wethink of the experiment of tossing 3 random coins đ„ â 0, 13, and welet đŽ be the event that đ„0 = 1 and đ” the event that đ„0 + đ„1 + đ„2 â„ 2,then if đŽ happens it is more likely that đ” happens, and hence theseevents are not independent. On the other hand, if we let đ¶ be the eventthat đ„1 = 1, then because the second coin toss is not affected by theresult of the first one, the events đŽ and đ¶ are independent.
The formal definition is that events đŽ and đ” are independent ifPr[đŽ â© đ”] = Pr[đŽ] â Pr[đ”]. If Pr[đŽ â© đ”] > Pr[đŽ] â Pr[đ”] then we saythat đŽ and đ” are positively correlated, while if Pr[đŽ â© đ”] < Pr[đŽ] â Pr[đ”]then we say that đŽ and đ” are negatively correlated (see Fig. 18.1).
If we consider the above examples on the experiment of choosingđ„ â 0, 13 then we can see that
Pr[đ„0 = 1] = 12Pr[đ„0 + đ„1 + đ„2 â„ 2] = Pr[011, 101, 110, 111] = 48 = 12 (18.9)
but
Pr[đ„0 = 1 ⧠đ„0 +đ„1 +đ„2 â„ 2] = Pr[101, 110, 111] = 38 > 12 â 12 (18.10)
and hence, as we already observed, the events đ„0 = 1 and đ„0 +đ„1 + đ„2 â„ 2 are not independent and in fact are positively correlated.On the other hand, Pr[đ„0 = 1 ⧠đ„1 = 1] = Pr[110, 111] = 28 = 12 â 12and hence the events đ„0 = 1 and đ„1 = 1 are indeed independent.
528 introduction to theoretical computer science
Figure 18.5: Consider the sample space 0, 1đ andthe events đŽ, đ”, đ¶, đ·, đž corresponding to đŽ: đ„0 = 1,đ”: đ„1 = 1, đ¶: đ„0 + đ„1 + đ„2 â„ 2, đ·: đ„0 + đ„1 + đ„2 =0đđđ2 and đ·: đ„0 + đ„1 = 0đđđ2. We can see thatđŽ and đ” are independent, đ¶ is positively correlatedwith đŽ and positively correlated with đ”, the threeevents đŽ, đ”, đ· are mutually independent, and whileevery pair out of đŽ, đ”, đž is independent, the threeevents đŽ, đ”, đž are not mutually independent sincetheir intersection has probability 28 = 14 instead of12 â 12 â 12 = 18 .
RRemark 18.5 â Disjointness vs independence. Peoplesometimes confuse the notion of disjointness and in-dependence, but these are actually quite different. Twoevents đŽ and đ” are disjoint if đŽ â© đ” = â , which meansthat if đŽ happens then đ” definitely does not happen.They are independent if Pr[đŽ â© đ”] = Pr[đŽ]Pr[đ”] whichmeans that knowing that đŽ happens gives us no infor-mation about whether đ” happened or not. If đŽ and đ”have nonzero probability, then being disjoint impliesthat they are not independent, since in particular itmeans that they are negatively correlated.
Conditional probability: If đŽ and đ” are events, and đŽ happens withnonzero probability then we define the probability that đ” happensconditioned on đŽ to be Pr[đ”|đŽ] = Pr[đŽ â© đ”]/Pr[đŽ]. This correspondsto calculating the probability that đ” happens if we already knowthat đŽ happened. Note that đŽ and đ” are independent if and only ifPr[đ”|đŽ] = Pr[đ”].More than two events: We can generalize this definition to more thantwo events. We say that events đŽ1, ⊠, đŽđ are mutually independentif knowing that any set of them occurred or didnât occur does notchange the probability that an event outside the set occurs. Formally,the condition is that for every subset đŒ â [đ],
Pr[â§đâđŒđŽđ] = âđâđŒ Pr[đŽđ]. (18.11)
For example, if đ„ ⌠0, 13, then the events đ„0 = 1, đ„1 = 1 andđ„2 = 1 are mutually independent. On the other hand, the eventsđ„0 = 1, đ„1 = 1 and đ„0 + đ„1 = 0 mod 2 are not mutuallyindependent, even though every pair of these events is independent(can you see why? see also Fig. 18.5).
18.2.1 Independent random variablesWe say that two random variables đ ⶠ0, 1đ â â and đ ⶠ0, 1đ â âare independent if for every đą, đŁ â â, the events đ = đą and đ = đŁare independent. (We use đ = đą as shorthand for đ„ | đ(đ„) = đą.)In other words, đ and đ are independent if Pr[đ = đą ⧠đ = đŁ] =Pr[đ = đą]Pr[đ = đŁ] for every đą, đŁ â â. For example, if two randomvariables depend on the result of tossing different coins then they areindependent:Lemma 18.6 Suppose that đ = đ 0, ⊠, đ đâ1 and đ = đĄ0, ⊠, đĄđâ1 aredisjoint subsets of 0, ⊠, đ â 1 and let đ, đ ⶠ0, 1đ â â be randomvariables such that đ = đč(đ„đ 0 , ⊠, đ„đ đâ1) and đ = đș(đ„đĄ0 , ⊠, đ„đĄđâ1) for
probability theory 101 529
some functions đč ⶠ0, 1đ â â and đș ⶠ0, 1đ â â. Then đ and đare independent.
PThe notation in the lemmaâs statement is a bit cum-bersome, but at the end of the day, it simply says thatif đ and đ are random variables that depend on twodisjoint sets đ and đ of coins (for example, đ mightbe the sum of the first đ/2 coins, and đ might be thelargest consecutive stretch of zeroes in the second đ/2coins), then they are independent.
Proof of Lemma 18.6. Let đ, đ â â, and let đŽ = đ„ â 0, 1đ ⶠđč (đ„) = đand đ” = đ„ â 0, 1đ ⶠđč (đ„) = đ. Since đ and đ are disjoint, we canreorder the indices so that đ = 0, ⊠, đ â 1 and đ = đ, ⊠, đ + đ â 1without affecting any of the probabilities. Hence we can write Pr[đ =đ ⧠đ = đ] = |đ¶|/2đ where đ¶ = đ„0, ⊠, đ„đâ1 ⶠ(đ„0, ⊠, đ„đâ1) âđŽ ⧠(đ„đ, ⊠, đ„đ+đâ1) â đ”. Another way to write this using stringconcatenation is that đ¶ = đ„đŠđ§ ⶠđ„ â đŽ, đŠ â đ”, đ§ â 0, 1đâđâđ, andhence |đ¶| = |đŽ||đ”|2đâđâđ, which means that|đ¶|2đ = |đŽ|2đ |đ”|2đ 2đâđâđ2đâđâđ = Pr[đ = đ]Pr[đ = đ]. (18.12)
If đ and đ are independent random variables then (letting đđ, đđdenote the sets of all numbers that have positive probability of beingthe output of đ and đ , respectively):đŒ[XY] = âđâđđ,đâđđ Pr[đ = đ ⧠đ = đ] â đđ =(1) âđâđđ,đâđđ Pr[đ = đ]Pr[đ = đ] â đđ =(2)
( âđâđđ Pr[đ = đ] â đ) ( âđâđđ Pr[đ = đ] â đ) =(3)đŒ[đ] đŒ[đ ](18.13)
where the first equality (=(1)) follows from the independence of đand đ , the second equality (=(2)) follows by âopening the parenthe-sesâ of the righthand side, and the third equality (=(3)) follows fromthe definition of expectation. (This is not an âif and only ifâ; see Exer-cise 18.3.)
Another useful fact is that if đ and đ are independent randomvariables, then so are đč(đ) and đș(đ ) for all functions đč, đș ⶠâ â â.This is intuitively true since learning đč(đ) can only provide us withless information than does learning đ itself. Hence, if learning đdoes not teach us anything about đ (and so also about đč(đ )) then
530 introduction to theoretical computer science
neither will learning đč(đ). Indeed, to prove this we can write forevery đ, đ â â:
Pr[đč (đ) = đ ⧠đș(đ ) = đ] = âđ„ s.t.đč(đ„)=đ,đŠ s.t. đș(đŠ)=đ Pr[đ = đ„ ⧠đ = đŠ] =âđ„ s.t.đč(đ„)=đ,đŠ s.t. đș(đŠ)=đ Pr[đ = đ„]Pr[đ = đŠ] =âââ âđ„ s.t.đč(đ„)=đ Pr[đ = đ„]âââ â âââ âđŠ s.t.đș(đŠ)=đ Pr[đ = đŠ]âââ =
Pr[đč (đ) = đ]Pr[đș(đ ) = đ].(18.14)
18.2.2 Collections of independent random variablesWe can extend the notions of independence to more than two randomvariables: we say that the random variables đ0, ⊠, đđâ1 are mutuallyindependent if for every đ0, ⊠, đđâ1 â â,
Pr [đ0 = đ0 ⧠⯠⧠đđâ1 = đđâ1] = Pr[đ0 = đ0] âŻPr[đđâ1 = đđâ1].(18.15)
And similarly, we have thatLemma 18.7 â Expectation of product of independent random variables. Ifđ0, ⊠, đđâ1 are mutually independent thenđŒ[đâ1âđ=0 đđ] = đâ1âđ=0 đŒ[đđ]. (18.16)
Lemma 18.8 â Functions preserve independence. If đ0, ⊠, đđâ1 are mu-tually independent, and đ0, ⊠, đđâ1 are defined as đđ = đčđ(đđ) forsome functions đč0, ⊠, đčđâ1 ⶠâ â â, then đ0, ⊠, đđâ1 are mutuallyindependent as well.
PWe leave proving Lemma 18.7 and Lemma 18.8 asExercise 18.6 and Exercise 18.7. It is good idea for youstop now and do these exercises to make sure you arecomfortable with the notion of independence, as wewill use it heavily later on in this course.
18.3 CONCENTRATION AND TAIL BOUNDS
The name âexpectationâ is somewhat misleading. For example, sup-pose that you and I place a bet on the outcome of 10 coin tosses, whereif they all come out to be 1âs then I pay you 100,000 dollars and other-
probability theory 101 531
Figure 18.6: The probabilities that we obtain a partic-ular sum when we toss đ = 10, 20, 100, 1000 coinsconverge quickly to the Gaussian/normal distribu-tion.
Figure 18.7: Markovâs Inequality tells us that a non-negative random variable đ cannot be much largerthan its expectation, with high probability. For exam-ple, if the expectation of đ is đ, then the probabilitythat đ > 4đ must be at most 1/4, as otherwise justthe contribution from this part of the sample spacewill be too large.
wise you pay me 10 dollars. If we let đ ⶠ0, 110 â â be the randomvariable denoting your gain, then we see thatđŒ[đ] = 2â10 â 100000 â (1 â 2â10)10 ⌠90. (18.17)
But we donât really âexpectâ the result of this experiment to be foryou to gain 90 dollars. Rather, 99.9% of the time you will pay me 10dollars, and you will hit the jackpot 0.01% of the times.
However, if we repeat this experiment again and again (with freshand hence independent coins), then in the long run we do expect youraverage earning to be close to 90 dollars, which is the reason whycasinos can make money in a predictable way even though everyindividual bet is random. For example, if we toss đ independent andunbiased coins, then as đ grows, the number of coins that come upones will be more and more concentrated around đ/2 according to thefamous âbell curveâ (see Fig. 18.6).
Much of probability theory is concerned with so called concentrationor tail bounds, which are upper bounds on the probability that a ran-dom variable đ deviates too much from its expectation. The first andsimplest one of them is Markovâs inequality:
Theorem 18.9 â Markovâs inequality. If đ is a non-negative randomvariable then for every đ > 1, Pr[đ â„ đ đŒ[đ]] †1/đ.
PMarkovâs Inequality is actually a very natural state-ment (see also Fig. 18.7). For example, if you knowthat the average (not the median!) household incomein the US is 70,000 dollars, then in particular you candeduce that at most 25 percent of households makemore than 280,000 dollars, since otherwise, even ifthe remaining 75 percent had zero income, the top25 percent alone would cause the average income tobe larger than 70,000 dollars. From this example youcan already see that in many situations, Markovâsinequality will not be tight and the probability of devi-ating from expectation will be much smaller: see theChebyshev and Chernoff inequalities below.
Proof of Theorem 18.9. Let đ = đŒ[đ] and define đ = 1đâ„đđ. Thatis, đ (đ„) = 1 if đ(đ„) â„ đđ and đ (đ„) = 0 otherwise. Note that bydefinition, for every đ„, đ (đ„) †đ/(đđ). We need to show đŒ[đ ] †1/đ.But this follows since đŒ[đ ] †đŒ[đ/đ(đ)] = đŒ[đ]/(đđ) = đ/(đđ) = 1/đ.
532 introduction to theoretical computer science
The averaging principle. While the expectation of a random variableđ is hardly always the âtypical valueâ, we can show that đ is guar-anteed to achieve a value that is at least its expectation with positiveprobability. For example, if the average grade in an exam is 87 points,at least one student got a grade 87 or more on the exam. This is knownas the averaging principle, and despite its simplicity it is surprisinglyuseful.Lemma 18.10 Let đ be a random variable, then Pr[đ â„ đŒ[đ]] > 0.Proof. Suppose towards the sake of contradiction that Pr[đ < đŒ[đ]] =1. Then the random variable đ = đŒ[đ] â đ is always positive. Bylinearity of expectation đŒ[đ ] = đŒ[đ] â đŒ[đ] = 0. Yet by Markov, anon-negative random variable đ with đŒ[đ ] = 0 must equal 0 withprobability 1, since the probability that đ > đ â 0 = 0 is at most 1/đ forevery đ > 1. Hence we get a contradiction to the assumption that đ isalways positive.
18.3.1 Chebyshevâs InequalityMarkovâs inequality says that a (non-negative) random variable đcanât go too crazy and be, say, a million times its expectation, withsignificant probability. But ideally we would like to say that withhigh probability, đ should be very close to its expectation, e.g., in therange [0.99đ, 1.01đ] where đ = đŒ[đ]. In such a case we say that đ isconcentrated, and hence its expectation (i.e., mean) will be close to itsmedian and other ways of measuring đâs âtypical valueâ. Chebyshevâsinequality can be thought of as saying that đ is concentrated if it has asmall standard deviation.
A standard way to measure the deviation of a random variablefrom its expectation is by using its standard deviation. For a randomvariable đ, we define the variance of đ as Var[đ] = đŒ[(đ â đ)2]where đ = đŒ[đ]; i.e., the variance is the average squared distanceof đ from its expectation. The standard deviation of đ is defined asđ[đ] = âVar[đ]. (This is well-defined since the variance, being anaverage of a square, is always a non-negative number.)
Using Chebyshevâs inequality, we can control the probability thata random variable is too many standard deviations away from itsexpectation.
Theorem 18.11 â Chebyshevâs inequality. Suppose that đ = đŒ[đ] andđ2 = Var[đ]. Then for every đ > 0, Pr[|đ â đ| â„ đđ] †1/đ2.Proof. The proof follows from Markovâs inequality. We define therandom variable đ = (đ â đ)2. Then đŒ[đ ] = Var[đ] = đ2, and hence
probability theory 101 533
Figure 18.8: In the normal distribution or the Bell curve,the probability of deviating đ standard deviationsfrom the expectation shrinks exponentially in đ2, andspecifically with probability at least 1 â 2đâđ2/2, arandom variable đ of expectation đ and standarddeviation đ satisfies đâđđ †đ †đ+đđ. This figuregives more precise bounds for đ = 1, 2, 3, 4, 5, 6.(Image credit:Imran Baghirov)
by Markov the probability that đ > đ2đ2 is at most 1/đ2. But clearly(đ â đ)2 â„ đ2đ2 if and only if |đ â đ| â„ đđ.
One example of how to use Chebyshevâs inequality is the settingwhen đ = đ1 + ⯠+ đđ where đđâs are independent and identicallydistributed (i.i.d for short) variables with values in [0, 1] where eachhas expectation 1/2. Since đŒ[đ] = âđ đŒ[đđ] = đ/2, we would like tosay that đ is very likely to be in, say, the interval [0.499đ, 0.501đ]. Us-ing Markovâs inequality directly will not help us, since it will only tellus that đ is very likely to be at most 100đ (which we already knew,since it always lies between 0 and đ). However, since đ1, ⊠, đđ areindependent,
Var[đ1 + ⯠+ đđ] = Var[đ1] + ⯠+ Var[đđ] . (18.18)
(We leave showing this to the reader as Exercise 18.8.)For every random variable đđ in [0, 1], Var[đđ] †1 (if the variable
is always in [0, 1], it canât be more than 1 away from its expectation),and hence (18.18) implies that Var[đ] †đ and hence đ[đ] †âđ.For large đ, âđ âȘ 0.001đ, and in particular if âđ †0.001đ/đ, we canuse Chebyshevâs inequality to bound the probability that đ is not in[0.499đ, 0.501đ] by 1/đ2.18.3.2 The Chernoff boundChebyshevâs inequality already shows a connection between inde-pendence and concentration, but in many cases we can hope fora quantitatively much stronger result. If, as in the example above,đ = đ1 + ⊠+ đđ where the đđâs are bounded i.i.d random variablesof mean 1/2, then as đ grows, the distribution of đ would be roughlythe normal or Gaussian distributionâ that is, distributed according tothe bell curve (see Fig. 18.6 and Fig. 18.8). This distribution has theproperty of being very concentrated in the sense that the probability ofdeviating đ standard deviations from the mean is not merely 1/đ2 as isguaranteed by Chebyshev, but rather is roughly đâđ2 . Specifically, fora normal random variable đ of expectation đ and standard deviationđ, the probability that |đ â đ| â„ đđ is at most 2đâđ2/2. That is, we havean exponential decay of the probability of deviation.
The following extremely useful theorem shows that such expo-nential decay occurs every time we have a sum of independent andbounded variables. This theorem is known under many names in dif-ferent communities, though it is mostly called the Chernoff bound inthe computer science literature:
534 introduction to theoretical computer science
Theorem 18.12 â Chernoff/Hoeffding bound. If đ0, ⊠, đđâ1 are i.i.d ran-dom variables such that đđ â [0, 1] and đŒ[đđ] = đ for every đ, thenfor every đ > 0
Pr[âŁđâ1âđ=0 đđ â đđ⣠> đđ] †2 â đâ2đ2đ. (18.19)
We omit the proof, which appears in many texts, and uses Markovâsinequality on i.i.d random variables đ0, ⊠, đđ that are of the formđđ = đđđđ for some carefully chosen parameter đ. See Exercise 18.11for a proof of the simple (but highly useful and representative) casewhere each đđ is 0, 1 valued and đ = 1/2. (See also Exercise 18.12for a generalization.)
RRemark 18.13 â Slight simplification of Chernoff. Since đis roughly 2.7 (and in particular larger than 2),(18.19) would still be true if we replaced its righthandside with đâ2đ2đ+1. For đ > 1/đ2, the equation willstill be true if we replaced the righthand side withthe simpler đâđ2đ. Hence we will sometimes use theChernoff bound as stating that for đ0, ⊠, đđâ1 and đas above, đ > 1/đ2 then
Pr[âŁđâ1âđ=0 đđ â đđ⣠> đđ] †đâđ2đ. (18.20)
18.3.3 Application: Supervised learning and empirical risk minimizationHere is a nice application of the Chernoff bound. Consider the taskof supervised learning. You are given a set đ of đ samples of the form(đ„0, đŠ0), ⊠, (đ„đâ1, đŠđâ1) drawn from some unknown distribution đ·over pairs (đ„, đŠ). For simplicity we will assume that đ„đ â 0, 1đ andđŠđ â 0, 1. (We use here the concept of general distribution over thefinite set 0, 1đ+1 as discussed in Section 18.1.3.) The goal is to finda classifier â ⶠ0, 1đ â 0, 1 that will minimize the test error whichis the probability đż(â) that â(đ„) â đŠ where (đ„, đŠ) is drawn from thedistribution đ·. That is, đż(â) = Pr(đ„,đŠ)âŒđ·[â(đ„) â đŠ].
One way to find such a classifier is to consider a collection đ of po-tential classifiers and look at the classifier â in đ that does best on thetraining set đ. The classifier â is known as the empirical risk minimizer(see also Section 12.1.6) . The Chernoff bound can be used to showthat as long as the number đ of samples is sufficiently larger than thelogarithm of |đ|, the test error đż(â) will be close to its training error
probability theory 101 535
đ(â), which is defined as the fraction of pairs (đ„đ, đŠđ) â đ that it failsto classify. (Equivalently, đ(â) = 1đ âđâ[đ] |â(đ„đ) â đŠđ|.)
Theorem 18.14 â Generalization of ERM. Let đ· be any distribution overpairs (đ„, đŠ) â 0, 1đ+1 and đ be any set of functions mapping0, 1đ to 0, 1. Then for every đ, đż > 0, if đ > log |đ| log(1/đż)đ2 and đis a set of (đ„0, đŠ0), ⊠, (đ„đâ1, đŠđâ1) samples that are drawn indepen-dently from đ· then
Prđ [âââđ|đż(â) â đ(â)| †đ] > 1 â đż , (18.21)
where the probability is taken over the choice of the set of samplesđ.In particular if |đ| †2đ and đ > đ log(1/đż)đ2 then with probability at
least 1âđż, the classifier ââ â đ that minimizes that empirical test er-ror đ(đ¶) satisfies đż(ââ) †đ(ââ) + đ, and hence its test error is atmost đ worse than its training error.
Proof Idea:
The idea is to combine the Chernoff bound with the union bound.Let đ = log |đ|. We first use the Chernoff bound to show that forevery fixed â â đ, if we choose đ at random then the probability that|đż(â) â đ(â)| > đ will be smaller than đż2đ . We can then use the unionbound over all the 2đ members of đ to show that this will be the casefor every â.âProof of Theorem 18.14. Set đ = log |đ| and so đ > đ log(1/đż)/đ2. Westart by making the following claim
CLAIM: For every â â đ, the probability over đ that |đż(â) âđ(â)| â„ đ is smaller than đż/2đ.We prove the claim using the Chernoff bound. Specifically, for ev-
ery such â, let us defined a collection of random variables đ0, ⊠, đđâ1as follows:
đđ = â§âšâ©1 â(đ„đ) â đŠđ0 otherwise. (18.22)
Since the samples (đ„0, đŠ0), ⊠, (đ„đâ1, đŠđâ1) are drawn independentlyfrom the same distribution đ·, the random variables đ0, ⊠, đđâ1 areindependently and identically distributed. Moreover, for every đ,đŒ[đđ] = đż(â). Hence by the Chernoff bound (see (18.20)), the proba-bility that | âđđ=0 đđ âđâ đż(â)| â„ đđ is at most đâđ2đ < đâđ log(1/đż) < đż/2đ(using the fact that đ > 2). Since (â) = 1đ âđâ[đ] đđ, this completesthe proof of the claim.
536 introduction to theoretical computer science
Given the claim, the theorem follows from the union bound. In-deed, for every â â đ, define the âbad eventâ đ”â to be the event (overthe choice of đ) that |đż(â) â đ(â)| > đ. By the claim Pr[đ”â] < đż/2đ,and hence by the union bound the probability that the union of đ”â forall â â â happens is smaller than |đ|đż/2đ = đż. If for every â â đ, đ”âdoes not happen, it means that for every â â â, |đż(â) â đ(â)| †đ,and so the probability of the latter event is larger than 1 â đż which iswhat we wanted to prove.
Chapter Recap
âą A basic probabilistic experiment corresponds totossing đ coins or choosing đ„ uniformly at randomfrom 0, 1đ.
âą Random variables assign a real number to everyresult of a coin toss. The expectation of a randomvariable đ is its average value.
âą There are several concentration results, also knownas tail bounds showing that under certain condi-tions, random variables deviate significantly fromtheir expectation only with small probability.
18.4 EXERCISES
Exercise 18.1 Suppose that we toss three independent fair coins đ, đ, đ â0, 1. What is the probability that the XOR of đ,đ, and đ is equal to 1?What is the probability that the AND of these three values is equal to1? Are these two events independent?
Exercise 18.2 Give an example of random variables đ, đ ⶠ0, 13 â âsuch that đŒ[XY] â đŒ[đ] đŒ[đ ].
Exercise 18.3 Give an example of random variables đ, đ ⶠ0, 13 â âsuch that đ and đ are not independent but đŒ[XY] = đŒ[đ] đŒ[đ ].
Exercise 18.4 Let đ be an odd number, and let đ ⶠ0, 1đ â â be therandom variable defined as follows: for every đ„ â 0, 1đ, đ(đ„) = 1 ifâđ=0 đ„đ > đ/2 and đ(đ„) = 0 otherwise. Prove that đŒ[đ] = 1/2.
Exercise 18.5 â standard deviation. 1. Give an example for a randomvariable đ such that đâs standard deviation is equal to đŒ[|đ â đŒ[đ]|]
probability theory 101 537
1 While you donât need this to solve this exercise, thisis the function that maps đ to the entropy (as definedin Exercise 18.9) of the đ-biased coin distribution over0, 1, which is the function đ ⶠ0, 1 â [0, 1] s.y.đ(0) = 1 â đ and đ(1) = đ.2 Hint: Use Stirlingâs formula for approximating thefactorial function.
2. Give an example for a random variable đ such that đâs standarddeviation is not equal to đŒ[|đ â đŒ[đ]|]
Exercise 18.6 â Product of expectations. Prove Lemma 18.7
Exercise 18.7 â Transformations preserve independence. Prove Lemma 18.8
Exercise 18.8 â Variance of independent random variables. Prove that ifđ0, ⊠, đđâ1 are independent random variables then Var[đ0 + ⯠+đđâ1] = âđâ1đ=0 Var[đđ].
Exercise 18.9 â Entropy (challenge). Recall the definition of a distributionđ over some finite set đ. Shannon defined the entropy of a distributionđ, denoted by đ»(đ), to be âđ„âđ đ(đ„) log(1/đ(đ„)). The idea is that if đis a distribution of entropy đ, then encoding members of đ will requiređ bits, in an amortized sense. In this exercise we justify this definition.Let đ be such that đ»(đ) = đ.
1. Prove that for every one to one function đč ⶠđ â 0, 1â,đŒđ„âŒđ |đč (đ„)| â„ đ.2. Prove that for every đ, there is some đ and a one-to-one functionđč ⶠđđ â 0, 1â, such that đŒđ„âŒđđ |đč (đ„)| †đ(đ + đ), where đ„ ⌠đ
denotes the experiments of choosing đ„0, ⊠, đ„đâ1 each independentlyfrom đ using the distribution đ.
Exercise 18.10 â Entropy approximation to binomial. Let đ»(đ) = đ log(1/đ) +(1 â đ) log(1/(1 â đ)).1 Prove that for every đ â (0, 1) and đ > 0, if đ islarge enough then2 2(đ»(đ)âđ)đ( đđđ) †2(đ»(đ)+đ)đ (18.23)
where (đđ) is the binomial coefficient đ!đ!(đâđ)! which is equal to thenumber of đ-size subsets of 0, ⊠, đ â 1.
Exercise 18.11 â Chernoff using Stirling. 1. Prove that Prđ„âŒ0,1đ [â đ„đ =đ] = (đđ)2âđ.2. Use this and Exercise 18.10 to prove (an approximate version of)
the Chernoff bound for the case that đ0, ⊠, đđâ1 are i.i.d. randomvariables over 0, 1 each equaling 0 and 1 with probability 1/2.That is, prove that for every đ > 0, and đ0, ⊠, đđâ1 as above,Pr[| âđâ1đ=0 đđ â đ2 | > đđ] < 20.1â đ2đ.
538 introduction to theoretical computer science
3 Hint: Bound the number of tuples đ0, ⊠, đđâ1 suchthat every đđ is even and â đđ = đ using the Binomialcoefficient and the fact that in any such tuple there areat most đ/2 distinct indices.4 Hint: Set đ = 2âđ2đ/1000â and then show that if theevent | â đđ| â„ đđ happens then the random variable(â đđ)đ is a factor of đâđ larger than its expectation.
Exercise 18.12 â Poor manâs Chernoff. Exercise 18.11 establishes the Cher-noff bound for the case that đ0, ⊠, đđâ1 are i.i.d variables over 0, 1with expectation 1/2. In this exercise we use a slightly differentmethod (bounding the moments of the random variables) to estab-lish a version of Chernoff where the random variables range over [0, 1]and their expectation is some number đ â [0, 1] that may be differentthan 1/2. Let đ0, ⊠, đđâ1 be i.i.d random variables with đŒ đđ = đ andPr[0 †đđ †1] = 1. Define đđ = đđ â đ.1. Prove that for every đ0, ⊠, đđâ1 â â, if there exists one đ such that đđ
is odd then đŒ[âđâ1đ=0 đ đđđ ] = 0.2. Prove that for every đ, đŒ[(âđâ1đ=0 đđ)đ] †(10đđ)đ/2.33. Prove that for every đ > 0, Pr[| âđ đđ| â„ đđ] â„ 2âđ2đ/(10000 log1/đ).4
Exercise 18.13 â Sampling. Suppose that a country has 300,000,000 citi-zens, 52 percent of which prefer the color âgreenâ and 48 percent ofwhich prefer the color âorangeâ. Suppose we sample đ random citi-zens and ask them their favorite color (assume they will answer truth-fully). What is the smallest value đ among the following choices sothat the probability that the majority of the sample answers âgreenâ isat most 0.05?a. 1,000
b. 10,000
c. 100,000
d. 1,000,000
Exercise 18.14 Would the answer to Exercise 18.13 change if the countryhad 300,000,000,000 citizens?
Exercise 18.15 â Sampling (2). Under the same assumptions as Exer-cise 18.13, what is the smallest value đ among the following choices sothat the probability that the majority of the sample answers âgreenâ isat most 2â100?a. 1,000
b. 10,000
c. 100,000
probability theory 101 539
d. 1,000,000
e. It is impossible to get such low probability since there are fewerthan 2100 citizens.
18.5 BIBLIOGRAPHICAL NOTES
There are many sources for more information on discrete probability,including the texts referenced in Section 1.9. One particularly recom-mended source for probability is Harvardâs STAT 110 class, whoselectures are available on youtube and whose book is available online.
The version of the Chernoff bound that we stated in Theorem 18.12is sometimes known as Hoeffdingâs Inequality. Other variants of theChernoff bound are known as well, but all of them are equally goodfor the applications of this book.
Figure 19.1: A 1947 entry in the log book of the Har-vard MARK II computer containing an actual bug thatcaused a hardware malfunction. By Courtesy of theNaval Surface Warfare Center.
1 Some texts also talk about âLas Vegas algorithmsâthat always return the right answer but whose run-ning time is only polynomial on the average. Sincethis Monte Carlo vs Las Vegas terminology is confus-ing, we will not use these terms anymore, and simplytalk about randomized algorithms.
19Probabilistic computation
âin 1946 .. (I asked myself) what are the chances that a Canfield solitaire laidout with 52 cards will come out successfully? After spending a lot of timetrying to estimate them by pure combinatorial calculations, I wondered whethera more practical method ⊠might not be to lay it our say one hundred times andsimple observe and countâ, Stanislaw Ulam, 1983
âThe salient features of our method are that it is probabilistic ⊠and with acontrollable miniscule probability of error.â, Michael Rabin, 1977
In early computer systems, much effort was taken to drive outrandomness and noise. Hardware components were prone to non-deterministic behavior from a number of causes, whether it was vac-uum tubes overheating or actual physical bugs causing short circuits(see Fig. 19.1). This motivated John von Neumann, one of the earlycomputing pioneers, to write a paper on how to error correct computa-tion, introducing the notion of redundancy.
So it is quite surprising that randomness turned out not just a hin-drance but also a resource for computation, enabling us to achievetasks much more efficiently than previously known. One of the firstapplications involved the very same John von Neumann. While hewas sick in bed and playing cards, Stan Ulam came up with the ob-servation that calculating statistics of a system could be done muchfaster by running several randomized simulations. He mentioned thisidea to von Neumann, who became very excited about it; indeed, itturned out to be crucial for the neutron transport calculations thatwere needed for development of the Atom bomb and later on the hy-drogen bomb. Because this project was highly classified, Ulam, vonNeumann and their collaborators came up with the codeword âMonteCarloâ for this approach (based on the famous casinos where Ulamâsuncle gambled). The name stuck, and randomized algorithms areknown as Monte Carlo algorithms to this day.1
In this chapter, we will see some examples of randomized algo-rithms that use randomness to compute a quantity in a faster or
Compiled on 8.26.2020 18:10
Learning Objectives:âą See examples of randomized algorithmsâą Get more comfort with analyzing
probabilistic processes and tail boundsâą Success amplification using tail bounds
542 introduction to theoretical computer science
simpler way than was known otherwise. We will describe the algo-rithms in an informal / âpseudo-codeâ way, rather than as NAND-TM, NAND-RAM programs or Turing macines. In Chapter 20 we willdiscuss how to augment the computational models we say before toincorporate the ability to âtoss coinsâ.
19.1 FINDING APPROXIMATELY GOOD MAXIMUM CUTS
We start with the following example. Recall the maximum cut problemof finding, given a graph đș = (đ , đž), the cut that maximizes the num-ber of edges. This problem is NP-hard, which means that we do notknow of any efficient algorithm that can solve it, but randomizationenables a simple algorithm that can cut at least half of the edges:
Theorem 19.1 â Approximating max cut. There is an efficient probabilis-tic algorithm that on input an đ-vertex đ-edge graph đș, outputs acut (đ, đ) that cuts at least đ/2 of the edges of đș in expectation.
Proof Idea:
We simply choose a random cut: we choose a subset đ of vertices bychoosing every vertex đŁ to be a member of đ with probability 1/2 in-dependently. Itâs not hard to see that each edge is cut with probability1/2 and so the expected number of cut edges is đ/2.âProof of Theorem 19.1. The algorithm is extremely simple:
Algorithm Random Cut:Input: Graph đș = (đ , đž) with đ vertices and đ edges. Denote đ =đŁ0, đŁ1, ⊠, đŁđâ1.Operation:
1. Pick đ„ uniformly at random in 0, 1đ.2. Let đ â đ be the set đŁđ ⶠđ„đ = 1 , đ â [đ] that includes all vertices
corresponding to coordinates of đ„ where đ„đ = 1.3. Output the cut (đ, đ).We claim that the expected number of edges cut by the algorithm isđ/2. Indeed, for every edge đ â đž, let đđ be the random variable such
that đđ(đ„) = 1 if the edge đ is cut by đ„, and đđ(đ„) = 0 otherwise. Forevery such edge đ = đ, đ, đđ(đ„) = 1 if and only if đ„đ â đ„đ. Since thepair (đ„đ, đ„đ) obtains each of the values 00, 01, 10, 11 with probability1/4, the probability that đ„đ â đ„đ is 1/2 and hence đŒ[đđ] = 1/2. If we let
probabilist ic computation 543
đ be the random variable corresponding to the total number of edgescut by đ, then đ = âđâđž đđ and hence by linearity of expectationđŒ[đ] = âđâđž đŒ[đđ] = đ(1/2) = đ/2 . (19.1)
Randomized algorithms work in the worst case. It is tempting of a random-ized algorithm such as the one of Theorem 19.1 as an algorithm thatworks for a ârandom input graphâ but it is actually much better thanthat. The expectation in this theorem is not taken over the choice ofthe graph, but rather only over the random choices of the algorithm. Inparticular, for every graph đș, the algorithm is guaranteed to cut half ofthe edges of the input graph in expectation. That is,
Big Idea 24 A randomized algorithm outputs the correct valuewith good probability on every possible input.
We will define more formally what âgood probabilityâ means inChapter 20 but the crucial point is that this probability is always onlytaken over the random choices of the algorithm, while the input is notchosen at random.
19.1.1 Amplifying the success of randomized algorithmsTheorem 19.1 gives us an algorithm that cuts đ/2 edges in expectation.But, as we saw before, expectation does not immediately imply con-centration, and so a priori, it may be the case that when we run thealgorithm, most of the time we donât get a cut matching the expecta-tion. Luckily, we can amplify the probability of success by repeatingthe process several times and outputting the best cut we find. Westart by arguing that the probability the algorithm above succeeds incutting at least đ/2 edges is not too tiny.Lemma 19.2 The probability that a random cut in an đ edge graph cutsat least đ/2 edges is at least 1/(2đ).Proof Idea:
To see the idea behind the proof, think of the case that đ = 1000. Inthis case one can show that we will cut at least 500 edges with proba-bility at least 0.001 (and so in particular larger than 1/(2đ) = 1/2000).Specifically, if we assume otherwise, then this means that with proba-bility more than 0.999 the algorithm cuts 499 or fewer edges. But sincewe can never cut more than the total of 1000 edges, given this assump-tion, the highest value the expected number of edges cut is if we cutexactly 499 edges with probability 0.999 and cut 1000 edges with prob-ability 0.001. Yet even in this case the expected number of edges will
544 introduction to theoretical computer science
be 0.999 â 499 + 0.001 â 1000 < 500, which contradicts the fact that weâvecalculated the expectation to be at least 500 in Theorem 19.1.âProof of Lemma 19.2. Let đ be the probability that we cut at least đ/2edges and suppose, towards a contradiction, that đ < 1/(2đ). Sincethe number of edges cut is an integer, and đ/2 is a multiple of 0.5,by definition of đ, with probability 1 â đ we cut at most đ/2 â 0.5edges. Moreover, since we can never cut more than đ edges, underour assumption that đ < 1/(2đ), we can bound the expected numberof edges cut byđđ + (1 â đ)(đ/2 â 0.5) †đđ + đ/2 â 0.5 (19.2)But if đ < 1/(2đ) then đđ < 0.5 and so the righthand side is smallerthan đ/2, which contradicts the fact that (as proven in Theorem 19.1)the expected number of edges cut is at least đ/2.
19.1.2 Success amplificationLemma 19.2 shows that our algorithm succeeds at least some of thetime, but weâd like to succeed almost all of the time. The approachto do that is to simply repeat our algorithm many times, with freshrandomness each time, and output the best cut we get in one of theserepetitions. It turns out that with extremely high probability we willget a cut of size at least đ/2. For example, if we repeat this experiment2000đ times, then (using the inequality (1 â 1/đ)đ †1/đ †1/2) wecan show that the probability that we will never cut at least đ/2 edgesis at most (1 â 1/(2đ))2000đ †2â1000 . (19.3)
More generally, the same calculations can be used to show thefollowing lemma:Lemma 19.3 There is an algorithm that on input a graph đș = (đ , đž) anda number đ, runs in time polynomial in |đ | and đ and outputs a cut(đ, đ) such that
Pr[number of edges cut by (đ, đ) â„ |đž|/2] â„ 1 â 2âđ . (19.4)
Proof of Lemma 19.3. The algorithm will work as follows:
Algorithm AMPLIFY RANDOM CUT:Input: Graph đș = (đ , đž) with đ vertices and đ edges. Denote đ =đŁ0, đŁ1, ⊠, đŁđâ1. Number đ > 0.Operation:
probabilist ic computation 545
1. Repeat the following 200đđ times:
a. Pick đ„ uniformly at random in 0, 1đ.b. Let đ â đ be the set đŁđ ⶠđ„đ = 1 , đ â [đ] that includes all
vertices corresponding to coordinates of đ„ where đ„đ = 1.c. If (đ, đ) cuts at least đ/2 then halt and output (đ, đ).
2. Output âfailedâ
We leave completing the analysis as an exercise to the reader (seeExercise 19.1).
19.1.3 Two-sided amplificationThe analysis above relied on the fact that the maximum cut has onesided error. By this we mean that if we get a cut of size at least đ/2then we know we have succeeded. This is common for randomizedalgorithms, but is not the only case. In particular, consider the task ofcomputing some Boolean function đč ⶠ0, 1â â 0, 1. A randomizedalgorithm đŽ for computing đč , given input đ„, might toss coins and suc-ceed in outputting đč(đ„) with probability, say, 0.9. We say that đŽ hastwo sided errors if there is positive probability that đŽ(đ„) outputs 1 whenđč(đ„) = 0, and positive probability that đŽ(đ„) outputs 0 when đč(đ„) = 1.In such a case, to simplify đŽâs success, we cannot simply repeat it đtimes and output 1 if a single one of those repetitions resulted in 1,nor can we output 0 if a single one of the repetitions resulted in 0. Butwe can output the majority value of these repetitions. By the Chernoffbound (Theorem 18.12), with probability exponentially close to 1 (i.e.,1 â 2Ω(đ)), the fraction of the repetitions where đŽ will output đč(đ„)will be at least, say 0.89, and in such cases we will of course output thecorrect answer.
The above translates into the following theorem
Theorem 19.4 â Two-sided amplification. If đč ⶠ0, 1â â 0, 1 is a func-tion such that there is a polynomial-time algorithm đŽ satisfying
Pr[đŽ(đ„) = đč(đ„)] â„ 0.51 (19.5)
for every đ„ â 0, 1â, then there is a polynomial time algorithm đ”satisfying
Pr[đ”(đ„) = đč(đ„)] â„ 1 â 2â|đ„| (19.6)for every đ„ â 0, 1â.We omit the proof of Theorem 19.4, since we will prove a more
general result later on in Theorem 20.5.
546 introduction to theoretical computer science
2 This question does have some significance to prac-tice, since hardware that generates high qualityrandomness at speed is nontrivial to construct.
19.1.4 What does this mean?We have shown a probabilistic algorithm that on any đ edge graphđș, will output a cut of at least đ/2 edges with probability at least1 â 2â1000. Does it mean that we can consider this problem as âeasyâ?Should we be somewhat wary of using a probabilistic algorithm, sinceit can sometimes fail?
First of all, it is important to emphasize that this is still a worst caseguarantee. That is, we are not assuming anything about the inputgraph: the probability is only due to the internal randomness of the al-gorithm. While a probabilistic algorithm might not seem as nice as adeterministic algorithm that is guaranteed to give an output, to get asense of what a failure probability of 2â1000 means, note that:
âą The chance of winning the Massachusetts Mega Million lottery isone over (75)5 â 15 which is roughly 2â35. So 2â1000 corresponds towinning the lottery about 300 times in a row, at which point youmight not care so much about your algorithm failing.
âą The chance for a U.S. resident to be struck by lightning is about1/700000, which corresponds to about 2â45 chance that youâll bestruck by lightning the very second that youâre reading this sen-tence (after which again you might not care so much about thealgorithmâs performance).
âą Since the earth is about 5 billion years old, we can estimate thechance that an asteroid of the magnitude that caused the dinosaursâextinction will hit us this very second to be about 2â60. It is quitelikely that even a deterministic algorithm will fail if this happens.
So, in practical terms, a probabilistic algorithm is just as good asa deterministic one. But it is still a theoretically fascinating ques-tion whether randomized algorithms actually yield more power, orwhether is it the case that for any computational problem that can besolved by probabilistic algorithm, there is a deterministic algorithmwith nearly the same performance.2 For example, we will see in Ex-ercise 19.2 that there is in fact a deterministic algorithm that can cutat least đ/2 edges in an đ-edge graph. We will discuss this questionin generality in Chapter 20. For now, let us see a couple of exampleswhere randomization leads to algorithms that are better in some sensethan the known deterministic algorithms.
19.1.5 Solving SAT through randomizationThe 3SAT problem is NP hard, and so it is unlikely that it has a poly-nomial (or even subexponential) time algorithm. But this does notmean that we canât do at least somewhat better than the trivial 2đ al-gorithm for đ-variable 3SAT. The best known worst-case algorithms
probabilist ic computation 547
3 At the time of this writing, the best known ran-domized algorithms for 3SAT run in time roughlyđ(1.308đ), and the best known deterministic algo-rithms run in time đ(1.3303đ) in the worst case.
for 3SAT are randomized, and are related to the following simplealgorithm, variants of which are also used in practice:
Algorithm WalkSAT:Input: An đ variable 3CNF formula đ.Parameters: đ , đ â âOperation:
1. Repeat the following đ steps:a. Choose a random assignment đ„ â 0, 1đ and repeat the following
for đ steps:1. If đ„ satisfies đ then output đ„.2. Otherwise, choose a random clause (âđ âš âđ âš âđ) that đ„ does
not satisfy, choose a random literal in âđ, âđ, âđ and modify đ„ tosatisfy this literal.
2. If all the đ â đ repetitions above did not result in a satisfying assign-ment then output Unsatisfiable
The running time of this algorithm is đ â đ â đđđđŠ(đ), and so thekey question is how small we can make đ and đ so that the proba-bility that WalkSAT outputs Unsatisfiable on a satisfiable formulađ is small. It is known that we can do so with ST = ((4/3)đ) =(1.333 âŠđ) (see Exercise 19.4 for a weaker result), but weâll showbelow a simpler analysis yielding ST = (â3đ) = (1.74đ), which isstill much better than the trivial 2đ bound.3
Theorem 19.5 â WalkSAT simple analysis. If we set đ = 100 â â3đ andđ = đ/2, then the probability we output Unsatisfiable for asatisfiable đ is at most 1/2.
Proof. Suppose that đ is a satisfiable formula and let đ„â be a satisfyingassignment for it. For every đ„ â 0, 1đ, denote by Î(đ„, đ„â) the num-ber of coordinates that differ between đ„ and đ„â. The heart of the proofis the following claim:
Claim I: For every đ„, đ„â as above, in every local improvement step,the value Î(đ„, đ„â) is decreased by one with probability at least 1/3.
Proof of Claim I: Since đ„â is a satisfying assignment, if đ¶ is a clausethat đ„ does not satisfy, then at least one of the variables involve in đ¶must get different values in đ„ and đ„â. Thus when we change đ„ by oneof the three literals in the clause, we have probability at least 1/3 ofdecreasing the distance.
The second claim is that our starting point is not that bad:Claim 2: With probability at least 1/2 over a random đ„ â 0, 1đ,Î(đ„, đ„â) †đ/2.Proof of Claim II: Consider the map FLIP ⶠ0, 1đ â 0, 1đ that
simply âflipsâ all the bits of its input from 0 to 1 and vice versa. That
548 introduction to theoretical computer science
Figure 19.2: For every đ„â â 0, 1đ, we can sort allstrings in 0, 1đ according to their distance fromđ„â (top to bottom in the above figure), where we letđŽ = đ„ â 0, 1đ | đđđ đĄ(đ„, đ„â †đ/2 be the âtophalfâ of strings. If we define FLIP ⶠ0, 1đ â 0, 1to be the map that âflipsâ the bits of a given string đ„then it maps every đ„ â đŽ to an output FLIP(đ„) â đŽin a one-to-one way, and so it demonstrates that|đŽ| †|đŽ| which implies that Pr[đŽ] â„ Pr[đŽ] and hencePr[đŽ] â„ 1/2.
Figure 19.3: The bipartite matching problem in thegraph đș = (đżâȘđ , đž) can be reduced to the minimumđ , đĄ cut problem in the graph đșâČ obtained by addingvertices đ , đĄ to đș, connecting đ with đż and connectingđĄ with đ .
is, FLIP(đ„0, ⊠, đ„đâ1) = (1âđ„0, ⊠, 1âđ„đâ1). Clearly FLIP is one to one.Moreover, if đ„ is of distance đ to đ„â, then FLIP(đ„) is distance đ â đ tođ„â. Now let đ” be the âbad eventâ in which đ„ is of distance > đ/2 fromđ„â. Then the set đŽ = FLIP(đ”) = FLIP(đ„) ⶠđ„ â 0, 1đ satisfies|đŽ| = |đ”| and that if đ„ â đŽ then đ„ is of distance < đ/2 from đ„â. SinceđŽ and đ” are disjoint events, Pr[đŽ] + Pr[đ”] †1. Since they have thesame cardinality, they have the same probability and so we get that2Pr[đ”] †1 or Pr[đ”] †1/2. (See also Fig. 19.2).
Claims I and II imply that each of the đ iterations of the outer loopsucceeds with probability at least 1/2 â â3âđ. Indeed, by Claim II,the original guess đ„ will satisfy Î(đ„, đ„â) †đ/2 with probabilityPr[Î(đ„, đ„â) †đ/2] â„ 1/2. By Claim I, even conditioned on all thehistory so far, for each of the đ = đ/2 steps of the inner loop we haveprobability at least â„ 1/3 of being âluckyâ and decreasing the distance(i.e. the output of Î) by one. The chance we will be lucky in all đ/2steps is hence at least (1/3)đ/2 = â3âđ.
Since any single iteration of the outer loop succeeds with probabil-ity at least 12 â â3âđ, the probability that we never do so in đ = 100â3đrepetitions is at most (1 â 12â3đ )100â â3đ †(1/đ)50.
19.1.6 Bipartite matchingThe matching problem is one of the canonical optimization problems,arising in all kinds of applications: matching residents with hospitals,kidney donors with patients, flights with crews, and many others.One prototypical variant is bipartite perfect matching. In this problem,we are given a bipartite graph đș = (đż âȘ đ , đž) which has 2đ verticespartitioned into đ-sized sets đż and đ , where all edges have one end-point in đż and the other in đ . The goal is to determine whether thereis a perfect matching, a subset đ â đž of đ disjoint edges. That is, đmatches every vertex in đż to a unique vertex in đ .
The bipartite matching problem turns out to have a polynomial-time algorithm, since we can reduce finding a matching in đș to find-ing a maximum flow (or equivalently, minimum cut) in a relatedgraph đșâČ (see Fig. 19.3). However, we will see a different probabilisticalgorithm to determine whether a graph contains such a matching.
Let us label đșâs vertices as đż = â0, ⊠, âđâ1 and đ = đ0, ⊠, đđâ1.A matching đ corresponds to a permutation đ â đđ (i.e., one-to-oneand onto function đ ⶠ[đ] â [đ]) where for every đ â [đ], we define đ(đ)to be the unique đ such that đ contains the edge âđ, đđ. Define anđ Ă đ matrix đŽ = đŽ(đș) where đŽđ,đ = 1 if and only if the edge âđ, đđis present and đŽđ,đ = 0 otherwise. The correspondence betweenmatchings and permutations implies the following claim:
probabilist ic computation 549
4 The sign of a permutation đ ⶠ[đ] â [đ], denoted byđ đđđ(đ), can be defined in several equivalent ways,one of which is that đ đđđ(đ) = (â1)đŒđđ (đ) whereINV(đđ) = |(đ„, đŠ) â [đ] | đ„ < đŠ ⧠đ(đ„) > đ(đŠ) (i.e.,INV(đ) is the number of pairs of elements that areinverted by đ). The importance of the term đ đđđ(đ) isthat it makes đ equal to the determinant of the matrix(đ„đ,đ) and hence efficiently computable.
Figure 19.4: A degree đ curve in one variable canhave at most đ roots. In higher dimensions, a đ-variate degree-đ polynomial can have an infinitenumber roots though the set of roots will be an đ â 1dimensional surface. Over a finite field đœ, an đ-variatedegree đ polynomial has at most đ|đœ|đâ1 roots.
Lemma 19.6 â Matching polynomial. Define đ = đ(đș) to be the polynomialmapping âđ2 to â wheređ(đ„0,0, ⊠, đ„đâ1,đâ1) = âđâđđ (đâ1âđ=0 đ đđđ(đ)đŽđ,đ(đ)) đâ1âđ=0 đ„đ,đ(đ) (19.7)
Then đș has a perfect matching if and only if đ is not identically zero.That is, đș has a perfect matching if and only if there exists some as-signment đ„ = (đ„đ,đ)đ,đâ[đ] â âđ2 such that đ(đ„) â 0.4Proof. If đș has a perfect matching đâ, then let đâ be the permutationcorresponding to đ and let đ„â â â€đ2 defined as follows: đ„đ,đ = 1 ifđ = đ(đ) and đ„đ,đ = 0. Note that for every đ â đâ, âđâ1đ=0 đ„đ,đ(đ) = 0 butâđâ1đ=0 đ„âđ,đâ(đ) = 1. Hence đ(đ„â) will equal âđâ1đ=0 đŽđ,đâ(đ). But since đâ isa perfect matching in đș, âđâ1đ=0 đŽđ,đâ(đ) = 1.
On the other hand, suppose that đ is not identically zero. By (19.7),this means that at least one of the terms âđâ1đ=0 đŽđ,đ(đ) is not equal tozero. But then this permutation đ must be a perfect matching in đș.
As weâve seen before, for every đ„ â âđ2 , we can compute đ(đ„)by simply computing the determinant of the matrix đŽ(đ„), which isobtained by replacing đŽđ,đ with đŽđ,đđ„đ,đ. This reduces testing perfectmatching to the zero testing problem for polynomials: given some poly-nomial đ(â ), test whether đ is identically zero or not. The intuitionbehind our randomized algorithm for zero testing is the following:
If a polynomial is not identically zero, then it canât have âtoo manyâ roots.
This intuition sort of makes sense. For one variable polynomi-als, we know that a nonzero linear function has at most one root, aquadratic function (e.g., a parabola) has at most two roots, and gener-ally a degree đ equation has at most đ roots. While in more than onevariable there can be an infinite number of roots (e.g., the polynomialđ„0 + đŠ0 vanishes on the line đŠ = âđ„) it is still the case that the set ofroots is very âsmallâ compared to the set of all inputs. For example,the root of a bivariate polynomial form a curve, the roots of a three-variable polynomial form a surface, and more generally the roots of anđ-variable polynomial are a space of dimension đ â 1.
This intuition leads to the following simple randomized algorithm:
To decide if đ is identically zero, choose a ârandomâ input đ„ and check ifđ(đ„) â 0.This makes sense: if there are only âfewâ roots, then we expect that
with high probability the random input đ„ is not going to be one ofthose roots. However, to transform this into an actual algorithm, we
550 introduction to theoretical computer science
need to make both the intuition and the notion of a ârandomâ inputprecise. Choosing a random real number is quite problematic, espe-cially when you have only a finite number of coins at your disposal,and so we start by reducing the task to a finite setting. We will use thefollowing result:
Theorem 19.7 â SchwartzâZippel lemma. For every integer đ, and poly-nomial đ ⶠâđ â â with integer coefficients. If đ has degree atmost đ and is not identically zero, then it has at most đđđâ1 roots inthe set [đ]đ = (đ„0, ⊠, đ„đâ1) ⶠđ„đ â 0, ⊠, đ â 1.We omit the (not too complicated) proof of Theorem 19.7. We
remark that it holds not just over the real numbers but over any fieldas well. Since the matching polynomial đ of Lemma 19.6 has degree atmost đ, Theorem 19.7 leads directly to a simple algorithm for testing ifit is nonzero:
Algorithm Perfect-Matching:Input: Bipartite graph đș on 2đ vertices â0, ⊠, âđâ1, đ0, ⊠, đđâ1.Operation:
1. For every đ, đ â [đ], choose đ„đ,đ independently at random from[2đ] = 0, ⊠2đ â 1.2. Compute the determinant of the matrix đŽ(đ„) whose (đ, đ)đĄâ entry
equals đ„đ,đ if the edge âđ, đđ is present and 0 otherwise.3. Output no perfect matching if this determinant is zero, and out-
put perfect matching otherwise.
This algorithm can be improved further (e.g., see Exercise 19.5).While it is not necessarily faster than the cut-based algorithms for per-fect matching, it does have some advantages. In particular, it is moreamenable for parallelization. (However, it also has the significant dis-advantage that it does not produce a matching but only states that oneexists.) The SchwartzâZippel Lemma, and the associated zero testingalgorithm for polynomials, is widely used across computer science,including in several settings where we have no known deterministicalgorithm matching their performance.
Chapter Recap
âą Using concentration results, we can amplify inpolynomial time the success probability of a prob-abilistic algorithm from a mere 1/đ(đ) to 1 â 2âđ(đ)for every polynomials đ and đ.
âą There are several randomized algorithms that arebetter in various senses (e.g., simpler, faster, orother advantages) than the best known determinis-tic algorithm for the same problem.
probabilist ic computation 551
5 TODO: add exercise to give a deterministic max cutalgorithm that gives đ/2 edges. Talk about greedyapproach.
6 Hint: Think of đ„ â 0, 1đ as choosing đ numbersđŠ1, ⊠, đŠđ â 0, ⊠, 2âlogđâ â 1. Output the first suchnumber that is in 0, ⊠, đ â 1.
19.2 EXERCISES
Exercise 19.1 â Amplification for max cut. Prove Lemma 19.3
Exercise 19.2 â Deterministic max cut algorithm. 5
Exercise 19.3 â Simulating distributions using coins. Our model for proba-bility involves tossing đ coins, but sometimes algorithm require sam-pling from other distributions, such as selecting a uniform number in0, ⊠, đ â 1 for some đ . Fortunately, we can simulate this with anexponentially small probability of error: prove that for every đ , if đ >đâlogđâ, then there is a function đč ⶠ0, 1đ â 0, ⊠, đ â 1 âȘ â„such that (1) The probability that đč(đ„) = â„ is at most 2âđ and (2) thedistribution of đč(đ„) conditioned on đč(đ„) â â„ is equal to the uniformdistribution over 0, ⊠, đ â 1.6
Exercise 19.4 â Better walksat analysis. 1. Prove that for every đ > 0, ifđ is large enough then for every đ„â â 0, 1đ Prđ„âŒ0,1đ [Î(đ„, đ„â) â€đ/3] †2â(1âđ»(1/3)âđ)đ where đ»(đ) = đ log(1/đ)+(1âđ) log(1/(1âđ))is the same function as in Exercise 18.10.
2. Prove that 21âđ»(1/4)+(1/4) log3 = (3/2).3. Use the above to prove that for every đż > 0 and large enough đ, if
we set đ = 1000 â (3/2 + đż)đ and đ = đ/4 in the WalkSAT algorithmthen for every satisfiable 3CNF đ, the probability that we outputunsatisfiable is at most 1/2.
Exercise 19.5 â Faster bipartite matching (challenge). (to be completed: im-prove the matching algorithm by working modulo a prime)
19.3 BIBLIOGRAPHICAL NOTES
The books of Motwani and Raghavan [MR95] and Mitzenmacher andUpfal [MU17] are two excellent resources for randomized algorithms.Some of the history of the discovery of Monte Carlo algorithm is cov-ered here.
19.4 ACKNOWLEDGEMENTS
Figure 20.1: A mechanical coin tosser built for PercyDiaconis by Harvard technicians Steve Sansone andRick Haggerty
20Modeling randomized computation
âAny one who considers arithmetical methods of producing random digits is, ofcourse, in a state of sin.â John von Neumann, 1951.
So far we have described randomized algorithms in an informalway, assuming that an operation such as âpick a string đ„ â 0, 1đâcan be done efficiently. We have neglected to address two questions:
1. How do we actually efficiently obtain random strings in the physi-cal world?
2. What is the mathematical model for randomized computations,and is it more powerful than deterministic computation?
The first question is of both practical and theoretical importance,but for now letâs just say that there are various physical sources ofârandomâ or âunpredictableâ data. A userâs mouse movements andtyping pattern, (non solid state) hard drive and network latency,thermal noise, and radioactive decay have all been used as sources forrandomness (see discussion in Section 20.8). For example, many Intelchips come with a random number generator built in. One can evenbuild mechanical coin tossing machines (see Fig. 20.1).
In this chapter we focus on the second question: formally modelingprobabilistic computation and studying its power. We will show that:
1. We can define the class BPP that captures all Boolean functions thatcan be computed in polynomial time by a randomized algorithm.Crucially BPP is still very much a worst case class of computation:the probability is only over the choice of the random coins of thealgorithm, as opposed to the choice of the input.
2. We can amplify the success probability of randomized algorithms,and as a result the class BPP would be identical if we changedthe required success probability to any number đ that lies strictlybetween 1/2 and 1 (and in fact any number in the range 1/2+1/đ(đ)to 1 â 2âđ(đ) for any polynomial đ(đ)).
Compiled on 8.26.2020 18:10
Learning Objectives:âą Formal definition of probabilistic polynomial
time: the class BPP.âą Proof that every function in BPP can be
computed by đđđđŠ(đ)-sized NAND-CIRCprograms/circuits.
âą Relations between BPP and NP.âą Pseudorandom generators
554 introduction to theoretical computer science
3. Though, as is the case for P and NP, there is much we do not knowabout the class BPP, we can establish some relations between BPPand the other complexity classes we saw before. In particular wewill show that P â BPP â EXP and BPP â P/poly.
4. While the relation between BPP and NP is not known, we can showthat if P = NP then BPP = P.
5. We also show that the concept of NP completeness applies equallywell if we use randomized algorithms as our model of âefficientcomputationâ. That is, if a single NP complete problem has a ran-domized polynomial-time algorithm, then all of NP can be com-puted in polynomial-time by randomized algorithms.
6. Finally we will discuss the question of whether BPP = P and showsome of the intriguing evidence that the answer might actually beâYesâ using the concept of pseudorandom generators.
20.1 MODELING RANDOMIZED COMPUTATION
Modeling randomized computation is actually quite easy. We canadd the following operations to any programming language such asNAND-TM, NAND-RAM, NAND-CIRC etc..:foo = RAND()
where foo is a variable. The result of applying this operation is thatfoo is assigned a random bit in 0, 1. (Every time the RAND operationis invoked it returns a fresh independent random bit.) We call theprogramming languages that are augmented with this extra operationRNAND-TM, RNAND-RAM, and RNAND-CIRC respectively.
Similarly, we can easily define randomized Turing machines asTuring machines in which the transition function đż gets as an extrainput (in addition to the current state and symbol read from the tape)a bit đ that in each step is chosen at random â0, 1. Of course thefunction can ignore this bit (and have the same output regardless ofwhether đ = 0 or đ = 1) and hence randomized Turing machinesgeneralize deterministic Turing machines.
We can use the RAND() operation to define the notion of a functionbeing computed by a randomized đ (đ) time algorithm for every nicetime bound đ ⶠâ â â, as well as the notion of a finite function beingcomputed by a size đ randomized NAND-CIRC program (or, equiv-alently, a randomized circuit with đ gates that correspond to eitherthe NAND or coin-tossing operations). However, for simplicity wewill not define randomized computation in full generality, but simplyfocus on the class of functions that are computable by randomizedalgorithms running in polynomial time, which by historical conventionis known as BPP:
modeling randomized computation 555
Definition 20.1 â The class BPP. Let đč ⶠ0, 1â â 0, 1. We say thatđč â BPP if there exist constants đ, đ â â and an RNAND-TMprogram đ such that for every đ„ â 0, 1â, on input đ„, the programđ halts within at most đ|đ„|đ steps and
Pr[đ (đ„) = đč(đ„)] â„ 23 (20.1)
where this probability is taken over the result of the RAND opera-tions of đ .
Note that the probability in (20.1) is taken only over the ran-dom choices in the execution of đ and not over the choice of the in-put đ„. In particular, as discussed in Big Idea 24, BPP is still a worstcase complexity class, in the sense that if đč is in BPP then there is apolynomial-time randomized algorithm that computes đč with proba-bility at least 2/3 on every possible (and not just random) input.
The same polynomial-overhead simulation of NAND-RAM pro-grams by NAND-TM programs we saw in Theorem 13.5 extends torandomized programs as well. Hence the class BPP is the same re-gardless of whether it is defined via RNAND-TM or RNAND-RAMprograms. Similarly, we could have just as well defined BPP usingrandomized Turing machines.
Because of these equivalences, below we will use the name âpoly-nomial time randomized algorithmâ to denote a computation that can bemodeled by a polynomial-time RNAND-TM program, RNAND-RAMprogram, or a randomized Turing machine (or any programming lan-guage that includes a coin tossing operation). Since all these modelsare equivalent up to polynomial factors, you can use your favoritemodel to capture polynomial-time randomized algorithms withoutany loss in generality.Solved Exercise 20.1 â Choosing from a set. Modern programming lan-guages often involve not just the ability to toss a random coin in 0, 1but also to choose an element at random from a set đ. Show that youcan emulate this primitive using coin tossing. Specifically, show thatthere is randomized algorithm đŽ that on input a set đ of đ strings oflength đ, runs in time đđđđŠ(đ, đ) and outputs either an element đ„ â đor âfailâ such that
1. Let đ be the probability that đŽ outputs âfailâ, then đ < 2âđ (anumber small enough that it can be ignored).
2. For every đ„ â đ, the probability that đŽ outputs đ„ is exactly 1âđđ(and so the output is uniform over đ if we ignore the tiny probabil-ity of failure)
556 introduction to theoretical computer science
Figure 20.2: The two equivalent views of random-ized algorithms. We can think of such an algorithmas having access to an internal RAND() operationthat outputs a random independent value in 0, 1whenever it is invoked, or we can think of it as a de-terministic algorithm that in addition to the standardinput đ„ â 0, 1đ obtains an additional auxiliaryinput đ â 0, 1đ that is chosen uniformly at random.
Solution:
If the size of đ is a power of two, that is đ = 2â for some â â đ ,then we can choose a random element in đ by tossing â coins toobtain a string đ€ â 0, 1â and then output the đ-th element of đwhere đ is the number whose binary representation is đ€.
If đ is not a power of two, then our first attempt will be to letâ = âlogđâ and do the same, but then output the đ-th element ofđ if đ â [đ] and output âfailâ otherwise. Conditioned on not out-putting âfailâ, this element is distributed uniformly in đ. However,in the worst case, 2â can be almost 2đ and so the probability of failmight be close to half. To reduce the failure probability, we canrepeat the experiment above đ times. Specifically, we will use thefollowing algorithm
Algorithm 20.2 â Sample from set.
Input: Set đ = đ„0, ⊠, đ„đâ1 with đ„đ â 0, 1đ for allđ â [đ].Output: Either đ„ â đ or âfailâ1: Let â â âlogđâ2: for đ = 0, 1, ⊠, đ â 1 do3: Pick đ€ ⌠0, 1â4: Let đ â [2â] be number whose binary representa-
tion is đ€.5: if đ < đ then6: return đ„đ7: end if8: end for9: return âfailâ
Conditioned on not failing, the output of Algorithm 20.2 is uni-formly distributed in đ. However, since 2â < 2đ, the probabilityof failure in each iteration is less than 1/2 and so the probability offailure in all of them is at most (1/2)đ = 2âđ.
20.1.1 An alternative view: random coins as an âextra inputâWhile we presented randomized computation as adding an extraâcoin tossingâ operation to our programs, we can also model this asbeing given an additional extra input. That is, we can think of a ran-domized algorithm đŽ as a deterministic algorithm đŽâČ that takes twoinputs đ„ and đ where the second input đ is chosen at random from0, 1đ for some đ â â (see Fig. 20.2). The equivalence to the Defini-tion 20.1 is shown in the following theorem:
modeling randomized computation 557
Theorem 20.3 â Alternative characterization of BPP. Let đč ⶠ0, 1â â0, 1. Then đč â BPP if and only if there exists đ, đ â â andđș ⶠ0, 1â â 0, 1 such that đș is in P and for every đ„ â 0, 1â,PrđâŒ0,1đ|đ„|đ [đș(đ„đ) = đč(đ„)] â„ 23 . (20.2)
Proof Idea:
The idea behind the proof is that, as illustrated in Fig. 20.2, we cansimply replace sampling a random coin with reading a bit from theextra ârandom inputâ đ and vice versa. To prove this rigorously weneed to work through some slightly cumbersome formal notation.This might be one of those proofs that is easier to work out on yourown than to read.âProof of Theorem 20.3. We start by showing the âonly ifâ direction.Let đč â BPP and let đ be an RNAND-TM program that computesđč as per Definition 20.1, and let đ, đ â â be such that on every inputof length đ, the program đ halts within at most đđđ steps. We willconstruct a polynomial-time algorithm that đ âČ such that for everyđ„ â 0, 1đ, if we set đ = đđđ, then
PrđâŒ0,1đ[đ âČ(đ„đ) = 1] = Pr[đ (đ„) = 1] , (20.3)
where the probability in the righthand side is taken over the RAND()operations in đ . In particular this means that if we define đș(đ„đ) =đ âČ(đ„đ) then the function đș satisfies the conditions of (20.2).
The algorithm đ âČ will be very simple: it simulates the program đ ,maintaining a counter đ initialized to 0. Every time that đ makes aRAND() operation, the program đ âČ will supply the result from đđ andincrement đ by one. We will never ârun outâ of bits, since the runningtime of đ is at most đđđ and hence it can make at most this number ofRNAND() calls. The output of đ âČ(đ„đ) for a random đ ⌠0, 1đ will bedistributed identically to the output of đ(đ„).
For the other direction, given a function đș â P satisfying the condi-tion (20.2) and a NAND-TM đ âČ that computes đș in polynomial time,we can construct an RNAND-TM program đ that computes đč in poly-nomial time. On input đ„ â 0, 1đ, the program đ will simply use theRNAND() instruction đđđ times to fill an array R[0] , âŠ, R[đđđ â 1] andthen execute the original program đ âČ on input đ„đ where đđ is the đ-thelement of the array R. Once again, it is clear that if đ âČ runs in polyno-mial time then so will đ , and for every input đ„ and đ â 0, 1đđđ , theoutput of đ on input đ„ and where the coin tosses outcome is đ is equalto đ âČ(đ„đ).
558 introduction to theoretical computer science
RRemark 20.4 â Definitions of BPP and NP. The char-acterization of BPP Theorem 20.3 is reminiscent ofthe characterization of NP in Definition 15.1, withthe randomness in the case of BPP playing the roleof the solution in the case of NP. However, there areimportant differences between the two:
âą The definition of NP is âone sidedâ: đč(đ„) = 1 ifthere exists a solution đ€ such that đș(đ„đ€) = 1 andđč(đ„) = 0 if for every string đ€ of the appropriatelength, đș(đ„đ€) = 0. In contrast, the characteriza-tion of BPP is symmetric with respect to the casesđč(đ„) = 0 and đč(đ„) = 1.
âą The relation between NP and BPP is not immedi-ately clear. It is not known whether BPP â NP,NP â BPP, or these two classes are incomparable.It is however known (with a non-trivial proof) thatif P = NP then BPP = P (see Theorem 20.11).
âą Most importantly, the definition of NP is âinef-fective,â since it does not yield a way of actuallyfinding whether there exists a solution among theexponentially many possibilities. By contrast, thedefinition of BPP gives us a way to compute thefunction in practice by simply choosing the secondinput at random.
âRandom tapesâ. Theorem 20.3 motivates sometimes considering therandomness of an RNAND-TM (or RNAND-RAM) program as anextra input. As such, if đŽ is a randomized algorithm that on inputs oflength đ makes at most đ coin tosses, we will often use the notationđŽ(đ„; đ) (where đ„ â 0, 1đ and đ â 0, 1đ) to refer to the result ofexecuting đ„ when the coin tosses of đŽ correspond to the coordinatesof đ. This second, or âauxiliary,â input is sometimes referred to asa ârandom tape.â This terminology originates from the model ofrandomized Turing machines.
20.1.2 Success amplification of two-sided error algorithmsThe number 2/3 might seem arbitrary, but as weâve seen in Chapter 19it can be amplified to our liking:
Theorem 20.5 â Amplification. Let đč ⶠ0, 1â â 0, 1 be a Booleanfunction such that there is a polynomial đ ⶠâ â â and apolynomial-time randomized algorithm đŽ satisfying that for ev-
modeling randomized computation 559
Figure 20.3: If đč â BPP then there is randomizedpolynomial-time algorithm đ with the followingproperty: In the case đč(đ„) = 0 two thirds of theâpopulationâ of random choices satisfy đ(đ„; đ) = 0and in the case đč(đ„) = 1 two thirds of the populationsatisfy đ(đ„; đ) = 1. We can think of amplification asa form of âpollingâ of the choices of randomness. Bythe Chernoff bound, if we poll a sample of đ( log(1/đż)đ2 )random choices đ, then with probability at least1 â đż, the fraction of đâs in the sample satisfyingđ(đ„; đ) = 1 will give us an estimate of the fraction ofthe population within an đ margin of error. This is thesame calculation used by pollsters to determine theneeded sample size in their polls.
ery đ„ â 0, 1đ,Pr[đŽ(đ„) = đč(đ„)] â„ 12 + 1đ(đ) . (20.4)
Then for every polynomial đ ⶠâ â â there is a polynomial-timerandomized algorithm đ” satisfying for every đ„ â 0, 1đ,
Pr[đ”(đ„) = đč(đ„)] â„ 1 â 2âđ(đ) . (20.5)
Big Idea 25 We can amplify the success of randomized algorithmsto a value that is arbitrarily close to 1.
Proof Idea:
The proof is the same as weâve seen before in the case of maximumcut and other examples. We use the Chernoff bound to argue that ifđŽ computes đč with probability at least 12 + đ and we run it đ(đ/đ2)times, each time using fresh and independent random coins, then theprobability that the majority of the answers will not be correct will beless than 2âđ. Amplification can be thought of as a âpollingâ of thechoices for randomness for the algorithm (see Fig. 20.3).âProof of Theorem 20.5. Let đŽ be an algorithm satisfying (20.4). Setđ = 1đ(đ) and đ = đ(đ) where đ, đ are the polynomials in the theoremstatement. We can run đ on input đ„ for đĄ = 10đ/đ2 times, using freshrandomness in each execution, and compute the outputs đŠ0, ⊠, đŠđĄâ1.We output the value đŠ that appeared the largest number of times. Letđđ be the random variable that is equal to 1 if đŠđ = đč(đ„) and equal to0 otherwise. The random variables đ0, ⊠, đđĄâ1 are i.i.d. and satisfyđŒ[đđ] = Pr[đđ = 1] â„ 1/2 + đ, and hence by linearity of expectationđŒ[âđĄâ1đ=0 đđ] â„ đĄ(1/2 + đ). For the plurality value to be incorrect, it musthold that âđĄâ1đ=0 đđ †đĄ/2, which means that âđĄâ1đ=0 đđ is at least đđĄ farfrom its expectation. Hence by the Chernoff bound (Theorem 18.12),the probability that the plurality value is not correct is at most 2đâđ2đĄ,which is smaller than 2âđ for our choice of đĄ.
20.2 BPP AND NP COMPLETENESS
Since ânoisy processesâ abound in nature, randomized algorithms canbe realized physically, and so it is reasonable to propose BPP ratherthan P as our mathematical model for âfeasibleâ or âtractableâ com-putation. One might wonder if this makes all the previous chapters
560 introduction to theoretical computer science
irrelevant, and in particular if the theory of NP completeness stillapplies to probabilistic algorithms. Fortunately, the answer is Yes:
Theorem 20.6 â NP hardness and BPP. Suppose that đč is NP-hard andđč â BPP. Then NP â BPP.
Before seeing the proof, note that Theorem 20.6 implies that if therewas a randomized polynomial time algorithm for any NP-completeproblem such as 3SAT, ISET etc., then there would be such an algo-rithm for every problem in NP. Thus, regardless of whether our modelof computation is deterministic or randomized algorithms, NP com-plete problems retain their status as the âhardest problems in NP.â
Proof Idea:
The idea is to simply run the reduction as usual, and plug it intothe randomized algorithm instead of a deterministic one. It wouldbe an excellent exercise, and a way to reinforce the definitions of NP-hardness and randomized algorithms, for you to work out the prooffor yourself. However for the sake of completeness, we include thisproof below.âProof of Theorem 20.6. Suppose that đč is NP-hard and đč â BPP.We will now show that this implies that NP â BPP. Let đș â NP.By the definition of NP-hardness, it follows that đș â€đ đč , or that inother words there exists a polynomial-time computable function đ â¶0, 1â â 0, 1â such that đș(đ„) = đč(đ (đ„)) for every đ„ â 0, 1â. Nowif đč is in BPP then there is a polynomial-time RNAND-TM program đsuch that
Pr[đ (đŠ) = đč(đŠ)] â„ 2/3 (20.6)
for every đŠ â 0, 1â (where the probability is taken over the randomcoin tosses of đ ). Hence we can get a polynomial-time RNAND-TMprogram đ âČ to compute đș by setting đ âČ(đ„) = đ(đ (đ„)). By (20.6)Pr[đ âČ(đ„) = đč(đ (đ„))] â„ 2/3 and since đč(đ (đ„)) = đș(đ„) this implies thatPr[đ âČ(đ„) = đș(đ„)] â„ 2/3, which proves that đș â BPP.
Most of the results weâve seen about NP hardness, including thesearch to decision reduction of Theorem 16.1, the decision to optimiza-tion reduction of Theorem 16.3, and the quantifier elimination resultof Theorem 16.6, all carry over in the same way if we replace P withBPP as our model of efficient computation. Thus if NP â BPP thenwe get essentially all of the strange and wonderful consequences ofP = NP. Unsurprisingly, we cannot rule out this possibility. In fact,unlike P = EXP, which is ruled out by the time hierarchy theorem, we
modeling randomized computation 561
1 At the time of this writing, the largest ânaturalâ com-plexity class which we canât rule out being containedin BPP is the class NEXP, which we did not definein this course, but corresponds to non deterministicexponential time. See this paper for a discussion ofthis question.
Figure 20.4: Some possibilities for the relations be-tween BPP and other complexity classes. Mostresearchers believe that BPP = P and that theseclasses are not powerful enough to solve NP-completeproblems, let alone all problems in EXP. However,we have not even been able yet to rule out the possi-bility that randomness is a âsilver bulletâ that allowsexponential speedup on all problems, and henceBPP = EXP. As weâve already seen, we also canâtrule out that P = NP. Interestingly, in the latter case,P = BPP.
donât even know how to rule out the possibility that BPP = EXP! Thusa priori itâs possible (though seems highly unlikely) that randomnessis a magical tool that allows us to speed up arbitrary exponential timecomputation.1 Nevertheless, as we discuss below, it is believed thatrandomizationâs power is much weaker and BPP lies in much moreâpedestrianâ territory.
20.3 THE POWER OF RANDOMIZATION
A major question is whether randomization can add power to compu-tation. Mathematically, we can phrase this as the following question:does BPP = P? Given what weâve seen so far about the relations ofother complexity classes such as P and NP, or NP and EXP, one mightguess that:
1. We do not know the answer to this question.
2. But we suspect that BPP is different than P.
One would be correct about the former, but wrong about the latter.As we will see, we do in fact have reasons to believe that BPP = P.This can be thought of as supporting the extended Church Turing hy-pothesis that deterministic polynomial-time Turing machines capturewhat can be feasibly computed in the physical world.
We now survey some of the relations that are known betweenBPP and other complexity classes we have encountered. (See alsoFig. 20.4.)
20.3.1 Solving BPP in exponential timeIt is not hard to see that if đč is in BPP then it can be computed inexponential time.
Theorem 20.7 â Simulating randomized algorithms in exponential time.
BPP â EXP
PThe proof of Theorem 20.7 readily follows by enumer-ating over all the (exponentially many) choices for therandom coins. We omit the formal proof, as doing itby yourself is an excellent way to get comfortable withDefinition 20.1.
20.3.2 Simulating randomized algorithms by circuitsWe have seen in Theorem 13.12 that if đč is in P, then there is a polyno-mial đ ⶠâ â â such that for every đ, the restriction đčđ of đč to inputs
562 introduction to theoretical computer science
0, 1đ is in SIZE(đ(đ)). (In other words, that P â P/poly.) A priori it isnot at all clear that the same holds for a function in BPP, but this doesturn out to be the case.
Figure 20.5: The possible guarantees for a random-ized algorithm đŽ computing some function đč . Inthe tables above, the columns correspond to differ-ent inputs and the rows to different choices of therandom tape. A cell at position đ, đ„ is colored greenif đŽ(đ„; đ) = đč(đ„) (i.e., the algorithm outputs thecorrect answer) and red otherwise. The standard BPPguarantee corresponds to the middle figure, wherefor every input đ„, at least two thirds of the choicesđ for a random tape will result in đŽ computing thecorrect value. That is, every column is colored greenin at least two thirds of its coordinates. In the leftfigure we have an âaverage caseâ guarantee wherethe algorithm is only guaranteed to output the correctanswer with probability two thirds over a randominput (i.e., at most one third of the total entries of thetable are colored red, but there could be an all redcolumn). The right figure corresponds to the âofflineBPPâ case, with probability at least two thirds overthe random choice đ, đ will be good for every input.That is, at least two thirds of the rows are all green.Theorem 20.8 (BPP â P/poly) is proven by amplify-ing the success of a BPP algorithm until we have theâoffline BPPâ guarantee, and then hardwiring thechoice of the randomness đ to obtain a nonuniformdeterministic algorithm.
Theorem 20.8 â Randomness does not help for non uniform computation.
BPP â P/poly.That is, for every đč â BPP, there exist some đ, đ â â such that
for every đ > 0, đčđ â SIZE(đđđ) where đčđ is the restriction of đč toinputs in 0, 1đ.Proof Idea:
The idea behind the proof is that we can first amplify by repetitionthe probability of success from 2/3 to 1 â 0.1 â 2âđ. This will allow us toshow that for every đ â â there exists a single fixed choice of âfavorablecoinsâ which is a string đ of length polynomial in đ such that if đ isused for the randomness then we output the right answer on all ofthe possible 2đ inputs. We can then use the standard âunravelling theloopâ technique to transform an RNAND-TM program to an RNAND-CIRC program, and âhardwireâ the favorable choice of random coinsto transform the RNAND-CIRC program into a plain old deterministicNAND-CIRC program.âProof of Theorem 20.8. Suppose that đč â BPP. Let đ be a polynomial-time RNAND-TM program that computes đč as per Definition 20.1.Using Theorem 20.5, we can amplify the success probability of đ toobtain an RNAND-TM program đ âČ that is at most a factor of đ(đ)
modeling randomized computation 563
slower (and hence still polynomial time) such that for every đ„ â0, 1đPrđâŒ0,1đ[đ âČ(đ„; đ) = đč(đ„)] â„ 1 â 0.1 â 2âđ , (20.7)
where đ is the number of coin tosses that đ âČ uses on inputs oflength đ. We use the notation đ âČ(đ„; đ) to denote the execution of đ âČon input đ„ and when the result of the coin tosses corresponds to thestring đ.
For every đ„ â 0, 1đ, define the âbadâ event đ”đ„ to hold if đ âČ(đ„) â đč(đ„), where the sample space for this event consists of the coins of đ âČ.Then by (20.7), Pr[đ”đ„] †0.1 â 2âđ for every đ„ â 0, 1đ. Since thereare 2đ many such đ„âs, by the union bound we see that the probabilitythat the union of the events đ”đ„đ„â0,1đ is at most 0.1. This means thatif we choose đ ⌠0, 1đ, then with probability at least 0.9 it will bethe case that for every đ„ â 0, 1đ, đč(đ„) = đ âČ(đ„; đ). (Indeed, otherwisethe event đ”đ„ would hold for some đ„.) In particular, because of themere fact that the the probability of âȘđ„â0,1đđ”đ„ is smaller than 1, thismeans that there exists a particular đâ â 0, 1đ such thatđ âČ(đ„; đâ) = đč(đ„) (20.8)
for every đ„ â 0, 1đ.Now let us use the standard âunravelling the loopâ the technique
and transform đ âČ into a NAND-CIRC program đ of polynomial inđ size, such that đ(đ„đ) = đ âČ(đ„; đ) for every đ„ â 0, 1đ and đ â0, 1đ. Then by âhardwiringâ the values đâ0, ⊠, đâđâ1 in place of thelast đ inputs of đ, we obtain a new NAND-CIRC program đđâ thatsatisfies by (20.8) that đđâ(đ„) = đč(đ„) for every đ„ â 0, 1đ. Thisdemonstrates that đčđ has a polynomial sized NAND-CIRC program,hence completing the proof of Theorem 20.8.
20.4 DERANDOMIZATION
The proof of Theorem 20.8 can be summarized as follows: we canreplace a đđđđŠ(đ)-time algorithm that tosses coins as it runs with analgorithm that uses a single set of coin tosses đâ â 0, 1đđđđŠ(đ) whichwill be good enough for all inputs of size đ. Another way to say it isthat for the purposes of computing functions, we do not need âonlineâaccess to random coins and can generate a set of coins âofflineâ aheadof time, before we see the actual input.
But this does not really help us with answering the question ofwhether BPP equals P, since we still need to find a way to generatethese âofflineâ coins in the first place. To derandomize an RNAND-TM program we will need to come up with a single deterministic
564 introduction to theoretical computer science
2 One amusing anecdote is a recent case where scam-mers managed to predict the imperfect âpseudo-random generatorâ used by slot machines to cheatcasinos. Unfortunately we donât know the details ofhow they did it, since the case was sealed.
algorithm that will work for all input lengths. That is, unlike in thecase of RNAND-CIRC programs, we cannot choose for every inputlength đ some string đâ â 0, 1đđđđŠ(đ) to use as our random coins.
Can we derandomize randomized algorithms, or does randomnessadd an inherent extra power for computation? This is a fundamentallyinteresting question but is also of practical significance. Ever sincepeople started to use randomized algorithms during the Manhattanproject, they have been trying to remove the need for randomness andreplace it with numbers that are selected through some deterministicprocess. Throughout the years this approach has often been usedsuccessfully, though there have been a number of failures as well.2
A common approach people used over the years was to replacethe random coins of the algorithm by a ârandomish lookingâ stringthat they generated through some arithmetic progress. For example,one can use the digits of đ for the random tape. Using these type ofmethods corresponds to what von Neumann referred to as a âstateof sinâ. (Though this is a sin that he himself frequently committed,as generating true randomness in sufficient quantity was and still isoften too expensive.) The reason that this is considered a âsinâ is thatsuch a procedure will not work in general. For example, it is easy tomodify any probabilistic algorithm đŽ such as the ones we have seen inChapter 19, to an algorithm đŽâČ that is guaranteed to fail if the randomtape happens to equal the digits of đ. This means that the procedureâreplace the random tape by the digits of đâ does not yield a generalway to transform a probabilistic algorithm to a deterministic one thatwill solve the same problem. Of course, this procedure does not alwaysfail, but we have no good way to determine when it fails and whenit succeeds. This reasoning is not specific to đ and holds for everydeterministically produced string, whether it obtained by đ, đ, theFibonacci series, or anything else.
An algorithm that checks if its random tape is equal to đ and thenfails seems to be quite silly, but this is but the âtip of the icebergâ for avery serious issue. Time and again people have learned the hard waythat one needs to be very careful about producing random bits usingdeterministic means. As we will see when we discuss cryptography,many spectacular security failures and break-ins were the result ofusing âinsufficiently randomâ coins.
20.4.1 Pseudorandom generatorsSo, we canât use any single string to âderandomizeâ a probabilisticalgorithm. It turns out however, that we can use a collection of stringsto do so. Another way to think about it is that rather than trying toeliminate the need for randomness, we start by focusing on reducing the
modeling randomized computation 565
Figure 20.6: A pseudorandom generator đș mapsa short string đ â 0, 1â into a long string đ â0, 1đ such that an small program/circuit đ cannotdistinguish between the case that it is provided arandom input đ ⌠0, 1đ and the case that itis provided a âpseudorandomâ input of the formđ = đș(đ ) where đ ⌠0, 1â. The short string đ is sometimes called the seed of the pseudorandomgenerator, as it is a small object that can be thought asyielding a large âtree of randomnessâ.
amount of randomness needed. (Though we will see that if we reducethe randomness sufficiently, we can eventually get rid of it altogether.)
We make the following definition:
Definition 20.9 â Pseudorandom generator. A function đș ⶠ0, 1â â0, 1đ is a (đ , đ)-pseudorandom generator if for every circuit đ¶ withđ inputs, one output, and at most đ gates,⣠Prđ âŒ0,1â[đ¶(đș(đ )) = 1] â PrđâŒ0,1đ[đ¶(đ) = 1]⣠< đ (20.9)
PThis is a definition thatâs worth reading more thanonce, and spending some time to digest it. Note that ittakes several parameters:
âą đ is the limit on the number of gates of the circuitđ¶ that the generator needs to âfoolâ. The larger đis, the stronger the generator.
âą đ is how close is the output of the pseudorandomgenerator to the true uniform distribution over0, 1đ. The smaller đ is, the stronger the generator.
âą â is the input length and đ is the output length.If â â„ đ then it is trivial to come up with sucha generator: on input đ â 0, 1â, we can outputđ 0, ⊠, đ đâ1. In this case Prđ âŒ0,1â [đ (đș(đ )) = 1]will simply equal Prđâ0,1đ [đ (đ) = 1], no matterhow many lines đ has. So, the smaller â is and thelarger đ is, the stronger the generator, and to getanything non-trivial, we need đ > â.
Furthermore note that although our eventual goal is tofool probabilistic randomized algorithms that take anunbounded number of inputs, Definition 20.9 refers tofinite and deterministic NAND-CIRC programs.
We can think of a pseudorandom generator as a ârandomnessamplifier.â It takes an input đ of â bits chosen at random and ex-pands these â bits into an output đ of đ > â pseudorandom bits. Ifđ is small enough then the pseudorandom bits will âlook randomâto any NAND-CIRC program that is not too big. Still, there are twoquestions we havenât answered:
âą What reason do we have to believe that pseudorandom generators withnon-trivial parameters exist?
âą Even if they do exist, why would such generators be useful to derandomizerandomized algorithms? After all, Definition 20.9 does not involve
566 introduction to theoretical computer science
RNAND-TM or RNAND-RAM programs, but rather deterministicNAND-CIRC programs with no randomness and no loops.
We will now (partially) answer both questions. For the first ques-tion, let us come clean and confess we do not know how to prove thatinteresting pseudorandom generators exist. By interesting we meanpseudorandom generators that satisfy that đ is some small constant(say đ < 1/3), đ > â, and the function đș itself can be computed inđđđđŠ(đ) time. Nevertheless, Lemma 20.12 (whose statement and proofis deferred to the end of this chapter) shows that if we only drop thelast condition (polynomial-time computability), then there do in factexist pseudorandom generators where đ is exponentially larger than â.
PAt this point you might want to skip ahead and look atthe statement of Lemma 20.12. However, since its proofis somewhat subtle, I recommend you defer reading ituntil youâve finished reading the rest of this chapter.
20.4.2 From existence to constructivityThe fact that there exists a pseudorandom generator does not meanthat there is one that can be efficiently computed. However, it turnsout that we can turn complexity âon its headâ and use the assumednon existence of fast algorithms for problems such as 3SAT to obtainpseudorandom generators that can then be used to transform random-ized algorithms into deterministic ones. This is known as the Hardnessvs Randomness paradigm. A number of results along those lines, mostof which are outside the scope of this course, have led researchers tobelieve the following conjecture:
Optimal PRG conjecture: There is a polynomial-time computablefunction PRG ⶠ0, 1â â 0, 1 that yields an exponentially securepseudorandom generator.Specifically, there exists a constant đż > 0 such that for every â andđ < 2đżâ, if we define đș ⶠ0, 1â â 0, 1đ as đș(đ )đ = PRG(đ , đ) for everyđ â 0, 1â and đ â [đ], then đș is a (2đżâ, 2âđżâ) pseudorandom generator.
PThe âoptimal PRG conjectureâ is worth while readingmore than once. What it posits is that we can obtain(đ , đ) pseudorandom generator đș such that everyoutput bit of đș can be computed in time polynomialin the length â of the input, where đ is exponentiallylarge in â and đ is exponentially small in â. (Note thatwe could not hope for the entire output to be com-
modeling randomized computation 567
3 A pseudorandom generator of the form we posit,where each output bit can be computed individuallyin time polynomial in the seed length, is commonlyknown as a pseudorandom function generator. For moreon the many interesting results and connections in thestudy of pseudorandomness, see this monograph of SalilVadhan.
putable in â, as just writing the output down will taketoo long.)To understand why we call such a pseudorandomgenerator âoptimal,â it is a great exercise to convinceyourself that, for example, there does not exist a(21.1â, 2â1.1â) pseudorandom generator (in fact, thenumber đż in the conjecture must be smaller than 1). Tosee that we canât have đ â« 2â, note that if we allow aNAND-CIRC program with much more than 2â linesthen this NAND-CIRC program could âhardwireâ in-side it all the outputs of đș on all its 2â inputs, and usethat to distinguish between a string of the form đș(đ )and a uniformly chosen string in 0, 1đ. To see thatwe canât have đ âȘ 2ââ, note that by guessing the inputđ (which will be successful with probability 2â2â), wecan obtain a small (i.e., đ(â) line) NAND-CIRC pro-gram that achieves a 2ââ advantage in distinguishing apseudorandom and uniform input. Working out thesedetails is a highly recommended exercise.
We emphasize again that the optimal PRG conjecture is, as its nameimplies, a conjecture, and we still do not know how to prove it. In par-ticular, it is stronger than the conjecture that P â NP. But we do havesome evidence for its truth. There is a spectrum of different types ofpseudorandom generators, and there are weaker assumptions thanthe optimal PRG conjecture that suffice to prove that BPP = P. Inparticular this is known to hold under the assumption that there existsa function đč â TIME(2đ(đ)) and đ > 0 such that for every sufficientlylarge đ, đčđ is not in SIZE(2đđ). The name âOptimal PRG conjectureâis non standard. This conjecture is sometimes known in the literatureas the existence of exponentially strong pseudorandom functions.3
20.4.3 Usefulness of pseudorandom generatorsWe now show that optimal pseudorandom generators are indeed veryuseful, by proving the following theorem:
Theorem 20.10 â Derandomization of BPP. Suppose that the optimalPRG conjecture is true. Then BPP = P.
Proof Idea:
The optimal PRG conjecture tells us that we can achieve exponentialexpansion of â truly random coins into as many as 2đżâ âpseudorandomcoins.â Looked at from the other direction, it allows us to reduce theneed for randomness by taking an algorithm that uses đ coins andconverting it into an algorithm that only uses đ(logđ) coins. Now analgorithm of the latter type by can be made fully deterministic by enu-merating over all the 2đ(logđ) (which is polynomial in đ) possibilitiesfor its random choices.
568 introduction to theoretical computer science
âWe now proceed with the proof details.
Proof of Theorem 20.10. Let đč â BPP and let đ be a NAND-TM pro-gram and đ, đ, đ, đ constants such that for every đ„ â 0, 1đ, đ(đ„)runs in at most đ â đđ steps and PrđâŒ0,1đ [đ (đ„; đ) = đč(đ„)] â„ 2/3.By âunrolling the loopâ and hardwiring the input đ„, we can obtainfor every input đ„ â 0, 1đ a NAND-CIRC program đđ„ of at most,say, đ = 10đ â đđ lines, that takes đ bits of input and such thatđ(đ) = đ(đ„; đ).
Now suppose that đș ⶠ0, 1â â 0, 1 is a (đ , 0.1) pseudorandomgenerator. Then we could deterministically estimate the probabilityđ(đ„) = PrđâŒ0,1đ [đđ„(đ) = 1] up to 0.1 accuracy in time đ(đ â 2â â đ â đđđ đĄ(đș)) where đđđ đĄ(đș) is the time that it takes to compute a singleoutput bit of đș.
The reason is that we know that đ(đ„) = Prđ âŒ0,1â [đđ„(đș(đ )) =1] will give us such an estimate for đ(đ„), and we can compute theprobability đ(đ„) by simply trying all 2â possibillites for đ . Now, underthe optimal PRG conjecture we can set đ = 2đżâ or equivalently â =1đż logđ , and our total computation time is polynomial in 2â = đ 1/đż.Since đ †10đ â đđ, this running time will be polynomial in đ.
This completes the proof, since we are guaranteed thatPrđâŒ0,1đ [đđ„(đ) = đč(đ„)] â„ 2/3, and hence estimating theprobability đ(đ„) to within 0.1 accuracy is sufficient to compute đč(đ„).
20.5 P = NP AND BPP VS PTwo computational complexity questions that we cannot settle are:
âą Is P = NP? Where we believe the answer is negative.
âą Is BPP = P? Where we believe the answer is positive.
However we can say that the âconventional wisdomâ is correct onat least one of these questions. Namely, if weâre wrong on the firstcount, then weâll be right on the second one:
Theorem 20.11 â SipserâGĂĄcs Theorem. If P = NP then BPP = P.
PBefore reading the proof, it is instructive to thinkwhy this result is not âobvious.â If P = NP thengiven any randomized algorithm đŽ and input đ„,we will be able to figure out in polynomial time ifthere is a string đ â 0, 1đ of random coins for đŽ
modeling randomized computation 569
Figure 20.7: If đč â BPP then through amplification wecan ensure that there is an algorithm đŽ to computeđč on đ-length inputs and using đ coins such thatPrđâ0,1đ [đŽ(đ„đ) â đč(đ„)] âȘ 1/đđđđŠ(đ). Henceif đč(đ„) = 1 then almost all of the 2đ choices for đwill cause đŽ(đ„đ) to output 1, while if đč(đ„) = 0 thenđŽ(đ„đ) = 0 for almost all đâs. To prove the SipserâGĂĄcs Theorem we consider several âshiftsâ of the setđ â 0, 1đ of the coins đ such that đŽ(đ„đ) = 1. Ifđč(đ„) = 1 then we can find a set of đ shifts đ 0, ⊠, đ đâ1for which âȘđâ[đ](đ â đ đ) = 0, 1đ. If đč(đ„) = 0 thenfor every such set |đđąđđâ[đ]đđ| †đ|đ| âȘ 2đ. We canphrase the question of whether there is such a set ofshift using a constant number of quantifiers, and socan solve it in polynomial time if P = NP.
such that đŽ(đ„đ) = 1. The problem is that even ifPrđâ0,1đ [đŽ(đ„đ) = đč(đ„)] â„ 0.9999, it can still be thecase that even when đč(đ„) = 0 there exists a string đsuch that đŽ(đ„đ) = 1.The proof is rather subtle. It is much more importantthat you understand the statement of the theorem thanthat you follow all the details of the proof.
Proof Idea:
The construction follows the âquantifier eliminationâ idea whichwe have seen in Theorem 16.6. We will show that for every đč â BPP,we can reduce the question of some input đ„ satisfies đč(đ„) = 1 to thequestion of whether a formula of the form âđąâ0,1đâđŁâ0,1đđ(đą, đŁ)is true, where đ, đ are polynomial in the length of đ„ and đ ispolynomial-time computable. By Theorem 16.6, if P = NP then we candecide in polynomial time whether such a formula is true or false.
The idea behind this construction is that using amplification wecan obtain a randomized algorithm đŽ for computing đč using đ coinssuch that for every đ„ â 0, 1đ, if đč(đ„) = 0 then the set đ â 0, 1đof coins that make đŽ output 1 is extremely tiny, and if đč(đ„) = 1 thenit is very large. Now in the case đč(đ„) = 1, one can show that thismeans that there exists a small number đ of âshiftsâ đ 0, ⊠, đ đâ1 suchthat the union of the sets đ â đ đ (i.e., sets of the form đ â đ đ | đ â đ)covers 0, 1đ, while in the case đč(đ„) = 0 this union will always be ofsize at most đ|đ| which is much smaller than 2đ. We can express thecondition that there exists đ 0, ⊠, đ đâ1 such that âȘđâ[đ](đ â đ đ) = 0, 1đas a statement with a constant number of quantifiers.âProof of Theorem 20.11. Let đč â BPP. Using Theorem 20.5, thereexists a polynomial-time algorithm đŽ such that for every đ„ â 0, 1đ,Prđ„â0,1đ [đŽ(đ„đ) = đč(đ„)] â„ 1 â 2âđ where đ is polynomial in đ. Inparticular (since an exponential dominates a polynomial, and we canalways assume đ is sufficiently large), it holds that
Prđ„â0,1đ[đŽ(đ„đ) = đč(đ„)] â„ 1 â 110đ2 . (20.10)
Let đ„ â 0, 1đ, and let đđ„ â 0, 1đ be the set đ â 0, 1đ â¶đŽ(đ„đ) = 1. By our assumption, if đč(đ„) = 0 then |đđ„| †110đ2 2đ and ifđč(đ„) = 1 then |đđ„| â„ (1 â 110đ2 )2đ.For a set đ â 0, 1đ and a string đ â 0, 1đ, we define the setđ â đ to be đ â đ ⶠđ â đ where â denotes the XOR operation. That
is, đ â đ is the set đ âshiftedâ by đ . Note that |đ â đ | = |đ|. (Pleasemake sure that you see why this is true.)
The heart of the proof is the following two claims:
570 introduction to theoretical computer science
CLAIM I: For every subset đ â 0, 1đ, if |đ| †11000đ 2đ, then forevery đ 0, ⊠, đ 100đâ1 â 0, 1đ, âȘđâ[100đ](đ â đ đ) â 0, 1đ.
CLAIM II: For every subset đ â 0, 1đ, if |đ| â„ 12 2đ then thereexist a set of string đ 0, ⊠, đ 100đâ1 such that âȘđâ[100đ](đ â đ đ) = 0, 1đ.
CLAIM I and CLAIM II together imply the theorem. Indeed, theymean that under our assumptions, for every đ„ â 0, 1đ, đč(đ„) = 1 ifand only ifâđ 0,âŠ,đ 100đâ1â0,1đ âȘđâ[100đ] (đđ„ â đ đ) = 0, 1đ (20.11)
which we can re-write as
âđ 0,âŠ,đ 100đâ1â0,1đâđ€â0,1đ(đ€ â (đđ„âđ 0)âšđ€ â (đđ„âđ 1)âšâŻ đ€ â (đđ„âđ 100đâ1))(20.12)
or equivalently
âđ 0,âŠ,đ 100đâ1â0,1đâđ€â0,1đ(đŽ(đ„(đ€âđ 0)) = 1âšđŽ(đ„(đ€âđ 1)) = 1âšâŻâšđŽ(đ„(đ€âđ 100đâ1)) = 1)(20.13)
which (since đŽ is computable in polynomial time) is exactly thetype of statement shown in Theorem 16.6 to be decidable in polyno-mial time if P = NP.
We see that all that is left is to prove CLAIM I and CLAIM II.CLAIM I follows immediately from the fact that
âȘđâ[100đâ1]|đđ„ â đ đ| †100đâ1âđ=0 |đđ„ â đ đ| = 100đâ1âđ=0 |đđ„| = 100đ|đđ„| .(20.14)
To prove CLAIM II, we will use a technique known as the prob-abilistic method (see the proof of Lemma 20.12 for a more extensivediscussion). Note that this is a completely different use of probabilitythan in the theorem statement, we just use the methods of probabilityto prove an existential statement.
Proof of CLAIM II: Let đ â 0, 1đ with |đ| â„ 0.5 â 2đ be asin the claimâs statement. Consider the following probabilistic ex-periment: we choose 100đ random shifts đ 0, ⊠, đ 100đâ1 indepen-dently at random in 0, 1đ, and consider the event GOOD thatâȘđâ[100đ](đ â đ đ) = 0, 1đ. To prove CLAIM II it is enough to showthat Pr[GOOD] > 0, since that means that in particular there must existshifts đ 0, ⊠, đ 100đâ1 that satisfy this condition.
For every đ§ â 0, 1đ, define the event BADđ§ to hold if đ§ ââȘđâ[100đâ1](đ â đ đ). The event GOOD holds if BADđ§ fails for everyđ§ â 0, 1đ, and so our goal is to prove that Pr[âȘđ§â0,1đBADđ§] < 1. Bythe union bound, to show this, it is enough to show that Pr[BADđ§] <
modeling randomized computation 571
4 The condition of independence here is subtle. Itis not the case that all of the 2đ Ă 100đ eventsBADđđ§đ§â0,1đ,đâ[100đ] are mutually independent.Only for a fixed đ§ â 0, 1đ, the 100đ events of theform BADđđ§ are mutually independent.
5 There is a whole (highly recommended) book byAlon and Spencer devoted to this method.
2âđ for every đ§ â 0, 1đ. Define the event BADđđ§ to hold if đ§ â đ â đ đ.Since every shift đ đ is chosen independently, for every fixed đ§ theevents BAD0đ§, ⊠,BAD100đâ1đ§ are mutually independent,4 and hence
Pr[BADđ§] = Pr[â©đâ[100đâ1]BADđđ§] = 100đâ1âđ=0 Pr[BADđđ§] . (20.15)
So this means that the result will follow by showing thatPr[BADđđ§] †12 for every đ§ â 0, 1đ and đ â [100đ] (as that wouldallow to bound the righthand side of (20.15) by 2â100đ). In otherwords, we need to show that for every đ§ â 0, 1đ and set đ â 0, 1đwith |đ| â„ 12 2đ,
Prđ â0,1đ[đ§ â đ â đ ] â„ 12 . (20.16)
To show this, we observe that đ§ â đ â đ if and only if đ â đ â đ§ (canyou see why). Hence we can rewrite the probability on the lefthandside of (20.16) as Prđ â0,1đ [đ â đâđ§] which simply equals |đâđ§|/2đ =|đ|/2đ â„ 1/2! This concludes the proof of CLAIM I and hence ofTheorem 20.11.
20.6 NON-CONSTRUCTIVE EXISTENCE OF PSEUDORANDOM GEN-ERATORS (ADVANCED, OPTIONAL)
We now show that, if we donât insist on constructivity of pseudoran-dom generators, then we can show that there exists pseudorandomgenerators with output that exponentially larger in the input length.Lemma 20.12 â Existence of inefficient pseudorandom generators. There issome absolute constant đ¶ such that for every đ, đ , if â > đ¶(logđ +log(1/đ)) and đ †đ , then there is an (đ , đ) pseudorandom generatorđș ⶠ0, 1â â 0, 1đ.
Proof Idea:
The proof uses an extremely useful technique known as the âprob-abilistic methodâ which is not too hard mathematically but can beconfusing at first.5 The idea is to give a ânon constructiveâ proof ofexistence of the pseudorandom generator đș by showing that if đș waschosen at random, then the probability that it would be a valid (đ , đ)pseudorandom generator is positive. In particular this means thatthere exists a single đș that is a valid (đ , đ) pseudorandom generator.The probabilistic method is just a proof technique to demonstrate theexistence of such a function. Ultimately, our goal is to show the exis-tence of a deterministic function đș that satisfies the condition.â
572 introduction to theoretical computer science
The above discussion might be rather abstract at this point, butwould become clearer after seeing the proof.
Proof of Lemma 20.12. Let đ, đ , â, đ be as in the lemmaâs statement. Weneed to show that there exists a function đș ⶠ0, 1â â 0, 1đ thatâfoolsâ every đ line program đ in the sense of (20.9). We will showthat this follows from the following claim:
Claim I: For every fixed NAND-CIRC program đ , if we pick đș â¶0, 1â â 0, 1đ at random then the probability that (20.9) is violatedis at most 2âđ 2 .
Before proving Claim I, let us see why it implies Lemma 20.12. Wecan identify a function đș ⶠ0, 1â â 0, 1đ with its âtruth tableâor simply the list of evaluations on all its possible 2â inputs. Sinceeach output is an đ bit string, we can also think of đș as a string in0, 1đâ 2â . We define â±đâ to be the set of all functions from 0, 1â to0, 1đ. As discussed above we can identify â±đâ with 0, 1đâ 2â andchoosing a random function đș ⌠â±đâ corresponds to choosing arandom đ â 2â-long bit string.
For every NAND-CIRC program đ let đ”đ be the event that, if wechoose đș at random from â±đâ then (20.9) is violated with respect tothe program đ . It is important to understand what is the sample spacethat the event đ”đ is defined over, namely this event depends on thechoice of đș and so đ”đ is a subset of â±đâ . An equivalent way to definethe event đ”đ is that it is the subset of all functions mapping 0, 1â to0, 1đ that violate (20.9), or in other words:
đ”đ = â§âšâ©đș â â±đâ ⣠⣠12â âđ â0,1â đ(đș(đ )) â 12đ âđâ0,1đ đ(đ)⣠> đâ«âŹâ(20.17)
(Weâve replaced here the probability statements in (20.9) with theequivalent sums so as to reduce confusion as to what is the samplespace that đ”đ is defined over.)
To understand this proof it is crucial that you pause here and seehow the definition of đ”đ above corresponds to (20.17). This may welltake re-reading the above text once or twice, but it is a good exerciseat parsing probabilistic statements and learning how to identify thesample space that these statements correspond to.
Now, weâve shown in Theorem 5.2 that up to renaming variables(which makes no difference to programâs functionality) there are2đ(đ logđ ) NAND-CIRC programs of at most đ lines. Since đ logđ <đ 2 for sufficiently large đ , this means that if Claim I is true, thenby the union bound it holds that the probability of the union ofđ”đ over all NAND-CIRC programs of at most đ lines is at most2đ(đ logđ )2âđ 2 < 0.1 for sufficiently large đ . What is important for
modeling randomized computation 573
us about the number 0.1 is that it is smaller than 1. In particular thismeans that there exists a single đșâ â â±đâ such that đșâ does not violate(20.9) with respect to any NAND-CIRC program of at most đ lines,but that precisely means that đșâ is a (đ , đ) pseudorandom generator.
Hence to conclude the proof of Lemma 20.12, it suffices to proveClaim I. Choosing a random đș ⶠ0, 1â â 0, 1đ amounts to choos-ing đż = 2â random strings đŠ0, ⊠, đŠđżâ1 â 0, 1đ and letting đș(đ„) = đŠđ„(identifying 0, 1â and [đż] via the binary representation). This meansthat proving the claim amounts to showing that for every fixed func-tion đ ⶠ0, 1đ â 0, 1, if đż > 2đ¶(logđ +log đ) (which by setting đ¶ > 4,we can ensure is larger than 10đ 2/đ2) then the probability that⣠1đż đżâ1âđ=0 đ(đŠđ) â Prđ âŒ0,1đ[đ (đ ) = 1]⣠> đ (20.18)
is at most 2âđ 2 .(20.18) follows directly from the Chernoff bound. Indeed, if we
let for every đ â [đż] the random variable đđ denote đ(đŠđ), thensince đŠ0, ⊠, đŠđżâ1 is chosen independently at random, these are in-dependently and identically distributed random variables with meanđŒđŠâŒ0,1đ [đ (đŠ)] = PrđŠâŒ0,1đ [đ (đŠ) = 1] and hence the probability thatthey deviate from their expectation by đ is at most 2 â 2âđ2đż/2.
Figure 20.8: The relation between BPP and the othercomplexity classes that we have seen. We know thatP â BPP â EXP and BPP â P/poly but we donâtknow how BPP compares with NP and canât rule outeven BPP = EXP. Most evidence points out to thepossibliity that BPP = P.
Chapter Recap
âą We can model randomized algorithms by eitheradding a special âcoin tossâ operation or assumingan extra randomly chosen input.
âą The class BPP contains the set of Boolean func-tions that can be computed by polynomial timerandomized algorithms.
âą BPP is a worst case class of computation: a ran-domized algorithm to compute a function mustcompute it correctly with high probability on everyinput.
574 introduction to theoretical computer science
âą We can amplify the success probability of random-ized algorithm from any value strictly larger than1/2 into a success probability that is exponentiallyclose to 1.
âą We know that P â BPP â EXP.âą We also know that BPP â P/poly.âą The relation between BPP and NP is not known,
but we do know that if P = NP then BPP = P.âą Pseudorandom generators are objects that take
a short random âseedâ and expand it to a muchlonger output that âappears randomâ for effi-cient algorithms. We conjecture that exponentiallystrong pseudorandom generators exist. Under thisconjecture, BPP = P.
20.7 EXERCISES
20.8 BIBLIOGRAPHICAL NOTES
In this chapter we ignore the issue of how we actually get randombits in practice. The output of many physical processes, whether itis thermal heat, network and hard drive latency, user typing pat-tern and mouse movements, and more can be thought of as a binarystring sampled from some distribution đ that might have significantunpredictability (or entropy) but is not necessarily the uniform distri-bution over 0, 1đ. Indeed, as this paper shows, even (real-world)coin tosses do not have exactly the distribution of a uniformly randomstring. Therefore, to use the resulting measurements for randomizedalgorithms, one typically needs to apply a âdistillationâ or random-ness extraction process to the raw measurements to transform themto the uniform distribution. Vadhanâs book [Vad+12] is an excellentsource for more discussion on both randomness extractors and pseu-dorandom generators.
The name BPP stands for âbounded probability polynomial timeâ.This is an historical accident: this class probably should have beencalled RP or PP but both names were taken by other classes.
The proof of Theorem 20.8 actually yields more than its statement.We can use the same âunrolling the loopâ arguments weâve used be-fore to show that the restriction to 0, 1đ of every function in BPPis also computable by a polynomial-size RNAND-CIRC program(i.e., NAND-CIRC program with the RAND operation). Like in the Pvs SIZE(đđđđŠ(đ)) case, there are also functions outside BPP whoserestrictions can be computed by polynomial-size RNAND-CIRC pro-grams. Nevertheless the proof of Theorem 20.8 shows that even suchfunctions can be computed by polynomial sized NAND-CIRC pro-
modeling randomized computation 575
grams without using the rand operations. This can be phrased assaying that BPSIZE(đ (đ)) â SIZE(đ(đđ (đ))) (where BPSIZE isdefined in the natural way using RNAND progams). The strongerversion of Theorem 20.8 we mentioned can be phrased as saying thatBPP/poly = P/poly.
VADVANCED TOPICS
21Cryptography
âHuman ingenuity cannot concoct a cipher which human ingenuity cannotresolve.â, Edgar Allen Poe, 1841
âA good disguise should not reveal the personâs heightâ, Shafi Goldwasserand Silvio Micali, 1982
ââPerfect Secrecyâ is defined by requiring of a system that after a cryptogramis intercepted by the enemy the a posteriori probabilities of this cryptogram rep-resenting various messages be identically the same as the a priori probabilitiesof the same messages before the interception. It is shown that perfect secrecy ispossible but requires, if the number of messages is finite, the same number ofpossible keys.â, Claude Shannon, 1945
âWe stand today on the brink of a revolution in cryptography.â, WhitfeldDiffie and Martin Hellman, 1976
Cryptography - the art or science of âsecret writingâ - has beenaround for several millennia, and for almost all of that time EdgarAllan Poeâs quote above held true. Indeed, the history of cryptographyis littered with the figurative corpses of cryptosystems believed secureand then broken, and sometimes with the actual corpses of those whohave mistakenly placed their faith in these cryptosystems.
Yet, something changed in the last few decades, which is the ârevo-lutionâ alluded to (and to a large extent initiated by) Diffie and Hell-manâs 1976 paper quoted above. New cryptosystems have been foundthat have not been broken despite being subjected to immense effortsinvolving both human ingenuity and computational power on a scalethat completely dwarves the âcode breakersâ of Poeâs time. Even moreamazingly, these cryptosystem are not only seemingly unbreakable,but they also achieve this under much harsher conditions. Not only dotodayâs attackers have more computational power but they also havemore data to work with. In Poeâs age, an attacker would be lucky ifthey got access to more than a few encryptions of known messages.These days attackers might have massive amounts of data- terabytes
Compiled on 8.26.2020 18:10
Learning Objectives:âą Definition of perfect secrecyâą The one-time pad encryption schemeâą Necessity of long keys for perfect secrecyâą Computational secrecy and the derandomized
one-time pad.âą Public key encryptionâą A taste of advanced topics
580 introduction to theoretical computer science
Figure 21.1: Snippet from encrypted communicationbetween queen Mary and Sir Babington
Figure 21.2: XKCDâs take on the added security ofusing uncommon symbols
or more - at their disposal. In fact, with public key encryption, an at-tacker can generate as many ciphertexts as they wish.
The key to this success has been a clearer understanding of bothhow to define security for cryptographic tools and how to relate thissecurity to concrete computational problems. Cryptography is a vastand continuously changing topic, but we will touch on some of theseissues in this chapter.
21.1 CLASSICAL CRYPTOSYSTEMS
A great many cryptosystems have been devised and broken through-out the ages. Let us recount just some of these stories. In 1587, Marythe queen of Scots, and the heir to the throne of England, wanted toarrange the assassination of her cousin, queen Elisabeth I of England,so that she could ascend to the throne and finally escape the housearrest under which she had been for the last 18 years. As part of thiscomplicated plot, she sent a coded letter to Sir Anthony Babington.
Mary used whatâs known as a substitution cipher where each letteris transformed into a different obscure symbol (see Fig. 21.1). At afirst look, such a letter might seem rather inscrutable- a meaninglesssequence of strange symbols. However, after some thought, one mightrecognize that these symbols repeat several times and moreover thatdifferent symbols repeat with different frequencies. Now it doesnâttake a large leap of faith to assume that perhaps each symbol corre-sponds to a different letter and the more frequent symbols correspondto letters that occur in the alphabet with higher frequency. From thisobservation, there is a short gap to completely breaking the cipher,which was in fact done by queen Elisabethâs spies who used the de-coded letters to learn of all the co-conspirators and to convict queenMary of treason, a crime for which she was executed. Trusting in su-perficial security measures (such as using âinscrutableâ symbols) is atrap that users of cryptography have been falling into again and againover the years. (As in many things, this is the subject of a great XKCDcartoon, see Fig. 21.2.)
The VigenĂšre cipher is named after Blaise de VigenĂšre who de-scribed it in a book in 1586 (though it was invented earlier by Bellaso).The idea is to use a collection of substitution cyphers - if there are đdifferent ciphers then the first letter of the plaintext is encoded withthe first cipher, the second with the second cipher, the đđĄâ with theđđĄâ cipher, and then the đ + 1đ đĄ letter is again encoded with the firstcipher. The key is usually a word or a phrase of đ letters, and theđđĄâ substitution cipher is obtained by shifting each letter đđ positionsin the alphabet. This âflattensâ the frequencies and makes it muchharder to do frequency analysis, which is why this cipher was consid-ered âunbreakableâ for 300+ years and got the nickname âle chiffre
cryptography 581
Figure 21.3: Confederate Cipher Disk for implement-ing the VigenĂšre cipher
Figure 21.4: Confederate encryption of the messageâGenâl Pemberton: You can expect no help from thisside of the river. Let Genâl Johnston know, if possible,when you can attack the same point on the enemyâslines. Inform me also and I will endeavor to make adiversion. I have sent some caps. I subjoin a despatchfrom General Johnston.â
Figure 21.5: In the Enigma mechanical cipher the secretkey would be the settings of the rotors and internalwires. As the operator types up their message, theencrypted version appeared in the display area above,and the internal state of the cipher was updated (andso typing the same letter twice would generally resultin two different letters output). Decrypting followsthe same process: if the sender and receiver are usingthe same key then typing the ciphertext would resultin the plaintext appearing in the display.1 Here is a nice exercise: compute (up to an orderof magnitude) the probability that a 50-letter longmessage composed of random letters will end up notcontaining the letter âLâ.
indĂ©chiffrableâ (âthe unbreakable cipherâ). Nevertheless, CharlesBabbage cracked the VigenĂšre cipher in 1854 (though he did not pub-lish it). In 1863 Friedrich Kasiski broke the cipher and published theresult. The idea is that once you guess the length of the cipher, youcan reduce the task to breaking a simple substitution cipher which canbe done via frequency analysis (can you see why?). Confederate gen-erals used VigenĂšre regularly during the civil war, and their messageswere routinely cryptanalzed by Union officers.
The Enigma cipher was a mechanical cipher (looking like a type-writer, see Fig. 21.5) where each letter typed would get mapped into adifferent letter depending on the (rather complicated) key and currentstate of the machine which had several rotors that rotated at differentpaces. An identically wired machine at the other end could be usedto decrypt. Just as many ciphers in history, this has also been believedby the Germans to be âimpossible to breakâ and even quite late in thewar they refused to believe it was broken despite mounting evidenceto that effect. (In fact, some German generals refused to believe itwas broken even after the war.) Breaking Enigma was an heroic effortwhich was initiated by the Poles and then completed by the British atBletchley Park, with Alan Turing (of the Turing machines) playing akey role. As part of this effort the Brits built arguably the worldâs firstlarge scale mechanical computation devices (though they looked moresimilar to washing machines than to iPhones). They were also helpedalong the way by some quirks and errors of the German operators. Forexample, the fact that their messages ended with âHeil Hitlerâ turnedout to be quite useful.
Here is one entertaining anecdote: the Enigma machine wouldnever map a letter to itself. In March 1941, Mavis Batey, a cryptana-lyst at Bletchley Park received a very long message that she tried todecrypt. She then noticed a curious propertyâ the message did notcontain the letter âLâ.1 She realized that the probability that no âLââsappeared in the message is too small for this to happen by chance.Hence she surmised that the original message must have been com-posed only of Lâs. That is, it must have been the case that the operator,perhaps to test the machine, have simply sent out a message where herepeatedly pressed the letter âLâ. This observation helped her decodethe next message, which helped inform of a planned Italian attack andsecure a resounding British victory in what became known as âtheBattle of Cape Matapanâ. Mavis also helped break another Enigmamachine. Using the information she provided, the Brits were ableto feed the Germans with the false information that the main alliedinvasion would take place in Pas de Calais rather than on Normandy.
In the words of General Eisenhower, the intelligence from Bletchleypark was of âpriceless valueâ. It made a huge difference for the Allied
582 introduction to theoretical computer science
Figure 21.6: A private-key encryption scheme is apair of algorithms đž, đ· such that for every keyđ â 0, 1đ and plaintext đ„ â 0, 1đż(đ), đŠ = đžđ(đ„)is a ciphertext of length đ¶(đ). The encryption schemeis valid if for every such đŠ, đ·đ(đŠ) = đ„. That is, thedecryption of an encryption of đ„ is đ„, as long as bothencryption and decryption use the same key.
war effort, thereby shortening World War II and saving millions oflives. See also this interview with Sir Harry Hinsley.
21.2 DEFINING ENCRYPTION
Many of the troubles that cryptosystem designers faced over history(and still face!) can be attributed to not properly defining or under-standing what are the goals they want to achieve in the first place. Letus focus on the setting of private key encryption. (This is also known asâsymmetric encryptionâ; for thousands of years, âprivate key encryp-tionâ was synonymous with encryption and only in the 1970âs wasthe concept of public key encryption invented, see Definition 21.11.) Asender (traditionally called âAliceâ) wants to send a message (knownalso as a plaintext) đ„ â 0, 1â to a receiver (traditionally called âBobâ).They would like their message to be kept secret from an adversarywho listens in or âeavesdropsâ on the communication channel (and istraditionally called âEveâ).
Alice and Bob share a secret key đ â 0, 1â. (While the letter đis often used elsewhere in the book to denote a natural number, inthis chapter we use it to denote the string corresponding to a secretkey.) Alice uses the key đ to âscrambleâ or encrypt the plaintext đ„ intoa ciphertext đŠ, and Bob uses the key đ to âunscrambleâ or decrypt theciphertext đŠ back into the plaintext đ„. This motivates the followingdefinition which attempts to capture what it means for an encryptionscheme to be valid or âmake senseâ, regardless of whether or not it issecure:
Definition 21.1 â Valid encryption scheme. Let đż ⶠâ â â and đ¶ ⶠâ â âbe two functions mapping natural numbers to natural numbers.A pair of polynomial-time computable functions (đž, đ·) map-ping strings to strings is a valid private key encryption scheme (orencryption scheme for short) with plaintext length function đż(â ) andciphertext length function đ¶(â ) if for every đ â â, đ â 0, 1đ andđ„ â 0, 1đż(đ), |đžđ(đ„)| = đ¶(đ) andđ·(đ, đž(đ, đ„)) = đ„ . (21.1)
We will often write the first input (i.e., the key) to the encryp-tion and decryption as a subscript and so can write (21.1) also asđ·đ(đžđ(đ„)) = đ„.Solved Exercise 21.1 â Lengths of ciphertext and plaintext. Prove that forevery valid encryption scheme (đž, đ·) with functions đż, đ¶. đ¶(đ) â„đż(đ) for every đ.
cryptography 583
2 The actual quote is âIl faut quâil nâexige pas lesecret, et quâil puisse sans inconvĂ©nient tomber entreles mains de lâennemiâ loosely translated as âThesystem must not require secrecy and can be stolen bythe enemy without causing troubleâ. According toSteve Bellovin the NSA version is âassume that thefirst copy of any device we make is shipped to theKremlinâ.
Solution:
For every fixed key đ â 0, 1đ, the equation (21.1) implies thatthe map đŠ ⊠đ·đ(đŠ) inverts the map đ„ ⊠đžđ(đ„), which in partic-ular means that the map đ„ ⊠đžđ(đ„) must be one to one. Henceits codomain must be at least as large as its domain, and since itsdomain is 0, 1đż(đ) and its codomain is 0, 1đ¶(đ) it follows thatđ¶(đ) â„ đż(đ).
Since the ciphertext length is always at least the plaintext length(and in most applications it is not much longer than that), we typi-cally focus on the plaintext length as the quantity to optimize in anencryption scheme. The larger đż(đ) is, the better the scheme, since itmeans we need a shorter secret key to protect messages of the samelength.
21.3 DEFINING SECURITY OF ENCRYPTION
Definition 21.1 says nothing about the security of đž and đ·, and evenallows the trivial encryption scheme that ignores the key altogetherand sets đžđ(đ„) = đ„ for every đ„. Defining security is not a trivial matter.
PYou would appreciate the subtleties of defining secu-rity of encryption more if at this point you take a fiveminute break from reading, and try (possibly with apartner) to brainstorm on how you would mathemat-ically define the notion that an encryption scheme issecure, in the sense that it protects the secrecy of theplaintext đ„.
Throughout history, many attacks on cryptosystems were rootedin the cryptosystem designersâ reliance on âsecurity throughobscurityââ trusting that the fact their methods are not known totheir enemy will protect them from being broken. This is a faultyassumption - if you reuse a method again and again (even with adifferent key each time) then eventually your adversaries will figureout what you are doing. And if Alice and Bob meet frequently in asecure location to decide on a new method, they might as well takethe opportunity to exchange their secrets. These considerations ledAuguste Kerckhoffs in 1883 to state the following principle:
A cryptosystem should be secure even if everything about the system, except thekey, is public knowledge.2
Why is it OK to assume the key is secret and not the algorithm?Because we can always choose a fresh key. But of course that wonât
584 introduction to theoretical computer science
help us much if our key is â1234â or âpassw0rd!â. In fact, if you useany deterministic algorithm to choose the key then eventually youradversary will figure this out. Therefore for security we must choosethe key at random and can restate Kerckhoffsâs principle as follows:
There is no secrecy without randomness
This is such a crucial point that is worth repeating:
Big Idea 26 There is no secrecy without randomness.
At the heart of every cryptographic scheme there is a secret key,and the secret key is always chosen at random. A corollary of thatis that to understand cryptography, you need to know probabilitytheory.
RRemark 21.2 â Randomness in the real world. Choos-ing the secrets for cryptography requires generatingrandomness, which is often done by measuring someâunpredictableâ or âhigh entropyâ data, and thenapplying hash functions to the result to âextractâ auniformly random string. Great care must be taken indoing this, and randomness generators often turn outto be the Achilles heel of secure systems.In 2006 a programmer removed a line of code from theprocedure to generate entropy in OpenSSL packagedistributed by Debian since it caused a warning insome automatic verification code. As a result for twoyears (until this was discovered) all the randomnessgenerated by this procedure used only the processID as an âunpredictableâ source. This means that allcommunication done by users in that period is fairlyeasily breakable (and in particular, if some entitiesrecorded that communication they could break it alsoretroactively). See XKCDâs take on that incident.In 2012 two separate teams of researchers scanned alarge number of RSA keys on the web and found outthat about 4 percent of them are easy to break. Themain issue were devices such as routers, internet-connected printers and such. These devices sometimesrun variants of Linux- a desktop operating system-but without a hard drive, mouse or keyboard, theydonât have access to many of the entropy sources thatdesktop have. Coupled with some good old fashionedignorance of cryptography and software bugs, this ledto many keys that are downright trivial to break, seethis blog post and this web page for more details.Since randomness is so crucial to security, breakingthe procedure to generate randomness can lead to acomplete break of the system that uses this random-ness. Indeed, the Snowden documents, combined with
cryptography 585
observations of Shumow and Ferguson, strongly sug-gest that the NSA has deliberately inserted a trapdoorin one of the pseudorandom generators published bythe National Institute of Standards and Technologies(NIST). Fortunately, this generator wasnât widelyadapted but apparently the NSA did pay 10 milliondollars to RSA security so the latter would make thisgenerator their default option in their products.
21.4 PERFECT SECRECY
If you think about encryption scheme security for a while, you mightcome up with the following principle for defining security: âAnencryption scheme is secure if it is not possible to recover the key đ fromđžđ(đ„)â. However, a momentâs thought shows that the key is not reallywhat weâre trying to protect. After all, the whole point of an encryp-tion is to protect the confidentiality of the plaintext đ„. So, we can try todefine that âan encryption scheme is secure if it is not possible to recover theplaintext đ„ from đžđ(đ„)â. Yet it is not clear what this means either. Sup-pose that an encryption scheme reveals the first 10 bits of the plaintextđ„. It might still not be possible to recover đ„ completely, but on an in-tuitive level, this seems like it would be extremely unwise to use suchan encryption scheme in practice. Indeed, often even partial informationabout the plaintext is enough for the adversary to achieve its goals.
The above thinking led Shannon in 1945 to formalize the notion ofperfect secrecy, which is that an encryption reveals absolutely nothingabout the message. There are several equivalent ways to define it, butperhaps the cleanest one is the following:
Definition 21.3 â Perfect secrecy. A valid encryption scheme (đž, đ·)with plaintext length đż(â ) is perfectly secret if for every đ â â andplaintexts đ„, đ„âČ â 0, 1đż(đ), the following two distributions đ andđ âČ over 0, 1â are identical:
âą đ is obtained by sampling đ ⌠0, 1đ and outputting đžđ(đ„).âą đ âČ is obtained by sampling đ ⌠0, 1đ and outputting đžđ(đ„âČ).
PThis definition might take more than one readingto parse. Try to think of how this condition wouldcorrespond to your intuitive notion of âlearning noinformationâ about đ„ from observing đžđ(đ„), and toShannonâs quote in the beginning of this chapter.
586 introduction to theoretical computer science
Figure 21.7: For any key length đ, we can visualize anencryption scheme (đž, đ·) as a graph with a vertexfor every one of the 2đż(đ) possible plaintexts and forevery one of the ciphertexts in 0, 1â of the formđžđ(đ„) for đ â 0, 1đ and đ„ â 0, 1đż(đ). For everyplaintext đ„ and key đ, we add an edge labeled đbetween đ„ and đžđ(đ„). By the validity condition, if wepick any fixed key đ, the map đ„ ⊠đžđ(đ„) must beone-to-one. The condition of perfect secrecy simplycorresponds to requiring that every two plaintextsđ„ and đ„âČ have exactly the same set of neighbors (ormulti-set, if there are parallel edges).
In particular, suppose that you knew ahead of timethat Alice sent either an encryption of đ„ or an en-cryption of đ„âČ. Would you learn anything new fromobserving the encryption of the message that Aliceactually sent? It may help you to look at Fig. 21.7.
21.4.1 Example: Perfect secrecy in the battlefieldTo understand Definition 21.3, suppose that Alice sends only one oftwo possible messages: âattackâ or âretreatâ, which we denote by đ„0and đ„1 respectively, and that she sends each one of those messageswith probability 1/2. Let us put ourselves in the shoes of Eve, theeavesdropping adversary. A priori we would have guessed that Alicesent either đ„0 or đ„1 with probability 1/2. Now we observe đŠ = đžđ(đ„đ)where đ is a uniformly chosen key in 0, 1đ. How does this newinformation cause us to update our beliefs on whether Alice sent theplaintext đ„0 or the plaintext đ„1?
PBefore reading the next paragraph, you might wantto try the analysis yourself. You may find it useful tolook at the Wikipedia entry on Bayesian Inference orthese MIT lecture notes.
Let us define đ0(đŠ) to be the probability (taken over đ ⌠0, 1đ)that đŠ = đžđ(đ„0) and similarly đ1(đŠ) to be PrđâŒ0,1đ [đŠ = đžđ(đ„1)].Note that, since Alice chooses the message to send at random, oura priori probability for observing đŠ is 12 đ0(đŠ) + 12 đ1(đŠ). However,as per Definition 21.3, the perfect secrecy condition guarantees thatđ0(đŠ) = đ1(đŠ)! Let us denote the number đ0(đŠ) = đ1(đŠ) by đ. By theformula for conditional probability, the probability that Alice sent themessage đ„0 conditioned on our observation đŠ is simply
Pr[đ = 0|đŠ = đžđ(đ„đ)] = Pr[đ = 0 ⧠đŠ = đžđ(đ„đ)]Pr[đŠ = đžđ(đ„)] . (21.2)
(The equation (21.2) is a special case of Bayesâ rule which, althougha simple restatement of the formula for conditional probability, isan extremely important and widely used tool in statistics and dataanalysis.)
Since the probability that đ = 0 and đŠ is the ciphertext đžđ(0) is equalto 12 â đ0(đŠ), and the a priori probability of observing đŠ is 12 đ0(đŠ) +12 đ1(đŠ), we can rewrite (21.2) as
Pr[đ = 0|đŠ = đžđ(đ„đ)] = 12 đ0(đŠ)12 đ0(đŠ) + 12 đ1(đŠ) = đđ + đ = 12 (21.3)
cryptography 587
Figure 21.8: A perfectly secret encryption schemefor two-bit keys and messages. The blue verticesrepresent plaintexts and the red vertices representciphertexts, each edge mapping a plaintext đ„ to a ci-phertext đŠ = đžđ(đ„) is labeled with the correspondingkey đ. Since there are four possible keys, the degree ofthe graph is four and it is in fact a complete bipartitegraph. The encryption scheme is valid in the sensethat for every đ â 0, 12, the map đ„ ⊠đžđ(đ„) isone-to-one, which in other words means that the setof edges labeled with đ is a matching.
using the fact that đ0(đŠ) = đ1(đŠ) = đ. This means that observing theciphertext đŠ did not help us at all! We still would not be able to guesswhether Alice sent âattackâ or âretreatâ with better than 50/50 odds!
This example can be vastly generalized to show that perfect secrecyis indeed âperfectâ in the sense that observing a ciphertext gives Eveno additional information about the plaintext beyond her a priori knowl-edge.
21.4.2 Constructing perfectly secret encryptionPerfect secrecy is an extremely strong condition, and implies that aneavesdropper does not learn any information from observing the ci-phertext. You might think that an encryption scheme satisfying such astrong condition will be impossible, or at least extremely complicated,to achieve. However it turns out we can in fact obtain perfectly secretencryption scheme fairly easily. Such a scheme for two-bit messages isillustrated in Fig. 21.8
In fact, this can be generalized to any number of bits:
Theorem 21.4 â One Time Pad (Vernam 1917, Shannon 1949). There is a per-fectly secret valid encryption scheme (đž, đ·) with đż(đ) = đ¶(đ) = đ.
Proof Idea:
Our scheme is the one-time pad also known as the âVernam Ci-pherâ, see Fig. 21.9. The encryption is exceedingly simple: to encrypta message đ„ â 0, 1đ with a key đ â 0, 1đ we simply output đ„ â đwhere â is the bitwise XOR operation that outputs the string corre-sponding to XORing each coordinate of đ„ and đ.âProof of Theorem 21.4. For two binary strings đ and đ of the samelength đ, we define đ â đ to be the string đ â 0, 1đ such thatđđ = đđ + đđ mod 2 for every đ â [đ]. The encryption scheme(đž, đ·) is defined as follows: đžđ(đ„) = đ„ â đ and đ·đ(đŠ) = đŠ â đ.By the associative law of addition (which works also modulo two),đ·đ(đžđ(đ„)) = (đ„ â đ) â đ = đ„ â (đ â đ) = đ„ â 0đ = đ„, using the factthat for every bit đ â 0, 1, đ + đ mod 2 = 0 and đ + 0 = đ mod 2.Hence (đž, đ·) form a valid encryption.
To analyze the perfect secrecy property, we claim that for everyđ„ â 0, 1đ, the distribution đđ„ = đžđ(đ„) where đ ⌠0, 1đ is simplythe uniform distribution over 0, 1đ, and hence in particular thedistributions đđ„ and đđ„âČ are identical for every đ„, đ„âČ â 0, 1đ. Indeed,for every particular đŠ â 0, 1đ, the value đŠ is output by đđ„ if andonly if đŠ = đ„ â đ which holds if and only if đ = đ„ â đŠ. Since đ ischosen uniformly at random in 0, 1đ, the probability that đ happens
588 introduction to theoretical computer science
Figure 21.9: In the one time pad encryption scheme weencrypt a plaintext đ„ â 0, 1đ with a key đ â 0, 1đby the ciphertext đ„ â đ where â denotes the bitwiseXOR operation.
to equal đ„ â đŠ is exactly 2âđ, which means that every string đŠ is outputby đđ„ with probability 2âđ.
PThe argument above is quite simple but is worthreading again. To understand why the one-time padis perfectly secret, it is useful to envision it as a bi-partite graph as weâve done in Fig. 21.8. (In fact theencryption scheme of Fig. 21.8 is precisely the one-time pad for đ = 2.) For every đ, the one-time padencryption scheme corresponds to a bipartite graphwith 2đ vertices on the âleft sideâ corresponding to theplaintexts in 0, 1đ and 2đ vertices on the âright sideâcorresponding to the ciphertexts 0, 1đ. For everyđ„ â 0, 1đ and đ â 0, 1đ, we connect đ„ to the vertexđŠ = đžđ(đ„) with an edge that we label with đ. One cansee that this is the complete bipartite graph, whereevery vertex on the left is connected to all vertices onthe right. In particular this means that for every leftvertex đ„, the distribution on the ciphertexts obtainedby taking a random đ â 0, 1đ and going to theneighbor of đ„ on the edge labeled đ is the uniform dis-tribution over 0, 1đ. This ensures the perfect secrecycondition.
21.5 NECESSITY OF LONG KEYS
So, does Theorem 21.4 give the final word on cryptography, andmeans that we can all communicate with perfect secrecy and livehappily ever after? No it doesnât. While the one-time pad is efficient,and gives perfect secrecy, it has one glaring disadvantage: to commu-nicate đ bits you need to store a key of length đ. In contrast, practicallyused cryptosystems such as AES-128 have a short key of 128 bits (i.e.,16 bytes) that can be used to protect terabytes or more of communica-tion! Imagine that we all needed to use the one time pad. If that wasthe case, then if you had to communicate with đ people, you wouldhave to maintain (securely!) đ huge files that are each as long as thelength of the maximum total communication you expect with that per-son. Imagine that every time you opened an account with Amazon,Google, or any other service, they would need to send you in the mail(ideally with a secure courier) a DVD full of random numbers, andevery time you suspected a virus, youâd need to ask all these servicesfor a fresh DVD. This doesnât sound so appealing.
This is not just a theoretical issue. The Soviets have used the one-time pad for their confidential communication since before the 1940âs.In fact, even before Shannonâs work, the U.S. intelligence already
cryptography 589
Figure 21.10: Gene Grabeel, who founded the U.S.Russian SigInt program on 1 Feb 1943. Photo taken in1942, see Page 7 in the Venona historical study.
Figure 21.11: An encryption scheme where the num-ber of keys is smaller than the number of plaintextscorresponds to a bipartite graph where the degree issmaller than the number of vertices on the left side.Together with the validity condition this implies thatthere will be two left vertices đ„, đ„âČ with non-identicalneighborhoods, and hence the scheme does not satisfyperfect secrecy.
knew in 1941 that the one-time pad is in principle âunbreakableâ (seepage 32 in the Venona document). However, it turned out that thehassle of manufacturing so many keys for all the communication tookits toll on the Soviets and they ended up reusing the same keys formore than one message. They did try to use them for completely dif-ferent receivers in the (false) hope that this wouldnât be detected. TheVenona Project of the U.S. Army was founded in February 1943 byGene Grabeel (see Fig. 21.10), a former home economics teacher fromMadison Heights, Virgnia and Lt. Leonard Zubko. In October 1943,they had their breakthrough when it was discovered that the Russianswere reusing their keys. In the 37 years of its existence, the project hasresulted in a treasure chest of intelligence, exposing hundreds of KGBagents and Russian spies in the U.S. and other countries, includingJulius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry DexterWhite and many others.
Unfortunately it turns out that such long keys are necessary forperfect secrecy:
Theorem 21.5 â Perfect secrecy requires long keys. For every perfectlysecret encryption scheme (đž, đ·) the length function đż satisfiesđż(đ) †đ.
Proof Idea:
The idea behind the proof is illustrated in Fig. 21.11. We define agraph between the plaintexts and ciphertexts, where we put an edgebetween plaintext đ„ and ciphertext đŠ if there is some key đ such thatđŠ = đžđ(đ„). The degree of this graph is at most the number of potentialkeys. The fact that the degree is smaller than the number of plaintexts(and hence of ciphertexts) implies that there would be two plaintextsđ„ and đ„âČ with different sets of neighbors, and hence the distributionof a ciphertext corresponding to đ„ (with a random key) will not beidentical to the distribution of a ciphertext corresponding to đ„âČ.âProof of Theorem 21.5. Let đž, đ· be a valid encryption scheme withmessages of length đż and key of length đ < đż. We will show that(đž, đ·) is not perfectly secret by providing two plaintexts đ„0, đ„1 â0, 1đż such that the distributions đđ„0 and đđ„1 are not identical, wheređđ„ is the distribution obtained by picking đ ⌠0, 1đ and outputtingđžđ(đ„).
We choose đ„0 = 0đż. Let đ0 â 0, 1â be the set of all ciphertextsthat have nonzero probability of being output in đđ„0 . That is, đ0 =đŠ | âđâ0,1đđŠ = đžđ(đ„0). Since there are only 2đ keys, we know that|đ0| †2đ.
590 introduction to theoretical computer science
We will show the following claim:Claim I: There exists some đ„1 â 0, 1đż and đ â 0, 1đ such thatđžđ(đ„1) â đ0.Claim I implies that the string đžđ(đ„1) has positive probability of
being output by đđ„1 and zero probability of being output by đđ„0 andhence in particular đđ„0 and đđ„1 are not identical. To prove Claim I, justchoose a fixed đ â 0, 1đ. By the validity condition, the map đ„ âŠđžđ(đ„) is a one to one map of 0, 1đż to 0, 1â and hence in particularthe image of this map which is the set đŒđ = đŠ | âđ„â0,1đżđŠ = đžđ(đ„)has size at least (in fact exactly) 2đż. Since |đ0| †2đ < 2đż, this meansthat |đŒđ| > |đ0| and so in particular there exists some string đŠ in đŒđ ⧔ đ0.But by the definition of đŒđ this means that there is some đ„ â 0, 1đżsuch that đžđ(đ„) â đ0 which concludes the proof of Claim I and henceof Theorem 21.5.
21.6 COMPUTATIONAL SECRECY
To sum up the previous episodes, we now know that:
âą It is possible to obtain a perfectly secret encryption scheme with keylength the same as the plaintext.
and
âą It is not possible to obtain such a scheme with key that is even asingle bit shorter than the plaintext.
How does this mesh with the fact that, as weâve already seen, peo-ple routinely use cryptosystems with a 16 byte (i.e., 128 bit) key butmany terabytes of plaintext? The proof of Theorem 21.5 does give infact a way to break all these cryptosystems, but an examination of thisproof shows that it only yields an algorithm with time exponential inthe length of the key. This motivates the following relaxation of perfectsecrecy to a condition known as âcomputational secrecyâ. Intuitively,an encryption scheme is computationally secret if no polynomial timealgorithm can break it. The formal definition is below:
Definition 21.6 â Computational secrecy. Let (đž, đ·) be a valid encryp-tion scheme where for keys of length đ, the plaintexts are of lengthđż(đ) and the ciphertexts are of length đ(đ). We say that (đž, đ·) iscomputationally secret if for every polynomial đ ⶠâ â â, and large
cryptography 591
enough đ, if đ is an đ(đ)-input and single output NAND-CIRCprogram of at most đ(đ) lines, and đ„0, đ„1 â 0, 1đż(đ) then
⣠đŒđâŒ0,1đ[đ (đžđ(đ„0))] â đŒđâŒ0,1đ[đ (đžđ(đ„1))]⣠< 1đ(đ) (21.4)
PDefinition 21.6 requires a second or third read andsome practice to truly understand. One excellent exer-cise to make sure you follow it is to see that if we allowđ to be an arbitrary function mapping 0, 1đ(đ) to0, 1, and we replace the condition in (21.4) that thelefthand side is smaller than 1đ(đ) with the conditionthat it is equal to 0 then we get the perfect secrecycondition of Definition 21.3. Indeed if the distributionsđžđ(đ„0) and đžđ(đ„1) are identical then applying anyfunction đ to them we get the same expectation. Onthe other hand, if the two distributions above give adifferent probability for some element đŠâ â 0, 1đ(đ),then the function đ(đŠ) that outputs 1 iff đŠ = đŠâ willhave a different expectation under the former distribu-tion than under the latter.
Definition 21.6 raises two natural questions:
âą Is it strong enough to ensure that a computationally secret encryp-tion scheme protects the secrecy of messages that are encryptedwith it?
âą It is weak enough that, unlike perfect secrecy, it is possible to obtaina computationally secret encryption scheme where the key is muchsmaller than the message?
To the best of our knowledge, the answer to both questions is Yes.This is just one example of a much broader phenomenon. We canuse computational hardness to achieve many cryptographic goals,including some goals that have been dreamed about for millenia, andother goals that people have not even dared to imagine.
Big Idea 27 Computational hardness is necessary and sufficient foralmost all cryptographic applications.
Regarding the first question, it is not hard to show that if, for ex-ample, Alice uses a computationally secret encryption algorithm toencrypt either âattackâ or âretreatâ (each chosen with probability
592 introduction to theoretical computer science
Figure 21.12: In a stream cipher or âderandomizedone-time padâ we use a pseudorandom generatorđș ⶠ0, 1đ â 0, 1đż to obtain an encryption schemewith a key length of đ and plaintexts of length đż.We encrypt the plaintext đ„ â 0, 1đż with keyđ â 0, 1đ by the ciphertext đ„ â đș(đ).
1/2), then as long as sheâs restricted to polynomial-time algorithms, anadversary Eve will not be able to guess the message with probabilitybetter than, say, 0.51, even after observing its encrypted form. (Weomit the proof, but it is an excellent exercise for you to work it out onyour own.)
To answer the second question we will show that under the sameassumption we used for derandomizing BPP, we can obtain a com-putationally secret cryptosystem where the key is almost exponentiallysmaller than the plaintext.
21.6.1 Stream ciphers or the âderandomized one-time padâIt turns out that if pseudorandom generators exist as in the optimalPRG conjecture, then there exists a computationally secret encryptionscheme with keys that are much shorter than the plaintext. The con-struction below is known as a stream cipher, though perhaps a bettername is the âderandomized one-time padâ. It is widely used in prac-tice with keys on the order of a few tens or hundreds of bits protectingmany terabytes or even petabytes of communication.
We start by recalling the notion of a pseudorandom generator, as de-fined in Definition 20.9. For this chapter, we will fix a special case ofthe definition:
Definition 21.7 â Cryptographic pseudorandom generator. Let đż ⶠâ â â besome function. A cryptographic pseudorandom generator with stretchđż(â ) is a polynomial-time computable function đș ⶠ0, 1â â 0, 1âsuch that:
âą For every đ â â and đ â 0, 1đ, |đș(đ )| = đż(đ).âą For every polynomial đ ⶠâ â â and đ large enough, if đ¶ is a cir-
cuit of đż(đ) inputs, one output, and at most đ(đ) gates then⣠Prđ âŒ0,1â[đ¶(đș(đ )) = 1] â PrđâŒ0,1đ[đ¶(đ) = 1]⣠< 1đ(đ) . (21.5)
In this chapter we will call a cryptographic pseudorandom gener-ator simply a pseudorandom generator or PRG for short. The optimalPRG conjecture of Section 20.4.2 implies that there is a pseudoran-dom generator that can âfoolâ circuits of exponential size and wherethe gap in probabilities is at most one over an exponential quantity.Since exponential grow faster than every polynomial, the optimal PRGconjecture implies the following:
The crypto PRG conjecture: For every đ â â, there is a cryptographicpseudorandom generator with đż(đ) = đđ.
cryptography 593
The crypto PRG conjecture is a weaker conjecture than the optimalPRG conjecture, but it too (as we will see) is still stronger than theconjecture that P â NP.
Theorem 21.8 â Derandomized one-time pad. Suppose that the cryptoPRG conjecture is true. Then for every constant đ â â there is acomputationally secret encryption scheme (đž, đ·) with plaintextlength đż(đ) at least đđ.
Proof Idea:
The proof is illustrated in Fig. 21.12. We simply take the one-timepad on đż bit plaintexts, but replace the key with đș(đ) where đ is astring in 0, 1đ and đș ⶠ0, 1đ â 0, 1đż is a pseudorandom gen-erator. Since the one time pad cannot be broken, an adversary thatbreaks the derandomized one-time pad can be used to distinguishbetween the output of the pseudorandom generator and the uniformdistribution.âProof of Theorem 21.8. Let đș ⶠ0, 1đ â 0, 1đż for đż = đđ be therestriction to input length đ of the pseudorandom generator đș whoseexistence we are guaranteed from the crypto PRG conjecture. Wenow define our encryption scheme as follows: given key đ â 0, 1đand plaintext đ„ â 0, 1đż, the encryption đžđ(đ„) is simply đ„ â đș(đ).To decrypt a string đŠ â 0, 1đ we output đŠ â đș(đ). This is a validencryption since đș is computable in polynomial time and (đ„ â đș(đ)) âđș(đ) = đ„ â (đș(đ) â đș(đ)) = đ„ for every đ„ â 0, 1đż.
Computational secrecy follows from the condition of a pseudo-random generator. Suppose, towards a contradiction, that there isa polynomial đ, NAND-CIRC program đ of at most đ(đż) lines andđ„, đ„âČ â 0, 1đż(đ) such that⣠đŒđâŒ0,1đ[đ(đžđ(đ„))] â đŒđâŒ0,1đ[đ(đžđ(đ„âČ))]⣠> 1đ(đż) . (21.6)
(We use here the simple fact that for a 0, 1-valued random variableđ, Pr[đ = 1] = đŒ[đ].)By the definition of our encryption scheme, this means that⣠đŒđâŒ0,1đ[đ(đș(đ) â đ„)] â đŒđâŒ0,1đ[đ(đș(đ) â đ„âČ)]⣠> 1đ(đż) . (21.7)
Now since (as we saw in the security analysis of the one-time pad),for every strings đ„, đ„âČ â 0, 1đż, the distribution đ â đ„ and đ â đ„âČ areidentical, where đ ⌠0, 1đż. HenceđŒđâŒ0,1đż[đ(đ â đ„)] = đŒđâŒ0,1đż[đ(đ â đ„âČ)] . (21.8)
594 introduction to theoretical computer science
By plugging (21.8) into (21.7) we can derive that⣠đŒđâŒ0,1đ[đ(đș(đ) â đ„)] â đŒđâŒ0,1đż[đ(đ â đ„)] + đŒđâŒ0,1đż[đ(đ â đ„âČ)] â đŒđâŒ0,1đ[đ(đș(đ) â đ„âČ)]⣠> 1đ(đż) .(21.9)
(Please make sure that you can see why this is true.)Now we can use the triangle inequality that |đŽ + đ”| †|đŽ| + |đ”| for
every two numbers đŽ, đ”, applying it for đŽ = đŒđâŒ0,1đ [đ(đș(đ) â đ„)] âđŒđâŒ0,1đż [đ(đâđ„)] and đ” = đŒđâŒ0,1đż [đ(đâđ„âČ)]âđŒđâŒ0,1đ [đ(đș(đ)âđ„âČ)]to derive⣠đŒđâŒ0,1đ[đ(đș(đ) â đ„)] â đŒđâŒ0,1đż[đ(đ â đ„)]âŁ+⣠đŒđâŒ0,1đż[đ(đ â đ„âČ)] â đŒđâŒ0,1đ[đ(đș(đ) â đ„âČ)]⣠> 1đ(đż) .
(21.10)In particular, either the first term or the second term of the
lefthand-side of (21.10) must be at least 12đ(đż) . Let us assume the firstcase holds (the second case is analyzed in exactly the same way).Then we get that⣠đŒđâŒ0,1đ[đ(đș(đ) â đ„)] â đŒđâŒ0,1đż[đ(đ â đ„)]⣠> 12đ(đż) . (21.11)
But if we now define the NAND-CIRC program đđ„ that on inputđ â 0, 1đż outputs đ(đâđ„) then (since XOR of đż bits can be computedin đ(đż) lines), we get that đđ„ has đ(đż) + đ(đż) lines and by (21.11) itcan distinguish between an input of the form đș(đ) and an input of theform đ ⌠0, 1đ with advantage better than 12đ(đż) . Since a polynomialis dominated by an exponential, if we make đż large enough, this willcontradict the (2đżđ, 2âđżđ) security of the pseudorandom generator đș.
RRemark 21.9 â Stream ciphers in practice. The twomost widely used forms of (private key) encryptionschemes in practice are stream ciphers and block ciphers.(To make things more confusing, a block cipher isalways used in some mode of operation and someof these modes effectively turn a block cipher intoa stream cipher.) A block cipher can be thought asa sort of a ârandom invertible mapâ from 0, 1đ to0, 1đ, and can be used to construct a pseudorandomgenerator and from it a stream cipher, or to encryptdata directly using other modes of operations. Thereare a great many other security notions and consider-ations for encryption schemes beyond computationalsecrecy. Many of those involve handling scenariossuch as chosen plaintext, man in the middle, and cho-sen ciphertext attacks, where the adversary is not justmerely a passive eavesdropper but can influence thecommunication in some way. While this chapter is
cryptography 595
meant to give you some taste of the ideas behind cryp-tography, there is much more to know before applyingit correctly to obtain secure applications, and a greatmany people have managed to get it wrong.
21.7 COMPUTATIONAL SECRECY AND NPWeâve also mentioned before that an efficient algorithm for NP couldbe used to break all cryptography. We now give an example of howthis can be done:
Theorem 21.10 â Breaking encryption using NP algorithm. If P = NPthen there is no computationally secret encryption scheme withđż(đ) > đ.
Furthermore, for every valid encryption scheme (đž, đ·) withđż(đ) > đ + 100 there is a polynomial đ such that for every largeenough đ there exist đ„0, đ„1 â 0, 1đż(đ) and a đ(đ)-line NAND-CIRC program EVE s.t.
PrđâŒ0,1,đâŒ0,1đ[EVE(đžđ(đ„đ)) = đ] â„ 0.99 . (21.12)
Note that the âfurthermoreâ part is extremely strong. It meansthat if the plaintext is even a little bit larger than the key, then we canalready break the scheme in a very strong way. That is, there will bea pair of messages đ„0, đ„1 (think of đ„0 as âsellâ and đ„1 as âbuyâ) andan efficient strategy for Eve such that if Eve gets a ciphertext đŠ thenshe will be able to tell whether đŠ is an encryption of đ„0 or đ„1 withprobability very close to 1. (We model breaking the scheme as Eveoutputting 0 or 1 corresponding to whether the message sent was đ„0or đ„1. Note that we could have just as well modified Eve to output đ„0instead of 0 and đ„1 instead of 1. The key point is that a priori Eve onlyhad a 50/50 chance of guessing whether Alice sent đ„0 or đ„1 but afterseeing the ciphertext this chance increases to better than 99/1.) Thecondition P = NP can be relaxed to NP â BPP and even the weakercondition NP â P/poly with essentially the same proof.Proof Idea:
The proof follows along the lines of Theorem 21.5 but this timepaying attention to the computational aspects. If P = NP then forevery plaintext đ„ and ciphertext đŠ, we can efficiently tell whether thereexists đ â 0, 1đ such that đžđ(đ„) = đŠ. So, to prove this result we needto show that if the plaintexts are long enough, there would exist a pairđ„0, đ„1 such that the probability that a random encryption of đ„1 also isa valid encryption of đ„0 will be very small. The details of how to showthis are below.
596 introduction to theoretical computer science
âProof of Theorem 21.10. We focus on showing only the âfurthermoreâpart since it is the more interesting and the other part follows by es-sentially the same proof.
Suppose that (đž, đ·) is such an encryption, let đ be large enough,and let đ„0 = 0đż(đ). For every đ„ â 0, 1đż(đ) we define đđ„ to be the setof all valid encryption of đ„. That is đđ„ = đŠ | âđâ0,1đđŠ = đžđ(đ„). Asin the proof of Theorem 21.5, since there are 2đ keys đ, |đđ„| †2đ forevery đ„ â 0, 1đż(đ).
We denote by đ0 the set đđ„0 . We define our algorithm EVE to out-put 0 on input đŠ â 0, 1â if đŠ â đ0 and to output 1 otherwise. Thiscan be implemented in polynomial time if P = NP, since the key đcan serve the role of an efficiently verifiable solution. (Can you seewhy?) Clearly Pr[EVE(đžđ(đ„0)) = 0] = 1 and so in the case that EVEgets an encryption of đ„0 then she guesses correctly with probability1. The remainder of the proof is devoted to showing that there ex-ists đ„1 â 0, 1đż(đ) such that Pr[EVE(đžđ(đ„1)) = 0] †0.01, whichwill conclude the proof by showing that EVE guesses wrongly withprobability at most 12 0 + 12 0.01 < 0.01.
Consider now the following probabilistic experiment (which wedefine solely for the sake of analysis). We consider the sample spaceof choosing đ„ uniformly in 0, 1đż(đ) and define the random variableđđ(đ„) to equal 1 if and only if đžđ(đ„) â đ0. For every đ, the map đ„ âŠđžđ(đ„) is one-to-one, which means that the probability that đđ = 1is equal to the probability that đ„ â đžâ1đ (đ0) which is |đ0|2đż(đ) . So by thelinearity of expectation đŒ[âđâ0,1đ đđ] †2đ|đ0|2đż(đ) †22đ2đż(đ) .
We will now use the following extremely simple but useful factknown as the averaging principle (see also Lemma 18.10): for everyrandom variable đ, if đŒ[đ] = đ, then with positive probability đ †đ.(Indeed, if đ > đ with probability one, then the expected value of đwill have to be larger than đ, just like you canât have a class in whichall students got A or A- and yet the overall average is B+.) In our caseit means that with positive probability âđâ0,1đ đđ †22đ2đż(đ) . In otherwords, there exists some đ„1 â 0, 1đż(đ) such that âđâ0,1đ đđ(đ„1) â€22đ2đż(đ) . Yet this means that if we choose a random đ ⌠0, 1đ, thenthe probability that đžđ(đ„1) â đ0 is at most 12đ â 22đ2đż(đ) = 2đâđż(đ).So, in particular if we have an algorithm EVE that outputs 0 if đ„ âđ0 and outputs 1 otherwise, then Pr[EVE(đžđ(đ„0)) = 0] = 1 andPr[EVE(đžđ(đ„1)) = 0] †2đâđż(đ) which will be smaller than 2â10 < 0.01if đż(đ) â„ đ + 10.
cryptography 597
In retrospect Theorem 21.10 is perhaps not surprising. After all, asweâve mentioned before it is known that the Optimal PRG conjecture(which is the basis for the derandomized one-time pad encryption) isfalse if P = NP (and in fact even if NP â BPP or even NP â P/poly).21.8 PUBLIC KEY CRYPTOGRAPHY
People have been dreaming about heavier-than-air flight since at leastthe days of Leonardo Da Vinci (not to mention Icarus from the greekmythology). Jules Verne wrote with rather insightful details aboutgoing to the moon in 1865. But, as far as I know, in all the thousandsof years people have been using secret writing, until about 50 yearsago no one has considered the possibility of communicating securelywithout first exchanging a shared secret key.
Yet in the late 1960âs and early 1970âs, several people started toquestion this âcommon wisdomâ. Perhaps the most surprising ofthese visionaries was an undergraduate student at Berkeley namedRalph Merkle. In the fall of 1974 Merkle wrote in a project proposalfor his computer security course that while âit might seem intuitivelyobvious that if two people have never had the opportunity to prear-range an encryption method, then they will be unable to communicatesecurely over an insecure channel⊠I believe it is falseâ. The projectproposal was rejected by his professor as ânot good enoughâ. Merklelater submitted a paper to the communication of the ACM where heapologized for the lack of references since he was unable to find anymention of the problem in the scientific literature, and the only sourcewhere he saw the problem even raised was in a science fiction story.The paper was rejected with the comment that âExperience shows thatit is extremely dangerous to transmit key information in the clear.âMerkle showed that one can design a protocol where Alice and Bobcan use đ invocations of a hash function to exchange a key, but anadversary (in the random oracle model, though he of course didnâtuse this name) would need roughly đ 2 invocations to break it. Heconjectured that it may be possible to obtain such protocols wherebreaking is exponentially harder than using them, but could not think ofany concrete way to doing so.
We only found out much later that in the late 1960âs, a few yearsbefore Merkle, James Ellis of the British Intelligence agency GCHQwas having similar thoughts. His curiosity was spurred by an oldWorld-War II manuscript from Bell labs that suggested the followingway that two people could communicate securely over a phone line.Alice would inject noise to the line, Bob would relay his messages,and then Alice would subtract the noise to get the signal. The idea isthat an adversary over the line sees only the sum of Aliceâs and Bobâssignals, and doesnât know what came from what. This got James Ellis
598 introduction to theoretical computer science
thinking whether it would be possible to achieve something like thatdigitally. As Ellis later recollected, in 1970 he realized that in princi-ple this should be possible, since he could think of an hypotheticalblack box đ” that on input a âhandleâ đŒ and plaintext đ„ would give aâciphertextâ đŠ and that there would be a secret key đœ correspondingto đŒ, such that feeding đœ and đŠ to the box would recover đ„. However,Ellis had no idea how to actually instantiate this box. He and otherskept giving this question as a puzzle to bright new recruits until oneof them, Clifford Cocks, came up in 1973 with a candidate solutionloosely based on the factoring problem; in 1974 another GCHQ re-cruit, Malcolm Williamson, came up with a solution using modularexponentiation.
But among all those thinking of public key cryptography, probablythe people who saw the furthest were two researchers at Stanford,Whit Diffie and Martin Hellman. They realized that with the adventof electronic communication, cryptography would find new applica-tions beyond the military domain of spies and submarines, and theyunderstood that in this new world of many users and point to pointcommunication, cryptography will need to scale up. Diffie and Hell-man envisioned an object which we now call âtrapdoor permutationâthough they called âone way trapdoor functionâ or sometimes simplyâpublic key encryptionâ. Though they didnât have full formal defini-tions, their idea was that this is an injective function that is easy (e.g.,polynomial-time) to compute but hard (e.g., exponential-time) to in-vert. However, there is a certain trapdoor, knowledge of which wouldallow polynomial time inversion. Diffie and Hellman argued that us-ing such a trapdoor function, it would be possible for Alice and Bobto communicate securely without ever having exchanged a secret key. Butthey didnât stop there. They realized that protecting the integrity ofcommunication is no less important than protecting its secrecy. Thusthey imagined that Alice could ârun encryption in reverseâ in order tocertify or sign messages.
At the point, Diffie and Hellman were in a position not unlikephysicists who predicted that a certain particle should exist but with-out any experimental verification. Luckily they met Ralph Merkle,and his ideas about a probabilistic key exchange protocol, together witha suggestion from their Stanford colleague John Gill, inspired themto come up with what today is known as the Diffie Hellman Key Ex-change (which unbeknownst to them was found two years earlier atGCHQ by Malcolm Williamson). They published their paper âNewDirections in Cryptographyâ in 1976, and it is considered to havebrought about the birth of modern cryptography.
The Diffie-Hellman Key Exchange is still widely used today forsecure communication. However, it still felt short of providing Diffie
cryptography 599
Figure 21.13: Top left: Ralph Merkle, Martin Hellmanand Whit Diffie, who together came up in 1976with the concept of public key encryption and a keyexchange protocol. Bottom left: Adi Shamir, Ron Rivest,and Leonard Adleman who, following Diffie andHellmanâs paper, discovered the RSA function thatcan be used for public key encryption and digitalsignatures. Interestingly, one can see the equationP = NP on the blackboard behind them. Right: JohnGill, who was the first person to suggest to Diffie andHellman that they use modular exponentiation as aneasy-to-compute but hard-to-invert function.
Figure 21.14: In a public key encryption, Alice generatesa private/public keypair (đ, đ), publishes đ and keepsđ secret. To encrypt a message for Alice, one onlyneeds to know đ. To decrypt it we need to know đ.
and Hellmanâs elusive trapdoor function. This was done the next yearby Rivest, Shamir and Adleman who came up with the RSA trapdoorfunction, which through the framework of Diffie and Hellman yieldednot just encryption but also signatures. (A close variant of the RSAfunction was discovered earlier by Clifford Cocks at GCHQ, thoughas far as I can tell Cocks, Ellis and Williamson did not realize theapplication to digital signatures.) From this point on began a flurry ofadvances in cryptography which hasnât died down till this day.
21.8.1 Defining public key encryptionA public key encryption consists of a triple of algorithms:
âą The key generation algorithm, which we denote by đŸđđŠđșđđ or KG forshort, is a randomized algorithm that outputs a pair of strings (đ, đ)where đ is known as the public (or encryption) key, and đ is knownas the private (or decryption) key. The key generation algorithm getsas input 1đ (i.e., a string of ones of length đ). We refer to đ as thesecurity parameter of the scheme. The bigger we make đ, the moresecure the encryption will be, but also the less efficient it will be.
âą The encryption algorithm, which we denote by đž, takes the encryp-tion key đ and a plaintext đ„, and outputs the ciphertext đŠ = đžđ(đ„).
âą The decryption algorithm, which we denote by đ·, takes the decryp-tion key đ and a ciphertext đŠ, and outputs the plaintext đ„ = đ·đ(đŠ).We now make this a formal definition:
Definition 21.11 â Public Key Encryption. A computationally secret publickey encryption with plaintext length đż ⶠâ â â is a triple of ran-domized polynomial-time algorithms (KG, đž, đ·) that satisfy thefollowing conditions:
âą For every đ, if (đ, đ) is output by KG(1đ) with positive proba-bility, and đ„ â 0, 1đż(đ), then đ·đ(đžđ(đ„)) = đ„ with probabilityone.
âą For every polynomial đ, and sufficiently large đ, if đ is a NAND-CIRC program of at most đ(đ) lines then for every đ„, đ„âČ â0, 1đż(đ), |đŒ[đ (đ, đžđ(đ„))] â đŒ[đ (đ, đžđ(đ„âČ))]| < 1/đ(đ), wherethis probability is taken over the coins of KG and đž.
Definition 21.11 allows đž and đ· to be randomized algorithms. Infact, it turns out that it is necessary for đž to be randomized to obtaincomputational secrecy. It also turns out that, unlike the private keycase, we can transform a public-key encryption that works for mes-sages that are only one bit long into a public-key encryption scheme
600 introduction to theoretical computer science
that can encrypt arbitrarily long messages, and in particular messagesthat are longer than the key. In particular this means that we cannot ob-tain a perfectly secret public-key encryption scheme even for one-bitlong messages (since it would imply a perfectly secret public-key, andhence in particular private-key, encryption with messages longer thanthe key).
We will not give full constructions for public key encryptionschemes in this chapter, but will mention some of the ideas thatunderlie the most widely used schemes today. These generally belongto one of two families:
âą Group theoretic constructions based on problems such as integer factor-ing and the discrete logarithm over finite fields or elliptic curves.
âą Lattice/coding based constructions based on problems such as theclosest vector in a lattice or bounded distance decoding.
Group-theory based encryptions such as the RSA cryptosystem, theDiffie-Hellman protocol, and Elliptic-Curve Cryptography, are cur-rently more widely implemented. But the lattice/coding schemes arerecently on the rise, particularly because the known group theoreticencryption schemes can be broken by quantum computers, which wediscuss in Chapter 23.
21.8.2 Diffie-Hellman key exchangeAs just one example of how public key encryption schemes are con-structed, let us now describe the Diffie-Hellman key exchange. Wedescribe the Diffie-Hellman protocol in a somewhat of an informallevel, without presenting a full security analysis.
The computational problem underlying the Diffie Hellman protocolis the discrete logarithm problem. Letâs suppose that đ is some integer.We can compute the map đ„ ⊠đđ„ and also its inverse đŠ ⊠logđ đŠ. (Forexample, we can compute a logarithm is by binary search: start withsome interval [đ„đđđ, đ„đđđ„] that is guaranteed to contain logđ đŠ. We canthen test whether the intervalâs midpoint đ„đđđ satisfies đđ„đđđ > đŠ, andbased on that halve the size of the interval.)
However, suppose now that we use modular arithmetic and workmodulo some prime number đ. If đ has đ binary digits and đ is in [đ]then we can compute the map đ„ ⊠đđ„ mod đ in time polynomial in đ.(This is not trivial, and is a great exercise for you to work this out; as ahint, start by showing that one can compute the map đ ⊠đ2đ mod đusing đ modular multiplications modulo đ, if youâre stumped, youcan look up this Wikipedia entry.) On the other hand, because of theâwraparoundâ property of modular arithmetic, we cannot run binarysearch to find the inverse of this map (known as the discrete logarithm).
cryptography 601
In fact, there is no known polynomial-time algorithm for computingthis discrete logarithm map (đ, đ„, đ) ⊠logđ đ„ mod đ, where we definelogđ đ„ mod đ as the number đ â [đ] such that đđ = đ„ mod đ.
The Diffie-Hellman protocol for Bob to send a message to Alice is asfollows:
âą Alice: Chooses đ to be a random đ bit long prime (which can bedone by choosing random numbers and running a primality testingalgorithm on them), and đ and đ at random in [đ]. She sends to Bobthe triple (đ, đ, đđ mod đ).
âą Bob: Given the triple (đ, đ, â), Bob sends a message đ„ â 0, 1đżto Alice by choosing đ at random in [đ], and sending to Alice thepair (đđ mod đ, đđđ(âđ mod đ) â đ„) where đđđ ⶠ[đ] â 0, 1âis some ârepresentation functionâ that maps [đ] to 0, 1đż. (Thefunction đđđ does not need to be one-to-one and you can think ofđđđ(đ§) as simply outputting đż of the bits of đ§ in the natural binaryrepresentation, it does need to satisfy certain technical conditionswhich we omit in this description.)
âą Alice: Given đâČ, đ§, Alice recovers đ„ by outputting đđđ(đâČđ mod đ) âđ§.The correctness of the protocol follows from the simple fact that(đđ)đ = (đđ)đ for every đ, đ, đ and this still holds if we work modulođ. Its security relies on the computational assumption that computing
this map is hard, even in a certain âaverage caseâ sense (this computa-tional assumption is known as the Decisional Diffie Hellman assump-tion). The Diffie-Hellman key exchange protocol can be thought of asa public key encryption where the Aliceâs first message is the publickey, and Bobâs message is the encryption.
One can think of the Diffie-Hellman protocol as being based ona âtrapdoor pseudorandom generatorâ whereas the triple đđ, đđ, đđđlooks ârandomâ to someone that doesnât know đ, but someone thatdoes know đ can see that raising the second element to the đ-th poweryields the third element. The Diffie-Hellman protocol can be describedabstractly in the context of any finite Abelian group for which we canefficiently compute the group operation. It has been implementedon other groups than numbers modulo đ, and in particular EllipticCurve Cryptography (ECC) is obtained by basing the Diffie Hell-man on elliptic curve groups which gives some practical advantages.Another common group theoretic basis for key-exchange/public keyencryption protocol is the RSA function. A big disadvantage of Diffie-Hellman (both the modular arithmetic and elliptic curve variants)and RSA is that both schemes can be broken in polynomial time by a
602 introduction to theoretical computer science
quantum computer. We will discuss quantum computing later in thiscourse.
21.9 OTHER SECURITY NOTIONS
There is a great deal to cryptography beyond just encryption schemes,and beyond the notion of a passive adversary. A central objectiveis integrity or authentication: protecting communications from beingmodified by an adversary. Integrity is often more fundamental thansecrecy: whether it is a software update or viewing the news, youmight often not care about the communication being secret as much asthat it indeed came from its claimed source. Digital signature schemesare the analog of public key encryption for authentication, and arewidely used (in particular as the basis for public key certificates) toprovide a foundation of trust in the digital world.
Similarly, even for encryption, we often need to ensure securityagainst active attacks, and so notions such as non-malleability andadaptive chosen ciphertext security have been proposed. An encryp-tion scheme is only as secure as the secret key, and mechanisms tomake sure the key is generated properly, and is protected against re-fresh or even compromise (i.e., forward secrecy) have been studied aswell. Hopefully this chapter provides you with some appreciation forcryptography as an intellectual field, but does not imbue you with afalse self confidence in implementing it.
Cryptographic hash functions is another widely used tool with a va-riety of uses, including extracting randomness from high entropysources, achieving hard-to-forge short âdigestsâ of files, protectingpasswords, and much more.
21.10 MAGIC
Beyond encryption and signature schemes, cryptographers have man-aged to obtain objects that truly seem paradoxical and âmagicalâ. Webriefly discuss some of these objects. We do not give any details, buthopefully this will spark your curiosity to find out more.
21.10.1 Zero knowledge proofsOn October 31, 1903, the mathematician Frank Nelson Cole,gave an hourlong lecture to a meeting of the American Mathe-matical Society where he did not speak a single word. Rather,he calculated on the board the value 267 â 1 which is equal to147, 573, 952, 589, 676, 412, 927, and then showed that this number isequal to 193, 707, 721 Ă 761, 838, 257, 287. Coleâs proof showed that267 â 1 is not a prime, but it also revealed additional information,namely its actual factors. This is often the case with proofs: they teachus more than just the validity of the statements.
cryptography 603
In Zero Knowledge Proofs we try to achieve the opposite effect. Wewant a proof for a statement đ where we can rigorously show that theproofs reveals absolutely no additional information about đ beyond thefact that it is true. This turns out to be an extremely useful object fora variety of tasks including authentication, secure protocols, voting,anonymity in cryptocurrencies, and more. Constructing these ob-jects relies on the theory of NP completeness. Thus this theory thatoriginally was designed to give a negative result (show that some prob-lems are hard) ended up yielding positive applications, enabling us toachieve tasks that were not possible otherwise.
21.10.2 Fully homomorphic encryptionSuppose that we are given a bit-by-bit encryption of a stringđžđ(đ„0), ⊠, đžđ(đ„đâ1). By design, these ciphertexts are supposed tobe âcompletely unscrutableâ and we should not be able to extractany information about đ„đâs from it. However, already in 1978, Rivest,Adleman and Dertouzos observed that this does not imply that wecould not manipulate these encryptions. For example, it turns out thesecurity of an encryption scheme does not immediately rule out theability to take a pair of encryptions đžđ(đ) and đžđ(đ) and computefrom them đžđ(đNANDđ) without knowing the secret key đ. But do thereexist encryption schemes that allow such manipulations? And if so, isthis a bug or a feature?
Rivest et al already showed that such encryption schemes couldbe immensely useful, and their utility has only grown in the age ofcloud computing. After all, if we can compute NAND then we canuse this to run any algorithm đ on the encrypted data, and mapđžđ(đ„0), ⊠, đžđ(đ„đâ1) to đžđ(đ (đ„0, ⊠, đ„đâ1)). For example, a clientcould store their secret data đ„ in encrypted form on the cloud, andhave the cloud provider perform all sorts of computation on thesedata without ever revealing to the provider the private key, and sowithout the provider ever learning any information about the secretdata.
The question of existence of such a scheme took much longer timeto resolve. Only in 2009 Craig Gentry gave the first construction ofan encryption scheme that allows to compute a universal basis ofgates on the data (known as a Fully Homomorphic Encryption scheme incrypto parlance). Gentryâs scheme left much to be desired in terms ofefficiency, and improving upon it has been the focus of an intensiveresearch program that has already seen significant improvements.
21.10.3 Multiparty secure computationCryptography is about enabling mutually distrusting parties toachieve a common goal. Perhaps the most general primitive achiev-
604 introduction to theoretical computer science
ing this objective is secure multiparty computation. The idea in securemultiparty computation is that đ parties interact together to computesome function đč(đ„0, ⊠, đ„đâ1) where đ„đ is the private input of the đ-thparty. The crucial point is that there is no commonly trusted party orauthority and that nothing is revealed about the secret data beyond thefunctionâs output. One example is an electronic voting protocol whereonly the total vote count is revealed, with the privacy of the individualvoters protected, but without having to trust any authority to eithercount the votes correctly or to keep information confidential. Anotherexample is implementing a second price (aka Vickrey) auction wheređ â 1 parties submit bids to an item owned by the đ-th party, and theitem goes to the highest bidder but at the price of the second highest bid.Using secure multiparty computation we can implement second priceauction in a way that will ensure the secrecy of the numerical valuesof all bids (including even the top one) except the second highest one,and the secrecy of the identity of all bidders (including even the sec-ond highest bidder) except the top one. We emphasize that such aprotocol requires no trust even in the auctioneer itself, that will alsonot learn any additional information. Secure multiparty computationcan be used even for computing randomized processes, with one exam-ple being playing Poker over the net without having to trust any serverfor correct shuffling of cards or not revealing the information.
Chapter Recap
âą We can formally define the notion of security of anencryption scheme.
âą Perfect secrecy ensures that an adversary does notlearn anything about the plaintext from the cipher-text, regardless of their computational powers.
âą The one-time pad is a perfectly secret encryptionwith the length of the key equaling the length ofthe message. No perfectly secret encryption canhave key shorter than the message.
âą Computational secrecy can be as good as perfectsecrecy since it ensures that the advantage thatcomputationally bounded adversaries gain fromobserving the ciphertext is exponentially small. Ifthe optimal PRG conjecture is true then there existsa computationally secret encryption scheme withmessages that can be (almost) exponentially biggerthan the key.
âą There are many cryptographic tools that go well be-yond private key encryption. These include publickey encryption, digital signatures and hash functions,as well as more âmagicalâ tools such as multipartysecure computation, fully homomorphic encryption, zeroknowledge proofs, and many others.
cryptography 605
21.11 EXERCISES
21.12 BIBLIOGRAPHICAL NOTES
Much of this text is taken from my lecture notes on cryptography.Shannonâs manuscript was written in 1945 but was classified, and a
partial version was only published in 1949. Still it has revolutionizedcryptography, and is the forerunner to much of what followed.
The Venona projectâs history is described in this document. Asidefrom Grabeel and Zubko, credit to the discovery that the Soviets werereusing keys is shared by Lt. Richard Hallock, Carrie Berry, FrankLewis, and Lt. Karl Elmquist, and there are others that have madeimportant contribution to this project. See pages 27 and 28 in thedocument.
In a 1955 letter to the NSA that only recently came forward, JohnNash proposed an âunbreakableâ encryption scheme. He wrote âIhope my handwriting, etc. do not give the impression I am just a crank orcircle-squarerâŠ. The significance of this conjecture [that certain encryptionschemes are exponentially secure against key recovery attacks] .. is that it isquite feasible to design ciphers that are effectively unbreakable.â. John Nashmade seminal contributions in mathematics and game theory, and wasawarded both the Abel Prize in mathematics and the Nobel MemorialPrize in Economic Sciences. However, he has struggled with mentalillness throughout his life. His biography, A Beautiful Mind was madeinto a popular movie. It is natural to compare Nashâs 1955 letter to theNSA to Gödelâs letter to von Neumann we mentioned before. Fromthe theoretical computer science point of view, the crucial differenceis that while Nash informally talks about exponential vs polynomialcomputation time, he does not mention the word âTuring Machineâor other models of computation, and it is not clear if he is aware or notthat his conjecture can be made mathematically precise (assuming aformalization of âsufficiently complex types of encipheringâ).
The definition of computational secrecy we use is the notion ofcomputational indistinguishability (known to be equivalent to semanticsecurity) that was given by Goldwasser and Micali in 1982.
Although they used a different terminology, Diffie and Hellmanalready made clear in their paper that their protocol can be used asa public key encryption, with the first message being put in a âpub-lic fileâ. In 1985, ElGamal showed how to obtain a signature schemebased on the Diffie Hellman ideas, and since he described the Diffie-Hellman encryption scheme in the same paper, the public key encryp-tion scheme originally proposed by Diffie and Hellman is sometimesalso known as ElGamal encryption.
My survey contains a discussion on the different types of public keyassumptions. While the standard elliptic curve cryptographic schemes
606 introduction to theoretical computer science
are as susceptible to quantum computers as Diffie-Hellman and RSA,their main advantage is that the best known classical algorithms forcomputing discrete logarithms over elliptic curve groups take time 2đđfor some đ > 0 where đ is the number of bits to describe a group ele-ment. In contrast, for the multiplicative group modulo a prime đ thebest algorithm take time 2đ(đ1/3đđđđŠđđđ(đ)) which means that (assum-ing the known algorithms are optimal) we need to set the prime to bebigger (and so have larger key sizes with corresponding overhead incommunication and computation) to get the same level of security.
Zero-knowledge proofs were constructed by Goldwasser, Micali,and Rackoff in 1982, and their wide applicability was shown (usingthe theory of NP completeness) by Goldreich, Micali, and Wigdersonin 1986.
Two party and multiparty secure computation protocols were con-structed (respectively) by Yao in 1982 and Goldreich, Micali, andWigderson in 1987. The latter work gave a general transformationfrom security against passive adversaries to security against activeadversaries using zero knowledge proofs.
22Proofs and algorithms
âLetâs not try to define knowledge, but try to define zero-knowledge.â, ShafiGoldwasser.
Proofs have captured human imagination for thousands of years,ever since the publication of Euclidâs Elements, a book second only tothe bible in the number of editions printed.
Plan:
âą Proofs and algorithms
âą Interactive proofs
âą Zero knowledge proofs
âą Propositions as types, Coq and other proof assistants.
22.1 EXERCISES
22.2 BIBLIOGRAPHICAL NOTES
Compiled on 8.26.2020 18:10
23Quantum computing
âWe always have had (secret, secret, close the doors!) ⊠a great deal of diffi-culty in understanding the world view that quantum mechanics represents âŠIt has not yet become obvious to me that thereâs no real problem. ⊠Can I learnanything from asking this question about computersâabout this may or maynot be mystery as to what the world view of quantum mechanics is?â , RichardFeynman, 1981
âThe only difference between a probabilistic classical world and the equationsof the quantum world is that somehow or other it appears as if the probabilitieswould have to go negativeâ, Richard Feynman, 1981
There were two schools of natural philosophy in ancient Greece.Aristotle believed that objects have an essence that explains their behav-ior, and a theory of the natural world has to refer to the reasons (or âfi-nal causeâ to use Aristotleâs language) as to why they exhibit certainphenomena. Democritus believed in a purely mechanistic explanationof the world. In his view, the universe was ultimately composed ofelementary particles (or Atoms) and our observed phenomena arisefrom the interactions between these particles according to some localrules. Modern science (arguably starting with Newton) has embracedDemocritusâ point of view, of a mechanistic or âclockworkâ universeof particles and forces acting upon them.
While the classification of particles and forces evolved with time,to a large extent the âbig pictureâ has not changed from Newton tillEinstein. In particular it was held as an axiom that if we knew fullythe current state of the universe (i.e., the particles and their propertiessuch as location and velocity) then we could predict its future state atany point in time. In computational language, in all these theories thestate of a system with đ particles could be stored in an array of đ(đ)numbers, and predicting the evolution of the system can be done byrunning some efficient (e.g., đđđđŠ(đ) time) deterministic computationon this array.
Compiled on 8.26.2020 18:10
Learning Objectives:âą See main aspects in which quantum
mechanics differs from local deterministictheories.
âą Model of quantum circuits, or equivalentlyQNAND-CIRC programs
âą The complexity class BQP and what we knowabout its relation to other classes
âą Ideas behind Shorâs Algorithm and theQuantum Fourier Transform
610 introduction to theoretical computer science
Figure 23.1: In the âdouble baseball experimentâ weshoot baseballs from a gun at a soft wall through ahard barrier that has one or two slits open in it. Thereis only âconstructive interferenceâ in the sense thatthe dent in each position in the wall when both slitsare open is the sum of the dents when each slit isopen on its own.
Figure 23.2: The setup of the double slit experimentin the case of photon or electron guns. We see alsodestructive interference in the sense that there aresome positions on the wall that get fewer hits whenboth slits are open than they get when only one of theslits is open. See also this video.
23.1 THE DOUBLE SLIT EXPERIMENT
Alas, in the beginning of the 20th century, several experimental re-sults were calling into question this âclockworkâ or âbilliard ballâtheory of the world. One such experiment is the famous double slit ex-periment. Here is one way to describe it. Suppose that we buy one ofthose baseball pitching machines, and aim it at a soft plastic wall, butput a metal barrier with a single slit between the machine and the plasticwall (see Fig. 23.1). If we shoot baseballs at the plastic wall, then someof the baseballs would bounce off the metal barrier, while some wouldmake it through the slit and dent the wall. If we now carve out an ad-ditional slit in the metal barrier then more balls would get through,and so the plastic wall would be even more dented.
So far this is pure common sense, and it is indeed (to my knowl-edge) an accurate description of what happens when we shoot base-balls at a plastic wall. However, this is not the same when we shootphotons. Amazingly, if we shoot with a âphoton gunâ (i.e., a laser) ata wall equipped with photon detectors through some barrier, then(as shown in Fig. 23.2) in some positions of the wall we will see fewerhits when the two slits are open than when only one of them is!. Inparticular there are positions in the wall that are hit when the first slitis open, hit when the second gun is open, but are not hit at all when bothslits are open!.
It seems as if each photon coming out of the gun is aware of theglobal setup of the experiment, and behaves differently if two slits areopen than if only one is. If we try to âcatch the photon in the actâ andplace a detector right next to each slit so we can see exactly the patheach photon takes then something even more bizarre happens. Themere fact that we measure the path changes the photonâs behavior, andnow this âdestructive interferenceâ pattern is gone and the numberof times a position is hit when two slits are open is the sum of thenumber of times it is hit when each slit is open.
PYou should read the paragraphs above more thanonce and make sure you appreciate how truly mindboggling these results are.
23.2 QUANTUM AMPLITUDES
The double slit and other experiments ultimately forced scientists toaccept a very counterintuitive picture of the world. It is not merelyabout nature being randomized, but rather it is about the probabilitiesin some sense âgoing negativeâ and cancelling each other!
quantum computing 611
To see what we mean by this, let us go back to the baseball exper-iment. Suppose that the probability a ball passes through the left slitis đđż and the probability that it passes through the right slit is đđ .Then, if we shoot đ balls out of each gun, we expect the wall will behit (đđż + đđ )đ times. In contrast, in the quantum world of photonsinstead of baseballs, it can sometimes be the case that in both the firstand second case the wall is hit with positive probabilities đđż and đđ respectively but somehow when both slits are open the wall (or a par-ticular position in it) is not hit at all. Itâs almost as if the probabilitiescan âcancel each other outâ.
To understand the way we model this in quantum mechanics, it ishelpful to think of a âlazy evaluationâ approach to probability. Wecan think of a probabilistic experiment such as shooting a baseballthrough two slits in two different ways:âą When a ball is shot, ânatureâ tosses a coin and decides if it will go
through the left slit (which happens with probability đđż), right slit(which happens with probability đđ ), or bounce back. If it passesthrough one of the slits then it will hit the wall. Later we can look atthe wall and find out whether or not this event happened, but thefact that the event happened or not is determined independently ofwhether or not we look at the wall.
âą The other viewpoint is that when a ball is shot, ânatureâ computesthe probabilities đđż and đđ as before, but does not yet âtoss thecoinâ and determines what happened. Only when we actuallylook at the wall, nature tosses a coin and with probability đđż + đđ ensures we see a dent. That is, nature uses âlazy evaluationâ, andonly determines the result of a probabilistic experiment when wedecide to measure it.While the first scenario seems much more natural, the end result
in both is the same (the wall is hit with probability đđż + đđ ) and sothe question of whether we should model nature as following the firstscenario or second one seems like asking about the proverbial tree thatfalls in the forest with no one hearing about it.
However, when we want to describe the double slit experimentwith photons rather than baseballs, it is the second scenario that lendsitself better to a quantum generalization. Quantum mechanics as-sociates a number đŒ known as an amplitude with each probabilisticexperiment. This number đŒ can be negative, and in fact even complex.We never observe the amplitudes directly, since whenever we mea-sure an event with amplitude đŒ, nature tosses a coin and determinesthat the event happens with probability |đŒ|2. However, the sign (orin the complex case, phase) of the amplitudes can affect whether twodifferent events have constructive or destructive interference.
612 introduction to theoretical computer science
Specifically, consider an event that can either occur or not (e.g. âde-tector number 17 was hit by a photonâ). In classical probability, wemodel this by a probability distribution over the two outcomes: a pairof non-negative numbers đ and đ such that đ + đ = 1, where đ corre-sponds to the probability that the event occurs and đ corresponds tothe probability that the event does not occur. In quantum mechanics,we model this also by pair of numbers, which we call amplitudes. Thisis a pair of (potentially negative or even complex) numbers đŒ and đœsuch that |đŒ|2 + |đœ|2 = 1. The probability that the event occurs is |đŒ|2and the probability that it does not occur is |đœ|2. In isolation, thesenegative or complex numbers donât matter much, since we anywaysquare them to obtain probabilities. But the interaction of positive andnegative amplitudes can result in surprising cancellations where some-how combining two scenarios where an event happens with positiveprobability results in a scenario where it never does.
PIf you donât find the above description confusing andunintuitive, you probably didnât get it. Please makesure to re-read the above paragraphs until you arethoroughly confused.
Quantum mechanics is a mathematical theory that allows us tocalculate and predict the results of the double-slit and many other ex-periments. If you think of quantum mechanics as an explanation as towhat âreallyâ goes on in the world, it can be rather confusing. How-ever, if you simply âshut up and calculateâ then it works amazinglywell at predicting experimental results. In particular, in the doubleslit experiment, for any position in the wall, we can compute num-bers đŒ and đœ such that photons from the first and second slit hit thatposition with probabilities |đŒ|2 and |đœ|2 respectively. When we openboth slits, the probability that the position will be hit is proportionalto |đŒ + đœ|2, and so in particular, if đŒ = âđœ then it will be the case that,despite being hit when either slit one or slit two are open, the positionis not hit at all when they both are. If you are confused by quantummechanics, you are not alone: for decades people have been trying tocome up with explanations for âthe underlying realityâ behind quan-tum mechanics, including Bohmian Mechanics, Many Worlds andothers. However, none of these interpretations have gained universalacceptance and all of those (by design) yield the same experimentalpredictions. Thus at this point many scientists prefer to just ignore thequestion of what is the âtrue realityâ and go back to simply âshuttingup and calculatingâ.
quantum computing 613
RRemark 23.1 â Complex vs real, other simplifications. If(like the author) you are a bit intimidated by complexnumbers, donât worry: you can think of all ampli-tudes as real (though potentially negative) numberswithout loss of understanding. All the âmagicâ ofquantum computing already arises in this case, andso we will often restrict attention to real amplitudes inthis chapter.We will also only discuss so-called pure quantumstates, and not the more general notion of mixed states.Pure states turn out to be sufficient for understandingthe algorithmic aspects of quantum computing.More generally, this chapter is not meant to be a com-plete description of quantum mechanics, quantuminformation theory, or quantum computing, but ratherillustrate the main points where these differ fromclassical computing.
23.2.1 Linear algebra quick reviewLinear algebra underlies much of quantum mechanics, and so youwould do well to review some of the basic notions such as vectors,matrices, and linear subspaces. The operations in quantum mechanicscan be represented as linear functions over the complex numbers, butwe stick to the real numbers in this chapter. This does not cause muchloss in understanding but does allow us to simplify our notation andeliminate the use of the complex conjugate.
The main notions we use are:
âą A function đč ⶠâđ â âđ is linear if đč(đŒđą + đœđŁ) = đŒđč(đą) + đœđč(đŁ) forevery đŒ, đœ â â and đą, đŁ â âđ .
âą The inner product of two vectors đą, đŁ â âđ can be defined as âšđą, đŁâ© =âđâ[đ] đąđđŁđ. (There can be different inner products but we stickto this one.) The norm of a vector đą â âđ is defined as âđąâ =ââšđą, đąâ© = ââđâ[đ] đą2đ . We say that đą is a unit vector if âđąâ = 1.
âą Two vectors đą, đŁ â âđ are orthogonal if âšđą, đŁâ© = 0. An orthonormalbasis for âđ is a set of đ vectors đŁ0, đŁ1, ⊠, đŁđâ1 such that âđŁđâ = 1for every đ â [đ] and âšđŁđ, đŁđâ© = 0 for every đ â đ. A canoncialexample is the standard basis đ0, ⊠, đđâ1, where đđ is the vector thathas zeroes in all cooordinates except the đ-th coordinate in whichits value is 1. A quirk of the quantum mechanics literature is thatđđ is often denoted by |đâ©. We often consider the case đ = 2đ, inwhich case we identify [đ] with 0, 1đ and for every đ„ â 0, 1đ,we denote the standard basis element corresponding to the đ„-thcoordinate by |đ„â©.
614 introduction to theoretical computer science
1 If you are extremely paranoid about Alice and Bobcommunicating with one another, you can coordinatewith your assistant to perform the experiment exactlyat the same time, and make sure that the roomsare sufficiently far apart (e.g., are on two differentcontinents, or maybe even one is on the moon andanother is on earth) so that Alice and Bob couldnâtcommunicate to each other in time the results of theirrespective coins even if they do so at the speed oflight.
âą If đą is a vector in âđ and đŁ0, ⊠, đŁđâ1 is an orthonormal basis forâđ , then there are coefficients đŒ0, ⊠, đŒđâ1 such that đą = đŒ0đŁ0 +⯠+ đŒđâ1đŁđâ1. Consequently, the value đč(đą) is determined by thevalues đč(đŁ0), âŠ, đč(đŁđâ1). Moreover, âđąâ = ââđâ[đ] đŒ2đ .
âą We can represent a linear function đč ⶠâđ â âđ as an đ Ă đmatrix đ(đč) where the coordinate in the đ-th row and đ-th columnof đ(đč) (that is đ(đč)đ,đ) is equal to âšđđ, đč (đđ)â© or equivalently theđ-th coordinate of đč(đđ).
âą A linear function đč ⶠâđ â âđ such that âđč(đą)â = âđąâ for everyđą is called unitary. It can be shown that a function đč is unitary ifand only if đ(đč)đ(đč)†= đŒ where †is the transpose operator(in the complex case the conjugate transpose) and đŒ is the đ Ă đidentity matrix that has 1âs on the diagonal and zeroes everywhereelse. (For every two matrices đŽ, đ”, we use đŽđ” to denote the matrixproduct of đŽ and đ”.) Another equivalent characterization of thiscondition is that đ(đč)†= đ(đč)â1 and yet another is that both therows and columns of đ(đč) form an orthonormal basis.
23.3 BELLâS INEQUALITY
There is something weird about quantum mechanics. In 1935 Einstein,Podolsky and Rosen (EPR) tried to pinpoint this issue by highlightinga previously unrealized corollary of this theory. They showed thatthe idea that nature does not determine the results of an experimentuntil it is measured results in so called âspooky action at a distanceâ.Namely, making a measurement of one object may instantaneouslyeffect the state (i.e., the vector of amplitudes) of another object in theother end of the universe.
Since the vector of amplitudes is just a mathematical abstraction,the EPR paper was considered to be merely a thought experiment forphilosophers to be concerned about, without bearing on experiments.This changed when in 1965 John Bell showed an actual experimentto test the predictions of EPR and hence pit intuitive common senseagainst quantum mechanics. Quantum mechanics won: it turns outthat it is in fact possible to use measurements to create correlationsbetween the states of objects far removed from one another that cannotbe explained by any prior theory. Nonetheless, since the results ofthese experiments are so obviously wrong to anyone that has ever satin an armchair, that there are still a number of Bell denialists arguingthat this canât be true and quantum mechanics is wrong.
So, what is this Bellâs Inequality? Suppose that Alice and Bob tryto convince you they have telepathic ability, and they aim to prove itvia the following experiment. Alice and Bob will be in separate closed
quantum computing 615
rooms.1 You will interrogate Alice and your associate will interrogateBob. You choose a random bit đ„ â 0, 1 and your associate choosesa random đŠ â 0, 1. We let đ be Aliceâs response and đ be Bobâsresponse. We say that Alice and Bob win this experiment if đ â đ =đ„ ⧠đŠ. In other words, Alice and Bob need to output two bits thatdisagree if đ„ = đŠ = 1 and agree otherwise.
Now if Alice and Bob are not telepathic, then they need to agree inadvance on some strategy. Itâs not hard for Alice and Bob to succeedwith probability 3/4: just always output the same bit. Moreover, bydoing some case analysis, we can show that no matter what strategythey use, Alice and Bob cannot succeed with higher probability thanthat:
Theorem 23.2 â Bellâs Inequality. For every two functions đ, đ ⶠ0, 1 â0, 1, Prđ„,đŠâ0,1[đ(đ„) â đ(đŠ) = đ„ ⧠đŠ] †3/4.Proof. Since the probability is taken over all four choices of đ„, đŠ â0, 1, the only way the theorem can be violated if if there exist twofunctions đ, đ that satisfy đ(đ„) â đ(đŠ) = đ„ ⧠đŠ (23.1)
for all the four choices of đ„, đŠ â 0, 12. Letâs plug in all these fourchoices and see what we get (below we use the equalities đ§ â 0 = đ§,đ§ ⧠0 = 0 and đ§ ⧠1 = đ§):đ(0) â đ(0) = 0 (plugging in đ„ = 0, đŠ = 0)đ(0) â đ(1) = 0 (plugging in đ„ = 0, đŠ = 1)đ(1) â đ(0) = 0 (plugging in đ„ = 1, đŠ = 0)đ(1) â đ(1) = 1 (plugging in đ„ = 1, đŠ = 1) (23.2)
If we XOR together the first and second equalities we get đ(0) âđ(1) = 0 while if we XOR together the third and fourth equalities weget đ(0) â đ(1) = 1, thus obtaining a contradiction.
RRemark 23.3 â Randomized strategies. Theorem 23.2above assumes that Alice and Bob use deterministicstrategies đ and đ respectively. More generally, Aliceand Bob could use a randomized strategy, or equiva-lently, each could choose đ and đ from some distri-butions â± and đą respectively. However the averagingprinciple (Lemma 18.10) implies that if all possibledeterministic strategies succeed with probability atmost 3/4, then the same is true for all randomizedstrategies.
616 introduction to theoretical computer science
2 More accurately, one either has to give up on aâbilliard ball typeâ theory of the universe or believein telepathy (believe it or not, some scientists went forthe latter option).
An amazing experimentally verified fact is that quantum mechanicsallows for âtelepathyâ.2 Specifically, it has been shown that using theweirdness of quantum mechanics, there is in fact a strategy for Aliceand Bob to succeed in this game with probability larger than 3/4 (infact, they can succeed with probability about 0.85, see Lemma 23.5).
23.4 QUANTUM WEIRDNESS
Some of the counterintuitive properties that arise from quantum me-chanics include:
âą Interference - As weâve seen, quantum amplitudes can âcancel eachother outâ.
âą Measurement - The idea that amplitudes are negative as long asâno one is lookingâ and âcollapseâ (by squaring them) to positiveprobabilities when they are measured is deeply disturbing. Indeed,as shown by EPR and Bell, this leads to various strange outcomessuch as âspooky actions at a distanceâ, where we can create corre-lations between the results of measurements in places far removed.Unfortunately (or fortunately?) these strange outcomes have beenconfirmed experimentally.
âą Entanglement - The notion that two parts of the system could beconnected in this weird way where measuring one will affect theother is known as quantum entanglement.
As counter-intuitive as these concepts are, they have been experi-mentally confirmed, so we just have to live with them.
The discussion in this chapter of quantum mechanics in general andquantum computing in particular is quite brief and superficial, theâbibliographical notesâ section (Section 23.13) contains references andlinks to many other resources that cover this material in more depth.
23.5 QUANTUM COMPUTING AND COMPUTATION - AN EXECUTIVESUMMARY
One of the strange aspects of the quantum-mechanical picture of theworld is that unlike in the billiard ball example, there is no obviousalgorithm to simulate the evolution of đ particles over đĄ time periodsin đđđđŠ(đ, đĄ) steps. In fact, the natural way to simulate đ quantum par-ticles will require a number of steps that is exponential in đ. This is ahuge headache for scientists that actually need to do these calculationsin practice.
In the 1981, physicist Richard Feynman proposed to âturn thislemon to lemonadeâ by making the following almost tautologicalobservation:
quantum computing 617
3 As its title suggests, Feynmanâs lecture was actuallyfocused on the other side of simulating physics witha computer. However, he mentioned that as a âsideremarkâ one could wonder if itâs possible to simulatephysics with a new kind of computer - a âquantumcomputerâ which would ânot [be] a Turing machine,but a machine of a different kindâ. As far as I know,Feynman did not suggest that such a computer couldbe useful for computations completely outside thedomain of quantum simulation. Indeed, he wasmore interested in the question of whether quantummechanics could be simulated by a classical computer.
4 This â95 percentâ is a figure of speech, but not com-pletely so. At the time of this writing, cryptocurrencymining electricity consumption is estimated to useup at least 70Twh or 0.3 percent of the worldâs pro-duction, which is about 2 to 5 percent of the totalenergy usage for the computing industry. All thecurrent cryptocurrencies will be broken by quantumcomputers. Also, for many web servers the TLS pro-tocol (which is based on the current non-lattice basedsystems would be completely broken by quantumcomputing) is responsible for about 1 percent of theCPU usage.
If a physical system cannot be simulated by a computer in đ steps, thesystem can be considered as performing a computation that would takemore than đ steps.
So, he asked whether one could design a quantum system suchthat its outcome đŠ based on the initial condition đ„ would be somefunction đŠ = đ(đ„) such that (a) we donât know how to efficientlycompute in any other way, and (b) is actually useful for something.3In 1985, David Deutsch formally suggested the notion of a quantumTuring machine, and the model has been since refined in works ofDeutsch, Josza, Bernstein and Vazirani. Such a system is now knownas a quantum computer.
For a while these hypothetical quantum computers seemed usefulfor one of two things. First, to provide a general-purpose mecha-nism to simulate a variety of the real quantum systems that peoplecare about, such as various interactions inside molecules in quan-tum chemistry. Second, as a challenge to the Physical Extended ChurchTuring Thesis which says that every physically realizable computa-tion device can be modeled (up to polynomial overhead) by Turingmachines (or equivalently, NAND-TM / NAND-RAM programs).
Quantum chemistry is important (and in particular understand-ing it can be a bottleneck for designing new materials, drugs, andmore), but it is still a rather niche area within the broader context ofcomputing (and even scientific computing) applications. Hence for awhile most researchers (to the extent they were aware of it), thoughtof quantum computers as a theoretical curiosity that has little bear-ing to practice, given that this theoretical âextra powerâ of quantumcomputer seemed to offer little advantage in the majority of the prob-lems people want to solve in areas such as combinatorial optimization,machine learning, data structures, etc..
To some extent this is still true today. As far as we know, quantumcomputers, if built, will not provide exponential speed ups for 95%of the applications of computing.4 In particular, as far as we know,quantum computers will not help us solve NP complete problems inpolynomial or even sub-exponential time, though Groverâs algorithm (Remark 23.4) does yield a quadratic advantage in many cases.
However, there is one cryptography-sized exception: In 1994 PeterShor showed that quantum computers can solve the integer factoringand discrete logarithm in polynomial time. This result has capturedthe imagination of a great many people, and completely energized re-search into quantum computing. This is both because the hardness ofthese particular problems provides the foundations for securing sucha huge part of our communications (and these days, our economy),as well as it was a powerful demonstration that quantum computers
618 introduction to theoretical computer science
5 Of course, given that weâre still hearing of attacksexploiting âexport gradeâ cryptography that wassupposed to disappear in 1990âs, I imagine thatweâll still have products running 1024 bit RSA wheneveryone has a quantum laptop.
could turn out to be useful for problems that a-priori seemed to havenothing to do with quantum physics.
As weâll discuss later, at the moment there are several intensiveefforts to construct large scale quantum computers. It seems safeto say that, as far as we know, in the next five years or so there willnot be a quantum computer large enough to factor, say, a 1024 bitnumber. On the other hand, it does seem quite likely that in the verynear future there will be quantum computers which achieve some taskexponentially faster than the best-known way to achieve the sametask with a classical computer. When and if a quantum computer isbuilt that is strong enough to break reasonable parameters of DiffieHellman, RSA and elliptic curve cryptography is anybodyâs guess. Itcould also be a âself destroying prophecyâ whereby the existence ofa small-scale quantum computer would cause everyone to shift awayto lattice-based crypto which in turn will diminish the motivationto invest the huge resources needed to build a large scale quantumcomputer.5
RRemark 23.4 â Quantum computing and NP. Despitepopular accounts of quantum computers as havingvariables that can take âzero and one at the sametimeâ and therefore can âexplore an exponential num-ber of possibilities simultaneouslyâ, their true poweris much more subtle and nuanced. In particular, as faras we know, quantum computers do not enable us tosolve NP complete problems such as 3SAT in polyno-mial or even sub-exponential time. However, Groverâssearch algorithm does give a more modest advan-tage (namely, quadratic) for quantum computersover classical ones for problems in NP. In particular,due to Groverâs search algorithm, we know that theđ-SAT problem for đ variables can be solved in timeđ(2đ/2đđđđŠ(đ)) on a quantum computer for every đ.In contrast, the best known algorithms for đ-SAT on aclassical computer take roughly 2(1â 1đ )đ steps.
Big Idea 28 Quantum computers are not a panacea and are un-likely to solve NP complete problems, but they can provide exponen-tial speedups to certain structured problems.
23.6 QUANTUM SYSTEMS
Before we talk about quantum computing, let us recall how we phys-ically realize âvanillaâ or classical computing. We model a logical bit
quantum computing 619
that can equal 0 or a 1 by some physical system that can be in one oftwo states. For example, it might be a wire with high or low voltage,charged or uncharged capacitor, or even (as we saw) a pipe with orwithout a flow of water, or the presence or absence of a soldier crab. Aclassical system of đ bits is composed of đ such âbasic systemsâ, eachof which can be in either a âzeroâ or âoneâ state. We can model thestate of such a system by a string đ â 0, 1đ. If we perform an op-eration such as writing to the 17-th bit the NAND of the 3rd and 5thbits, this corresponds to applying a local function to đ such as settingđ 17 = 1 â đ 3 â đ 5.
In the probabilistic setting, we would model the state of the systemby a distribution. For an individual bit, we could model it by a pair ofnon-negative numbers đŒ, đœ such that đŒ + đœ = 1, where đŒ is the prob-ability that the bit is zero and đœ is the probability that the bit is one.For example, applying the negation (i.e., NOT) operation to this bitcorresponds to mapping the pair (đŒ, đœ) to (đœ, đŒ) since the probabilitythat NOT(đ) is equal to 1 is the same as the probability that đ is equalto 0. This means that we can think of the NOT function as the linearmap đ ⶠâ2 â â2 such that đ (đŒđœ) = (đœđŒ) or equivalently as the
matrix (0 11 0).
If we think of the đ-bit system as a whole, then since the đ bits cantake one of 2đ possible values, we model the state of the system as avector đ of 2đ probabilities. For every đ â 0, 1đ, we denote by đđ the 2đ dimensional vector that has 1 in the coordinate correspond-ing to đ (identifying it with a number in [2đ]), and so can write đ asâđ â0,1đ đđ đđ where đđ is the probability that the system is in the stateđ .
Applying the operation above of setting the 17-th bit to the NANDof the 3rd and 5th bits, corresponds to transforming the vector đ tothe vector đčđ where đč ⶠâ2đ â â2đ is the linear map that maps đđ tođđ 0âŻđ 16(1âđ 3â đ 5)đ 18âŻđ đâ1 . (Since đđ đ â0,1đ is a basis for đ 2đ , it suffices todefine the map đč on vectors of this form.)
PPlease make sure you understand why performing theoperation will take a system in state đ to a system inthe state đčđ. Understanding the evolution of proba-bilistic systems is a prerequisite to understanding theevolution of quantum systems.If your linear algebra is a bit rusty, now would be agood time to review it, and in particular make sureyou are comfortable with the notions of matrices, vec-tors, (orthogonal and orthonormal) bases, and norms.
620 introduction to theoretical computer science
23.6.1 Quantum amplitudesIn the quantum setting, the state of an individual bit (or âqubitâ,to use quantum parlance) is modeled by a pair of numbers (đŒ, đœ)such that |đŒ|2 + |đœ|2 = 1. While in general these numbers can becomplex, for the rest of this chapter, we will often assume they arereal (though potentially negative), and hence often drop the absolutevalue operator. (This turns out not to make much of a difference inexplanatory power.) As before, we think of đŒ2 as the probability thatthe bit equals 0 and đœ2 as the probability that the bit equals 1. As wedid before, we can model the NOT operation by the map đ ⶠâ2 â â2where đ(đŒ, đœ) = (đœ, đŒ).
Following quantum tradition, instead of using đ0 and đ1 as we didabove, from now on we will denote the vector (1, 0) by |0â© and thevector (0, 1) by |1â© (and moreover, think of these as column vectors).This is known as the Dirac âketâ notation. This means that NOT isthe unique linear map đ ⶠâ2 â â2 that satisfies đ|0â© = |1â© andđ|1â© = |0â©. In other words, in the quantum case, as in the probabilisticcase, NOT corresponds to the matrix
đ = (0 11 0) . (23.3)
In classical computation, we typically think that there are onlytwo operations that we can do on a single bit: keep it the same ornegate it. In the quantum setting, a single bit operation correspondsto any linear map OP ⶠâ2 â â2 that is norm preserving in the sense
that for every đŒ, đœ, if we apply OP to the vector (đŒđœ) then we obtain
a vector (đŒâČđœâČ) such that đŒâČ2 + đœâČ2 = đŒ2 + đœ2. Such a linear map
OP corresponds to a unitary two by two matrix. (As we mentioned,quantum mechanics actually models states as vectors with complexcoordinates; however, this does not make any qualitative difference toour discussion.) Keeping the bit the same corresponds to the matrixđŒ = (1 00 1) and (as weâve seen) the NOT operations corresponds to
the matrix đ = (0 11 0). But there are other operations we can use
as well. One such useful operation is the Hadamard operation, whichcorresponds to the matrix
đ» = 1â2 (+1 +1+1 â1) . (23.4)
quantum computing 621
In fact it turns out that Hadamard is all that we need to add to aclassical universal basis to achieve the full power of quantum comput-ing.
23.6.2 Quantum systems: an executive summaryIf you ignore the physics and philosophy, for the purposes of under-standing the model of quantum computers, all you need to knowabout quantum systems is the following. The state of a quantum systemof đ qubits is modeled by an 2đ dimensional vector đ of unit norm(i.e., squares of all coordinates sums up to 1), which we write asđ = âđ„â0,1đ đđ„|đ„â© where |đ„â© is the column vector that has 0 in allcoordinates except the one corresponding to đ„ (identifying 0, 1đwith the numbers 0, ⊠, 2đ â 1). We use the convention that if đ, đare strings of lengths đ and â respectively then we can write the 2đ+âdimensional vector with 1 in the đđ-th coordinate and zero elsewherenot just as |đđâ© but also as |đâ©|đâ©. In particular, for every đ„ â 0, 1đ, wecan write the vector |đ„â© also as |đ„0â©|đ„1⩠⯠|đ„đâ1â©. This notation satisfiescertain nice distributive laws such as |đâ©(|đâ© + |đâČâ©)|đâ© = |đđđâ© + |đđâČđâ©.
A quantum operation on such a system is modeled by a 2đ Ă 2đunitary matrix đ (one that satisfies UU†= đŒ where đ†is the transposeoperation, or conjugate transpose for complex matrices). If the systemis in state đ and we apply to it the operation đ , then the new state ofthe system is đđ.
When we measure an đ-qubit system in a state đ = âđ„â0,1đ đđ„|đ„â©,then we observe the value đ„ â 0, 1đ with probability |đđ„|2. In thiscase, the system collapses to the state |đ„â©.23.7 ANALYSIS OF BELLâS INEQUALITY (OPTIONAL)
Now that we have the notation in place, we can show a strategy forAlice and Bob to display âquantum telepathyâ in Bellâs Game. Re-call that in the classical case, Alice and Bob can succeed in the âBellGameâ with probability at most 3/4 = 0.75. We now show that quan-tum mechanics allows them to succeed with probability at least 0.8.(The strategy we show is not the best one. Alice and Bob can in factsucceed with probability cos2(đ/8) ⌠0.854.)Lemma 23.5 There is a 2-qubit quantum state đ â â4 so that if Alicehas access to the first qubit of đ, can manipulate and measure it andoutput đ â 0, 1 and Bob has access to the second qubit of đ and canmanipulate and measure it and output đ â 0, 1 then Pr[đ â đ =đ„ ⧠đŠ] â„ 0.8.
622 introduction to theoretical computer science
Proof. Alice and Bob will start by preparing a 2-qubit quantum systemin the state đ = 1â2 |00â© + 1â2 |11â© (23.5)
(this state is known as an EPR pair). Alice takes the first qubitof the system to her room, and Bob takes the second qubit to hisroom. Now, when Alice receives đ„ if đ„ = 0 she does nothing andif đ„ = 1 she applies the unitary map đ âđ/8 to her qubit wheređ đ = (đđđ đ â sin đ
sin đ cos đ ) is the unitary operation corresponding to
rotation in the plane with angle đ. When Bob receives đŠ, if đŠ = 0 hedoes nothing and if đŠ = 1 he applies the unitary map đ đ/8 to his qubit.Then each one of them measures their qubit and sends this as theirresponse.
Recall that to win the game Bob and Alice want their outputs tobe more likely to differ if đ„ = đŠ = 1 and to be more likely to agreeotherwise. We will split the analysis in one case for each of the fourpossible values of đ„ and đŠ.
Case 1: đ„ = 0 and đŠ = 0. If đ„ = đŠ = 0 then the state does notchange. Because the state đ is proportional to |00â© + |11â©, the measure-ments of Bob and Alice will always agree (if Alice measures 0 then thestate collapses to |00â© and so Bob measures 0 as well, and similarly for1). Hence in the case đ„ = đŠ = 1, Alice and Bob always win.
Case 2: đ„ = 0 and đŠ = 1. If đ„ = 0 and đŠ = 1 then after Alicemeasures her bit, if she gets 0 then the system collapses to the state|00â©, in which case after Bob performs his rotation, his qubit is inthe state cos(đ/8)|0â© + sin(đ/8)|1â©. Thus, when Bob measures hisqubit, he will get 0 (and hence agree with Alice) with probabilitycos2(đ/8) â„ 0.85. Similarly, if Alice gets 1 then the system collapsesto |11â©, in which case after rotation Bobâs qubit will be in the stateâ sin(đ/8)|0â© + cos(đ/8)|1â© and so once again he will agree with Alicewith probability cos2(đ/8).
The analysis for Case 3, where đ„ = 1 and đŠ = 0, is completelyanalogous to Case 2. Hence Alice and Bob will agree with probabilitycos2(đ/8) in this case as well. (To show this we use the observationthat the result of this experiment is the same regardless of the orderin which Alice and Bob apply their rotations and measurements; thisrequires a proof but is not very hard to show.)
Case 4: đ„ = 1 and đŠ = 1. For the case that đ„ = 1 and đŠ = 1,after both Alice and Bob perform their rotations, the state will beproportional to đ âđ/8|0â©đ đ/8|0â© + đ âđ/8|1â©đ đ/8|1â© . (23.6)
quantum computing 623
Intuitively, since we rotate one state by 45 degrees and the otherstate by -45 degrees, they will become orthogonal to each other, andthe measurements will behave like independent coin tosses that agreewith probability 1/2. However, for the sake of completeness, we nowshow the full calculation.
Opening up the coefficients and using cos(âđ„) = cos(đ„) andsin(âđ„) = â sin(đ„), we can see that (23.6) is proportional to
cos2(đ/8)|00â© + cos(đ/8) sin(đ/8)|01â©â sin(đ/8) cos(đ/8)|10â© + sin2(đ/8)|11â©â sin2(đ/8)|00â© + sin(đ/8) cos(đ/8)|01â©â cos(đ/8) sin(đ/8)|10â© + cos2(đ/8)|11â© . (23.7)
Using the trigonometric identities 2 sin(đŒ) cos(đŒ) = sin(2đŒ) andcos2(đŒ) â sin2(đŒ) = cos(2đŒ), we see that the probability of getting anyone of |00â©, |10â©, |01â©, |11â© is proportional to cos(đ/4) = sin(đ/4) = 1â2 .Hence all four options for (đ, đ) are equally likely, which mean that inthis case đ = đ with probability 0.5.
Taking all the four cases together, the overall probability of winningthe game is 14 â 1 + 12 â 0.85 + 14 â 0.5 = 0.8.
RRemark 23.6 â Quantum vs probabilistic strategies. Itis instructive to understand what is it about quan-tum mechanics that enabled this gain in BellâsInequality. For this, consider the following anal-ogous probabilistic strategy for Alice and Bob.They agree that each one of them output 0 if heor she get 0 as input and outputs 1 with prob-ability đ if they get 1 as input. In this case onecan see that their success probability would be14 â 1 + 12 (1 â đ) + 14 [2đ(1 â đ)] = 0.75 â 0.5đ2 †0.75.The quantum strategy we described above can bethought of as a variant of the probabilistic strategy forparameter đ set to sin2(đ/8) = 0.15. But in the caseđ„ = đŠ = 1, instead of disagreeing only with probability2đ(1 â đ) = 1/4, the existence of the so called âneg-ative probabilitiesâ â in the quantum world allowedus to rotate the state in opposing directions to achievedestructive interference and hence a higher probabilityof disagreement, namely sin2(đ/4) = 0.5.
624 introduction to theoretical computer science
6 Readers familiar with quantum computing shouldnote that đđđŽđđ· is a close variant of the so calledToffoli gate and so QNAND-CIRC programs corre-spond to quantum circuits with the Hadamard andToffoli gates.
23.8 QUANTUM COMPUTATION
Recall that in the classical setting, we modeled computation as ob-tained by a sequence of basic operations. We had two types of computa-tional models:
âą Non uniform models of computation such as Boolean circuits andNAND-CIRC programs, where a finite function đ ⶠ0, 1đ â 0, 1is computable in size đ if it can be expressed as a combination ofđ basic operations (gates in a circuit or lines in a NAND-CIRCprogram)
âą Uniform models of computation such as Turing machines and NAND-TM programs, where an infinite function đč ⶠ0, 1â â 0, 1 iscomputable in time đ (đ) if there is a single algorithm that on inputđ„ â 0, 1đ evaluates đč(đ„) using at most đ (đ) basic steps.
When considering efficient computation, we defined the class P toconsist of all infinite functions đč ⶠ0, 1â â 0, 1 that can be com-puted by a Turing machine or NAND-TM program in time đ(đ) forsome polynomial đ(â ). We defined the class P/poly to consists of allinfinite functions đč ⶠ0, 1â â 0, 1 such that for every đ, the re-striction đčđ of đč to 0, 1đ can be computed by a Boolean circuit orNAND-CIRC program of size at most đ(đ) for some polynomial đ(â ).
We will do the same for quantum computation, focusing mostly onthe non uniform setting of quantum circuits, since that is simpler, andalready illustrates the important differences with classical computing.
23.8.1 Quantum circuitsA quantum circuit is analogous to a Boolean circuit, and can be de-scribed as a directed acyclic graph. One crucial difference that theout degree of every vertex in a quantum circuit is at most one. This isbecause we cannot âreuseâ quantum states without measuring them(which collapses their âprobabilitiesâ). Therefore, we cannot use thesame qubit as input for two different gates. (This is known as the NoCloning Theorem.) Another more technical difference is that to ex-press our operations as unitary matrices, we will need to make sure allour gates are reversible. This is not hard to ensure. For example, in thequantum context, instead of thinking of NAND as a (non reversible)map from 0, 12 to 0, 1, we will think of it as the reversible mapon three qubits that maps đ, đ, đ to đ, đ, đ â NAND(đ, đ) (i.e., flip thelast bit if NAND of the first two bits is 1). Equivalently, the NANDoperation corresponds to the 8 Ă 8 unitary matrix đđđŽđđ· such that(identifying 0, 13 with [8]) for every đ, đ, đ â 0, 1, if |đđđâ© is thebasis element with 1 in the đđđ-th coordinate and zero elsewhere, thenđđđŽđđ·|đđđâ© = |đđ(đ â NAND(đ, đ))â©.6 If we order the rows and
quantum computing 625
columns as 000, 001, 010, ⊠, 111, then đđđŽđđ· can be written as thefollowing matrix:
đđđŽđđ· =âââââââââââââââââââ
0 1 0 0 0 0 0 01 0 0 0 0 0 0 00 0 0 1 0 0 0 00 0 1 0 0 0 0 00 0 0 0 0 1 0 00 0 0 0 1 0 0 00 0 0 0 0 0 1 00 0 0 0 0 0 0 1
âââââââââââââââââââ (23.8)
If we have an đ qubit system, then for đ, đ, đ â [đ], we will denoteby đ đ,đ,đđđŽđđ· as the 2đ Ă 2đ unitary matrix that corresponds to applyingđđđŽđđ· to the đ-th, đ-th, and đ-th bits, leaving the others intact. That is,for every đŁ = âđ„â0,1đ đŁđ„|đ„â©, đ đ,đ,đđđŽđđ·đŁ = âđ„â0,1đ đŁđ„|đ„0 ⯠đ„đâ1(đ„đ âNAND(đ„đ, đ„đ))đ„đ+1 ⯠đ„đâ1â©.
As mentioned above, we will also use the Hadamard or HAD oper-ation, A quantum circuit is obtained by applying a sequence of đđđŽđđ·and HAD gates, where a HAD gates corresponding to applying thematrix đ» = 1â2 (+1 +1+1 â1) . (23.9)
Another way to write define đ» is that for đ â 0, 1, đ»|đâ© = 1â2 |0â© +1â2 (â1)đ|1â©. We define HADđ to be the 2đ Ă 2đ unitary matrix thatapplies HAD to the đ-th qubit and leaves the others intact. Using theket notation, we can write this as
HADđ âđ„â0,1đ đŁđ„|đ„â© = 1â2 âđ„â0,1đ |đ„0 ⯠đ„đâ1â© (|0â© + (â1)đ„đ |1â©) |đ„đ+1 ⯠đ„đâ1â© .(23.10)
A quantum circuit is obtained by composing these basic operationson some đ qubits. If đ â„ đ, we use a circuit to compute a functionđ ⶠ0, 1đ â 0, 1:âą On input đ„, we initialize the system to hold đ„0, ⊠, đ„đâ1 in the first đ
qubits, and initialize all remaining đ â đ qubits to zero.
âą We execute each elementary operation one by one: at every step weapply to the current state either an operation of the form đ đ,đ,đđđŽđđ· oran operation of the form HADđ for đ, đ, đ â [đ].
âą At the end of the computation, we measure the system, and outputthe result of the last qubit (i.e. the qubit in location đ â 1). (Forsimplicity we restrict attention to functions with a single bit ofoutput, though the definition of quantum circuits naturally extendsto circuits with multiple outputs.)
626 introduction to theoretical computer science
âą We say that the circuit computes the function đ if the probability thatthis output equals đ(đ„) is at least 2/3. Note that this probabilityis obtained by summing up the squares of the amplitudes of allcoordinates in the final state of the system corresponding to vectors|đŠâ© where đŠđâ1 = đ(đ„).Formally we define quantum circuits as follows:
Definition 23.7 â Quantum circuit. Let đ â„ đ â„ đ. A quantum circuit ofđ inputs, đ â đ auxiliary bits, and đ gates over the đđđŽđđ·,HADbasis is a sequence of đ unitary 2đ Ă 2đ matrices đ0, ⊠, đđ â1 suchthat each matrix đâ is either of the form NANDđ,đ,đ for đ, đ, đ â [đ]or HADđ for đ â [đ].
A quantum circuit computes a function đ ⶠ0, 1đ â 0, 1 if thefollowing is true for every đ„ â 0, 1đ:
Let đŁ be the vectorđŁ = đđ â1đđ â2 ⯠đ1đ0|đ„0đâđâ© (23.11)
and write đŁ as âđŠâ0,1đ đŁđŠ|đŠâ©. ThenâđŠâ0,1đ s.t. đŠđâ1=đ(đ„) |đŁđŠ|2 â„ 23 . (23.12)
PPlease stop here and see that this definition makessense to you.
Big Idea 29 Just as we did with classical computation, we can de-fine mathematical models for quantum computation, and representquantum algorithms as binary strings.
Once we have the notion of quantum circuits, we can define thequantum analog of P/poly (i.e., the class of functions computable bypolynomial size quantum circuits) as follows:
Definition 23.8 â BQP/poly. Let đč ⶠ0, 1â â 0, 1. We say thatđč â BQP/poly if there exists some polynomial đ ⶠâ â â such thatfor every đ â â, if đčđ is the restriction of đč to inputs of length đ,then there is a quantum circuit of size at most đ(đ) that computesđčđ.
quantum computing 627
RRemark 23.9 â The obviously exponential fallacy. Apriori it might seem âobviousâ that quantum com-puting is exponentially powerful, since to perform aquantum computation on đ bits we need to maintainthe 2đ dimensional state vector and apply 2đ Ă 2đ ma-trices to it. Indeed popular descriptions of quantumcomputing (too) often say something along the linesthat the difference between quantum and classicalcomputer is that a classical bit can either be zero orone while a qubit can be in both states at once, andso in many qubits a quantum computer can performexponentially many computations at once.Depending on how you interpret it, this descriptionis either false or would apply equally well to proba-bilistic computation, even though weâve already seenthat every randomized algorithm can be simulated bya similar-sized circuit, and in fact we conjecture thatBPP = P.Moreover, this âobviousâ approach for simulating aquantum computation will take not just exponentialtime but exponential space as well, while can be shownthat using a simple recursive formula one can calcu-late the final quantum state using polynomial space (inphysics this is known as âFeynman path integralsâ).So, the exponentially long vector description by itselfdoes not imply that quantum computers are exponen-tially powerful. Indeed, we cannot prove that they are(i.e., we have not been able to rule out the possibilitythat every QNAND-CIRC program could be simu-lated by a NAND-CIRC program/ Boolean circuit withpolynomial overhead), but we do have some problems(integer factoring most prominently) for which theydo provide exponential speedup over the currentlybest known classical (deterministic or probabilistic)algorithms.
23.8.2 QNAND-CIRC programs (optional)Just like in the classical case, there is an equivalence between circuitsand straight-line programs, and so we can define the programminglanguage QNAND-CIRC that is the quantum analog of our NAND-CIRC programming language. To do so, we only add a single op-eration: HAD(foo) which applies the single-bit operation đ» to thevariable foo. We also use the following interpretation to make NANDreversible: foo = NAND(bar,blah)means that we modify foo to bethe XOR of its original value and the NAND of bar and blah. (Inother words, apply the 8 by 8 unitary transformation đđđŽđđ· definedabove to the three qubits corresponding to foo, bar and blah.) If foois initialized to zero then this makes no difference.
628 introduction to theoretical computer science
If đ is a QNAND-CIRC program with đ input variables, âworkspace variables, and đ output variables, then running it on theinput đ„ â 0, 1đ corresponds to setting up a system with đ + đ + âqubits and performing the following process:1. We initialize the input variables X[0] ⊠X[đ â 1] to đ„0, ⊠, đ„đâ1 and
all other variables to 0.2. We execute the program line by line, applying the corresponding
physical operation đ» or đđđŽđđ· to the qubits that are referred to bythe line.
3. We measure the output variables Y[0], âŠ, Y[đ â 1] and outputthe result (if there is more than one output then we measure morevariables).
23.8.3 Uniform computationJust as in the classical case, we can define uniform computationalmodels for quantum computing as well. We will let BQP be thequantum analog to P and BPP: the class of all Boolean functionsđč ⶠ0, 1â â 0, 1 that can be computed by quantum algorithmsin polynomial time. There are several equivalent ways to define BQP.For example, there is a computational model of Quantum TuringMachines that can be used to define BQP just as standard Turingmachines are used to define P. Another alternative is to define theQNAND-TM programming language to be QNAND-CIRC augmentedwith loops and arrays just like NAND-TM is obtained from NAND-CIRC. Once again, we can define BQP using QNAND-TM programsanalogously to the way P can be defined using NAND-TM programs.However, we use the following equivalent definition (which is alsothe one most popular in the literature):
Definition 23.10 â The class BQP. Let đč ⶠ0, 1â â 0, 1. We say thatđč â BQP if there exists a polynomial time NAND-TM program đsuch that for every đ, đ(1đ) is the description of a quantum circuitđ¶đ that computes the restriction of đč to 0, 1đ.
PDefinition 23.10 is the quantum analog of the alter-native characterization of P that appears in SolvedExercise 13.4. One way to verify that youâve under-stood Definition 23.10 it to see that you can prove(1) P â BQP and in fact the stronger statementBPP â BQP, (2) BQP â EXP, and (3) For everyNP-complete function đč , if đč â BQP then NP â BQP.Exercise 23.1 asks you to work these out.
quantum computing 629
Figure 23.3: Superconducting quantum computerprototype at Google. Image credit: Google / MITTechnology Review.
The relation between NP and BQP is not known (see also Re-mark 23.4). It is widely believed that NP â BQP, but there is noconsensus whether or not BQP â NP. It is quite possible that thesetwo classes are incomparable, in the sense that NP â BQP (and in par-ticular no NP-complete function belongs to BQP) but also BQP â NP(and there are some interesting candidates for such problems).
It can be shown that QNANDEVAL (evaluating a quantum circuiton an input) is computable by a polynomial size QNAND-CIRC pro-gram, and moreover this program can even be generated uniformlyand hence QNANDEVAL is in BQP. This allows us to âportâ manyof the results of classical computational complexity into the quantumrealm, including the notions of a universal quantum Turing machine,as well as all of the uncomputability results. There is even a quantumanalog of the Cook-Levin Theorem.
RRemark 23.11 â Restricting attention to circuits. Becausethe non uniform model is a little cleaner to work with,in the rest of this chapter we mostly restrict attentionto this model, though all the algorithms we discusscan be implemented using uniform algorithms as well.
23.9 PHYSICALLY REALIZING QUANTUM COMPUTATION
To realize quantum computation one needs to create a system with đindependent binary states (i.e., âqubitsâ), and be able to manipulatesmall subsets of two or three of these qubits to change their state.While by the way we defined operations above it might seem that oneneeds to be able to perform arbitrary unitary operations on these twoor three qubits, it turns out that there are several choices for universalsets - a small constant number of gates that generate all others. Thebiggest challenge is how to keep the system from being measured andcollapsing to a single classical combination of states. This is sometimesknown as the coherence time of the system. The threshold theorem saysthat there is some absolute constant level of errors đ so that if errorsare created at every gate at rate smaller than đ then we can recoverfrom those and perform arbitrary long computations. (Of course thereare different ways to model the errors and so there are actually severalthreshold theorems corresponding to various noise models).
There have been several proposals to build quantum computers:
âą Superconducting quantum computers use superconducting electriccircuits to do quantum computation. This is the direction wherethere has been most recent progress towards âbeatingâ classicalcomputers.
630 introduction to theoretical computer science
Figure 23.4: Top: A periodic function. Bottom: Ana-periodic function.
âą Trapped ion quantum computers Use the states of an ion to sim-ulate a qubit. People have made some recent advances on thesecomputers too. While itâs not at all clear thatâs the right measur-ing yard, the current best implementation of Shorâs algorithm (forfactoring 15) is done using an ion-trap computer.
âą Topological quantum computers use a different technology. Topo-logical qubits are more stable by design and hence error correctionis less of an issue, but constructing them is extremely challenging.
These approaches are not mutually exclusive and it could be thatultimately quantum computers are built by combining all of themtogether. In the near future, it seems that we will not be able to achievefull fledged large scale universal quantum computers, but rather morerestricted machines, sometimes called âNoisy Intermediate-ScaleQuantum Computersâ or âNISQâ. See this article by John Preskil forsome of the progress and applications of such more restricted devices.
23.10 SHORâS ALGORITHM: HEARING THE SHAPE OF PRIME FAC-TORS
Bellâs Inequality is a powerful demonstration that there is some-thing very strange going on with quantum mechanics. But couldthis âstrangenessâ be of any use to solve computational problems notdirectly related to quantum systems? A priori, one could guess theanswer is no. In 1994 Peter Shor showed that one would be wrong:
Theorem 23.12 â Shorâs Algorithm. There is a polynomial-time quan-tum algorithm that on input an integer đ (represented in basetwo), outputs the prime factorization of đ .
Another way to state Theorem 23.12 is that if we defineFACTORING ⶠ0, 1â â 0, 1 to be the function that on input apair of numbers (đ, đ) outputs 1 if and only if đ has a factor đ suchthat 2 †đ †đ, then FACTORING is in BQP. This is an exponentialimprovement over the best known classical algorithms, which takeroughly 2(đ1/3) time, where the notation hides factors that arepolylogarithmic in đ. While we will not prove Theorem 23.12 in thischapter, we will sketch some of the ideas behind the proof.
23.10.1 Period findingAt the heart of Shorâs Theorem is an efficient quantum algorithm forfinding periods of a given function. For example, a function đ ⶠâ â âis periodic if there is some â > 0 such that đ(đ„ + â) = đ(đ„) for every đ„(e.g., see Fig. 23.4).
quantum computing 631
Figure 23.5: Left: The air-pressure when playing aâC Majorâ chord as a function of time. Right: Thecoefficients of the Fourier transform of the same func-tion, we can see that it is the sum of three frequenciescorresponding to the C, E and G notes (261.63, 329.63and 392 Hertz respectively). Credit: Bjarke MĂžnstedâsQuora answer.
Figure 23.6: If đ is a periodic function then when werepresent it in the Fourier transform, we expect thecoefficients corresponding to wavelengths that donot evenly divide the period to be very small, as theywould tend to âcancel outâ.
Musical notes yield one type of periodic function. When you pullon a string on a musical instrument, it vibrates in a repeating pattern.Hence, if we plot the speed of the string (and so also the speed ofthe air around it) as a function of time, it will correspond to someperiodic function. The length of the period is known as the wave lengthof the note. The frequency is the number of times the function repeatsitself within a unit of time. For example, the âMiddle Câ note hasa frequency of 261.63 Hertz, which means its period is 1/(261.63)seconds.
If we play a chord by playing several notes at once, we get a morecomplex periodic function obtained by combining the functions ofthe individual notes (see Fig. 23.5). The human ear contains manysmall hairs, each of which is sensitive to a narrow band of frequencies.Hence when we hear the sound corresponding to a chord, the hairs inour ears actually separate it out to the components corresponding toeach frequency.
It turns out that (essentially) every periodic function đ ⶠâ â âcan be decomposed into a sum of simple wave functions (namelyfunctions of the form đ„ ⊠sin(đđ„) or đ„ ⊠cos(đđ„)). This is known asthe Fourier Transform (see Fig. 23.6). The Fourier transform makes iteasy to compute the period of a given function: it will simply be theleast common multiple of the periods of the constituent waves.
23.10.2 Shorâs Algorithm: A birdâs eye viewOn input an integer đ , Shorâs algorithm outputs the prime factoriza-tion of đ in time that is polynomial in logđ . The main steps in thealgorithm are the following:
Step 1: Reduce to period finding. The first step in the algorithm is topick a random đŽ â 0, 1 ⊠, đ â 1 and define the function đčđŽ â¶0, 1đ â 0, 1đ as đčđŽ(đ„) = đŽđ„( mod đ) where we identify thestring đ„ â 0, 1đ with an integer using the binary representation,and similarly represent the integer đŽđ„( mod đ) as a string. (We willchoose đ to be some polynomial in logđ and so in particular 0, 1đis a large enough set to represent all the numbers in 0, 1, ⊠, đ â 1).
Some not-too-hard (though somewhat technical) calculations showthat: (1) The function đčđŽ is periodic (i.e., there is some integer đđŽ suchthat đčđŽ(đ„ + đđŽ) = đčđŽ(đ„) for âalmostâ every đ„) and more importantly(2) If we can recover the period đđŽ of đčđŽ for several randomly cho-sen đŽâs, then we can recover the factorization of đ . (Weâll ignore theâalmostâ qualifier in the discussion below; it causes some annoying,yet ultimately manageable, technical issues in the full-fledged algo-rithm.) Hence, factoring đ reduces to finding out the period of thefunction đčđŽ. Exercise 23.2 asks you to work out this for the related
632 introduction to theoretical computer science
task of computing the discrete logarithm (which underlies the securityof the Diffie-Hellman key exchange and elliptic curve cryptography).Step 2: Period finding via the Quantum Fourier Transform. Using a simpletrick known as ârepeated squaringâ, it is possible to compute themap đ„ ⊠đčđŽ(đ„) in time polynomial in đ, which means we can alsocompute this map using a polynomial number of NAND gates,and soin particular we can generate in polynomial quantum time a quantumstate đ that is (up to normalization) equal toâđ„â0,1đ |đ„â©|đčđŽ(đ„)â© . (23.13)
In particular, if we were to measure the state đ, we would get a ran-dom pair of the form (đ„, đŠ) where đŠ = đčđŽ(đ„). So far, this is not at allimpressive. After all, we did not need the power of quantum comput-ing to generate such pairs: we could simply generate a random đ„ andthen compute đčđŽ(đ„).
Another way to describe the state đ is that the coefficient of |đ„â©|đŠâ© inđ is proportional to đđŽ,đŠ(đ„) where đđŽ,đŠ ⶠ0, 1đ â â is the functionsuch that đđŽ,đŠ(đ„) = â§âšâ©1 đŠ = đŽđ„( mod đ)0 otherwise
. (23.14)
The magic of Shorâs algorithm comes from a procedure knownas the Quantum Fourier Transform. It allows to change the state đ intothe state đ where the coefficient of |đ„â©|đŠâ© is now proportional to theđ„-th Fourier coefficient of đđŽ,đŠ. In other words, if we measure the stateđ, we will obtain a pair (đ„, đŠ) such that the probability of choosing đ„is proportional to the square of the weight of the frequency đ„ in therepresentation of the function đđŽ,đŠ. Since for every đŠ, the functionđđŽ,đŠ has the period đđŽ, it can be shown that the frequency đ„ will be(almost) a multiple of đđŽ. If we make several such samples đŠ0, ⊠, đŠđand obtain the frequencies đ„1, ⊠, đ„đ, then the true period đđŽ dividesall of them, and it can be shown that it is going to be in fact the greatestcommon divisor (g.c.d.) of all these frequencies: a value which can becomputed in polynomial time.
As mentioned above, we can recover the factorization of đ fromthe periods of đčđŽ0 , ⊠, đčđŽđĄ for some randomly chosen đŽ0, ⊠, đŽđĄ in0, ⊠, đ â 1 and đĄ which is polynomial in logđ .
The resulting algorithm can be described in a high (and somewhatinaccurate) level as follows:
Shorâs Algorithm: (sketch)Input: Number đ â â.Output: Prime factorization of đ .Operations:
quantum computing 633
1. Repeat the following đ = đđđđŠ(logđ) number of times:a. Choose đŽ â 0, ⊠, đ â 1 at random, and let đđŽ ⶠâ€đ â â€đ be
the map đ„ ⊠đŽđ„ mod đ .b. For đĄ = đđđđŠ(logđ), repeat đĄ times the following step: Quantum
Fourier Transform to create a quantum state |đâ© over đđđđŠ(log(đ))qubits, such that if we measure |đâ© we obtain a pair of strings(đ, đŠ) with probability proportional to the square of the coefficientcorresponding to the wave function đ„ ⊠cos(đ„đđ/đ) or đ„ âŠsin(đ„đđ/đ) in the Fourier transform of the function đđŽ,đŠ ⶠâ€đ â0, 1 defined as đđŽ,đŠ(đ„) = 1 iff đđŽ(đ„) = đŠ.
c. If đ1, ⊠, đđĄ are the coefficients we obtained in the previous step,then the least common multiple of đ/đ1, ⊠, đ/đđĄ is likely to bethe period of the function đđŽ.
2. If we let đŽ0, ⊠, đŽđâ1 and đ0, ⊠, đđâ1 be the numbers we chose inthe previous step and the corresponding periods of the functionsđđŽ0 , ⊠, đđŽđâ1 then we can use classical results in number theory toobtain from these a non-trivial prime factor đ of đ (if such exists).We can now run the algorithm again with the (smaller) input đ/đto obtain all other factors.
Reducing factoring to order finding is cumbersome, but can bedone in polynomial time using a classical computer. The key quantumingredient in Shorâs algorithm is the quantum fourier transform.
RRemark 23.13 â Quantum Fourier Transform. Despiteits name, the Quantum Fourier Transform does notactually give a way to compute the Fourier Trans-form of a function đ ⶠ0, 1đ â â. This would beimpossible to do in time polynomial in đ, as simplywriting down the Fourier Transform would require 2đcoefficients. Rather the Quantum Fourier Transformgives a quantum state where the amplitude correspond-ing to an element (think: frequency) â is equal tothe corresponding Fourier coefficient. This allows tosample from a distribution where â is drawn withprobability proportional to the square of its Fouriercoefficient. This is not the same as computing theFourier transform, but is good enough for recoveringthe period.
23.11 QUANTUM FOURIER TRANSFORM (ADVANCED, OPTIONAL)
The above description of Shorâs algorithm skipped over the implemen-tation of the main quantum ingredient: the Quantum Fourier Transformalgorithm. In this section we discuss the ideas behind this algorithm.We will be rather brief and imprecise. Section 23.13 contain referencesto sources of more information about this topic.
634 introduction to theoretical computer science
To understand the Quantum Fourier Transform, we need to betterunderstand the Fourier Transform itself. In particular, we will needto understand how it applies not just to functions whose input is areal number but to functions whose domain can be any arbitrarycommutative group. Therefore we now take a short detour to (verybasic) group theory, and define the notion of periodic functions overgroups.
RRemark 23.14 â Group theory. While we define the con-cepts we use, some background in group or numbertheory will be very helpful for fully understandingthis section. In particular we will use the notion offinite commutative (a.k.a. Abelian) groups. These aredefined as follows.
âą A finite group đŸ is a pair (đș, â) where đș is a finiteset of elements and â is a binary operation mappinga pair đ, â of elements in đș to the element đ â â in đș.We often identify đŸ with the set đș of its elements,and so use notation such as đ â đŸ to indicate that đis an element of đŸ and |đŸ| to denote the number ofelements in đŸ.
âą The operation â satisfies the sort of properties thata product operation does, namely, it is associative(i.e., (đ â â) â đ = đ â (â â đ)) and there is someelement 1 such that đ â 1 = đ for all đ, and forevery đ â đŸ there exists an element đâ1 such thatđ â đâ1 = 1.
âą A group is called commutative (also known asAbelian) if đ â â = â â đ for all đ, â â đŸ.
The Fourier transform is a deep and vast topic, on which we willbarely touch upon here. Over the real numbers, the Fourier trans-form of a function đ is obtained by expressing đ in the form â đ(đŒ)đđŒwhere the đđŒâs are âwave functionsâ (e.g. sines and cosines). How-ever, it turns out that the same notion exists for every Abelian group đŸ.Specifically, for every such group đŸ, if đ is a function mapping đŸ to â,then we can write đ as đ = âđâđŸ đ(đ)đđ , (23.15)
where the đđâs are functions mapping đŸ to â that are analogs ofthe âwave functionsâ for the group đŸ and for every đ â đŸ, đ(đ) is acomplex number known as the Fourier coefficient of đ corresponding tođ. Specifically, the equation (23.15) means that if we think of đ as a|đŸ| dimensional vector over the complex numbers, then we can writethis vector as a sum (with certain coefficients) of the vectors đđđâđŸ.
quantum computing 635
The representation (23.15) is known as the Fourier expansion or Fouriertransform of đ , the numbers ( đ(đ))đâđŸ are known as the Fourier coef-ficients of đ and the functions (đđ)đâđŸ are known as the Fourier char-acters. The central property of the Fourier characters is that they arehomomorphisms of the group into the complex numbers, in the sensethat for every đ„, đ„âČ â đŸ, đđ(đ„ â đ„âČ) = đđ(đ„)đđ(đ„âČ), where â is the groupoperation. One corollary of this property is that if đđ(â) = 1 then đđis â periodic in the sense that đđ(đ„ â â) = đđ(đ„) for every đ„. It turnsout that if đ is periodic with minimal period â, then the only Fouriercharacters that have non zero coefficients in the expression (23.15)are those that are â periodic as well. This can be used to recover theperiod of đ from its Fourier expansion.
23.11.1 Quantum Fourier Transform over the Boolean Cube: SimonâsAlgorithm
We now describe the simplest setting of the Quantum Fourier Trans-form: the group 0, 1đ with the XOR operation, which weâll denoteby (0, 1đ, â). (Since XOR is equal to addition modulo two, thisgroup is also often denoted as (â€2)đ.) It can be shown that the Fouriertransform over (0, 1đ, â) corresponds to expressing đ ⶠ0, 1đ â âas đ = âđŠâ0,1 đ(đŠ)đđŠ (23.16)
where đđŠ ⶠ0, 1đ â â is defined as đđŠ(đ„) = (â1)âđ đŠđđ„đ andđ(đŠ) = 1â2đ âđ„â0,1đ đ(đ„)(â1)âđ đŠđđ„đ .The Quantum Fourier Transform over (0, 1đ, â) is actually quite
simple:
Theorem 23.15 â QFT Over the Boolean Cube. Let đ = âđ„â0,1đ đ(đ„)|đ„â©be a quantum state where đ ⶠ0, 1đ â â is some function satisfy-ing âđ„â0,1đ |đ(đ„)|2 = 1. Then we can use đ gates to transform đ tothe state âđŠâ0,1đ đ(đŠ)|đŠâ© (23.17)
where đ = âđŠ đ(đŠ)đđŠ and đđŠ ⶠ0, 1đ â â is the functionđđŠ(đ„) = â1â đ„đđŠđ .Proof Idea:
The idea behind the proof is that the Hadamard operation corre-sponds to the Fourier transform over the group 0, 1đ (with the XORoperations). To show this, we just need to do the calculations.â
636 introduction to theoretical computer science
Proof of Theorem 23.15. We can express the Hadamard operation HADas follows:
HAD|đâ© = 1â2 (|0â© + (â1)đ|1â©) . (23.18)We are given the state đ = âđ„â0,1đ đ(đ„)|đ„â© . (23.19)
Now suppose that we apply the HAD operation to each of the đqubits. We can see that we get the state2âđ/2 âđ„â0,1đ đ(đ„) đâ1âđ=0(|0â© + (â1)đ„đ |1â©) . (23.20)
We can now use the distributive law and open up a term of theform đ(đ„)(|0â© + (â1)đ„0 |1â©) ⯠(|0â© + (â1)đ„đâ1 |1â©) (23.21)
to the following sum over 2đ terms:đ(đ„) âđŠâ0,1đ(â1)â đŠđđ„đ |đŠâ© . (23.22)
(If you find the above confusing, try to work out explicitly thiscalculation for đ = 3; namely show that (|0â© + (â1)đ„0 |1â©)(|0â© +(â1)đ„1 |1â©)(|0â©+(â1)đ„2 |1â©) is the same as the sum over 23 terms |000â©+(â1)đ„2 |001â© + ⯠+ (â1)đ„0+đ„1+đ„2 |111â©.)
By changing the order of summations, we see that the final state isâđŠâ0,1đ 2âđ/2( âđ„â0,1đ đ(đ„)(â1)â đ„đđŠđ)|đŠâ© (23.23)
which exactly corresponds to đ.
23.11.2 From Fourier to Period finding: Simonâs Algorithm (advanced,optional)
Using Theorem 23.15 it is not hard to get an algorithm that can recovera string ââ â 0, 1đ given a circuit that computes a function đč â¶0, 1đ â 0, 1â that is ââ periodic in the sense that đč(đ„) = đč(đ„âČ) fordistinct đ„, đ„âČ if and only if đ„âČ = đ„ â ââ. The key observation is that ifwe compute the state âđ„â0,1đ |đ„â©|đč (đ„)â©, and perform the QuantumFourier transform on the first đ qubits, then we would get a state suchthat the only basis elements with nonzero coefficients would be of theform |đŠâ© where â đŠđââđ = 0( mod 2) (23.24)
quantum computing 637
So, by measuring the state, we can obtain a sample of a randomđŠ satisfying (23.24). But since (23.24) is a linear equation modulo 2about the unknown đ variables ââ0, ⊠, ââđâ1, if we repeat this proce-dure to get đ such equations, we will have at least as many equationsas variables and (it can be shown that) this will suffice to recover ââ.
This result is known as Simonâs Algorithm, and it preceded andinspired Shorâs algorithm.
23.11.3 From Simon to Shor (advanced, optional)Theorem 23.15 seemed to really use the special bit-wise structure ofthe group 0, 1đ, and so one could wonder if it can be extended toother groups. However, it turns out that we can in fact achieve such ageneralization.
The key step in Shorâs algorithm is to implement the Fourier trans-form for the group â€đż which is the set of numbers 0, ⊠, đż â 1 withthe operation being addition modulo đż. In this case it turns out thatthe Fourier characters are the functions đđŠ(đ„) = đđŠđ„ where đ = đ2đđ/đż(đ here denotes the complex number
ââ1). The đŠ-th Fourier coeffi-cient of a function đ ⶠâ€đż â â isđ(đŠ) = 1âđż âđ„ââ€đż đ(đ„)đđ„đŠ . (23.25)
The key to implementing the Quantum Fourier Transform for suchgroups is to use the same recursive equations that enable the classicalFast Fourier Transform (FFT) algorithm. Specifically, consider the casethat đż = 2â. We can separate the sum over đ„ in (23.25) to the termscorresponding to even đ„âs (of the form đ„ = 2đ§) and odd đ„âs (of theform đ„ = 2đ§ + 1) to obtainđ(đŠ) = 1âđż âđ§âđđż/2 đ(2đ§)(đ2)đŠđ§ + đđŠâđż âđ§ââ€đż/2 đ(2đ§ + 1)(đ2)đŠđ§ (23.26)
which reduces computing the Fourier transform of đ over the groupâ€2â to computing the Fourier transform of the functions đđđŁđđ andđđđđ (corresponding to the applying đ to only the even and odd đ„âsrespectively) which have 2ââ1 inputs that we can identify with thegroup â€2ââ1 = â€đż/2.
Specifically, the Fourier characters of the group â€đż/2 are the func-tions đđŠ(đ„) = đ2đđ/(đż/2)đŠđ„ = (đ2)đŠđ„ for every đ„, đŠ â â€đż/2. Moreover,since đđż = 1, (đ2)đŠ = (đ2)đŠ mod đż/2 for every đŠ â â. Thus (23.26)translates intođ(đŠ) = đđđŁđđ(đŠ mod đż/2) + đđŠ đđđđ(đŠ mod đż/2) . (23.27)
This observation is usually used to obtain a fast (e.g. đ(đż logđż))time to compute the Fourier transform in a classical setting, but it can
638 introduction to theoretical computer science
be used to obtain a quantum circuit of đđđđŠ(logđż) gates to transform astate of the form âđ„ââ€đż đ(đ„)|đ„â© to a state of the form âđŠââ€đż đ(đŠ)|đŠâ©.
The case that đż is not an exact power of two causes some complica-tions in both the classical case of the Fast Fourier Transform and thequantum setting of Shorâs algorithm. However, it is possible to handlethese. The idea is that we can embed đđż in the group â€đŽâ đż for anyinteger đŽ, and we can find an integer đŽ such that đŽ â đż will be closeenough to a power of 2 (i.e., a number of the form 2đ for some đ), sothat if we do the Fourier transform over the group â€2đ then we willnot introduce too many errors.
Figure 23.7: Conjectured status of BQP with respect toother complexity classes. We know that P â BPP âBQP and BQP â PSPACE â EXP. It is not known ifany of these inclusions are strict though it is believedthat they are. The relation between BQP and NP isunknown but they are believed to be incomparable,and that NP-complete problems can not be solved inpolynomial time by quantum computers. However, itis possible that BQP contains NP and even PSPACE,and it is also possible that quantum computersoffer no super-polynomial speedups and that P =BQP. The class âNISQâ above is not a well definedcomplexity class, but rather captures the currentstatus of quantum devices, which seem to be able tosolve a set of computational tasks that is incomparablewith the set of tasks solvable by classical computers.The diagram is also inaccurate in the sense that at themoment the âquantum supremacyâ tasks on whichsuch devices seem to offer exponential speedupsdo not correspond to Boolean functions/ decisionproblems. Chapter Recap
âą The state of an đ-qubit quantum system can bemodeled as a 2đ dimensional vector
âą An operation on the state corresponds to applyinga unitary matrix to this vector.
âą Quantum circuits are obtained by composing basicoperations such as HAD and đđđŽđđ·.
âą We can use quantum circuits to define the classesBQP/poly and BQP which are the quantum analogsof P/poly and BPP respectively.
âą There are some problems for which the best knownquantum algorithm is exponentially faster thanthe best known, but quantum computing is not apanacea. In particular, as far as we know, quantumcomputers could still require exponential time tosolve NP-complete problems such as SAT.
quantum computing 639
7 You can use đđđŽđđ· to simulate NAND gates.
8 Use the alternative characterization of P as in SolvedExercise 13.4.9 You can use the HAD gate to simulate a coin toss.
10 In exponential time simulating quantum computa-tion boils down to matrix multiplication.11 If a reduction can be implemented in P it can beimplemented in BQP as well.
12 We are given â = đđ„ and need to recover đ„. Todo so we can compute the order of various elementsof the form âđđđ. The order of such an element isa number đ satisfying đ(đ„đ + đ) = 0 (mod |đŸ|).With a few random examples we will get a non trivialequation on đ„ (where đ is not zero modulo |đŸ|) andthen we can use our knowledge of đ, đ, đ to recover đ„.
23.12 EXERCISES
Exercise 23.1 â Quantum and classical complexity class relations. Prove thefollowing relations between quantum complexity classes and classicalones:
1. P/poly â BQP/poly. See footnote for hint.7
2. P â BQP. See footnote for hint.8
3. BPP â BQP. See footnote for hint.9
4. BQP â EXP. See footnote for hint.10
5. If SAT â BQP then NP â BQP. See footnote for hint.11
Exercise 23.2 â Discrete logarithm from order finding. Show a probabilisticpolynomial time classical algorithm that given an Abelian finite groupđŸ (in the form of an algorithm that computes the group operation),a generator đ for the group, and an element â â đŸ, as well as access toa black box that on input đ â đŸ outputs the order of đ (the smallest đsuch that đđ = 1), computes the discrete logarithm of â with respect tođ. That is the algorithm should output a number đ„ such that đđ„ = â.See footnote for hint.12
23.13 BIBLIOGRAPHICAL NOTES
An excellent gentle introduction to quantum computation is given inMerminâs book [Mer07]. In particular the first 100 pages (Chapter1 to 4) of [Mer07] cover all the material of this chapter in a muchmore comprehensive way. This material is also covered in the first5 chapters of De-Wolfâs online lecture notes. For a more condensedexposition, the chapter on quantum computation in my book withArora (see draft here) is one relatively short source that containsfull descriptions of Groverâs, Simonâs and Shorâs algorithms. Thisblog post of Aaronson contains a high level explanation of Shorâsalgorithm which ends with links to several more detailed expositions.Chapters 9 and 10 in Aaronsonâs book [Aar13] give an informal buthighly informative introduction to the topics of this chapter and muchmore. Chapter 10 in Avi Wigdersonâs book also provides a high leveloverview of quantum computing. Other recommended resourcesinclude Andrew Childsâ lecture notes on quantum algorithms, aswell as the lecture notes of Umesh Vazirani, John Preskill, and JohnWatrous.
There are many excellent videos available online covering some ofthese materials. The videos of Umesh Vaziraniâs EdX course are an
640 introduction to theoretical computer science
accessible and recommended introduction to quantum computing.Regarding quantum mechanics in general, this video illustrates thedouble slit experiment, this Scientific American video is a nice ex-position of Bellâs Theorem. This talk and panel moderated by BrianGreene discusses some of the philosophical and technical issuesaround quantum mechanics and its so called âmeasurement prob-lemâ. The Feynman lecture on the Fourier Transform and quantummechanics in general are very much worth reading. The Fourier trans-form is covered in these videos of Dr. Chris Geoscience, Clare Zhangand Vi Hart. See also Kelsey Houston-Edwardsâs video on Shorâs Al-gorithm.
The form of Bellâs game we discuss in Section 23.3 was given byClauser, Horne, Shimony, and Holt.
The Fast Fourier Transform, used as a component in Shorâs algo-rithm, is one of the most widely used algorithms invented. The storiesof its discovery by Gauss in trying to calculate asteroid orbits andrediscovery by Tukey during the cold war are fascinating as well.
The image in Fig. 23.2 is taken from Wikipedia.Thanks to Scott Aaronson for many helpful comments about this
chapter.
VIAPPENDICES
Bibliography
[Aar05] Scott Aaronson. âNP-complete problems and physicalrealityâ. In: ACM Sigact News 36.1 (2005). Available onhttps://arxiv.org/abs/quant-ph/0502072, pp. 30â52.
[Aar13] Scott Aaronson. Quantum computing since Democritus.Cambridge University Press, 2013.
[Aar16] Scott Aaronson. âP =? NPâ. In: Open problems in mathe-matics. Available on https://www.scottaaronson.com/papers/pnp.pdf. Springer, 2016, pp. 1â122.
[Aar20] Scott Aaronson. âThe Busy Beaver Frontierâ. In: SIGACTNews (2020). Available on https://www.scottaaronson.com/papers/bb.pdf.
[AKS04] Manindra Agrawal, Neeraj Kayal, and Nitin Saxena.âPRIMES is in Pâ. In: Annals of mathematics (2004),pp. 781â793.
[Asp18] James Aspens. Notes on Discrete Mathematics. Onlinetextbook for CS 202. Available on http://www.cs.yale.edu/homes/aspnes/classes/202/notes.pdf. 2018.
[Bar84] H. P. Barendregt. The lambda calculus : its syntax andsemantics. Amsterdam New York New York, N.Y: North-Holland Sole distributors for the U.S.A. and Canada,Elsevier Science Pub. Co, 1984. isbn: 0444875085.
[Bjo14] Andreas Bjorklund. âDeterminant sums for undirectedhamiltonicityâ. In: SIAM Journal on Computing 43.1(2014), pp. 280â299.
[BlĂ€13] Markus BlĂ€ser. Fast Matrix Multiplication. Graduate Sur-veys 5. Theory of Computing Library, 2013, pp. 1â60.doi : 10 . 4086/ toc .gs . 2013 .005. url: http : //www.theoryofcomputing.org/library.html.
[Boo47] George Boole. The mathematical analysis of logic. Philo-sophical Library, 1847.
Compiled on 8.26.2020 18:10
644 introduction to theoretical computer science
[BU11] David Buchfuhrer and Christopher Umans. âThe com-plexity of Boolean formula minimizationâ. In: Journal ofComputer and System Sciences 77.1 (2011), pp. 142â153.
[Bur78] Arthur W Burks. âBooke review: Charles S. Peirce, Thenew elements of mathematicsâ. In: Bulletin of the Ameri-can Mathematical Society 84.5 (1978), pp. 913â918.
[Cho56] Noam Chomsky. âThree models for the description oflanguageâ. In: IRE Transactions on information theory 2.3(1956), pp. 113â124.
[Chu41] Alonzo Church. The calculi of lambda-conversion. Prince-ton London: Princeton University Press H. Milford,Oxford University Press, 1941. isbn: 978-0-691-08394-0.
[CM00] Bruce Collier and James MacLachlan. Charles Babbage:And the engines of perfection. Oxford University Press,2000.
[Coh08] Paul Cohen. Set theory and the continuum hypothesis. Mine-ola, N.Y: Dover Publications, 2008. isbn: 0486469212.
[Coh81] Danny Cohen. âOn holy wars and a plea for peaceâ. In:Computer 14.10 (1981), pp. 48â54.
[Coo66] SA Cook. âOn the minimum computation time for mul-tiplicationâ. In: Doctoral dissertation, Harvard UniversityDepartment of Mathematics, Cambridge, Mass. 1 (1966).
[Coo87] James W Cooley. âThe re-discovery of the fast Fouriertransform algorithmâ. In: Microchimica Acta 93.1-6(1987), pp. 33â45.
[Cor+09] Thomas H. Cormen et al. Introduction to Algorithms, 3rdEdition. MIT Press, 2009. isbn: 978-0-262-03384-8. url:http://mitpress.mit.edu/books/introduction-algorithms.
[CR73] Stephen A Cook and Robert A Reckhow. âTime boundedrandom access machinesâ. In: Journal of Computer andSystem Sciences 7.4 (1973), pp. 354â375.
[CRT06] Emmanuel J Candes, Justin K Romberg, and Terence Tao.âStable signal recovery from incomplete and inaccuratemeasurementsâ. In: Communications on Pure and AppliedMathematics: A Journal Issued by the Courant Institute ofMathematical Sciences 59.8 (2006), pp. 1207â1223.
[CT06] Thomas M. Cover and Joy A. Thomas. âElements ofinformation theory 2nd editionâ. In: Willey-Interscience:NJ (2006).
[Dau90] Joseph Warren Dauben. Georg Cantor: His mathematics andphilosophy of the infinite. Princeton University Press, 1990.
BIBLIOGRAPHY 645
[De 47] Augustus De Morgan. Formal logic: or, the calculus ofinference, necessary and probable. Taylor and Walton, 1847.
[Don06] David L Donoho. âCompressed sensingâ. In: IEEE Trans-actions on information theory 52.4 (2006), pp. 1289â1306.
[DPV08] Sanjoy Dasgupta, Christos H. Papadimitriou, and UmeshV. Vazirani. Algorithms. McGraw-Hill, 2008. isbn: 978-0-07-352340-8.
[Ell10] Jordan Ellenberg. âFill in the blanks: Using math to turnlo-res datasets into hi-res samplesâ. In: Wired Magazine18.3 (2010), pp. 501â509.
[Ern09] Elizabeth Ann Ernst. âOptimal combinational multi-level logic synthesisâ. Available on https://deepblue.lib .umich .edu/handle/2027 .42/62373. PhD thesis.University of Michigan, 2009.
[Fle18] Margaret M. Fleck. Building Blocks for Theoretical Com-puter Science. Online book, available at http://mfleck.cs.illinois.edu/building-blocks/. 2018.
[FĂŒr07] Martin FĂŒrer. âFaster integer multiplicationâ. In: Pro-ceedings of the 39th Annual ACM Symposium on Theory ofComputing, San Diego, California, USA, June 11-13, 2007.2007, pp. 57â66.
[FW93] Michael L Fredman and Dan E Willard. âSurpassingthe information theoretic bound with fusion treesâ.In: Journal of computer and system sciences 47.3 (1993),pp. 424â436.
[GKS17] Daniel GĂŒnther, Ăgnes Kiss, and Thomas Schneider.âMore efficient universal circuit constructionsâ. In: Inter-national Conference on the Theory and Application of Cryp-tology and Information Security. Springer. 2017, pp. 443â470.
[Gom+08] Carla P Gomes et al. âSatisfiability solversâ. In: Founda-tions of Artificial Intelligence 3 (2008), pp. 89â134.
[Gra05] Judith Grabiner. The origins of Cauchyâs rigorous cal-culus. Mineola, N.Y: Dover Publications, 2005. isbn:9780486438153.
[Gra83] Judith V Grabiner. âWho gave you the epsilon? Cauchyand the origins of rigorous calculusâ. In: The AmericanMathematical Monthly 90.3 (1983), pp. 185â194.
[Hag98] Torben Hagerup. âSorting and searching on the wordRAMâ. In: Annual Symposium on Theoretical Aspects ofComputer Science. Springer. 1998, pp. 366â398.
646 introduction to theoretical computer science
[Hal60] Paul R Halmos. Naive set theory. Republished in 2017 byCourier Dover Publications. 1960.
[Har41] GH Hardy. âA Mathematicianâs Apologyâ. In: (1941).[HJB85] Michael T Heideman, Don H Johnson, and C Sidney
Burrus. âGauss and the history of the fast Fourier trans-formâ. In: Archive for history of exact sciences 34.3 (1985),pp. 265â277.
[HMU14] John Hopcroft, Rajeev Motwani, and Jeffrey Ullman.Introduction to automata theory, languages, and compu-tation. Harlow, Essex: Pearson Education, 2014. isbn:1292039051.
[Hof99] Douglas Hofstadter. Gödel, Escher, Bach : an eternal goldenbraid. New York: Basic Books, 1999. isbn: 0465026567.
[Hol01] Jim Holt. âThe Ada perplex: how Byronâs daughter cameto be celebrated as a cybervisionaryâ. In: The New Yorker5 (2001), pp. 88â93.
[Hol18] Jim Holt. When Einstein walked with Gödel : excursions tothe edge of thought. New York: Farrar, Straus and Giroux,2018. isbn: 0374146705.
[HU69] John E Hopcroft and Jeffrey D Ullman. âFormal lan-guages and their relation to automataâ. In: (1969).
[HU79] John E. Hopcroft and Jeffrey D. Ullman. Introduction toAutomata Theory, Languages and Computation. Addison-Wesley, 1979. isbn: 0-201-02988-X.
[HV19] David Harvey and Joris Van Der Hoeven. âInteger multi-plication in time O(n log n)â. working paper or preprint.Mar. 2019. url: https://hal.archives-ouvertes.fr/hal-02070778.
[Joh12] David S Johnson. âA brief history of NP-completeness,1954â2012â. In: Documenta Mathematica (2012), pp. 359â376.
[Juk12] Stasys Jukna. Boolean function complexity: advances andfrontiers. Vol. 27. Springer Science & Business Media,2012.
[Kar+97] David R. Karger et al. âConsistent Hashing and RandomTrees: Distributed Caching Protocols for Relieving HotSpots on the World Wide Webâ. In: Proceedings of theTwenty-Ninth Annual ACM Symposium on the Theory ofComputing, El Paso, Texas, USA, May 4-6, 1997. 1997,pp. 654â663. doi : 10.1145/258533.258660. url: https://doi.org/10.1145/258533.258660.
BIBLIOGRAPHY 647
[Kar95] ANATOLII ALEXEEVICH Karatsuba. âThe complexityof computationsâ. In: Proceedings of the Steklov Institute ofMathematics-Interperiodica Translation 211 (1995), pp. 169â183.
[Kle91] Israel Kleiner. âRigor and Proof in Mathematics: AHistorical Perspectiveâ. In: Mathematics Magazine 64.5(1991), pp. 291â314. issn: 0025570X, 19300980. url:http://www.jstor.org/stable/2690647.
[Kle99] Jon M Kleinberg. âAuthoritative sources in a hyper-linked environmentâ. In: Journal of the ACM (JACM) 46.5(1999), pp. 604â632.
[Koz97] Dexter Kozen. Automata and computability. New York:Springer, 1997. isbn: 978-3-642-85706-5.
[KT06] Jon M. Kleinberg and Ăva Tardos. Algorithm design.Addison-Wesley, 2006. isbn: 978-0-321-37291-8.
[Kun18] Jeremy Kun. A programmerâs introduction to mathematics.Middletown, DE: CreateSpace Independent PublishingPlatform, 2018. isbn: 1727125452.
[Liv05] Mario Livio. The equation that couldnât be solved : howmathematical genius discovered the language of symmetry.New York: Simon & Schuster, 2005. isbn: 0743258207.
[LLM18] Eric Lehman, Thomson F. Leighton, and Albert R. Meyer.Mathematics for Computer Science. 2018.
[LMS16] Helger Lipmaa, Payman Mohassel, and Seyed SaeedSadeghian. âValiantâs Universal Circuit: Improvements,Implementation, and Applications.â In: IACR CryptologyePrint Archive 2016 (2016), p. 17.
[Lup58] O Lupanov. âA circuit synthesis methodâ. In: Izv. Vuzov,Radiofizika 1.1 (1958), pp. 120â130.
[Lup84] Oleg B. Lupanov. Asymptotic complexity bounds for controlcircuits. In Russian. MSU, 1984.
[Lus+08] Michael Lustig et al. âCompressed sensing MRIâ. In:IEEE signal processing magazine 25.2 (2008), pp. 72â82.
[LĂŒt02] Jesper LĂŒtzen. âBetween Rigor and Applications: De-velopments in the Concept of Function in MathematicalAnalysisâ. In: The Cambridge History of Science. Ed. byMary JoEditor Nye. Vol. 5. The Cambridge History ofScience. Cambridge University Press, 2002, pp. 468â487.doi : 10.1017/CHOL9780521571999.026.
648 introduction to theoretical computer science
[LZ19] Harry Lewis and Rachel Zax. Essential Discrete Mathemat-ics for Computer Science. Princeton University Press, 2019.isbn: 9780691190617.
[Maa85] Wolfgang Maass. âCombinatorial lower bound argu-ments for deterministic and nondeterministic Turingmachinesâ. In: Transactions of the American MathematicalSociety 292.2 (1985), pp. 675â693.
[Mer07] N David Mermin. Quantum computer science: an introduc-tion. Cambridge University Press, 2007.
[MM11] Cristopher Moore and Stephan Mertens. The nature ofcomputation. Oxford University Press, 2011.
[MP43] Warren S McCulloch and Walter Pitts. âA logical calcu-lus of the ideas immanent in nervous activityâ. In: Thebulletin of mathematical biophysics 5.4 (1943), pp. 115â133.
[MR95] Rajeev Motwani and Prabhakar Raghavan. Randomizedalgorithms. Cambridge university press, 1995.
[MU17] Michael Mitzenmacher and Eli Upfal. Probability andcomputing: randomization and probabilistic techniques inalgorithms and data analysis. Cambridge university press,2017.
[Neu45] John von Neumann. âFirst Draft of a Report on the ED-VACâ. In: (1945). Reprinted in the IEEE Annals of theHistory of Computing journal, 1993.
[NS05] Noam Nisan and Shimon Schocken. The elements of com-puting systems: building a modern computer from first princi-ples. MIT press, 2005.
[Pag+99] Lawrence Page et al. The PageRank citation ranking: Bring-ing order to the web. Tech. rep. Stanford InfoLab, 1999.
[Pie02] Benjamin Pierce. Types and programming languages. Cam-bridge, Mass: MIT Press, 2002. isbn: 0262162091.
[Ric53] Henry Gordon Rice. âClasses of recursively enumerablesets and their decision problemsâ. In: Transactions of theAmerican Mathematical Society 74.2 (1953), pp. 358â366.
[Rog96] Yurii Rogozhin. âSmall universal Turing machinesâ. In:Theoretical Computer Science 168.2 (1996), pp. 215â240.
[Ros19] Kenneth Rosen. Discrete mathematics and its applications.New York, NY: McGraw-Hill, 2019. isbn: 125967651x.
[RS59] Michael O Rabin and Dana Scott. âFinite automata andtheir decision problemsâ. In: IBM journal of research anddevelopment 3.2 (1959), pp. 114â125.
BIBLIOGRAPHY 649
[Sav98] John E Savage. Models of computation. Vol. 136. Availableelectronically at http://cs.brown.edu/people/jsavage/book/. Addison-Wesley Reading, MA, 1998.
[Sch05] Alexander Schrijver. âOn the history of combinatorialoptimization (till 1960)â. In: Handbooks in operationsresearch and management science 12 (2005), pp. 1â68.
[Sha38] Claude E Shannon. âA symbolic analysis of relay andswitching circuitsâ. In: Electrical Engineering 57.12 (1938),pp. 713â723.
[Sha79] Adi Shamir. âFactoring numbers in O (logn) arith-metic stepsâ. In: Information Processing Letters 8.1 (1979),pp. 28â31.
[She03] Saharon Shelah. âLogical dreamsâ. In: Bulletin of theAmerican Mathematical Society 40.2 (2003), pp. 203â228.
[She13] Henry Maurice Sheffer. âA set of five independent pos-tulates for Boolean algebras, with application to logicalconstantsâ. In: Transactions of the American mathematicalsociety 14.4 (1913), pp. 481â488.
[She16] Margot Shetterly. Hidden figures : the American dream andthe untold story of the Black women mathematicians whohelped win the space race. New York, NY: William Morrow,2016. isbn: 9780062363602.
[Sin97] Simon Singh. Fermatâs enigma : the quest to solve the worldâsgreatest mathematical problem. New York: Walker, 1997.isbn: 0385493622.
[Sip97] Michael Sipser. Introduction to the theory of computation.PWS Publishing Company, 1997. isbn: 978-0-534-94728-6.
[Sob17] Dava Sobel. The Glass Universe : How the Ladies of theHarvard Observatory Took the Measure of the Stars. NewYork: Penguin Books, 2017. isbn: 9780143111344.
[Sol14] Daniel Solow. How to read and do proofs : an introductionto mathematical thought processes. Hoboken, New Jersey:John Wiley & Sons, Inc, 2014. isbn: 9781118164020.
[SS71] Arnold Schönhage and Volker Strassen. âSchnelle mul-tiplikation grosser zahlenâ. In: Computing 7.3-4 (1971),pp. 281â292.
[Ste87] Dorothy Stein. Ada : a life and a legacy. Cambridge, Mass:MIT Press, 1987. isbn: 0262691167.
[Str69] Volker Strassen. âGaussian elimination is not optimalâ.In: Numerische mathematik 13.4 (1969), pp. 354â356.
650 introduction to theoretical computer science
[Swa02] Doron Swade. The difference engine : Charles Babbage andthe quest to build the first computer. New York: PenguinBooks, 2002. isbn: 0142001449.
[Tho84] Ken Thompson. âReflections on trusting trustâ. In: Com-munications of the ACM 27.8 (1984), pp. 761â763.
[Too63] Andrei L Toom. âThe complexity of a scheme of func-tional elements realizing the multiplication of integersâ.In: Soviet Mathematics Doklady. Vol. 3. 4. 1963, pp. 714â716.
[Tur37] Alan M Turing. âOn computable numbers, with an ap-plication to the Entscheidungsproblemâ. In: Proceedingsof the London mathematical society 2.1 (1937), pp. 230â265.
[Vad+12] Salil P Vadhan et al. âPseudorandomnessâ. In: Founda-tions and TrendsÂź in Theoretical Computer Science 7.1â3(2012), pp. 1â336.
[Val76] Leslie G Valiant. âUniversal circuits (preliminary re-port)â. In: Proceedings of the eighth annual ACM sympo-sium on Theory of computing. ACM. 1976, pp. 196â203.
[Wan57] Hao Wang. âA variant to Turingâs theory of computingmachinesâ. In: Journal of the ACM (JACM) 4.1 (1957),pp. 63â92.
[Weg87] Ingo Wegener. The complexity of Boolean functions. Vol. 1.BG Teubner Stuttgart, 1987.
[Wer74] PJ Werbos. âBeyond regression: New tools for predictionand analysis in the behavioral sciences. Ph. D. thesis,Harvard University, Cambridge, MA, 1974.â In: (1974).
[Wig19] Avi Wigderson. Mathematics and Computation. Draftavailable on https://www.math.ias.edu/avi/book.Princeton University Press, 2019.
[Wil09] Ryan Williams. âFinding paths of length k in đâ(2đ)timeâ. In: Information Processing Letters 109.6 (2009),pp. 315â318.
[WN09] Damien Woods and Turlough Neary. âThe complex-ity of small universal Turing machines: A surveyâ. In:Theoretical Computer Science 410.4-5 (2009), pp. 443â450.
[WR12] Alfred North Whitehead and Bertrand Russell. Principiamathematica. Vol. 2. University Press, 1912.