arXiv:cs/0205031v1 [cs.CC] 18 May 2002

Lecture Notes on

Evasiveness of Graph Properties

Lectures by

László Lovász

Computer Science Department, Princeton University,

Princeton, NJ 08544, USA,

and Computer Science Department, Eötvös Loránd University,

Budapest, Hungary H-1088.

Notes by

Neal Young

Computer Science Department, Princeton University,

Princeton, NJ 08544, USA.

Abstract

These notes cover the first eight lectures of the class Many Models of Complexity taught by László Lovász at Princeton University in the Fall of 1990. The first eight lectures were on evasiveness of graph properties and related topics; subsequent lectures were on communication complexity and Kolmogorov complexity and are covered in other sets of notes.

The fundamental question considered in these notes is, given a function, how many bits of the input an algorithm must check in the worst case before it knows the value of the function. The algorithms considered are deterministic, randomized, and non-deterministic. The functions considered are primarily graph properties — predicates on edge sets of graphs invariant under relabeling of the nodes.


Contents

1 Decision Trees and Evasive Properties
1.1 Decision Trees
1.2 An Evasive Function
1.3 A Non-Evasive Function
1.4 Non-Deterministic Complexity
1.5 $D(f) \le D_0(f)D_1(f)$
1.6 The Aanderaa-Karp-Rosenberg Conjecture

2 Evasiveness, continued
2.1 Connectivity is Evasive
2.2 “Tree” Functions are Evasive
2.3 The AKR Conjecture is True for Prime n

3 Non-Evasive Monotone Properties Give Contractable Complexes
3.1 Simplicial Complexes
3.2 Contractability
3.3 Monotone Functions

4 Fixed Points of Simplicial Maps Show Evasiveness
4.1 Fixed Points of Simplicial Mappings
4.2 Fixed Point Theorems
4.3 Application to Graph Properties

5 Non-Deterministic and Randomized Decision Trees
5.1 Near Evasiveness of Monotone Graph Properties
5.2 Non-Deterministic Decision Trees
5.3 Randomized Decision Trees
5.3.1 Details of Calculating $a_k$ and $b_k$

6 Lower Bounds on Randomized Decision Trees
6.1 Farkas' Lemma
6.2 Von Neumann's Min-Max Theorem
6.3 Lower Bounds
6.3.1 Graph Packing
6.3.2 Yao's $d_{\max}/d$ Lemma

7 Randomized Decision Tree Complexity, continued
7.1 More Graph Packing
7.2 Application of Packing Lemma
7.3 An Improved Packing Lemma

8 Randomized Complexity of Tree Functions — Lower Bounds
8.1 Generalized Costs
8.2 The Saks-Wigderson Lower Bound


Figure 1: A Simple Decision Tree

1 Decision Trees and Evasive Properties

The goal of this course is to examine various ways of measuring the complexity of computations. In this lecture, we discuss the decision tree complexity of functions. We begin a characterization of the functions with the property that any deterministic algorithm computing them must, on some input, check all the bits of the input.

1.1 Decision Trees

A decision tree is a tree representing the logical structure of certain algorithms on various inputs. The nodes of the tree represent branch points of the computation — places where more than one outcome is possible based on some predicate of the input — and the leaves represent possible outcomes. Given a particular input, one starts at the root of the tree, performs the test at that node, and descends accordingly into one of the subtrees of the root. Continuing in this way, one reaches a leaf node which represents the outcome of the computation.

Given a function $f : \{0,1\}^n \to \{0,1\}$, a simple decision tree for the function is a binary tree whose internal nodes have labels from $\{1, 2, \dots, n\}$ and whose leaves have labels from $\{0, 1\}$. If a node has label $i$, then the test performed at that node is to examine the $i$th bit of the input. If the result is 0, one descends into the left subtree, whereas if the result is 1, one descends into the right subtree. The label of the leaf so reached is the value of the function on the input.

While it is clear that any such function $f$ has a simple decision tree, we will be interested in simple decision trees for $f$ of minimal depth $D(f)$; $D(f)$ is called the decision tree complexity of $f$.

It is clear that $D(f)$ is at most the number of variables of $f$. A simple example which achieves this upper bound is the parity function $f(x_1, \dots, x_n) = x_1 + x_2 + \cdots + x_n \bmod 2$. For this function every leaf of any simple decision tree for $f$ has depth $n$: if the value of some $x_i$ has not been examined by the time a leaf is reached for an input $x$, the tree gives the same answer when $x_i$ is flipped, so the function computed is not parity.
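For concreteness, here is a small brute-force sketch (an illustration we add, not code from the notes) that computes $D(f)$ by recursion: the algorithm picks the best variable to query and an adversary picks the worse answer. It confirms that parity is evasive for small $n$:

```python
from itertools import product

def decision_tree_depth(f, n):
    """Minimum depth D(f) of a simple decision tree for f : {0,1}^n -> {0,1}.
    Brute force: try each next variable to query; an adversary answers so as
    to maximize the remaining depth."""
    def constant_value(fixed):
        # Return f's value if it is constant on all completions of `fixed`.
        free = [i for i in range(n) if i not in fixed]
        vals = set()
        for bits in product((0, 1), repeat=len(free)):
            x = {**fixed, **dict(zip(free, bits))}
            vals.add(f(tuple(x[i] for i in range(n))))
            if len(vals) > 1:
                return None
        return vals.pop()

    def depth(fixed):
        if constant_value(fixed) is not None:
            return 0
        return 1 + min(max(depth({**fixed, i: b}) for b in (0, 1))
                       for i in range(n) if i not in fixed)

    return depth({})

parity = lambda x: sum(x) % 2
assert decision_tree_depth(parity, 4) == 4          # parity is evasive
assert decision_tree_depth(lambda x: x[0], 4) == 1  # ignores most variables
```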

1.2 An Evasive Function

A function $f$ with $D(f)$ equal to the number of variables is said to be evasive.

A less trivial example of an evasive function is

\[ f(x_{ij} : i, j \in \{1, \dots, n\}) = \bigwedge_{i} \bigvee_{j} x_{ij}, \]

that is, $f$ is 1 iff every row of the matrix with entries $x_{ij}$ has at least one 1.

To show $f$ is evasive, we use an adversary argument. We simulate the computation of some decision tree, except instead of checking the bits of the input directly, we ask the adversary. The adversary, when asked for the value of $x_{ij}$, responds 0 as long as some other variable in the row remains undetermined, and 1 otherwise. In this way the adversary maintains that the value of the function is undetermined until all variables have been checked. Note that in this case we show only that some leaf is of depth $n^2$.
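The adversary is easy to simulate. The following sketch (ours, with illustrative names) answers queries for the function above and shows that, whatever the query order, every one of the $n^2$ variables must be asked before the value is determined:

```python
import itertools

def make_adversary(n):
    """Adversary for f(x) = AND over rows of (OR over the row's entries):
    answer 0 to x_ij while some other variable in row i is unqueried,
    and 1 otherwise."""
    remaining = {i: set(range(n)) for i in range(n)}
    def answer(i, j):
        remaining[i].discard(j)
        return 1 if not remaining[i] else 0
    return answer

n = 3
ask = make_adversary(n)
order = list(itertools.product(range(n), repeat=2))  # any query order works
answers = {q: ask(*q) for q in order}
# Each row ends with exactly one 1 (its last-queried entry), so f = 1; but
# before its last query every row could still have been all zeros, so the
# algorithm cannot stop early on any of the n*n variables.
assert all(sum(answers[(i, j)] for j in range(n)) == 1 for i in range(n))
```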

1.3 A Non-Evasive Function

Next we give a non-trivial example of a non-evasive function. Given players $1, 2, \dots, n$, let $x_{ij}$, $1 \le i < j \le n$, be 1 if player $i$ will beat player $j$ if they play each other and 0 if $j$ will beat $i$. (No draws allowed. Note that this is not necessarily a transitive relation.) The function is 1 iff there is some player who will beat everyone.

The object is to determine $f$ without playing all possible matches. To do this, first play a “knockout tournament” — have 1 and 2 play, have the winner play 3, have the winner play 4, etc., until every player but some player $i$ has lost to somebody. Now play $i$ against everyone he hasn't played. If $i$ wins all his matches, $f$ is 1; otherwise $f$ is 0. The number of matches played in the first stage is $n - 1$, and at most $n - 2$ are played in the second, so $D(f) \le 2n - 3$. (How can one redesign the first stage to show $D(f) \le 2n - \lfloor \log_2 n \rfloor$?)
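Here is a sketch of the two-stage strategy (our illustration; `beats` stands in for a query to the input):

```python
import random

def has_champion(beats, n, counter):
    """Is there a player who beats everyone?  Uses at most 2n-3 matches.
    beats(i, j) returns True iff i beats j; counter[0] tallies matches."""
    def play(i, j):
        counter[0] += 1
        return i if beats(i, j) else j

    # Stage 1: knockout. n-1 matches leave one never-defeated candidate.
    champ, played = 0, set()
    for j in range(1, n):
        played.add(frozenset((champ, j)))
        champ = play(champ, j)          # the winner stays in

    # Stage 2: the candidate plays everyone he hasn't met: at most n-2 matches.
    for j in range(n):
        if j != champ and frozenset((champ, j)) not in played:
            if play(champ, j) != champ:
                return False            # someone beats him, so nobody wins all
    return True

n, counter = 8, [0]
outcome = {frozenset((i, j)): random.choice([i, j])
           for i in range(n) for j in range(i + 1, n)}
print(has_champion(lambda i, j: outcome[frozenset((i, j))] == i, n, counter))
assert counter[0] <= 2 * n - 3
```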

1.4 Non-Deterministic Complexity

The basic idea behind these two examples is that for most functions (what are the two exceptions?) there are proper subsets of the variables whose values can determine the value of the function irrespective of the values of the other variables. The goal in minimizing decision tree depth is to discover such partial assignments as quickly as possible, while the goal in showing large decision tree complexity is to show this is not possible. Define

\[ D_1(f) = \max_{x : f(x)=1} \min\{k : \exists\, i_1, \dots, i_k,\ \epsilon_1, \dots, \epsilon_k :\ f|_{x_{i_1}=\epsilon_1, \dots, x_{i_k}=\epsilon_k} \equiv 1\}, \]
\[ D_0(f) = \max_{x : f(x)=0} \min\{k : \exists\, i_1, \dots, i_k,\ \epsilon_1, \dots, \epsilon_k :\ f|_{x_{i_1}=\epsilon_1, \dots, x_{i_k}=\epsilon_k} \equiv 0\}. \]

That is, $D_i(f)$ is the least $k$ so that from every input on which $f$ takes the value $i$ we can pick $k$ variables such that assigning only these $k$ values already forces the function to be $i$. Alternatively, $D_i(f)$ is the non-deterministic decision tree complexity of verifying $f(x) = i$, and $\max\{D_0(f), D_1(f)\}$ is the non-deterministic decision tree complexity of computing $f$. (A non-deterministic computation may be considered as an ordinary computation augmented by the power to make lucky guesses.)
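These certificate complexities can be computed by brute force for small functions. The following sketch (ours, not the notes' code) does so, and checks the familiar values for the OR function:

```python
from itertools import combinations, product

def certificate_complexity(f, n, value):
    """D_value(f): max over inputs x with f(x) = value of the fewest bits of
    x that force f to value (brute force)."""
    def forces(partial):
        # True iff every completion of `partial` has f = value.
        free = [i for i in range(n) if i not in partial]
        for bits in product((0, 1), repeat=len(free)):
            x = {**partial, **dict(zip(free, bits))}
            if f(tuple(x[i] for i in range(n))) != value:
                return False
        return True

    worst = 0
    for x in product((0, 1), repeat=n):
        if f(x) != value:
            continue
        k = next(k for k in range(n + 1)
                 for S in combinations(range(n), k)
                 if forces({i: x[i] for i in S}))
        worst = max(worst, k)
    return worst

# For f = OR of 3 bits, a single 1 certifies f = 1, but certifying f = 0
# requires seeing all three 0's: D1 = 1, D0 = 3.
f_or = lambda x: int(any(x))
assert certificate_complexity(f_or, 3, 1) == 1
assert certificate_complexity(f_or, 3, 0) == 3
```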

1.5 $D(f) \le D_0(f)D_1(f)$

For boolean $x$ let $x^\epsilon$ denote $x$ if $\epsilon = 0$ and $\bar{x}$ if $\epsilon = 1$. The representation

\[ f(x_1, \dots, x_n) = \bigvee_{l} \bigwedge_{i \in S_l} x_i^{\epsilon_{il}} \]

of $f$ in terms of the disjunction of a number of elementary conjunctions of literals¹ is called a disjunctive normal form (DNF) of $f$.

If we can represent $f$ in DNF so that every elementary conjunction has at most $k$ terms, then $D_1(f) \le k$, because for any $x$ with $f(x) = 1$ some elementary conjunction is satisfied by $x$, and fixing its at most $k$ variables already forces $f$ to be 1. Conversely, there exists a DNF for $f$ in which every elementary conjunction has at most $D_1(f)$ terms: for each $\epsilon$ with $f(\epsilon) = 1$ let $S_\epsilon$ be the indices of a minimum set of (at most $D_1(f)$) variables whose assignment $x_i = \epsilon_i$ forces $f$ to 1. Then

\[ f(x) = \bigvee_{\epsilon : f(\epsilon) = 1} \bigwedge_{i \in S_\epsilon} x_i^{1 - \epsilon_i}. \]

¹A literal is a boolean variable or its negation.

One can similarly correlate $D_0$ and the conjunctive normal form (CNF) of $f$.

Next, we show the surprising relation $D(f) \le D_1(f)D_0(f)$. Write $f$ simultaneously in DNF and CNF so that the sizes of the elementary conjunctions (disjunctions for CNF) do not exceed $D_1(f)$ (respectively $D_0(f)$). To determine the value of $f$ on an input $x$, we use the following strategy. We choose the first variable $x_i$ in the first elementary conjunction of the DNF, and query its value $\epsilon_i$. We then substitute the value $\epsilon_i$ for the variable $x_i$ in the DNF and the CNF and simplify, obtaining a DNF and CNF for $f' = f|_{x_i = \epsilon_i}$. Since each elementary conjunction in the new DNF has size at most $D_1(f)$, we have $D_1(f') \le D_1(f)$. Similarly, $D_0(f') \le D_0(f)$.

The crucial observation is that each elementary disjunction in the CNF has a variable (in fact a literal) in common with each elementary conjunction in the DNF. (Otherwise the variables in the elementary disjunction and the elementary conjunction could be simultaneously set to force the function to 0 and 1.) Thus, continuing the above process, by the time we have queried all of the at most $D_1(f)$ variables in the first elementary conjunction, we have reduced the size of every elementary disjunction by at least 1. It follows that we query at most the variables in the first $D_0(f)$ elementary conjunctions encountered before we have determined the value of the function. Thus $D(f) \le D_0(f)D_1(f)$.

Recalling the earlier remark about non-determinism, $D_1(f)$, and $D_0(f)$, one might say that the above shows that in this model $\mathrm{NP} \cap \mathrm{co\text{-}NP} = \mathrm{P}$.
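The querying strategy in the proof is concrete enough to implement. The following sketch (our illustration) runs it for $f = (x_0 \wedge x_1) \vee (x_2 \wedge x_3)$, for which $D_0(f) = D_1(f) = 2$, and verifies that at most $D_0(f)D_1(f) = 4$ queries are made:

```python
from itertools import product

def strategy_queries(dnf, nvars, x):
    """The strategy from the proof: repeatedly query all unknown variables of
    the first DNF term not yet falsified, until the restricted function is
    constant.  Terms are dicts {variable: required bit}."""
    known = {}

    def restricted_value():
        # Value of f under the partial assignment `known`, or None.
        vals = {int(any(all(y[v] == b for v, b in t.items()) for t in dnf))
                for y in product((0, 1), repeat=nvars)
                if all(y[v] == b for v, b in known.items())}
        return vals.pop() if len(vals) == 1 else None

    while (val := restricted_value()) is None:
        term = next(t for t in dnf
                    if all(known.get(v) != 1 - b for v, b in t.items()))
        for v in term:
            known.setdefault(v, x[v])
    return val, len(known)

# f = (x0 and x1) or (x2 and x3): D1 = D0 = 2, so at most 4 queries.
dnf = [{0: 1, 1: 1}, {2: 1, 3: 1}]
for bits in product((0, 1), repeat=4):
    val, q = strategy_queries(dnf, 4, bits)
    assert val == ((bits[0] & bits[1]) | (bits[2] & bits[3])) and q <= 4
```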

1.6 The Aanderaa-Karp-Rosenberg Conjecture

We can represent functions on graphs by encoding the adjacency matrix in the input to the function. For an undirected graph $G$ with $n$ nodes, we let $x^G_{ij}$, $1 \le i < j \le n$, represent the presence or absence of the edge $(i,j)$ by taking the value 1 or 0 respectively.

In this way we can represent arbitrary functions on graphs. Generally, however, we will restrict our attention to graph properties — boolean functions whose values are independent of the labeling of the nodes of the graph. Technically, $f : \{x_{ij} : 1 \le i < j \le n\} \to \{0,1\}$ is a graph property if for any $\Pi \in S_n$² and for any $x$,

\[ f(\dots, x_{ij}, \dots) = f(\dots, x_{\Pi(i)\Pi(j)}, \dots). \]

The Aanderaa-Karp-Rosenberg (AKR) Conjecture is that any monotone³, non-trivial graph property is evasive. It is known to be true for $n$ a prime power, and counter-examples are known if the monotonicity requirement is dropped.

A generalization of this conjecture follows. $f$ is weakly symmetric if there exists a transitive⁴ group $G \subseteq S_n$ such that for all $g \in G$, $f(\dots, x_i, \dots) = f(\dots, x_{g(i)}, \dots)$. The generalized conjecture is that any monotone, non-trivial, weakly symmetric boolean function is evasive.

For example, suppose $f \equiv$ “graph $G$ has no isolated node”. First, observe that for general $f$, if $\#\{x \in \{0,1\}^n : f(x) = 1\}$ is odd, then $f$ is evasive. To see this, observe that for any $x_i$, oddness is maintained for either $f|_{x_i=0}$ or $f|_{x_i=1}$ (their counts of 1-inputs sum to the odd count for $f$), so the adversary can answer queries so as to maintain the property as $f$ is restricted. As long as the number of unqueried variables is at least 1, the restricted function has an even number of inputs, so a constant restricted function would have an even number of 1-inputs; maintaining an odd count thus ensures the function is not constant.

For the above choice of $f$, an inclusion/exclusion argument shows

\[ \#\{G : G \text{ has no isolated vertex}\} \equiv \sum_{k=0}^{n} (-1)^{k-1} \binom{n}{k} 2^{\binom{n-k}{2}} \equiv (-1)^{n-1} n + (-1)^{n} \pmod{2}. \]

Thus provided $n$ is even, an odd number of graphs have no isolated nodes, and $f$ is evasive.
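The parity claim is easy to verify by enumeration for small $n$ (a quick check we add, not from the notes):

```python
from itertools import combinations, product

def no_isolated_count(n):
    """Count labeled graphs on n nodes with no isolated vertex (brute force)."""
    edges = list(combinations(range(n), 2))
    count = 0
    for bits in product((0, 1), repeat=len(edges)):
        deg = [0] * n
        for (u, v), b in zip(edges, bits):
            deg[u] += b
            deg[v] += b
        count += all(deg)
    return count

for n in (2, 4):   # n even: the count is odd, so f is evasive
    assert no_isolated_count(n) % 2 == 1
for n in (3, 5):   # n odd: the count is even
    assert no_isolated_count(n) % 2 == 0
```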

Note for later that we can generalize the above condition. In particular, an inductive argument in the same spirit shows:

Lemma 1.1
\[ 2^{n-D(f)} \mid \#\{x : f(x) = 1\}. \]

²$S_n$ denotes the symmetric group on $n$ elements, also known as the set of permutations of $n$ elements.
³A graph property is monotone if adding edges to the graph preserves the property.
⁴$G$ is transitive if $\forall i, j\ \exists g \in G : g(i) = j$.


2 Evasiveness, continued

2.1 Connectivity is Evasive

If $f \equiv$ “$G$ is connected”, then $f$ is evasive. To see this, have the adversary answer “no” unless that answer would imply that the graph is disconnected, in which case she answers “yes”. In this way the adversary maintains that a spanning tree exists among the “yes” and unqueried edges. If some edge $(i,j)$ has not been queried, can the answer be known? If it is known, it must be “yes”, and the “yes” edges must contain a spanning tree, so a path of “yes” edges connects $i$ to $j$. Of the edges on this path, suppose the last edge queried is $(u,v)$. At this point we have a contradiction, because the adversary could have answered “no” to the query of $(u,v)$ while maintaining the possible connectedness of the graph through the other “yes” edges and the edge $(i,j)$.

Consideration shows this argument generalizes to any monotone $f$ with the property that for any $x$ such that $f(x) = 1$ and any $x_i = 1$, we can set $x_i = 0$, possibly setting some other $x_j = 1$, without changing the value of the function.
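The connectivity adversary can be simulated directly. In the sketch below (ours; the names are illustrative), the adversary's “yes” edges end up forming a spanning tree, so the function stays undetermined until the last of the $\binom{n}{2}$ queries:

```python
from itertools import combinations

def connected(n, edges):
    """DFS connectivity check: True iff all n nodes are reachable from node 0."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def connectivity_adversary(n):
    """Answer "no" unless that would force disconnectedness, i.e. unless the
    "yes" edges plus the still-unqueried edges would no longer connect G."""
    unqueried = set(frozenset(e) for e in combinations(range(n), 2))
    yes = set()
    def answer(u, v):
        e = frozenset((u, v))
        unqueried.discard(e)
        if connected(n, [tuple(x) for x in yes | unqueried]):
            return 0
        yes.add(e)
        return 1
    return answer

n = 5
ask = connectivity_adversary(n)
edges = list(combinations(range(n), 2))
answers = [ask(u, v) for u, v in edges]
# The "yes" edges form a spanning tree: connectivity stayed undetermined
# until the very last of the C(n,2) queries.
yes_edges = [e for e, a in zip(edges, answers) if a]
assert connected(n, yes_edges) and len(yes_edges) == n - 1
```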

2.2 “Tree” Functions are Evasive

A general class of simple but evasive functions are tree functions — those which have formulas using $\vee$ and $\wedge$ in which every variable occurs exactly once. The adversary has the following strategy. When asked for the value of $x_i$, if $x_i$ occurs in a conjunction $(\cdots \wedge x_i \wedge \cdots)$ in the formula, the adversary claims $x_i = 1$. Otherwise $x_i$ occurs in a disjunction and the adversary responds that $x_i = 0$. The adversary plugs the answered value into the formula, simplifies it, and continues. In this way, the adversary ensures that exactly one variable is removed from the formula with each question, so the result cannot be known until every variable has been queried. (Clearly the same proof applies if the formula also contains negations.)

2.3 The AKR Conjecture is True for Prime n

Previously we proved that if the number of $x$ with $f(x) = 1$ is odd, then $f$ is evasive, and noted that this can be generalized to show that $2^{n-D(f)}$ divides this number. Here is an alternate extension: let $|x|$ denote the number of 1's in $x$. Define

\[ \mu(f) = \sum_{f(x)=1} (-1)^{|x|}. \]

Then we can use the identity $\mu(f) = \mu(f|_{x_i=0}) - \mu(f|_{x_i=1})$ to show that if $\mu(f) \ne 0$ then $f$ is evasive. In particular, the adversary maintains that $\mu$ applied to the restricted function (i.e. $f$ restricted by the partial assignment given by the adversary's responses so far) is non-zero, so that the restricted function is non-trivial unless all variables have been queried. (The reader may want to check the base case of this argument.)
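The identity and the criterion are easy to check numerically (a small illustration we add, using majority-of-three, for which $\mu \ne 0$):

```python
from itertools import product

def mu(f, n):
    """mu(f) = sum over x in {0,1}^n with f(x) = 1 of (-1)^|x|."""
    return sum((-1) ** sum(x) for x in product((0, 1), repeat=n) if f(x))

def restrict(f, i, b):
    # f restricted by x_i = b, as a function of the remaining variables.
    return lambda y: f(y[:i] + (b,) + y[i:])

maj3 = lambda x: int(sum(x) >= 2)
# The recursion used by the adversary: mu(f) = mu(f|x_i=0) - mu(f|x_i=1).
assert mu(maj3, 3) == mu(restrict(maj3, 0, 0), 2) - mu(restrict(maj3, 0, 1), 2)
print(mu(maj3, 3))   # 2, nonzero, so majority-of-three is evasive
```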

More generally, define $p_f(t) = \sum_x f(x)\, t^{|x|}$. Then for a constant function $c \equiv 1$ of $k$ variables, $p_c(t) = (1+t)^k$, and so an inductive argument similar to the above shows

\[ (t+1)^{n-D(f)} \mid p_f(t). \]

Next we use the $\mu$ criterion to prove the generalization of the AKR conjecture for prime $n$. A counter-example exists with $n = 14$ when $n$ is not required to be prime.

Theorem 2.1 If $f : \{0,1\}^n \to \{0,1\}$ is weakly symmetric, $f(\mathbf{0}) \ne f(\mathbf{1})$, and $n$ is prime, then $f$ is evasive.

Proof: We will show $\mu(f) = \sum_x f(x)(-1)^{|x|} \ne 0$. The first part of the proof is to use the weak symmetry of $f$ and the primality of $n$ to show that there is a permutation consisting of a single cycle leaving $f$ invariant. The second is to use this fact and the primality of $n$ to group the inputs $x \ne \mathbf{0}, \mathbf{1}$ with $f(x) = 1$ into equivalence classes of size $n$, thus showing that $\mu(f) \not\equiv 0 \pmod{n}$, so that $\mu(f) \ne 0$.

Since $f$ is weakly symmetric, there exists a transitive subgroup $\Gamma$ of $S_n$ leaving $f$ invariant. Consider the partition $\Gamma = U_1 \cup \cdots \cup U_n$ where $g \in U_i$ iff $g(1) = i$. The transitivity of $\Gamma$ ensures that each $U_i$ is of the same size, so $n$ divides $|\Gamma|$. Since $n$ is prime and $n \mid |\Gamma|$, Cauchy's theorem implies that $\Gamma$ contains an element $\gamma$ of order $n$. Since $n$ is prime, such a permutation necessarily consists of a single cycle.

Now (assuming WLOG that $f(\mathbf{0}) = 0$) we partition the inputs $x$ into classes such that two elements are in the same class iff one is obtainable from the other by rotation (i.e. repeated application of $\gamma$). Since $\gamma$ leaves $f$ invariant, and $n$ is prime, it follows that unless every $x_i$ is the same, the $n$ possible rotations of $x$ are distinct. Thus the inputs $x$ with $f(x) = 1$ can be partitioned into classes of size $n$, except for $x = \mathbf{1}$. Since rotation preserves $|x|$, each class of size $n$ contributes a multiple of $n$ to $\mu(f)$, while $x = \mathbf{1}$ contributes $(-1)^n$; hence $\mu(f) \equiv (-1)^n \pmod{n}$, and $\mu(f)$ is not zero.

[Here is a sketch of how to generalize the theorem to $n = p^a$ a prime power. It is no longer necessarily true that $\Gamma$ has a cyclic element, but now $\Gamma$ has a transitive (Sylow) subgroup $\Gamma'$ of order $p^b$, with $p^b$ but not $p^{b+1}$ dividing $|\Gamma|$.

Again we group the terms of $\mu(f)$ so that two $x$'s are in the same group if mapped to each other by $\Gamma'$. We look at the orbits of $\Gamma'$ acting on $\{0,1\}^n$. The number of elements in an orbit divides $|\Gamma'| = p^b$ and is not equal to 1 unless $x = \mathbf{0}$ or $x = \mathbf{1}$, and all vectors in the same orbit give the same value of $f$.

Using this grouping we show $\mu(f) \equiv (-1)^n \pmod{p}$, so $\mu(f) \ne 0$.]

Earlier we observed that $(t+1)^{n-D(f)} \mid p_f(t) = \sum_x f(x)\, t^{|x|}$. We define $p_f(t_1, \dots, t_n) = \sum_x f(x)\, t_1^{x_1} \cdots t_n^{x_n}$, and generalize this observation in the next lemma.

Lemma 2.2 $p_f \in \left\langle (t_{i_1}+1)\cdots(t_{i_{n-D(f)}}+1) : 1 \le i_1 < \cdots < i_{n-D(f)} \le n \right\rangle$.⁵

Proof: First, if $f \equiv 0$, $p_f = 0$, and if $f \equiv 1$, $p_f = \sum_x t_1^{x_1} \cdots t_n^{x_n} = (t_1+1)\cdots(t_n+1)$. If $f$ is not constant, fix a minimum depth decision tree for $f$ and use $p_f = p_{f|_{x_1=0}} + t_1\, p_{f|_{x_1=1}}$ (where $x_1$ is the variable queried at the root) to expand $p_f$ into a sum of terms, each term corresponding to a “yes” leaf of the tree. Each such term is of the form $\bigl(\prod_{i \in S_1} t_i\bigr) \times \bigl(\prod_{i \notin S} (t_i+1)\bigr)$, where $S_1$ is the set of indices of variables queried and found to be 1, and $S \supseteq S_1$ is the set of variables queried; since $|S| \le D(f)$, the second factor contains at least $n - D(f)$ factors.

Here is another way to look at this result. Let $z_i = t_i + 1$, and

\[ Q_f(z_1, \dots, z_n) = p_f(z_1 - 1, \dots, z_n - 1) = \sum_x f(x)(z_1-1)^{x_1} \cdots (z_n-1)^{x_n} = \sum_x f(x) \sum_{y \le x} (-1)^{|x-y|} z_1^{y_1} \cdots z_n^{y_n} = \sum_y \Bigl( \sum_{x \ge y} (-1)^{|x-y|} f(x) \Bigr) z_1^{y_1} \cdots z_n^{y_n}. \]

(Here the inequality $y \le x$ means $\forall i,\ y_i \le x_i$.)

Figure 2: A term of $p_f$

By the previous lemma, the terms in $Q_f$ have degree at least $n - D(f)$ in the $z_i$. Thus if $|y| < n - D(f)$, then $\sum_{x \ge y} (-1)^{|x-y|} f(x) = 0$. The left hand side of this equality is known as the Möbius transform $Mf$ of $f$. Thus we have:

Corollary 2.3 For $|y| < n - D(f)$, $(Mf)(y) = 0$.

⁵$\langle \cdots \rangle$ represents the ideal generated by $\cdots$ (the smallest set of polynomials closed under subtraction and under multiplication by any polynomial). An equivalent formulation of this lemma is
\[ p_f = \sum_{1 \le i_1 < \cdots < i_{n-D(f)} \le n} P_{i_1, \dots, i_{n-D(f)}} \times (t_{i_1}+1)\cdots(t_{i_{n-D(f)}}+1), \]
where the $P_{\cdots}$ are integer coefficient polynomials in the $t_i$.


3 Non-Evasive Monotone Properties Give Contractable Complexes

In the previous lecture, we showed that every non-trivial weakly symmetric function on $p^k$ variables for prime $p$ is evasive.

In this lecture we continue our study of evasiveness, introducing some topological concepts related to simplicial complexes. We show that the simplicial complex associated with a non-evasive monotone function is contractable. This is the first part of a technique due to Kahn, Saks, and Sturtevant; our goal is to prove that all non-trivial, monotone, bipartite graph properties⁶ and non-trivial, monotone graph properties of graphs with a prime power number of nodes are evasive.

3.1 Simplicial Complexes

A simplicial complex is a finite collection $K$ of sets such that

1. $\forall X \in K,\ Y \subseteq X \Rightarrow Y \in K$, and

2. $K \ne \emptyset$.

$V(K)$, the vertices of $K$, consists of the elements of the sets in $K$.

Corresponding to $K$ one can construct a geometric realization $\bar{K} \subseteq \mathbb{R}^{V(K)}$. First, one defines a mapping $v \mapsto \bar{v}$ from $V(K)$ into $\mathbb{R}^{V(K)}$ so that no vertex is mapped into the affine hull⁷ of any other subset of the vertices (for instance, one maps the vertices to the unit vectors). Then one extends the mapping to any set $X$ of vertices by $\bar{X} = \mathrm{conv}\{\bar{v} : v \in X\}$, and to any collection $C$ (such as $K$) of sets by $\bar{C} = \bigcup_{X \in C} \bar{X}$.

Note that for any $X \in K$, $\bar{X}$ is a simplex — the convex hull of a set of vectors none of which lies in the affine hull of any subset of the others. (Such a set of vectors is said to be affinely independent.)

A collection $K = \{S_1, \dots, S_m\}$ of simplices in $\mathbb{R}^N$ is said to form a geometric simplicial complex if:

1. $\forall S_i \in K$, $T$ a face⁸ of $S_i$ $\Rightarrow$ $T \in K$, and

2. $\forall S_i, S_j \in K$, $S_i \cap S_j \ne \emptyset \Rightarrow S_i \cap S_j$ is a face of both $S_i$ and $S_j$.

⁶A bipartite graph property $f(x_{ij} : i \in V, j \in W)$ is a boolean function invariant under permutations of the edges induced by permutations of $V$ and $W$. See “graph property”.
⁷$\mathrm{affine}\, S = \{\sum_{v \in S} \alpha_v v : \sum \alpha_v = 1\}$; $\mathrm{conv}\, S = \{\sum_{v \in S} \alpha_v v : \sum \alpha_v = 1,\ \alpha_v \ge 0\}$.
⁸A face of a simplex $S = \mathrm{conv}\, V$ is a set $S' = \mathrm{conv}\, V'$ for some $V' \subseteq V$.

A polyhedron is then defined as the union of the sets in any geometric simplicial complex. Note that such an entity is not necessarily convex; for instance, the surface of an octahedron is a polyhedron, formed by the geometric simplicial complex consisting of its faces, edges, and vertices.

3.2 Contractability

Intuitively, a set $T \subseteq \mathbb{R}^N$ is contractable if it can be continuously shrunk to a single point, while never breaking through its original boundary. Technically, $T$ is contractable if there exists a continuous mapping $\Phi : T \times [0,1] \to T$ with $\forall x \in T,\ \Phi(x,0) = x$ and $\Phi(x,1) = p_0$ for some $p_0 \in T$. One can show that the choice of $p_0$ is immaterial.

If $T$ consists of two distinct points in $\mathbb{R}^1$, no such mapping can exist, because at some time the mapping would have to switch from mapping a point to itself to mapping the point to the other point.

If the underlying simplicial complex is a graph and the graph is disconnected, one can similarly show that the set is not contractable. Similarly, if the graph has a cycle, at any time the cycle will be in the image of the mapping, so a cyclic graph is not contractable.

Conversely, if the graph is a tree, then one can contract the graph by repeatedly contracting the edges leading to leaves.

(Note that we consider contractability of a simplicial complex synonymous with the contractability of its geometric realization.)

Generalizing the contraction of a tree described above, we will obtain a useful sufficient condition for contractability. (Surprisingly, for a general simplicial complex $K$, it is undecidable whether $K$ is contractable.) For $v \in V(K)$, define

\[ K \setminus v = \{X \in K : v \notin X\}, \qquad K / v = \{X \in K : v \notin X,\ X \cup \{v\} \in K\}. \]

The first is called $K$ minus $v$; the second is called the link of $v$ in $K$.

Considering the boundary of the 3-dimensional simplex, $\partial\Delta_3$, which is not contractable, one sees that $\partial\Delta_3 \setminus v \equiv \Delta_2$ is contractable but $\partial\Delta_3 / v$, a three node cycle, is not.

Lemma 3.1 If for some $v$, $K/v$ and $K \setminus v$ are contractable, then $K$ is contractable.


Figure 3: $\partial\Delta_3$, $\partial\Delta_3 \setminus v$, and $\partial\Delta_3 / v$.

Figure 4: Contraction of $C$ onto $K/v$.


Proof: Let $C$ denote $\{X \in K : v \in X\}$, so that $K = C \cup K{\setminus}v$. The first step in contracting $K$ is to use the contractability of $K/v$ to construct a mapping $\Psi$ which contracts⁹ $C$ onto $K/v$, leaving $K \setminus v$ fixed. Once this is accomplished, all of the points have been contracted into $K \setminus v$, so applying the contraction of $K \setminus v$ completes the contraction of $K$.

Suppose $K/v$ is contracted by $\Phi$ to $p_0$. Denote a point $p$ in $C$ by $(p', \lambda)$, where $p' \in K/v$ and $\lambda \in [0,1]$ such that $p = \lambda p' + (1-\lambda)v$. (Note that this denotation is continuous and invertible except at $v$.)

Let $C_\lambda = \{(p', \lambda) : p' \in K/v\}$, so $C_1 = K/v$ and $C_0 = \{v\}$. The contraction can be envisioned as flattening $C$. At time $t$, each $C_\lambda$ for $\lambda < t$ will have been flattened into $C_t$, until all of $C$ is flattened into $K/v$. Once $C_\lambda$ is mapped onto $C_t$, as $t$ grows, instead of letting the image of $C_\lambda$ grow with $C_t$ (which would lead to a discontinuity at $v$), we contract it using $\Phi$ to counteract the growth:

\[ \Psi(p, t) = \begin{cases} (\Phi(p', 1 - \lambda/t),\ t) & \text{if } t \ge \lambda, \\ p & \text{if } t \le \lambda. \end{cases} \]

If $t = 0$ or $\lambda = 1$ then $\Psi$ is the identity. If $t = 1$ all points are mapped into $K/v$. We leave it to the reader to verify the continuity of $\Psi$, remarking only that as $p \to v$ we have $\lambda \to 0$, so $\Phi(p', 1 - \lambda/t) \to \Phi(p', 1) = p_0$, independent of $p'$.

3.3 Monotone Functions

A monotone boolean function $f \not\equiv 1$ gives a simplicial complex

\[ K_f = \{ S \subseteq \{1, \dots, n\} : f(x^S) = 0 \} \]

in a natural way, and vice versa.¹⁰ Also,

\[ K_{f|_{x_i=0}} = \{ S \subseteq \{1, \dots, n\} \setminus \{i\} : S \in K_f \} = K_f \setminus i, \]
\[ K_{f|_{x_i=1}} = \{ S \subseteq \{1, \dots, n\} \setminus \{i\} : S \cup \{i\} \in K_f \} = K_f / i. \]

By now we may begin to suspect a relation between non-evasiveness and contractability. We prove such a relation in the next lemma.

⁹We generalize the notion of contraction to a point in the natural way to allow contraction to arbitrary contractable subsets.
¹⁰$(x^S)_i = 1$ if $i \in S$, and $0$ if $i \notin S$.
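To make the correspondence concrete, here is a small sketch (ours, not the notes' code) that builds $K_f$ from a monotone $f$ and verifies the two identities above on an example:

```python
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def K(f, n):
    """K_f = { S : f(x^S) = 0 }, where x^S is the characteristic vector of S."""
    xS = lambda S: tuple(1 if i in S else 0 for i in range(n))
    return {frozenset(S) for S in subsets(range(n)) if f(xS(S)) == 0}

f = lambda x: x[0] & (x[1] | x[2])      # monotone, f(0) = 0, f(1) = 1
Kf, i = K(f, 3), 0

# Check K_{f|x_i=0} = K_f \ i and K_{f|x_i=1} = K_f / i.  The variables of
# the restricted functions are renamed 0,1 -> 1,2 to match the ground set.
f0 = lambda y: f((0, y[0], y[1]))
f1 = lambda y: f((1, y[0], y[1]))
rename = lambda S: frozenset(j + 1 for j in S)
assert {rename(S) for S in K(f0, 2)} == {S for S in Kf if i not in S}
assert {rename(S) for S in K(f1, 2)} == \
       {S for S in Kf if i not in S and S | {i} in Kf}
```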


Lemma 3.2 (Kahn-Saks-Sturtevant) If $f \not\equiv 1$ is non-evasive, then $K_f$ is contractable.

Proof: Assume $f : \{0,1\}^n \to \{0,1\}$ is non-evasive, with $f \not\equiv 1$.

If $n > 1$, then there exists an $i$ such that $f|_{x_i=0}$ and $f|_{x_i=1}$ are non-evasive. Provided $f|_{x_i=1} \not\equiv 1$, we can assume by induction that $K_{f|_{x_i=0}}$ and $K_{f|_{x_i=1}}$ are contractable. By the preceding lemma and remarks, it follows that $K_f$ is contractable.

If $n > 1$ and $f|_{x_i=1} \equiv 1$, then $K_f = K_{f|_{x_i=0}}$, and $f|_{x_i=0}$ is non-evasive, so again by induction $K_f$ is contractable.

Otherwise $n = 1$, so $f \equiv 0$ and $K_f = \{\emptyset, \{1\}\}$, which is contractable.

We have now established a link between evasiveness of monotone functions and the contractability of the associated simplicial complex. In the next lecture, we will use the symmetry properties of monotone graph properties and some more topology to show in some cases that the associated complexes are not contractable, and thus that the original functions are evasive.


4 Fixed Points of Simplicial Maps Show Evasiveness

In the previous lecture we showed that the simplicial complex associated with a monotone non-evasive function is contractable. In this lecture we present the following argument.

Standard fixed point theorems in topology tell us that a continuous function mapping a contractable polyhedron into itself has a fixed point. On the other hand, the invariance of a monotone function $f$ under a permutation $\pi$ of the inputs implies that the geometric realization of the permutation maps $K_f$ into itself, and thus has a fixed point if $f$ is non-evasive. For $f$ a monotone bipartite graph property or a monotone graph property on graphs with a prime power number of nodes, we characterize the possible fixed point sets of such mappings to show that if $f$ is non-trivial, no fixed point can exist, so that $f$ is evasive.

4.1 Fixed Points of Simplicial Mappings

Suppose $K$ and $K'$ are (abstract) simplicial complexes. Then $\varphi : V(K) \to V(K')$ is a simplicial map provided $X \in K \Rightarrow \varphi(X) \in K'$, that is, provided $\varphi$ preserves the property of being in the complex when applied to sets. Such a map yields a continuous linear map $\bar\varphi : \bar{K} \to \bar{K'}$ by mapping the vertices of $\bar{K}$ in correspondence with $\varphi$, and mapping convex combinations of the vertices to the corresponding convex combinations of their images:

\[ \bar\varphi\Bigl(\sum_{v \in V(K)} \alpha_v \bar{v}\Bigr) = \sum_{v \in V(K)} \alpha_v\, \overline{\varphi(v)}. \]

Note that for $x \in \bar{K}$, the representation $x = \sum_v \alpha_v \bar{v}$ with $\sum_v \alpha_v = 1$ and $\{v : \alpha_v \ne 0\} \in K$ is unique. The set $\Delta_x = \{v : \alpha_v \ne 0\}$, a simplex of $K$, is called the support simplex of $x$, and is, of course, also unique.

So given a simplicial map $\varphi : K \to K$, which for our purposes we will assume is one-to-one, what are the fixed points? Suppose $x$ is a fixed point with support simplex $\Delta$. Then $\varphi(\Delta)$ contains $x$ in its realization, and hence contains $\Delta$. Since $\varphi(\Delta)$ and $\Delta$ are the same size, it follows that $\varphi(\Delta) = \Delta$, that is, $\varphi$ permutes the vertices in $\Delta$. This in turn implies that the center of gravity¹¹ of $\Delta$ is also fixed by $\bar\varphi$.

¹¹The center of gravity of a face $H$ is $\sum_{v \in H} \bar{v} / |H|$.


Figure 5: Fixed Points of Simplicial Maps of $\Delta_3$

What are the other fixed points in $\Delta$? If the orbits¹² of the permutation induced by $\varphi$ on $\Delta$ are $H_1, \dots, H_k$, then each $H_i$ is a face of $\Delta$ with its center of gravity a fixed point. Also, any convex combination of these centers of gravity is a fixed point.

Conversely, if $x = \sum_{v \in \Delta} \alpha_v \bar{v}$ is a fixed point, then $x = \bar\varphi(x) = \sum_{v \in \Delta} \alpha_v\, \overline{\varphi(v)}$ is also a representation of $x$, and because the representation of $x$ in this way is unique, $\alpha_v = \alpha_{\varphi(v)}$ for every $v$. It follows that for each orbit $H_i$ of $\varphi$ on $\Delta$, we can choose $\beta_i$ so that $u \in H_i \Rightarrow \alpha_u = \beta_i$. Thus we have

\[ x = \sum_{v \in \Delta} \alpha_v \bar{v} = \sum_i \sum_{v \in H_i} \beta_i \bar{v} = \sum_i \beta_i |H_i|\, w_i, \]

where $w_i$ is the center of gravity of $H_i$. In other words, $x$ is a convex combination of the centers of gravity of the faces corresponding to the orbits.

To view this from a more combinatorial perspective, suppose the orbits of $\varphi$ on the vertices of $K$ are $H_1, \dots, H_N$, and assume that the first $t$ of these are those which are also simplices of $K$. Let $w_i$, $1 \le i \le t$, denote the center of gravity of $H_i$. Then each $w_i$ is a fixed point, and any proper convex combination $x$ of a subset $\{w_{i_1}, \dots, w_{i_r}\} \subseteq \{w_1, \dots, w_t\}$ is also a fixed point, provided only that $x$ is in fact in $\bar{K}$, that is, provided $H_{i_1} \cup \cdots \cup H_{i_r} \in K$.

In sum, if $\mathrm{fix}(\varphi)$ denotes the set of fixed points of $\bar\varphi$, then $\mathrm{fix}(\varphi) = \bar{H}$, where

\[ H = \{ \{i_1, \dots, i_r\} : H_{i_1} \cup \cdots \cup H_{i_r} \in K \}, \]

and the vertices $V(H)$ of the simplicial complex $H$ are $\{1, \dots, t\}$, with $i$ taken to be the center of gravity of the face $H_i$.

¹²The orbits, or cycles, of a permutation are the minimal sets of elements such that the permutation takes no element out of its set.


4.2 Fixed Point Theorems

Next we give some theorems which give sufficient conditions for the existence of fixed points, and which characterize some useful properties of the fixed point sets. The first is Brouwer's fixed point theorem.

Theorem 4.1 (Brouwer) Any continuous map of a simplex to itself has a fixed point.

An alternate formulation of this theorem follows.

Theorem 4.2 There does not exist a continuous map from a simplex to its boundary leaving the boundary fixed.

If there were a continuous function $f : S \to S$ with no fixed point, we could construct a function $g : S \to \partial S$ leaving $\partial S$ fixed as follows. Given $x \in S$, obtain $g(x)$ by projecting a ray from the point $f(x)$ through the point $x$ to the boundary $\partial S$.

Note that for any polyhedron $S$, if $\partial S$ is not contractable, then there cannot exist a continuous map from $S$ to its boundary leaving the boundary fixed.

[Also recall our earlier claim that a set is contractable iff the cone¹³ formed by the set with an affinely independent point $v$ can be mapped continuously onto the set.]

Theorem 4.3 (Lefschetz) If $K$ is contractable, then any continuous map from $K$ to itself has a fixed point.

4.3 Application to Graph Properties

We are finally in a position to apply these techniques to show evasiveness. Recall the previous result that a function $f : \{0,1\}^n \to \{0,1\}$ invariant under some cyclic permutation of its inputs and with $f(\mathbf{0}) \ne f(\mathbf{1})$ is evasive, provided the number of variables is prime. To start, we show that any non-trivial monotone function invariant under a cyclic permutation of the inputs is evasive.

Lemma 4.4 Suppose $f : \{0,1\}^n \to \{0,1\}$ is a monotone, non-trivial function invariant under a cyclic permutation of its inputs. Then $f$ is evasive.

Proof: Assume without loss of generality that $f$ is invariant under the permutation $\varphi(i) = i + 1 \bmod n$, and assume $f$ is non-evasive. Consider $K_f$. As shown in the previous lecture, $K_f$ is contractable. Since $f$ is invariant under $\varphi$, $\varphi$ is a simplicial map of $K_f$, so $\bar\varphi$ has a fixed point (by Lefschetz' theorem).

As discussed at the beginning of this lecture, the fixed points correspond to orbits of $\varphi$ contained in $K_f$. Since the only orbit of $\varphi$ is $\{1, 2, \dots, n\}$, this set must be in $K_f$. Thus $f(1, 1, \dots, 1) = 0$, a contradiction.

¹³The cone of a set with a vertex $v$ consists of the convex combinations of $v$ with points in the set.

The success of this technique hinges on our being able to characterize the orbits, and hence the fixed point set, of a permutation under which the function is invariant. The proof of the next theorem is essentially the same as the previous one, except that the characterization of the orbits is trickier. Before we give the theorem, we state the Hopf index formula, an extension of Lefschetz' fixed point theorem which we need for the proof.

Theorem 4.5 (Hopf Index Formula) For $\varphi$ a simplicial one-to-one mapping of a contractable simplicial complex $K$, the Euler characteristic¹⁴ of the fixed point set $H$ of $\bar\varphi$ is $-1$.

Theorem 4.6 (Yao) Non-trivial monotone bipartite graph properties are evasive.

Proof: Let $f(x_{ij} : i \in U,\ j \in W)$ be a non-evasive monotone bipartite graph property. Let $\varphi$ be the permutation of the edges corresponding to a cyclic permutation of the vertices of $W$, leaving $U$ fixed.

The fixed point set of $\bar\varphi$ on $K_f$ is characterized by $\mathrm{fix}(\varphi) = \bar{H}$, where

\[ \{u_{i_1}, \dots, u_{i_r}\} \in H \iff f(x^{\{u_{i_1}, \dots, u_{i_r}\} \times W}) = 0, \]

with the vertices of $H$ being the centers of gravity of the faces corresponding to the orbits of $\varphi$. The orbits of $\varphi$ correspond to the nodes of $U$: each orbit consists of all of the edges touching a single node of $U$, so we identify each vertex of $H$ with a node of $U$. Since $K_f$ is contractable, there are fixed points,¹⁵ so some edge set $\{u\} \times W$ has $f(x^{\{u\} \times W}) = 0$. By the symmetry of $f$, therefore, any choice of $u$ yields $f(x^{\{u\} \times W}) = 0$. The sets in $H$ correspond to edge sets $\{u_{i_1}, \dots, u_{i_r}\} \times W$, and again by symmetry either all or none of these sets for any given $r$ are in $H$. By monotonicity, then, $H$ is characterized by

\[ \{u_{i_1}, \dots, u_{i_r}\} \in H \iff r \le r_0 \]

for some $r_0$. The Euler characteristic of $H$ is thus

\[ -\binom{|U|}{1} + \binom{|U|}{2} - \cdots + (-1)^{r_0}\binom{|U|}{r_0} = -\binom{|U|-1}{0} - \binom{|U|-1}{1} + \binom{|U|-1}{1} + \binom{|U|-1}{2} - \cdots = -\binom{|U|-1}{0} + (-1)^{r_0}\binom{|U|-1}{r_0}. \]

By the Hopf formula, this equals $-1$, which implies $(-1)^{r_0}\binom{|U|-1}{r_0} = 0$, i.e. $r_0 = |U|$. Hence $U \in H$, so $U \times W \in K_f$, and $f(x^{U \times W}) = 0$. Thus $f \equiv 0$, contradicting non-triviality.

¹⁴The Euler characteristic $\chi(K)$ of a simplicial complex $K$ is defined by $\chi(K) = \sum_{X \in K,\ X \ne \emptyset} (-1)^{|X|}$. (Recall $\mu(f)$.) The Euler characteristic is invariant under topological deformation, and is thus useful for classifying topological types. For instance, a contractable complex has Euler characteristic $-1$.
¹⁵Considered as acting on the complete simplex $\Delta_{|U| \times |W|}$, $\bar\varphi$ necessarily has fixed points. The question is whether any of these fixed points are in $\bar{K_f}$.

One might expect the proof to be simpler for a general graph property, which is invariant under a larger class of permutations. Unfortunately, since a general graph has more edges, the orbits of any given permutation are generally more complicated. However, when the number of nodes is a prime power, we can still characterize the orbits, and thus show evasiveness. Before we show this, we give a more general fixed point theorem.

Theorem 4.7 Let $\Gamma$ be a group of simplicial mappings of a contractable simplicial complex $K$ onto itself. Let $\Gamma_1$ be a normal subgroup¹⁶ of $\Gamma$ with $|\Gamma_1| = p^k$ for a prime $p$, and with $\Gamma/\Gamma_1$ cyclic. Then there exists an $x \in \bar{K}$ such that $\forall \varphi \in \Gamma,\ x = \bar\varphi(x)$.

Note that we are no longer talking about the fixed points of a single simplicial mapping, but rather the fixed points of a group of simplicial mappings. If the orbits¹⁷ of $\Gamma$ are $H_1, \dots, H_N$, with the first $t$ of these in $K$, and $w_i$, $1 \le i \le t$, is the center of gravity of $H_i$, then $\mathrm{fix}(\Gamma) = \bar{H}$, where

\[ \{i_1, \dots, i_r\} \in H \iff H_{i_1} \cup \cdots \cup H_{i_r} \in K, \]

and $V(H) = \{1, \dots, t\}$, with $i$ identified with $w_i$.

¹⁶A subgroup $\Gamma_1$ of $\Gamma$ is normal if $\forall x \in \Gamma,\ x\Gamma_1 x^{-1} = \Gamma_1$.
¹⁷The orbits of a collection of permutations are the minimal sets invariant under every permutation.

To see this, note first that for any $\varphi \in \Gamma$, an orbit of $\Gamma$ is expressible as a disjoint union of orbits of $\varphi$, so any $w_i$ is expressible as a convex combination of centers of gravity of the faces corresponding to orbits of $\varphi$; hence $w_i$ is fixed by $\bar\varphi$. Thus any convex combination of the $w_i$ is fixed by each $\bar\varphi$.

disjoint union of orbits of ϕ, so any wi is expressible as a convex combinationof centers of mass of the faces corresponding to orbits of ϕ, so wi is fixed byϕ. Thus any convex combination of the wi is fixed by each ϕ.

Conversely, if $x = \sum_{v \in \Delta} \alpha_v \bar{v}$ is a fixed point of $\Gamma$, then for any $u, w$ in an orbit $H_i$ of $\Gamma$ there is a $\varphi$ with $\varphi(u) = w$ and $x = \bar\varphi(x) = \sum_{v \in \Delta} \alpha_v\, \overline{\varphi(v)}$, so the uniqueness of the representation of $x$ implies $\alpha_u = \alpha_w$. Consequently we can choose $\beta_1, \dots, \beta_t$ such that $\forall i,\ u \in H_i \Rightarrow \alpha_u = \beta_i$, and

\[ x = \sum_{v \in \Delta} \alpha_v \bar{v} = \sum_{H_i \subseteq \Delta} \sum_{v \in H_i} \beta_i \bar{v} = \sum_{H_i \subseteq \Delta} \beta_i |H_i|\, w_i. \]

Thus the previous techniques continue to apply when we have a group of simplicial mappings. The previous theorem is exactly what we need for the proof of the next theorem:

Theorem 4.8 Suppose $f$ is a non-trivial monotone graph property on graphs with a prime power $p^k$ number of nodes. Then $f$ is evasive.

Proof: Think of the nodes of our graph $G$ as identified with $\mathrm{GF}(p^k)$.¹⁸ Consider the linear mappings $x \mapsto ax + b$ ($a \ne 0$) on $\mathrm{GF}(p^k)$ as a group $\Gamma$. Let $\Gamma_1$ be the mappings $x \mapsto x + b$, so $|\Gamma_1| = p^k$. The normality of $\Gamma_1$ follows from $(((ax + b) + b') - b)/a = x + b'/a$. The factor group $\Gamma/\Gamma_1$ is isomorphic to the group of mappings $\{x \mapsto ax : a \ne 0\}$, i.e. the multiplicative group¹⁹ of $\mathrm{GF}(p^k)$, which is known to be cyclic. Thus the preceding theorem applies, and $\Gamma$ has a common fixed point on $\bar{K_f}$. Since $\Gamma$ is transitive on the edges, the only orbit of $\Gamma$ consists of all of the edges. Thus if $f$ is non-evasive, so that $K_f$ is contractable and $\Gamma$ has a fixed point, then $K_f$ must have as an element the set of all edges, and $f \equiv 0$.

¹⁸$\mathrm{GF}(p^k)$ is the Galois field of order $p^k$. For $k = 1$ this is the field of integers $\{0, 1, \dots, p-1\}$ under arithmetic modulo $p$. For larger $k$, it is the field of polynomials over $\mathrm{GF}(p)$ modulo an irreducible polynomial of degree $k$.
¹⁹The multiplicative group of a field is the group formed on the elements other than 0 under multiplication.


5 Non-Deterministic and Randomized Decision Trees

In this lecture we use previous results to show an $\Omega(n^2)$ lower bound on the decision tree complexity of a general non-trivial monotone graph property, and we begin discussion of decision trees for probabilistic algorithms.

5.1 Near Evasiveness of Monotone Graph Properties

The key lemma in our study so far of the decision tree complexity of monotone Boolean functions has been:

If $f : \{0,1\}^n \to \{0,1\}$ is a non-evasive, monotone function, with $f(\mathbf{0}) = 0$, then $K_f$ is contractable.

We also noted that:

If $K_f$ is contractable then $\chi(K_f) = -1$ (i.e. $\sum_{S : f(x^S) = 0} (-1)^{|S|} = 0$).

We would like to extend this theory to general (non-monotone) functions as well. However, $K_f$, as defined, is not a simplicial complex in the case of a non-monotone function $f$. Although the AKR conjecture is known to be false for non-monotone functions, if $f$ is weakly symmetric, $f(\mathbf{0}) \ne f(\mathbf{1}) = 1$, and the number of variables of $f$ is prime, we have shown that $f$ is evasive (Theorem 2.1).

We have seen that monotone graph properties on graphs with a prime power number of nodes (Theorem 4.8), or on bipartite graphs (Theorem 4.6), are evasive. Next we show that for any non-trivial monotone graph property, by restricting the property to some subgraph on $\Omega(n)$ nodes, we can obtain one of these two kinds of properties. Since $D(f)$ is at least $D(f|_R)$ for any restriction $R$, this will imply that any non-trivial monotone graph property $f$ has $D(f) = \Omega(n^2)$.

Theorem 5.1 Let $f$ be a monotone, non-trivial graph property. Then $D(f) \ge cn^2$ for some positive constant $c$.

Proof: Let $G$ be a graph on $n$ nodes. Choose a prime $p$ such that $n/2 < p < 2n/3$ (it follows from number theoretic arguments that such a prime exists for large $n$). Let $S$ be a subset of the nodes of $G$ such that $|S| = p$. Let $K_S$ denote the complete graph on $S$, with all other $n - p$ nodes isolated. Since $f$ is monotone and non-trivial, $f(\mathbf{0}) = 0$.²⁰ Now, there are several cases:

Case 1, $f(K_S) = 1$. Let $R$ be the restriction $\{x_{ij} = 0 : \{i,j\} \not\subseteq S\}$. Then $f|_R$ is a monotone, non-trivial graph property on a graph with a prime number of nodes, and
\[ D(f) \ge D(f|_R) = \binom{p}{2} \ge (n^2 - n)/8. \]

Case 2, $f(K_S) = 0$. In this case $f(K_{V \setminus S}) = 0$, since $K_{V \setminus S}$ is a complete graph that is smaller than $K_S$ ($|V \setminus S| = n - p < p$), and $f$ is monotone.

Let $H = K_{V \setminus S} \cup S \times (V \setminus S)$. (Note the abuse of notation here, as we are really interested in unordered pairs.)

Case 2.1, $f(H) = 1$. Let $R = \{x_{ij} = 0 : i, j \in S\} \cup \{x_{ij} = 1 : i, j \in V \setminus S\}$. Then $f|_R(\mathbf{0}) = f(K_{V \setminus S}) = 0$ and $f|_R(\mathbf{1}) = f(H) = 1$, and $f|_R$ is a monotone, non-trivial bipartite graph property on a graph with $p(n-p)$ edges. Thus $D(f) \ge D(f|_R) = p(n-p) \ge 2n^2/9$.

Case 2.2, $f(H) = 0$. Let $R = \{x_{ij} = 1 : (i,j) \in H\}$. Then $f|_R(\mathbf{0}) = f(H) = 0$ and $f|_R(\mathbf{1}) = f(\mathbf{1}) = 1$, and $f|_R$ is a monotone, non-trivial graph property on the complete set of edges within $S$, so as in Case 1, $D(f) \ge \binom{p}{2} \ge (n^2 - n)/8$.

²⁰$f(G)$ for a graph $G = (V, E)$ is shorthand for $f(x^E)$.

5.2 Non-Deterministic Decision Trees

Recall that we defined $D_0(f)$ and $D_1(f)$, and showed that $D(f) \le D_0(f)D_1(f)$.

Theorem 5.2 (Babai or Nisan?) Suppose $f$ is weakly symmetric (invariant under a transitive group $\Gamma$). Then $D_0(f)D_1(f) \ge n$.

Proof: Recall that $D_0(f)$ is the least $k$ for which $f$ can be written as a CNF $f = E_1 \wedge E_2 \wedge \cdots \wedge E_N$ whose clauses $E_i = x_{i_1}^{\epsilon_{i_1}} \vee \cdots \vee x_{i_k}^{\epsilon_{i_k}}$ each contain at most $k$ literals. Similarly, $D_1(f)$ is the least $k$ for which $f = F_1 \vee F_2 \vee \cdots \vee F_M$, where each $F_i = x_{i_1}^{\epsilon_{i_1}} \wedge \cdots \wedge x_{i_k}^{\epsilon_{i_k}}$ contains at most $k$ literals. Recall also that there must be a variable $x_i$ that occurs in both $E_1$ and $F_1$ (otherwise we could force the function value to be 0 and 1 at the same time). Let $\gamma \in \Gamma$, and let $E_i^\gamma$ be $E_i$ after the action of $\gamma$. Since $f$ is invariant under $\Gamma$, we can rewrite $f = E_1^\gamma \wedge \cdots \wedge E_N^\gamma$. Therefore $E_1^\gamma$ must have a variable in common with $F_1$.

The crucial observation at this point is that for a transitive group $\Gamma$ of mappings on a set $S$, the quantity $q = \#\{\gamma \in \Gamma : \gamma(x) = y\}$ is independent of $x$ and $y$. This is because for any $x$, $y$, and $y'$ we can map $\{\gamma \in \Gamma : \gamma(x) = y\}$ one-to-one into $\{\gamma \in \Gamma : \gamma(x) = y'\}$ by composing each map in the first set with a fixed map $\gamma'$ satisfying $\gamma'(y) = y'$. This shows independence of $y$, and a similar argument shows independence of $x$.

In fact, we can determine $q$ by the equation (for fixed $x_0$)

\[ qn = \sum_y \#\{\gamma \in \Gamma : \gamma(x_0) = y\} = |\Gamma|, \]

so $q = |\Gamma|/n$.

Returning to the original argument: that $E_1^\gamma$ has a variable in common with $F_1$ for every $\gamma$ means that every $\gamma$ maps some variable of $E_1$ to some variable of $F_1$, i.e. there are $|\Gamma|$ maps $\gamma$ taking some $x$ from $E_1$ to some $y$ from $F_1$. Since any given pair $x \in E_1$ and $y \in F_1$ (again abusing notation) has at most $q = |\Gamma|/n$ maps $\gamma$ taking $x$ to $y$, it follows that there are at least $|\Gamma|/q = n$ pairs $(x, y)$ with $x \in E_1$ and $y \in F_1$, i.e. $|E_1||F_1| \ge n$. Recalling that $|E_1| \le D_0(f)$ and $|F_1| \le D_1(f)$ finishes the argument.

5.3 Randomized Decision Trees

The general question in the complexity of randomized algorithms is, “Does the ability to flip a coin add computational power?” Over the past twenty years we have learned that the answer is a definitive yes. Generally, randomization may give an algorithm the ability to avoid a few bad computation paths, and thus better its worst case behavior.

In the decision tree model, there are a number of ways to model randomized algorithms. One is that at each node, rather than definitely querying some variable, we choose which variable to query randomly according to a probability distribution dependent on the node and the previous results of random choices. For a given input the number of input bits queried is then a random variable, and the decision tree complexity is the maximum over all inputs of the expected value of the number of input bits queried.

An alternate model is that the algorithm makes all random choices in advance, and from that point on is deterministic. In this model an algorithm for deciding a property is specified as a probability distribution over all possible decision trees for the property. For a given tree $T$, if $\delta(x, T)$ denotes the number of input bits queried for a given input $x \in \{0,1\}^n$, and $p_T$ denotes the probability of the algorithm choosing $T$, then the decision tree complexity of the algorithm is

\[ \max_x \sum_T p_T\, \delta(x, T). \]

Figure 6: A Function with Randomized Complexity $o(n)$

This will be made more concrete by the following example. The example is due to Saks, Snir, and Wigderson, and gives a tree formula which has randomized decision tree complexity $o(n)$. (We have seen previously that all tree formulae have (deterministic) decision tree complexity $n$.)

The function $f_k : \{0,1\}^n \to \{0,1\}$, $n = 2^k$, is a tree formula, i.e. it is defined by a formula in which each variable occurs exactly once. We may define $f_k$ inductively by

\[ f_0(x_1) = x_1, \qquad f_{k+1}(x_1, \dots, x_n) = \begin{cases} f_k(x_1, \dots, x_{n/2}) \wedge f_k(x_{n/2+1}, \dots, x_n) & k \text{ odd}, \\ f_k(x_1, \dots, x_{n/2}) \vee f_k(x_{n/2+1}, \dots, x_n) & k \text{ even}. \end{cases} \]

That is, we take a balanced binary tree and construct a formula by labeling the leaves with the variables and labeling internal nodes of the tree alternately with “and” and “or” gates as we go up the tree.

We saw previously that tree formulae are evasive. The evasiveness of $f_k$ also follows from the weak symmetry of $f_k$ and the fact that the number of variables is a prime power. Our task is to construct a randomized algorithm for $f_k$ with expected decision tree complexity $o(n)$.

First, for convenience, we get rid of the asymmetry at different levels by replacing the and-gates and or-gates with nand-gates (negated and-gates). Specifically, by De Morgan's laws, $(f_1 \vee f_2) \wedge (f_3 \vee f_4) = \overline{(\bar{f_1} \barwedge \bar{f_2}) \barwedge (\bar{f_3} \barwedge \bar{f_4})}$, where $\barwedge$ denotes nand; applying this from the bottom up replaces all gates except possibly the gate at the root with nand-gates, at the cost of negations at the leaves and root, which do not affect the complexity. If the gate at the root does not become a nand-gate, it is an and-gate, and we simply negate it. This complements $f$, but doesn't change the complexity of the function.

Now that we have nand-gates at all the nodes, consider the evaluation of the function. If for some nand-gate we know one input is 0, we know the gate outputs 1, independent of the other input. Thus our randomized strategy is to start at the root, choose one of the two inputs uniformly at random, and recursively evaluate it. If it returns 0 we return 1; otherwise we evaluate the other input recursively, returning 1 if the other input returns 0, and 0 otherwise.
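The strategy is easy to simulate. The sketch below (ours; it measures average queries over random inputs, whereas the recurrence below bounds the worst case) evaluates a nand-tree on $n = 2^k$ leaves and compares the query count to $n^{0.754}$:

```python
import random

def evaluate(leaves, lo, hi, rng, count):
    """Randomized nand-tree evaluation: recurse into a random child first; if
    it returns 0 the gate outputs 1, otherwise the answer is the negation of
    the other child's value.  count[0] tallies leaf queries."""
    if hi - lo == 1:
        count[0] += 1
        return leaves[lo]
    mid = (lo + hi) // 2
    halves = [(lo, mid), (mid, hi)]
    rng.shuffle(halves)                      # uniformly random child order
    if evaluate(leaves, *halves[0], rng, count) == 0:
        return 1
    return 1 - evaluate(leaves, *halves[1], rng, count)

rng, k = random.Random(0), 10                # n = 2**10 = 1024 leaves
n, trials, total = 1 << k, 200, 0
for _ in range(trials):
    count = [0]
    evaluate([rng.randint(0, 1) for _ in range(n)], 0, n, rng, count)
    total += count[0]
print(total / trials, n ** 0.754)            # average queries vs. n^0.754
```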

Let $a_k$ denote the expected number of variables checked to compute $f$ if $f(x) = 1$, and let $b_k$ denote the expected number if $f(x) = 0$.

If $f(x) = 0$, then both inputs are 1 and both must be evaluated. If $f(x) = 1$, then either both inputs are 0, in which case we certainly evaluate only one input, or exactly one input is 0, in which case we have at least a one in two chance of evaluating only one input. This yields

\[ b_k = 2a_{k-1}, \qquad a_k \le \max\Bigl\{ b_{k-1},\ \tfrac{1}{2}b_{k-1} + \tfrac{1}{2}(a_{k-1} + b_{k-1}) \Bigr\} = \tfrac{1}{2}b_{k-1} + \tfrac{1}{2}(a_{k-1} + b_{k-1}). \]

We can write this as

\[ \begin{pmatrix} a_0 \\ b_0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} a_k \\ b_k \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} & 1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} a_{k-1} \\ b_{k-1} \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} & 1 \\ 2 & 0 \end{pmatrix}^{k} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

To estimate $a_k$ and $b_k$ from such a recurrence relation, we can examine the eigenvalues of the matrix. The action of the matrix on an eigenvector (by definition) is just to stretch the vector by the corresponding eigenvalue. Thus on repeated application of the matrix, the norm of the eigenvector with largest eigenvalue grows most rapidly. For a vector which is not an eigenvector, we can consider it as a combination of eigenvectors and similarly show that with repeated application of the matrix, its norm grows no faster than that of the eigenvector with largest eigenvalue. These considerations show that if there is a single largest eigenvalue $\lambda$, then after $k$ applications of the matrix the resulting vector has norm $\lambda^k + o(\lambda^k)$ times the norm of the original vector. Thus $a_k$ and $b_k$ are approximately $\lambda^k$. For the above matrix, the eigenvalues are $\frac{1}{4}(1 \pm \sqrt{33})$, so that $a_k, b_k \approx \left(\frac{1+\sqrt{33}}{4}\right)^k$.

For our randomized algorithm, $k = \log_2 n$, so we have

\[ a_k, b_k \approx \left(\frac{1 + \sqrt{33}}{4}\right)^{\log_2 n} = n^{\log_2\left(\frac{1+\sqrt{33}}{4}\right)} \approx n^{0.754} = o(n), \]

and we are done. (See the end of this lecture for details of the calculation.)

One might suspect that one could improve this algorithm by sampling the inputs randomly and using the result to bias the choice of which input to evaluate first towards the input with more 1's in its subtree. It turns out this doesn't help; in fact the above algorithm is essentially optimal (although we don't give the proof).

In the next lecture we will begin to study the probabilistic version of the AKR conjecture, which is that for a graph of $n$ nodes the expected decision tree complexity of a randomized algorithm for a non-trivial monotone graph property is $\Theta(n^2)$. A lower bound of $\Omega(n)$ follows from Theorem 5.2: on every input, a zero-error randomized algorithm must query all the variables of some certificate, so $D_R(f) \ge \max\{D_0(f), D_1(f)\}$, while $D_0(f)D_1(f) \ge \binom{n}{2}$. Yao improved this lower bound to $\Omega(n \log n)$ and introduced some techniques which we will study. Valerie King improved the bound to $\Omega(n^{5/4})$ in her thesis, and subsequently Hajnal improved the bound to $\Omega(n^{4/3})$.

One question which arises is the complexity of probabilistic algorithms which are allowed a small probability of error (Monte Carlo algorithms). One can gain a little bit; for instance, consider the majority function, which is 1 provided a majority of its inputs are 1. By randomly sampling half the inputs, say, one can compute the majority function with a small probability of error. On the other hand, there is a lower bound of $\sqrt{D(f)}$ on the complexity, which leaves a large gap.

5.3.1 Details of Calculating $a_k$ and $b_k$

Following are the calculations of $a_k$ and $b_k$ in more detail; they may be skipped by anyone familiar with linear algebra. Let

\[ M = \begin{pmatrix} \tfrac{1}{2} & 1 \\ 2 & 0 \end{pmatrix}. \]

The eigenvalues of $M$ are the values $\lambda$ such that there exists a corresponding eigenvector $x \ne 0$ with $Mx = \lambda x$. Rewriting this as $(M - \lambda I)x = 0$, we see that the eigenvalues are those values for which $M - \lambda I$ is singular, i.e. has determinant $|M - \lambda I| = 0$:


\[ \begin{vmatrix} \tfrac{1}{2} - \lambda & 1 \\ 2 & -\lambda \end{vmatrix} = 0, \qquad \lambda^2 - \tfrac{1}{2}\lambda - 2 = 0, \qquad \lambda = \tfrac{1}{4}\left(1 \pm \sqrt{33}\right). \]

Once we have the eigenvalues $\lambda_1$ and $\lambda_2$, let $x_1$ and $x_2$ be corresponding eigenvectors, and define the matrices

\[ D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 & x_2 \end{pmatrix}. \]

It is easy to verify that $MX = XD$, so $M = XDX^{-1}$, $M^2 = XD^2X^{-1}$, ..., and $M^k = XD^kX^{-1}$. Since

\[ D^k = \begin{pmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{pmatrix}, \]

we have

\[ \begin{pmatrix} a_k \\ b_k \end{pmatrix} = M^k \begin{pmatrix} 1 \\ 1 \end{pmatrix} = X D^k X^{-1} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

After one determines the eigenvectors $x_1$ and $x_2$, it is an easy matter to complete this and give a closed form for $a_k$ and $b_k$.
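Numerically, the diagonalization can be carried out in a few lines (an illustration we add, assuming numpy is available):

```python
import numpy as np

# (a_k, b_k)^T = M^k (1, 1)^T with M = [[1/2, 1], [2, 0]].
M = np.array([[0.5, 1.0], [2.0, 0.0]])
lam, X = np.linalg.eig(M)                    # eigenvalues (1 +/- sqrt(33))/4
assert np.allclose(sorted(lam),
                   sorted([(1 + 33**0.5) / 4, (1 - 33**0.5) / 4]))

k = 10                                       # i.e. n = 2**k = 1024 variables
ab = X @ np.diag(lam**k) @ np.linalg.inv(X) @ np.array([1.0, 1.0])
assert np.allclose(ab, np.linalg.matrix_power(M, k) @ [1.0, 1.0])
print(ab, (1 << k) ** np.log2((1 + 33**0.5) / 4))  # both grow like n^0.754
```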

(Notes by Sigal Ar and Neal Young.)


6 Lower Bounds on Randomized Decision Trees

We begin this lecture with a proof of a basic result, Farkas' lemma. The lemma gives a necessary and sufficient condition for the existence of a solution to a set of linear inequalities. We then discuss (and prove) von Neumann's min-max theorem. The theorem gives some insight into the advantages of randomization. We then present some techniques developed by Yao applying the min-max theorem to give lower bounds on randomized decision tree complexity.

6.1 Farkas’ Lemma

For a system of linear equalities, everyone knows necessary and sufficient conditions for solvability. Farkas' lemma is the analogue for systems of linear inequalities. Consider the problem: does a system of linear inequalities

\[ \sum_{j=1}^{m} a_{ij} x_j \le b_i \qquad (i = 1, \dots, n) \tag{1} \]

have a solution? Roughly, we expect this problem to be in NP, since if it is solvable we can exhibit an easy proof (an assignment of $x$). Farkas' lemma implies that it is also in co-NP; that is, if it is not solvable there is also an easy proof that it isn't.

If the system is not solvable, we will prove it by exhibiting $\lambda_i$ satisfying $\sum_i \lambda_i a_{ij} = 0$ $(j = 1, \dots, m)$, $\sum_i \lambda_i b_i < 0$, and $\lambda_i \ge 0$ $(i = 1, \dots, n)$. This provides a proof that no $x$ satisfies (1), because if such an $x$ existed we would have

\[ 0 = \sum_j x_j \sum_i \lambda_i a_{ij} = \sum_i \lambda_i \sum_j a_{ij} x_j \le \sum_i \lambda_i b_i < 0. \]

Lemma 6.1 (Farkas) For any $a_{ij}$ and $b_i$ $(j = 1, \dots, m,\ i = 1, \dots, n)$,

\[ \sum_j a_{ij} x_j \le b_i \qquad (i = 1, \dots, n) \tag{2} \]

has a solution $x$ if and only if

\[ \sum_i \lambda_i a_{ij} = 0 \quad (j = 1, \dots, m), \qquad \sum_i \lambda_i b_i < 0, \qquad \lambda_i \ge 0 \quad (i = 1, \dots, n) \tag{3} \]

has no solution $\lambda$.

Proof: Consider the vectors $(a_{i1}, \dots, a_{im}, b_i)$, $(i = 1, \dots, n)$, and the cone of non-negative combinations of these vectors. (3) says exactly that $y^* = (0, \dots, 0, -1)$ is in this cone. Suppose that (3) has no solution, so $y^*$ is not in the cone. Then there exists a separating hyperplane $H = \{y : \sum_j y_j h_j = 0\}$ such that $y^*$ is on one side of $H$ and the cone is on the other, i.e. $\sum_j y^*_j h_j < 0$ and $a_{i1}h_1 + \cdots + a_{im}h_m + b_i h_{m+1} \ge 0$ $(i = 1, \dots, n)$. The first condition says that $h_{m+1} > 0$, and the second says that $\sum_j a_{ij} h_j / h_{m+1} \ge -b_i$ $(i = 1, \dots, n)$. Thus letting $x_j = -h_j / h_{m+1}$, we have a solution to (2).
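The lemma can be demonstrated by solving both systems as linear programs (a sketch we add, assuming scipy is available; exactly one of the two systems is feasible):

```python
import numpy as np
from scipy.optimize import linprog

def farkas(A, b):
    """Exactly one of these is feasible:
      (I)  some x with A x <= b,
      (II) some lam >= 0 with A^T lam = 0 and b . lam < 0."""
    n, m = A.shape
    primal = linprog(np.zeros(m), A_ub=A, b_ub=b, bounds=[(None, None)] * m)
    # For (II), scale so that b . lam <= -1 (the system is homogeneous).
    dual = linprog(np.zeros(n), A_eq=A.T, b_eq=np.zeros(m),
                   A_ub=b.reshape(1, -1), b_ub=[-1.0], bounds=[(0, None)] * n)
    return primal.success, dual.success

A = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.array([1.0, -2.0])        # x1 <= 1 and x1 >= 2: unsolvable
print(farkas(A, b))              # (False, True): a certificate lam exists
```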

6.2 Von Neumann’s Min-Max Theorem

Recall our second characterization of a randomized algorithm via decision trees. For a function $f : \{0,1\}^n \to \{0,1\}$ and an input $x$, an algorithm $A$ chooses a deterministic algorithm for $f$ with decision tree $T$ according to some probability distribution $p$ (independent of $x$!), and then runs the deterministic algorithm on $x$.

On a given input $x$, having chosen a particular tree $T$, $A$ (deterministically) takes some complexity $\delta(T, x)$ to compute $f(x)$. The expected complexity for $A$ on $x$ is given by $\sum_T p_T\, \delta(T, x)$. We are interested in the worst case expected complexity for $A$ (i.e. the adversary chooses $x$ to maximize the complexity): $\max_x \sum_T p_T\, \delta(T, x)$. Finally, if $A$ is optimal, it minimizes this worst-case complexity, and thus takes complexity

\[ D_R(f) = \min_p \max_x \sum_T p_T\, \delta(T, x). \]

We can view this process as a game, in which we choose $p$ (determining $A$), and then the adversary, knowing our choice, chooses $x$. To play the game, we run $A$ on $x$. Our goal is to minimize the expected complexity; the adversary's goal is to maximize it.

For this situation (called a zero-sum game, since we lose exactly what our opponent gains), there is a general theorem, von Neumann's min-max theorem. We have a game, defined by a matrix $M$, for two players — the row player and the column player. The game is played as follows. The column player chooses a column $c$ and the row player chooses a row $r$. The column player then pays $M_{rc}$ to the row player.


[Figure 7: A Simple Zero-Sum Game. The 2×2 payoff matrix M with rows r = 1, 2 for the row player and columns c = 1, 2 for the column player; M_11 = M_22 = 1 and M_12 = M_21 = 0.]

The column player wants to minimize M_rc, and the row player wishes to maximize it.

To make this concrete, assume we are playing the game of figure 7, where each player chooses either 1 or 2. If we choose the same as the adversary, we pay her 1. Otherwise we pay nothing. What should our strategy be? Suppose our strategy is to choose 1. After playing the game several times, with the adversary beating us every time, we begin to suspect that the adversary knows our strategy and has used that knowledge to beat us. What can we do, given that the adversary may know our strategy and use that information to try to beat us?

Von Neumann's key observation is that we can use randomization to negate the adversary's advantage in knowing our strategy. For our simple game, if our strategy is to choose 1 or 2 randomly, each with probability 1/2, then no matter what strategy the adversary picks, we have an expected loss of at most 1/2.

More generally, in any zero-sum game we have a randomized (also called a mixed) strategy which negates the adversary's advantage in knowing our strategy. To make precise the notion of "negating the advantage," we consider turning the tables, so that she chooses her strategy first, and then we choose ours. Then provided we choose the randomized strategy that negates her advantage in the first situation, and she chooses the randomized strategy which negates our advantage in the second situation, we will expect to do just as well in the first situation as in the second. Formally,

Theorem 6.2 (Von Neumann's min-max theorem) For any zero-sum game M,

    min_p max_q q^T M p = max_q min_p q^T M p.

(Note that p and q range over all probability distributions on columns and rows, respectively.)

We start with some observations. In the min-max theorem, in the inner max and min, randomization is not important. That is, (if e_j denotes the jth unit vector in a vector space implicit in the context)

    ∀p, max_q q^T M p = max_j e_j^T M p,

and similarly for the inner term on the right hand side. This is because, for a fixed p, q^T M p is a linear function of q, and thus is maximized at one of the vertices e_j of the probability space.

Proof: Obviously,

    ∀q_0, p_0, max_q q^T M p_0 ≥ q_0^T M p_0 ≥ min_p q_0^T M p.

Thus

    min_{p_0} max_q q^T M p_0 ≥ max_{q_0} min_p q_0^T M p.

The other direction is not so easy. We suppose that ∃t:

    ∀p : (p ≥ 0, ∑_j p_j = 1)   max_j e_j^T M p ≥ t,

and we want to show that

    ∃q : q ≥ 0, ∑_j q_j = 1, ∀i, q^T M e_i ≥ t.

This will follow almost immediately from Farkas' lemma. Suppose that no q satisfies q ≥ 0, ∑_j q_j = 1, ∀i, q^T M e_i ≥ t. Farkas' lemma implies that there exist λ = (λ_1, ..., λ_n), µ = (µ_1, ..., µ_m), and α such that:

    λ^T M + µ + α(1, ..., 1) = (0, ..., 0),   (4)

    t ∑_i λ_i + α > 0,   (5)

    λ_i, µ_j ≥ 0.


Note that α is not constrained to be non-negative and that in (5) the expression is constrained to be positive, rather than negative. These are essentially trivial variations from the standard form of Farkas' lemma. The first is because the inequality corresponding to α is in fact an equality, and the second is because the unsatisfiable constraints are of the form · · · ≥ t, rather than · · · ≤ t.

Letting p = λ / ∑_i λ_i, (4) and (5) imply

    p^T M ≤ −(α / ∑_i λ_i)(1, ..., 1) < t (1, ..., 1),

which contradicts our assumption.
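To make the theorem concrete, the common value min_p max_r (Mp)_r can be computed by linear programming: minimize t subject to (Mp)_r ≤ t for every row r, ∑_j p_j = 1, and p ≥ 0. The following sketch is our own (not from the lectures); it assumes numpy and scipy, and game_value is a name we made up. On the game of figure 7 it returns the value 1/2 with the uniform strategy.

    import numpy as np
    from scipy.optimize import linprog

    def game_value(M):
        # Value of the zero-sum game M, with an optimal mixed strategy p
        # for the column player: minimize t subject to (M p)_r <= t for
        # every row r, p >= 0, and sum(p) = 1.
        n_rows, n_cols = M.shape
        c = np.zeros(n_cols + 1)
        c[-1] = 1.0                                   # objective: minimize t
        A_ub = np.hstack([M, -np.ones((n_rows, 1))])  # (M p)_r - t <= 0
        b_ub = np.zeros(n_rows)
        A_eq = np.ones((1, n_cols + 1))
        A_eq[0, -1] = 0.0                             # sum of p_j = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * n_cols + [(None, None)])
        return res.x[-1], res.x[:-1]

    print(game_value(np.eye(2)))   # value 1/2 at p = (1/2, 1/2)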

6.3 Lower Bounds

The discussion before the proof, in our original context, means that once we choose p, determining A, the adversary can choose a specific input x, rather than a distribution of inputs, to give the worst case expected behavior for A. Alternatively, if the adversary chooses an input distribution first, then we can do our best by subsequently choosing the best deterministic algorithm, rather than a randomized algorithm, for this input distribution. Thus the min-max theorem implies that if the adversary can choose a distribution for which she can force any deterministic algorithm to have an expected complexity of at least t, then for every randomized algorithm there is an input such that the expected complexity of the algorithm on that input is at least t.

To show a lower bound on the decision tree complexity of any randomized algorithm, then, it suffices to show the same bound on the expected complexity of all deterministic algorithms for some fixed input distribution. This is the technique we will use.
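Stated symbolically (this restatement is our addition, but it is just the min-max theorem applied to the game whose columns are the trees T and whose rows are the inputs x):

    DR(f) = min_p max_x ∑_T p_T δ(T, x) = max_q min_T ∑_x q_x δ(T, x),

so any fixed input distribution q yields the lower bound DR(f) ≥ min_T ∑_x q_x δ(T, x).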

As a starting point, recall that DR(f) denotes the minimum expected decision tree complexity of a randomized algorithm for f. An almost trivial observation is

    DR(f) ≥ max{D_0(f), D_1(f)}.   (6)

This follows because when f(x) = i, any algorithm must look at at least D_i(f) bits of x before it can be sure f(x) = i. Last lecture we showed that for (non-trivial) weakly symmetric f, D_0(f) D_1(f) ≥ n. Thus DR(f) ≥ √n.


Next we give a lemma which uses the min-max theorem to give a better lower bound for certain types of functions.

Lemma 6.3 (Yao) Let f : {0,1}^n → {0,1}. Suppose we can partition {1, ..., n} into S_1, ..., S_r so that |S_i| ≥ t and

    f(x) = 0 if ∀i |{j : x_j = 0} ∩ S_i| ≥ t,
    f(x) = 1 if ∃i |{j : x_j = 0} ∩ S_i| = 0.

Then DR(f) ≥ Ω(n/t).

(Note that the condition on f leaves some values of f unspecified. Also note that this bound doesn't follow immediately from (6).)

Proof: We give an input distribution on x and give a lower bound on the expected complexity of any deterministic algorithm on this input distribution.

To generate x, choose t indices j uniformly at random from each S_i and set x_j = 0. Set the remaining x_j′ = 1. Then f(x) = 0, and for an algorithm to verify this it must query an x_j with the value 0 from each S_i. How many queries must the algorithm expect to make before finding such an x_j in a given S_i? Since the t x_j with value 0 were chosen randomly, we can view the algorithm as sampling randomly (without replacement) from a set of size k = |S_i| until it finds one of the t x_j chosen to be 0. The expected number of queries for this is Ω(k/t). By linearity of expectation, the expected number of queries to find a 0 in every S_i is thus at least ∑_i Ω(|S_i|/t) = Ω(n/t).
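For completeness, here is the standard computation behind the Ω(k/t) step; this is our addition, not part of the original notes. If t of k items are marked and the items are examined in uniformly random order, then by symmetry the t marked items split the k − t unmarked ones into t + 1 blocks of equal expected size (k − t)/(t + 1), so the expected number of queries up to and including the first marked item is

    (k − t)/(t + 1) + 1 = (k + 1)/(t + 1) = Θ(k/t).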

We will apply this lemma to lower bound the randomized complexity of monotone bipartite graph properties on 2n nodes (n in each part). The complexity of such properties is not quite solved. The trivial bound DR(f) ≥ √(#variables) gives a lower bound of n. This was first improved to n log n by Yao, who introduced the general techniques for the problem. Valerie King improved the bound to n^{5/4}, and subsequently Hajnal improved the bound to n^{4/3}.

6.3.1 Graph Packing

Our problem is closely related to the problem of graph packing. Two graphs G_1 and G_2 on n nodes can be packed if (possibly after relabeling the vertices) the edge sets of the graphs don't overlap. If G_1 and G_2 pack, then G_1 is a subgraph of the complement of G_2.


Suppose G_1 is a minimal graph in a graph property P_f = {G : f(G) = 1}, and G_2 is a minimal graph whose complement is not in P. Observe that G_1 and G_2 don't pack. Furthermore, if two graphs G_1 and G_2 don't pack, we have |E(G_1)| |E(G_2)| ≥ (n choose 2), by essentially the same averaging argument that showed D_0(f) D_1(f) ≥ n. More interestingly, one can show

    d_max(G_1) d_max(G_2) ≥ n/2.

An instance of this is that if a graph G has d_min(G) > n/2, then G has a Hamiltonian cycle C. That is, if d_max(Ḡ) < n/2, then Ḡ and any cycle C pack.

6.3.2 Yao’s dmax/d Lemma

Next we will in some sense specialize lemma 6.3 to the case of a monotone bipartite graph property. We will choose a minimal graph G for the property, and show how to construct a containing graph G′ such that we can partition a large subset of the edges of G′ into disjoint sets with the following property: if we remove edges from the subsets, the property holds if we leave any one set untouched, but fails if we remove at least a small number from each set. Lemma 6.3 will then apply to show, essentially, that any randomized algorithm has to check many edges in each set.

Lemma 6.4 (Yao) Let f be a (non-trivial) monotone bipartite graph property on bipartite graphs with 2n nodes, the two parts U and W each having n nodes, and let P = {G : f(G) = 1}.

If G is a graph in P with minimum d^U_max(G) (i.e. minimum maximum degree over vertices in U) and d̄^U(G) is the average degree in U of the vertices of G, then

    DR(f) ≥ Ω(n d^U_max(G) / d̄^U(G)).

Proof: Of the G ∈ P with minimum d_max, choose one with the fewest maximum degree vertices, so that if any graph has fewer vertices of degree d_max (and no higher degree vertices) it is not in P.

Assume d_max ≥ 4d̄, otherwise the bound is trivial. Assume also that the vertices are labeled so that vertex 0 is a maximum degree vertex in U, and vertices 1, ..., n/2 are in U and have degree at most h = 2d̄ (at least half the vertices of U have degree at most twice the average, so such a labeling exists).

We form the containing graph G′ by adding edges from each vertex i (1 ≤ i ≤ n/2) to the neighbors of vertex 0 and of vertex i + 1. G′ has two essential properties. First, if we delete any 2h + 1 edges out of each of the vertices 0, 1, ..., n/2, we destroy the property P, because we reduce the number of vertices in U with degree d_max by 1.

Second, if for some i (possibly 0), for each vertex j = 0, 1, ..., i − 1, i + 1, ..., n/2, we delete the edges from j into Γ_0 − Γ_i − Γ_{i+1}, we preserve P. This is because we can permute the vertices of U so that the permuted graph contains the original graph G. We do this as follows: shift vertex 0 to vertex 1, vertex 1 to vertex 2, ..., and vertex i to vertex 0.

To apply lemma 6.3, we define S_i = {i} × (Γ_0 − Γ_i − Γ_{i+1}) (for i = 1, ..., n/2), S_0 = {0} × (Γ_0 − Γ_1), and S = ∪_i S_i.

Then if we obtain f′ by restricting the domain of f to graphs which agree with G′ on edges not in S, f′ is a function of |S| ≥ (n/2)(d_max − 2h) variables. If we start with G′ and within each block S_i delete 2h + 1 edges, f′ becomes 0, but if we start with G′ and delete edges within S leaving at least one block complete, the function stays 1. Thus lemma 6.3 implies that

    DR(f) ≥ DR(f′) ≥ Ω( n (d_max − 2h) / (2h + 1) ) ≥ Ω( n d_max / d̄ ).
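To make the construction concrete, here is a small sketch (ours, with made-up names) of building G′ and the blocks S_i from adjacency sets Gamma[i] ⊆ W of the U-vertices in the minimal graph G, following the recipe above.

    def build_G_prime_and_blocks(Gamma, half):
        # Gamma[i]: set of W-neighbors of U-vertex i in the minimal graph G;
        # Gamma must be defined for i = 0, ..., half + 1.
        # In G', vertex i (1 <= i <= half) also gets the neighbors of
        # vertex 0 and of vertex i + 1.
        Gp = {i: set(Gamma[i]) for i in Gamma}
        for i in range(1, half + 1):
            Gp[i] |= Gamma[0] | Gamma[i + 1]
        # The blocks of lemma 6.3: S_0 = {0} x (Gamma_0 - Gamma_1), and
        # S_i = {i} x (Gamma_0 - Gamma_i - Gamma_{i+1}) for i >= 1.
        S = {0: {(0, w) for w in Gamma[0] - Gamma[1]}}
        for i in range(1, half + 1):
            S[i] = {(i, w) for w in Gamma[0] - Gamma[i] - Gamma[i + 1]}
        return Gp, S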


7 Randomized Decision Tree Complexity, continued

In this lecture we continue giving lower bounds on randomized decision tree complexity, combining the various bounds we have developed to show a lower bound of Ω(n^{5/4}) on the randomized decision tree complexity of any non-trivial, monotone, bipartite graph property.

The general method is to choose both a minimal graph with the property and a graph whose complement is minimal in the complementary property, and then to use the fact that the two graphs don't pack to get constraints on the maximum and average degrees of the two graphs, and finally to apply Yao's lemma from last lecture to bound the randomized complexity via the constraints on the degrees.

7.1 More Graph Packing

Recall that graphs G_1 and G_2 pack if after some relabeling of the nodes of G_1 the two graphs have no common edges; that is, ∃G′_1 with G′_1 ≅ G_1 and G′_1 a subgraph of the complement of G_2.

Recall that d^U_max(G), for a bipartite graph G = (U, W, E), denotes the maximum degree in G of a vertex in U, and d̄(G) denotes |E|/n, the average degree of a vertex in G. For this lecture, we will restrict our attention to bipartite graphs G_1 = (U, W, E_1) and G_2 = (U, W, E_2) with equal size parts, i.e. |U| = |W| = n, so that the average degree in each part is the same.

We will show the following lemma.

Lemma 7.1 (Bollobas-Eldridge) If

    d^U_max(G_1) d^W_max(G_2) + d^U_max(G_2) d^W_max(G_1) ≤ n

then G_1 and G_2 pack.

There is also a non-bipartite version of this lemma:

Lemma 7.2 For two graphs G_1 and G_2, if

    d_max(G_1) d_max(G_2) < n/2

then G_1 and G_2 pack.

The proof of the second lemma, which we omit, is similar to that of the first lemma, which we give. While the n/2 term cannot be improved, the conjecture is that (d_max(G_1) + 1)(d_max(G_2) + 1) < n also guarantees packing.


[Figure 8: Swapping the labels of u and u′: two cases, depending on whether a G_1 edge (u, w′) lands on a G_2 edge (u′, w′), or a G_1 edge (u′, w′) lands on a G_2 edge (u, w′).]

Proof: (Lemma 7.1.) Suppose G_1 and G_2 don't pack, and we have relabeled the vertices of G_1 so as to minimize the number of overlapping edges. There is some overlapping edge (u, w). Consider swapping the labels of u and a vertex u′ ∈ U in G_1. With the current labeling, there is at least one overlapping edge out of u or u′, so after the swap this must also be the case.

This entails that either an edge (u, w′) ∈ E_1 will be mapped onto an edge (u′, w′) ∈ E_2 by the swap, or an edge (u′, w′) ∈ E_1 will be mapped onto an edge (u, w′) ∈ E_2 by the swap. This means that every u′ ≠ u is reachable from u either by following an edge in G_1 and then an edge in G_2, or by following an edge in G_2 and then an edge in G_1.

There are at most d^U_max(G_1) d^W_max(G_2) − 1 paths of the first kind (one of the candidates leads back to u), and similarly at most d^U_max(G_2) d^W_max(G_1) − 1 of the second kind. Since all n − 1 vertices u′ ≠ u must be covered,

    d^U_max(G_1) d^W_max(G_2) + d^U_max(G_2) d^W_max(G_1) ≥ (n − 1) + 2 = n + 1.
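Lemma 7.1 is easy to sanity-check on small bipartite graphs by brute force over relabelings. This throwaway sketch is ours (the name packs is made up); graphs are given as sets of pairs (u, w) with vertices 0, ..., n − 1 on each side.

    from itertools import permutations

    def packs(E1, E2, n):
        # Do G1 and G2 pack? Try every relabeling (sigma of U, tau of W)
        # of G1 and test whether the relabeled G1 shares an edge with G2.
        for sigma in permutations(range(n)):
            for tau in permutations(range(n)):
                if all((sigma[u], tau[w]) not in E2 for (u, w) in E1):
                    return True
        return False

    # Two perfect matchings on n = 3 pack, as the lemma predicts
    # (1*1 + 1*1 = 2 <= 3):
    M = {(i, i) for i in range(3)}
    print(packs(M, M, 3))   # -> True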

Graph packing captures many graph theoretic notions. For instance, if G_1 is a K_d (a graph with a d-clique and n − d isolated nodes), and G_2 consists of d cliques with n/d nodes each, then G_1 and G_2 pack, and this is equivalent to saying the vertices of G_1 are d-colorable^21 with each color coloring n/d nodes. In fact it can be proved^22 that if G_1 is of maximal degree d − 1, then G_1 and G_2 pack.

On the other hand, if G_1 instead consisted of a (d + 1)-clique and n − d − 1 isolated nodes, then G_1 and G_2 would not pack, since G_1 would not be d-colorable. Since the product of the maximum degrees for this pair of graphs is d(n/d − 1) = n − d, which for d = n/2 is n/2, this shows that lemma 7.2 and the conjecture are tight.

^21 The vertices of a graph are d-colorable if one can color them with d colors so no edge touches two vertices of the same color. The edges of a graph are d-colorable if one can color them with d colors so no vertex touches two edges of the same color.

^22 Hajnal and Szemerédi.

7.2 Application of Packing Lemma

So we have this packing lemma, which is fairly straightforward; how can we use it?

Given a monotone bipartite graph property P_f on bipartite graphs G = (U, W, E) with |U| = |W| = n, choose G_1 ∈ P to have the lexicographically smallest degree sequence^23 of vertices in U, so that G_1 is minimal: no graph in P has smaller d^U_max, and of those with equal d^U_max, none has fewer vertices of this degree. Similarly, choose G_2 lexicographically smallest among the graphs whose complements are not in P.

We know G_1 and G_2 don't pack, since otherwise G_1 would be a subgraph of the complement of G_2, putting that complement in P.

We have several bounds on DR(f):

    DR(f) ≥ d̄(G_1) n,   (7)

    DR(f) ≥ c (d^U_max(G_1) / d̄(G_1)) n.   (8)

Bound (7) says that DR is at least the number of edges in G_1, which is trivial since G_1 is minimal. Bound (8) is Yao's lemma, shown in the previous lecture. It also holds with G_2 replacing G_1 and/or W replacing U, a fact we shall use.

Bound (7) implies that d̄(G_1) ≤ DR/n, and thus (8) implies

    d^U_max(G_1) ≤ DR d̄(G_1) / (cn) ≤ (1/c)(DR/n)^2,

and similarly with W and G_2 possibly replacing U and G_1, respectively. Since G_1 and G_2 don't pack, lemma 7.1 implies that

    (2/c^2)(DR/n)^4 ≥ d^U_max(G_1) d^W_max(G_2) + d^U_max(G_2) d^W_max(G_1) > n,

which in turn implies that DR ≥ (c^2 n^5 / 2)^{1/4} = Ω(n^{5/4}).

Thus any non-trivial monotone bipartite graph property has DR = Ω(n^{5/4}).

^23 The degree sequence of a graph is the list of degrees of the vertices of the graph, from largest to smallest.


7.3 An Improved Packing Lemma

To improve this result, we need an improved packing lemma:

Lemma 7.3 Let G_1, G_2 be bipartite graphs. If

    d^U_max(G_1) d̄(G_2) < n/100,   (9)

    d^U_max(G_2) d̄(G_1) < n/100,   (10)

and

    d^W_max(G_1), d^W_max(G_2) < n / (1000 log n),   (11)

then G_1 and G_2 pack.

(The condition (11) is a technical condition, needed for the proof but not truly a restriction. In particular, if (11) is violated, we will see that Yao's lemma gives an immediate lower bound of Ω(n^{3/2} / √(log n)) on DR.)

Proof: This proof is somewhat more complicated; we only sketch it, omitting some of the final computations.

We have G_1 = (U_1, W_1, E_1) and G_2 = (U_2, W_2, E_2). We will assume the above conditions, and show that if we fix a random relabeling f of W_1, then with non-zero probability there is a relabeling g of U_1 so that G_1 relabeled by f and g shares no edges with G_2.

In spirit the idea is initially similar to the previous packing lemma. There we showed that from the standpoint of a given vertex u, if after ruling out neighbors of neighbors of u there was a vertex left, we could swap the labels of u and that vertex and possibly reduce the number of edge overlaps. Thus we showed roughly that the product of the maximum degree in U and the maximum degree in W was at least n.

Here, since the vertices of W_1 have been randomly mapped onto the vertices of W_2, the neighbors of u in G_1 are mapped onto an essentially random set of size at most d^U_max(G_1) in W_2. Since the set in W_2 is essentially random, we will be able to show (using the technical condition) that the size of its neighbor set exceeds

    c d^U_max(G_1) d̄(G_2) ≤ cn/100

with probability at most 1/(2n). Thus with probability at least 1/2, for each u there will be fewer than n/2 vertices u′ ruled out as possible images under g. We will also show that the existence of g is equivalent to the existence of a perfect matching connecting each u with a possible image u′, and thus the existence of n/2 possible images for each u is sufficient to guarantee the existence of g.

So fix a relabeling f of W_1 uniformly at random. When will there be a relabeling g of U_1 so that no edges are shared? The constraint is that if an edge (u, v) is in E_1, then the edge (g(u), f(v)) is not in E_2. Thus for each u ∈ U_1 we must find a u′ such that N_{G_2}(u′) ∩ f(N_{G_1}(u)) is empty. (N_G(v) denotes the set of neighbors of v in G.) The only additional constraint is that each u have a unique such u′.

In other words, if we define a bipartite graph H = (U_1, U_1, F), where

    F = {(u, u′) : N_{G_2}(u′) ∩ f(N_{G_1}(u)) = ∅},

then g exists iff H has a perfect matching.
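As an illustration only (our own code, not part of the proof; the names are made up), one can build H for concrete data and test for a perfect matching with the standard augmenting-path routine:

    def has_perfect_matching(n, F):
        # F: set of pairs (u, u2) meaning u may be mapped to u2.
        adj = {u: [u2 for u2 in range(n) if (u, u2) in F] for u in range(n)}
        match = [-1] * n                # match[u2] = u currently mapped to u2

        def augment(u, seen):
            for u2 in adj[u]:
                if u2 not in seen:
                    seen.add(u2)
                    if match[u2] == -1 or augment(match[u2], seen):
                        match[u2] = u
                        return True
            return False

        return all(augment(u, set()) for u in range(n))

    def build_H(n, f, N1, N2):
        # N1[u], N2[u]: neighbor sets in G1 and G2; f[w]: the relabeling of W1.
        return {(u, u2) for u in range(n) for u2 in range(n)
                if not (N2[u2] & {f[w] for w in N1[u]})}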

The Frobenius-Konig-Hall theorem states that a bipartite graph G = (U, W, E) has a perfect matching iff for every vertex set X ⊆ U we have |N(X)| ≥ |X|. We don't use this in full generality; rather we use a consequence. Namely, if d_min(G) ≥ n/2, then G has a perfect matching. This follows from the FKH theorem as follows: any violating set X must have size greater than n/2. But then every vertex in W has some edge into X, since U − X is not big enough to contain all the edges out of any vertex in W. Thus all vertices in W are neighbors of X, so |N(X)| = n ≥ |X| and X is not violating after all.

(Just for fun, note that we can also use the previous lemma to show this. Namely, if d_min(G) ≥ n/2, then the complement Ḡ (with d_max(Ḡ) ≤ n/2) and a perfect matching (with d_max = 1) pack, so G contains a perfect matching.)

Now H is a random graph, but not in the usual sense. We will argue that with non-zero probability the minimum degree of H is at least n/2, so that it has a perfect matching, and g exists.

To bound the minimum degree of H from below, we ask how many edges from a vertex u can be excluded. An edge (u, u′) is excluded if there is a (u, w) ∈ E_1 with (u′, f(w)) ∈ E_2. The idea is that the number of such (u, w) is bounded by d^U_max(G_1), while for a given w the number of such (u′, f(w)) is around d̄(G_2), so that for a given u, the number of excluded u′ (which we need to be at most n/2) is at most around the product of these two. (By reversing the roles of G_1 and G_2, we can bound the number of edges (u, u′) into a given u′ which are excluded, thus ensuring d_H(u′) is also at least n/2 for vertices in the second part of H.)

By the definition of F ,

#u′ : (u, u′) 6∈ H ≤ |NG2 (f (NG1(u)))|

48

≤∑

u′∈f(NG1(u))

dG2(u′). (12)

Thus the probability that d_H(u) < n/2 (for u in the first part of the bipartite graph) is bounded by the probability that (12) is greater than n/2. (The case for the second part is similar, and we omit it.) The point is that f(N_{G_1}(u)) is essentially a random subset of W_2 of size at most d^U_max(G_1), so the sum of the degrees of its vertices should be bounded by O(d̄(G_2) d^U_max(G_1)) with high probability. In the full proof one shows that for a given u the probability that d_H(u) < n/2 is at most 1/(2n), so that the probability of all u having degree at least n/2, and thus of a matching, and the consequent g, existing, is at least 1/2.

Here we show exactly what computations we are leaving out. If we define

    ω_i = d_{G_2}(w_i) / (n d̄(G_2))   (w_i ∈ W_2),

then ∑_i ω_i = 1, ω_i ≥ 0, and we want to bound the probability that

    ∑_{i ∈ S} ω_i > 1/(2δ),

where δ = d̄(G_2) and S is a set chosen uniformly at random from sets of some size at most d^U_max(G_1), which is bounded by n/(100δ) by hypothesis.

The average value of the ω_i is 1/n, so the expected value of the sum is at most 1/(100δ). Thus unless the ω_i are highly concentrated, which the technical condition (11) prevents, the bound will hold with high probability. We omit the details of the computation.
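One plausible way to finish, offered as our own sketch rather than the computation actually intended: by (11), each term satisfies ω_i ≤ d^W_max(G_2)/(n δ) < 1/(1000 δ log n), so the target 1/(2δ) exceeds the largest possible term by a factor of at least 500 log n (and the mean by a factor of 50). A Chernoff-style tail bound, which also applies to sampling without replacement, then gives

    Pr[ ∑_{i ∈ S} ω_i > 1/(2δ) ] ≤ 2^{−Ω( (1/(2δ)) / max_i ω_i )} = 2^{−Ω(log n)},

which is smaller than 1/(2n) once the constants are adjusted.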

With this improved packing lemma, and the conditions (as before)

    DR/n ≥ d̄(G_1),   (13)

    DR/n ≥ c d^U_max(G_1) / d̄(G_1)   (14)

(and the corresponding conditions with W and G_2 possibly replacing U and G_1, respectively), we can show an improved bound. (Recall that the first condition is essentially the trivial lower bound on DR, while the second is Yao's lemma.)


Specifically, if the technical condition d^W_max(G_1) < n/(1000 log n) (or any of the equivalent technical conditions) of the improved packing lemma is violated, then by conditions (13) and (14)

    (DR/n)^2 ≥ Ω(n / log n),

so DR = Ω(n^{3/2} / √(log n)).

Otherwise (as G_1 and G_2 don't pack) one of the other conditions is violated. We assume without loss of generality that it is (9), i.e. d^U_max(G_1) d̄(G_2) ≥ n/100. Together with the above two conditions, this gives

    (DR/n)^3 ≥ c d̄(G_2) d^U_max(G_1) ≥ cn/100.

Thus DR = Ω(n^{4/3}).

It seems possible that this bound could be pushed a bit higher, perhaps to n^{3/2}. Currently this is the best lower bound known, and the best upper bound known is O(n^2).


8 Randomized Complexity of Tree Functions — Lower Bounds

For any non-trivial monotone graph property P_f on graphs with n nodes we have seen that D(f) = Ω(n^2) and DR(f) = Ω(n^{4/3}).

In this lecture we discuss tree functions — functions with formulas in which each variable occurs exactly once. We already know that for any tree function f, D(f) = n, and we previously saw a tree function f_0 (represented by a complete binary tree with alternating and- and or-gates) with DR(f_0) = O(n^{0.75...}).

We show a lower bound on DR(f) for tree functions, which we use to deduce that

• DR(f_0) = Ω(n^{0.75...}), and

• DR(f) = Ω(n^{0.51}) for any tree function f.

8.1 Generalized Costs

The most natural thing to consider for proving a lower bound on the complexity of a tree function f(x, y) = g(x) ◦ h(y) (with x ∈ {0,1}^{n−i}, y ∈ {0,1}^i, and ◦ ∈ {∧, ∨}) is a top-down induction. Unfortunately we don't know how to get this to work.

Instead, Saks and Wigderson have looked at a bottom-up induction, in which a gate with two immediate inputs is replaced by a single input. First f is expressed as f(x, y, w) = f′(x ◦ y, w) (with x ∈ {0,1}, y ∈ {0,1}, w ∈ {0,1}^{n−2}, and ◦ ∈ {∧, ∨}), and then a lower bound on f is given by a corresponding lower bound on f′(v, w).

For this technique to work, we need to keep track of the fact that discovering v, which represents x ◦ y, is somehow more expensive than just querying a bit. To do this, we generalize our notion of cost. We associate two costs c_0(x_i) and c_1(x_i) with each variable x_i which represents a bit of the input to f. The cost c_0 represents the cost to discover that x_i is 0, while the cost c_1 represents the cost to discover that x_i is 1. With such a cost function c, we define

    DR(f, c) = min_p max_x ∑_T p_T δ(T, x, c),

where δ(T, x, c) = ∑_{i ∈ S, x_i = 0} c_0(x_i) + ∑_{i ∈ S, x_i = 1} c_1(x_i), with S the set of variables queried by T on input x.


So we are given a function f with a set of costs c. To show a lower bound on DR(f, c) we choose the function f′ so that f(x, y, w) = f′(x ◦ y, w), and we choose a set of costs c′ for the inputs of f′ such that we can show DR(f, c) ≥ DR(f′, c′), thus inductively generating a lower bound for DR(f, c). We will assume that ◦ = ∧; the case ◦ = ∨ is symmetric.

So, leaving the choice of c′ unspecified as yet, we have f, c, f′, and c′. We want to show DR(f, c) ≥ DR(f′, c′). The min-max theorem says DR(f, c) is also equal to

    max_q min_T ∑_x q_x δ(T, x, c),

that is, we can also obtain DR(f, c) by choosing the worst input distribution, and then the best deterministic algorithm for that distribution. Thus to show DR(f′, c′) is at most DR(f, c), we will assume a worst case distribution q* of inputs to f′, and we show that there exists a T′ for f′ such that

    ∑_x q*_x δ(T′, x, c′) ≤ DR(f, c).

So we have the worst case distribution q* for f′, and we want to show the existence of a T′ that does well on q*. We will map q* to a distribution q on the inputs to f, so that there exists an algorithm T* for f such that ∑_x q_x δ(T*, x, c) ≤ DR(f, c). We know such a T* exists because q is at worst the worst-case distribution for f, in which case there still exists an algorithm with expected cost exactly DR(f, c). We will then construct T′ based on T*, so that their expected costs on their respective distributions can be compared.

The basic idea will be that T′(v, w) will mimic T*(x, y, w) for some choice of x and y such that x ∧ y = v. When T* checks a variable in w, T′ will do the same. When T* checks x or y, T′ may or may not check v.

What should the costs c′ be? For variables other than v, c′ will agree with c. We will wait to determine c′_0(v) and c′_1(v), choosing them as large as our proof techniques will allow.

What about the distribution q? We define q(1, 1, w) = q*(1, w), q(0, 1, w) = p_x q*(0, w), and q(1, 0, w) = p_y q*(0, w), where p_y and p_x = 1 − p_y will be determined later.

What about T′? Define T′_y on input (v, w) to mimic T* on (1, v, w), and T′_x on input (v, w) to mimic T* on (v, 1, w). Define T′ on input (v, w) to run T′_y with probability p_y and T′_x with probability p_x. That T′ is a randomized strategy is no problem; since q* is fixed, one of the two deterministic strategies T′_y or T′_x will be at least as good.

We need to find the constraints on c′_1(v) and c′_0(v) which will allow us to show that the expected cost of T* is at least that of T′. For this it suffices to show that the cost of T′(v, w) for any v and w is at most p_y times the cost of T*(1, v, w) plus p_x times the cost of T*(v, 1, w).

8.2 The Saks-Wigderson Lower Bound

Now that we have the form of our argument, the rest is essentially a matter of checking cases. We should note that although most of the choices we have made above are straightforward, there is one choice which in fact anticipates in a clever way what we will need in the remaining part of the proof. In particular, we have chosen T′ to run T′_x or T′_y with probability p_x or p_y, respectively, and, not coincidentally, we have chosen the distribution q to map (0, w) to (0, 1, w) or (1, 0, w) with probability p_x or p_y, respectively, as well. This choice bears some consideration.

Returning to our argument, we want to find the conditions under which the cost of T′(v, w) is at most p_y times the cost of T*(1, v, w) plus p_x times the cost of T*(v, 1, w). We consider the various cases for T*, v, x, and y.

If v = 1, this reduces to the cost of T′(1, w) being at most the cost of T*(x = 1, y = 1, w). If T* queries neither x nor y, then this is clear, since T′_x, T′_y, and T* query exactly the same variables. Otherwise, if T* queries only x, then T* pays c_1(x) while T′ pays an expected cost of p_x c′_1(v) (the costs to query variables in w are again the same); if T* queries only y, then T* pays c_1(y) while T′ pays an expected cost of p_y c′_1(v); if T* queries both x and y then T* pays c_1(x) + c_1(y) while T′ pays c′_1(v). Thus T* will pay at least what T′ pays provided

• p_x c′_1(v) ≤ c_1(x), and

• p_y c′_1(v) ≤ c_1(y).

The case v = 0 is a bit more complicated. In this case, we want the cost of T′(0, w) to be at most p_y times the cost of T*(1, 0, w) plus p_x times the cost of T*(0, 1, w). If neither queries x nor y for this w then this is clear. Otherwise, both query x first or both query y first. We consider the case when x is queried first, the other case being symmetric.

There are then two cases, depending on whether or not T*(1, 0, w) queries y as well as x. (T*(0, 1, w) will not query y, since T* is optimal and knows the value of x ∧ y after querying x.) First assume that only x is queried by T*(1, 0, w). Then with probability p_y the cost to T′ is the cost to T*(1, 0, w) minus c_1(x), and with probability p_x the cost to T′ is the cost to T*(0, 1, w) minus c_0(x) plus c′_0(v). Thus we are fine provided

• −p_y c_1(x) − p_x c_0(x) + p_x c′_0(v) ≤ 0.

Next assume that x and y are queried by T*(1, 0, w). Then with probability p_y the cost to T′ is the cost to T*(1, 0, w) minus c_1(x) minus c_0(y) plus c′_0(v), and with probability p_x the cost to T′ is the cost to T*(0, 1, w) minus c_0(x) plus c′_0(v). Then we are fine provided

• p_y(−c_1(x) − c_0(y) + c′_0(v)) + p_x(−c_0(x) + c′_0(v)) ≤ 0.

Collecting all of these inequalities, and the symmetric inequalities for y queried first, we have that DR(f, c) ≥ DR(f′, c′) provided that c′_0(v) and c′_1(v) satisfy the constraints:

    c′_1(v) ≤ c_1(x)/p_x,
    c′_1(v) ≤ c_1(y)/p_y,
    c′_0(v) ≤ (p_y/p_x) c_1(x) + c_0(x),
    c′_0(v) ≤ (p_x/p_y) c_1(y) + c_0(y),
    c′_0(v) ≤ p_y (c_1(x) + c_0(y)) + p_x c_0(x),
    c′_0(v) ≤ p_x (c_1(y) + c_0(x)) + p_y c_0(y).
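To see how these constraints collapse, substitute p_x = c_1(x)/(c_1(x) + c_1(y)) and p_y = c_1(y)/(c_1(x) + c_1(y)) (this small computation is our addition). The first two constraints both become c′_1(v) ≤ c_1(x) + c_1(y); the third and fourth become c′_0(v) ≤ c_0(x) + c_1(y) and c′_0(v) ≤ c_1(x) + c_0(y); and the last two both become

    c′_0(v) ≤ (c_1(x) c_0(x) + c_0(y) c_1(y) + c_1(x) c_1(y)) / (c_1(x) + c_1(y)),

which is exactly the recursion in the theorem below.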

Choosing c′_1(v) = c_1(x) + c_1(y) forces p_x = c_1(x)/(c_1(x) + c_1(y)) and p_y = c_1(y)/(c_1(x) + c_1(y)), and yields

Theorem 8.1 (Saks-Wigderson) Let f be a tree function with binary ∧ and ∨ gates. Then DR(f) ≥ max{l_0(f), l_1(f)}, where

    l_0(x_i) = l_1(x_i) = 1,

    l_1(g ∧ h) = l_1(g) + l_1(h),

    l_0(g ∧ h) = min{ l_0(g) + l_1(h), l_1(g) + l_0(h),
                      (l_1(g) l_0(g) + l_0(h) l_1(h) + l_1(g) l_1(h)) / (l_1(g) + l_1(h)) },

    l_0(g ∨ h) = l_0(g) + l_0(h),

    l_1(g ∨ h) = min{ l_1(g) + l_0(h), l_0(g) + l_1(h),
                      (l_0(g) l_1(g) + l_1(h) l_0(h) + l_0(g) l_0(h)) / (l_0(g) + l_0(h)) }.

Applying this to the function f_0 (the alternating and/or function mentioned at the beginning of the lecture) shows that the upper bound for that function is in fact tight.
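The recursion is easy to evaluate mechanically. The following sketch is ours: it computes (l_0, l_1) for the complete binary alternating and/or tree on n = 2^depth leaves, and the printed exponent approaches log_2((1 + √33)/4) ≈ 0.7537 as the depth grows, matching the exponent 0.75... quoted above.

    from fractions import Fraction
    import math

    def AND(g, h):
        # (l0, l1) for g AND h, following theorem 8.1.
        g0, g1 = g
        h0, h1 = h
        l1 = g1 + h1
        l0 = min(g0 + h1, g1 + h0, (g1 * g0 + h0 * h1 + g1 * h1) / l1)
        return l0, l1

    def OR(g, h):
        # The dual rule, with the roles of 0 and 1 exchanged.
        g0, g1 = g
        h0, h1 = h
        l0 = g0 + h0
        l1 = min(g1 + h0, g0 + h1, (g0 * g1 + h1 * h0 + g0 * h0) / l0)
        return l0, l1

    t = (Fraction(1), Fraction(1))      # a leaf: l0(x_i) = l1(x_i) = 1
    depth = 40                          # n = 2**depth leaves
    for level in range(depth):
        t = AND(t, t) if level % 2 == 0 else OR(t, t)

    print(math.log(float(max(t)), 2) / depth)   # -> about 0.7537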

Rafi Heiman and Avi Wigderson generalize this theorem to tree functions with arbitrary fan-in gates to show a lower bound of Ω(n^{0.51}) on the randomized decision tree complexity of an arbitrary tree function. See Randomized vs. Deterministic Decision Tree Complexity for Read-Once Boolean Functions by Rafi Heiman and Avi Wigderson.

(Lecture by Rafi Heiman.)
