THE UNIVERSITY OF CHICAGO HIGHER-ORDER FOURIER ANALYSIS ... · The traditional Fourier analysis...

THE UNIVERSITY OF CHICAGO

HIGHER-ORDER FOURIER ANALYSIS OVER FINITE FIELDS AND APPLICATIONS

A DISSERTATION SUBMITTED TO

THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES

IN CANDIDACY FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

DEPARTMENT OF COMPUTER SCIENCE

BY

POOYA HATAMI

CHICAGO, ILLINOIS

APRIL 13, 2015

ABSTRACT

Higher-order Fourier analysis is a powerful tool in the study of problems in additive and

extremal combinatorics, for instance the study of arithmetic progressions in primes, where

the traditional Fourier analysis comes short. In recent years, higher-order Fourier analysis

has found multiple applications in computer science in fields such as property testing and

coding theory. In this thesis, we develop new tools within this theory with several new ap-

plications such as a characterization theorem in algebraic property testing. One of our main

contributions is a strong near-equidistribution result for regular collections of polynomials.

The densities of small linear structures in subsets of Abelian groups can be expressed

as certain analytic averages involving linear forms. Higher-order Fourier analysis examines

such averages by approximating the indicator function of a subset by a function of bounded

number of polynomials. Then, to approximate the average, it suffices to know the joint

distribution of the polynomials applied to the linear forms. We prove a near-equidistribution

theorem that describes these distributions for the group Fnp when p is a fixed prime. This

fundamental fact was previously known only under various extra assumptions about the

linear forms or the field size. We use this near-equidistribution theorem to settle a conjecture

of Gowers and Wolf on the true complexity of systems of linear forms.

Our next application is towards a characterization of testable algebraic properties. We

prove that every locally characterized affine-invariant property of functions f : Fnp → R

with n ∈ N, is testable. In fact, we prove that any such property P is proximity-obliviously

testable. More generally, we show that any affine-invariant property that is closed under

subspace restrictions and has “bounded complexity” is testable. We also prove that any

property that can be described as the property of decomposing into a known structure of

low-degree polynomials is locally characterized and is, hence, testable.

We discuss several notions of regularity which allow us to deduce algorithmic versions of

various regularity lemmas for polynomials by Green and Tao and by Kaufman and Lovett.

iii

We show that our algorithmic regularity lemmas for polynomials imply algorithmic versions

of several results relying on regularity, such as decoding Reed-Muller codes beyond the list

decoding radius (for certain structured errors), and prescribed polynomial decompositions.

Finally, motivated by the definition of Gowers norms, we investigate norms defined by

different systems of linear forms. We give necessary conditions on the structure of systems

of linear forms that define norms. We prove that such norms can be one of only two types,

and assuming that |Fp| is sufficiently large, they essentially are equivalent to either a Gowers

norm or Lp norms.

iv

To my family

ACKNOWLEDGMENTS

First and foremost, I would like to express many thanks to my advisor Alexander Razborov.

It has been a great honor for me to be a student of Sasha. He is a great mentor, and he

has always been there for me whenever I have needed advice, and has been of great help

whenever I needed his supports. I would like to thank my second adviser and friend Madhur

Tulsiani, for his friendship, support and his great ideas and interesting problems, and for

always being available for sharing his insights and intuition about a problem or a paper. I am

very thankful to Shachar Lovett who has been a great friend, colaborator, a mentor, and for

his never-ending collection of interesting open problems and ideas. It was a great experience

when he hosted my visits to UCSD for two summers, where I learned a lot from him while

also enjoying the beautiful San Diego summers. I am also thankful to Arnab Bhattacharyya

for many interesting and fruitful discussions and collaborations.

Many thanks to Laci Babai and Janos Simons for many invaluable discussions on theory,

and sharing their stories on the theory community and history of the mathematical problems.

I would like to thank Denis Pankratov, for being a great office-mate and friend during

the past few years at University of Chicago. I want to thank Raghav Kulkarni, Sushant

Sachdeva, Abhishek Bhowmick, Ankan Saha, Shubhendu Trivedi, Yuan Li, Joshua Grochow,

Kaave Hosseini and Joshua Brody for many interesting conversations. A special thanks to

my friends Rasool Zandvakil, Gregor Jarosch, Laura Pilossoph, Rodrigo Dufeo and James

Edwards for making my time in Chicago most enjoyable. I want to thank Sarah, for her

amazing patience and support during the process of me writing this thesis.

I would like to thank my brother, Hamed, for without him I would not be where I am. I

am very grateful to him for introducing me to theoretical computer science years ago when

we were both much younger, and for keeping the habit of sharing with me the new exciting

theory that he explores. I dedicate this dissertation to my loving family, my mother Sedigheh,

my father Ibrahim, and my brothers Hamed and Amir.

vi

TABLE OF CONTENTS

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 HIGHER-ORDER FOURIER ANALYSIS . . . . . . . . . . . . . . . . . . . . . . 102.1 Polynomials over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.1.1 Non-classical Polynomials . . . . . . . . . . . . . . . . . . . . . . . . 112.1.2 The Derivative Polynomial . . . . . . . . . . . . . . . . . . . . . . . . 132.1.3 Gowers Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Rank and Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.2.1 Polynomial Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.2.2 Analytic Measures of Uniformity . . . . . . . . . . . . . . . . . . . . 192.2.3 Equidistribution and Near-Orthogonality of Regular Factors . . . . . 212.2.4 Strong Near-Orthogonality and Equidistribution . . . . . . . . . . . . 222.2.5 Regularization of Factors . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.3 Decomposition Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.3.1 Strong Decomposition Theorems . . . . . . . . . . . . . . . . . . . . 312.3.2 Higher-order Fourier Expansion . . . . . . . . . . . . . . . . . . . . . 35

2.4 Systems of Linear forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362.4.1 Cauchy-Schwarz Complexity . . . . . . . . . . . . . . . . . . . . . . . 362.4.2 The True Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3 HOMOGENEOUS NON-CLASSICAL POLYNOMIALS . . . . . . . . . . . . . . 393.1 Properties of the Derivative Polynomial . . . . . . . . . . . . . . . . . . . . . 413.2 A Homogeneous Basis, the Univariate Case . . . . . . . . . . . . . . . . . . . 423.3 A Homogeneous Basis, the Multivariate Case . . . . . . . . . . . . . . . . . . 44

4 DISTRIBUTION OF POLYNOMIAL FACTORS OVER LINEAR FORMS . . . . 464.1 Near-equidistribution over Linear Forms . . . . . . . . . . . . . . . . . . . . 484.2 Strong near-orthogonality: Proof of Theorem 2.2.23 . . . . . . . . . . . . . . 50

5 TRUE COMPLEXITY, ON A THEOREM OF GOWERS AND WOLF . . . . . . 575.1 The Gowers-Wolf Conjecture: Proof of Theorem 5.0.7 . . . . . . . . . . . . . 58

6 TESTABLE AFFINE-INVARIANT PROPERTIES . . . . . . . . . . . . . . . . . 656.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

6.1.1 Algebraic Property Testing, Affine Invariance . . . . . . . . . . . . . 666.1.2 Localy Characterized Properties . . . . . . . . . . . . . . . . . . . . . 66

6.2 Locality for Affine-Invariant Properties . . . . . . . . . . . . . . . . . . . . . 686.3 Subspace Hereditary Properties . . . . . . . . . . . . . . . . . . . . . . . . . 69

vii

6.4 Main Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.4.1 Affine constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.4.2 Complexity of Induced Affine Constraints . . . . . . . . . . . . . . . 726.4.3 Statement of the main result . . . . . . . . . . . . . . . . . . . . . . . 726.4.4 Comparison with Previous Work . . . . . . . . . . . . . . . . . . . . 73

6.5 Locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746.6 Strong Near-Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756.7 Degree-structural Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6.7.1 Every Degree Structural Property is Testable . . . . . . . . . . . . . 786.8 Big Picture Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846.9 Proof of Testability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7 ALGORITHMIC ASPECTS OF REGULARITY . . . . . . . . . . . . . . . . . . 957.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

7.1.1 Inverse Theorem and Reed-Muller Decoding in High Characteristic . 967.1.2 Efficient Worst-Case to Average-Case Reductions for Polynomials . . 987.1.3 Algorithmic Inverse Theorem for Polynomials: Fields of Small Char-

acteristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997.1.4 Polynomial decompositions . . . . . . . . . . . . . . . . . . . . . . . . 100

7.2 Algorithmic Bogdanov-Viola Lemma; Approximate Regularization . . . . . . 1017.2.1 Unbiased Almost-Refinement . . . . . . . . . . . . . . . . . . . . . . 104

7.3 Algorithmic Uniform Refinement in high Characteristic . . . . . . . . . . . . 1077.4 Algorithmic Regularity in Low Characteristic . . . . . . . . . . . . . . . . . 110

7.4.1 Algorithmic approximate strongly unbiased refinement . . . . . . . . 1137.4.2 Algorithmic exact strongly unbiased refinement . . . . . . . . . . . . 118

7.5 Applications in High Characteristic . . . . . . . . . . . . . . . . . . . . . . . 1197.5.1 Computing Polynomials with Large Gowers Norm . . . . . . . . . . . 1197.5.2 Algorithmic Inverse Theorem in High Characteristic . . . . . . . . . . 1217.5.3 Decoding Reed-Muller Codes with Structured Noise . . . . . . . . . . 124

7.6 Applications n Low Characterstc . . . . . . . . . . . . . . . . . . . . . . . . 1277.6.1 Computing a biased Polynomial . . . . . . . . . . . . . . . . . . . . . 1277.6.2 Worst Case to Average Case Reduction . . . . . . . . . . . . . . . . . 128

7.7 Algorithmic Inverse Theorem for Polynomials . . . . . . . . . . . . . . . . . 1307.7.1 Derivative Polynomials as Symmetric Multilinear Polynomials . . . . 1317.7.2 Computing Polynomials with Large Gowers Norm . . . . . . . . . . . 137

7.8 Application: Polynomial Decompositions . . . . . . . . . . . . . . . . . . . . 146

8 NORMS DEFINED BY LINEAR FORMS . . . . . . . . . . . . . . . . . . . . . . 1488.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1508.2 Norms defined by linear forms . . . . . . . . . . . . . . . . . . . . . . . . . . 1518.3 Holder type inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1528.4 A characterization theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1588.5 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1638.6 Type (I) norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

viii

8.6.1 Theorem 8.5.4, the Direct Part . . . . . . . . . . . . . . . . . . . . . 1678.6.2 Theorem 8.5.4, the Inverse Part . . . . . . . . . . . . . . . . . . . . . 168

9 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

ix

CHAPTER 1

INTRODUCTION

Traditional Fourier analysis has been shown to be a powerful tool in the study of problems

in additive and extremal combinatorics. However, in the recent years there has been several

cases, such as the study of arithmetic progressions, where traditional Fourier analysis comes

short. An extension of the traditional Fourier analysis, called higher-order Fourier analysis,

was initiated by Gowers’ seminal work [35], where he introduced a sequence of norms for

functions on Abelian groups and used them to prove quantitative bounds for Szemeredi’s

theorem on arithmetic progressions [73]. Higher-order Fourier analysis has been shown to

be very useful in analysing the density of linear patterns in mathematical objects, and has

been extensively studied in several works [44, 45, 11, 78, 77, 68, 36, 38, 37, 39, 72, 71]. We

refer the interested reader to the monograph by Tao [76].

In this dissertation we are interested in the case where the underlying group is Fn, where

F = Fp for a fixed prime p. The traditional Fourier analysis over Fn allows one to express

a given function as a linear combination of the characters of Fn. Each character of Fn

is a linear phase corresponding to an α ∈ Fn, defined as χp(x) := ep (∑ni=1 αixi), where

ep(m) := e2πm/p. In higher-order Fourier analysis, the linear polynomials are replaced

by higher-degree polynomials, and one would like to express a function f : Fn → C as

a linear combination of such functions. Such a higher-order expansion can be achieved in

an approximate sense via the so-called “decomposition theorems” which are proved by an

application of the “inverse theorems” for uniformity norms, such as Gowers norms [36, 75, 43].

This allows one to express a given function f in the form f = f1+∑ki=1 gi where f1 is a linear

combination of higher-degree phases and gi’s are sufficiently small in an appropriate norm

allowing us to ignore them as negligible error [36]. One can then use such a decomposition

of f in order to reduce the problem of estimating the density of a linear pattern within f

to the problem of understanding the distribution of a collection of polynomials over a given

1

linear pattern.

One of the main useful properties of Fourier analysis is that the set of characters form an

orthonormal basis. However, we lose this orthogonality when we include higher degree poly-

nomials; for example a nonzero linear combination of two distinct quadratic polynomials will

always have a nonzero bias. Green and Tao [43], and Kaufman and Lovett [54] resolve this

issue by introducing a notion of regularity for polynomials, where a regular polynomial is one

that is “distributionally well-behaved”. These set of works provide an approximate orthogo-

nality which is sufficient for understading quantities of the form Ex∈Fn [f1(x) · · · fm(x)]. This

is not completely satisfactory towards answering questions about the density of arbitrary lin-

ear patterns, namely EX∈(Fn)k [f1(L1(X)) · · · fm(Lm(X))], where Li’s are arbitrary linear

forms over k variables. A main contribution of this dissertation which resolves this issue, is

a new near-orthogonality for regular polynomials over arbitrary collections of linear forms.

This improves on previous more restrictive results such as [15] for affine systems of linear

forms, and [51] for fields of sufficiently large characteristic. This strong near-orthogonality

result allows us to settle a conjecture of Gowers and Wolf regarding the complexity of a

system of linear forms.

Gowers uniformity norms are commonly used to control averages of linear patterns such

as arithmetic progressions within subsets of a group. Since higher-order Gowers norms

represent stronger notions of uniformity, a question arises naturally: Given a linear pattern,

i.e. a collection of linear forms, what is the smallest k for which the Gowers Uk norm

controls the density of such patterns? This question was asked and studied by Gowers and

Wolf [37] who conjectured a simple characterization for this value, and in [38] verified it for

the case of functions over Fn in the case when |F| is sufficiently large. We use our strong

near-orthogonality result to settle the Gowers-Wolf conjecture in full generality over Fn.

The decomposition theorems discussed above can be thought of as arithmetic “regularity

lemmas”, where we think of regularity as a notion of uniformity that allows one to decompose

2

a given object into a collection of simpler objects which appear random according to certain

statistics. Szemeredi’s celebrated regularity lemma [74, 61], which has been a fundamental

tool in combinatorics [73, 59], can be thought of as a decomposition theorem for bivariate

functions, where a given function, corresponding to the edges of a given graph is decomposed

into a structured part, a small error, and a uniform or pseudorandom part. In the past

decade such decomposition theorems have been developed in much more general settings

and under various stronger or weaker notions and parameters of uniformity [36, 75, 40,

27]. Decomposition theorems can be used to reduce questions regarding large and complex

mathematical objects to that of bounded complexity objects while only suffering a small error

in parameters. This general phenomenon has been used in proving several major results in

mathematics [42, 36, 83], and the reduction to a lower complexity object under a controllable

error has also been shown to be useful in computer science [3, 16, 19]. The development

and application of decomposition theorems requires answers to several interesting questions

regarding the notion of uniformity that is to be used, the kinds of error that can be tolerated,

the relations between error, uniformity and structure parameters that are required, and

whether such a decomposition theorem exists or can be obtained efficiently. Several such

questions are discussed in this dissertation.

The sequence of Gowers uniformity norms have been shown to have numerous applications

such as the study of linear structures such as arithmetic progressions in subsets of a group,

or as a test for correlation with low-degree phases. The d+ 1’th Gowers norm of a function

f : Fn → C is defined by picking a random d dimensional parallelepiped and considering the

average product of the function over all its points, more formally

‖f‖Ud+1 :=

EX,Y1,...,Yd∈Fn

∏S⊆[d]

C|S|f(X +∑i∈S

Yi)

1/2d

,

where C is complex conjugation. Motivated by the definition of Gowers norms one may

3

be interested in introducing similar norms defined by different systems of linear forms. We

study such norms and give nice necessary conditions on the structure of systems of linear

forms which define a norm. In particular, we prove that such norms can be one of two types.

We then show that norms of the first type, for which the complexity of the system of the

linear forms is smaller than |F|, are essentially equivalent to a Gowers norm. We further

show that norms of the second type are equivalent to Lp norms. Our results are inspired by

similar necessary conditions in the context of graph normings by Hamed Hatami [47, 48].

Motivated by turning the numerous applications of decomposition theorems and regu-

larity lemmas into algorithms, it is natural to ask whether it is possible to achieve them

algorithmically. In the case of Szemeredi’s regularity lemma, the original proof was non-

algorithmic and the question of finding an algorithm for computing the regularity partition

was first considered by Alon et.al. [1]. The Szemeredi regularity lemma is often used to

prove existence of certain substructures, such as triangles [67], in large graphs, and with

an algorithmic version of the lemma one can efficiently “find’ these structures in a given

graph. Since then, there have been numerous improvements and extensions to the algorith-

mic version of the Szemeredi regularity lemma, of which the works [5, 28, 58, 25] constitute

a partial list (see [25] for a detailed discussion). The arithmetic regularity lemmas of Green

and Tao [43] and Kaufman and Lovett [54] show that one can modify a given collection

of polynomials into a new “pseudorandom” collection of polynomials. These lemmas have

various applications in computer science, such as (special cases of) Reed-Muller testing and

worst-case to average-case reductions for polynomials. However, the proofs of [43, 54], which

involve a transformation from the first collection to the pseudorandom collection, are not

algorithmic. In this dissertation we consider analytic notions of regularity for polynomials

and show that they allow for efficient algorithms.

One main application of our algorithmic regularity lemma is in decoding of Reed-Muller

codes. In the case when the characteristic of F is large, the Gowers norm gives an approximate

4

test for checking if for a given polynomial P , there exists a Q of degree at most k within

Hamming distance 1 − 1|F| − ε [43]. This is remarkable because the list-decoding radius of

Reed-Muller codes of order k codes is less than 1 − k|F| for k < |F| [33], and the test works

even beyond that. Moreover, Tao and Ziegler [11] later showed that this test works not only

when P is a polynomial of degree d < |F| but also when P is an arbitrary function. Given

this, it is natural to ask for the decoding analogue:

Given P of degree d over Fn, if there exists Q of degree k such that dist(P,Q) 6

1 − 1|F| − ε, can one find a Q′ (in time polynomial in n) of degree k such that

dist(P,Q′) 6 1− 1|F| − η for some η depending on ε?

We use our algorithmic regularity lemma and solve the above decoding question for Reed-

Muller codes beyond their list-decoding radius: This special case can be interpreted as the

scenario when the received word P is obtained from some codeword Q of degree k by adding

a “noise” P − Q of degree d. Thus, when the noise is “structured”, we can decode in the

discussed sense even beyond the list-decoding radius. This shows that although there can be

exponentially many such Q due to the fact that we are in the regime beyond the list-decoding

radius (see [55, 19]), the question is still tractable if we ask only for one such Q and if we

allow a loss from ε to η. Such a decoding question was solved for Reed-Muller codes of order

2 over F2, for any given function f (instead of a polynomial P of bounded degree) by [79, 10].

Lastly a key contribution of this dissertation is an application of higher-order Fourier

analysis to give a characterization of testable algebraic properties. The field of property

testing, as initiated by [20, 8], is the study of algorithms that query their input a very small

number of times and with high probability decide correctly whether the input satisfies a

given property or is “far” from satisfying that property. A property is called testable if

the number of queries can be made independent of the size of the object without affecting

the correctness probability. Perhaps surprisingly, it has been found that a large number of

natural properties satisfy this strong requirement; see e.g. the surveys [24, 65, 64, 70] for

5

a general overview. An example of such phenomenon is the famous result of Blum, Luby

and Rubinfeld [20] who showed that linearity of a function f : Fn → F is testable by a test

which makes only 3 random queries. Some of the earliest results in the field of property

testing are about the properties of functions on Fn, where p is a fixed prime. However, other

settings such as graph and hypergraph properties have witnessed more progress in the past,

where several characterizations for testable properties have been proved [3, 7, 6, 26, 22].

This is mainly due to the fact that the concepts of uniformity and related approximations

are somewhat simpler in those settings. In this thesis we use the recently developed tools

in higher-order Fourier analysis in order to extend some of the results from the graph and

hypergraphs setting to the setting of functions on Fn.

Specifically, we give a complete characterization of the so called “proximity oblivious”

testable properties that are invariant under affine transformations. We consider properties

of functions f : Fn → 1, . . . , R. Our main result shows that any such property that

is invariant with respect to affine transformations on Fn and that is locally characterized

is testable. Furthermore, we show that a large class of natural algebraic properties whose

query complexity had not been previously studied are locally characterized affine-invariant

properties and are, hence, testable. Our results are in the one-sided regime, where a tester

is required to accept any function that belongs to the studied property.

Recently, there has been several other developments in the study of affine-invariant prop-

erties; see [12] for a summary. Hatami and Lovett [52] proved that higher-order Fourier

analysis combined with ideas from [26] can be used to show that the distance to every

testable affine invariant property is constant-query estimable. This extends the ideas of [26]

to the algebraic property testing setting. In the two-sided error regime, Yoshida [81] uses

the decomposition theorems in order to give a full characterization of two-sided testable

affine invariant properties, analogous to the work of Alon et.al. [3] in the context of graphs.

In a more recent work, Yoshida [82] uses non-standard analysis in order to define limits of

6

functions sequences, which allows for a simpler characterization of two-sided testable affine-

invariant properties. An approach to defining limit objects for Boolean functions over Fn

using finitary higher-order Fourier analysis can be seen in [49].

7

Organization.

In Chapter 2 we introduce higher-order Fourier analysis in detail and develop some of the

essential tools that we need throughout this thesis. Notions of regularity and their associ-

ated norms and decomposition theorems as well as distributional properties of collections

of polynomials over linear forms are discussed. In Chapter 3 we prove that there exists a

basis for “non-classical” polynomials consisting only of “homogeneous non-classical poly-

nomials”. Later we use this theorem in Chapter 4 to prove our main theorem which is a

near-equidistribution theorem for collections of polynomials over arbitrary collections of lin-

ear forms. We use our near-equidistribution theorem to prove a conjecture of Gowers and

Wolf regarding a complexity measure of systems of linear forms in Chapter 5. Chapter 6

is concerned with the field of algebraic property testing, where we prove a characterization

theorem for affine invariant testable properties. Finally, in Chapter 8 we investigate norms

that are defined by averaging over a system of linear forms, where we give necessary condi-

tions on the structure of such systems of linear forms and use it to relate to Lp or Gowers

norms.

8

Notation

Fix a prime field F = Fp for a prime p > 2. Define | · | to be the standard map from F

to 0, 1, . . . , p − 1 ⊂ Z. Let D denote the complex unit disk z ∈ C : |z| 6 1, and let

T denote the circle group R/Z, which is an Abelian group with group operation denoted

+. Let e : T → C denote the character e (x) = e2πix, by abuse of notation, we may use

e : F→ C to denote e(x) = e2πix/|F| as well.

For integers a, b, we let [a] denote the set 1, 2, . . . , a and [a, b] denote the set a, a +

1, . . . , b. For real numbers, α, σ, ε, we use the shorthand σ = α ± ε to denote α − ε 6 σ 6

α + ε. The zero element in Fn is denoted by 0. We will denote by lower case letters, e.g.

x, y, elements of Fn. We use capital letters, e.g. X = (x1, . . . , xk) ∈ (Fn)k, to denote tuples

of variables.

9

CHAPTER 2

HIGHER-ORDER FOURIER ANALYSIS

In this chapter we introduce higher-order Fourier analysis in detail, and prove several tech-

nical tools that will be required throught this dissertation. We discuss several properties of

polynomials over finite fields, and extensions of polynomials that are required when the field

size is small. Notions of regularity for collections of polynomials and norms that capture

these notions of regularity. Algebraic decomposition theorems associated with uniformity

norms are presented, which allow to decompose a given function to a structured part given

by a collection of polynomials and a pseudorandom part. We then show how one can use such

a decomposition to write a higher-order expansion of the given function. We then discuss the

distribution polynomials over collections of linear forms. This is useful because the density

of a linear pattern in an arbitrary function can be related to the distribution of a regular

collection of polynomials over a collection of linear forms via a decomposition theorem.

2.1 Polynomials over Finite Fields

Let d > 0 be an integer. A polynomial P : Rn → R of degree < d can be defined in one of

two equivalent ways:

• (Global Definition) P is a polynomial of degree < d if and only if it can be written in

the form

P (x1, ..., xn) =∑

06i1,...,in;i1+···+in<d

ci1,...,inxi11 · · ·x

inn ,

with coefficients ci1,...,in ∈ R.

• (Local Definition) P is a polynomial of degree < d if it is d times differentiable and its

d-th derivative vanishes.

10

It is not difficult to check that the above two definitions are equivalent, using the Taylor series

expansion to go from the local to the global definition. In finite characteristic, i.e. when

P : Fn → G, the local definition of a polynomial used the notion of additive derivatives.

Definition 2.1.1 (Polynomials over finite fields (local definition)). For an integer d > 0, a

function P : Fn → G is said to be a polynomial of degree 6 d if for all y1, . . . , yd+1, x ∈ Fn,

it holds that

(Dy1 · · ·Dyd+1P )(x) = 0,

where DyP (x) = P (x + y)− P (x) is the additive derivative of P at x and direction y. The

degree of P is the smallest d for which the above holds.

It follows simply from the definition that for any direction y ∈ Fn, deg(DyP ) < deg(P ).

In the “classical” case of polynomials P : Fn → F, it is a well-known fact that the global and

local definitions coincide. However, the situation is different in more general groups. One

way to suspect this is the fact that we are not able to divide by d! in some cases, in order to

be able to make use of Taylor expansions. For example when the range of P is R/Z, it turns

out that the global definition must be refined to the “non-classical polynomials”, which may

have monomials that are different from the classical case. This phenomenon was noted by

Tao and Ziegler [78] in the study of Gowers norms.

2.1.1 Non-classical Polynomials

Definition 2.1.2 (Non-classical Polynomials (local definition)). For an integer d > 0, a

function P : Fn → T is said to be a non-classical polynomial of degree 6 d (or simply a

polynomial of degree 6 d) if for all y1, . . . , yd+1, x ∈ Fn, it holds that

(Dy1 · · ·Dyd+1P )(x) = 0. (2.1)

11

The degree of P is the smallest d for which the above holds. A function P : Fn → T is said

to be a classical polynomial of degree 6 d if it is a non-classical polynomial of degree 6 d

whose image is contained in 1pZ/Z.

The following lemma of Tao and Ziegler [78] shows that a classical polynomial P of degree

d must always be of the form x 7→ |Q(x)|p , where Q : Fn → F is a polynomial (in the usual

sense) of degree d, and | · | is the standard map from F to 0, 1, . . . , p− 1. This lemma also

characterizes the structure of non-classical polynomials.

Lemma 2.1.3 (Lemma 1.7 in [78]). A function P : Fn → T is a polynomial of degree 6 d

if and only if P can be represented as

P (x1, . . . , xn) = α +∑

06d1,...,dn<p;k>0:0<∑i di6d−k(p−1)

cd1,...,dn,k|x1|d1 · · · |xn|dn

pk+1mod 1,

for a unique choice of cd1,...,dn,k ∈ 0, 1, . . . , p− 1 and α ∈ T. The element α is called the

shift of P , and the largest integer k such that there exist d1, . . . , dn for which cd1,...,dn,k 6= 0

is called the depth of P . A depth-k polynomial P takes values in a coset of the subgroup

Uk+1def= 1

pk+1Z/Z. Classical polynomials correspond to polynomials with 0 shift and 0 depth.

Note that Lemma 2.1.3 immediately implies the following important observation1:

Remark 2.1.4. If Q : Fn → T is a polynomial of degree d and depth k, then pQ is a

polynomial of degree max(d − p + 1, 0) and depth k − 1. In other words, if Q is classical,

then pQ vanishes, and otherwise, its degree decreases by p − 1 and its depth by 1. Also, if

λ ∈ [1, p− 1] is an integer, then deg(λQ) = d and depth(λQ) = k.

For convenience of exposition, henceforth we will assume that the shifts of all polynomials

are zero. This can be done without affecting any of the results presented in this thesis. Hence,

all polynomials of depth k take values in Uk+1.

1. Recall that T is an additive group. If n ∈ Z and x ∈ T, then nx is shorthand for x + · · ·+ x if n > 0and −x− · · · − x otherwise, where there are |n| terms in both expressions.

12

2.1.2 The Derivative Polynomial

Given a non-classical polynomial P of exact degree d, it is often useful to consider the

properties of its d-th derivative. Motivated by this, we give the following definition.

Definition 2.1.5 (Derivative Polynomial). Let P : Fn → T be a degree-d polynomial, possi-

bly non-classical. Define the derivative polynomial ∂P : (Fn)d → T by the following formula

∂P (h1, . . . , hd)def= Dh1

· · ·DhdP (0),

where h1, . . . , hd ∈ Fn.2 Moreover for k < d define

∂kP (x, h1, . . . , hk)def= Dh1

· · ·DhkP (x).

The following lemma shows some useful properties of the derivative polynomial.

Lemma 2.1.6. Let P : Fn → T be a degree-d (non-classical) polynomial. Then the polyno-

mial ∂P (h1, . . . , hd) is

(i) multilinear: ∂P is additive in each hi.

(ii) invariant under permutations of h1, . . . , hd.

(iii) a classical nonzero polynomial of degree d.

(iv) homogeneous: all its monomials are of degree d.

Notice that by multilinear we mean additive in each direction hi, which is not the usual

use of the term “multilinear”.

Proof. The proof follows by the properties of the additive derivative Dh. Multilinearity of ∂P

follows from linearity of the additive derivative, namely for every function Q and directions

2. Notice since P is a degree d polynomial, Dh1. . . Dhd

P (x) does not depend on x and thus we have theidentity ∂P (h1, . . . , hd) = Dh1

. . . DhdP (x) for any choice of x ∈ Fn.

13

h1, h2 we have the identity Dh1+h2Q(x) = Dh1

Q(x) +Dh2Q(x+ h1). The invariance under

permutations of h1, . . . , hd is a result of commutativity of the additive derivatives. Since

P is a degree-d (non-classical) polynomial, ∂P is nonzero by definition. Notice that since

D0Q ≡ 0 for any function Q, we have ∂P (h1, . . . , hd) = 0 if any of hi is equal to zero. Hence

every monomial of ∂P must depend on all hi’s. The properties (iii) and (iv) now follow from

this and the fact that deg(∂P ) 6 d and thus each monomial has exactly one variable from

each hi.

2.1.3 Gowers Norms

Gowers norms, which were introduced by Gowers [35], play an important role in additive

combinatorics, more specifically in the study of polynomials of bounded degree. For example,

it is known through an inverse theorem for Gowers norms that a Gowers norm of a function

can be used to test whether it is close to a polynomial of a given degree bound. Let us first

define the multiplicative derivative of a complex valued function.

Definition 2.1.7 (Multiplicative Derivative). Given a function f : Fn → C and an element

h ∈ Fn, define the multiplicative derivative in direction h of f to be the function ∆hf : Fn →

C satisfying ∆hf(x) = f(x+ h)f(x) for all x ∈ Fn.

The Gowers norm of order d for a function f : Fn → C is the expected multiplicative

derivative of f in d random directions at a random point.

Definition 2.1.8 (Gowers norm). Given a function f : Fn → C and an integer d > 1, the

Gowers norm of order d for f is given by

‖f‖Ud =

∣∣∣∣ Ey1,...,yd,x∈Fn

[(∆y1∆y2 · · ·∆ydf)(x)

]∣∣∣∣1/2d .Note that as ‖f‖U1 = |E [f ] | the Gowers norm of order 1 is only a semi-norm. However

for d > 1, it is not difficult to show that ‖ · ‖Ud is indeed a norm, using the following lemma

14

which was first proved in [35] as a key property of Gowers norms.

Lemma 2.1.9 (Gowers Cauchy-Schwarz). Consider a family of functions fS : Fn → C,

where S ⊆ [d]. Then

∣∣∣∣∣∣ Ex,y1,...,yd∈Fn

∏S⊆[d]

Cd−|S|fS(x+∑i∈S

yi)

∣∣∣∣∣∣ 6∏S⊆[d]

‖fS‖Ud , (2.2)

It is a direct consequence of Definition 2.1.2 that a function f : Fn → C with ‖f‖∞ 6 1

satisfies ‖f‖Ud+1 = 1 if and only if f = e (P ) for a (non-classical) polynomial P : Fn → T

of degree 6 d. The inverse theorem for Gowers norms is a robust version of this statement.

We denote by Poly(Fn → T) and Poly6d(Fn → T), respectively, the set of all non-classical

polynomials, and the ones of degree at most d.

Theorem 2.1.10 (Theorem 1.11 of [78]). Suppose δ > 0, and d > 1 is an integer. There

exists an ε = ε2.1.10(δ, d,F) such that the following holds. For every function f : Fn → C

with ‖f‖∞ 6 1 and ‖f‖Ud+1 > δ, there exists a polynomial P ∈ Poly6d(Fn → T) of degree

6 d that is ε-correlated with f , meaning

∣∣∣∣ Ex∈Fn

f(x)e (−P (x))

∣∣∣∣ > ε.

2.2 Rank and Regularity

The rank of a polynomial is a notion of its complexity according to lower degree polynomials.

Definition 2.2.1 (Rank of a polynomial). Given a polynomial P : Fn → T and an integer

d > 1, the d-rank of P , denoted rankd(P ), is defined to be the smallest integer r such that

there exist polynomials Q1, . . . , Qr : Fn → T of degree 6 d − 1 and a function Γ : Tr → T

satisfying P (x) = Γ(Q1(x), . . . , Qr(x)). If d = 1, then 1-rank is defined to be ∞ if P is

non-constant and 0 otherwise.

15

The rank of a polynomial P : Fn → T is its deg(P )-rank. We say that P is r-regular if

rank(P ) > r.

Note that for an integer λ ∈ [1, p−1], rank(P ) = rank(λP ). For future use, we record here

a simple lemma stating that restrictions of high rank polynomials to hyperplanes generally

preserve degree and high rank.

Lemma 2.2.2. Suppose P : Fn → T is a polynomial of degree d and rank > r, where

r > p + 1. Let A be a hyperplane in Fn, and denote by P ′ the restriction of P to A. Then,

P ′ is a polynomial of degree d and rank > r − p, unless d = 1 and P is constant on A.

Proof. For the case d = 1, we can check directly that either P ′ is constant or else, P ′ is a

non-constant degree-1 polynomial and so has rank infinity.

So, assume d > 1. By making an affine transformation, we can assume without loss of

generality that A is the hyperplane x1 = 0. Let π : Fn → Fn−1 be the projection to A.

Let P ′′ = P − P ′ π. Clearly, P ′′ is zero on A. For x ∈ F \ 0, let hx = (x, 0, . . . , 0) ∈ Fn.

Note that DhxP′′ is of degree 6 d − 1 and that (DhxP

′′)(y) = P ′′(y + hx) for all y ∈ A.

Hence, for every x ∈ F \ 0, P ′′ on hx +A agrees with a polynomial Qx of degree 6 d− 1.

So, for a function Γ : Tp+1 → T, we can write P = Γ(ι(x1), P ′, Q1, Q2, . . . , Qp−1), where

ι(x1), Q1, . . . , Qp−1 are of degree 6 d− 1.

Now, if P ′ itself is of degree d− 1, then P is of rank 6 p+ 1 < r, a contradiction. If P ′

is of rank < r − p, then again P is of rank < r − p+ p = r, a contradiction.

A high-rank polynomial of degree d is, intuitively, a “generic” degree-d polynomial. There

are no unexpected ways to decompose it into lower degree polynomials.

2.2.1 Polynomial Factors

Next, we will formalize the notion of a generic collection of polynomials. Intuitively, it should

mean that there are no unexpected algebraic dependencies among the polynomials. First,

16

we need to set up some notation.

Definition 2.2.3 (Factors). If X is a finite set then by a factor B we mean simply a partition

of X into finitely many pieces called atoms.

A function f : X → C is called B-measurable if it is constant on atoms of B. For any

function f : X → C, we may define the conditional expectation

E[f |B](x) = Ey∈B(x)

[f(y)],

where B(x) is the unique atom in B that contains x. Note that E[f |B] is B-measurable.

A finite collection of functions φ1, . . . , φC from X to some other space Y naturally define a

factor B = Bφ1,...,φC whose atoms are sets of the form x : (φ1(x), . . . , φC(x)) = (y1, . . . , yC)

for some (y1, . . . , yC) ∈ Y C . By an abuse of notation we also use B to denote the map

x 7→ (φ1(x), . . . , φC(x)), thus also identifying the atom containing x with (φ1(x), . . . , φC(x)).

Definition 2.2.4 (Polynomial factors). If P1, . . . , PC : Fn → T is a sequence of polynomials,

then the factor BP1,...,PC is called a polynomial factor.

The complexity of B, denoted |B| := C, is the number of defining polynomials. The degree

of B is the maximum degree among its defining polynomials P1, . . . , PC . If P1, . . . , PC are

of depths k1, . . . , kC , respectively, then the number of atoms of B is at most∏Ci=1 p

ki+1.

Definition 2.2.5 (Conditional Expectation over Factors). Let B be a polynomial factor

defined by P1, . . . , PC . For f : Fn → C, the conditional expectation of f with respect to B,

denoted E[f |B] : Fn → C, is

E[f |B](x) = Ey∈Fn

[f(y)

∣∣P1(y) = P1(x), . . . , PC(y) = PC(x)].

Namely E[f |B] is a constant on every atom of B.

17

Following is a simple observation stating that E[f |B] has same correlation with any other

function as f does.

Remark 2.2.6. Let f : Fn → C. Let B be a polynomial factor defined by polynomials

P1, ..., PC . Let g : Fn → C be any B-measurable function. Then

〈f, g〉 = 〈E(f |B), g〉.

Finally, we define the rank of a factor similarly to that of a single polynomial.

Definition 2.2.7 (Rank of a Factor). A polynomial factor B defined by a sequence of poly-

nomials P1, . . . , PC : Fn → T with respective depths k1, . . . , kC is said to have rank r if r

is the least integer for which there exists (λ1, . . . , λC) ∈ ZC , with (λ1 mod pk1+1, . . . , λC

mod pkC+1) 6= 0C , such that rankd(∑Ci=1 λiPi) 6 r, where d = maxi deg(λiPi).

Given a polynomial factor B and a function r : Z>0 → Z>0, we say that B is r-regular

if B is of rank larger than r(|B|).

Notice that by the definition of rank, for a degree-d polynomial P of depth k we have

rank(P) = min

rankd(P ), rankd−(p−1)(pP ), . . . , rankd−k(p−1)(pkP )

,

where P, is a polynomial factor consisting only of one polynomial P .

Regular factors indeed do behave like a generic collection of polynomials, as we shall

establish in a precise sense in Section 2.2.3. Thus, given any factor B that is not regular, it

will often be useful to regularize B, that is, find a refinement B′ of B that is regular up to

our desires. We distinguish between two kinds of refinements:

Next, we define the notion of conditional expectation with respect to a given factor.

Definition 2.2.8 (Expectation over polynomial factor). Given a factor B and a function

f : Fn → 0, 1, the expectation of f over an atom y ∈ T|B| is the average Ex:B(x)=y[f(x)],

18

which we denote by E[f |y]. The conditional expectation of f over B, is the real-valued

function over Fn given by E[f |B](x) = E[f |B(x)]. In particular, it is constant on every

atom of the polynomial factor.

2.2.2 Analytic Measures of Uniformity

Regularity defined by the notion of rank is an algebraic/combinatorial notion of pseudo-

randomness, however there are several cases where an analytic notion would be much more

useful. In many cases, what is needed from the notion of regularity is for every nonzero

linear combination of polynomials in the polynomial factor to be unbiased, where the bias

of a function is defined as follows.

Definition 2.2.9 (Bias). The bias of a function f : Fn → T is defined to be

bias(f) := Ex∈Fn

[eF(f(x))] .

Green and Tao [43], and Kaufman and Lovett [54] proved the following relation between

bias and rank of a polynomial.

Theorem 2.2.10 (d < |F| [43], arbitrary F [54]). For any ε > 0 and integer d > 0, there

exists r = r2.2.10(d, ε) such that the following is true. If P : Fn → T is a degree-d polynomial

with rank greater than r, then |Ex[e (P (x))]| < ε.

Kaufman and Lovett originally proved Theorem 2.2.10 for classical polynomials. But

their proof also works for non-classical ones without modification.

Note that r2.2.10(δ, d), does not depend on the dimension n of Fn. Motivated by Theo-

rem 2.2.10 we define unbiasedness for polynomial factors.

Definition 2.2.11 (γ-unbiased factors). Let F be a factor of degree d, and let γ : N→ R+

be a decreasing function. We say that F is γ-unbiased if for every collection of coefficients

19

ci,j ∈ F16i6d,16j6Mi, we have

bias(∑i,j

ci,jPi,j) 6 γ(dim(F)).

Using Gowers norms, one can define the following analytical notion of uniformity for

polynomials which is stronger than unbiasedness.

Definition 2.2.12 (Uniformity). Let ε > 0 be a real. A degree-d polynomial P : Fn → T is

said to be ε-uniform if

‖e (P )‖Ud < ε.

Tao and Ziegler used Theorem 2.2.10 to show that high rank polynomials have small

Gowers norm.

Theorem 2.2.13 (Theorem 1.20 of [78]). For any ε > 0 and integer d > 0, there exists an

integer r(d, ε) such that the following is true. For any (non-classical) polynomial P : Fn → T

of degree 6 d, if ‖e (P )‖Ud > ε, then rankd(P ) 6 r.

This immediately implies that a regular polynomial is also uniform.

Corollary 2.2.14. Let ε, d, and r(d, ε) be as in Theorem 2.2.13. Every r-regular polynomial

P of degree d is also ε-uniform.

The next claim, which is a standard application of Fourier analysis, shows that the

converse of this is true at least qualitatively.

Claim 2.2.15. For every r, d ∈ N, there is ε(r, d) such that every ε-uniform polynomial of

degree d has rank(P ) > r.

The following is an analytical notion of uniformity for a factor along the same lines as

Definition 2.2.12.

20

Definition 2.2.16 (Uniform Factor). Let γ : N → R+ be a decreasing function. A poly-

nomial factor B defined by a sequence of polynomials P1, . . . , PC : Fn → T with respective

depths k1, . . . , kC is said to be γ-uniform if for every collection (λ1, . . . , λC) ∈ ZC , with (λ1

mod pk1+1, . . . , λC mod pkC+1) 6= 0C

∥∥∥∥∥e(∑

i

λiPi

)∥∥∥∥∥Ud

< γ(dim(F)),

where d = maxi deg(λiPi).

Remark 2.2.17 (Equivalence between rank and uniformity). Similar to Corollary 2.2.14

it also follows from Theorem 2.2.13 that an r-regular degree-d factor B is also ε-uniform

when r = r2.2.13(d, ε) is as in Theorem 2.2.13. The converse of this also holds, since by

Claim 2.2.15, and there is an approximate equivalence between regularity and uniformity.

2.2.3 Equidistribution and Near-Orthogonality of Regular Factors

In this section, we make precise the intuition that a high-rank collection of polynomials

often behaves like a collection of independent random variables. The key technical tool is

the connection between the combinatorial notion of rank and the analytic notion of bias

(Theorem 2.2.10).

Using a standard observation that relates the bias of a function to its distribution on its

range, Theorem 2.2.10 implies the following.

Lemma 2.2.18 (Size of atoms). Given ε > 0, let B be a polynomial factor of degree d > 0,

complexity C, and rank r2.2.10(d, ε), defined by a tuple of polynomials P1, . . . , PC : Fn → T

having respective depths k1, . . . , kC . Suppose b = (b1, . . . , bC) ∈ Uk1+1 × · · · ×UkC+1. Then

Prx

[B(x) = b] =1

‖B‖± ε.

21

In particular, for ε < 1‖B‖ , B(x) attains every possible value in its range and thus has ‖B‖

atoms.

Proof.

Prx

[B(x) = b] = Ex

∏i

1

pki+1

pki+1−1∑λi=0

e (λi(Pi(x)− bi))

=∏i

p−(ki+1) ·∑

(λ1,...,λC)

∈∏i[0,p

ki+1−1]

Ex

[e

(∑i

λi(Pi(x)− bi)

)]

=∏i

p−(ki+1) ·

(1± ε

∏i

pki+1

)=

1

‖B‖± ε.

The first equality uses the fact that Pi(x)−bi is in Uki+1 and that for any nonzero x ∈ Uki+1,∑pk+1−1λ=0 e (λx) = 0. The third equality uses Theorem 2.2.10 and the fact that unless every

λi = 0, the polynomial∑i λi(Pi(x)− bi) has rank at least r2.2.10(d, ε).

An almost identical proof implies a similar statement for unbiased factors instead of

regular factors.

Lemma 2.2.19 (Equidistribution for unbiased factors). Suppose that γ : N → R+ is

a decreasing function. Let F be a γ-unbiased factor of degree d > 0, defined by a tu-

ple of polynomials P1, . . . , PC : Fn → T having respective depths k1, . . . , kC . Suppose

b = (b1, . . . , bC) ∈ Uk1+1 × · · · × UkC+1. Then

Prx∈Fn

[F(x) = b] =1

‖F‖± γ(dim(F)).

2.2.4 Strong Near-Orthogonality and Equidistribution

One of the important and useful properties of the classical Fourier characters is that they

form an orthonormal basis. Theorem 2.2.10, Lemma 2.2.18 and Lemma 2.2.19 provide an

22

approximate version of this phenomenon which is useful for several applications. However,

this is not completely satisfactory, as we will see later that in order to study averages of the

form

EX∈(Fn)k[f(L1(X)) · · · f(Lm(X))

],

for a bounded function f : Fn → R, one needs to understand the distribution of the more

sophisticated random variable

AP,L(X) :=

P1(L1(X)) P2(L1(X)) . . . PC(L1(X))

P1(L2(X)) P2(L2(X)) . . . PC(L2(X))

......

P1(Lm(X)) P2(Lm(X)) . . . PC(Lm(X))

,

where X is the uniform random variable taking values in (Fn)k, P = P1, ..., PC is a family

of non-classical polynomials and L = L1, ..., Lm ⊂ Fk is an arbitrary system of linear

forms.

Since (non-classical) polynomials of a given degree satisfy various linear identities (e.g.

every degree one polynomial P satisfies P (x + y + z) = P (x + y) + P (x + z) − P (x)), it

is no longer possible to choose the polynomials in a way that this random matrix is almost

uniformly distributed over all m×C matrices A ∈∏mi=1

∏Cj=1 Ukj+1. Therefore, in this case

one would like to obtain an almost uniform distribution on the points of the configuration

space that are consistent with these linear identities. Note that [43] and [54] only address

the case where the system of linear forms consists only of a single linear form; in other words

they show that the entries in each row of this matrix are nearly independent. The stronger

near-equidistribution would in particular imply that the columns of this matrix are nearly

independent, with each column uniform modulo the required linear dependencies.

Hatami and Lovett [51] established this strong near-equidistribution in the case where

the characteristic of the field F is greater than the degree of the (classical) polynomial factor.

23

Lemma 2.2.20 ([51]). Let F be a prime field and d < |F|. Let L1, . . . , Lm be a system of

linear forms. Let F = P1, . . . , Pk be a collection of homogeneous classical polynomials of

degree at most d, such that rank(F) > τp,d(ε). For every set of coefficients Λ = λi,j ∈ F :

i ∈ [k], j ∈ [m], and

PΛ(x) :=k∑i=1

m∑j=1

λi,jPi(Lj(x)),

one of the following two cases holds:

PΛ ≡ 0 or E [eF(PΛ)] < ε.

The proof technique used in [51] breaks down once |F| is allowed to be small. Bhat-

tacharyya, et al. [15] later extended the result of [51] to the general characteristic case, but

under the extra assumption that the system of linear forms is affine, i.e. there is a variable

that appears with coefficient 1 in all the linear forms.

Theorem 2.2.21 (Near orthogonality [15]). Given ε > 0, suppose B = (P1, . . . , PC) is a

polynomial factor of degree d > 0 and rank > r2.2.13(d, ε), A = (L1, . . . , Lm) is an affine

constraint on ` variables, and Λ is a tuple of integers (λi,j)i∈[C],j∈[m]. Define

PA,B,Λ(x1, . . . , x`) =∑

i∈[C],j∈[m]

λi,jPi(Lj(x1, . . . , x`)).

Then, one of the two statements below is true.

• For every i ∈ [C], it holds that∑j∈[m] λi,jQi(Lj(·)) ≡ 0 for all polynomials Qi : Fn →

T with the same degree and depth as Pi. Clearly, PA,B,Λ ≡ 0 in this case.

• PA,B,Λ is non-constant. Moreover, |Ex1,...,x` [e(PA,B,Λ(x1, . . . , x`)

)]| < ε.

One can use the derivative technique used in [15] in a straightforward manner to prove

the following near-orthogonality result for when the linear forms are not necessarily affine

24

but they are allowed only to take 0 and 1 entries (See [18] for a proof).

Lemma 2.2.22 (Near orthogonality). For every decreasing function ε : N→ R+ and param-

eter d > 1, there is a decreasing function γ : N→ R+ such that the following holds. Suppose

that F = Pi,j16i6d,16j6Miis a γ-uniform factor of degree d. Let x ∈ Fn be a fixed vector.

For every integer k and set of not all zero coefficients Λ = λi,j,ω16i6d,16j6Mi,ω∈0,1k

define

PΛ(y1..., yk)def=

∑i∈[d],j∈[Mi]

∑ω∈0,1k

λi,j,ω · Pi,j(x+ ω · y),

where y = (y1, ..., yk) ∈ (Fn)k. Then either we have that PΛ is a constant, i.e. it does not

depend on y, or

bias(PΛ) < ε(dim(F))

Finally, in the present dissertation we fully characterize the joint distribution of the

matrix AP,L without any extra assumptions on the linear forms.

Theorem 2.2.23 (Near Orthogonality over Linear Forms). Let L1, . . . , Lm be linear forms

on ` variables and let B = (P1, . . . , PC) be an ε-uniform polynomial factor for some ε ∈ (0, 1]

defined only by homogeneous polynomials. For every tuple Λ of integers (λi,j)i∈[C],j∈[m],

define PΛ : (Fn)` → T as

PΛ(X) =∑

i∈[C],j∈[m]

λi,jPi(Lj(X)).

Then one of the following two statements holds:

• PΛ ≡ 0.

• PΛ is non-constant and∣∣∣EX∈(Fn)` [e (PΛ)]

∣∣∣ < ε.

Furthermore PΛ ≡ 0 if and only if for every i ∈ [C], we have (λi,j)j∈[m] ∈ Φdi,ki(L)⊥ where

di, ki are the degree and depth of Pi, respectively.

25

Homogeneous non-classical polynomials are introduced in Chapter 3, where we also prove

several useful facts about them. Theorem 2.2.23 is discussed in Chapter 4 and proved in

Section 4.2.

2.2.5 Regularization of Factors

Due to the generic properties of regular factors, it is often useful to refine a given polynomial

factor to a regular one [78]. We will first formally define what we mean by refining a

polynomial factor.

Definition 2.2.24 (Refinement). A factor B′ is called a refinement of B, and denoted B′

B, if the induced partition by B′ is a combinatorial refinement of the partition induced by B.

In other words, if for every x, y ∈ Fn, B′(x) = B′(y) implies B(x) = B(y).

One needs to be careful about distinguishing between two types of refinements.

Definition 2.2.25 (Semantic and syntactic refinements). B′ is called a syntactic refinement

of B, and denoted B′ syn B, if the sequence of polynomials defining B′ extends that of

B. It is called a semantic refinement, and denoted B′ sem B if the induced partition is a

combinatorial refinement of the partition induced by B. In other words, if for every x, y ∈ Fn,

B′(x) = B′(y) implies B(x) = B(y).

Remark 2.2.26. Clearly, being a syntactic refinement is stronger than being a semantic

refinement. But observe that if B′ is a semantic refinement of B, then there exists a syntactic

refinement B′′ of B that induces the same partition of Fn, and for which |B′′| 6 |B′| + |B|,

because we can define B′′ by just adding the defining polynomials of B to those of B′.

The following, by now well-known, lemma states that given a classical polynomial factor

one can refine it to a regular one.

26

Lemma 2.2.27 (Regularity Lemma for Polynomials). Let d > 1, F : N → N be a growth

function, and let F be a classical polynomial factor of degree d. Then there exists a semantic

F -regular refinement F ′ sem F of classical polynomials of same degree d, satisfying

dim(F ′) = OF,d,dim(F)(1).

Proof. We shall induct on the dimension vectors (M1, ...,Md). Notice that the dimension

vectors of a factor of degree d takes values in Nd, and we will use the reverse lexicographical

ordering on this space for our induction. The proof for d = 1 is obvious, because non-

zero linear functions have infinite rank0, unless there is a linear dependency between the

polynomials in F , which we can simply discard by removing the polynomials that can be

written as a linear combination of the others.

If F is already F -regular, then we are done. Otherwise, there is a set of coefficients

ci,j16i6d,16j6Mi, not all zero, such that

rankk−1

d∑i=1

Mi∑j=1

ci,jPi,j

< F (dim(F)).

Without loss of generality assume that ck,Mk6= 0. The above inequality means that Pk,Mk

can be written as a function of the rest of the polynomials in the factor and a set of at

most F (dim(F)) polynomials of degree at most k−1. We will replace Pk,Mkwith these new

polynomials. The new factor will have the following dimension vector

(M1, . . . ,Mk−1 + F (dim(F)),Mk − 1,Mk+1, . . . ,Md).

Now by the induction hypothesis the above can be regularized.

The same proof idea can be used to extend this lemma to non-classical polynomial factors.

27

Lemma 2.2.28 (Lemma 9.6 of [78]). Let r : Z>0 → Z>0 be a non-decreasing function and

d > 0 be an integer. Then, there are functions C(r,d) : Z>0 → Z>0 and I(d) : Z>0 →

Z>0 such that the following is true. Suppose B is an extended polynomial factor defined by

polynomials P1, . . . , PC : Fn → T of degree 6 d. Then, there is a subspace V 6 Fn and

an r-regular extended factor B consisting of polynomials Q1, . . . , QC : V → T such that

2 6 deg(Qi) 6 d for each i, B semantically refines the factor defined by P1|V , . . . , PC |V ,

C 6 C(r,d)(C), and dim(V ) > n− I(d)(C).

The nature of the proof in regularity lemmas of the above type results in Ackermann-like

functions even for “reasonable” growth functions.

The following lemma is the workhorse that allows us to construct regular refinements.

Lemma 2.2.29 (Polynomial Regularity Lemma). Let r : Z>0 → Z>0 be a non-decreasing

function and d > 0 be an integer. Then, there is a function C(r,d)2.2.29 : Z>0 → Z>0 such that

the following is true. Suppose B is a factor defined by polynomials P1, . . . , PC : Fn → T of

degree at most d. Then, there is an r-regular factor B′ consisting of polynomials Q1, . . . , QC ′ :

Fn → T of degree 6 d such that B′ sem B and C ′ 6 C(r,d)2.2.29(C).

Moreover, if B is itself a refinement of some B that has rank > (r(C ′) +C ′) and consists

of polynomials, then additionally B′ will be a syntactic refinement of B.

Proof. We can prove this lemma starting from Lemma 9.6 of [78]. To explain, let us define

the notion of an extended factor. We say a polynomial factor B is extended if for any

polynomial Q ∈ B that is not classical, pQ ∈ B also. Note that an extended factor defined

by polynomials P1, . . . , PC is of high rank if for all tuples (λ1, . . . , λC) ∈ [0, p− 1]C , unless

all the λi’s are zero,∑i λiPi is of high (maxi deg(λiPi))-rank.

Let B1 be the extended factor defined bypkPi | 0 6 k 6 depth(Pi), i ∈ [C]

. Apply

Lemma 2.2.28 to B1 in order to obtain a bounded index subspace V1 and an extended

R1-regular factor B1 defined by polynomials Q1, . . . , QC : V1 → T, where R1 is a growth

function (growing even faster than r) we specify later on in the proof and C 6 C(R1,d)(|B1|).28

For a ∈ Fn/V1 and P ∈ B1, define P a : V1 → Fn to be P a(x) = DaP (x) = P (a+ x)−P (x).

Each P a is of degree 6 d−1. Also, since V1 is the intersection of I 6 I(d)(C) hyperplanes, we

can decide which coset in Fn/V1 an element x ∈ Fn belongs to as a function of I 6 I(d)(C)

(classical) linear functions π1, . . . , πI . Let B2 be the extended factor obtained by adding to

B1 all the polynomials P a | P ∈ B1, a ∈ Fn/V1 and π1, . . . , πI . Consider x ∈ Fn and let

x = a + y where y ∈ V1, and a ∈ Fn/V1. Since P (x) = P (y)− P−a(x), each polynomial in

B1 is a function of the polynomials in B2 over all of Fn, and so B2 is a semantic refinement

of B1 (and a syntactic refinement of B1). Note that |B2| 6 C + dCI(d)(C) + I(d)(C) <

C + 2dCI(d)(C).

Now, suppose we repeat the steps in the previous paragraph with B2 taking the place

of B1 and a different function R2 taking the place of R1. We specify R2 later, but we will

choose it so that it grows faster than r. The new application of Lemma 2.2.28 to B2 produces

an extended factor B2 that is R2-regular and a bounded index subspace V2 such that the

polynomials in B2 restricted to V2 are measurable with respect to B2. We argue that B2

differs from B2 only by polynomials of degree 6 d−1. Suppose B2 is not R2-regular to start

off with. The function R1 is chosen so that B1’s rank, R1(|B1|) > R2(|B2|) + |B2|. This

means that if a linear combination of polynomials in B2,∑S∈B2

λSS, has rank 6 R2(|B2|)

and d′ = maxS:λS 6=0 deg(S), then there must be an S 6∈ B1 with λS 6= 0 and degree d′,

since otherwise the rank condition of B1 would be violated. Since all the polynomials in B2

which are not in B1 have degree 6 d− 1, we conclude that d′ 6 d− 1. Inspecting the proof

of Lemma 2.2.28 in [78] shows that this means B2 consists of the polynomials of B1 along

with other polynomials of degree 6 d − 1. In the same way as in the previous paragraph,

we obtain an extended factor B3 syn B2, so that B3 is a semantic refinement of B2 over

all of Fn. Note that since all the polynomials of B1 are already in B2, we only need to addP a | P ∈ B2 \ B1, a ∈ Fn/V2

, together with some linear functions. All these polynomials

have degree at most 6 d− 2.

29

We keep repeating this process to obtain a sequence of extended factors B1,B2,B3, . . .

and B1, B2, B3, . . . . Each Bi+1 semantically refines Bi and syntactically refines Bi. The

process stops at step i if Bi becomes Ri-regular, where the sequence of growth functions

Ri satisfies Ri(m) > Ri+1(m + 2dmI(d)(m)) + m + 2dmI(d)(m) and Rd(m) = r(m). The

functions Ri are chosen so that Ri(|Bi|) > Ri+1(|Bi+1|)+ |Bi+1|, and therefore, by the above

argument, Bi+1 differs from Bi by polynomials of degree 6 d − i. So, we must stop after

obtaining Bd in the sequence. Also, since each Ri grows faster than r, note that Ri-regularity

for any i ∈ [d] implies r-regularity. So, it must be that some Bi for i 6 d already becomes

r-regular.

Given an extended factor B′′ of rank > r, we can get a (standard) factor B′ of rank > r

by letting B′ be defined by the smallest subset of polynomials S such thatpiP | P ∈ S, i ∈ Z>0

⊇ B′′. The last statement of the lemma follows from the same

considerations as used above to argue that Bi syntactically refines Bi.

2.3 Decomposition Theorems

“Decomposition theorems” [36, 75, 43] are important applications of inverse theorems. These

theorems allow one to express a given function f with certain properties as a sum f1 +∑ki=1 gi, where f1 is “structured” and each gi has certain desired quasirandomness proper-

ties. The rough idea is that the structure of f1 is strong enough for us to be able to analyze

it reasonably explicitly, while the quasirandomness of gis is strong enough for many proper-

ties of f1 to be unaffected if we “perturb” it to f = f1 +∑ki=1 gi. We refer the interested

reader to [36] and [40] for a detailed discussion. The following decomposition theorem is a

consequence of an inverse theorem for Gowers norms ([78, Theorem 1.11]).

Theorem 2.3.1 (Strong Decomposition Theorem for Multiple Functions). Let m, d > 1 be

integers, δ > 0 a parameter, and let r : N → N be an arbitrary growth function. Given any

30

functions f1, . . . , fm : Fn → D, there exists a decomposition

fi = gi + hi,

such that for every 1 6 i 6 m,

1. gi = E[fi|B], where B is an r-regular polynomial factor of degree at most d and com-

plexity C 6 Cmax(p,m, d, δ, r(·)),

2. ‖hi‖Ud+1 6 δ.

Remark 2.3.2. In the case of high-characteristic, i.e. d < |F|, B will consist only of

classical polynomials, and it is easily seen that we can further assume that the polynomials

in factor B are homogeneous, which means that all their multinomials are of same degree.

However, the case of non-classical polynomials is tricky and unclear, as for example it is not

a priori clear what a good candidate definition for homogeneous non-classical polynomial is

and whether such homogeneous non-classical polynomials span all polynomials. Chapter 3

addresses this issue, and proves that homogeneous non-classical polynomials span the space

of all non-classical polynomials.

2.3.1 Strong Decomposition Theorems

Often, in order to obtain stronger statements about the structure and the qasirandomness in

a decomposition theorem, one allows also a small L2-error: that is, one writes f as f1+f2+f3

with f1 structured, f2 quasirandom, and f3 small in L2.

The strong decomposition theorem below shows that any Boolean function can be de-

composed into the sum of a conditional expectation over a high rank factor, a function with

small Gowers norm, and a function with small L2-norm.

Theorem 2.3.3 (Strong Decomposition Theorem; Theorem 4.4 of [16]). Suppose δ > 0

and d > 1 are integers. Let η : N → R+ be an arbitrary non-increasing function and

31

r : N → N be an arbitrary non-decreasing function. Then there exist N = N2.3.3(δ, η, r, d)

and C = C2.3.3(δ, η, r, d) such that the following holds.

Given f : Fn → 0, 1 where n > N , there exist three functions f1, f2, f3 : Fn → R and

a polynomial factor B of degree at most d and complexity at most C such that the following

conditions hold:

(i) f = f1 + f2 + f3.

(ii) f1 = E[f |B].

(iii) ‖f2‖Ud+1 6 1/η(|B|).

(iv) ‖f3‖2 6 δ.

(v) f1 and f1 + f3 have range [0, 1]; f2 and f3 have range [−1, 1].

(vi) B is r-regular.

It turns out though that this strong decomposition theorem is not quite sufficient for

some applications. For example, in the study of algebraic property testing we require an

even stronger decomposition theorem. The issue is that the bound on f3 above is a constant

δ. Ideally, we would want δ to decrease as a function of the complexity of the polynomial

factor, but such a decomposition theorem is simply not possible. However, analogous to what

it is shown in [2] in the context graphs, here one can find two polynomial factors B′ syn B

such that, the structured part f1 equals to E[f |B′], but now the L2-norm of f3 can be made

arbitrarily small in terms of the complexity of the coarser factor B. Furthermore for most

atoms c of B, the function f : Fn → 0, 1 have roughly the same density on c and most of

its subatoms in B′. To make this precise, we make the following definition.

Definition 2.3.4 (Polynomial factor represents another factor). Given a function f : Fn →

0, 1, a polynomial factor B′ that refines another factor B and a real ζ ∈ (0, 1), we say

32

B′ ζ-represents B with respect to f if for at most ζ fraction of atoms c of B, more than ζ

fraction of the atoms c′ lying inside c satisfy |E[f |c]− E[f |c′]| > ζ.

We can now state the following “Super Decomposition Theorem” from [16].

Theorem 2.3.5 (Super Decomposition Theorem; Theorem 4.9 of [16]). Suppose ζ > 0 is

a real and d, C0 > 1 are integers. Let η : N → R+ and δ : N → R+ be arbitrary non-

increasing functions, and r : N → N be an arbitrary non-decreasing function. Then there

exist N = N2.3.5(δ, η, r, d, ζ) and C = C2.3.5(δ, η, r, d, ζ) such that the following holds.

Given f : Fn → 0, 1 where n > N , there exist functions f1, f2, f3 : Fn → R, and

polynomial factors B′ syn B of degree at most d and of complexity at most C, such that the

following conditions hold:

(i) f = f1 + f2 + f3.

(ii) f1 = E[f |B′].

(iii) ‖f2‖Ud+1 6 η(|B′|).

(iv) ‖f3‖2 6 δ(|B|).

(v) f1 and f1 + f3 have range [0, 1]; f2 and f3 have range [−1, 1].

(vi) B and B′ are both r-regular.

(vii) B′ ζ-represents B with respect to f .

Although the above Super Decomposition Theorem may be useful by itself for other

applications, we will need a particular variant. The factor B′ is a syntactic refinement of B,

and thus is defined by adding new polynomials Q1, . . . , Q|B′|−|B| to the polynomials defining

B. Then for each atom c in the coarser factor B we will select one atom c′ of B′ such that

the following hold:

33

• There is a fixed s ∈ T|B′|−|B| such that for every atom c in B its corresponding atom

c′ is obtained by requiring (Q1, . . . , Q|B′|−|B|) to be equal to s.

• The L2-norm of f3 conditioned inside every such atom (i.e., Ex∈c′ [|f3(x)|2]) is small.

• Most subatoms c′ will “well-represent” (in the sense of Definition 2.3.4) their corre-

sponding atoms c from B.

Before stating this formally, let us also take this opportunity to remark that it is possible

to adapt the proofs of the above decomposition theorems to decompose several functions

f (1), . . . , f (R) : Fn → 0, 1 simultaneously. Alternatively, this could be thought of as

decomposing a single vector-valued function f : Fn → 0, 1R. Now we finally state the

decomposition theorem that we will use in the proof of our main result.

Theorem 2.3.6 (Subatom Selection; Theorem 4.12 of [16]). Suppose ζ > 0 is a real and

d,R > 1 are integers. Let η, δ : N → R+ be arbitrary non-increasing functions, and let

r : N→ N be an arbitrary non-decreasing function. Then, there exist C = C2.3.6(δ, η, r, ζ, R)

such that the following holds.

Given f (1), . . . , f (R) : Fn → 0, 1, there exist functions f(i)1 , f

(i)2 , f

(i)3 : Fn → R for all

i ∈ [R], a polynomial factor B of degree d with atoms denoted by elements of T|B|, a syntactic

refinement B′ syn B of degree d with complexity at most C and atoms denoted by elements

of T|B| × T|B′|−|B|, and an element s ∈ T|B′|−|B| such that the following is true:

(i) f (i) = f(i)1 + f

(i)2 + f

(i)3 for every i ∈ [R].

(ii) f(i)1 = E[f (i)|B′] for every i ∈ [R].

(iii) ‖f (i)2 ‖Ud+1 < η(|B′|) for every i ∈ [R].

(iv) For every i ∈ [R], f(i)1 and f

(i)1 + f

(i)3 have range [0, 1], and f

(i)2 and f

(i)3 have range

[−1, 1].

34

(v) B and B′ are both r-regular.

(vi) For every atom c ∈ T|B| of B, the subatom c′ = (c, s) ∈ T|B′| satisfy

E[|f (i)

3 |2∣∣∣ (c, s)

]< δ(|B|)2

for every i ∈ [R].

(vii) If c is an atom of B chosen uniformly at random, then

Prc

[maxi∈[R]

(∣∣∣E [f (i)∣∣∣ c]− E

[f (i)∣∣∣ (c, s)]∣∣∣) > ζ

]< ζ.

2.3.2 Higher-order Fourier Expansion

In the previous sections we saw that decomposition theorems can be used to express a

given function f that satisfies certain reasonable properties as f = f1 +∑ki=1 gi, where

f1 = Γ(P1, ..., PC) for a set of (non-classical) polynomials, for some function Γ : TC → D,

and gi’s are small in norms that allow us to regard them as negligible error terms. Let

ki denote the depth of the polynomial Pi so that by Lemma 2.1.3, each Pi takes values in

Uki+1 = 1pki+1Z/Z. Moreover, let Σ := Zpk1+1 × · · · × Z

pkC+1 . Using the Fourier expansion

of Γi we have

f1(x) =∑

Λ=(λ1,...,λC)∈Σ

Γi(Λ) · e

C∑j=1

λjPj(x)

, (2.3)

where Γi(Λ) is the Fourier coefficient of Γi corresponding to Λ. Equation (2.3) implies an

approximate higher-order Fourier expansion of f , a generalization of the classical expansion.

f(x) '∑

Λ=(λ1,...,λC)∈Σ

Γi(Λ) · e

C∑j=1

λjPj(x)

. (2.4)

35

2.4 Systems of Linear forms

Definition 2.4.1. A linear form on k variables is a vector L = (`1, . . . , `k) ∈ Fk and it

maps X = (x1, . . . , xk) ∈ (Fn)k to L(X) =∑ki=1 `ixi ∈ Fn.

For a linear form L = (`1, . . . , `k) ∈ Fk we define |L| def=∑ki=1 |`i|.

Gowers norms are known to control the density of linear patterns in subsets of an Abelian

group. Next, we present a notion of complexity associated with a collection of linear forms

that determines an upper-bound on the order of the required Gowers norm to do so.

2.4.1 Cauchy-Schwarz Complexity

Let A be a subset of Fn with the indicator function 1A : Fn → 0, 1. Then the analytical

average

EX∈(Fn)k

[1A(L1(X)) · · ·1A(Lm(X))] , (2.5)

represents the probability that L1(X), . . . , Lm(X) all fall in A, where X ∈ (Fn)k is chosen

uniformly at random. Then roughly speaking, we will say that A ⊆ Fn is pseudorandom

with regards to L if

EX∈(Fn)k

[m∏i=1

1A(Li(X))

]≈(|A|pn

)m;

That is, the probability that all L1(X), . . . , Lm(X) fall in A is close to what we would expect

if A was a random subset of Fn of cardinality |A|. Let α := |A|/pn be the density of A, and

define f := 1A − α. We have

EX

[m∏i=1

1A(Li(X))

]

= EX

[m∏i=1

(α + f(Li(X)))

]= αm +

∑S⊆[m],S 6=∅

αm−|S|EX

∏i∈S

f(Li(X))

.

36

Therefore, a sufficient condition for A to be pseudorandom with regards to L is that

EX[∏

i∈S f(Li(X))]

is negligible for all nonempty subsets S ⊆ [m]. Green and Tao [46]

showed that a sufficient condition for this to occur is that ‖f‖Us+1 is small enough, where s

is the Cauchy-Schwarz complexity of the system of linear forms.

Definition 2.4.2 (Cauchy-Schwarz complexity [46]). Let L = L1, . . . , Lm be a system of

linear forms. The Cauchy-Schwarz complexity of L is the minimal s such that the following

holds. For every 1 6 i 6 m, we can partition Ljj∈[m]\i into s+ 1 subsets, such that Li

does not belong to the linear span of any of the subsets.

Note that the Cauchy-Schwarz complexity of any system of m linear forms in which any

two linear forms are linearly independent (i.e. one is not a multiple of the other) is at most

m− 2, since we can always partition Ljj∈[m]\i into the m− 1 singleton subsets.

The reason for the term Cauchy-Schwarz complexity is the following lemma due to Green

and Tao [46] the proof of which is based on a clever iterative application of the Cauchy-

Schwarz inequality.

Lemma 2.4.3 ([46], See also [37, Theorem 2.3]). Let f1, . . . , fm : F → D. Let L =

L1, . . . , Lm be a system of m linear forms in ` variables of Cauchy-Schwarz complexity s.

Then ∣∣∣∣∣ EX∈(Fn)`

[m∏i=1

fi(Li(X))

]∣∣∣∣∣ 6 min16i6m

‖fi‖Us+1 .

2.4.2 The True Complexity

The Cauchy-Schwarz complexity of L gives an upper bound on s, such that if ‖f‖Us+1 is

small enough for some function f : Fn → D, then f is pseudorandom with regards to L.

Gowers and Wolf [37] defined the true complexity of a system of linear forms as the minimal

s for which this condition holds for all f : Fn → D.

37

Definition 2.4.4 (True complexity [37]). Let L = L1, . . . , Lm be a system of linear forms

over F. The true complexity of L is the smallest d ∈ N with the following property. For

every ε > 0, there exists δ > 0 such that if f : Fn → D is any function with ‖f‖Ud+1 6 δ,

then ∣∣∣∣∣ EX∈(Fn)k

[m∏i=1

f(Li(X))

]∣∣∣∣∣ 6 ε.

An obvious bound on the true complexity is the Cauchy-Schwarz complexity of the sys-

tem.

38

CHAPTER 3

HOMOGENEOUS NON-CLASSICAL POLYNOMIALS

Recall that a classical polynomial is called homogeneous if all of its monomials are of the

same degree. Trivially a homogeneous classical polynomial P (x) satisfies P (cx) = |c|dP (x)

for every c ∈ F. We will use this property to define the class of nonclassical homogeneous

polynomials.

Definition 3.0.5 (Homogeneity). A (nonclassical) polynomial P : Fn → T is called homo-

geneous if for every c ∈ F there exists a σc ∈ Z such that P (cx) = σcP (x) mod 1 for all

x ∈ Fn.

Remark 3.0.6. It is not difficult to see that P (cx) = σcP (x) mod 1 implies that σc =

|c|deg(P ) mod p, a property that we will use later. Indeed for d = deg(P ), we have 0 =

∂d(P (cx) − σcP (x)) = (|c|d − σc)∂dP (x) mod 1. This, since ∂dP (x) is a nonzero degree-d

classical polynomial, implies σc = |c|d mod p.

Notice that for a polynomial P to be homogeneous it suffices that there exists σ ∈ Z

for which P (ζx) = σP (x) mod 1, where ζ is a generator of F∗. Throughout this chapter,

we fix ζ ∈ F∗ a generator of F∗. If P has depth k, then we can assume that σ ∈ Zpk+1 , as

pk+1P ≡ 0. The following lemma shows that σ is uniquely determined for all homogeneous

polynomials of degree d and depth k. Henceforth, we will denote this unique value by σ(d, k).

Lemma 3.0.7. For every d and k, there is a unique σ = σ(d, k) ∈ Zpk+1, such that for

every homogeneous polynomial P of degree d and depth k, P (ζx) = |σ|P (x) mod 1, where

| · | is the natural map from Zpk+1 to 0, 1, . . . , pk+1 − 1 ⊂ Z.

Proof. Let P be a homogeneous polynomial of degree d and depth k, and let σ ∈ Zpk+1

be such that P (ζx) = |σ|P (x) mod 1. By Remark 3.0.6 we know that |σ| = |ζ|d mod p.

We also observe that P (x) = P (ζp−1x) = |σ|p−1P (x) mod 1 from which it follows that

σp−1 = 1. We claim that σ ∈ Zpk+1 is uniquely determined by the two properties

39

i. |σ| = |c|d mod p, and

ii. σp−1 = 1.

Suppose to the contrary that there are two nonzero values σ1, σ2 ∈ Zpk+1 that satisfy the

above two properties, and choose t ∈ Zpk+1 such that σ1 = tσ2. It follows from (i) that t = 1

mod p and from (ii) that tp−1 = 1. We will show that t = 1 is the only possible such value

in Zpk+1 .

Let a1, . . . , apk ∈ Zpk+1 be all the possible solutions to x = 1 mod p in Zpk+1 . Note that

ta1, . . . , tapk is just a permutation of the first sequence and thus

tpk∏

ai =∏

ai.

Consequently tpk

= 1, which combined with tp = t implies t = 1.

Lemma 2.1.3 allows us to express every (nonclassical) polynomial as a linear span of

monomials of the form|x1|d1 ···|xn|dn

pk+1 . Unfortunately, unlike in the classical case, these mono-

mials are not necessarily homogeneous, and for some applications it is important to express

a polynomial as a linear span of homogeneous polynomials. We show that this is possible as

homogeneous multivariate (nonclassical) polynomials linearly span the space of multivariate

(nonclassical) polynomials. We will present the proof of this theorem in Section 3.3.

Theorem 3.0.8. There is a basis for Poly(Fn → T) consisting only of homogeneous multi-

variate polynomials.

Theorem 3.0.8 allows us to make the extra assumption in the strong decomposition

theorem (Theorem 2.3.1) that the resulting polynomial factor B consists only of homogeneous

polynomials.

Corollary 3.0.9 (Homogeneous Strong Decomposition). Let d > 1 be an integer, δ > 0

a parameter, and let r : N → N be an arbitrary growth function. Given any functions

40

f1, . . . , fm : Fn → D, there exists a decomposition

fi = gi + hi,

such that or every 1 6 i 6 m,

1. gi = E[fi|B], where B is an r-regular polynomial factor of degree at most d and com-

plexity C 6 Cmax(p, d, δ, r(·)), moreover B only consists of homogeneous polynomials.

2. ‖hi‖Ud+1 6 δ.

To simplify the notation, through the rest of this section we will omit writing “mod 1”

in the description of the defined nonclassical polynomials.

3.1 Properties of the Derivative Polynomial

We start by proving the following simple observation.

Claim 3.1.1. Let P : F → T be a univariate polynomial of degree d. Then for every

c ∈ F\0,

deg(P (cx)− |c|dP (x)

)< d.

Proof. By Lemma 2.1.3 it suffices to prove the claim for a monomial q(x) :=|x|spk+1 with

k(p− 1) + s = d. Note that q(cx)− |c|dq(x) takes values in 1pkZ/Z as |c|s − |c|d is divisible

by p. Hence

deg(q(cx)− |c|dq(x)

)6 (p− 1)(k − 1) < d. (3.1)

It is not difficult to show that the above claim holds for any multivariate polynomial

P : Fn → T. We provide a proof for the multivariate case since it is a useful observation,

although the univariate case suffices for our purposes in this section.

41

Claim 3.1.2. Let P : Fn → T be a univariate polynomial of degree d. Then for every

c ∈ F\0,

deg(P (cx)− |c|dP (x)

)< d.

Proof. Notice that the claim is trivial for classical polynomials, since in this case, if R denotes

the homogeneous degree-d part of P , then R(cx) − |c|dR(x) = 0. We prove the statement

for nonclassical polynomials. Let Q(x) := P (cx), and note deg(Q) = d. We will inspect

the derivative polynomial of Q. Recall from Definition 2.1.5 and (Lemma 2.1.6 that the

derivative polynomial of Q,

∂Q(y1, . . . , yd) = Dy1 · · ·DydQ(0),

is a degree-d classical homogeneous multi-linear polynomial which is invariant under permu-

tations of (y1, . . . , yd). In particular

|c|−d∂Q(y1, . . . , yd) = ∂Q(c−1y1, . . . , c−1yd) = Dc−1y1

Dc−1y2· · ·Dc−1yd

Q(x)

=∑S⊆[d]

(−1)|S|Q

c−1∑i∈S

yi

=∑S⊆[d]

(−1)|S|P

∑i∈S

yi

= ∂P (y1, . . . , yd),

This implies that ∂d(Q− |c|dP ) ≡ 0 and thus deg(Q− |c|dP ) < d.

3.2 A Homogeneous Basis, the Univariate Case

First we prove Theorem 3.0.8 for univariate polynomials.

Lemma 3.2.1. There is a basis of homogeneous univariate polynomials for Poly(F→ T).

Proof. We will prove by induction on d that there is a basis h1, . . . , hd of homogeneous

univariate polynomials for Poly6d(F → T) for every d. Let ζ be a fixed generator of F∗.42

For any degree d > 0, we will build a degree-d homogeneous polynomial hd(x) such that

hd(ζx) = σdhd(x) for some integer σd. The base case of d 6 p−1 is trivial as Poly6p−1(F→

T) consists of only classical polynomials, and those are spanned by h0(x) := 1p , h1(x) :=

|x|p , . . . , hp−1(x) :=

|x|p−1

p . Now suppose that d = s + (p − 1)(k − 1) with 0 < s 6 p − 1,

and k > 1. It suffices to show that the degree-d monomial|x|spk

can be expressed as a linear

combination of homogeneous polynomials. Consider the function

f(x) :=|ζx|s

pk− |ζ|

s|x|s

pk.

Claim 3.1.1 implies that deg(f) < d. Using the induction hypothesis, we can express f(x)

as a linear combination of|x|sp`

for ` = 0, . . . , k−1, and he for e < d with e 6= s mod (p−1):

f(x) =k−1∑`=1

a`|x|s

p`+

∑e<d,

e 6=d mod (p−1)

behe(x).

Set A := |ζ|s +∑k−1`=1 a`p

k−`, so that

|ζx|s

pk− A |x|

s

pk=

∑e<d,

e 6=d mod (p−1)

behe(x). (3.2)

By the induction hypothesis, for e < d, he(ζx) = σeh(x) where σe = |ζ|e mod p, and thus

as A = |ζ|s mod p, we have σe 6= A mod p when e 6= s mod (p− 1). Consequently,

∑e<d,

e 6=d mod (p−1)

behe(x) =∑e<d,

e 6=d mod (p−1)

beσe − A

(σe − A)he(x)

=∑e<d,

e 6=d mod (p−1)

beσe − A

(he(ζx)− Ahe(x)).

43

Combing this with (3.2) we conclude that

hd(x) :=|x|s

pk−

∑e<d,

e 6=d mod (p−1)

beσe − A

he(x),

satisfies

hd(ζx) = Ahd(x).

3.3 A Homogeneous Basis, the Multivariate Case

We are now ready to prove the main result of this section.

Theorem 3.0.8 (restated). There is a basis for Poly(Fn → T) consisting only of homoge-

neous multivariate polynomials.

Proof. We will show by induction on the degree d, that every degree d monomial can be

written as a linear combination of homogeneous polynomials. The base case of d < p

is trivial as such monomials are classical and thus homogeneous themselves. Consider a

(nonclassical) monomial M(x1, . . . , xn) =|x1|s1 ···|xn|sn

pkof degree d = s1 + · · · + sn + (p −

1)(k − 1). For every i ∈ [n] let gi(xi) := hsi+(p−1)(k−1)(xi) where hsi+(p−1)(k−1)(·) is the

homogeneous univariate polynomial from Lemma 3.2.1. Every gi takes values in 1pkZ/Z, and

thus corresponds to a polynomial Gi : F→ Zpk . Define F : Fn → Zpk as

F (x1, . . . , xn) := G1(x1) · · ·Gn(xn),

and f : Fn → T as

f(x1, . . . , xn) :=F (x1, . . . , xn)

pk.

It is simple to verify that deg(f) = s1 + . . . + sn + (p − 1)(k − 1) = d, it has only one

44

monomial of deg(f), which is|x1|s1 ···|xn|sn

pk= M(x1, . . . , xn), and it is homogeneous. Thus

M(x1, . . . , xn)− f(x1, . . . , xn) is of degree less than d and by the induction hypothesis can

be written as a linear combination of homogeneous polynomials.

45

CHAPTER 4

DISTRIBUTION OF POLYNOMIAL FACTORS OVER LINEAR

FORMS

A main contribution of this dissertation is a near-orthogonality result for (non-classical)

polynomial factors of high rank 1. Such a statement was proved in [43, 78] for systems of

linear forms corresponding to repeated derivatives or equivalently Gowers norms, and in [51]

for the case when the field is of high characteristic but with arbitrary system of linear forms

and in [15] for (non-classical) polynomials but only in the case of systems of affine linear

forms. In Theorem 2.2.23 we establish the near-orthogonality over any arbitrary system of

linear forms. Before stating this theorem we need to introduce the notion of consistency.

Definition 4.0.1 (Consistency). Let L = L1, . . . , Lm be a system of linear forms. A

vector (β1, . . . , βm) ∈ Tm is said to be (d, k)-consistent with L if there exists a homogeneous

polynomial P of degree d and depth k and a point X such that P (Li(X)) = βi for every

i ∈ [m]. Let Φd,k(L) denote the set of all such vectors.

It is immediate from the definition that Φd,k(L) ⊆ Umk+1 is a subgroup of Tm, or more

specifically, a subgroup of Umk+1. Let

Φd,k(L)⊥ :=

(λ1, . . . , λm) ∈ Zm : ∀(β1, . . . , βm) ∈ Φd,k(L),∑

λiβi = 0.

Equivalently Φd,k(L)⊥ is the set of all (λ1, . . . , λm) ∈ Zm such that∑mi=1 λiP (Li(X)) ≡ 0

for every homogeneous polynomial P of degree d and depth k.

Theorem 2.2.23 (restated). Let L1, . . . , Lm be linear forms on ` variables and let

B = (P1, . . . , PC) be an ε-uniform polynomial factor for some ε ∈ (0, 1] defined only by homo-

geneous polynomials. For every tuple Λ of integers (λi,j)i∈[C],j∈[m], define PΛ : (Fn)` → T

1. The work in this chapter is based on the work [50]

46

as

PΛ(X) =∑

i∈[C],j∈[m]

λi,jPi(Lj(X)).

Then one of the following two statements holds:

• PΛ ≡ 0.

• PΛ is non-constant and∣∣∣EX∈(Fn)` [e (PΛ)]

∣∣∣ < ε.

Furthermore PΛ ≡ 0 if and only if for every i ∈ [C], we have (λi,j)j∈[m] ∈ Φdi,ki(L)⊥

where di, ki are the degree and depth of Pi, respectively. We will present the proof of

Theorem 2.2.23 in Section 4.2.

Remark 4.0.2. By Corollary 2.2.14 the assumption of ε-uniformity in Theorem 2.2.23 is

satisfied for every factor B of rank at least r2.2.13(d, ε). However, we would like to point out

that in Theorem 2.2.23 by using the assumption of ε-uniformity instead of the assumption

of high rank, we are able to achieve the quantitative bound of ε on the bias of PΛ.

Remark 4.0.3. In Theorem 2.2.23 in the second case where PΛ is non-constant, it is possible

to deduce a more general statement that ‖e (PΛ)‖2tU t

< ε for every t 6 deg(PΛ). Indeed

assume that PΛ is non-constant, and consider the derivative

DY1. . . DYtPΛ(X) =

∑i∈[C],j∈[m]

λi,j∑S⊆[t]

(−1)|S|Pi(Lj(X +∑r∈S

Yr)), (4.1)

where Yi := (yi,1, . . . , yi,`) ∈ (Fn)`. Notice that for every choice of j ∈ [m] and S ⊆ [t],

Lj(X +∑r∈S Yr) is an application of a linear form on the vector

(x1, . . . , x`, y1,1, . . . , y1,`, . . . , yt,1, . . . , yt,`

)∈ (Fn)(t+1)`,

47

and since by Lemma 2.1.6 the polynomial ∂PΛ is nonzero, Theorem 2.2.23 implies

‖e (PΛ)‖2t

U t = Eh1,...,ht

[e (∂tPΛ(h1, . . . , ht))] 6 ε.

4.1 Near-equidistribution over Linear Forms

It is well-known that statements similar to that of Theorem 2.2.23 imply “near-

equidistributions” of the joint distribution of the polynomials applied to linear forms. Con-

sider a highly uniform polynomial factor of degree d > 0, defined by a tuple of homogeneous

polynomials P1, . . . , PC : Fn → T with respective degrees d1, . . . , dC and depths k1, . . . , kC ,

and let L = (L1, . . . , Lm) be a collection of linear forms on ` variables. As we mentioned

earlier, we are interested in the distribution of the random matrix

P1(L1(X)) P2(L1(X)) . . . PC(L1(X))

P1(L2(X)) P2(L2(X)) . . . PC(L2(X))

......

P1(Lm(X)) P2(Lm(X)) . . . PC(Lm(X))

, (4.2)

where X is the uniform random variable taking values in (Fn)`. Note that by the definition

of consistency, for every 1 6 i 6 C, the i-th column of this matrix must belong to Φdi,ki(L).

Theorem 4.1.1 below says that Equation (4.2) is “almost” uniformly distributed over the set

of all matrices satisfying this condition. The proof of Theorem 4.1.1 is standard and is iden-

tical to the proof of [15, Theorem 3.10] with the only difference that it uses Theorem 2.2.23

instead of the weaker near-orthogonality theorem of [15].

Theorem 4.1.1 (Near-equidistribution over Linear Forms). Given ε > 0, let B be an ε-

uniform polynomial factor of degree d > 0 and complexity C, that is defined by a tuple of

homogeneous polynomials P1, . . . , PC : Fn → T having respective degrees d1, . . . , dC and

48

depths k1, . . . , kC . Let L = (L1, . . . , Lm) be a collection of linear forms on ` variables.

Suppose (βi,j)i∈[C],j∈[m] ∈ TC×m is such that (βi,1, . . . , βi,m) ∈ Φdi,ki(L) for every

i ∈ [C]. Then

PrX∈(Fn)`

[Pi(Lj(X)) = βi,j ∀i ∈ [C], j ∈ [m]

]=

1

K± ε,

where K =∏Ci=1 |Φdi,ki(L)|.

Proof. We have

Pr[Pi(Lj(X)) = βi,j ∀i ∈ [C],∀j ∈ [m]]

= E

∏i,j

1

pki+1

pki+1−1∑λi,j=0

e(λi,j(Pi(Lj(X))− βi,j

))=

∏i∈[C]

p−(ki+1)

m ∑(λi,j)

e

−∑i,j

λi,jβi,j

E

e ∑i∈[C],j∈[m]

λi,jPi(Lj(X))

,where the outer sum is over (λi,j)i∈[C],j∈[m] with λi,j ∈ [0, pki+1−1]. Let Λi = Φdi,ki(L)⊥∩

[0, pk+1 − 1]m, and note that |Λi||Φdi,ki | = pm(ki+1). Since (βi,1, . . . , βi,m) ∈ Φdi,ki(L) for

every i ∈ [C], it follows that∑i,j λi,jβi,j = 0 if (λi,1, . . . , λi,m) ∈ Λi for all i ∈ [C]. If

the latter holds, then the expected value in the above expression is 0, and otherwise by

Theorem 2.2.23, it is bounded by ε. Hence the above expression can be approximated by

p−m∑Ci=1(ki+1) ·

C∏i=1

|Λi| ± εpm∑Ci=1(ki+1)

=1

K± ε.

49

4.2 Strong near-orthogonality: Proof of Theorem 2.2.23

We prove Theorem 2.2.23 in this section. Our proof uses similar derivative techniques as

used in [15], but in order to handle the general setting we will need a few technical claims

which we present first. Recall that |L| =∑`i=1 |λi| for a linear form L = (λ1, . . . , λ`).

Claim 4.2.1. Let d > 0 be an integer, and L = (λ1, . . . , λ`) ∈ F` be a linear form on

` variables. There exists linear forms Li = (λi,1, . . . , λi,`) ∈ F` for i = 1, . . . ,m, and

coefficients a1, . . . , am ∈ Z with m 6 |F|` such that

• P (L(X)) =∑mi=1 aiP (Li(X)) for every degree-d polynomial P : Fn → T;

• |Li| 6 d for every i ∈ [m];

• λi,j 6 λj for every i ∈ [m] and j ∈ [`].

Proof. The proof proceeds by simplifying P (L(X)) using identities that are valid for every

polynomial P : Fn → T of degree d.

In the case |L| 6 d there is nothing to prove. Assume otherwise that |L| > d. We will

use the fact that for every choice of y1, . . . , y|L| ∈ Fn,

∑S⊆[|L|]

(−1)|S|P

∑i∈S

yi

≡ 0. (4.3)

Let X = (x1, . . . , x`) ∈ (Fn)`. Setting |λi| of the vectors y1, . . . , y|L| to xi for every i ∈ [`],

Equation (4.3) implies

P (L(X)) =∑i

αiP (Mi(X)),

where Mi = (τi,1, . . . , τi,`), |Mi| 6 |L| − 1 and for every j ∈ [`], |τi,j | 6 |λj |. Repeatedly

applying the same process to every Mi with |Mi| > d we arrive at the desired expansion.

50

The next claim shows that we can further simplify the expression given in Claim 4.2.1.

Let Ld ⊆ F` denote the set of nonzero linear forms L with |L| 6 d and with the first

(left-most) nonzero coefficient equal to 1, e.g. (0, 1, 0, 2) ∈ L3 but (2, 1, 0, 0) 6∈ L3.

Claim 4.2.2. For any linear form L ∈ F` and integer d > 0, there is a collection of

coefficients aM,c ∈ ZM∈Ld,c∈F∗ such that for every degree-d polynomial P : Fn → T,

P (L(X)) =∑

M∈Ld,c∈F∗aM,cP (cM(X)). (4.4)

Proof. Similar to the proof of Claim 4.2.1 we simplify P (L(X)) using identities that are valid

for every polynomial P : Fn → T of degree d.

We use induction on the number of nonzero entries of L. The case when L has only one

nonzero entry is trivial. For the induction step, choose c ∈ F so that the leading nonzero

coefficient of L′ = c · L is equal to 1. Assume that L′ = (λ1, . . . , λ`). If |L′| 6 d we are

done. Assume otherwise that |L′| > d. Applying Claim 4.2.1 for the degree-d polynomial

R(x) := P (c−1x) and the linear form L′ we can write

P (L(X)) = P (c−1L′(X)) =∑i

βiP (c−1Mi(X)), (4.5)

where for every i, Mi = (λi,1, . . . , λi,`) satisfies |Mi| 6 d, and for every j ∈ [`], λi,j 6 λj .

Let I denote the set of indices i such that the leading nonzero entry of Mi is one. Then

P (L(x)) =∑i∈I

αiP (c−1Mi(X)) +∑j /∈I

αjP (c−1Mj(X)). (4.6)

Notice that since the leading coefficient of L′ is 1, for every j /∈ I, Mj has smaller support

than L and thus applying the induction hypothesis to the linear forms c−1Mj with j /∈ I

concludes the claim.

51

Claim 4.2.2 applies to all polynomials of degree d. If we also specify the depth then we

can obtain a stronger statement.

Claim 4.2.3. For every d, k, every system of linear forms L1, . . . , Lm, and constants

λi ∈ Zi∈[m], there exists aM ∈ ZM∈Ld such that the following is true for every (d, k)-

homogeneous polynomial P : Fn → T:

•∑mi=1 λiP (Li(X)) ≡

∑M∈Ld aMP (M(X));

• For every M with aM 6= 0, we have |M | 6 deg(aMP ).

Proof. The proof is similar to that of Claim 4.2.2, except that now we repeatedly apply

Claim 4.2.2 to every term of the form λP (L(X)) to express it as a linear combination of

P (cM(X)) for M ∈ Ldeg(λP ) and c ∈ F∗. Then we use homogeneity to replace P (cM(X))

with σcP (M(X)), where if c = ζi for the fixed generator ζ ∈ F∗ then σc = σ(d, k)i. By

repeating this procedure we arrive at the desired expansion.

We are now ready for the proof of our main theorem. For a linear form L = (λ1, . . . , λ`),

let lc(L) denote the index of its first nonzero entry, namely lc(L)def= mini:λi 6=0 i.

Proof of Theorem 2.2.23: Let d′ be the degree of the factor. For every i ∈ [C], by

Claim 4.2.3 we have

m∑j=1

λi,jPi(Lj(X)) =∑

M∈Ld′λ′i,MPi(M(X)) (4.7)

for some integers λ′i,M such that |M | 6 deg(λ′i,MPi) if λ′i,M 6= 0. The simplifications of

Claim 4.2.3 depend only on the degrees and depths of the polynomials. Hence if λ′i,M = 0

for all M ∈ Ld′ , then (λi,1, . . . , λi,m) ∈ Φ(di, ki)⊥. So to prove the theorem, it suffices to

show that PΛ has small bias if λ′i,MPi 6≡ 0 for some M ∈ Ld′ . Suppose this is true, and thus

52

there exists a nonempty set M⊆ Ld′ such that

PΛ(X) =∑

i∈[C],M∈Mλ′i,MPi(M(X)),

and for every M ∈ M, there is at least one index i ∈ [C] for which λ′i,MPi 6= 0. Choose

i∗ ∈ [C] and M∗ ∈M in the following manner.

• First, let M∗ ∈ M be such that lc(M∗) = minM∈M lc(M), and among these, |M∗| is

maximal.

• Then, let i∗ ∈ [C] be such that deg(λ′i∗,M∗Pi∗) is maximized.

Without loss of generality assume that i∗ = 1, lc(M∗) = 1, and let d := deg(λ′1,M∗P1).

We claim that if∑j∈[m] λ1,jP1(Lj(X)) is not the zero polynomial, then deg(PΛ) > d, and

moreover PΛ has small bias. We prove this by deriving PΛ in specific directions in a manner

that all the terms but λ′1,M∗P1(M∗(X)) vanish.

Given a vector α ∈ F`, an element y ∈ Fn, and a function P : (Fn)` → T, define the

derivative of P according to the pair (α, y) as

Dα,yP (x1, . . . , x`)def= P (x1 + α1y, · · · , x` + α`y)− P (x1, . . . , x`). (4.8)

Note that for every M ∈M,

Dα,y(Pi M)(x1, · · · , x`) = Pi(M(x1, . . . , x`) +M(α)y)− Pi(M(x1, . . . , x`))

= (D〈M,α〉·yPi)(M(x1, . . . , x`)).

Thus if α is chosen such that 〈M,α〉 = 0 then Dα,y(Pi M) ≡ 0.

Assume that M∗ = (w1, . . . , w`), where w1 = 1. Let t := |M∗|,

α1 := e1 = (1, 0, 0, . . . , 0) ∈ F`, and let α2, . . . , αt be the set of all vectors of the form

53

(−w, 0, . . . , 0, 1, 0, . . . , 0) where 1 is in the i-th coordinate for i ∈ [2, `] and 0 6 w 6 wi − 1.

In addition, pick αt+1 = · · · = αd = e1.

Claim 4.2.4.

Dα1,y1 · · ·Dαd,ydPΛ(X) =

(D〈M∗,α1〉y1

· · ·D〈M∗,αd〉yd∑i∈[C]:

deg(λ′i,M∗Pi)=d

λ′i,1Pi

)(M∗(X)).

(4.9)

Proof. Deriving according to (α1, y1), . . . , (αt, yt) gives

Dα1,y1 · · ·Dαt,ytPΛ(X) =

D〈M∗,α1〉y1· · ·D〈M∗,αt〉yt

C∑i=1

λ′i,M∗Pi

(M∗(X)). (4.10)

This is because for every M = (w′1, . . . , w′`) ∈ M \ M

∗, either w′1 = 0 in which case

〈M,α1〉 = 0, or otherwise w1 = 1 and |M | 6 t, thus there must be an index ξ ∈ [`]

such that w′ξ < wξ; By our choice of α2, . . . , αt there is e, 2 6 e 6 t, such that αe =

(−w′ξ, 0, . . . , 0, 1, 0, . . . , 0) where 1 is in the ξ-th coordinate and thus 〈Mj , αe〉 = 0. Now, the

claim follows after additionally deriving according to (αt+1, yt+1), . . . , (αd, yd).

Claim 4.2.4 implies that

Ey1,...,yd,x1,...,x`

[e(Dα1,y1 · · ·Dαd,ydPΛ)(x1, . . . , x`)

)]=

∥∥∥∥∥∥∥∥∥∥e

∑i∈[C]:

deg(λ′i,1Pi)=d

λ′i,M∗Pi

∥∥∥∥∥∥∥∥∥∥

2d

Ud

6 ε2d ,

where the last inequality holds by the ε-uniformity of the polynomial factor. Now the theorem

follows from the next claim from [15] which is a repeated application of the Cauchy-Schwarz

inequality. We include a proof for self-containment.

54

Claim 4.2.5 ([15, Claim 3.4]). For any α1, . . . , αd ∈ F`\0,

Ey1,...,yd,x1,...,x`

[e((Dα1,y1 · · ·Dαd,ydPΛ)(x1, . . . , x`)

)]>

(∣∣∣∣ Ex1,...,x`

e (PΛ(x1, . . . , x`))

∣∣∣∣)2d

.

Proof. It suffices to show that for any function P (x1, . . . , x`) and nonzero α ∈ F`,

∣∣∣∣ Ey,x1,...,x`

[e((Dα,yP )(x1, . . . , x`)

)]

∣∣∣∣ > ∣∣∣∣ Ex1,...,x`

[e (P (x1, . . . , x`))]

∣∣∣∣2 .Recall that (Dα,yP )(x1, . . . , x`) = P (x1 + α1y, . . . , x` + α`y)− P (x1, . . . , x`). Without loss

of generality, suppose α1 6= 0. We make a change of coordinates so that α can be assumed

to be (1, 0, . . . , 0). More precisely, define P ′ : (Fn)` → T as

P ′(x1, . . . , x`) = P

(x1,

x2 + α2x1

α1,x3 + α3x1

α1, . . . ,

x` + α`x1

α1

),

so that P (x1, . . . , x`) = P ′(x1, α1x2 − α2x1, α1x3 − α3x1, . . . , α1x` − α`x1), and thus

(Dα,yP )(x1, . . . , x`)

= P ′(x1 + α1y, α1x2 − α2x1, . . . , α1x` − α`x1)− P ′(x1, α1x2 − α2x1, . . . , α1x` − α`x1).

Therefore

∣∣∣∣ Ey,x1,...,x`

[e((Dα,yP )(x1, . . . , x`)

)]

∣∣∣∣=

∣∣∣∣ Ey,x1,...,x`

[e(P ′(x1 + α1y, x2, . . . , x`)− P ′(x1, x2, . . . , x`)

)]

∣∣∣∣= Ex2,...,x`

∣∣∣∣Ex1[e(P ′(x1, x2, . . . , x`)

)]

∣∣∣∣2 > ∣∣∣∣ Ex1,x2,...,x`

[e(P ′(x1, x2, . . . , x`)

)]

∣∣∣∣2=

∣∣∣∣ Ex1,x2,...,x`

[e (P (x1, x2, . . . , x`))]

∣∣∣∣2 .55

The above proof also implies the following proposition just by omitting the application

of Claim 4.2.3.

Proposition 4.2.6. Let L1, . . . , Lm be linear forms on ` variables and let B = (P1, . . . , PC)

be an ε-uniform polynomial factor of degree d > 0 for some ε ∈ (0, 1] which is defined by

only homogeneous polynomials. For every tuple Λ of integers (λi,j)i∈[C],j∈[m], define

PΛ(X) =∑

i∈[C],j∈[m]

λi,jPi(Lj(X)),

where PΛ : (Fn)` → T. Moreover assume that for every i ∈ [C], j ∈ [m], Lj ∈ Ldeg(λi,jPi).

Then, PΛ is of degree d = maxi,j deg(λi,jPi) and ‖e (PΛ)‖Ud < ε.

56

CHAPTER 5

TRUE COMPLEXITY, ON A THEOREM OF GOWERS AND

WOLF

The work in this chapter is based on a previous joint work [50]. As discussed in Section 2.4.2,

Cauchy-Schwarz complexity is an upper-bound on the true complexity of a system of linear

forms. However, there are cases where this is not tight. Gowers and Wolf conjectured that

the true complexity of a system of linear forms can be characterized by a simple linear

algebraic condition. Namely, that it is equal to the smallest d > 1 such that Ld+11 , . . . , Ld+1

m

are linearly independent where the d-th tensor power of a linear form L is defined as

Ld =

d∏j=1

λij : i1, . . . , id ∈ [k]

∈ Fkd.

Later in [38, Theorem 6.1] they verified their conjecture in the case where |F| is sufficiently

large; more precisely when |F| is at least the Cauchy-Schwarz complexity of the system of

linear form. In this chapter we verify the Gowers-Wolf conjecture in full generality by proving

the following stronger theorem.

Theorem 5.0.7. Let L = L1, . . . , Lm be a system of linear forms. Assume that Ld+11 is

not in the linear span of Ld+12 , . . . , Ld+1

m . For every ε > 0, there exists δ > 0 such that for

any collection of functions f1, . . . , fm : Fn → D with ‖f1‖Ud+1 6 δ, we have

∣∣∣∣∣ EX∈(Fn)k

[m∏i=1

fi(Li(X))

]∣∣∣∣∣ 6 ε.

Theorem 5.0.7 was conjectured in [38], and left open even in the case of large |F|. In [51],

a partial near-orthogonality result is proved and used to prove Theorem 5.0.7 in the case

where |F| is greater or equal to the Cauchy-Schwarz complexity of the system of linear form.

Theorem 5.0.7 establishes this in its full generality using our near-orthogonality result for

57

regular non-classical polynomials. The following corollary to Theorem 5.0.7 is very useful

when combined with the decomposition theorems such as Theorem 2.3.1.

Corollary 5.0.8. Let L = L1, . . . , Lm be a system of linear forms. Assume that

Ld+11 , . . . , Ld+1

m are linearly independent. For every ε > 0, there exists δ > 0 such that for

any functions f1, . . . , fm, g1, . . . , gm : Fn → D with ‖fi − gi‖Ud+1 6 δ, we have

∣∣∣∣∣EX[m∏i=1

fi(Li(X))

]− EX

[m∏i=1

gi(Li(X))

]∣∣∣∣∣ 6 ε,

Proof. Choosing δ = δ(ε′) as in Theorem 5.0.7 for ε′ := ε/m, we have

∣∣∣∣∣EX[m∏i=1

fi(Li(X))

]− EX

[m∏i=1

gi(Li(X))

]∣∣∣∣∣=

∣∣∣∣∣∣m∑i=1

EX

(fi − gi)(Li(X)) ·i−1∏j=1

gj(Lj(X)) ·m∏

j=i+1

fj(Lj(X))

∣∣∣∣∣∣6

m∑i=1

∣∣∣∣∣∣EX(fi − gi)(Li(X)) ·

i−1∏j=1

gj(Lj(X)) ·m∏

j=i+1

fj(Lj(X))

∣∣∣∣∣∣ 6 m · δ 6 ε,

where the second inequality follows from Theorem 5.0.7 since ‖fi − gi‖Ud+1 6 δ and Ld+1i

is not in the linear span of Ld+1j j∈[m]\i.

5.1 The Gowers-Wolf Conjecture: Proof of Theorem 5.0.7

Theorem 5.0.7 (restated). Let L = L1, . . . , Lm be a system of linear forms. Assume

that Ld+11 is not in the linear span of Ld+1

2 , . . . , Ld+1m . For every ε > 0, there exists δ > 0

such that for any collection of functions f1, . . . , fm : Fn → D with ‖f1‖Ud+1 6 δ, we have

∣∣∣∣∣ EX∈(Fn)k

[m∏i=1

fi(Li(X))

]∣∣∣∣∣ 6 ε. (5.1)

58

We prove Theorem 5.0.7 in this section. Note that since Ld+11 is not in the linear span

of Ld+12 , . . . , Ld+1

m we have that L1 is linearly independent from each Lj for j > 1. We

claim that we may assume without loss of generality that L2, . . . , Lm are pairwise linearly

independent as well, namely that L1, . . . , Lm has bounded Cauchy-Schwarz complexity.

Assume that there are j, ` ∈ [m]\1 and a nonzero c ∈ F such that L` = cLj . Then we

may define a new function f ′j(x) = fj(x)f`(cx) so that f ′j(Lj(X)) = fj(Lj(X))f`(L`(X))

and remove the linear form L` and functions fj , f` from the system. Now

EX∈(Fn)k

[m∏i=1

fi(Li(X))

]= EX∈(Fn)k

f ′j(Lj(X)) ·

∏i∈[m]\j,`

fi(Li(X))

,and thus it suffices to bound the right hand side of the above identity. We may repeatedly

apply the above procedure in order to achieve a new system of pairwise linearly independent

linear forms along with their corresponding functions while keeping L1 and f1 untouched.

Thus we may assume that L1, . . . , Lm is of finite Cauchy-Schwarz complexity s for

some s 6 m < ∞. The case when d > s follows from Lemma 2.4.3, thus we will consider

the case when d < s. We will use Corollary 3.0.9 to write fi = gi + hi with

1. gi = E[fi|B], where B is an r-regular polynomial factor of degree at most s and

complexity C 6 Cmax(p, s, η, δ, r(·)) defined by only homogeneous polynomials, where

r is a sufficiently fast growing growth function (to be determined later);

2. ‖hi‖Us+1 6 η.

We first show that by choosing a sufficiently small η we may replace fi’s in Equation (5.1)

with gi’s.

59

Claim 5.1.1. Choosing η 6 ε2m we have

∣∣∣∣∣ EX∈(Fn)k

[m∏i=1

fi(Li(X))

]− EX∈(Fn)k

[m∏i=1

gi(Li(X))

]∣∣∣∣∣ 6 ε

2.

Proof. We have

∣∣∣∣∣EX[m∏i=1

fi(Li(X))

]− EX

[m∏i=1

gi(Li(X))

]∣∣∣∣∣=

∣∣∣∣∣∣m∑i=1

EX

hi(Li(X)) ·i−1∏j=1

gj(Lj(X)) ·m∏

j=i+1

fj(Lj(X))

∣∣∣∣∣∣6

m∑i=1

∣∣∣∣∣∣EXhi(Li(X)) ·

i−1∏j=1

gj(Lj(X)) ·m∏

j=i+1

fj(Lj(X))

∣∣∣∣∣∣6

m∑i=1

‖hi‖Us+1 6 m · η 6 ε

2,

where the second inequality follows from Lemma 2.4.3 since the Cauchy-Schwarz complexity

of L is s.

Thus it is sufficient to bound∣∣∣EX∈(Fn)k

[∏mi=1 gi(Li(X))

]∣∣∣ by ε/2. For each i, gi =

E[fi|B] and thus

gi(x) = Γi(P1(x), . . . , PC(x)),

where P1, . . . , PC are the (nonclassical) homogeneous polynomials of degree 6 s defining B

and Γi : TC → D is a function. Let ki denote the depth of the polynomial Pi so that by

Lemma 2.1.3, each Pi takes values in Uki+1 = 1pki+1Z/Z. Moreover let Σ := Zpk1+1 × · · · ×

ZpkC+1 . Using the Fourier expansion of Γi we have

gi(x) =∑

Λ=(λ1,...,λC)∈Σ

Γi(Λ) · e

C∑j=1

λjPj(x)

, (5.2)

60

where Γi(Λ) is the Fourier coefficient of Γi corresponding to Λ. Let PΛ :=∑Cj=1 λjPj(x) for

the sake of brevity so that we may write

EX

[m∏i=1

gi(Li(X))

]=

∑Λ1,...,Λm∈Σ

(m∏i=1

Γi(Λi)

)· EX

[e

(m∑i=1

PΛi(Li(X))

)]. (5.3)

We will show that for a sufficiently fast growing choice of the regularity function r(·) we may

bound each term in Equation (5.3) by σ := ε2|Σ|m , thus concluding the proof by the triangle

inequality. We will first show that the terms for which deg(PΛ1) 6 d can be made small.

Claim 5.1.2. Let Λ ∈ Σ be such that deg(PΛ) 6 d. For a sufficiently fast growing choice of

r(·) and choice of δ 6 σ2 , ∣∣∣Γ1(Λ)

∣∣∣ < σ.

Proof. It follows from Equation (5.2) that

Γ1(Λ) = Ex

[g1(x)e (−PΛ(x))]−∑

Λ′∈Σ\ΛΓ1(Λ′) · E

x[e (PΛ′(x)− PΛ(x))] .

Note that ‖f1‖Ud+1 6 δ and thus

∣∣∣Ex

[g1(x)e (−PΛ(x))]∣∣∣ =

∣∣∣Ex

[f1(x)e (−PΛ(x))]∣∣∣

6 ‖f1e (−PΛ)‖Ud+1 = ‖f1‖Ud+1 6 δ 6 σ/2,

where we used the fact that g1 = E[f1|B] and the fact that Gowers norms are increasing in

d. Finally the terms of the form Γ1(Λ′) · Ex[e (PΛ′(x)− PΛ(x))] with Γ′ 6= Γ can be made

arbitrarily small by choosing a sufficiently fast growing r(·) due to Remark 2.2.17, since

PΛ′ −PΛ = PΛ′−Λ is a nonzero linear combination of the polynomials defining the r-regular

factor B.

The above claim allows us to bound the terms from Equation (5.3) corresponding to

61

tuples (Λ1, . . . ,Λm) ∈ Σm with deg(PΛ1) 6 d. This is because for such terms |Γ1(Λ1)| < σ

by the above claim, and |Γi(Λi)| 6 1 since fi’s take values in D. It remains to bound the

terms for which deg(PΛ1) > d. We will need the following claim.

Lemma 5.1.3. Assume that Ld+11 is not in the linear span of Ld+1

2 , . . . , Ld+1m , and let

(Λ1, . . . ,Λm) ∈ Σm be such that deg(PΛ1) > d+ 1. Then

m∑i=1

PΛi(Li(X)) 6≡ 0.

This combined with Theorem 2.2.23 implies that for a sufficiently fast growing choice of

r(·), EX[e(∑m

i=1 PΛi(Li(X)))]< σ which completes the proof of Theorem 5.0.7. Thus, we

are left with proving Lemma 5.1.3.

Proof of Lemma 5.1.3: Assume to the contrary that∑mi=1 PΛi(Li(X)) ≡ 0. Denoting

the coordinates of Λi by (λi,1, . . . , λi,C) ∈ ΣC we have

m∑i=1

PΛi(Li(X)) =∑

i∈[m],j∈[C]

λi,jPj(Li(X)) ≡ 0.

Since the polynomial factor defined by P1, . . . , PC is r-regular with a sufficiently fast growing

growth function r(·), Theorem 2.2.23 implies that for every j ∈ [C] we must have that

m∑i=1

λi,jPj(Li(X)) ≡ 0. (5.4)

Since deg(PΛ1) > d, there must exist j ∈ [C] such that λ1,j 6= 0 and deg(λ1,jPj) > d.

Let j∗ ∈ [C] be such that deg(λ1,j∗Pj∗) is maximized, and let d∗ := deg(Pj∗) (note that

deg(λ1,j∗Pj∗) 6 d∗). We will first prove that replacing Pj∗ with a classical homogeneous

polynomial Q of the same degree, Equation (5.4) for j = j∗ would still hold.

Claim 5.1.4. Let Q : Fn → T be a classical homogeneous polynomial with deg(Q) = d∗.

62

Thenm∑i=1

λi,j∗Q(Li(X)) ≡ 0.

Proof. Assume to the contrary that∑mi=1 λi,j∗Q(Li(X)) 6≡ 0. By Claim 4.2.2 for degree d∗

and linear forms L1, . . . , Lm, we can find a set of coefficients ai,t ∈ Zi∈[`],t∈[m′], ci,t ∈

F\0i∈[`],t∈[m′] and linear forms Mtt∈[m′] with |Mt| 6 d∗ such that the first nonzero

entry of every Mt is equal to 1, such that

m∑i=1

λi,j∗Q(Li(X)) =∑

i∈[`],t∈[m′]

ai,tλi,j∗Q(ci,tMt(X)) =m′∑t=1

αtQ(Mt(X)) 6≡ 0, (5.5)

where αt =∑`i=1 ai,tλi,j∗|ci,t|d

∗. Here the second equality follows from Q being classical

and homogeneous. Furthermore,

m∑i=1

λi,j∗Pj∗(Li(X)) =∑

i∈[`],t∈[m′]

ai,tλi,j∗Pj∗(ci,tMt(X)) =m′∑t=1

βtPj∗(Mt(X)), (5.6)

where βt =∑`i=1 ai,tλi,j∗σi,t, σi,ti∈[`],t∈[m′] are integers whose existence follows from the

homogeneity of Pj∗ . Moreover, by Remark 3.0.6 we know that σi,t ≡ |ci,t|d∗

mod p, and

hence

αt ≡ βt mod p, ∀t ∈ [m′]. (5.7)

Now notice that Equation (5.5) implies that there exists some t ∈ [m′] for which αtQ 6= 0,

which, since Q is classical, is equivalent to αt 6≡ 0 mod p. Hence also βt 6≡ 0 mod p. Let

T = t ∈ [m′] : βt 6≡ 0 mod p, which we just verified is nonempty. Then for t ∈ T ,

deg(βtPj∗) = deg(Pj∗) = d∗; and for t ∈ [m] \ T , deg(βtPj∗) 6 d∗ − (p− 1) < d∗.

We can now decompose

m∑i=1

λi,j∗Pj∗(Li(X)) =∑t∈T

βtPj∗(Mt(X)) +∑

t∈[m]\TβtPj∗(Mt(X)). (5.8)

63

By Proposition 4.2.6, the first sum in Equation (5.8) is a nonzero polynomial of degree d∗,

and by our previous argument, the sum for t 6∈ T is a polynomial of degree less than d∗.

Hence, we get that∑mi=1 λi,j∗Pj∗(Li(X)) is a nonzero polynomial of degree d∗, which is a

contradiction to our assumption.

We have proved that for every choice of a degree-d∗ classical homogeneous polynomial Q,∑mi=1 λi,j∗Q(Li(X)) ≡ 0. Now choosing the polynomial Q(x(1), . . . , x(n)) = x(1) · · ·x(d∗)

and looking at the coefficients of the monomials of degree d∗, we have

m∑i=1

λi,j∗Ld∗i ≡ 0.

Recalling that λ1,j∗ 6= 0 and that d∗ > d, this means that Ld+11 can be written as a linear

combination of Ld+12 , . . . , Ld+1

m , a contradiction.

64

CHAPTER 6

TESTABLE AFFINE-INVARIANT PROPERTIES

The results in this chapter are based on previous work [15] 1. The field of property testing,

as initiated by [20, 8] and defined formally by [66, 29], is the study of algorithms that query

their input a very small number of times and with high probability decide correctly whether

their input satisfies a given property or is “far” from satisfying that property. A property is

called testable, or sometimes strongly testable or locally testable, if the number of queries can

be made independent of the size of the object without affecting the correctness probability.

Perhaps surprisingly, it has been found that a large number of natural properties satisfy this

strong requirement; see e.g. the surveys [24, 65, 64, 70] for a general overview.

Let [R] denote the set 1, . . . , R. Given a property P of functions in Fn → [R] | n ∈

Z>0, we say that f : Fn → [R] is ε-far from P if

ming∈P

Prx∈Fn

[f(x) 6= g(x)] > ε,

and we say that it is ε-close otherwise.

Definition 6.0.5 (Testability). A property P is said to be testable (with one-sided error)

if there are functions q : (0, 1) → Z>0, δ : (0, 1) → (0, 1), and an algorithm T that, given

as input a parameter ε > 0 and oracle access to a function f : Fn → [R], makes at most

q(ε) queries to the oracle for f , always accepts if f ∈ P and rejects with probability at least

δ(ε) if f is ε-far from P. If, furthermore, q is a constant function, then P is said to be

proximity-obliviously testable (PO testable).

The term proximity-oblivious testing is coined by Goldreich and Ron in [32]. As an

example of a testable (in fact, PO testable) property, let us recall the famous result by

1. This work is based on an earlier work: Every locally characterized affine-invariant property is testable.In Proceedings of the forty-fifth annual ACM symposium on Theory of computing (STOC 13), pp. 429–436.c© ACM, 2013. DOI=10.1145/2488608.2488662

65

Blum, Luby and Rubinfeld [20] which initiated this line of research. They showed that

linearity of a function f : Fn → F is testable by a test which makes 3 queries. This test

accepts if f is linear and rejects with probability Ω(ε) if f is ε-far from linear.

6.1 Background

6.1.1 Algebraic Property Testing, Affine Invariance

We say that a property P ⊆ Fn → [R] is linear-invariant if it is the case that for any

f ∈ P and for any linear transformation L : Fn → Fn, it holds that f L ∈ P . Similarly,

an affine-invariant property is closed under composition with affine transformations A :

Fn → Fn (an affine transformation A is of the form L + c where L is linear and c is a

constant). The property of a function f : Fn → F being affine is testable by a simple

reduction to [20], and is itself affine-invariant. Other well-studied examples of affine-invariant

(and hence, linear-invariant) properties include Reed-Muller codes (in other words, bounded

degree polynomials) [8, 9, 23, 66, 4] and Fourier sparsity [34]. In fact, affine invariance seems

to be a common feature of most interesting properties that one would classify as “algebraic”.

Kaufman and Sudan in [57] made explicit note of this phenomenon and initiated a general

study of the testability of affine-invariant properties (see also [30]). In particular, they asked

for necessary and sufficient conditions for the testability of affine-invariant properties.

6.1.2 Localy Characterized Properties

“Local characterization” is a necessary condition for PO testability. For a PO testable

property P , if a function f does not satisfy P , then by Definition 6.0.5, the tester rejects f

with positive probability. Since the test always accepts functions with the property, there

must be q points x1, . . . , xq ∈ Fn that form a witness for non-membership in P . These are

the queries that cause the tester to reject. Thus, denoting σ = (f(x1), . . . , f(xq)) ∈ [R]q, we

66

say that C = (x1, x2, . . . , xq;σ) forms a q-local constraint for P . This means that whenever

the constraint is violated by a function g, i.e., (g(x1), . . . , g(xq)) = σ, we know that g is not

in P . A property P is q-locally characterized if there exists a collection of q-local constraints

C1, . . . , Cm such that g ∈ P if and only if none of the constraints C1, . . . , Cm are violated.

It follows from the above discussion that if P is PO testable with q queries, then P is q-

locally characterized. We say P is locally characterized if it is q-locally characterized for

some constant q.

We now give some examples of locally characterized affine-invariant properties. Consider

the property of being affine. It is 4-locally characterized because a function f is affine if and

only if f(x) − f(x + y) − f(x + z) + f(x + y + z) = 0 for every x, y, z ∈ Fn. Note that

this characterization automatically suggests a 4-query test: pick random x, y, z ∈ Fn and

check whether the identity holds or not for that choice of x, y, z. More generally, consider

the property of being a polynomial of degree at most d, for some fixed integer d > 0. The

property is known to be PO testable due to independent work of [56, 53], and their test

is based upon a pdd+1p−1e-local characterization. Again, the test is simply to pick a random

constraint and check if it is violated.

Indeed, for any q-locally characterized property P defined by constraints C1, . . . , Cm, one

can design the following q-query test: choose a constraint Ci uniformly at random and reject

only if the input function violates Ci. Clearly, if the input function f is in P , the test always

accepts. The question is the probability with which a function ε-far from P is rejected. We

show that for affine-invariant properties, this test always rejects with probability bounded

away from zero for every constant ε > 0.

Theorem 6.1.1. Every q-locally characterized affine-invariant property is proximity oblivi-

ously testable with q queries.

67

6.2 Locality for Affine-Invariant Properties

In the context of affine-invariant properties, we can define the notion of local characterization

in a more algebraic way than we did in the introduction. Recall that a hyperplane is an affine

subspace of codimension 1.

Definition 6.2.1 (Locally characterized properties). An affine-invariant property P ⊂

Fn → [R] : n > 0 is said to be locally characterized if both of the following hold:

• For every function f : Fn → [R] in P and every hyperplane A 6 Fn, f |A ∈ P.

• There exists a constant K > 1 such that if f : Fn → [R] does not belong to P and

n > K, then there exists a hyperplane B 6 Fn such that f |B 6∈ P.

The constant K is said to be the locality of P.

The following observation shows that an affine-invariant property is locally characterized

if and only if it can be described using a bounded number of induced affine constraints from

the previous section, and hence, is locally characterized in the sense of the introduction.

Lemma 6.2.2. If P ⊆ Fn → [R] : n > 0 is a locally characterized affine-invariant

property with locality K, then P is equivalent to A-freeness, where A is a finite collection of

induced affine constraints, with each constraint of size pK on K + 1 variables. On the other

hand, if P is equivalent to A-freeness, where A is a collection of induced affine constraints

with each constraint on 6 K + 1 variables, then P has locality at most K.

Finally, we also make formal note of the observation in the introduction that if a property

is testable, then it must be locally characterized.

Remark 6.2.3. If K is a fixed integer and P ⊂ Fn → [R] is an affine-invariant property

that is testable with K queries, then P is a locally characterized property with locality K.

So, we can view our main result as a converse statement.

68

6.3 Subspace Hereditary Properties

Just as a necessary condition for PO testability is local characterization, one can formulate

a natural condition that is (almost) necessary for testability in general. In the context of

affine-invariant properties, the condition can be succinctly stated as follows:

Definition 6.3.1 (Subspace hereditary properties). An affine-invariant property P is said

to be (affine) subspace hereditary if for any f : Fn → [R] satisfying P, the restriction of f

to any affine subspace of Fn also satisfies P.

In [17], it is shown that every affine-invariant property testable by a “natural” tester is

very “close” to a subspace hereditary property2. Thus, if we gloss over some technicalities,

subspace hereditariness is a necessary condition for testability. In the opposite direction,

[17] conjectures the following:

Conjecture 6.3.2 ([17]). Every subspace hereditary property is testable.

Resolving Conjecture 6.3.2 would yield a combinatorial characterization of the (natu-

ral) one-sided testable affine-invariant properties, similar to the characterization for testable

dense graph properties [6]. Although we are not yet able to confirm or refute the full Con-

jecture 6.3.2, we can show testability if we make an additional assumption of “bounded

complexity”, defined formally in Section 6.4.2.

Theorem 6.3.3 (Informal). Every subspace hereditary property of “bounded complexity” is

testable.

We will formally define the notion of complexity later on in Section 6.4.2, but for now, it

suffices to know that it is an integer that we will associate with each property (independent of

2. We omit the technical definitions of “natural” and “close”, since they are unimportant here. Informally,the behavior of a “natural” tester is independent of the size of the domain and “close” means that the propertydeviates from an actual affine subspace hereditary property on functions over a finite domain. See [17] fordetails, or [6] for the analogous definitions in a graph-theoretic context.

69

n). Also, q-locally characterized properties are of complexity at most q. All natural affine-

invariant properties that we know of have bounded complexity (in fact, most are locally

characterized). So, the subspace hereditary properties not covered by Theorem 6.3.3 seem

to be mainly of theoretical interest.

6.4 Main Result

In this section, we describe our main result, Theorem 6.3.3, rigorously. Theorem 6.1.1 follows

as a corollary. We first need to set up some notions. Just as a locally characterized property

can be described by a list of constraints, subspace hereditary properties can also be described

similarly, but here, the size of the list can be infinite. For affine-invariant properties, we can

represent the constraints in a very special form, as “induced affine constraints”. We first

describe these, then define the notion of complexity, and finally state the theorem.

6.4.1 Affine constraints

A linear form L = (w1, w2, . . . , wk) is said to be affine if w1 = 1. In this section, linear forms

will always be assumed to be affine.

We specify a partial order among affine forms. We say (w1, . . . , wk) (w′1, . . . , w′k)

if |wi| 6 |w′i| for all i ∈ [k], where | · | is the obvious map from F to 0, 1, . . . , p− 1. An

affine constraint is a collection of affine forms, with the added technical restriction of being

downward-closed with respect to . For future references we state this as the following

definition.

Definition 6.4.1 (Affine constraints). An affine constraint of size m on k variables is a

tuple A = (L1, . . . , Lm) of m affine forms L1, . . . , Lm over F on k variables, where:

(i) L1(x1, . . . , xk) = x1;

(ii) If L belongs to A and L′ L, then L′ also belongs to A.

70

Any subspace hereditary property can be described using affine constraints and forbidden

patterns, in the following way.

Definition 6.4.2 (Properties defined by induced affine constraints).

• An induced affine constraint of size m on ` variables is a pair (A, σ) where A is an

affine constraint of size m on ` variables and σ ∈ [R]m.

• Given such an induced affine constraint (A, σ), a function f : Fn → [R] is said to be

(A, σ)-free if there exist no x1, . . . , x` ∈ Fn such that

(f(L1(x1, . . . , x`)), . . . , f(Lm(x1, . . . , x`))) = σ.

On the other hand, if such x1, . . . , x` exist, we say that f induces (A, σ) at x1, . . . , x`.

• Given a (possibly infinite) collection A = (A1, σ1), (A2, σ2), . . . , (Ai, σi), . . . of in-

duced affine constraints, a function f : Fn → [R] is said to be A-free if it is (Ai, σi)-free

for every i > 1.

As an example consider the property of having degree at most 1 as a polynomial, for

function F : Fn → F. It is easy to see that F satisfies this property if and only if F (x1) −

F (x1 + x2) − F (x1 + x3) + F (x1 + x2 + x3) = 0 for all x1, x2, x3 ∈ F. Consequently the

property can be defined by the set of induced affine constraints that forbid any values for

F (x1), F (x1 + x2), F (x1 + x3), F (x1 + x2 + x3) that do not satisfy the identity F (x1) −

F (x1 + x2)− F (x1 + x3) + F (x1 + x2 + x3) = 0.

The connection between affine subspace hereditariness and affine constraints is given by

the following simple observation.

Observation 6.4.3. An affine-invariant property P is subspace hereditary if and only if

it is equivalent to the property of A-freeness for some fixed collection A of induced affine

constraints.

71

Proof. Given an affine invariant property P , a simple (though inefficient) way to obtain the

set A is to let it be the following: For every n and a function f : Fn → [R] that is not in P ,

we include in A the constraint (Af , σf ), where Af is indexed by members of Fn and contains

Lz(X1, . . . , Xn+1) = X1 +∑ni=1 ziXi+1 : z = (z1, . . . , zn) ∈ Fn, and σf is just set to f .

Setting X1 = 0 and Xi+1 to the ith standard vector ei for every i ∈ [n] shows that f is

not (Af , σf )-free. Hence the property defined by A is contained in P . The containment in

the other direction follows from P being affine-invariant and hereditary.

The other direction of the observation is trivial.

6.4.2 Complexity of Induced Affine Constraints

Recall from Definition 2.4.2, that if L = L1, . . . , Lm ∈ Fk contains two linear forms that

are multiples of each other (that is Li = λLj for i 6= j and λ ∈ F), then the Cauchy-Schwarz

complexity of L is infinity. Otherwise, its Cauchy-Schwarz complexity is at most |L| − 2.

Note, also that sets of affine linear forms are always of finite complexity.

Definition 6.4.4. Given a collection A =

(A1, σ1), (A2, σ2), . . . , (Ai, σi)

of induced affine

constraints, we say that A is of complexity 6 d if for each i, the collection of affine forms

Ai is of Cauchy-Schwarz complexity 6 d according to Definition 2.4.2.

6.4.3 Statement of the main result

Theorem 6.4.5 (Main theorem). For any integer d > 0 and (possibly infinite) fixed col-

lection A = (A1, σ1), (A2, σ2), . . . , (Ai, σi), . . . of induced affine constraints, each of

complexity 6 d, there are functions qA : (0, 1) → Z+, δA : (0, 1) → (0, 1) and a tester T

which, for every ε > 0, makes qA(ε) queries, accepts A-free functions and rejects functions

ε-far from A-free with probability at least δA(ε). Moreover, qA is a constant if A is of finite

size.

72

We do not have any explicit bounds on δA because the analysis depends on previous

work based on ergodic theory. It would of course be interesting to have explicit bounds for

some of the properties described in Section 6.1.2.

Let us finally note that Theorem 6.4.5 is quite nontrivial even if A consists only of a

single induced affine constraint of complexity greater than 1. Indeed, previously it was not

known how to show testability in this case. A more detailed account of previous work is

given in Section 6.4.4.

6.4.4 Comparison with Previous Work

Our testing result is part of, and culminates a sequence of works investigating the relationship

between affine-invariance and testability. As described, Kaufman and Sudan [57] initiated the

program. Subsequently, Bhattacharyya, Chen, Sudan, and Xie [14] investigated monotone

linear-invariant properties of functions f : Fn2 → 0, 1, where a property P is monotone if

it satisfies the condition that for any function g ∈ P , modifying g by changing some outputs

from 1 to 0 does not make it violate P . Kral, Serra and Vena [60] and, independently, Shapira

[69] showed testability for any monotone linear-invariant property characterized by a finite

number of linear constraints (of arbitrary complexity). For general non-monotone properties,

Bhattacharyya, Grigorescu, and Shapira proved in [17] that affine-invariant properties of

functions in Fn2 → 0, 1 are testable if the complexity of the property is 1. Earlier

this year, Bhattacharrya, Fischer and Lovett in [16] generalized [17] to show that affine-

invariant properties of complexity < p are testable. In this paper, we only have to restrict

the complexity to be bounded, but the bound can be independent of p.

In terms of techniques, the general framework of the proof for testability here is very

much the same as in [17] or [16]. However, the main difference here is that we work with

collections of non-classical polynomials, rather than classical ones. Because the degrees of

non-classical polynomials can change when multiplied by constants, the notions of rank and

73

regularity are much more subtle. We need to establish a new version of a “polynomial

regularity lemma” which allows us to decompose a given polynomial collection into a high

rank collection of non-classical polynomials. Also, as discussed earlier, we establish a new

equidistribution theorem for non-classical polynomials. We expect that these results will be

of independent interest.

6.5 Locality

In the context of affine-invariant properties, we can define the notion of local characterization

in a more algebraic way than we did in the introduction. Recall that a hyperplane is an affine

subspace of codimension 1.

Definition 6.5.1 (Locally characterized properties). An affine-invariant property P ⊂

Fn → [R] : n > 0 is said to be locally characterized if both of the following hold:

• For every function f : Fn → [R] in P and every hyperplane A 6 Fn, f |A ∈ P.

• There exists a constant K > 1 such that if f : Fn → [R] does not belong to P and

n > K, then there exists a hyperplane B 6 Fn such that f |B 6∈ P.

The constant K is said to be the locality of P.

The following observation shows that an affine-invariant property is locally characterized

if and only if it can be described using a bounded number of induced affine constraints from

the previous section, and hence, is locally characterized in the sense of the introduction.

Lemma 6.5.2. If P ⊆ Fn → [R] : n > 0 is a locally characterized affine-invariant

property with locality K, then P is equivalent to A-freeness, where A is a finite collection of

induced affine constraints, with each constraint of size pK on K + 1 variables. On the other

hand, if P is equivalent to A-freeness, where A is a collection of induced affine constraints

with each constraint on 6 K + 1 variables, then P has locality at most K.

74

Finally, we also make formal note of the observation in the introduction that if a property

is testable, then it must be locally characterized.

Remark 6.5.3. If K is a fixed integer and P ⊂ Fn → [R] is an affine-invariant property

that is testable with K queries, then P is a locally characterized property with locality K.

So, we can view our main result as a converse statement.

6.6 Strong Near-Orthogonality

We will use the strong near-orthogonality theorem of [15] for affine systems of linear forms,

Theorem 2.2.21. Recall that this is a weaker version of Theorem 2.2.23, however we use

the former for its slightly simpler statement, and the fact that it does not require using

homogeneous non-classical polynomials.

Remark 6.6.1. The proof of Theorem 2.2.21 also shows the following. Suppose, in the

setting of Theorem 2.2.21, that for every Pi ∈ B and Lj ∈ A, either |Lj | 6 deg(λi,jPi) or

λi,j = 0. Then, unless every λi,j = 0 (mod pki+1), we have that PA,B,Λ is non-constant and

|E[e(PA,B,Λ(x1, . . . , x`)

)]| < ε. The only modification needed to the above proof is that the

transformation from Λ to Λ′ can be omitted.

To show equidistribution of (Pi(Lj(x1, . . . , x`)), we can use Theorem 2.2.21 in the same

manner we used Theorem 2.2.10 to show the equidistribution of (Pi(x)) in Lemma 2.2.18.

Before we do so, however, let us give a name to those Λ for which the first case of Theo-

rem 2.2.21 holds.

Definition 6.6.2. Given an affine constraint A = (L1, . . . , Lm) on ` variables and inte-

gers d, k > 0 such that d > k(p − 1), the (d, k)-dependency set of A is the set of tuples

(λ1, . . . , λm) ∈ [0, pk+1 − 1] such that∑mi=1 λiP (Li(x1, . . . , x`)) ≡ 0 for every polynomial

P : Fn → T of degree d and depth k.

75

Theorem 2.2.21 says that if B is a regular factor, PA,B,Λ ≡ 0 exactly when the first

condition holds. In other words:

Corollary 6.6.3. Fix an integer C > 0, tuples (d1, . . . , dC) ∈ ZC>0 and (k1, . . . , kC) ∈ ZC>0,

and an affine constraint (L1, . . . , Lm) on ` variables. For i ∈ [C], let Λi be the (di, ki)-

dependency set of A.

Then, for any polynomial factor B = (P1, . . . , PC), where each Pi has degree di and depth

ki, and B has rank > r2.2.13

(maxi di,

12

), it is the case that a tuple (λi,j)i∈[C],j∈[m] satisfies

C∑i=1

m∑j=1

λi,jPi(Lj(x1, . . . , x`)) ≡ 0

if and only if for every i ∈ [C], (λi,1 (mod pki+1), . . . , λi,m (mod pki+1)) ∈ Λi.

Proof. The “if” direction is obvious. For the “only if” direction, we use Theorem 2.2.21 to

conclude that if∑i,j λi,jPi(Lj(·)) ≡ 0, it must be that for every i ∈ [C],

∑j λi,jQi(Lj(·)) ≡

0 for any polynomial Qi with degree di and depth ki. This is equivalent to saying (λi,1

(mod pki+1), . . . , λi,m (mod pki+1)) ∈ Λi.

Similar to Theorem 4.1.1 we define consistency as following.

Definition 6.6.4 (Consistency). Let A be an affine constraint of size m. A sequence of

elements b1, . . . , bm ∈ T are said to be (d, k)-consistent with A if b1, . . . , bm ∈ Uk+1 and for

every tuple (λ1, . . . , λm) in the (d, k)-dependency set of A, it holds that∑mi=1 λibi = 0.

Given vectors d = (d1, . . . , dC) ∈ ZC>0 and k = (k1, . . . , kC) ∈ ZC>0, a sequence of vectors

b1, . . . , bm ∈ TC are said to be (d,k)-consistent with A if for every i ∈ [C], the elements

b1,i, . . . , bm,i are (di, ki)-consistent with A.

If B is a polynomial factor, the term B-consistent with A is a synonym for (d,k)-

consistent with A where d = (d1, . . . , dC) and k = (k1, . . . , kC) are respectively the degree

and depth sequences of polynomials defining B.

76

Now, the following is proved identically to Theorem 4.1.1.

Theorem 6.6.5. Given ε > 0, let B be a polynomial factor of degree d > 0, complexity C,

and rank r2.2.10(d, ε), that is defined by a tuple of polynomials P1, . . . , PC : Fn → T having

respective degrees d1, . . . , dC and respective depths k1, . . . , kC . Let A = (L1, . . . , Lm) be an

affine constraint on ` variables.

Suppose b1, . . . , bm ∈ TC are atoms of B that are B-consistent with A. Then

Prx1,...,x`

[B(Lj(x1, . . . , x`)) = bj ∀j ∈ [m]] =

∏Ci=1 |Λi|‖B‖m

± ε

where Λi is the (di, ki)-dependency set of A.

6.7 Degree-structural Properties

The conditions required in Theorem 6.1.1 and Theorem 6.3.3 are very general, and so, we

expect that they are satisfied by many interesting algebraic properties. This, in fact, turns

out to be the case. We show that a class of properties that we call degree-structural are all

locally characterized and are, hence, testable by Theorem 6.1.1. We give the definition below

in Definition 6.7.1. First let us list some examples of degree-structural properties. Let d be a

fixed positive integer. Each of the following definitions defines a degree-structural property.

• Degree 6 d: The degree of the function F : Fn → F as a polynomial is at most d;

• Splitting: A function F : Fn → F splits if it can be written as a product of at most d

linear functions;

• Factorization: A function F : Fn → F factors if F = GH for polynomials G,H :

Fn → F such that deg(G) 6 d− 1 and deg(H) 6 d− 1;

• Sum of two products: A function F : Fn → F is a sum of two products if there

77

are polynomials G1, G2, G3, G4 such that F = G1G2 +G3G4 and deg(Gi) 6 d− 1 for

i ∈ 1, 2, 3, 4;

• Having square root: A function F : Fn → F has a square root if F = G2 for a

polynomial G with deg(G) 6 d/2;

• Low d-rank: for a fixed integer r > 0, a function F : Fn → F has d-rank at most

r if there exist polynomials G1, . . . , Gr : Fn → F of degree 6 d − 1 and a function

Γ : Fr → F such that F = Γ(G1, . . . , Gr).

In fact, roughly speaking, any property that can be described as the property of decom-

posing into a known structure of low-degree polynomials is degree-structural.

Definition 6.7.1 (Degree-structural property). Given an integer c > 0, a vector of non-

negative integers d = (d1, . . . , dc) ∈ Zc>0, and a function Γ : Fc → F, define the (c,d,Γ)-

structured property to be the collection of functions F : Fn → F for which there exist

polynomials P1, . . . , Pc : Fn → F satisfying F (x) = Γ(P1(x), . . . , Pc(x)) for all x ∈ Fn and

deg(Pi) 6 di for all i ∈ [c].

We say a property P is degree-structural if there exist integers σ,∆ > 0 and a set of

tuples S ⊂ (c,d,Γ) | c ∈ [σ],d ∈ [0,∆]c,Γ : Fc → F, such that a function F : Fn → F

satisfies P if and only if F is (c,d,Γ)-structured for some (c,d,Γ) ∈ S. We call σ the scope

and ∆ the max-degree of the degree-structural property P.

It is straightforward to see that the examples above satisfy this definition.

6.7.1 Every Degree Structural Property is Testable

In this section, we prove our main result for degree-structural properties stating that if P

is degree-structural (recall Definition 6.7.1), then P is locally characterized. The proof uses

many of the tools established in Section 2.2.3.

78

Theorem 6.7.2. Every degree-structural property with bounded scope and max-degree is a

locally characterized affine-invariant property.

Combining Theorem 6.7.2 with Theorem 6.1.1 implies PO testability for all degree-

structural properties.

Proof. Let P be a degree-structural property with scope σ and max-degree ∆. Denote by

S the set of tuples (c,d,Γ) such that c 6 σ and P is the union over all (c,d,Γ) ∈ S of

(c,d,Γ)-structured functions. It is clear that P is affine-invariant, as having degree bounded

by a constant is an affine-invariant property. It is also immediate that P is closed under

taking restrictions to subspaces, since if F is (c,d,Γ)-structured, then F restricted to any

hyperplane is also (c,d,Γ)-structured. The non-trivial part of the theorem is to show that

the locality is bounded. In other words we need to show that there is a constant K such

that for n > K, if F : Fn → T is a function with F |A ∈ P for every hyperplane A 6 Fn,

then F ∈ P .

First, let us bound the degree of F . We know that F |A ∈ P for every hyperplane A.

Therefore, deg(F |A) 6 pσ∆ for every A, as F |A is a function of at most σ polynomials

each of degree at most ∆ over a field of characteristic p. It follows that F itself is of degree

6 pσ∆.

Let r : Z>0 → Z>0 be a function to be fixed later. Define r2 : Z>0 → Z>0 so that

r2(m) > r(C(r,pσ∆)2.2.29 (m+ σ)) + C

(r,pσ∆)2.2.29 (m+ σ) + p.

We apply Lemma 2.2.29 to F to find an r2-regular polynomial factor B of degree

6 pσ∆, defined by polynomials R1, . . . , RC : Fn → T, where C 6 Cr2,d2.2.29(1). Since F

is measurable with respect to B, there exists a function Σ : TC → F, such that F (x) =

Σ(R1(x), . . . , RC(x)).

From each Ri pick a monomial with degree equal to deg(Ri) and a monomial (possibly

the same one) with depth equal to depth(Ri). By taking K to be sufficiently large, we

can gaurantee the existence of an i0 ∈ [n] such that xi0 is not involved in any of these

79

monomials. Consequently deg(R′i) = deg(Ri) and depth(R′i) = depth(Ri) for all i ∈ [C],

where R′1, . . . , R′C are the restrictions of R1, . . . , RC , respectively, to the hyperplane xi0 =

0. Also by Lemma 2.2.2, R′1, . . . , R′C have rank > r2(C) − p. Since F |xi0=0 ∈ P , by

definition of P , there must exist (c,d,Γ) ∈ S with c 6 σ such that

Σ(R′1, . . . , R′C) = Γ(P1, . . . , Pc),

where deg(Pi) 6 di for all i ∈ [c].

Now, apply Lemma 2.2.29 to find an r-regular refinement of the factor defined by the

tuple of polynomials (R′1, . . . , R′C , P1, . . . , Pc). Because of our choice of r2 and the last part of

Lemma 2.2.29, we obtain a syntactic refinement ofR′1, . . . , R

′C

. That is, we obtain a tuple

B′ of polynomials R′1, . . . , R′C , S1, . . . , SD : Fn → T such that it has degree 6 pσ∆, its rank

> r(C+D), and C+D 6 C(r)2.2.29(C+σ), and for each i ∈ [c], Pi = Γi(R

′1, . . . , R

′C , S1, . . . , SD)

for some function Γi : TC+D → T. So for all x ∈ Fn,

Σ(R′1(x), . . . , R′C(x))

= Γ(Γ1(R′1(x), . . . , R′C(x), S1(x), . . . , SD(x)), . . . ,Γc(R′1(x), . . . , R′C(x), S1(x), . . . , SD(x))).

Applying Lemma 2.2.18, we see that if the rank of B′ is > r2.2.18 (pσ∆, ε) where ε > 0

is sufficiently small (say ε = ‖B′‖/2), then (R′1(x), . . . , R′C(x), S1(x), . . . , SD(x)) acquires

every value in its range. Thus, we have the identity

Σ(a1, . . . , ac) = Γ(Γ1(a1, . . . , aC , b1, . . . , bD), . . . ,Γc(a1, . . . , aC , b1, . . . , bD)),

for every ai ∈ Udepth(R′i)+1 and bi ∈ Udepth(Si)+1. Thus, we can substitute Ri for R′i and 0

80

for Si in the above equation and still retain the identity

F (x) = Σ(R1(x), . . . , RC(x))

= Γ(Γ1(R1(x), . . . , RC(x), 0, . . . , 0), . . . ,Γc(R1(x), . . . , RC(x), 0, . . . , 0))

= Γ(Q1(x), . . . , Qc(x))

where Qi : Fn → T are defined as Qi(x) = Γi(R1(x), . . . , RC(x), 0, . . . , 0). Since for ev-

ery i, deg(Ri) = deg(R′i) and depth(Ri) = depth(R′i), we can apply Theorem 6.7.3 be-

low to conclude that deg(Qi) 6 deg(Pi) 6 di for every i ∈ [c], as long as the rank of

B′ is > r6.7.3(pσ∆). Finally, we show that Q1, . . . , Qc map to U1 = ι(F ) and, so, are

classical. Indeed, since P1, . . . , Pc are classical, Γ1, . . . ,Γc must map to ι(F ) on all of∏Ci=1 Udepth(R′i)+1 ×

∏Di=1 Udepth(Si)+1 ⊇

∏Ci=1 Udepth(Ri)+1 × 0

D. Hence, F ∈ P .

The following theorem, used in the proof above, shows that a function of a high rank

collection of polynomials has the degree one would expect. Thus, it displays yet another way

in which high-rank polynomials behave “generically”. The proof is via another application

of the near-orthogonality result in Theorem 2.2.21.

Theorem 6.7.3. For an integer d > 0, let P1, . . . , PC : Fn → T be polynomials of degree 6 d

and rank > r6.7.3(d, C), and let Γ : TC → T be an arbitrary function. Define the polynomial

F : Fn → T by F (x) = Γ(P1(x), . . . , PC(x)). Then, for every collection of polynomials

Q1, . . . QC : Fn → T with deg(Qi) 6 deg(Pi) and depth(Qi) 6 depth(Pi) for all i ∈ [C], if

G : Fn → T is the polynomial G(x) = Γ(Q1(x), . . . , QC(x)), it holds that deg(G) 6 deg(F ).

Proof. Let f(x) = e (F (x)) and γ(x1, . . . , xC) = e (Γ(x1, . . . , xC)). Let D = deg(F ). Then,

for every x, y1, . . . , yD+1 ∈ Fn,

∆yD+1 · · ·∆y1f(x) = 1.

81

We need to show that g(x) = e (G(x)) also satisfies ∆yD+1 · · ·∆y1g(x) = 1.

Let k1, . . . , kC be the depths of P1, . . . , PC , respectively. Then, each Pi takes values in

Uki+1. Let Σ denote the group Zpk1+1 × · · · ×ZpkC+1 . Considering the Fourier transform of

γ, we have

f(x) = γ(P1(x), . . . , PC(x)) =∑β∈Σ

γ(β)e

C∑i=1

βiPi(x)

.Next, we look at the derivative.

∆yD+1 · · ·∆y1f(x) = ∆yD+1 · · ·∆y1

∑β∈Σ

γ(β)e

C∑i=1

βiPi(x)

=

∑αJ∈Σ:J⊆[D+1]

∏J⊆[D+1]

γ(αJ )e

(−1)|J |+1C∑i=1

αJ,iPi

x+∑j∈J

yj

.Denoting δ(α) =

∏J⊆[D+1] γ(αJ ) for α = (αJ )J⊆[D+1] ∈ ΣP([D+1]), we have

∆yD+1 · · ·∆y1f(x) =∑

α∈ΣP([D+1])

δ(α)e

C∑i=1

∑J⊆[D+1]

(−1)|J |+1αJ,iPi

x+∑j∈J

yj

.(6.1)

For any i, if there is a J such that |J | > deg(αJ,iPi), we can use Eq. (2.1) to rewrite

αJ,iPi

(x+

∑j∈J yj

)as a linear combination (over Z) of

Pi

(x+

∑j∈J ′ yj

): |J ′| < |J |

.

We repeat this process until for every i and J , either αJ,i = 0 or |J | 6 deg(αJ,iPi). Denoting

by A the set of α ∈ ΣP([D+1]) that satisfy this condition, we have obtained a new set of

coefficients δ′(α) such that

∆yD+1 · · ·∆y1f(x) =∑α∈A

δ′(α)e

C∑i=1

∑J⊆[D+1]

αJ,iPi

x+∑j∈J

yj

.Now, the crucial observation is that if instead of P1, . . . , PC , we had Q1, . . . , QC , the same

82

decomposition applies.

∆yD+1 · · ·∆y1g(x) =∑α∈A

δ′(A)e

C∑i=1

∑J⊆[D+1]

αJ,iQi

x+∑j∈J

yj

. (6.2)

The reason is that Eq. (6.1) remains valid as is if f is replaced by g and the Pi’s are

replaced by Qi’s, and furthermore since deg(Pi) > deg(Qi) and depth(Pi) > depth(Qi), the

applications of Eq. (2.1) remain valid also. Therefore, Eq. (6.2) is also valid

But now, we argue that δ′(·) are uniquely determined. Let k = maxi ki 6 d/(p− 1).

Claim 6.7.4. If P1, . . . , PC are of rank > r2.2.13(d, 1|A|) + 1, the functions

e

C∑i=1

∑J⊆[D+1]

αJ,iPi

x+∑j∈J

yj

: α ∈ A

are linearly independent over C.

Proof. Note that all these functions have L2-norm equal to 1. Hence it suffices to show

that their pairwise inner products are all bounded in absolute value by 1/|A|. To prove this

consider α, β ∈ A, and note that by Theorem 2.2.21 and, in particular, Remark 6.6.1, unless

all the αJ,i − βJ,i are zero,

∣∣∣∣∣∣Ee C∑i=1

∑J⊆[D+1]

(αJ,i − βJ,i)Pi

x+∑j∈J

yj

∣∣∣∣∣∣ < 1

|A|.

Therefore, since ∆yD+1 · · ·∆y1f(x) = 1, we must have δ′(α) = 1 when α is the all-

zero tuple, and δ′(α) = 0 for every non-zero α. Plugging this into Eq. (6.2), we get

∆yD+1 · · ·∆y1g(x) = 1.

83

6.8 Big Picture Functions

Suppose we have a function f : Fn → [R], and we want to find out whether it induces a

particular affine constraint (A, σ), where A = (L1, . . . , Lm) is a sequence of affine forms on

` variables and σ ∈ [R]m. Now, suppose Fn is partitioned by a polynomial factor B de-

fined by polynomials P1, . . . , PC of degrees d1, . . . , dC and depths k1, . . . , kC . Then, observe

that if b1, . . . , bm ∈ TC denote the atoms of B containing L1(x1, . . . , x`), . . . , Lm(x1, . . . , x`)

respectively, it must be the case that b1, . . . , bm are B-consistent with A (as defined in Defi-

nition 6.6.4). Thus, to locate where f might induce (A, σ), we should restrict our search to

sequences of atoms consistent with A.

It will be convenient to “blur” the given function f so as to retain only atom-level

information about it. That is, for every atom c of B, we will define fB(c) ⊆ [R] to be the set

of all values that f takes within c.

Definition 6.8.1. Given a function f : Fn → [R] and a polynomial factor B, the big picture

function of f is the function fB : T|B| → P([R]), defined by fB(c) = f(x) : B(x) = c.

On the other hand, given any function g : TC → P([R]), and a vector of degrees d =

(d1, . . . , dC) and depths k = (k1, . . . , kC) (which we think of as corresponding to the degrees

and depths of some polynomial factor of complexity C), we will define what it means for

such a function to “induce” a copy of a given constraint.

Definition 6.8.2 (Partially induce). Suppose we are given vectors d = (d1, . . . , dC) ∈ ZC>0

and k = (k1, . . . , kC) ∈ ZC>0, a function g :∏i∈[C] Uki+1 → P([R]), and an induced affine

constraint (A, σ) of size m. We say that g partially (d,k)-induces (A, σ) if there exist a

sequence b1, . . . , bm ∈ TC that is (d,k)-consistent with A, and σj ∈ g(bj) for each j ∈ [m].

Definition 6.8.2 is justified by the following trivial observation.

84

Remark 6.8.3. If f : Fn → [R] induces a constraint (A, σ), then for a factor B defined by

polynomials of respective degrees (d1, . . . , d|B|) = d and respective depths (k1, . . . , k|B|) = k,

the big picture function fB partially (d,k)-induces (A, σ).

To handle a possibly infinite collection A of affine constraints, we will employ a com-

pactness argument, analogous to one used in [7] to bound the size of the constraint partially

induced by the big picture function. Let us make the following definition:

Definition 6.8.4 (The compactness function). Suppose we are given positive integers C and

d, and a possibly infinite collection of induced affine constraints A = (A1, σ1), (A2, σ2), . . . ,

where (Ai, σi) is of size mi. For fixed d = (d1, . . . , dC) ∈ [d]C and k = (k1, . . . , kC) ∈[0,⌊d−1p−1

⌋]C, denote by G(d,k) the set of functions g :

∏Ci=1 Uki+1 → P([R]) that partially

(d,k)-induce some (Ai, σi) ∈ A. The compactness function is defined as

ΨA(C, d) = maxd,k

maxg∈G(d,k)

min(Ai,σi) partially

(d,k)-induced by g

mi

where the outer max is over vectors d = (d1, . . . , dC) ∈ [d]C and k = (k1, . . . , kC) ∈[0,⌊d−1p−1

⌋]C. Whenever G(d,k) is empty, we set the corresponding maximum to 0.

Note that ΨA(C, d) is indeed finite, as the number of possible degree and depth sequences

are bounded by d2C , and the size of G(d1, . . . , dC) is bounded by 2RpdC

.

Remark 6.8.5. Note that if a function g : TC → P([R]) partially (d,k)-induces some

constraint from A where d ∈ [d]C , then g must belong to G(d,k), and consequently it will

necessarily partially induce some (Ai, σi) ∈ A whose size is at most ΨA(C, d).

6.9 Proof of Testability

We prove the main result, Theorem 6.4.5, in this section. In fact, we will show the following.

85

Theorem 6.9.1. Let d > 0 be an integer. Suppose we are given a possibly infinite collection

of affine constraints A = (A1, σ1), (A2, σ2), . . . where each (Ai, σi) is an affine constraint

of complexity 6 d, and of size mi on ì variables. Then, there are functions À : (0, 1)→ Z>0

and δA : (0, 1) → (0, 1) such that the following is true for any ε ∈ (0, 1). If a function

f : Fn → [R] is ε-far from being A-free, then f induces at least δA(ε)pnì many copies of

some (Ai, σi) with ì < À(ε) .

Moreover, if A is locally characterized, then À(ε) is a constant independent of ε.

Theorem 6.4.5 immediately follows. Consider the following test: choose uniformly at

random x1, . . . , xÀ(ε) ∈ Fn, let H denote the affine spacex1 +

∑À(ε)j=2 λjxj : λj ∈ F

,

and check whether f restricted to H is A-free or not, thus making 6 pÀ(ε) queries. By

Theorem 6.9.1, if f is ε-far from A-freeness, this test rejects with probability at least δA(ε).

Proof Theorem 6.9.1:

Preliminaries. Fix a function f : Fn → [R] that is ε-far from being A-free. For i ∈ [R],

define f (i) : Fn → 0, 1 so that f (i)(x) equals 1 when f(x) = i and equals 0 otherwise.

Additionally, set the following parameters, where ΨA is the compactness function from

Definition 6.8.4:

α(C) = p−2dCΨA(C,d), ρ(C) = r2.2.13(d, α(C)), ζ = ε8R ,

∆(C) = 116ζ

ΨA(C,d), η(C) = 18pdCΨA(C,d)

( ε24R

)ΨA(C,d).

Decomposing by regular factors. Next, apply Theorem 2.3.6 to the functions

f (1), f (2), . . . , f (R) in order to get polynomial factors B′ syn B of complexity at most

C2.3.6(∆, d, ρ, ζ, η), an element s ∈ T|B′|−|B|, and functions f(i)1 , f

(i)2 , f

(i)3 : Fn → R for each

86

i ∈ [R] with the desired properties. The sequence of polynomials generating B′ will be

denoted by P1, . . . , P|B′|. Since B′ is a syntactic refinement, we can assume B is generated

by the polynomials P1, . . . , P|B|. Let C = |B| and C ′ = |B′|. Note that ‖B‖ < p(kmax+1)C 6

pdC , where kmax 6 b(d− 1)/(p− 1)c is the maximum depth of a polynomial in B. Denote

the degree of Pi by di and the depth of Pi by ki.

Cleanup. Based on B′ and B, we construct a function F : Fn → [R] that is ε2 -close to

f and hence, still violates A-freeness. The “cleaner” structure of F will help us locate the

induced constraint violated by f .

The function F is the same as f except for the following: For every atom c of B, let tc =

arg maxj∈[R] Pr[f(x) = j | B′(x) = (c, s)] be the most popular value inside the corresponding

subatom (c, s).

• Poorly-represented atoms: If there exists i ∈ [R] such that |Pr[f(x) = i | B(x) =

c]−Pr[f(x) = i | B′(x) = (c, s)]| > ζ, then set F (z) = tc for every z in the atom c.

• Unpopular values: Otherwise, for any z in the atom c with 0 < Prx[f(x) =

f(z) | B′(x) = (c, s)] < ζ, set F (z) = tc.

A key property of the cleanup function F is that it supports a value inside an atom c of

B only if the original function f acquires the value on at least an ζ fraction of the subatom

(c, s). Furthermore as the following lemma shows it is ε/2-close to f , and therefore, it is not

A-free.

Lemma 6.9.2. The cleanup function F is ε/2-close to f , and therefore, it is not A-free.

Proof. The first step applies to at most ζ‖B‖ atoms, since B′ ζ-represents B with respect to

each f (1), . . . , f (R). By Lemma 2.2.18, each atom occupies at most 1‖B‖ + α(C) fraction of

the entire domain. So, the fraction of points whose values are set in the first step is at most

ζ‖B‖( 1‖B‖ + α(C)) < 2ζ.

87

In the second step, if Pr[f(x) = f(z) | B′(x) = (c, s)] < ζ, then Pr[f(x) = f(z) | B(x) =

c] < Pr[f(x) = f(z) | B′(x) = (c, s)] + ζ < 2ζ. Hence, the fraction of the points whose

values are set in the second step is at most 2ζR = ε/4.

Thus, the distance of F from f is bounded by 2ζ + ε/4 < ε/2.

Locating a violated constraint. We now want to use F to “find” the affine constraint

induced in f . Setting d = (d1, . . . , dC) and k = (k1, . . . , kC), we have by Remark 6.8.3 that

the big picture function FB of F will partially (d,k)-induce some constraint from A, and

hence by Remark 6.8.5, it will partially (d,k)-induce some (A, σ) ∈ A of size m 6 ΨA(C, d)

on ` variables. We will show that the original function f violates many instances of this

constraint.

Denote the affine forms in A by (L1, . . . , Lm) and the vector σ by (σ1, . . . , σm). Since

we can assume ` 6 m (without loss of generality by making a change of variables), we can

now define

`A(ε) = ΨA(C2.3.6(∆, η, ρ, ζ, R), d). (6.3)

Let b1, . . . , bm ∈∏Ci=1 Uki+1 correspond to the atoms of B where (A, σ) is partially

(d,k)-induced by FB. That is, b1, . . . , bm are consistent with A, and σi ∈ FB(bi) for every

i ∈ [m]. Also, let b′1, . . . , b′m ∈

∏C ′i=1 Uki+1 index the associated subatoms in B′, obtained

by letting b′j = (bj , s) for every j ∈ [m].

Lemma 6.9.3. The subatoms b′1, . . . , b′m are consistent with A.

Proof. Since b1, . . . , bm are already consistent with A, we only need to show that for every

i ∈ [C + 1, C ′], the sequence (b′1,i, . . . , b′m,i) = (si−C , si−C , . . . , si−C) is (di, ki)-consistent.

This holds because a constant function is of degree 6 di.

88

The main analysis. Let x = (x1, . . . , x`) where x1, . . . , x` are independent random vari-

ables taking values in Fn uniformly. Our goal is to prove a lower bound on

Prx

[f(L1(x)) = σ1 ∧ · · · ∧ f(Lm(x)) = σm] = Ex

[f (σ1)(L1(x)) · · · f (σm)(Lm(x))

]. (6.4)

The theorem obviously follows if the above expectation is larger than the respective δA(ε).

We rewrite the expectation as

Ex

[(f

(σ1)1 + f

(σ1)2 + f

(σ1)3 )(L1(x)) · · · (f (σm)

1 + f(σm)2 + f

(σm)3 )(Lm(x))

]. (6.5)

We can expand the expression inside the expectation as a sum of 3m terms. The

expectation of any term involving f(σj)2 for any j ∈ [m] is bounded in magnitude by

‖f (σj)2 ‖Ud+1 6 η(|B′|), by Lemma 2.4.3 and the fact that the complexity of A is bounded by

d. Hence, the expression (6.5) is at least

Ex

[(f

(σ1)1 + f

(σ1)3 )(L1(x)) · · · (f (σm)

1 + f(σm)3 )(Lm(x))

]− 3mη(|B′|).

Now, because of the non-negativity of f(σj)1 + f

(σj)3 for every j ∈ [m], this is at least

Ex

(f (σ1)1 + f

(σ1)3

)(L1(x)) · · ·

(f

(σm)1 + f

(σm)3

)(Lm(x))

∏j∈[m]

1[B′(Lj)=b′j ]

− 3mη(|B′|),

(6.6)

where 1[B′(Lj)=b′j ]is the indicator function of the event B′(Lj(x)) = b′j . In other words, now

we are only counting patterns that arise from the selected subatoms b′1, . . . , b′m. We next

expand the product inside the expectation into 2m terms. We will show that the contribution

from each of the 2m−1 terms involving f(σk)3 for any k ∈ [m] is small. Such a term is trivially

89

bounded from above by

Ex

∣∣∣f (σk)3 (Lk(x))

∣∣∣ ∏j∈[m]

1[B′(Lj)=b′j ])

. (6.7)

Without loss of generality, we assume that k = 1. This is convenient as by Definition 6.4.1 (i)

we have L1(x) = x1. (For other values of k, we can do a change of variables, replacing x1

with Lk(x), so that we can assume Lk(x) = x1.) With the assumption k = 1, the square of

(6.7) is equal to the following.

Ex

∣∣∣f (σk)3 (x1)

∣∣∣ ∏j∈[m]

1[B′(Lj)=b′j ]

2

6 Ex1

[|f (σk)

3 (x1)|21[B′(Lk)=b′k]

]Ex1

Ex2,...,x`

∏j∈[m]

1[B′(Lj)=b′j ]

2

. (6.8)

By Theorem 2.3.6 (vi) and Lemma 2.2.18, we have

Ex1

[|f (σk)

3 (x1)|21[B′(Lk)=b′k]

]6 ∆2(C) Pr

x1[B′(x1) = b′k]

6 ∆2(C)

(1

‖B′‖+ α(C ′)

)6

2∆2(C)

‖B′‖. (6.9)

Let y = (y2, . . . , y`) where y2, . . . , y` are independent random variables taking values in Fn

uniformly. The second term in the right hand side of (6.8) is equal to

1

‖B′‖2mEx1

Ex2,...,x`

∏i∈[C ′]j∈[m]

1

pki+1

pki+1−1∑λi,j=0

e(λi,j · (Pi(L′j(x))− b′i,j)

)2

90

=1

‖B′‖2mEx1

∑

(λi,j)∈∏i,j [0,p

ki+1−1]

e

− ∑i∈[C ′]j∈[m]

λi,jb′i,j

Ex2,...,x`

e

∑i∈[C ′]j∈[m]

λi,jPi(Lj(x))

2

61

‖B′‖2m∑

(λi,j),(τi,j)∈∏i,j [0,p

ki+1−1]

∣∣∣∣∣∣∣∣∣ Ex,y

e ∑i∈[C ′]j∈[m]

λi,jPi(Lj(x))

e

− ∑i∈[C ′]j∈[m]

τi,jPi(Lj(x1,y))

∣∣∣∣∣∣∣∣∣ .

(6.10)

We can bound the above using Theorem 2.2.21. Let A′ denote the set of 2m linear forms:

Lj(x1, x2, . . . , x`) | j ∈ [m]

∪Lj(x1, y2, . . . , y`) | j ∈ [m]

in variables x1, . . . , x`, y2, . . . , y`. Let Λi and Λ′i denote the (di, ki)-dependency set of A and

A′ respectively.

Lemma 6.9.4. For each i, |Λ′i| = |Λi|2 · pki+1

Proof. Recall that L1(x) = L1(x1,y) = x1. For any λ, τ ∈ Λi and any α ∈ [0, pki+1 − 1],

note that (λ1 + α (mod pki+1), λ2, . . . , λm, τ1 − α (mod pki+1), τ2, . . . , τm) ∈ Λ′i. Hence,

|Λ′i| > |Λi|2 · pki+1. To show |Λ′i| 6 |Λi|

2 · pki+1, we give a map from Λ′i to Λi × Λi that is

pki+1-to-1. Suppose∑mj=1 λjQ(Lj(x1, x2, . . . , x`)) +

∑mj=1 τjQ(Lj(x1, y2, . . . , y`)) ≡ 0 for

every polynomial Q of degree di and depth ki. Setting x2 = . . . = x` = 0 shows that

m∑j=1

τjQ(Lj(x1, y2, . . . , y`)) = −

m∑j=1

λj

Q(x1),

91

and similarly setting y2 = . . . = y` = 0 shows

m∑j=1

λjQ(Lj(x1, x2, . . . , x`)) = −

m∑j=1

τj

Q(x1).

In particular∑mj=1 λj = −

∑mj=1 τj . Consequently,

(λ, τ) 7→

− m∑j=2

λj (mod pki+1), λ2, . . . , λm

,

− m∑j=2

τi (mod pki+1), τ2, . . . , τm

is a map from Λ′i to Λi × Λi. To see that it is pki+1-to-1, note that

(λ1 + τ1 − γ (mod pki+1), λ2, . . . , λm, γ, τ2, . . . , τm) ∈ Λ′i

for every γ ∈ [0, pki+1 − 1], and these elements are all mapped to the same element in

Λi × Λi.

Applying Theorem 2.2.21 (just as in the proof of Theorem 6.6.5), we get that

(6.10) 61

‖B′‖2m

C ′∏i=1

|Λi|2pki+1 + ‖B′‖2mα(C ′)

6 ∏C ′i=1 |Λi|2

‖B′‖2m+ α(C ′).

Combining this with Eq. (6.9) and Eq. (6.8), we obtain

(6.7) 6 2∆(C)

√∏C ′i=1 |Λi|2‖B′‖2m

+ α(C ′). (6.11)

Finally, we turn to the main term in the expansion of Eq. (6.6). We know from

92

Lemma 6.9.3 that the subatoms b′1, . . . , b′m are consistent with A. Thus

Ex

f (σ1)1 (L1(x)) · · · f (σm)

1 (Lm(x)) ·∏j∈[m]

1[B′(Lj)=b′j ]

= Pr[B′(L1(x)) = b′1 ∧ · · · ∧ B

′(Lm(x)) = b′m]·

Ex

[f

(σ1)1 (L1(x)) · · · f (σm)

1 (Lm(x))|∀j ∈ [m], B′(Lj(x)) = b′j]

>

(∏C ′i=1 |Λi|‖B′‖m

− α(C ′)

)ζm. (6.12)

Let us justify the last line. The first term is due to the lower bound on the probability from

Theorem 6.6.5. The second term in (6.12) follows since each f(σj)1 is constant on the atoms

of B′, and because by construction, the big picture function FB of the cleanup function F ,

on which (A, σ) was partially induced, supports a value inside an atom b of B only if the

original function f acquires the value on at least an ζ fraction of the subatom (c, s).

Setting β = (∏C′i=1 |Λi|‖B′‖ )m and combining the bounds from (6.6), (6.11) and (6.12), we

conclude

(6.4) > (β − α(C ′)) ·( ε

8R

)m− 2m+1∆(C)

√β2 + α(C ′)− 3m · η(C ′)

>β

2·( ε

8R

)ΨA(C,d)− 2ΨA(C,d)+1β ·∆(C)− 3ΨA(C,d) · η(C ′)

Since ‖B′‖ 6 pdC′,

• 1‖B′‖ΨA(C,d) 6 β 6 1,

• ∆(C) = 116( ε

8R)ΨA(d,C),

• η(C ′) < 18‖B′‖ΨA(C,d)

( ε24R

)ΨA(C,d),

93

and both C and C ′ are upper-bounded by C2.3.6(∆, η, ρ, ζ, R), we can now define

δA(ε) =1

4p−dΨA(C2.3.6(∆,η,ρ,ζ,R))C2.3.6(∆,η,ρ,ζ,R) ·

( ε

8R

)ΨA(C2.3.6(∆,η,ρ,ζ,R),d)(6.13)

to conclude the proof.

94

CHAPTER 7

ALGORITHMIC ASPECTS OF REGULARITY

The results in this chapter are based on [18]1. This chapter is concerned with the algorithmic

versions of regularity lemmas for polynomials over finite fields. These regularity lemmas

proved by Green and Tao [43] and by Kaufman and Lovett [54] as discussed in Section 2.2.5

show that one can modify a given collection of polynomials F = P1, . . . , Pm into a new

collection F ′ so that the polynomials in F ′ are regular in the sense of Definition 2.2.7. These

lemmas have various applications, such as (special cases of) Reed-Muller testing and worst-

case to average-case reductions for polynomials. However, as mentioned in the introduction,

the transformation from F to F ′ in these works were not algorithmic. In this chapter, we

show that the analytic notions of regularity such as the ones defined in Section 2.2.2, while

being qualitatively equivalent to the above, allow for efficient algorithms.

In particular, for high enough characteristic, in time polynomial in number of variables,

we can refine F into F ′ where every nonzero linear combination of polynomials in F ′ has

desirably small Gowers norm. In the case of small fields, we give algorithmic regularity lem-

mas for two different notions of regularity. First is the notion of strong regularity introduced

by Kaufman and Lovett [54] that allows one to show that a polynomial approximable by a

strongly regular factor is actually exactly computable by it. The second notion is rank of

a polynomial factor, which Tao and Ziegler [78] showed is equivalent to having the analytic

property of low Gowers norm.

Throughout this chapter a polynomial is always a classical polynomial unless it is other-

wise specified to be a non-classical polynomial.

1. This work is based on an earlier work: Algorithmic Regularity for Polynomials and Applications.In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA15), PRDA15, 16, pp. 1870-1889 c© 2015. by the Society for Industrial and Applied Mathematics..10.1137/1.9781611973730.125

95

7.1 Applications

A regular factor is a basic ingredient of higher-order Fourier analysis, and we believe that our

algorithmic regularity lemmas will therefore be helpful in a variety of future applications.

In the following, we examine a few interesting applications of relevance to coding theory,

complexity theory and algorithmic algebra.

7.1.1 Inverse Theorem and Reed-Muller Decoding in High Characteristic

One of our main motivations is finding an algorithmic version of the inverse Gowers Theorem

for polynomials.

Theorem 7.1.1 (Inverse Gowers Theorem for polynomials in high characteristic, [43]). If a

polynomial P of degree d < |F| satisfies ‖eP‖Uk+1 > ε, then there exists a polynomial Q of

degree at most k, such that

|〈eFP, eFQ〉| = |E [x ∈ Fn] eFP (x)−Q(x)| > η,

for some η depending only on ε and d.

The Inverse Gowers Theorem has also been established over small fields by Tao and

Ziegler [78] (Theorem 2.1.10), but here, one needs to extend the given theorem statement to

non-classical polynomials, which are discussed in Section 2.1.1.

Theorem 7.1.1 has a direct interpretation in terms of Reed-Muller codes over F. The

Reed-Muller code of order k over Fn is simply the set of polynomials of degree at most

k over Fn. If d < |F| and for a given polynomial P of degree d, there exists a degree-k

polynomial Q such that Pr x ∈ FnP (x) = Q(x) > 1|F| + ε, which is the same as saying

that dist(P,Q) 6 1− 1|F|−ε (for dist(P,Q) denoting the normalized Hamming distance), then

it follows from the definition of Gowers norms and the Gowers-Cauchy-Schwarz inequality

96

that ‖eFtP‖Uk+1 > ε for a nonzero t ∈ F. Then, Theorem 7.1.1 gives that there exists

a polynomial Q of degree k such that |⟨

eFtP , eFQ⟩| > η, or equivalently, dist(P,Q′) 6

1− 1|F| − η

′ for some η′ > 0 and Q′ of degree k.

Thus, when d < |F|, the Gowers norm gives an approximate test for checking if for a given

polynomial P , there exists a Q of degree at most k within Hamming distance 1− 1|F| − ε. If

there exists a Q, then the Gowers norm is large and if the Gowers norm is larger than ε, then

there exists a Q′ within distance 1 − 1|F| − η

′. This is remarkable because the list-decoding

radius of Reed-Muller of order k codes is only 1− k|F| for k < |F| [33], and the test works even

beyond that. In fact the Hamming distance of a random P is 1 − 1|F| − o(1) from all Q of

degree k, and the test works all the way up to that distance. Moreover, Tao and Ziegler [78]

later showed that this test works not only when P is a polynomial of degree d < |F| (as in

the case of Theorem 7.1.1) but also when P is an arbitrary function and2 k < |F|.

Given the above, it is natural to consider the decoding analogue of the above question:

Given P of degree d over Fn, if there exists Q of degree k such that dist(P,Q) 6

1 − 1|F| − ε, can one find a Q′ (in time polynomial in n) of degree k such that

dist(P,Q′) 6 1− 1|F| − η for some η depending on ε?

Note that d, k and |F| are assumed to be constants and dependence on these is allowed, but

not on n. Also, observe that there might be exponentially many such Q since we are in the

regime beyond the list-decoding radius (see [55]), but the question turns out to be tractable

since we only ask for one such Q and allow a loss from ε to η. Such a decoding question

was solved for Reed-Muller codes of order 2 over F2, for any given function f (instead of a

polynomial P of bounded degree) by [79].

In Section 7.5.2, we show an algorithmic version of Theorem 7.1.1, meaning we explicitly

find a Q which has correlation at least η. In Section 7.5.3, we show how this algorithmic

2. For fields of low characteristic (|F| 6 k), the only testing results of this flavor (which works beyond thelist-decoding radius for arbitrary input functions) were proved by Samorodnitsky [68] for Reed-Muller codesof order 2 over Fn

2 and by Green and Tao [41] for Reed-Muller codes of order 2 over Fn5 .

97

inverse theorem leads to solving the above decoding question for polynomials P of degree d

and the Reed-Muller code of order k for k 6 d < |F|. This special case can be interpreted

as follows: if P is of degree-d for some degree k 6 d < |F|, then we can think of P being

obtained from some Q of degree k (which is a codeword) by adding the “noise” P − Q of

degree d. Thus, when the noise is “structured” i.e. given by a degree-d polynomial, we can

decode in the above sense (of finding some codeword within a given distance) even beyond

the list-decoding radius. Note that d can be much larger than k, so the noise set is still much

larger than the code.

Both our algorithmic version of the regularity lemma and the decoding algorithm, are

randomized algorithms that run in time O(nd) and output the desired objects with high

probability. This is linear time for regularity since even writing a polynomial of degree d

takes time Ω(nd). The algorithms can be derandomized in polynomial time (at the expense

of a very large exponent) using pseudorandom generators for low-degree polynomials 3. For

the question of finding a degree k polynomial within a given distance of P , it is possible that

one may be able to do this in time O(nk), but we do not achieve this.

7.1.2 Efficient Worst-Case to Average-Case Reductions for Polynomials

The central component in Green and Tao’s proof of Theorem 7.1.1 can be viewed as a worst-

case to average-case reduction that is interesting by itself. Let P : Fn → F be a degree-d

polynomial which can be weakly approximated by a lower degree polynomial Q : Fn → F, of

degree < d. Here by weak approximation we mean 〈eFP, eFQ〉 > δ. Then Green and Tao

show that if d < |F|, P can in fact be computed by lower degree polynomials, i.e. there exist

Q1, . . . , Qm of degree < d and a function Γ : FM → F, such that P = Γ(Q1, . . . , QM ). M

depends only on d, δ and |F|. Our algorithmic regularity lemma for polynomials over high

characteristic, when plugged into the arguments of Green and Tao, show that this result

3. We thank Shachar Lovett for this observation

98

can be made algorithmic, meaning that Q1, . . . , QM and Γ can be found in Poly(n) time

(assuming constant d, δ and |F|).

Green and Tao (as well as Lovett, Meshulam and Samorodnitsky [63]) showed that their

proof techniques break down in the general case when |F| may be small. In particular,

Theorem 7.1.1 is false when |F| 6 d. However, Kaufman and Lovett [54] showed that the

above worst-case to average-case reduction is still true for small fields. In order to prove

this result, they introduced a more involved notion of regularity called strong regularity.

Informally, strong regularity for a factor ensures that not only is the joint distribution of the

polynomials in the factor close to uniform, but in fact, the joint distribution of the derivatives

of the polynomial is also close to uniform.

We prove that given a factor F , we can find a strongly unbiased refinement F ′ in

polynomial time. Inserting this algorithmic regularity lemma into Kaufman and Lovett’s

proof shows that given a polynomial P that is known to be correlated with a lower de-

gree polynomial, we can explicitly find polynomials Q1, . . . , QM and a function Γ such that

P = Γ(Q1, . . . , QM ) and M is bounded. A randomized algorithm for recovering Q1, . . . , QM

and Γ with high probability runs in time O(nd), where d = deg(P ).

7.1.3 Algorithmic Inverse Theorem for Polynomials: Fields of Small

Characteristic

Over small fields, the correspondence between high rank and low Gowers norm, which drives

our proof over large characteristic, breaks down and one needs to consider non-classical

polynomials. (see Section 2.1.1 for formal definition).

The corresponding notion of regularity for factors F over a small field is that no nonzero

linear combination P of polynomials in F can be written as a function of a bounded number

M of non-classical polynomials of degree less than deg(P ) (Definition 2.2.7). We prove an

algorithmic regularity lemma that in polynomial time, finds a regular refinement when given

99

as input an arbitrary non-classical polynomial factor of bounded degree.

Using the notion of high-rank non-classical polynomial factors, we obtain an algorithmic

version of the inverse theorem for bounded degree polynomials (for the case when the degree d

is greater than the order p of the underlying prime field) by Tao and Ziegler [78]. We manage

to somewhat simplify the proof of the inverse theorem for bounded degree polynomials (i.e.

Theorem 2.1.10 when f itself is a phase polynomial), by observing that one of its steps can

be generalized to avoid a few other parts of the proof.

7.1.4 Polynomial decompositions

Given a positive integer k, a vector of positive integers ∆ = (∆1,∆2, . . . ,∆k) and a function

Γ : Fk → F, we say that a function P : Fn → F is (k,∆,Γ)-structured if there exist

polynomials P1, P2, . . . , Pk : Fn → F with each deg(Pi) 6 ∆i such that for all x ∈ Fn,

P (x) = Γ(P1(x), P2(x), . . . , Pk(x)).

The polynomials P1, . . . , Pk are said to form a (k,∆,Γ)-decomposition (or simply, a polyno-

mial decomposition). For instance, an n-variate polynomial over the field F of total degree d

factors nontrivially exactly when it is (2, (d−1, d−1), prod)-structured where prod(a, b) = a·b.

In [13], Bhattacharyya noted that for any fixed k, ∆, Γ and F, our algorithmic version

of the Green-Tao regularity lemma over high characteristic allows one to decide (k,∆,Γ)-

structure in time Poly(n) for all input polynomials P : Fn → F whose degree is strictly

less than |F|. However, [13] left open the question of solving the polynomial decomposition

problem over small fields, such as F2, and input polynomials of degree larger than |F|. Here,

we resolve this issue.

At its core, the proof of correctness in [13] used our algorithmic version of the Green-Tao

regularity lemma that applies only for large characteristic. The algorithmic regularity lemma

100

for non-classical factors can then be used in the proof of [13] to show that over any fixed

prime order field F and for any fixed choice of positive integer k, vector of positive integers

∆ = (∆1,∆2, . . . ,∆k) and function Γ : Fk → F, the (k,∆,Γ)-decomposition problem is

solvable in Poly(n) time when the input is a bounded degree polynomial.

7.2 Algorithmic Bogdanov-Viola Lemma; Approximate

Regularization

The Bogdanov-Viola Lemma [21] states that if a polynomial of degree d is biased, then it

can be approximated by a bounded set of polynomials of lower degree. Recall the definition

of bias of a function P : Fn → F:

bias(P ) =∣∣∣Ex

[eF(P (x))]∣∣∣

The following is an easy to observe algorithmic version of this lemma. We reproduce the

proof by Green and Tao [43] while pointing out that all the steps can be performed efficiently.

Lemma 7.2.1 (Algorithmic Bogdanov-Viola lemma). Let d > 0 be an integer, and δ, σ, β ∈

(0, 1] be parameters. There exists a randomized algorithm, that given a polynomial P : Fn →

F of degree d with

bias(P ) > δ,

runs in time Oδ,β,σ(nd), and with probability 1 − β returns functions P : Fn → F and

Γ : FC → F, and a set of polynomials P1, ..., PC , where C 6 |F|5δ2σβ

and deg(Pi) < d for all

i ∈ [C], for which

• Prx(P (x) 6= P (x)) 6 σ, and

• P (x) = Γ(P1(x), ..., PC(x)).

101

Proof. The proof will be an adaptation of the proof from [43]. Notice that we can compute

the explicit description of P in O(nd) queries. For every a ∈ F define the measure µa(t)def=

Pr(P (x) = a+ t). It is easy to see that if bias(P (x)) > δ then, for every a 6= b,

‖µa − µb‖ >4δ

|F|. (7.1)

We will try to estimate each of these distributions. Let

µa(t)def=

1

C

∑16i6C

1P (xi)=a+t,

where C >|F|5δ·β1

, and x1, x2, ..., xC ∈ Fn are chosen uniformly at random. Therefore by an

application of Chebyshev’s inequality

Pr

(|µa(t)− µa(t)| > δ

2|F|2

)<β1

|F|,

for all t ∈ F and therefore

Pr

(‖µa − µa‖ >

δ

2|F|2

)< β1. (7.2)

Now we will focus on approximating P (x). Remember that DhP (x) = P (x+ h)− P (x)

is the additive derivative of P (x) in direction h. We have

Prh

(DhP (x) = r) = Prh

(P (x+ h)− P (x) = r) = µP (x)(r),

where h ∈ Fn is chosen uniformly at random. Let h = (h1, ..., hC) ∈ (Fn)C be chosen

102

uniformly at random, where C is a sufficiently large constant to be chosen later. Define

µ(x)obs(t)

def=

1

C

∑16i6C

1DhjP (x)=t,

and let

Ph(x)def= arg min

r∈F‖µr − µ

(x)obs‖.

Now choosing C > |F|5δ2σβ2

follows

Prh

(Ph(x) 6= P (x)) 6 Prh

(‖µ(x)

obs − µP (x)‖ >δ

|F|

)6 σβ2, (7.3)

where the first inequality follows from (7.1) and (7.2). Therefore

Prx,h

(Ph(x) 6= P (x)) = Ex

Eh1Ph(x)6=P (x)

6 σβ2,

and thus

Prh

[Prx

(Ph(x) 6= P (x)

)> σ

]6 β2.

Let Pi := DhiP , so that Pi is of degree 6 d and Ph is a function of P1, ..., PC . Now

setting β1 := β2|F|2 and β2 := β

2 finishes the proof.

Remark 7.2.2 (Derandomization). We note that Lemma 7.2.1 can be derandomized in

Poly(n) steps using known pseudorandom generators for low degree polynomials over finite

fields [21, 62, 80], as shown in [13], following a suggestion by Shachar Lovett. In particular,

Viola [80] showed that for any integer d > 1 and 0 < ε < 1, there is an explicit generator

g : Fs → Fn such that s = d log|F| n+O(d2d log(1/ε)) and for every polynomial P : Fn → F,

| EY←g(Us)

e(P (Y ))− EX←Un

e(P (X))| < ε

103

where Us is the uniform distribution on Fs and Un the uniform distribution on Fn. In order

to get a deterministic algorithm, we run over all possible seeds s to the generator g. As the

discussion in [13] shows, one can use this technique to derandomize Lemma 7.2.1 in Poly(n)

time. (The exponent on n becomes a large constant proportional to d and C in the proof

above.)

7.2.1 Unbiased Almost-Refinement

Using Lemma 7.2.1, we can deduce the following which can be thought of as an algorithmic

analogue of the regularity lemma, with the caveat that the refinement is approximate.

Lemma 7.2.3 (Unbiased almost Refinement). Let d > 1 be an integer. Suppose γ : N→ R+

is a decreasing function and σ, ρ ∈ (0, 1]. There is a randomized algorithm that given a factor

F of degree d, runs in time Oγ,ρ,σ,dim(F)(nd) and with probability 1−ρ returns a γ-unbiased

factor F ′ with dim(F ′) = Oγ,ρ,σ,dim(F)(1), such that F ′ is σ-close to being a refinement of

F .

Proof. The proof idea is similar to that of Lemma 2.2.27 in the sense that we do the same

type of induction. The difference is that at each step we will have to control the errors that

we introduce through the use of Lemma 7.2.1 and the probability of correctness. At all steps

in the proof without loss of generality, we will assume that the polynomials in the factor are

linearly independent, because otherwise we can always detect such a linear combination in

Oγ,ρ,σ,dim(F)(nd) time and remove a polynomial that can be written as a linear combination

of the rest of the polynomials in the factor.

The base case for d = 1 is simple, a linearly independent set of non-constant linear

polynomials is not biased at all, namely it is 0-unbiased. Assume that F is γ-biased, then

104

there exists a set of coefficients ci,j ∈ F16i6d,16j6Misuch that

bias(∑i,j

ci,jPi,j) > γ(dim(F)).

To detect this, we will use the following algorithm:

We will estimate bias of each of the |F|dim(F) linear combinations and check whether it

is greater than3γ(dim(F))

4 . To do so, for each linear combination∑i,j ci,jPi,j independently

select a set of vectors x1, ..., xC uniformly at random from Fn, and let bias(∑i,j ci,jPi,j)

def=∣∣∣ 1

C

∑`∈[C] eF(y`)

∣∣∣, with y` =∑i,j ci,j ·Pi,j(x`). Choosing C = Odim(F)

(1

γ(dim(F))2 log(1ρ))

,

we can distinguish bias > γ from bias 6 γ2 , with probability 1 − ρ′, where ρ′ := ρ

4|F|dim(F) .

Let∑i,j ci,jPi,j be such that the estimated bias was above

3γ(dim(F))4 and k be its degree.

We will stop if there is no such linear combination or if the factor is of degree 1. Since by a

union bound with probability at least 1− ρ4 , bias(

∑i,j ci,jPi,j) >

γ(dim(F))2 , by Lemma 7.2.1

we can find, with probability 1− ρ4 , a set of polynomials Q1, ..., Qr of degree k− 1 such that

•∑i,j ci,jPi,j is σ

2 -close to a function of Q1, ..., Qr,

• r 6 16|F|5γ(dim(F))2·σ·ρ .

We replace one polynomial of highest degree that appears in∑i,j ci,jPi,j with polynomials

Q1, ..., Qr.

We will prove by the induction that our algorithm satisfies the statement of the lemma.

For the base case, if F is of degree 1, our algorithm does not refine F by design. Notice that

since we have removed all linear dependencies, F is in fact 0-unbiased in this case.

Now given a factor F , if F is γ-biased, then with probability 1 − ρ′ our algorithm will

refine F . With probability 1− ρ4 the linear combination used for the refinement is

γ(dim(F))2 -

biased. Let F be the outcome of one step of our algorithm. With probability 1 − ρ4 , F is

σ2 -close to being a refinement of F . Using the induction hypothesis with parameters γ, σ2 ,

ρ4

we can find, with probability 1 − ρ4 , a γ-unbiased factor F ′ which is σ

2 -close to being a

105

refinement of F and therefore, with probability at least 1 − (ρ4 + ρ4 + ρ

4 + ρ′) > 1 − ρ, is

σ-close to being a refinement of F .

Remark 7.2.4. Note that Lemma 7.2.3 can be derandomized using PRGs for low-degree

polynomials. Remark 7.2.2 discusses derandomization of Lemma 7.2.1. We can similarly

also make the step for estimating the bias be deterministic, by using a different PRG for

fooling each linear combination of polynomials and then running over all seeds to these PRGs.

The resulting running time is Poly(n) with a large exponent.

Let P : Fn → F be a function, and let F be a polynomial factor such that P is measurable

in F . This means that there exists a function Γ : Fdim(F) → F such that P (x) = Γ(F(x)).

Although we know that such a Γ exists, but we do not have explicit description of Γ. It is a

simple but useful observation that we can provide query access to Γ in case F is unbiased.

It is worth recording this as the following lemma.

Lemma 7.2.5 (Query access). Suppose that β ∈ (0, 1] and let F = P1, . . . , Pm be a

γ-unbiased factor of degree d, where γ : Fn → F is a decreasing function which decreases

suitably fast. Suppose that we are given query access to a function f : Fn → F, where f is

measurable in F , namely there exists Γ : Fdim(F) → F such that f(x) = Γ(F(x)). There is

a randomized algorithm that, takes as input a value a = (a1, . . . , am) ∈ Fm, makes a single

query to f , runs in O(nd), and returns a value y ∈ F such that Γ(a) = y with probability

greater than 1− β.

Proof. Choose γ(x) := 12px . Applying Lemma 2.2.19 for the case of classical polynomials,

we get

|F−1(a)| > 1

‖F‖− 1

2‖F‖=

1

2‖F‖.

Let x1, ..., xK be chosen uniformly at random from Fn, where K > 2‖F‖ log 1β . Then with

probability at least 1 − β, there is i∗ ∈ [K] such that F(xi∗) = (a1, . . . , am). This means

that xi∗ ∈ F−1(a). Notice that we can find this i∗, since we have explicit description of

106

polynomials in F and we can evaluate them on each xi. Our algorithm will query f on input

xi∗ and return f(xi∗).

7.3 Algorithmic Uniform Refinement in high Characteristic

In this section we shall assume that the field F has large characteristic, namely |F| is greater

than the degree of the polynomials and the degree of the factors that we study. We will prove

the following lemma which states that we can efficiently refine a given factor to a uniform

factor.

Lemma 7.3.1 (Uniform Refinement). Suppose d < |F| is a positive integer and ρ ∈ (0, 1]

is a parameter. There is a randomized algorithm that, takes as input a factor F of degree

d over Fn, and a decreasing function γ : N → R+, runs in time Oρ,γ,dim(F)(nd), and with

probability 1 − ρ outputs a γ-uniform factor F , where F is a refinement of F , is of same

degree d, and |dim(F)| σ,γ,dim(F) 1.

Note 7.3.2. Notice that unlike Lemma 7.2.3, here F ′ is an exact refinement of F . This

will be achieved by the use of Proposition 7.3.4 which roughly states that approximation of a

polynomial by a uniform factor implies its exact computation.

The following proposition of Green and Tao states that if a polynomial is approximated

well enough by a regular factor, then it is computed by the factor.

Proposition 7.3.3 ([43]). Suppose that d > 1 is an integer. There exists a constant σd > 0

such that the following holds. Suppose that P : Fn → F is a polynomial of degree d and

that F is an F -regular factor of degree d− 1 for some increasing function F : N→ N which

increases suitably rapidly in terms of d. Suppose that P : Fn → F is an F-measurable

function and that Pr(P (x) 6= P (x)) 6 σd. Then P is itself F-measurable.

Our proof of Lemma 7.3.1 will require the following variant of Proposition 7.3.3 which

follows by Remark 2.2.17.

107

Proposition 7.3.4. Suppose that d > 1 is an integer. There exists σ7.3.4(d) > 0 such that

the following holds. Suppose that P : Fn → F is a polynomial of degree d and that F is

a γ7.3.4-uniform factor of degree d − 1 for some decreasing function γ7.3.4 which decreases

suitably rapidly in terms of d. Suppose that P : Fn → F is an F-measurable function and

that Pr(P (x) 6= P (x)) 6 σ7.3.4(d). Then P is itself F-measurable.

We will prove Lemma 7.3.1 by induction on the dimension vector of F . For the in-

duction step, we check whether there is a linear combination of polynomials in F that has

large Gowers norm. We will use this to replace a polynomial P from F with a set of lower

degree polynomials. To do this, we first approximate P with a few lower degree polynomi-

als Q1, ..., Qr, then use the induction hypothesis to refine Q1, ..., Qr to a uniform factor

Q1, ..., Qr′ and use Proposition 7.3.4 to conclude that P is measurable in Q1, ..., Qr′.

Proof of Lemma 7.3.1: Let F be a factor of degree d. Similar to the proof of Lemma 7.2.3

we will induct on dimension vector (M1, ...,Md). At all steps of the proof we may assume

that all polynomials in F are homogeneous, that is, all multinomials are of same degree. This

is because otherwise, we may replace each polynomial Pi,j with i homogeneous polynomials

that sum up to Pi,j . Moreover, as in the proof of Lemma 7.2.3, without loss of generality

we may assume that the polynomials in F are linearly independent, since otherwise we can

detect and remove linear dependencies in Odim(F)(nd) time.

Given the above assumptions, the base case for d = 1 is simple, a linearly independent

homogeneous factor is 0-uniform. Let d > 1 and assume that F is not γ-uniform, then there

exists a set of coefficients ci,j ∈ F16i6d,16j6Misuch that

‖eF(∑i,j

ci,j · Pi,j)‖Uk > γ(dim(F)),

where k = deg(∑i,j ci,j · Pi,j). We will use the following randomized algorithm to detect

such a linear combination:

108

We will estimate ‖eF(∑i,j ci,j ·Pi,j)‖2

k

Ukfor each of the |F|dim(F) linear combinations and

check whether it is greater than3γ(dim(F))2k

4 . To do this, as in the proof of Lemma 7.2.3, in-

dependently for each linear combination∑i,j ci,j ·Pi,j select C sets of vectors x(`), y

(`)1 , ..., y

(`)k

chosen uniformly at random from Fn for 1 6 ` 6 C, and compute

∣∣∣∣∣∣∑`∈[C]

eF

( ∑I⊆[k]

(−1)|I| ·∑i,j

ci,j · Pi,j(x(`) + y(`)I )

)∣∣∣∣∣∣Choosing C = Odim(F)

(1

γ(dim(F))2 log(1ρ))

, we can distinguish between Gowers norm

6 γ(F)

21/2kand Gowers norm > γ(F), with ρ′ = ρ

4|F|dim(F) probability of error.

Let Q :=∑i,j ci,j · Pi,j , with deg(Q) = k, the detected linear combination for which

the estimated ‖eF(Q)‖2kUk

is greater than3γ(dim(F))2k

4 . By a union bound on all the linear

combinations, with probability at least 1− ρ4 , ‖eF(Q)‖2k

Uk> γ(dim(F))2k

2 . Notice that

bias(DQ) = ‖e(Q)‖2k

Uk>γ (dim(F))

2,

where DQ : (Fn)k → F is defined as

DQ(y1, ..., yk)def= Dy1Dy2 · · ·DykQ(x).

Let σ and γ be as in Proposition 7.3.4. By Lemma 7.2.1 we can find, with probability 1− ρ4 ,

a set of polynomials Q1, ..., Qr of degree k − 1 such that

• DQ is σ-close to being measurable with respect to Q1, ..., Qr.

• r 6 8|F|5

γ(dim(F))2k+1σρ

.

By the induction hypothesis, with probability 1− ρ4 , we can refine Q1, ..., Qr to a new

factor Q1, ..., Qr′, which is γ-uniform. It follows from Proposition 7.3.4 that DQ is in fact

109

measurable in Q1, ..., Qr′. Since |F| > k we can write the following Taylor expansion

Q(x) =DQ(x, ..., x)

k!+R(x), (7.4)

where R(x) is a polynomial of degree 6 k− 1 which can be computed from Q. Thus we can

replace a maximum degree polynomial Pi,j , that appears in Q, with polynomials Q1, ..., Qr′ ,

and R(x). The proof follows by using the induction hypothesis for parameters ρ4 and γ.

Remark 7.3.5. The above proof can be derandomized in polynomial time. Lemma 7.2.1 can

be made deterministic as discussed after its proof. Here, the additional randomness is in

estimating the Gowers norms of each linear combination of polynomials. Again, using pseu-

dorandom generators for degree-d polynomials lets us make this estimation be deterministic.

For more details, see [13].

7.4 Algorithmic Regularity in Low Characteristic

As mentioned in previous sections, Gowers uniformity for polynomial factors fails to address

“bias implies low rank” in the case when the field F has small characteristic. Kaufman and

Lovett [54] introduce a stronger notion of regularity in order to handle the general case. To

define their notion of regularity, we first have to define the derivative space of a factor.

Definition 7.4.1 (Derivative Space). Let F = P1, ..., Pm be a polynomial factor over Fn.

Recall that

DhP (x) = P (x+ h)− P (x),

is the additive derivative of P in direction h, where h is a vector from Fn. We define

Der(F)def= (DhPi)(x) : i ∈ [m], h ∈ Fn,

as the derivative space of F .

110

Definition 7.4.2 (Strong regularity of polynomials [54]). Suppose that F : N → N is an

increasing function. Let F = P1, . . . , Pm be a polynomial factor of degree d and ∆ : F → N

assigns a natural number to each polynomial in F . We say that F is strongly F -regular with

respect to ∆ if

1. For every i ∈ [m], 1 6 ∆(Pi) 6 deg(Pi).

2. For any i ∈ [m] and r > ∆(Pi), there exist a function Γi,r such that

Pi(x+ y[r]) = Γi,r(Pj(x+ yJ ) : j ∈ [m], J ⊆ [r], |J | 6 ∆(Pj)

),

where x, y1, ..., yr are variables in Fn, and yJdef=∑k∈J yk for every J ⊆ [r],

3. For any r > 0, let C = ci,I ∈ Fi∈[m],I⊆[r],|I|6∆(Pi)be a collection of field elements.

Define

QC(x, y1, ..., yr)def=

∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,I · Pi(X + YI).

Let F ′ ⊆ F be the set of all Pi’s which appear in QC . There do not exist polynomials

P1, ..., P` ∈ Der(F ′), and I1, ..., I` ⊆ [r], with l 6 F (dim(F)), for which QC can be

expressed as

Γ(P1(x+ yI1), ..., P`(x+ yI`)

),

for some function Γ : F` → F.

The following lemma states that every polynomial factor can be refined to a strongly

regular factor.

Lemma 7.4.3 (Strong-Regularity Lemma, [54]). Let F : N→ N be an increasing function.

Let F = P1, ..., Pm be a polynomial factor of degree d. There exists a refinement F ′ =

P1, . . . , Pr of same degree d along with a function ∆ : F → N such that

• F ′ is strongly F -regular with respect to ∆,

111

• dim(F ′) = OF,dim(F),d(1).

Moreover they prove that if a polynomial is approximated by a strongly regular factor,

then it is in fact measurable with respect to the factor.

Lemma 7.4.4 ([54]). For every d there exists a constant εd = 2−Ω(d) such that the following

holds. Let P : Fn → F be a polynomial of degree d, P1, ..., Ps be a strongly regular collection

of polynomials of degree d− 1 with degree bound ∆ and Γ : Fs → F be a function such that

Γ(P1, ..., Ps) is εd-close to P . Then there exists Γ′ : Fs → F such that

P (x) = Γ′(P1(x), ..., Ps(x)).

Lemma 7.4.3 and Lemma 7.4.4 combined with Bogdanov-Viola Lemma immediately gives

the following theorem.

Theorem 7.4.5 (Bias implies low rank for general fields [54]). Let P : Fn → F be a

polynomial of degree d such that bias(P ) > δ > 0. Then rankd−1 6 c(d, δ,F).

In Section 7.4.1 we will give an algorithmic version of this theorem. To do this, as in

previous sections, we will have to use a notion of regularity which is well suited for algorithmic

applications. Uniform factors (Definition 2.2.16) fail to address fields of low characteristic,

for the reason that to refine a factor to a uniform factor we make use of division by d! which

is not possible in fields with |F| 6 d. Strong regularity of [54] suggests how to extend the

notion of unbiased factors to a stronger version in quite a straight forward fashion.

Definition 7.4.6 (Strongly γ-unbiased factors). Suppose that γ : N → R+ is a decreasing

function. Let F = P1, ..., Pm be a polynomial factor of degree d over Fn and let ∆ : F → N

assign a natural number to each polynomial in the factor. We say that F is strongly γ-

unbiased with degree bound ∆ if

1. For every i ∈ [m], 1 6 ∆(Pi) 6 deg(Pi).

112

2. For every i ∈ [m] and r > ∆(Pi), there exists a function Γi,r such that

Pi(x+ y[r]) = Γi,r(Pj(x+ yJ ) : j ∈ [m], J ⊆ [r], |J | 6 ∆(Pj)

).

3. For any r 6 0 and not all zero collection of field elements

C = ci,I ∈ Fi∈[m],I⊆[r],|I|6∆(Pi)we have

bias

∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,I · Pi(x+ yI)

6 γ(dim(F)),

for random variables x, y1, . . . , yr ∈ Fn.

We will say F is ∆-bounded if it only satisfies (1) and (2).

In Section 7.4.1 we will show how to refine a given factor to a strongly unbiased one.

This will allow us prove an algorithmic analogue of Lemma 7.4.4, which will be presented in

the next section.

7.4.1 Algorithmic approximate strongly unbiased refinement

In this section we will use strongly unbiased factors (Definition 7.4.6) to prove Proposi-

tion 7.3.3 and Proposition 7.3.4, respectively, without the assumption on |F| being larger

than d. The following claim states that although in part (3) there is no upper bound on r,

choosing a suitably faster decreasing function γ′ = γ2d , one can assume that r 6 d.

Claim 7.4.7. Let F = P1, . . . , Pm be a polynomial factor of degree d, and ∆ : F →

N be a function such that F is ∆-bounded. Suppose that γ : N → R+ is a decreasing

function such that for every r ∈ [d], x ∈ Fn and not all zero collection of coefficients

113

ci,Ii∈[m],I⊆[r],|I|6∆(Pi)

bias

∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,I · Pi(x+ yI)

6 γ(dim(F))2d , (7.5)

where y1, . . . , yr are variables from Fn. Then F is strongly γ-unbiased.

Proof. Let r > d be an integer, x ∈ Fn be fixed and ci,Ii∈[m],I⊆[r],|I|6∆(Pi)be a set of

coefficients, not all of which are zero. We will show that

bias

∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,I · Pi(x+ yI)

6 γ(dim(F)).

Define QC(y1, . . . , yr)def=∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,I · Pi(x + yI), and let I ⊆ [r] be maximal

such that ci,I is nonzero for some i ∈ [m]. Without loss of generality assume that I =

1, 2, . . . , |I|. We will derive QC in directions y1, . . . , y|I|, namely

DQC,y1,...,yr(h1, . . . , h|I|) := Dh1· · ·Dh|I|QC(y1, . . . , yr),

where hi is only supported on yi coordinates. Notice that, since I was maximal, DQC does

not depend on any yi with i ∈ [r]\I, namely, for every such J ⊆ [r] with J 6⊆ I, and every

i ∈ [m], Dh1· · ·Dh|I|Pi(x+ yJ ) ≡ 0. Moreover

Dh1· · ·Dh|I|QC(y1, . . . , yr) =

∑i∈[m],J⊆[r],|J |6∆(Pi)

ci,J ·Dh1· · ·Dh|I|Pi(x+ yJ )

=∑i∈[m]

ci,I ·Dh1· · ·Dh|I|Pi(x+ yI)

=∑

i∈[m],J⊆Ici,I · (−1)|J | · Pi((x+ yI) + hJ ).

Applying the claim below for the random variable x′ := x + yI and taking the bias over

114

random variables x′, h1, . . . , h|I| we have that

γ(dim(F))2d > bias(DQC,y1,...,yr(x

′, h1, . . . , h|I|))> bias (QC(y1, . . . , yr))

2|I| .

Now the claim follows by the fact that |I| 6 d.

Following is a simple repeated application of the Cauchy-Schwarz inequality.

Claim 7.4.8.

bias(PΛ)2r 6 bias(DPΛ).

Proof. It suffices to show that we have

∣∣∣∣ Ey1,...,yk,h∈Fn

eF(PΛ(y1 + h, y2, . . . , yk)− PΛ(y1, y2, . . . , yk)

)∣∣∣∣>

∣∣∣∣ Ey1,...,yk

eF(PΛ(y1, . . . , yk)

)∣∣∣∣2 .This is a simple application of Cauchy Schwarz inequality

∣∣∣∣ Ey1,...,yk,h

eF(PΛ(y1 + h, y2, . . . , yk)− PΛ(y1, . . . , yk)

)∣∣∣∣= Ey2,...,yk

∣∣∣EzeF(PΛ(z, y2, . . . , yk)

)∣∣∣2>

∣∣∣∣ Ez,y2,...,yk

eF(PΛ(z, y2, . . . , yk ∈ Fn))

∣∣∣∣2 .

Claim 7.4.7 suggests a way of refining a factor to a strongly uniform one: We will estimate

the bias of every possible linear combination of the polynomials over all points x+ yI where

I ⊆ [r] and r 6 d, and once we have detected a biased combination, refine using Lemma 7.2.1.

This is made possible by the fact that r and dim(F) are both bounded.

115

Lemma 7.4.9 (Algorithmic strongly γ-unbiased refinement). Suppose that σ, β ∈ (0, 1] are

parameters, and γ : N → R+ is a decreasing function. There is a randomized algorithm

that given a factor F = P1, ..., Pm of degree d, runs in Oγ(nd), with probability 1 − β,

returns a strongly γ-unbiased factor F ′ with degree bound ∆ such that F ′ is σ-close to being

a refinement of F .

Proof. We will design a refinement process which has to stop in a finite number of steps

which only depends on dim(F), σ, β, and γ. We will start with the initial factor being

F and we will set ∆(Pi) := deg(Pi). Notice that this automatically satisfies the first two

properties of Definition 7.4.6, namely ∆-boundedness. This is because for every r > deg(Pi)

we can use the derivative property

Pi(x+ y[r]) =∑I([r]

(−1)r−|I| · Pi(x+ yI).

to write Pi(x+ y[r]) as a function of Pi(x+ yJ )J⊆[r],|J |6d.

As explained in proof of Lemma 7.3.1, at each step, we may assume without loss of

generality that all polynomials in the factor are homogeneous. Moreover, we will assume

that for every r 6 d, the set of polynomials Pi(x + yI)i∈[m],I⊆[r],|I|6∆(Pi)is linearly

independent, where x, y1, . . . , yr ∈ Fn are variables. This is because at each step we have

access to explicit description of the polynomials in the factor, therefore we can compute

each of the |F|B(r) possible linear combinations, where B(r) =∑i∈[m]

∑j∈[∆(Pi)]

(rj

), and

check whether it is equal to zero or not. Suppose that for a set of coefficients C = ci,I ∈

Fi∈[m],I⊆[r],|I|6∆(Pi)we have QC

def=∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,IPi(x + yI) ≡ 0. Let Pi be

a polynomial that appears in QC and let I be maximal such that ci,I 6= 0. We can let

∆(Pi) := |I| − 1, and remove Pi from the factor if ∆(Pi) becomes zero.

We will stop refining if F is of degree 1. Notice that a linearly independent factor

of degree 1 is not biased at all, and therefore strongly 0-unbiased. Assume that F is not

116

strongly γ-unbiased with respect to the current ∆, then there exists r and a set of coefficients

C = ci,I ∈ Fi∈[m],I⊆[r],|I|6∆(Pi)for which

bias(QC) = bias

∑i∈[m],I⊆[r],|I|6∆(Pi)

ci,IPi(x+ yI)

> γ(dim(F)).

To detect this, we will use the following algorithm: Set K := |F|B(r). We can estimate

the bias of each of the K possible linear combinations QC and check whether our estimate

is greater than3γ(dim(F))

4 . Letting β′ := β4K , we can distinguish bias > γ(dim(F)) from

bias 6 γ(dim(F))2 correctly with probability 1 − β′ by computing the average of eF (QC) on

Odim(F))(1

γ(dim(F))2β) random sets of vectors x, y1, . . . , yr. Let C be such that the estimated

bias for QC was greater than3γ(dim(F))

4 , and let k = deg(QC). By a union bound with

probability 1− β4 , bias(QC) >

γ(dim(F))2 , and by Lemma 7.2.1 we can find, with probability

1− β4 , a set of polynomials Q1, . . . , Qs : (Fn)r+1 → F of degree k − 1 such that

• QC is σ2 -close to a function of Q1, . . . , Qs,

• s 6 16|F|5γ(dim(F))·σ·β .

Moreover we know from the proof of Lemma 7.2.1 that for each j ∈ [s] we have

Qj(x, y1, . . . , yr) = QC(x+ hx, y1 + h1, . . . , yr + hr)−QC(x, y1, . . . , yr)

=∑

i∈[m],I⊆[r],|I|6∆(Pi)

ci,I ·Dhx+hIPi(x+ yI).

for some fixed vectors hx, h1, . . . , hr ∈ Fn. Let R(j)i,I := Dhx+hIPi so that deg(R

(j)i,I ) <

deg(Pi) and Qj is measurable in R(j)i,I (x+ yI)i∈[m],I⊆[r],|I|6∆(Pi)

.

Let Pi be a polynomial of maximum degree that appears in QC , and let I be maximal such

that ci,I 6= 0. We will add each R(j)i,I to F with ∆(R

(j)i,I ) = deg(R

(j)i,I ) and let ∆(Pi) := |I|−1

and discard Pi if |I| becomes zero. Notice that the new factor is ∆-bounded because Pi(x+

117

yI) can be written as a function of Qj(x, y1, . . . , yr)j∈[s] and Pj(x + yJ )J⊆I,|J |6∆(Pj).

Moreover each Qj is measurable in R(j)i,I (x+ yI)i∈[m],I⊆[r],|I|6∆(Pi)

.

We can prove that the process stops after a constant number of steps by a strong induction

on the dimension vector (M1, . . . ,Md) of the factor F , whereMi is the number of polynomials

of degree i in the factor and d is the degree of the factor. The induction step follows from the

fact that at every step we discard Pi or decrease the ∆(Pi) for some i and add polynomials

of lower degree. Let F ′ and ∆ : F ′ → N be the output of the algorithm. The claim follows

by our choices for error distances and the probabilities.

Remark 7.4.10. The randomness involved in estimating the bias can be removed by using

PRGs for polynomials over Fn.

7.4.2 Algorithmic exact strongly unbiased refinement

It is not difficult to check that Kaufman and Lovett’s proof of Lemma 7.4.4, can be adapted

to a proof of the following analogue for our notion of strongly unbiased factors.

Lemma 7.4.11 (Approximation by Strongly Unbiased Factor implies Computation). For

every d > 0 there exists a constant σd and a decreasing function γ : N → R+ such that the

following holds. Let P : Fn → F be a polynomial of degree d, F = P1, . . . , Pm be a strongly

γ-unbiased polynomial factor of degree d− 1 with degree bound ∆, and let Γ : Fm → F be a

function such that P is σd-far from Γ(F). Then P is in fact measurable in F .

Remark 7.4.12. (Exact refinement) One can use the above lemma to modify the refine-

ment process of Lemma 7.4.9 to achieve exact refinement instead of approximate refinement.

This can be done by using induction on the degree of the factor and at every round refining

the new polynomials that are to be added to the factor to a strongly unbiased set of polynomi-

als. Then Lemma 7.4.11 ensures that the removed polynomial is measurable in the new set

of polynomials and therefore at every step we have exact refinement of the original factor.

118

7.5 Applications in High Characteristic

In this section we show some applications of the uniformity notion, and our algorithm for

uniform refinement of a factor. Throughout the section we shall assume that the field F has

large characteristic, namely |F| is greater than the degree of the polynomials and the degree

of the factors that we study.

7.5.1 Computing Polynomials with Large Gowers Norm

Following is an immediate corollary of Lemma 7.2.1, Lemma 7.2.5, and Proposition 7.3.4.

Lemma 7.5.1. Suppose that an integer d satisfies 0 6 d < |F|. Let δ, σ, β ∈ (0, 1]. There is

a randomized algorithm that given a polynomial P : Fn → F of degree d such that bias(P ) >

δ > 0, runs in Oδ,β,σ(nd) and with probability 1 − β, returns a polynomial factor F =

Pi,j16i6d−1,16j6Miof degree d− 1 and dim(F) = Oδ,β,σ(1) such that P is measurable in

F , namely there is a function Γ such that

P (x) = Γ((Pi,j(x)

)16i6d−1,16j6Mi

).

Moreover for every query to the value of Γ(·) for a vector from Fdim(F) the algorithm

returns the correct answer with probability 1− β.

Proof. Let σd and γ1 be as in Proposition 7.3.4, and let γ2 be as in Lemma 7.2.5. Set

γd := minγ1, γ2. By Lemma 7.2.1 since bias(P ) > δ, in time Oσd,β(nd), with probability

1− β2 we can find a polynomial factor F of degree d− 1 such that

• P is σd2 -close to a function of F

• dim(F) 6 4|F|5δ2σdβ

By Lemma 7.3.1, with probability 1− β2 , we can find a γd-uniform factor F ′ such that F ′ is

σd2 -close to being a refinement of F .

119

Thus with probability greater than 1 − β, P is σd-close to a function of the strongly

γd-unbiased factor F ′, and by Lemma 7.4.11 P is measurable in F ′. The query access to Γ

can be granted by Lemma 7.2.5.

The following is an algorithmic version of Proposition 6.1 of [43], which states that given

a polynomial P such that eF(P ) has large Gowers norm, then we can compute P by a few

lower degree polynomials.

Proposition 7.5.2 (Computing Polynomials with High Gowers Norm). Suppose that |F| >

d > 2 and that δ, β ∈ (0, 1]. There is a randomized algorithm that given a polynomial

P : Fn → F of degree d with ‖eF(P (x))‖Ud > δ, runs in Oδ,β(nd) and with probability 1−β,

returns a polynomial factor F of degree d− 1 such that

• There is a function Γ : Fdim(F) → F such that P = Γ(F).

• dim(F) = Od,δ,β(1).

Moreover, for every query to the value of Γ(·) for a vector from Fdim(F), the algorithm

returns the correct answer with probability 1− β.

Proof. Write ∂dP (h1, ..., hd) := Dh1· · ·DhdP (x). Since P has degree d, ∂dP does not

depend on x. From the definition of the Ud norm, we have

bias(∂dP ) = ‖e(P )‖2d

Ud> δ2d .

Applying Lemma 7.5.1 to ∂dP , with probability 1− β2 , we can find a factor F of degree

d− 1, such that dim(F) = Oδ,β,d(1) and ∂dP is measurable in F .

It is easy to check that since |F| > d, we have the following Taylor expansion

P (x) =1

d!∂dP (x, ..., x) +Q(x),

120

where Q is a polynomial of degree6 d−1. We can find explicit description of P , and therefore

Q in O(nd). Let F ′ := F ∪ Q, and let γ be as in Lemma 7.2.5 for error parameter β. By

Lemma 7.3.1, with probability 1− β2 , we can refine F ′ to a γ-uniform factor F of same degree

d−1 with dim(F) = Odim(F ′),β,γ(1). Now query access can be granted by Lemma 7.2.5.

7.5.2 Algorithmic Inverse Theorem in High Characteristic

Proposition 7.5.2 allows us to prove the following algorithmic version of Theorem 2.1.10.

Theorem 7.5.3. Suppose that |F| > d > 2 and that ε, β ∈ (0, 1]. There is an ηε,β,d ∈

(0, 1], and a randomized algorithm that given a polynomial P : Fn → F of degree d with

‖eF(P (x))‖Uk+1 > ε, runs in Oδ,β(nd) and with probability 1 − β, returns a polynomial Q

of degree 6 k such that

|〈eF(P ), eF(Q)〉| > η.

Proof. By Proposition 7.5.2, with probability 1 − β4 we can find a polynomial factor F of

degree d−1 such that P is measurable in F . Let γ : N→ R+ be a decreasing function to be

specified later. By Lemma 7.3.1, with probability 1− β4 , we can refine F to a γ-uniform factor

F = P1, ..., Pm of same degree d− 1, with dim(F) = Oγ,β,ε(1). Since P is measurable in

F , there exists Γ : Fdim(F) → F such that P = Γ(F). Letting f := eF(P ), and using the

Fourier decomposition of eF(Γ) we can write

eF(P (x)

)=

L∑i=1

ci eF(〈α(i),F〉(x)

), (7.6)

where L = |F|dim(F) = Oγ,β(1), α(i) ∈ Fdim(F), and

〈α(i),F〉(x)def=

m∑j=1

α(i)j · Pj(x).

Notice that the terms in (7.6), unlike Fourier characters, are not orthogonal. But since the

121

factor F is γ-uniform, Lemma 2.2.22 ensures approximate orthogonality. Let Qi := 〈α(i),F〉.

Choose γ(u) 6 σ|F|2u , so that γ(dim(F)) 6 σ

L2 , where σ := ε2k+1

4 . It follows from the near

orthogonality of the terms in (7.6) by Lemma 2.2.22 that

|ci − 〈f, eF(Qi)〉| 6σ

L, (7.7)

and ∣∣∣∣‖f‖22 − L∑i=1

c2i

∣∣∣∣ 6 σ. (7.8)

Claim 7.5.4. There exists δ′(ε, |F|) ∈ (0, 1] such that the following holds. Assume that f

and F are as above. Then there is i ∈ [L], for which deg(Qi) 6 k and∣∣〈f, eF(Qi)〉

∣∣ > δ′.

Proof. We will induct on the degree of F . Assume for the base case that F is of degree k,

i.e. d = k + 1, thus applying the following Cauchy-Schwarz inequality

ε2k+16 ‖f‖2

k+1

Uk+1 6 ‖f‖22‖f‖2k+1−2∞ , (7.9)

and (7.8) imply that there exists i ∈ [L] such that c2i >ε2k+1−σ

L = 3ε2k+1

4L , which combined

with (7.7) implies that |〈f, eF(Qi)〉| > ε2k+1

2L .

Now for the induction step, assume that d > k + 1. We will decompose (7.6) into two

parts, first part consisting of the terms of degree 6 k and the second part consisting of the

terms of degree strictly higher than k. Namely, letting S := i ∈ [L] : deg(Qi) 6 k we

write f = g + h where g :=∑i∈S cieF(Qi) and h :=

∑i∈[L]\S cieF(Qi). Notice that by the

triangle inequality of Gowers norm, our choice of γ, and the fact that F is γ-uniform

‖h‖Uk+1 6∑

i∈[L]\S|ci| · ‖eF(Qi)‖Uk+1 6 L · ε

2k+1

4L2=ε2k+1

4L,

122

and thus

‖g‖Uk+1 >ε

2.

Now the claim follows by the base case.

Let δ′(ε, |F|) be as in the above claim, and choose η := δ′4 . We will use the following

theorem of Goldreich and Levin [31] which gives an algorithm to find all the large Fourier

coefficients of eF(Γ).

Theorem 7.5.5 (Goldreich-Levin theorem [31]). Let ζ, ρ ∈ (0, 1]. There is a randomized

algorithm, which given oracle access to a function Γ : Fm → F, runs in time O(m2 logm ·

poly(1ζ , log(1

ρ)))

and outputs a decomposition

Γ =∑i=1

bi · eF(〈ηi, x〉) + Γ′,

with the following guarantee:

• ` = O( 1ζ2 ).

• Pr[∃i : |bi − Γ(ηi)| > ζ/2

]6 ρ.

• Pr

[∀α such that |f(α)| > ζ, ∃i ηi = α

]> 1− ρ.

We will use the above theorem with parameters ζ := δ′2 and ρ := β

4 , and use the random-

ized algorithm in Lemma 7.2.5 to provide answer to all its queries to Γ. Choose γ suitably

small, so that with probability at least 1− β4 our answer to all queries to Γ are correct. By

Claim 7.5.4 there is i ∈ [L] such that Γ(α) = ci >3δ′4 . With probability 1− β

4 there is j such

that ηj = αi and with probability at least 1− β4 , |bj − ci| 6 ζ

2 6δ′4 , and therefore ci >

δ′2 .

By a union bound, adding up the probabilities of the errors, with probability at least

1− β, we find Qi such that∣∣〈f, eF(Qi)〉

∣∣ > δ′4 = η.

123

7.5.3 Decoding Reed-Muller Codes with Structured Noise

In this section we discuss the task of decoding Reed-Muller codes when the noise is “struc-

tured”. Recall that the codewords in a Reed-Muller code of order k, correspond to evalua-

tions of all degree k polynomials in n variables over F. We will consider the task of decoding

codewords when the noise is structured in the sense that the applied noise to the codeword

itself is a polynomial of higher degree d. In other words, we are given a polynomial P of

degree d > k with the promise that it is “close” to an unknown degree k polynomial, and

the task is to find a degree k polynomial Q that is somewhat close to P . The assumption of

d > k is made, because the case d 6 k is trivial.

Theorem 7.5.6 (Reed-Muller Decoding). Suppose that d > k > 0 are integers, and let

ε ∈ (0, 1] and β ∈ (0, 1] be parameters. There is δ(ε, d, k) > 0 and a randomized algorithm

that, given a polynomial P : Fn → F of degree d, with the promise that there exists a

polynomial Q of degree k such that Pr(P (x) = Q(x)) > 1|F| + ε, runs in O(nd), and with

probability 1− β returns a polynomial Q of degree k such that

Pr(P (x) = Q(x)) >1

|F|+ δ.

Before presenting the proof of Theorem 7.5.6 we will look at the relations between Ham-

ming distance of P and Q, correlation between eF(P ) and eF(Q), and Gowers norm of

eF(P ). The following claim relates the Hamming distance of two polynomials P and Q to

the correlation between eF(P ) and eF(Q).

Claim 7.5.7. Suppose that ε ∈ (0, 1], and let P,Q : Fn → F be two functions such that

Prx(P (x) = Q(x)) > 1/|F|+ ε. Then there exists a nonzero t ∈ F for which

∣∣〈eF(t · P ), eF(t ·Q)〉∣∣ =

∣∣∣ExeF(t · (P (x)−Q(x))

)∣∣∣ > ε.

124

Proof. We have

1

|F|+ ε 6 Pr

x∈Fn(P (x) = Q(x)

)= Ex∈Fn

[1P (x)=Q(x)

]= Ex∈Fn

1

|F|∑t∈F

eF(t · (P (x)−Q(x))

)=

1

|F|∑c∈F〈eF(t · P ), eF(t ·Q)〉

=1

|F|

1 +∑

t∈F\0〈eF(t · P ), eF(t ·Q)〉

,

and thus there is t ∈ F\0 for which∣∣ 〈eF(t · P ), eF(t ·Q)〉

∣∣ > ε.

Assume that P : Fn → F is a polynomial of degree d and Q : Fn → F is a polynomial of

degree k < d. Using Lemma 2.1.9 with f∅ := eF(P −Q) and fS := 1 for S 6= ∅ implies that

| 〈eF(P ), eF(Q)〉 | = | Ex∈Fn

eF(P (x)−Q(x))| 6 ‖eF(P (x)−Q(x))‖Ud = ‖eF(P (x))‖Ud ,

(7.10)

where the last equality follows from the fact that deg(Q(x)) = k < d.

Proof of Theorem 7.5.6: Since there exists a polynomial Q of degree k with Pr(P (x) 6=

Q(x)) 6 ε, by Claim 7.5.7 there exists a nonzero t ∈ F such that

∣∣〈eF(t · P ), eF(t ·Q)〉∣∣ > ε.

Notice that

‖eF(t · P )‖Ud > ‖eF(t · P )‖Uk+1 > 〈eF(t · P ), eF(t ·Q)〉 > ε,

where the second inequality follows from (7.10) since deg(t · Q) 6 k. Notice that we may

find such a constant t ∈ F\0 with high probability by estimating the Uk+1 norm of tP for

125

all |F| − 1 possible choices of t. Set P := t · P .

Now by Theorem 7.5.3 there exists ηd,β,ε ∈ (0, 1] such that we can find in time O(nd),

with probability 1− β2 , a polynomial Q such that

∣∣∣〈eF(P ), eF(Q)〉∣∣∣ > η.

The following claim shows that, there is a constant shift of eF(Q) that approximates

eF(P ).

Claim 7.5.8. Let P and Q be as above. There is a randomized algorithm that with probability

1− β2 returns an h ∈ F for which

Pr(P (x) = Q(x) + h) >1

|F|+

η

2|F|2

Proof. Since P (x) − Q(x) takes values in F, there must be a choice of r ∈ F such that

Pr(P (x) − Q(x) = r) > 1|F| + η

|F|2 . Similar to the proof of Lemma 7.2.1 defining µ0(r) =

Pr(P (x) − Q(x) = r), we can find the estimate measure µobs by K = O|F|,δ′,β(1) random

queries to P (x)− Q(x) such that

Pr

(∃r ∈ F :

∣∣µ0(r)− µobs(r)∣∣ > η

2|F|2

)6β

2.

Choosing h ∈ F such that µobs(h) is maximized, we have

• Pr(P (x) = Q(x) + h

)> 1|F| + η

2|F|2

• deg(Q(x) + h) = k.

Since P = t · P , the same also holds between P and t−1(Q + h). The probability of

correctness is now ensured by a union bound.

126

Remark 7.5.9. The randomness involved in using the Goldreich-Levin algorithm can be

removed. Note that at the expense of a greater running time, we can simply look at all the

Fourier coefficients of Γ since its domain is of constant size. Finally, Claim 7.5.7 can be

derandomized in the same way as Lemma 7.2.1, by using PRGs with seed length O(log n) for

fooling polynomials over Fn.

7.6 Applications n Low Characterstc

7.6.1 Computing a biased Polynomial

Having Lemma 7.4.9 and Lemma 7.4.11 in hand we immediately have the following analogue

of Theorem 7.4.5, which states that if a polynomial is biased then we can find a factor that

computes it.

Theorem 7.6.1 (Computing a biased polynomial). Let β ∈ (0, 1] be an error parameter.

There is a randomized algorithm that given a polynomial P : Fn → F of degree d such that

bias(P ) > δ > 0, runs in Oδ,β(nd) and with probability 1 − β, returns a polynomial factor

F = P1, ..., Pc(d,δ) of degree d− 1 such that P is measurable in F .

Proof. Let σd and γd be as in Lemma 7.4.11. By Lemma 7.2.1 since bias(P ) > δ, in time

Oσd,β(nd), with probability 1 − β2 we can find a polynomial factor F of degree d − 1 such

that

• P is σd2 -close to a function of F

• dim(F) 6 4|F|5δ2σdβ

By Lemma 7.4.9, with probability 1− β2 , we can find a strongly γd-unbiased factor F ′ with

degree bound ∆ such that F ′ is σd2 -close to being a refinement of F .

Thus with probability greater than 1 − β, P is σd-close to a function of the strongly

γd-unbiased factor F ′, and it follows from Lemma 7.4.11 that P is measurable in F ′.

127

The algorithm can be made deterministic in time Poly(n) (with a large exponent), as

indicated in the previous sections.

7.6.2 Worst Case to Average Case Reduction

Here we will show how Theorem 7.6.1 implies an algorithmic version of worst case to average

case reduction from [54]. To present the result, we first have to define what it means for a

factor to approximate a polynomial.

Definition 7.6.2 (δ-approximation). We say that a function f : Fn → F δ-approximates a

polynomial P : Fn → F if ∣∣∣∣ Ex∈Fn

[eF(P (x)− f(x)

)]∣∣∣∣ > δ.

Kaufman and Lovett use Lemma 7.4.4 to show the following reduction.

Theorem 7.6.3 (Theorem 3 of [54]). Let P (x) be a polynomial of degree k, g1, ..., gc poly-

nomials of degree d, and Λ : Fc → F a function such that composition Λ(g1(x), . . . , gc(x))

δ-approximates P . Then there exist c′ polynomials h1, . . . , hc′ and a function Γ : Fc′ → F

such that

P (x) = Γ(h1(x), . . . , hc′(x)).

Moreover, c′ = c′(d, c, k) and each hi is of the form p(x + a) − p(x) or gj(x + a), where

a ∈ Fn. In particular, if k 6 k − 1 then deg(hi) 6 k − 1 also.

Here we will design a randomized algorithm that given g1, . . . , gk can compute a set of

h1, . . . , hc′ efficiently.

Theorem 7.6.4 (Worst-case to average case reduction). Let δ, β ∈ (0, 1] be parameters.

There is a randomized algorithm that takes as input

• A polynomial P : Fn → F of degree d

128

• A polynomial factor F = P1, . . . , Pm of degree d− 1

• A function Λ such that Λ(F) δ-approximates P

and with probability at least 1 − β, returns a polynomial factor F ′ = R1, . . . , Rm′ and a

function Γ : Fdim(F ′) → F such that

P (x) = Γ(R1(x), . . . , Rm′(x)),

moreover c′ = Odim(F),β,δ,d(1).

Proof. Looking at the Fourier decomposition of eF(Λ)(y1, . . . , ym), since Λ(P1, . . . , Pm) δ-

approximates P , there must exist α = (α1, . . . , αm) ∈ Fc such thatQα(x) =∑i∈[m] αi·Pi(x),

δ′-approximates P , where δ′ > δ|F|m . We will estimate |bias(P − Qα))| for every α ∈ Fm.

For each α, we can distinguish bias(P −Qα)) 6 δ′2 from bias(P −Qα) > δ′, with probability

1− β3|F|m , by evaluating eF(P (x)−Qα(x)) on C = Odim(F)

(1δ′2

log( 1β ))

random inputs. Let

α∗ ∈ Fm be such that our estimate for bias(P −Qα) is greater than 3δ′4 .

With probability at least 1 − β3|F|m we will find such α∗, and by a union bound with

probability at least 1 − β3 , bias(P − Qα∗) > δ′

2 . Now applying Theorem 7.6.1 to P − Qα∗

with parameters β3 and δ′

2 , we find a polynomial factor F ′ = R1, . . . , Rm of degree d− 1,

such that with probability 1− β3 , P −Qα∗ is measurable in F ′. Namely there exists Γ′ such

that P −Qα∗(x) = Γ′(F ′(x)) and therefore

P (x) = Γ′(R1(x), . . . , Rm(x)) +Qa∗(x).

This algorithm can be derandomized in time Poly(n), using the remarks made in the

previous sections.

129

7.7 Algorithmic Inverse Theorem for Polynomials

The proof of the inverse theorem for Gowers norms (Theorem 2.1.10 [78]) is based on ergodic

theory and a pure combinatorial proof of the inverse theorem for general functions is not

yet known, but [78] give an analytical proof for the case of bounded degree polynomials. In

this section we will show how to conclude an algorithmic inverse theorem for polynomials

of bounded degree from the proof presented in [78]. Observing that one can generalize one

of the steps in the proof by [78] we manage to somewhat simplify the proof of the inverse

theorem for bounded degree polynomials by skipping a few of the steps.

Theorem 7.7.1 (Algorithmic Inverse for Polynomials). Let s > 0 be an integer, and let

ε, µ ∈ (0, 1] be parameters. There is a randomized algorithm that given a (non-classical)

degree-s polynomial P : Fn → T with ‖P‖Us > ε, runs in Oε,s,β(nd) and with probability

1− µ returns a polynomial Q with deg(Q) 6 s− 1 such that

∣∣〈e(P ), e(Q)〉∣∣ > δ(ε, s) > 0.

To prove this we first show how to compute such a polynomial by a lower degree poly-

nomial factor.

Theorem 7.7.2 (Computing polynomials with large Gowers norm). Let s > 0 be an integer

and let µ ∈ (0, 1] be a parameter. There is a randomized algorithm that given a (non-

classical) polynomial P : Fn → T with ‖P‖2sUs > ε, runs in Os,β(ns) and with probability

1− µ outputs the following

1. A polynomial Q : Fn → T with

(a) deg(Q) = s.

(b) P and Q have the same s-th derivative. In particular deg(P −Q) 6 s− 1.

130

2. Polynomials (possibly non-classical) Q1, . . . , QC : Fn → T for some C = Os,β,ε(1)

such that Q is measurable in Q1, . . . , QC.

We will prove this theorem in Section 7.7.2. Being able to algorithmically compute

functions of large Gowers norm allows us to algorithmically refine any polynomial factor to

a uniform factor, in the sense of Definition 2.2.16 in low characteristic. Notice that the two

obstacles in the proof of Lemma 7.3.1 were that we were not able to compute polynomials of

high Gowers norm in the low characteristic case and the reason was that we used (7.4) which

is not possible when |F| 6 d. Theorem 7.7.2 allows us to compute each linear combination of

polynomials in the factor that has large Gowers norm, and thus the following is a corollary

to Theorem 7.7.2.

Corollary 7.7.3 (Uniform Refinement in Low Characteristic). Suppose d is a positive integer

and ρ ∈ (0, 1] is a parameter. There is a randomized algorithm that, takes as input a factor

F of degree d over Fn, and a decreasing function γ : N→ R+, runs in time Oρ,γ,dim(F)(nd),

and with probability 1 − ρ outputs a γ-uniform factor F , where F is a refinement of F , is

of same degree d, and |dim(F)| σ,γ,dim(F) 1.

Notice that both F and F ′ are now (non-classical) polynomial factors. Theorem 7.7.1

follows as a corollary by a proof identical to that of Theorem 7.5.6. Theorem 7.7.2 and

Corollary 7.7.3 can be derandomized just as it was possible in the high characteristic case.

7.7.1 Derivative Polynomials as Symmetric Multilinear Polynomials

Let P : Fn → T be a degree-d polynomial, possibly non-classical. Define the derivative

polynomial DP : (Fn)d → T by the following formula

DP (h1, . . . , hd)def= Dh1

· · ·DhdP (x),

131

where h1, . . . , hd ∈ Fn. Lemma 2.1.6 shows some useful properties of the derivative polyno-

mial. Motivated by the properties of the derivative function, we define a class of polynomials.

Definition 7.7.4 (Classical Symmetric Multilinear Polynomials). For every d, denote by

HCSMd(Fn) the set of all polynomials Q : (Fn)s → F, s 6 d, which satisfy properties

(i), (ii), (iii) and (iv) from Lemma 2.1.6. In particular for every degree-d polynomial P ,

DP ∈ HCSMd(Fn). The reason we use the name “HCSM” is that CSM which was defined

in [78] is reserved for the subclass of classical symmetric multilinear polynomials which are

the derivative polynomials of classical polynomials.

Remark 7.7.5. It is observed in [78] that if P : Fn → T is a classical degree-d polynomial

then DP (h1, . . . , hd) vanishes whenever at least p of the h1, . . . , hd are equal. Let CSMd(Fn)

denote the set of polynomials P ∈ HCSMd(Fn) which satisfy this condition.

In what follows we shall study some explicit structural properties of CSMd, HCSMd, and

derivative polynomials.

Definition 7.7.6 (Concatenation). Let P ∈ HCSMk(Fn) and Q ∈ HCSMl(Fn), for integers

l, k > 1. Define the concatenation P ∗Q ∈ HCSMk+l(Fn) as

(P ∗Q)(y1, . . . , yk+l)def=

∑A⊆[k+l],|A|=k

P((yi)i∈A

)·Q((hj)j∈[k+l]\A

).

The following lemma shows how the derivative polynomial acts on product of two poly-

nomials.

Lemma 7.7.7 ([78]). Let P,Q : Fn → T be degree-k and degree-l polynomials respectively.

Then we have the following identity

D(PQ) = DP ∗DQ.

132

The following lemma by Tao and Ziegler shows that every polynomial in CSMd(Fn) has

a d-th root which is a classical polynomial.

Lemma 7.7.8 (Part of Lemma 4.5 in [78]). Let d > 1 be an integer, then for every Q ∈

CSMd(Fn) there is a degree-d classical polynomial P : Fn → T for which DP = Q.

It is not difficult to extend this to the more general case of HCSMd(Fn) and non-classical

polynomials.

Lemma 7.7.9. Let d > 1 be an integer, then for every Q ∈ HCSMd(Fn) there is a degree-d

(non-classical) polynomial P : Fn → T for which DP = Q.

Proof. We know that Q is a classical polynomial so it there is a unique representation of Q

as following

Q(h1, . . . , hd) = ι

∑i1,...,id∈[n]

ci1,...,idh1,i1 · · ·hd,id

,

where hk = (hk,1, . . . , hk,n), and ci1,...,id ∈ F are coefficients. As a consequence of symmetry

of Q, the coefficients ci1,...,id are symmetric with respect to permutations of (i1, . . . , id) and

thus Q must be an integer linear combination of expressions of the form

ι

∑(i1,...,id):i1,...,id=A

h1,i1 · · ·hd,id

(7.11)

where A is a multiset of cardinality d with elements from [n]. For j ∈ [n], let aj denote the

multiplicity of j in the set A. Using this we can rewrite (7.11) as

Q1 ∗Q2 ∗ · · · ∗Qn,

where Qj(y1, . . . , yaj ) = y1,jy2,j · · · yaj ,j . Assume that aj = rj + (p − 1)tj where 1 6 rj 6

133

p− 1. Define

Pj(x) =|xj |rj

ptj+1mod 1.

Notice that deg(Pj) = rj + (p − 1)tj = aj and thus by Lemma 2.1.6 DPj ∈ HCSMaj (Fn).

It is easy to see that DPj(y1, . . . , yaj ) only depends on variables y1,j , . . . , yaj ,j and finaly it

follows by multilinearity of DPj that there exists c ∈ F\0 such that

DPj(y1, . . . , yaj ) = c y1,j · · · yaj ,j .

Now letting P ′j := c−1Pj , it follows by Lemma 7.7.7 that

D(∏j∈[n]

P ′j) = DP ′1 ∗ · · · ∗DP′n = Q1 ∗Q2 ∗ · · · ∗Qn.

We shall define the symmetric power of a polynomial.

Definition 7.7.10 (Symmetric Power). Let Q be a polynomial in HCSMk(Fn) for k > 1.

Define the symmetric power Symm(Q) ∈ HCSMmk(Fn) for m > 1 by

Symm(Q)(y1, . . . , ymk)def=∑A

∏A∈A

Q((yi)i∈A

),

where the sum ranges over all partitions A of 1, . . . ,mk into m subsets of size k.

Lemma 6.2 in [78] shows that for every R ∈ CSMd(Fn) there is a classical degree md

polynomial for which DQ = SymmR. By slight modification of their proof we extend this

to HCSMd(Fn).

Lemma 7.7.11 (Symmetric Power Rule). Let d,m be integers and let R ∈ HCSMd(Fn)

be a classical symmetric multilinear polynomial. Then there exists a degree-md polynomial

134

Q : Fn → T such that

DQ = Symm(R).

Moreover if m > 1 then Q can be written as a function of a polynomial of degree 6 md− 1.

Proof. By Lemma 7.7.9 there is a (non-classical) degree-d polynomial P for which DP = R.

Assume that P has depth k > 0.

Let M > 0 be such that pM 6 m < pM+1. There exists a degree d+M(p−1) polynomial

PM : Fn → T for which pM · PM ≡ P mod 1. Notice that PM is not unique, as pMT ≡ 0

mod 1 for any polynomial T of depth at most M . We can pull PM to Z/pk+M+1Z and obtain

a degree-(d+M(p− 1)) polynomial PM : Fn → Z/pk+M+1Z for which P = PMpk+1 mod 1.

Let Q :=(PMm )pk+1 mod 1. We claim that

(i) deg(Q) 6 md.

(ii) DQ = Symm(R).

Parts (i) and (ii) together imply that deg(Q) = md. The last part of the theorem follows

from the fact that d+M(p−1) < md when m > 1 and that Q can be expressed as a function

of PM which is of degree d+M(p− 1).

Part (i) will be a special case of the following claim.

Claim 7.7.12. Let j > 0 and m′ 6 m be parameters. Then

deg

(Dh1···DhjPMm′

)pk+1

mod 1

6 d− j + (m′ − 1) ·max(d− j, 1).

Proof. We break the claim to two cases.

Case 1, (d−j+(m′−1)·max(d−j, 1) < 0): This in particular implies that m′ < j−d. We

need to show that(Dh1

···DhjPMm′

)

pk+1 mod 1 ≡ 0 or equivalently that(Dh1

···DhjPMm′

)is divisible

by pk+1. Notice that deg(PM ) = d+M(p−1) which implies that deg(Dh1· · ·DhjPM ) 6 d−

135

j+M(p−1) for any fixed choice of vectors h1, · · · , hj ∈ Fn. Thus pM−aDh1· · ·DhjPM = 0

for any choice of a for which d − j + a(p − 1) 6 0. This implies that Dh1· · ·DhjPM is

divisible by pa+k+1. It is not difficult to check that choosing a := bm′−1p−1 c we have that(Dh1

···DhjPMm′

)is divisible by pk+1 since m′ < p

bm′−1p−1 c+1

= pa+1.

Case 2, (d− j+ (m′−1) ·max(d− j, 1) > 0): We will prove this by downward induction

on j and upward induction on m′. It is sufficient to show that

deg

Dhj+1

(Dh1···DhjPMm′

)pk+1

mod 1

6 d− j + (m′ − 1) ·max(d− j, 1)− 1.

We will make use of the following identity(r+sm′)

=∑m′i=0

(ri

)( sm′−i

)and write

Dhj+1

(Dh1···DhjPMm′

)pk+1

=

∑m′i=1

(Dh1···Dhj+1

PMi

)(Dh1···DhjPMm′−i

)pk+1

, (7.12)

where the equalities are modulo 1. Now by the two strong induction hypotheses the degree

of each summand in the right-hand-side summation is at most

d− (j + 1) + (i− 1) max(d− (j + 1), 1) + d− j + (m′ − i) max(d− j, 1)

6 d− j + (m′ − 1) max(d− j, 1)− 1,

which concludes the claim.

Now (i) follows by choosing j = 0 and m′ = m. We move to (ii), namely we want to

prove that DQ = SymmDP = SymmR. Notice that

Dh

(PMm

)pk+1

=

∑mi=1

(DhPMi

)( PMm−i

)pk+1

mod 1,

and thus the terms with i > 2 have degree at most mk − i 6 mk − 2 and thus do not

136

contribute to DQ. Therefore

DQ(h1, . . . , hmd) = D

(Dhmd

(PMm

)pk+1

)(h1, . . . , hmd−1)

= D

(DhmdPM

pk+1

)∗D

(( PMm−1

)pk+1

)

=∑

16i1<···<id−1<md

DP (hi1 , . . . , hid−1, hmd)D

(PMm− 1

)(hj1 , . . . , hj(m−1)d

),

where 1 6 j1 < · · · < j(m−1)d < md are such that j1, . . . , j(m−1)d = 1, . . . ,md −

1\i1, . . . , id−1 and the second equality follows by Lemma 7.7.7. Now the claim DQ =

SymmDP follows by induction on m.

7.7.2 Computing Polynomials with Large Gowers Norm

In this section we will prove Theorem 7.7.2. The next theorem is an algorithmic general-

ization of Theorem 6.6 in [78] from CSM polynomials to the more general class of HCSM

polynomials. The theorem shows what kind of structure can be obtained from biased classical

symmetric multilinear polynomials.

Theorem 7.7.13. Let s > 0 be an integer and ε, β ∈ (0, 1] be parameters. Suppose that

P ∈ HCSMs(Fn) is such that bias(P ) > ε. Then there exists a bounded index subspace V of

Fn such that on V s, P is a linear combination (over F) of a bounded number of expressions

of the form

Symm1(Q1) ∗ · · · ∗ Symmr(Qr), (7.13)

for some m1, . . . ,mr > 1 and 2 6 k1, . . . kr 6 s− 1 and Qi ∈ HCSMki(V ) for i ∈ [r] with

m1k1 + · · ·+mrkr = s.

Moreover, there is a randomized algorithm that given such P , runs in Oβ,ε(ns) and with

137

probability 1− β outputs P as a linear combination of terms of form Eq. (7.13) explicitly.

We are given a polynomial P ∈ HCSMs(Fn) that is biased. We can use Theorem 7.6.1

to compute P with a few polynomials of degree 6 s− 1. We will show that this can be done

using only classical symmetric multilinear polynomials. This requires going into the proofs

of Lemma 7.2.1 and Lemma 7.4.9 and the following claim which states that a derivative of a

classical symmetric multilinear polynomial can be written as linear combination of classical

symmetric multilinear polynomials.

Claim 7.7.14. Let s > 0 be an integer, P ∈ HCSMs(Fn), and h = (h1, . . . , hs) ∈ (Fn)s.

Then the additive derivative of P in direction h, DhP , can be written as a linear combination

of 2s − 1 polynomials(QS)S⊂[s],S 6=∅, where QS ∈ HCSMs−1(Fn).

Proof. We can write

DhP (x1, . . . , xs) = P (x1 + h1, . . . , xs + hs)− P (x1, . . . , xs)

=∑S⊂[s]S 6=∅

P ((xi)i/∈S , (hi)i∈S)

where the second equality follows from multilinearity of P . Now letting

QS := P ((xi)i/∈S , (1n)s−|S|) we have that QS ∈ HCSMs−|S|(Fn) ⊆ HCSMs−1(Fn).

Now we are able to show that a biased HCSM polynomial can be computed by an HCSM

factor of lower degree.

Definition 7.7.15 (HCSM-Factors). Let d > 0 be an integer. A collection

F = Pi,j16i6d,16j6mifor natural numbers m1, . . . ,md with Pi,j ∈ HCSMi(Fn) is called

an HCSM-factor of degree d.

Definition 7.7.16 (Measurability). Let d > k > 0 and let F = Pi,j16i6k,16j6mibe an

HCSM-factor of degree k. A polynomial P ∈ HCSMd(Fn) is said to be measurable in F , if

138

there exists a collection

A = α = (iα, jα, Sα) : 1 6 iα 6 k, jα < Miα , Sα ⊆ [d], |Sα| = iα,

and a function Γ : FA → F such that

P (h1, . . . , hd) = Γ(Piα,jα

((h`)`∈Sα

): α ∈ A

)Claim 7.7.14 allows us to compute a biased HCSM polynomial by an HCSM-factor of

lower degree.

Corollary 7.7.17. Let β ∈ (0, 1] be an error parameter. There is a randomized algorithm

that given a polynomial P ∈ HCSMs(Fn) with bias(P ) > δ > 0, runs in Oδ,β,γ(nd) and with

probability 1− β, returns an HCSM-factor F = Pi,j16i6d−1,16j6miof degree d− 1 such

that P is measurable in F .

Proof. We observe that Lemma 7.2.1 which is used in the proof of Theorem 7.6.1 and in every

step of Lemma 7.4.9, approximates (and therefore computes) a polynomial by a set of its

derivatives. The corollary then follows by using Claim 7.7.14 at every step. The γ-unbiased

factor can be achieved using the similar modification of Lemma 7.2.3.

Definition 7.7.18 (Strongly γ-unbiased HCSM-factors). Let γ : N → R+ be a decreasing

function, and let d > k > 0. Suppose that F = Pi,j16i6d,16j6miis an HCSM-factor of

degree d and ∆ = δi,j ∈ [i]16i6d,16j6miis a collection of degree bounds. We say that F is

strongly γ-unbiased if for every i ∈ [d], Fi = Pi,j(h1, . . . , hi)16j6Mdis strongly γ-unbiased

with degree bounds ∆i = δi,j16j6miin the sense of Definition 7.4.6.

Lemma 7.7.19 (Computation by strongly γ-unbiased factors). Let β ∈ (0, 1] be an error

parameter, and let γ : N → R+ be a decreasing function. There is a randomized algorithm

that given a polynomial P ∈ HCSMd(Fn) with bias(P ) > δ > 0, runs in Oδ,β,γ(nd) and with

139

probability 1− β, returns a strongly γ-unbiased HCSM-factor F = Pi,j16i6d−1,16j6miof

degree d− 1 with degree bound ∆ = δi,j ∈ [i]16i6d−1,16j6misuch that P is measurable in

F .

Proof. By Corollary 7.7.17 we can find an HCSM-factor F = Pi,j16i6d−1,16j6misuch

that P is measurable in F . Now one can use Lemma 7.4.9 and Claim 7.7.14 to repeatedly

refine the factor until we obtain a strongly γ-unbiased HCSM-factor.

Now we are ready to prove Theorem 7.7.13.

Proof of Theorem 7.7.13: We are given a polynomial P ∈ HCSMs(Fn) that is biased.

Let γ : N→ R+ be a decreasing function to be specified later. By Lemma 7.7.19 with high

probability we can find a strongly γ-unbiased HCSM-factor F = Pi,j16i6s−1,16j6misuch

that P is measurable in F . Namely there exists a collection A = α = (iα, jα, Sα) : 1 6

iα 6 s− 1, 1 6 jα < Miα , Sα ⊆ [s], |Sα| = iα and a function Γ : FA → F such that

P (h1, . . . , hs) = Γ(Piα,jα

((h`)`∈Sα

): α ∈ A

)For technical reasons to be stated later linear polynomials are problematic for the argument

and we will pass to a bounded index subspace V of Fn where F has no linear polynomials.

Notice that Γ must be invariant under permutations of h1, . . . , hs. We will show that Γ is

moreover multilinear.

Claim 7.7.20 (HCSM Analogue of Proposition 8.1 from [78]). For a suitably fast decreas-

ing choice of the function γ : N → R+, Γ((xα)α∈A) is a linear combiantion (over F) of

monomials

xα1 · · ·xαr ,

where α` = (iα` , jα` , Sα`) for ` = 1, . . . , r are elements of A for which Sα1 , . . . , Sαr partition

1, . . . , s.

140

Proof. The proof will follow the lines of the proof from [78], except now the factor is an

HCSM-factor instead of CSM-factor and the factor is strongly γ-unbiased instead of regular.

We will deduce the equidistribution properties of the HCSM-factor from Lemma 2.2.19.

Split A = A3s ∪A 63s, where A3s is the set of α = (iα, jα, Sα) ∈ A for which s ∈ Sα and

A63s = A\A3s. We will split FA = FA3s×FA 63s in the natural way, namely for every x ∈ FA

we write x = (x3s, x63s) where x3s ∈ FA3s and x 63s ∈ FA 63s . We will show the following

linearity

Γ(x3s + y3s, x63s) = Γ(x3s, x63s) + Γ(y3s, x63s), (7.14)

for every x3s, y3s ∈ FA3s and x 63s ∈ FA 63s . To prove (7.14), by multilinearity of the poly-

nomials P and Pi,j for i ∈ [s − 1], j ∈ [mi], it suffices to find h1, . . . , hs, h′s ∈ Fn such

that

• Piα,jα((h`)`∈Sα) = x3s, for every α ∈ A3s,

• Piα,jα((h`)`∈Sα) = x 63s, for every α ∈ A63s, and

• Piα,jα((h`)`∈Sα,i 6=s, h′s) = y3s, for every α ∈ A3s.

The existence of such vectors h1, . . . , hd, h′d is implied by the claim below. Now we can write

Γ(x3s, x63s)+Γ(y3s, x63s) = P (h1, . . . , hs)+P (h1, . . . , hs−1, h′s) = P (h1, . . . , hs−1, hs+h′s)

= Γ

((Piα,jα

((h`)`∈Sα\s, hs + h′s

))α∈A3s

,(Piα,jα

((h`)`∈Sα

))α∈A 63s

)= Γ(x3s + y3s, x63s),

where the second equality is multilinearity of P and the last equality is multilinearity of

the polynomials Pi,j . The following claim shows that choosing a suitably fast decreasing γ

implies existence of vectors h1, . . . , hs, h′s.

141

Claim 7.7.21. Let s > 0. There is a sufficiently fast decreasing choice of γ : N → R+

such that for every degree (s − 1) γ-unbiased HCSM-factor F = Pi,j16i6s−1,16j6mithe

following holds. For every collection

A = α = (iα, jα, Sα) : iα ∈ [s− 1], jα ∈ [mi], Sα ⊆ [s+ 1], |Sα| = iα,

and collection of field elements (xα)α∈A, there exists h1, . . . , hs+1 ∈ Fn such that for every

α ∈ A,

Piα,jα((h`)`∈Sα

)= xα.

Proof. We will prove this by showing that the distribution of the vector(Piα,jα

((h`)`∈Sα

))α∈A for uniformly at random h1, . . . , hs+1 can be made arbitrarily close

to the uniform distribution over F|A|. To do this, it is sufficient to prove that for every

collection of coefficients λα ∈ Fα∈A we have

∣∣∣∣∣∣ Eh1,...,hs+1∈Fn

e

∑α∈A

λαPiα,jα((h`)`∈Sα

)∣∣∣∣∣∣ 6 γ(dim(F))2−s+1.

This will be a corollary to Lemma 2.2.19. Let β ∈ A be such that |Sβ | = iβ is maximized as

λβ is nonzero. Without loss of generality assume that Sβ = 1, . . . , k, where iβ = k. We

will derive PA(h1, . . . , hs+1)def=∑α∈A λαPiα,jα

((h`)`∈Sα

)in directions h1, . . . , hk. Notice

that for any α with Sα 6= Sβ , the summand corresponding to α does not depend on at least

one vector among h1, . . . , hk and will therefore vanish. Thus we have

142

DY1DY2· · ·DYkPA(h1, . . . , hs+1) =

∑α∈A:

Sα=1,...,k

λαDy1Dy2 · · ·DykPiα,jα(h1, . . . , hk)

=∑α∈A:

Sα=1,...,k

λαPiα(y1, . . . , yk),

where Y1, . . . , Yk ∈ (Fn)s+1 are vectors that are supported only on h1, . . . , hk respectively

and yi is the restriction of Yi to the hi coordinates. The second equality follows from the

fact that Pi,j are multilinear and classical. Now similar to Claim 7.4.8 we have that

∣∣∣∣∣∣ Eh1,...,hs+1∈Fn

e

∑α∈A

λαPiα,jβ((h`)`∈Sα

)∣∣∣∣∣∣2k

6

∣∣∣∣∣∣∣∣∣ Ey1,...,yk∈Fn

e

∑α∈A:Sα=Sβ

λαPiα,jα(y1, . . . , yk)

∣∣∣∣∣∣∣∣∣ 6 γ(|dim(F)),

where the first inequality is repeated Cauchy-Schwarz inequality and the second inequality

follows by F being γ-unbiased. Now the claim follows by the fact that k = iβ 6 s− 1.

By symmetry (7.14) implies that

Γ(x3j + y3j , x63j) = Γ(x3j , x63j) + Γ(y3j , x63j), (7.15)

for every j ∈ [s], where A3j and A 63j are defined analogously to A3s and A63s. This shows

that Γ is multilinear. Now we show that all monomials of Γ are of the form xα1 · · ·xαr where

Sα1 , . . . , Sαr partition 1, . . . , s. For an arbitrary α ∈ A, we will look at DyαΓ where yα is

only supported on xα. Let β ∈ A be such that Sβ intersects Sα, and let ` be an arbitrary

element of Sβ ∩ Sα. Using (7.15) for j = ` implies that DyαΓ does not depend on xβ .

143

This implies that all the monomials of Γ are of the form xα1 · · · xαr where Sα1 , . . . , Sαr are

disjoint subsets of [s]. The fact that [s] = ∪ri=1Sαr follows from multilinearity of Γ.

By Claim 7.7.20, we have

P (y1, . . . , ys) =∑

α1,...,αr

cα1,...,αr · Piα1 ,jα1((yi)i∈Sα1

) · · ·Piαr ,jαr ((yi)i∈Sαr ),

for field elements cα1,...,αr , where yis are vectors from Fn, α1, . . . , αr are collections of

elements of A for which Sα1 , . . . , Sαr partition [s]. Since P is invariant under permutations

of (y1, . . . , ys), so should be the coefficients cα1,...,αr and therefore we may split this sum into

orbits of the action of the permutation group, and conclude that P is a linear combination

of basic symmetric monomials

Symm1(Piβ1,jβ1

) ∗ · · · ∗ Symml(Piβ` ,jβ`),

with m1iβ1+ · · · + m`iα` = s. Note that iα1 , . . . , iα` > 2 because we had discarded all of

the linear polynomials from our factor by restricting to the bounded index subspace V . This

finishes the proof of the existence of such a decomposition. It is left to show that we can

find the decomposition algorithmically. This is simple using Lemma 7.2.5 for Γ, since the

factor is regular and provided that γ decreases suitably fast. Since Γ : FA → F, we need only

O|A|(1) queries to Γ to find explicit representation as a linear combination of monomials,

and this can be guaranteed with high probabality using Lemma 7.2.5.

Theorem 7.7.2 now follows as a corollary.

Proof of Theorem 7.7.2: By Lemma 2.1.6 we know that DP ∈ HCSMs(Fn) and

bias(DP ) > ε. By Theorem 7.7.13 we can find with probability 1− µ a bounded index V of

Fn and write DP as a linear combination (over F) of a bounded number of expressions of

144

the form

Symm1(Q1) ∗ · · · ∗ Symmr(Qr), (7.16)

for some m1, . . . ,mr > 1 and 2 6 k1, . . . kr 6 s− 1 and Qi ∈ HCSMki(V ) for i ∈ [r] with

m1k1 + · · ·+mrkr = s.

Now notice that by Lemma 7.7.9 for every Qi we can find a degree-ki polynomial Pi such

that DPi = Qi, and by Lemma 7.7.11 we can find a degree miki (non-classical) polynomial

Pi for which Symmi(DPi) = DPi. Moreover if mi > 2, Pi can be written as a function

of a lower degree polynomial P(i)M . Now notice that if r > 1 then by Lemma 7.7.7 for the

polynomial∏i Pi we have

D(r∏i=1

Pi) = Symm1(Q1) ∗ · · · ∗ Symmr(Qr),

and∏ri=1 Pi is measurable in P1, . . . , Pr. In the case when r = 1 then 2 6 k1 6 s − 1

implies that m1 > 2 which concludes the theorem over the subspace V .

Notice that if V = Fn we would be done. We have shown that we can find a degree-

s polynomial Q and degree 6 s − 1 (non-classical) polynomials Q1, . . . , QC such that the

statement of the theorem holds on a bounded index subspace V . We can extend the statement

to all of Fn by a simple derivative trick. Let h1, . . . , hK ∈ Fn be the presentatives of Fn/V ,

where K =|F|n|V | . Suppose that x is a point in Fn, then there is i ∈ [K] such that x−hi ∈ V .

This means that Q(x) = Q(x−hi)−Q(x)−Q(x−hi) = D−hiQ(x)−Q(x−hi). Now notice

that deg(D−hiQ) 6 s − 1 and since x − hi is in V , Q(x − hi) can be written as a function

of Q1, . . . QC , and thus Q is measurable in D−h1Q, . . . , D−hKQ ∪ Q1, . . . , QC.

145

7.8 Application: Polynomial Decompositions

Consider the following family of properties of functions over a finite field F of fixed prime

order p.

Definition 7.8.1. Given a positive integer k, a vector of positive integers

∆ = (∆1,∆2, . . . ,∆k) and a function Γ : Fkp → Fp, we say that a function P : Fn → F is

(k,∆,Γ)-structured if there exist polynomials P1, P2, . . . , Pk : Fn → F with each deg(Pi) 6

∆i such that for all x ∈ Fn,

P (x) = Γ(P1(x), P2(x), . . . , Pk(x)).

The polynomials P1, . . . , Pk are said to form a (k,∆,Γ)-decomposition.

Informally, a degree-structural property refers to a property from the family of (k,∆,Γ)-

structured properties. Our main result in this section is that every degree-structural property

can be decided in polynomial time:

Theorem 7.8.2. For every finite field F of prime order, positive integers d, k, every vec-

tor of positive integers ∆ = (∆1,∆2, . . . ,∆k) and every function Γ : Fk → F, there is a

deterministic algorithm Ak,∆,Γ that takes as input a polynomial P : Fn → F of degree d,

runs in time polynomial in n, and outputs a (k,∆,Γ)-decomposition of P if one exists while

otherwise returning NO.

In [13], a similar result was obtained for the special case of d < |F|, using our algorithmic

Green-Tao regularity lemma that applies in the high characteristic case.

The algorithm itself is quite simple. Given a polynomial P : Fn → F, we first use

Corollary 7.7.3 to write it as a function of a uniform factor Q1, . . . , Qm of non-classical

polynomials, i.e. P (x) = G(Q1(x), . . . , Qm(x)) where m = O(1) and G : Tm → F. Now, the

proof technique of [13] (similar reasoning also in proof of Theorem 1.7 of [15]) shows that the

146

only way P can have a (k,∆,Γ)-decomposition is if there are functions G1, . . . , Gk : Tm → T

such that G(z1, . . . , zm) = Γ(G1(z1, . . . , zm), . . . , Gk(z1, . . . , zm)) and also, for every i ∈ [k],

Gi(Q1(x), . . . , Qm(x)) is a classical polynomial of degree at most ∆i. Since m = O(1),

there are only a constant number of possible G1, . . . , Gk, and so the whole algorithm runs

in polynomial time.

Another natural way to view the algorithm is as an induction on the dimension. The

proof shows that there exists i ∈ [n] such that P is (k,∆,Γ)-structured if and only if P |xi=0 is

(k,∆,Γ)-structured. We can keep doing this until the number of variables reaches a constant

(that is dependent only on F, k, ∆ and Γ), and moreover, at every step, in polynomial time,

we can check whether this base case has been reached. When the base case is reached, in

constant time, we can compute a (k,∆,Γ)-structure if one exists. Finally, the proof also

shows a way to lift up the recovered decomposition back to the original input on n variables.

Thus, the algorithm is another instance of the “restrict-solve-lift” paradigm that is used in

many algebraic algorithms, such as those for factoring.

147

CHAPTER 8

NORMS DEFINED BY LINEAR FORMS

In this chapter we study norms that are defined by averaging over a set of linear forms. The

results in this chapter are based on the joint work with Hamed Hatami and Shachar Lovett.

An example of such norms is the Gowers norms. Such norms can be defined over any abelian

group, but here similar to the rest of this thesis, we will only be interested in the case where

the group is Fn, where F = Fp for a prime p. For the most general averages that we could

define, we also allow the terms f(Li(X)) to have non-negative real valued powers.

Definition 8.0.3. Let L = L1, ..., Ll be a family of linear forms in Fk, and α, β ∈ Rl>0.

For a function f : Fn → C, define

tL,α,β(f)def= E

l∏i=1

f(Li(X))αi f(Li(X))βi

,where X ∈ (Fn)k is chosen uniformly at random. Letting T := (L, α, β), for a function

f : Fn → C we write

‖f‖Tdef= tL,α,β(f)1/|α,β|,

where |α, β| def=∑li=1(αi + βi).

Remark 8.0.4. For ‖·‖T to be well defined over complex valued functions, we will require

αi − βi to be an integer for every i. Notice that in the study of real functions we will not

have conjugations, namely we can assume without loss of generality that β = 0 and we will

not make any further assumptions about α.

The next theorem to be proved in Section 8.4 states that in order for tL,α,β to be a norm,

α and β must be one of two specific forms.

148

Informal Statement (Theorem 8.2.2). Let T = (L, α, β) be such that ‖·‖T is a norm.

Then either there is an s > 1 such that

tL,α,β(f) = E

l∏i=1

|f(Li(X))|s/2 ,

or for every i, αi, βi = 0, 1. We call norms of the former case type (I) norms and

norms of the latter case type (II) norms.

In order to prove Theorem 8.2.2, in Section 8.3 we develope tools such as Gowers type

Cauchy-Schwarz inequality and a Holder type inequality for ‖·‖T norms. This approach

follows the same lines as [48].

Example. Gowers U2 norm over G→ R can be defined by (L, α, β), where

L = (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1), α = (1, 1, 1, 1), and β = (0, 0, 0, 0), and therefore it

is a type (I) norm. Namely

‖f‖U2 = ‖f‖L,M.

Later we investigate the properties of type (I) and type (II) norms over the space of

bounded valued complex functions. Lemma 8.5.2 is a simple observation that type (II) norms

are equivalent to L1 norm. Let D denote the complex unit disc D def= c ∈ C : |c| 6 1.

Informal Statement (Lemma 8.5.2). Let T = (L, α, β), be such that ‖·‖T is a type (II)

norm. Then for every g : Fn → D we have

• ‖g‖1 6 ‖g‖T , and

• ‖g‖T 6 ‖g‖(1/|α,β|)1 .

In Theorem 8.4.3 we prove that every type (I) norm must be invariant under affine

transformations, which means we can assume that the first coordinate of the linear forms

defining the norm is 1. This allows us to find a positive integer z = z(L,M) for which we

prove that every type (I) norm is essentially equivalent to the Gowers norm ‖·‖Uz+1 .

149

Informal Statement (Theorem 8.5.4). Let T = (L, α, β) be such that ‖·‖T is a type (I)

norm and L ∪M is of complexity 6 p. Then there exist an integer z and functions δ1, δ2 :

R>0 → R>0 such that for every n and f : Fn → D,

i. If ‖f‖T 6 ε then ‖f‖Uz+1 6 δ1(ε), and

ii. If ‖f‖Uz+1 6 ε then ‖f‖T 6 δ2(ε).

8.1 Notation

Recall the definition of type (I) and (II) norms from the statement of Theorem 8.2.2 as

stated in the introduction. Since a type (II) norm is defined only by L and s, from now on

we denote a type (II) norm by ‖·‖T , where T = (L, s). In the case of type (I) norms, namely

when for all i, αi, βi = 0, 1, we drop the α and β and work with two collections of linear

forms L = L1, ..., Ll and M = M1, ...,Mm and write

tL,M(f) = EX∈(Fn)k

l∏i=1

f(Li(X))m∏i=1

f(Mi(X))

,moreover for T = (L,M) define

‖f‖T = tL,M(f)1/(l+m).

From now on by T = (L, α, β) we mean that L = L1, ..., Ll is a family of linear

forms and α, β ∈ Rl>0. Also by T = (L,M), we mean that L = L1, ..., Ll and M =

M1, ...,Mm are families of linear forms over k variables where k is a fixed positive integer.

Definition 8.1.1. We will define rL,M(f) and rL,α,β(f) similar to tL,M and tL,α,β,

rL,M(f)(X)def=

l∏i=1

f(Li(X))m∏i=1

f(Mi(X))

,150

and

rL,α,β(f)(X)def=

l∏i=1

f(Li(X))αi f(Li(X))βi

.This allows one to simply write

tL,α,β(f) = EX∈(Fn)k

[rL,α,β(f)(X)

],

and

tL,M(f) = EX∈(Fn)k

[rL,M(f)(X)

].

8.2 Norms defined by linear forms

Let L = L1, ..., Ll be a system of linear forms on k variables. Analytic averages such as

E[∏

i f(Li(X))]

have been subject of interest in several works, see for example [51, 37]. In

this section we will consider a generalized version of such averages and study these averages in

the cases when they obey norm properties. Namely letting T = (L, α, β), where α, β ∈ Rl>0,

we study norms of the following form:

‖f‖∑αi+

∑βj

T = Ex∈(Fn)k

l∏i=1

f(Li(X))αil∏

i=1

f(Li(x))βi

. (8.1)

We will assume that for every i, αi + βi 6= 0, since otherwise Li is redundant.

Remark 8.2.1. For ‖·‖T in Eq. (8.1) to be well defined over complex functions, we need to

assume that for all i ∈ [l], αi − βi ∈ Z. Notice that in the case of real functions there is no

need for β and T = (L, α) would be sufficient to define such norms, where α ∈ R>0.

In the following theorem we show that for ‖·‖T to be a norm, T has to be one of only

two types.

151

Theorem 8.2.2. Let T = (L, α, β) be such that ‖·‖T is a norm. Then one of the following

holds

(i) For every i ∈ [l], αi, βi = 0, 1, or

(ii) There exists s > 1 such that for every i ∈ [l], αi = βi = s/2.

We call norms of the former case type (I) norms and norms of the latter case, type (II)

norms.

We prove the above theorem by extending ideas from the study of hypergraph normings

by the first author [48]. One of the main tools that is used in [48] to prove a characterization

theorem for hypergraph normings is certain Holder type inequalities for such norms. We

prove Theorem 8.2.2 by showing that similar Holder type inequalities also hold for norms

defined by systems of linear forms. We state and prove these inequalities in Section 8.3.

Finally we present the proof of Theorem 8.2.2 in Section 8.4.

8.3 Holder type inequalities

Definition 8.3.1. Let f1, f2 : Fn → C be two functions. The tensor of f1 and f2, denoted

by f1 ⊗ f2 : Fn × Fn → C is defined as follows

f1 ⊗ f2(x1, x2) := f1(x1) · f2(x2).

Moreover by f⊗l we mean the l-th tensor power of f , f⊗l def= f ⊗ f ⊗ . . .⊗ f : Fnl → C.

It is a simple observation that || · ||T norms are multiplicative with respect the above

tensor products.

Claim 8.3.2. Let T = (L, α, β). For functions f1, f2 : Fn → C we have

tL,α,β(f1 ⊗ f2) = tL,α,β(f1) · tL,α,β(f2).

152

Proof. This follows from the independence of entries of x, and the fact that conjugation is

distributive over both addition and multiplication.

A very similar argument shows that

E

l∏i=1

fi ⊗ fi(Li(X))αigi ⊗ gi(Li(X))βi

=

∣∣∣∣∣∣E l∏i=1

fi(Li(X))αigi(Li(X))βi

∣∣∣∣∣∣2

(8.2)

Theorem 8.3.3 (Holder Type Inequality). Let T = (L, α, β). If ‖·‖T is a semi-norm, then

for every a ∈ [l], if αa 6= 0 then

∣∣∣∣∣∣Eg(La(X)) ·

l∏i=1

f(Li(X))αi−ea(i)f(Li(X))βi

∣∣∣∣∣∣ 6 ‖f‖|α,β|−1T ‖g‖T , (8.3)

and if βa 6= 0 then

∣∣∣∣∣∣Eg(La(X)) ·

l∏i=1

f(Li(X))αi f(Li(X))βi−ea(i)

∣∣∣∣∣∣ 6 ‖f‖|α,β|−1T ‖g‖T , (8.4)

where ea is the vector which is equal to 1 on the a-th entry and 0 everywhere else. Moreover

if there exists a ∈ [l] such that at least one of Eq. (8.3) or Eq. (8.4) holds for every pair of

functions f and g, then ‖·‖T satisfies the triangle inequality.

Proof. We first prove the fact that each of Eq. (8.3) or Eq. (8.4) implies the triangle inequal-

ity. Assume that f, g : Fn → C and that Eq. (8.3) holds for a. Then

tL,α,β(f + g) = E

(g(La(X)) + f(La(X))) ·l∏

i=1

(f + g)(Li(X))αi−ea(i)(f + g)(Li(X))βi

6 tL,α,β(f + g)1−1/|α,β| (‖f‖T + ‖g‖T )

which simplifies to the triangle inequality for ‖·‖T . The proof for Eq. (8.4) is identical.

153

For the other direction, assume f, g : Fn → C are functions with ‖f‖T , ‖g‖T 6∞. Since

‖·‖T is a semi-norm, for every δ > 0

‖f + δg‖T 6 ‖f‖T + δ‖g‖T ,

thus

d‖f + δg‖Tdδ

∣∣∣∣∣0

6 ‖g‖T . (8.5)

Now expanding the derivative

d‖f + δg‖Tdδ

=

‖f + δg‖1−|α,β|T

|α, β|·E

l∑i=1

(αig(Li(x))rL,α−ei,β(f + δg)(x) + βig(Li(x))rL,α,β−ei(f + δg)(x)

) .Thus by Eq. (8.5)

‖f‖1−|α,β|T

|α, β|· E

l∑i=1

(αig(Li(x))rL,α−ei,β(f)(x) + βig(Li(x))rL,α,β−ei(f)(x)

) 6 ‖g‖T ,which can be rearranged to

1

|α, β|· E

l∑i=1

(αig(Li(x))rL,α−ei,β(f)(x) + βig(Li(x))rL,α,β−ei(f)(x)

)6 ‖f‖|α,β|−1

T ‖g‖T . (8.6)

Replacing f and g by (f ⊗ f)⊗m and (g ⊗ g)⊗m for an integer m > 0, by Claim 8.3.2 and

154

Eq. (8.2) we have

1

|α, β|·

l∑i=1

(αi∣∣E[g(Li(x))rL,α−ei,β(f)(x)]

∣∣2m + βi∣∣E[g(Li(x))rL,α,β−ei(f)(x)]

∣∣2m)6(‖f‖|α,β|−1

T ‖g‖T)2m

.

The proof follows from the fact that this inequality holds for all m > 0 and that

|α, β| =l∑

i=1

αi +m∑i=1

βi.

One simple corollary to the above theorem is that the absolute value of the average of a

function f is bounded by its ‖·‖T norm.

Corollary 8.3.4. Let T = (L, α, β) be such that ‖·‖T is a norm. For every g : Fn → C we

have

|E[g]| 6 ‖g‖T .

Proof. Define f(x) = 1 for all x ∈ Fn. Assume there exists i ∈ [l] such that αi 6= 0, then we

have

|E[g(x)]| =∣∣∣E [g(Li(x))rL,α−ei,β(f)(x)

] ∣∣∣ 6 ‖g‖T ,where the inequality follows from Theorem 8.3.3. The case when for all i ∈ [l], αi = 0 is

handled similarly using the second inequality from Theorem 8.3.3.

Using the above claim we have the following corollary to Theorem 8.3.3.

155

Corollary 8.3.5. Let T = (L, α, β) be such that ‖·‖T is a norm. Then for every i ∈ [l],

αi + βi > 1.

Proof. Assume for contradiction that αi + βi < 1. By Eq. (8.3) we have

∣∣E [g(Li(X)) · rL,α−ei,β(f)(X)] ∣∣ 6 ‖f‖|α,β|−1

T ‖g‖T . (8.7)

The next claim shows that there is n and Y ∈ (Fn)k such that Li(Y ) 6= Lj(Y ) for all

j 6= i.

Claim 8.3.6. Let L = L1, ..., Ll be a family of linear forms over k variables. For every

i ∈ [l], there exists an integer n and X ∈ (Fn)k such that

Li(X) 6∈Lj(X) : j 6= i

.

Proof. For every j 6= i, letAj denote the set of solutions to Li(X) = Lj(X). Since Li−Lj 6= 0

we have that |Aj | 6 pn(k−1). Thus

∣∣∣∣∣∣⋃j 6=i

Aj

∣∣∣∣∣∣ 6 l · pn(k−1),

which for large enough n is strictly less than∣∣∣(Fn)k

∣∣∣ = pnk.

Let ω = Li(Y ), and for ε > 0 define f : Fn → C as follows

f(x) =

ε x = ω

1 Otherwise

Defining g(x) := 1 we have ‖f‖T 6 1 and ‖g‖T = 1. One the other hand

156

∣∣E [g(Li(X)) · rL,α−ei,β(f)(X)] ∣∣ > εαi+βi−1/pnk.

Choosing ε small enough this contradicts Eq. (8.7).

The next lemma is an analogue of Gowers Cauchy-Schwarz, for averages over arbitrary

linear forms that satisfy the norm properties.

Lemma 8.3.7 (Cauchy-Schwarz Type Inequality). Let T = (L,M), such that ‖·‖T is a

semi-norm. Then for every set of m+ l functions f1, ..., fl, g1, ..., gm : Fn → C, we have

∣∣∣∣∣∣ EX∈(Fn)k

l∏i=1

fi(Li(X))m∏i=1

gi(Mi(X))

∣∣∣∣∣∣ 6l∏

i=1

‖fi‖Tm∏j=1

‖gj‖T .

Proof. Assume for contradiction that there exist functions f1, ..., fl, g1, ..., gm such that the

inequality does not hold. One can normalize the functions such that

∣∣∣∣∣∣ EX∈(Fn)k

l∏i=1

fi(Li(X))m∏i=1

gi(Mi(X))

∣∣∣∣∣∣ = c > 1,

and for every i ∈ [l], tL,M(fi) 6 1, and for every j ∈ [m], tL,M(gj) 6 1. Let h be such

that c2h/(m+l) > m+ l. Taking h-th tensor power of fi ⊗ fi and gi ⊗ gj ,

tL+M(l∑

i=1

(fi ⊗ fi)⊗h +m∑i=1

(gi ⊗ gi)⊗h) =

E

[l∏

i=1

l∑j=1

(fi ⊗ fi)⊗h(Li(X)) +m∑j=1

(gi ⊗ gi)⊗h(Li(X))

·m∏i=1

l∑j=1

(fi ⊗ fi)⊗h(Mi(X)) +m∑j=1

(gi ⊗ gi)⊗h(Mi(X))

],

157

which by expanding is equal to

∑σ,π

∣∣∣∣∣E[ ∏i:σ(i)6m

(fσ(i))⊗h(Li(X))

∏i: σ(i)>m

(gσ(i))⊗h(Li(X))·

∏i:π(i)6m

(fπ(i))⊗h(Mi(X))

∏i: π(i)>m

(gπ(i))⊗h(Mi(X))

]∣∣∣∣∣2

,

where the sum is over σ ∈ [m+l]l and π ∈ [m+l]m. Using Claim 8.3.2, and Eq. (8.2) it is easy

to see that this value is greater than or equal to c2h. On the other hand∣∣∣∣∣∣(fi ⊗ fi)⊗h∣∣∣∣∣∣

T6 1,

and∣∣∣∣∣∣(gi ⊗ gi)⊗h∣∣∣∣∣∣

T6 1. This contradicts the triangle inequality as

∣∣∣∣∣∣∣∣∣∣∣∣l∑

i=1

(fi ⊗ fi)⊗h +m∑i=1

(gi ⊗ gi)⊗h∣∣∣∣∣∣∣∣∣∣∣∣T

> c2h/(m+l) > m+ l

>l∑

i=1

∣∣∣∣∣∣(fi ⊗ fi)⊗h∣∣∣∣∣∣T

+m∑i=1

∣∣∣∣∣∣(gi ⊗ gi)⊗h∣∣∣∣∣∣T.

Lemma 8.3.8 (Cauchy-Schwarz type inequality for nonnegative functions). Let

T = (L, α, β) be such that ‖·‖T is a norm, and Tii∈[m] with Ti = (L, αi, βi), be such that

for each i, j, αij + βij > 0. Moreover let f1, ..., fm : Fn → R>0, then we have

EX

[m∏i=1

rL,αi,βi(fi)(X)

]6

m∏i=1

‖fi‖|αi,βi|T

Proof. The proof is very similar to the proof of Lemma 8.3.7, with the difference that we

use the boost up argument using f⊗hi for large enough h.

8.4 A characterization theorem

Now we have the necessary tools to prove the following characterization theorem.

158

Theorem 8.2.2 (restated). Let T = (L, α, β) be such that ‖·‖T is a norm. Then one of

the following holds

(i) For every i ∈ [l], αi, βi = 0, 1, or

(ii) There exists s > 1 such that for every i ∈ [l], αi = βi = s/2.

Proof. Assume a ∈ [l] and b ∈ [l] contradict the statement of the Theorem, namely

αa, βa 6= 0, 1 and αb 6= βb. Moreover assume that αb > βb. The case when βb > αb can

be handled in a similar fashion. Let ε be a sufficiently small constant. The following lemma

states that for large enough n there exists h : Fn → −1, 1 with ‖h‖T 6 ε and∣∣Ehc

∣∣ 6 ε,

where c = αb − βb.

Lemma 8.4.1. Let T = (L, α, β) be such that ‖·‖T is a norm, and αi 6= βi for some i. For

every ε > 0, there exists n > 0 and h : Fn → (−1)1/|αi−βi|, 1 such that

• ‖h‖T 6 ε, and

• Letting g(x) = h(x)|αi−βi|, we have∣∣E[g]

∣∣ 6 ε.

We defer the proof of this Lemma to later in the section. Define f as

f(x) :=

1 h(x) = 1

0 Otherwise

Since∣∣Ehc

∣∣ 6 ε we have that ‖f‖1 > 1/3. By definition of f and the fact that either

αa − 1 6= 0 or βa 6= 0 we have

∣∣∣EXh(Li(X)) · rL,α−ea,β(f)(X)

∣∣∣ = tL,α,β(f) > 3−|α,β|, (8.8)

where the inequality follows from the following simple observation.

159

Claim 8.4.2. Let T = (L, α, β) be such that ‖·‖T is a norm. Then for every f : Fn → R>0

we have

‖f‖T > ‖f‖1

Proof. This is an immediate corollary to Theorem 8.3.3. Letting g := 1, we have ‖g‖|α,β|−1T =

1, and

‖f‖1 =∣∣E [f(L1(X)) · rL,α−e1,β(g)(X)

]∣∣ 6 ‖f‖T .

On the other hand since αa 6= 0, by Theorem 8.3.3 we have

∣∣∣EXh(Li(X)) · rL,α−ea,β(f)(X)

∣∣∣ 6 ‖h‖T 6 ε,

which for small enough ε contradicts Eq. (8.8).

Now we prove that if α = β then there is a universal s such that for every i ∈ [l],

αi = βi = s/2. Letting

s := maxiαi + βi,

we will prove that ‖·‖T ′ , for T ′ = (L, αs ,βs ), is a norm. This combined with Corollary 8.3.5

proves our claim. To prove that ‖·‖T ′ is a norm we will use the reverse direction of Theo-

rem 8.3.3 for a such that αa + βa = s. Notice that

∣∣∣E [g(La(X))rL,αs−ea,βs(f)(X)

] ∣∣∣ 6 E[rL,s·ea,0(|g|1/s)(X) · rL,α−s·ea,β(|f |1/s)(X)

]6 ‖|f |1/s‖|α,β|−sT ‖|g|1/s‖sT = ‖f‖

∣∣∣αs ,βs ∣∣∣−1

T ′ ‖g‖T ′ ,

where the last inequality follows from Lemma 8.3.8. The reason we could exchange from f

160

to |f | and back is that for every i, αi = βi.

Proof of Lemma 8.4.1. Set a := |αi− βi|. Let Ai be the set of vectors X such that Li(X) is

equal to some Lj(X), namely

A :=X ∈ (Fn)k : (∃j 6= i)

(Li(X) = Lj(X)

).

As we saw in the proof of Corollary 8.3.5,

|A| 6 l · pn(k−1). (8.9)

Let h : Fn → (−1)1/a, 1 be a random function which is a result of an independent Bernouli

process. Since ‖·‖T is a norm, then for every h we know tL,α,β(h) ∈ R>0. We will look at

the following expected value

EhtL,α,β(h) = E

hEXrL,α,β(h)(X)

= Eh

EXrL,α,β(h)(X)1A(X) + E

hEXrL,α,β(h)(X)1A(X).

By Eq. (8.9) we have

Eh

EXrL,α,β(h)(X)1A(X) 6

l

pn.

Moreover

Eh

EXrL,α,β(h)(X)1A(X) = 0,

which follows from the independence of h(Li(X)) from the other terms. Thus

EhtL,α,β(h) 6 l · p−n.

161

Moreover by Chernoff bound we have that

Prh

(∣∣Exh(x)a

∣∣ > p− 1

n1/3

)6 2e

− 1

2n1/3 .

Therefore for large enough n there exists h : Fn → −1, 1 with both ‖h‖T 6 ε and∣∣Ex h(x)∣∣ 6 ε.

Here we prove further that all type (I) norms must be invariant under affine transforma-

tions.

Theorem 8.4.3 (Affine Transformations). Every type (I) norm ‖·‖T defined by a system of

linear forms is invariant under affine transformations.

Proof. For b ∈ Fn define

f+b(x)def= f(x+ b).

We want to show that ‖f‖T = ‖f+b‖T . For every t ∈ F and j ∈ [n] define

fjt (X)

def= f(x+ tej),

then for every α ∈ F by Cauchy-Schwarz type inequality we have

‖f jt ‖mT 6

p−1∏i=0

‖f jt+αi‖ciT , (8.10)

where ci is the number of linear forms La such that La,1 = i plus the number of linear

forms Mb such that Mb,1 = i. Notice that this inequality holds for every t and α, thus for

every f one can move the fjt with maximum ‖f jt ‖T to the left hand side of Eq. (8.10), which

implies invariance of norm under transformations of type tej . Since this holds for every j,

‖·‖T is invariant under any affine transformation.

162

8.5 Topology

Theorem 8.2.2 states that norms defined by linear forms can only be of two types, type

(I) and type (II). In this subsection we study the topology that such norms induce on

bounded functions on finite dimensional vector spaces. We prove every type (I) norm || · ||T

is essentially equivalent to Gowers norm of a proper order, where the order depends only on

the system of linear forms defining || · ||T . We also show that type (II) norms are equivalent

to L1 norm and thus Lp norms. For a formal statement we have to first define what we

mean by equivalence of norms over bounded functions.

Definition 8.5.1 (Equivalence of Norms on Bounded Functions). We say that two norms

‖·‖A and ‖·‖B are equivalent on the space of bounded functions Fn → D, n < ∞, if for

every sequence of functions fi : Fni → Di,ni∈N we have

limi→∞‖fi‖A = 0 if and only if lim

i→∞‖fi‖B = 0.

Type (II) norms turn out to be easy to handle. The following lemma relates type (II)

norms to the L1 norm.

Lemma 8.5.2. Let T = (L, s), be such that ‖·‖T is a type (II) norm. Then for every

g : Fn → D we have

• ‖g‖1 6 ‖g‖T , and

• ‖g‖T 6 ‖g‖(1/ls)1 .

Proof. The first part follows from Theorem 8.3.3 letting f(x) := 1 for all x ∈ Fn, and the

second part is a simple consequence of the fact that ‖g‖∞ 6 1.

This immediately implies the following corollary.

163

Corollary 8.5.3. Let T = (L, s), be such that ‖·‖T is a type (II) norm. Then ‖·‖T is

equivalent to L1 norm on bounded complex functions.

Let || · ||T be a type (I) norm. We show that there exists some integer z > 0 such that for

every bounded function f , ‖f‖T is small if and only if ‖f‖Uz+1 is small. This is the statement

of the next theorem which we will prove in Section 8.6. Unfortunately for technical reasons

we have to make an assumption that p, the characteristic of the underlying field, is larger

than the complexity of the system of linear forms defining T .

Theorem 8.5.4. Let T = (L,M) be such that ‖·‖T is a type (I) norm and L ∪ M is

of Cauchy-Schwarz complexity s 6 p. Then there exist an integer z 6 s and functions

δ1, δ2 : R>0 → R>0 such that for every n and f : Fn → D,



This in particular implies that ‖·‖T is equivalent to Gowers’ Uz+1 norm.

Corollary 8.5.5. Let T = (L,M), be such that ‖·‖T is a type (I) norm. Then there exists

z > 0 such that ‖‖T is equivalent to Gowers’ Uz+1 norm on the space of bounded complex

functions.

8.6 Type (I) norms

In the case of type (I) norms, we may write T = (L,M) where |L| = l, and |M | = m,

defining the norm by

tL,M(f) = E

l∏i=1

f(Li(X))m∏i=1

f(Mi(X))

,and

164

‖f‖m+lT = tL,M(f).

By Theorem 8.4.3 we know that that L and M are systems of affine linear forms. The-

orem 8.5.4 states a relation between type (I) norms and Gowers’ Uz+1 norm of a function

for some integer z > 0. Here we will show how z is derived for two given systems of linear

forms L,M and in Lemma 8.6.2 and Corollary 8.6.3 will show that in the case when the

linear forms are affine such z(L,M) exists.

Definition 8.6.1. Let T = (L,M). Define z(L,M) as maximum d > 1 for which

L⊗d1 + L⊗d2 + · · ·+ L⊗dl −M⊗d1 −M⊗d2 − · · · −M⊗dm = 0 mod p,

where L⊗di is the d-th tensor power of Li. Existence of such z is the subject of Corollary 8.6.3.

Lemma 8.6.2. Let L1, ..., Lm ∈ Fk be a set of distinct linear forms. Assume that α ∈ Fm\0.

There exists z such that ∑i

αiL⊗zi 6= 0 mod p.

Proof. Let αL = αi if L = Li and αL = 0 otherwise, where L ranges over all linear forms over

k variables, namely Fk. So α ∈ Fpk and if no such z exists then∑L∈Fk αL

∏kj=1 L

zjj = 0

for all w ∈ Fk. Let M be the pk × pk matrix over F defined as

ML,w =k∏j=1

Lzjj .

(note that we also allow w = (0, 0, ..., 0); and 00 = 1, 0a = 0 for 1 6 a 6 p− 1). Then if no

such z exists then Mα = 0. However, we can factor M as the k-th tensor power of the p× p

matrix M ′ given by (M ′)a,b = ab mod p. Then M ′ is a Vandermonde matrix and hence

det(M ′) 6= 0. So also det(M) = det(M ′)k 6= 0, hence we must have α = 0.

165

Corollary 8.6.3. Let L1, ..., Ll,M1, ...,Mm ∈ Fk be a set of distinct affine linear forms.

There exists a maximal z > 0 such that for every d 6 z,

∑i

L⊗di −∑j

M⊗dj ≡ 0 mod p.

Proof. Since

∑i

L⊗zi −∑j

M⊗zj ≡∑i

L⊗zi +∑j

(p− 1) ·M⊗zj mod p,

by Lemma 8.6.2 there exists a finite maximal z such that

∑i

L⊗zi −∑j

M⊗zj ≡ 0 mod p.

This completes the proof because for an affine linear form L, L⊗d is a sub-vector of L⊗z for

d 6 z .

Claim 8.6.4. If ‖·‖T is a norm then z(L,M) > 0, namely

l∑i=1

Li −m∑i=1

Mi ≡ 0 mod p.

Proof. Assume for contradiction that z(L,M) = 0. We look at the linear Fourier characters

χq, for q ∈ Fn, notice that since∑li=1 Li −

∑mi=1Mi 6≡ 0 mod p, thus ‖χq‖T = 0, unless

q = 0. This is in contradiction with ‖·‖T being a norm.

Theorem 8.5.4 (restated). Let T = (L,M) be such that ‖·‖T is a type (I) norm and

L∪M is of Cauchy-Schwarz complexity s 6 p. Then there exist an integer z = z(L,M) 6 s

and functions δ1, δ2 : R>0 → R>0 such that for every n and f : Fn → D,


166


Remark 8.6.5. Notice that z(L,M) 6 s 6 p, where s is the Cauchy-Schwarz complexity of

the system of linear forms. This is because for any s < d and a high enough rank polynomial

P of degree 6 d+1, we would have ‖eF(P )‖Us+1 6 ε. But at the same time ‖eF(P )‖L,M = 1

which would contradict Lemma 2.4.3.

We will prove Theorem 8.5.4 in two parts through the next two subsections, namely

Lemma 8.6.6 and Theorem 8.6.8.

8.6.1 Theorem 8.5.4, the Direct Part

Lemma 8.6.6 (Direct Theorem). Let T = (L,M) be such that ‖·‖T is a norm. For every

ε there is a δ1(ε) such that for every n, for every function f : Fn → D, with ‖f‖T 6 δ1(ε)

we have ‖f‖Uz(L,M)+1 6 ε.

Proof. Assume that ‖f‖T 6 ε, then for every polynomial q of degree d 6 z

∣∣〈f, q〉∣∣ 6 ‖f · q‖T = ‖f‖T 6 ε,

where the first inequality follows by Corollary 8.3.4 and the equality is the subject of the

following claim.

Claim 8.6.7. Let T = (L,M). Let f : Fn → D, and q be a non-classical polynomial of

degree d with d 6 z(L,M). We have

tL,M (f · eF(q)) = tL,M(f).

Proof. Since d 6 z(L,M)

167

rL,M(f · eF(q)) = rL,M(f)l∏

i=1

eF(q(Li(X))m∏i=1

eF(−q(Mi(X))

= rL,M(f) · eF

l∑i=1

q(Li(X))−m∑i=1

q(Mi(X))

= rL,M(f),

where the last equality is due to the fact that all the coefficients of monomials of∑li=1 q(Li(X))−

∑mi=1 q(Mi(X)) are zero since d 6 z(L,M).

We conclude by the inverse theorem for Gowers Norms (Theorem 2.1.10).

8.6.2 Theorem 8.5.4, the Inverse Part

Here we will prove the following inverse theorem.

Theorem 8.6.8 (Inverse Theorem). Let T = (L,M) be such that ‖·‖T is a norm. Assume

that L ∪M has Cauchy-Schwarz complexity d 6 p. For every ε there exists δ2(ε) such that

for every function f : Fn → D, with ‖f‖Uz(L,M)+1 6 δ2(ε) we have ‖f‖T 6 ε.

Proof. First of all by Theorem 8.4.3 we may assume that L and M are systems of affine

linear forms, namely we can assume that for every i ∈ [l] and j ∈ [m], Li,1 = 1 and Mj,1 = 1.

Following is a list of the constants we will be working with.

(1) Let z := z(L,M).

(2) Let δ′ = (ε/3)m+l.

(3) Let r(C) := τp,d

((ε/3)m+lp−C

), where τp,d(·) is the function in Lemma 2.2.20.

Let f : Fn → D be a function with ‖f‖Uz+1 6 δ2 = (ε/3)m+lp−C?, where C? =

Cmax(p, d, δ′, r(·)) is the constant from Theorem 2.3.1. By Theorem 2.3.1 and Remark 2.3.2,

we can write

f = f1 + f2,

168

such that

f1 = E(f |B), and ‖f2‖Ud+1 6 δ′,

where B is a polynomial factor of degree at most d, complexity C 6 Cmax(p, d, δ′, r(·)), and

rank at least r(C). Moreover B is defined by homogeneous polynomials.

By Lemma 2.4.3,

‖f2‖T 6 ‖f2‖1/m+lUs+1 6 ε/3. (8.11)

Now we will bound ‖f1‖T . Since f1 = Γ(P1, ..., PC), for a set of homogeneous polynomials

Pi ∈ Poly6s, we can write the degree s Fourier decomposition of f ,

f1(x) =∑γ∈FC

Γ(γ)eF

C∑i=1

γ(i)Pi(x)

.

Therefore

‖f1‖T =

∣∣∣∣∣∣∣∣∣∣∣∣∑γ∈FC

Γ(γ)eF

C∑i=1

γ(i)Pi(x)

∣∣∣∣∣∣∣∣∣∣∣∣T

6∑γ∈FC

∣∣∣Γ(γ)∣∣∣ ·∣∣∣∣∣∣∣∣∣∣∣∣eF(

C∑i=1

γ(i)Pi(X))

∣∣∣∣∣∣∣∣∣∣∣∣T

, (8.12)

where the last inequality is the triangle inequality and positive homogeneity for the ‖·‖T

norm. Looking at the terms in the RHS of Eq. (8.12), for each γ ∈ FC one of the following

holds:

(i) (∑Ci=1 γ(i)Pi(X)) is of degree 6 z.

(ii) (∑Ci=1 γ(i)Pi(X)) has multinomial of degree higher than z.

Case (i): By Remark 2.2.6 and the fact that ‖f‖Uz+1 6 (ε/3)m+lp−C?6 (ε/3)m+lp−C ,

using the direct theorem for Gowers norms we have that the terms satisfying (i) contribute

at most (ε/3)m+l to the RHS of Eq. (8.12).

169

Case (ii): Since the degree of such Polynomials is higher than z, by Lemma 2.2.20 we have

l∑j=1

C∑i=1

γ(i)Pi(Lj(X))−m∑j=1

C∑i=1

γ(i)Pi(Mj(X)) 6≡ 0,

and that the terms satisfying this condition contribute at most (ε/3)m+l to the RHS of

Eq. (8.12).

Finally adding everything up and using the triangle inequality we conclude

‖f‖T 6 ‖f1‖T + ‖f2‖T 6 2(ε/3)m+l + ε/3 6 ε,

since m+ l > 1.

170

CHAPTER 9

CONCLUSION

Algebraic property testing. In Chapter 6 we proved the following characterization the-

orem for affine invariant properties defined by induced affine constraints.

Theorem 6.4.5 (restated). For any integer d > 0 and (possibly infinite) fixed collection

A = (A1, σ1), (A2, σ2), . . . , (Ai, σi), . . . of induced affine constraints, each of complexity

6 d, there are functions qA : (0, 1)→ Z+, δA : (0, 1)→ (0, 1) and a tester T which, for every

ε > 0, makes qA(ε) queries, accepts A-free functions and rejects functions ε-far from A-free

with probability at least δA(ε). Moreover, qA is a constant if A is of finite size.

This implies that every locally characterized affine-invariant property is proximity-obliviously

testable (Theorem 6.1.1), however in the case of testability, our main theorem implies that

every subspace hereditary property of bounded complexity is testable (Theorem 6.3.3). This

leaves the following conjecture of [17] open for the case of unbounded complexity.

Conjecture 6.3.2 [17] (restated). Every subspace hereditary property is testable.

Affine invariant properties of unbounded complexity are those which can only be defined

by an infinite collection of induced affine constraints with unbounded Cauchy-Schwarz com-

plexity. We are not aware of any natural property of unbounded complexity, and moreover

suspect that the answer to the above conjecture should be in the negative.

Another interesting question is whether Theorem 6.4.5 can be extended to linear-invariant

properties. The proof of Theorem 6.4.5 makes use of near-equidistribution of regular poly-

nomial factors over affine collections of linear forms (Theorem 2.2.21). Although this near-

equidistribution property was not known for general collections of linear forms at the time

in [15], we now have such a general equidistribution (Theorem 4.1.1, [50]). However, it is

still unclear how to adapt the rest of our proof to the linear-invariant case.

171

Reed-Muller codes beyond list decoding radius. Our decoder in Section 7.5.3 only

works for a P which is a polynomial of degree d < p. It is a fascinating problem to find out

if this can be extended to general functions. A first step to this program would be to get

a deeper understanding of cubic polynomials. The known testers and decoders in this case

follow the same steps as the proofs of the inverse theorems for the Gowers norms [68, 79, 18].

However, in some cases, even a combinatorial proof of such inverse theorems is not yet

available. It has been seen, in the context of property testing, that one can bypass such

obstacles by considering algorithms that require the inverse theorems and decomposition

theorems only in their analysis [52, 15]. It is therefore plausible to hope for such algorithms

for the decoding question as well.

Norms defined by linear forms. In Chapter 8, we gave necessary conditions for a

collection of linear forms defining a norm (Theorem 8.2.2). A great and possibly hard open

problem is to give a full characterization of collections of linear forms that define a norm,

and we think of our result as a very small step towards this. Theorem 8.2.2 shows that

a system of linear forms that defines a norm must be one of only two types, one of which

(type (II)) is easily seen to be essentially equivalent to Lp norms. Next, we prove under the

assumption of |F| being sufficiently large, that type (I) norms are essentially the same as

some Gowers norm (Theorem 8.5.4). We believe that the assumption on the size of the field

is unnecessary, meaning that every type (I) norm is equivalent to a Gowers norm, and we

leave this as an open problem.

172

REFERENCES

[1] Noga Alon, Richard A. Duke, Hanno Lefmann, Vojtech Rodl, and Raphael Yuster. Thealgorithmic aspects of the regularity lemma. J. Algorithms, 16(1):80–109, 1994.

[2] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing oflarge graphs. Combinatorica, 20(4):451–476, 2000.

[3] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A combinatorial charac-terization of the testable graph properties: it’s all about regularity. SIAM J. Comput.,39(1):143–167, 2009.

[4] Noga Alon, Tali Kaufman, Michael Krivelevich, Simon Litsyn, and Dana Ron. TestingReed-Muller codes. IEEE Trans. Inform. Theory, 51(11):4032–4039, 2005.

[5] Noga Alon and Assaf Naor. Approximating the cut-norm via Grothendieck’s inequality.SIAM J. Comput., 35(4):787–803, 2006.

[6] Noga Alon and Asaf Shapira. A characterization of the (natural) graph propertiestestable with one-sided error. SIAM J. on Comput., 37(6):1703–1727, 2008.

[7] Noga Alon and Asaf Shapira. Every monotone graph property is testable. SIAM J. onComput., 38(2):505–522, 2008.

[8] L. Babai, L. Fortnow, and C. Lund. Addendum to: “Nondeterministic exponentialtime has two-prover interactive protocols” [Comput. Complexity 1 (1991), no. 1, 3–40;MR1113533 (92h:68031)]. Comput. Complexity, 2(4):374, 1992.

[9] Laszlo Babai, Lance Fortnow, Leonid A. Levin, and Mario Szegedy. Checking compu-tations in polylogarithmic time. In Proc. 23rd Annual ACM Symposium on the Theoryof Computing, pages 21–32, New York, 1991. ACM Press.

[10] Eli Ben-Sasson, Noga Ron-Zewi, Madhur Tulsiani, and Julia Wolf. Sampling-basedproofs of almost-periodicity results and algorithmic applications. In Automata, Lan-guages, and Programming, pages 955–966. Springer, 2014.

[11] Vitaly Bergelson, Terence Tao, and Tamar Ziegler. An inverse theorem for the unifor-mity seminorms associated with the action of F∞p . Geom. Funct. Anal., 19(6):1539–1596,2010.

[12] Arnab Bhattacharyya. Guest column: On testing affine-invariant properties over finitefields. ACM SIGACT News, 44(4):53–72, 2013.

[13] Arnab Bhattacharyya. Polynomial decompositions in polynomial time. Technical report,Electronic Colloquium on Computational Complexity (ECCC), 2014.

[14] Arnab Bhattacharyya, Victor Chen, Madhu Sudan, and Ning Xie. Testing linear-invariant non-linear properties. Theory Comput., 7:75–99, 2011.

[15] Arnab Bhattacharyya, Eldar Fischer, Hamed Hatami, Pooya Hatami, and ShacharLovett. Every locally characterized affine-invariant property is testable. In Proceed-ings of the 45th annual ACM symposium on Symposium on theory of computing, STOC’13, pages 429–436, New York, NY, USA, 2013. ACM.

173

[16] Arnab Bhattacharyya, Eldar Fischer, and Shachar Lovett. Testing low complexity affine-invariant properties. In Proc. 24th ACM-SIAM Symposium on Discrete Algorithms,pages 1337–1355, 2013.

[17] Arnab Bhattacharyya, Elena Grigorescu, and Asaf Shapira. A unified framework fortesting linear-invariant properties. In Proc. 51st Annual IEEE Symposium on Founda-tions of Computer Science, pages 478–487, 2010.

[18] Arnab Bhattacharyya, Pooya Hatami, and Madhur Tulsiani. Algorithmic regularity forpolynomials and applications. In Proceedings of the Twenty-Sixth Annual ACM-SIAMSymposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6,2015, pages 1870–1889, 2015.

[19] Abhishek Bhowmick and Shachar Lovett. List decoding Reed-Muller codes over finitefields. STOC ’15, 2015.

[20] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with appli-cations to numerical problems. In Proceedings of the 22nd Annual ACM Symposium onTheory of Computing (Baltimore, MD, 1990), volume 47, pages 549–595, 1993.

[21] Andrej Bogdanov and Emanuele Viola. Pseudorandom bits for polynomials. In Proc.48th Annual IEEE Symposium on Foundations of Computer Science, pages 41–51,Washington, DC, USA, 2007. IEEE Computer Society.

[22] Christian Borgs, Jennifer Chayes, Laszlo Lovasz, Vera T. Sos, Balazs Szegedy, andKatalin Vesztergombi. Graph limits and parameter testing. In STOC’06: Proceedingsof the 38th Annual ACM Symposium on Theory of Computing, pages 261–270. ACM,New York, 2006.

[23] Uriel Feige, Shafi Goldwasser, Laszlo Lovasz, Shmuel Safra, and Mario Szegedy. In-teractive proofs and the hardness of approximating cliques. J. ACM, 43(2):268–292,1996.

[24] Eldar Fischer. The art of uninformed decisions: A primer to property testing. InG. Paun, G. Rozenberg, and A. Salomaa, editors, Current Trends in Theoretical Com-puter Science: The Challenge of the New Century, volume 1, pages 229–264. WorldScientific Publishing, 2004.

[25] Eldar Fischer, Arie Matsliah, and Asaf Shapira. Approximate hypergraph partitioningand applications. SIAM J. Comput., 39(7):3155–3185, 2010.

[26] Eldar Fischer and Ilan Newman. Testing versus estimation of graph properties. SIAMJ. Comput., 37(2):482–501 (electronic), 2007.

[27] Alan Frieze and Ravi Kannan. The regularity lemma and approximation schemes fordense problems. In 37th Annual Symposium on Foundations of Computer Science(Burlington, VT, 1996), pages 12–20. IEEE Comput. Soc. Press, Los Alamitos, CA,1996.

[28] Alan M. Frieze and Ravi Kannan. A simple algorithm for constructing Szemeredi’sregularity partition. Electr. J. Comb., 6, 1999.

174

[29] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connectionto learning and approximation. J. ACM, 45(4):653–750, 1998.

[30] Oded Goldreich and Tali Kaufman. Proximity oblivious testing and the role of invari-ances. In Approximation, randomization, and combinatorial optimization, volume 6845of Lecture Notes in Comput. Sci., pages 579–592. Springer, Heidelberg, 2011.

[31] Oded Goldreich and Leonid A Levin. A hard-core predicate for all one-way functions. InProceedings of the twenty-first annual ACM symposium on Theory of computing, pages25–32. ACM, 1989.

[32] Oded Goldreich and Dana Ron. On proximity-oblivious testing. SIAM J. Comput.,40(2):534–566, 2011.

[33] Parikshit Gopalan. A Fourier-analytic approach to Reed-Muller decoding. IEEE Trans.Inform. Theory, 59(11):7747–7760, 2013.

[34] Parikshit Gopalan, Ryan O’Donnell, Rocco A. Servedio, Amir Shpilka, and Karl Wim-mer. Testing Fourier dimensionality and sparsity. In Proc. 36th Annual InternationalConference on Automata, Languages, and Programming, pages 500–512, 2009.

[35] W. T. Gowers. A new proof of Szemeredi’s theorem. Geom. Funct. Anal., 11(3):465–588,2001.

[36] W. T. Gowers. Decompositions, approximate structure, transference, and the Hahn-Banach theorem. Bull. Lond. Math. Soc., 42(4):573–606, 2010.

[37] W. T. Gowers and J. Wolf. The true complexity of a system of linear equations. Proc.Lond. Math. Soc. (3), 100(1):155–176, 2010.

[38] W. T. Gowers and J. Wolf. Linear forms and higher-degree uniformity for functions onFnp . Geom. Funct. Anal., 21(1):36–69, 2011.

[39] W. T. Gowers and J. Wolf. Linear forms and quadratic uniformity for functions on Fnp .Mathematika, 57(2):215–237, 2011.

[40] Ben Green. Montreal notes on quadratic Fourier analysis. In Additive combinatorics,volume 43 of CRM Proc. Lecture Notes, pages 69–102. Amer. Math. Soc., Providence,RI, 2007.

[41] Ben Green and Terence Tao. An inverse theorem for the Gowers U3-norm. Proc. Edin.Math. Soc., 51:73–153, 2008.

[42] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions.Ann. of Math. (2), 167(2):481–547, 2008.

[43] Ben Green and Terence Tao. The distribution of polynomials over finite fields, withapplications to the Gowers norms. Contrib. Discrete Math., 4(2):1–36, 2009.

[44] Ben Green, Terence Tao, and Tamar Ziegler. An inverse theorem for the Gowers u4-norm. Glasg. Math. J., 53(1):1–50, 2011.

175

[45] Ben Green, Terence Tao, and Tamar Ziegler. An inverse theorem for the Gowers us+1[n]-norm. Ann. of Math. (2), 176(2):1231–1372, 2012.

[46] Benjamin Green and Terence Tao. Linear equations in primes. Ann. of Math. (2),171(3):1753–1850, 2010.

[47] Hamed Hatami. On generalizations of Gowers norms. PhD thesis, University of Toronto,2009.

[48] Hamed Hatami. Graph norms and Sidorenko’s conjecture. Israel Journal of Mathemat-ics, 175(1):125–150, 2010.

[49] Hamed Hatami, Pooya Hatami, and James Hirst. Limits of Boolean functions on Fnp .Electron. J. Combin., 21(4):Paper 4.2, 15, 2014.

[50] Hamed Hatami, Pooya Hatami, and Shachar Lovett. General systems of linear forms:equidistribution and true complexity. arXiv preprint arXiv:1403.7703, 2014.

[51] Hamed Hatami and Shachar Lovett. Higher-order Fourier analysis of Fnp and the com-plexity of systems of linear forms. Geom. Funct. Anal., 21(6):1331–1357, 2011.

[52] Hamed Hatami and Shachar Lovett. Estimating the distance from testable affine-invariant properties. In Foundations of Computer Science (FOCS), 2013 IEEE 54thAnnual Symposium on, pages 237–242. IEEE, 2013.

[53] Charanjit S. Jutla, Anindya C. Patthak, Atri Rudra, and David Zuckerman. Testinglow-degree polynomials over prime fields. In Proc. 45th Annual IEEE Symposium onFoundations of Computer Science, pages 423–432, 2004.

[54] Tali Kaufman and Shachar Lovett. Worst case to average case reductions for poly-nomials. Foundations of Computer Science, IEEE Annual Symposium on, 0:166–175,2008.

[55] Tali Kaufman, Shachar Lovett, and Ely Porat. Weight distribution and list-decodingsize of reed–muller codes. Information Theory, IEEE Transactions on, 58(5):2689–2696,2012.

[56] Tali Kaufman and Dana Ron. Testing polynomials over general fields. SIAM J. onComput., 36(3):779–802, 2006.

[57] Tali Kaufman and Madhu Sudan. Algebraic property testing: the role of invariance. InProc. 40th Annual ACM Symposium on the Theory of Computing, pages 403–412, 2008.

[58] Yoshiharu Kohayakawa, Vojtech Rodl, and Lubos Thoma. An optimal algorithm forchecking regularity. SIAM J. Comput., 32(5):1210–1235, 2003.

[59] Janos Komlos, Ali Shokoufandeh, Miklos Simonovits, and Endre Szemeredi. The reg-ularity lemma and its applications in graph theory. In Theoretical aspects of computerscience (Tehran, 2000), volume 2292 of Lecture Notes in Comput. Sci., pages 84–112.Springer, Berlin, 2002.

176

[60] Daniel Kral, Oriol Serra, and Lluis Vena. A removal lemma for systems of linear equa-tions over finite fields. Israel J. Math., 187:193–207, 2012.

[61] Laszlo Lovasz and Balazs Szegedy. Szemeredi’s lemma for the analyst. Geom. Funct.Anal., 17(1):252–270, 2007.

[62] Shachar Lovett. Unconditional pseudorandom generators for low degree polynomials.Theory of Computing, 5(1):69–82, 2009.

[63] Shachar Lovett, Roy Meshulam, and Alex Samorodnitsky. Inverse conjecture for theGowers norm is false. Theory Comput., 7:131–145, 2011.

[64] Dana Ron. Algorithmic and analysis techniques in property testing. Foundations andTrends in Theoretical Computer Science, 5(2):73–205, 2009.

[65] Ronitt Rubinfeld. Sublinear time algorithms. In Proceedings of International Congressof Mathematicians 2006, volume 3, pages 1095–1110, 2006.

[66] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials withapplications to program testing. SIAM J. Comput., 25(2):252–271, 1996.

[67] Imre Z. Ruzsa and Endre Szemeredi. Triple systems with no six points carrying three tri-angles. In Combinatorics (Proc. Fifth Hungarian Colloq., Keszthely, 1976), Vol. II, vol-ume 18 of Colloq. Math. Soc. Janos Bolyai, pages 939–945. North-Holland, Amsterdam-New York, 1978.

[68] Alex Samorodnitsky. Low-degree tests at large distances. In STOC’07—Proceedings ofthe 39th Annual ACM Symposium on Theory of Computing, pages 506–515. ACM, NewYork, 2007.

[69] Asaf Shapira. Green’s conjecture and testing linear-invariant properties. In STOC’09—Proceedings of the 2009 ACM International Symposium on Theory of Computing, pages159–166. ACM, New York, 2009.

[70] Madhu Sudan. Invariance in property testing. Technical Report 10-051, ElectronicColloquium in Computational Complexity, March 2010.

[71] Balazs Szegedy. Gowers norms, regularization and limits of functions on abelian groups.arXiv preprint arXiv:1010.6211, 2010.

[72] Balazs Szegedy. On higher order fourier analysis. arXiv preprint arXiv:1203.2260, 2012.

[73] E. Szemeredi. On sets of integers containing no k elements in arithmetic progression.Acta Arith., 27:199–245, 1975. Collection of articles in memory of Juriui VladimirovivcLinnik.

[74] Endre Szemeredi. Regular partitions of graphs. In Problemes combinatoires et theoriedes graphes (Colloq. Internat. CNRS, Univ. Orsay, Orsay, 1976), volume 260 of Colloq.Internat. CNRS, pages 399–401. CNRS, Paris, 1978.

[75] Terence Tao. Structure and randomness in combinatorics. In Foundations of ComputerScience, 2007. FOCS’07. 48th Annual IEEE Symposium on, pages 3–15. IEEE, 2007.

177

[76] Terence Tao. Higher order Fourier analysis, volume 142. American Mathematical Soc.,2012.

[77] Terence Tao and Tamar Ziegler. The inverse conjecture for the Gowers norm over finitefields via the correspondence principle. Anal. PDE, 3(1):1–20, 2010.

[78] Terence Tao and Tamar Ziegler. The inverse conjecture for the Gowers norm over finitefields in low characteristic. Ann. Comb., 16(1):121–188, 2012.

[79] Madhur Tulsiani and Julia Wolf. Quadratic Goldreich-Levin theorems. In Foundationsof Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 619–628.IEEE, 2011.

[80] Emmanuele Viola. The sum of D small-bias generators fools polynomials of degree D.Computational Complexity, 18(2):209–217, 2009.

[81] Yuichi Yoshida. A characterization of locally testable affine-invariant properties viadecomposition theorems. In Proceedings of the 46th Annual ACM Symposium on Theoryof Computing, pages 154–163. ACM, 2014.

[82] Yuichi Yoshida. Gowers norm, function limits, and parameter estimation. arXiv preprintarXiv:1410.5053, 2014.

[83] Yufei Zhao. An arithmetic transference proof of a relative Szemeredi theorem. Math.Proc. Cambridge Philos. Soc., 156(2):255–261, 2014.

178

THE UNIVERSITY OF CHICAGO HIGHER-ORDER FOURIER ANALYSIS ... · The traditional Fourier analysis...

Documents

Transcript of THE UNIVERSITY OF CHICAGO HIGHER-ORDER FOURIER ANALYSIS ... · The traditional Fourier analysis...