THE UNIVERSITY OF CHICAGO HIGHER-ORDER FOURIER ANALYSIS ... · The traditional Fourier analysis...
Transcript of THE UNIVERSITY OF CHICAGO HIGHER-ORDER FOURIER ANALYSIS ... · The traditional Fourier analysis...
THE UNIVERSITY OF CHICAGO
HIGHER-ORDER FOURIER ANALYSIS OVER FINITE FIELDS AND APPLICATIONS
A DISSERTATION SUBMITTED TO
THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES
IN CANDIDACY FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE
BY
POOYA HATAMI
CHICAGO, ILLINOIS
APRIL 13, 2015
Copyright c© 2015 by Pooya Hatami
All Rights Reserved
ABSTRACT
Higher-order Fourier analysis is a powerful tool in the study of problems in additive and
extremal combinatorics, for instance the study of arithmetic progressions in primes, where
the traditional Fourier analysis comes short. In recent years, higher-order Fourier analysis
has found multiple applications in computer science in fields such as property testing and
coding theory. In this thesis, we develop new tools within this theory with several new ap-
plications such as a characterization theorem in algebraic property testing. One of our main
contributions is a strong near-equidistribution result for regular collections of polynomials.
The densities of small linear structures in subsets of Abelian groups can be expressed
as certain analytic averages involving linear forms. Higher-order Fourier analysis examines
such averages by approximating the indicator function of a subset by a function of bounded
number of polynomials. Then, to approximate the average, it suffices to know the joint
distribution of the polynomials applied to the linear forms. We prove a near-equidistribution
theorem that describes these distributions for the group Fnp when p is a fixed prime. This
fundamental fact was previously known only under various extra assumptions about the
linear forms or the field size. We use this near-equidistribution theorem to settle a conjecture
of Gowers and Wolf on the true complexity of systems of linear forms.
Our next application is towards a characterization of testable algebraic properties. We
prove that every locally characterized affine-invariant property of functions f : Fnp → R
with n ∈ N, is testable. In fact, we prove that any such property P is proximity-obliviously
testable. More generally, we show that any affine-invariant property that is closed under
subspace restrictions and has “bounded complexity” is testable. We also prove that any
property that can be described as the property of decomposing into a known structure of
low-degree polynomials is locally characterized and is, hence, testable.
We discuss several notions of regularity which allow us to deduce algorithmic versions of
various regularity lemmas for polynomials by Green and Tao and by Kaufman and Lovett.
iii
We show that our algorithmic regularity lemmas for polynomials imply algorithmic versions
of several results relying on regularity, such as decoding Reed-Muller codes beyond the list
decoding radius (for certain structured errors), and prescribed polynomial decompositions.
Finally, motivated by the definition of Gowers norms, we investigate norms defined by
different systems of linear forms. We give necessary conditions on the structure of systems
of linear forms that define norms. We prove that such norms can be one of only two types,
and assuming that |Fp| is sufficiently large, they essentially are equivalent to either a Gowers
norm or Lp norms.
iv
To my family
ACKNOWLEDGMENTS
First and foremost, I would like to express many thanks to my advisor Alexander Razborov.
It has been a great honor for me to be a student of Sasha. He is a great mentor, and he
has always been there for me whenever I have needed advice, and has been of great help
whenever I needed his supports. I would like to thank my second adviser and friend Madhur
Tulsiani, for his friendship, support and his great ideas and interesting problems, and for
always being available for sharing his insights and intuition about a problem or a paper. I am
very thankful to Shachar Lovett who has been a great friend, colaborator, a mentor, and for
his never-ending collection of interesting open problems and ideas. It was a great experience
when he hosted my visits to UCSD for two summers, where I learned a lot from him while
also enjoying the beautiful San Diego summers. I am also thankful to Arnab Bhattacharyya
for many interesting and fruitful discussions and collaborations.
Many thanks to Laci Babai and Janos Simons for many invaluable discussions on theory,
and sharing their stories on the theory community and history of the mathematical problems.
I would like to thank Denis Pankratov, for being a great office-mate and friend during
the past few years at University of Chicago. I want to thank Raghav Kulkarni, Sushant
Sachdeva, Abhishek Bhowmick, Ankan Saha, Shubhendu Trivedi, Yuan Li, Joshua Grochow,
Kaave Hosseini and Joshua Brody for many interesting conversations. A special thanks to
my friends Rasool Zandvakil, Gregor Jarosch, Laura Pilossoph, Rodrigo Dufeo and James
Edwards for making my time in Chicago most enjoyable. I want to thank Sarah, for her
amazing patience and support during the process of me writing this thesis.
I would like to thank my brother, Hamed, for without him I would not be where I am. I
am very grateful to him for introducing me to theoretical computer science years ago when
we were both much younger, and for keeping the habit of sharing with me the new exciting
theory that he explores. I dedicate this dissertation to my loving family, my mother Sedigheh,
my father Ibrahim, and my brothers Hamed and Amir.
vi
TABLE OF CONTENTS
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 HIGHER-ORDER FOURIER ANALYSIS . . . . . . . . . . . . . . . . . . . . . . 102.1 Polynomials over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Non-classical Polynomials . . . . . . . . . . . . . . . . . . . . . . . . 112.1.2 The Derivative Polynomial . . . . . . . . . . . . . . . . . . . . . . . . 132.1.3 Gowers Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Rank and Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.2.1 Polynomial Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.2.2 Analytic Measures of Uniformity . . . . . . . . . . . . . . . . . . . . 192.2.3 Equidistribution and Near-Orthogonality of Regular Factors . . . . . 212.2.4 Strong Near-Orthogonality and Equidistribution . . . . . . . . . . . . 222.2.5 Regularization of Factors . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Decomposition Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.3.1 Strong Decomposition Theorems . . . . . . . . . . . . . . . . . . . . 312.3.2 Higher-order Fourier Expansion . . . . . . . . . . . . . . . . . . . . . 35
2.4 Systems of Linear forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362.4.1 Cauchy-Schwarz Complexity . . . . . . . . . . . . . . . . . . . . . . . 362.4.2 The True Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3 HOMOGENEOUS NON-CLASSICAL POLYNOMIALS . . . . . . . . . . . . . . 393.1 Properties of the Derivative Polynomial . . . . . . . . . . . . . . . . . . . . . 413.2 A Homogeneous Basis, the Univariate Case . . . . . . . . . . . . . . . . . . . 423.3 A Homogeneous Basis, the Multivariate Case . . . . . . . . . . . . . . . . . . 44
4 DISTRIBUTION OF POLYNOMIAL FACTORS OVER LINEAR FORMS . . . . 464.1 Near-equidistribution over Linear Forms . . . . . . . . . . . . . . . . . . . . 484.2 Strong near-orthogonality: Proof of Theorem 2.2.23 . . . . . . . . . . . . . . 50
5 TRUE COMPLEXITY, ON A THEOREM OF GOWERS AND WOLF . . . . . . 575.1 The Gowers-Wolf Conjecture: Proof of Theorem 5.0.7 . . . . . . . . . . . . . 58
6 TESTABLE AFFINE-INVARIANT PROPERTIES . . . . . . . . . . . . . . . . . 656.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.1.1 Algebraic Property Testing, Affine Invariance . . . . . . . . . . . . . 666.1.2 Localy Characterized Properties . . . . . . . . . . . . . . . . . . . . . 66
6.2 Locality for Affine-Invariant Properties . . . . . . . . . . . . . . . . . . . . . 686.3 Subspace Hereditary Properties . . . . . . . . . . . . . . . . . . . . . . . . . 69
vii
6.4 Main Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.4.1 Affine constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.4.2 Complexity of Induced Affine Constraints . . . . . . . . . . . . . . . 726.4.3 Statement of the main result . . . . . . . . . . . . . . . . . . . . . . . 726.4.4 Comparison with Previous Work . . . . . . . . . . . . . . . . . . . . 73
6.5 Locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746.6 Strong Near-Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756.7 Degree-structural Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.7.1 Every Degree Structural Property is Testable . . . . . . . . . . . . . 786.8 Big Picture Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846.9 Proof of Testability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7 ALGORITHMIC ASPECTS OF REGULARITY . . . . . . . . . . . . . . . . . . 957.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.1.1 Inverse Theorem and Reed-Muller Decoding in High Characteristic . 967.1.2 Efficient Worst-Case to Average-Case Reductions for Polynomials . . 987.1.3 Algorithmic Inverse Theorem for Polynomials: Fields of Small Char-
acteristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997.1.4 Polynomial decompositions . . . . . . . . . . . . . . . . . . . . . . . . 100
7.2 Algorithmic Bogdanov-Viola Lemma; Approximate Regularization . . . . . . 1017.2.1 Unbiased Almost-Refinement . . . . . . . . . . . . . . . . . . . . . . 104
7.3 Algorithmic Uniform Refinement in high Characteristic . . . . . . . . . . . . 1077.4 Algorithmic Regularity in Low Characteristic . . . . . . . . . . . . . . . . . 110
7.4.1 Algorithmic approximate strongly unbiased refinement . . . . . . . . 1137.4.2 Algorithmic exact strongly unbiased refinement . . . . . . . . . . . . 118
7.5 Applications in High Characteristic . . . . . . . . . . . . . . . . . . . . . . . 1197.5.1 Computing Polynomials with Large Gowers Norm . . . . . . . . . . . 1197.5.2 Algorithmic Inverse Theorem in High Characteristic . . . . . . . . . . 1217.5.3 Decoding Reed-Muller Codes with Structured Noise . . . . . . . . . . 124
7.6 Applications n Low Characterstc . . . . . . . . . . . . . . . . . . . . . . . . 1277.6.1 Computing a biased Polynomial . . . . . . . . . . . . . . . . . . . . . 1277.6.2 Worst Case to Average Case Reduction . . . . . . . . . . . . . . . . . 128
7.7 Algorithmic Inverse Theorem for Polynomials . . . . . . . . . . . . . . . . . 1307.7.1 Derivative Polynomials as Symmetric Multilinear Polynomials . . . . 1317.7.2 Computing Polynomials with Large Gowers Norm . . . . . . . . . . . 137
7.8 Application: Polynomial Decompositions . . . . . . . . . . . . . . . . . . . . 146
8 NORMS DEFINED BY LINEAR FORMS . . . . . . . . . . . . . . . . . . . . . . 1488.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1508.2 Norms defined by linear forms . . . . . . . . . . . . . . . . . . . . . . . . . . 1518.3 Holder type inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1528.4 A characterization theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1588.5 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1638.6 Type (I) norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
viii
8.6.1 Theorem 8.5.4, the Direct Part . . . . . . . . . . . . . . . . . . . . . 1678.6.2 Theorem 8.5.4, the Inverse Part . . . . . . . . . . . . . . . . . . . . . 168
9 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
ix
CHAPTER 1
INTRODUCTION
Traditional Fourier analysis has been shown to be a powerful tool in the study of problems
in additive and extremal combinatorics. However, in the recent years there has been several
cases, such as the study of arithmetic progressions, where traditional Fourier analysis comes
short. An extension of the traditional Fourier analysis, called higher-order Fourier analysis,
was initiated by Gowers’ seminal work [35], where he introduced a sequence of norms for
functions on Abelian groups and used them to prove quantitative bounds for Szemeredi’s
theorem on arithmetic progressions [73]. Higher-order Fourier analysis has been shown to
be very useful in analysing the density of linear patterns in mathematical objects, and has
been extensively studied in several works [44, 45, 11, 78, 77, 68, 36, 38, 37, 39, 72, 71]. We
refer the interested reader to the monograph by Tao [76].
In this dissertation we are interested in the case where the underlying group is Fn, where
F = Fp for a fixed prime p. The traditional Fourier analysis over Fn allows one to express
a given function as a linear combination of the characters of Fn. Each character of Fn
is a linear phase corresponding to an α ∈ Fn, defined as χp(x) := ep (∑ni=1 αixi), where
ep(m) := e2πm/p. In higher-order Fourier analysis, the linear polynomials are replaced
by higher-degree polynomials, and one would like to express a function f : Fn → C as
a linear combination of such functions. Such a higher-order expansion can be achieved in
an approximate sense via the so-called “decomposition theorems” which are proved by an
application of the “inverse theorems” for uniformity norms, such as Gowers norms [36, 75, 43].
This allows one to express a given function f in the form f = f1+∑ki=1 gi where f1 is a linear
combination of higher-degree phases and gi’s are sufficiently small in an appropriate norm
allowing us to ignore them as negligible error [36]. One can then use such a decomposition
of f in order to reduce the problem of estimating the density of a linear pattern within f
to the problem of understanding the distribution of a collection of polynomials over a given
1
linear pattern.
One of the main useful properties of Fourier analysis is that the set of characters form an
orthonormal basis. However, we lose this orthogonality when we include higher degree poly-
nomials; for example a nonzero linear combination of two distinct quadratic polynomials will
always have a nonzero bias. Green and Tao [43], and Kaufman and Lovett [54] resolve this
issue by introducing a notion of regularity for polynomials, where a regular polynomial is one
that is “distributionally well-behaved”. These set of works provide an approximate orthogo-
nality which is sufficient for understading quantities of the form Ex∈Fn [f1(x) · · · fm(x)]. This
is not completely satisfactory towards answering questions about the density of arbitrary lin-
ear patterns, namely EX∈(Fn)k [f1(L1(X)) · · · fm(Lm(X))], where Li’s are arbitrary linear
forms over k variables. A main contribution of this dissertation which resolves this issue, is
a new near-orthogonality for regular polynomials over arbitrary collections of linear forms.
This improves on previous more restrictive results such as [15] for affine systems of linear
forms, and [51] for fields of sufficiently large characteristic. This strong near-orthogonality
result allows us to settle a conjecture of Gowers and Wolf regarding the complexity of a
system of linear forms.
Gowers uniformity norms are commonly used to control averages of linear patterns such
as arithmetic progressions within subsets of a group. Since higher-order Gowers norms
represent stronger notions of uniformity, a question arises naturally: Given a linear pattern,
i.e. a collection of linear forms, what is the smallest k for which the Gowers Uk norm
controls the density of such patterns? This question was asked and studied by Gowers and
Wolf [37] who conjectured a simple characterization for this value, and in [38] verified it for
the case of functions over Fn in the case when |F| is sufficiently large. We use our strong
near-orthogonality result to settle the Gowers-Wolf conjecture in full generality over Fn.
The decomposition theorems discussed above can be thought of as arithmetic “regularity
lemmas”, where we think of regularity as a notion of uniformity that allows one to decompose
2
a given object into a collection of simpler objects which appear random according to certain
statistics. Szemeredi’s celebrated regularity lemma [74, 61], which has been a fundamental
tool in combinatorics [73, 59], can be thought of as a decomposition theorem for bivariate
functions, where a given function, corresponding to the edges of a given graph is decomposed
into a structured part, a small error, and a uniform or pseudorandom part. In the past
decade such decomposition theorems have been developed in much more general settings
and under various stronger or weaker notions and parameters of uniformity [36, 75, 40,
27]. Decomposition theorems can be used to reduce questions regarding large and complex
mathematical objects to that of bounded complexity objects while only suffering a small error
in parameters. This general phenomenon has been used in proving several major results in
mathematics [42, 36, 83], and the reduction to a lower complexity object under a controllable
error has also been shown to be useful in computer science [3, 16, 19]. The development
and application of decomposition theorems requires answers to several interesting questions
regarding the notion of uniformity that is to be used, the kinds of error that can be tolerated,
the relations between error, uniformity and structure parameters that are required, and
whether such a decomposition theorem exists or can be obtained efficiently. Several such
questions are discussed in this dissertation.
The sequence of Gowers uniformity norms have been shown to have numerous applications
such as the study of linear structures such as arithmetic progressions in subsets of a group,
or as a test for correlation with low-degree phases. The d+ 1’th Gowers norm of a function
f : Fn → C is defined by picking a random d dimensional parallelepiped and considering the
average product of the function over all its points, more formally
‖f‖Ud+1 :=
EX,Y1,...,Yd∈Fn
∏S⊆[d]
C|S|f(X +∑i∈S
Yi)
1/2d
,
where C is complex conjugation. Motivated by the definition of Gowers norms one may
3
be interested in introducing similar norms defined by different systems of linear forms. We
study such norms and give nice necessary conditions on the structure of systems of linear
forms which define a norm. In particular, we prove that such norms can be one of two types.
We then show that norms of the first type, for which the complexity of the system of the
linear forms is smaller than |F|, are essentially equivalent to a Gowers norm. We further
show that norms of the second type are equivalent to Lp norms. Our results are inspired by
similar necessary conditions in the context of graph normings by Hamed Hatami [47, 48].
Motivated by turning the numerous applications of decomposition theorems and regu-
larity lemmas into algorithms, it is natural to ask whether it is possible to achieve them
algorithmically. In the case of Szemeredi’s regularity lemma, the original proof was non-
algorithmic and the question of finding an algorithm for computing the regularity partition
was first considered by Alon et.al. [1]. The Szemeredi regularity lemma is often used to
prove existence of certain substructures, such as triangles [67], in large graphs, and with
an algorithmic version of the lemma one can efficiently “find’ these structures in a given
graph. Since then, there have been numerous improvements and extensions to the algorith-
mic version of the Szemeredi regularity lemma, of which the works [5, 28, 58, 25] constitute
a partial list (see [25] for a detailed discussion). The arithmetic regularity lemmas of Green
and Tao [43] and Kaufman and Lovett [54] show that one can modify a given collection
of polynomials into a new “pseudorandom” collection of polynomials. These lemmas have
various applications in computer science, such as (special cases of) Reed-Muller testing and
worst-case to average-case reductions for polynomials. However, the proofs of [43, 54], which
involve a transformation from the first collection to the pseudorandom collection, are not
algorithmic. In this dissertation we consider analytic notions of regularity for polynomials
and show that they allow for efficient algorithms.
One main application of our algorithmic regularity lemma is in decoding of Reed-Muller
codes. In the case when the characteristic of F is large, the Gowers norm gives an approximate
4
test for checking if for a given polynomial P , there exists a Q of degree at most k within
Hamming distance 1 − 1|F| − ε [43]. This is remarkable because the list-decoding radius of
Reed-Muller codes of order k codes is less than 1 − k|F| for k < |F| [33], and the test works
even beyond that. Moreover, Tao and Ziegler [11] later showed that this test works not only
when P is a polynomial of degree d < |F| but also when P is an arbitrary function. Given
this, it is natural to ask for the decoding analogue:
Given P of degree d over Fn, if there exists Q of degree k such that dist(P,Q) 6
1 − 1|F| − ε, can one find a Q′ (in time polynomial in n) of degree k such that
dist(P,Q′) 6 1− 1|F| − η for some η depending on ε?
We use our algorithmic regularity lemma and solve the above decoding question for Reed-
Muller codes beyond their list-decoding radius: This special case can be interpreted as the
scenario when the received word P is obtained from some codeword Q of degree k by adding
a “noise” P − Q of degree d. Thus, when the noise is “structured”, we can decode in the
discussed sense even beyond the list-decoding radius. This shows that although there can be
exponentially many such Q due to the fact that we are in the regime beyond the list-decoding
radius (see [55, 19]), the question is still tractable if we ask only for one such Q and if we
allow a loss from ε to η. Such a decoding question was solved for Reed-Muller codes of order
2 over F2, for any given function f (instead of a polynomial P of bounded degree) by [79, 10].
Lastly a key contribution of this dissertation is an application of higher-order Fourier
analysis to give a characterization of testable algebraic properties. The field of property
testing, as initiated by [20, 8], is the study of algorithms that query their input a very small
number of times and with high probability decide correctly whether the input satisfies a
given property or is “far” from satisfying that property. A property is called testable if
the number of queries can be made independent of the size of the object without affecting
the correctness probability. Perhaps surprisingly, it has been found that a large number of
natural properties satisfy this strong requirement; see e.g. the surveys [24, 65, 64, 70] for
5
a general overview. An example of such phenomenon is the famous result of Blum, Luby
and Rubinfeld [20] who showed that linearity of a function f : Fn → F is testable by a test
which makes only 3 random queries. Some of the earliest results in the field of property
testing are about the properties of functions on Fn, where p is a fixed prime. However, other
settings such as graph and hypergraph properties have witnessed more progress in the past,
where several characterizations for testable properties have been proved [3, 7, 6, 26, 22].
This is mainly due to the fact that the concepts of uniformity and related approximations
are somewhat simpler in those settings. In this thesis we use the recently developed tools
in higher-order Fourier analysis in order to extend some of the results from the graph and
hypergraphs setting to the setting of functions on Fn.
Specifically, we give a complete characterization of the so called “proximity oblivious”
testable properties that are invariant under affine transformations. We consider properties
of functions f : Fn → 1, . . . , R. Our main result shows that any such property that
is invariant with respect to affine transformations on Fn and that is locally characterized
is testable. Furthermore, we show that a large class of natural algebraic properties whose
query complexity had not been previously studied are locally characterized affine-invariant
properties and are, hence, testable. Our results are in the one-sided regime, where a tester
is required to accept any function that belongs to the studied property.
Recently, there has been several other developments in the study of affine-invariant prop-
erties; see [12] for a summary. Hatami and Lovett [52] proved that higher-order Fourier
analysis combined with ideas from [26] can be used to show that the distance to every
testable affine invariant property is constant-query estimable. This extends the ideas of [26]
to the algebraic property testing setting. In the two-sided error regime, Yoshida [81] uses
the decomposition theorems in order to give a full characterization of two-sided testable
affine invariant properties, analogous to the work of Alon et.al. [3] in the context of graphs.
In a more recent work, Yoshida [82] uses non-standard analysis in order to define limits of
6
functions sequences, which allows for a simpler characterization of two-sided testable affine-
invariant properties. An approach to defining limit objects for Boolean functions over Fn
using finitary higher-order Fourier analysis can be seen in [49].
7
Organization.
In Chapter 2 we introduce higher-order Fourier analysis in detail and develop some of the
essential tools that we need throughout this thesis. Notions of regularity and their associ-
ated norms and decomposition theorems as well as distributional properties of collections
of polynomials over linear forms are discussed. In Chapter 3 we prove that there exists a
basis for “non-classical” polynomials consisting only of “homogeneous non-classical poly-
nomials”. Later we use this theorem in Chapter 4 to prove our main theorem which is a
near-equidistribution theorem for collections of polynomials over arbitrary collections of lin-
ear forms. We use our near-equidistribution theorem to prove a conjecture of Gowers and
Wolf regarding a complexity measure of systems of linear forms in Chapter 5. Chapter 6
is concerned with the field of algebraic property testing, where we prove a characterization
theorem for affine invariant testable properties. Finally, in Chapter 8 we investigate norms
that are defined by averaging over a system of linear forms, where we give necessary condi-
tions on the structure of such systems of linear forms and use it to relate to Lp or Gowers
norms.
8
Notation
Fix a prime field F = Fp for a prime p > 2. Define | · | to be the standard map from F
to 0, 1, . . . , p − 1 ⊂ Z. Let D denote the complex unit disk z ∈ C : |z| 6 1, and let
T denote the circle group R/Z, which is an Abelian group with group operation denoted
+. Let e : T → C denote the character e (x) = e2πix, by abuse of notation, we may use
e : F→ C to denote e(x) = e2πix/|F| as well.
For integers a, b, we let [a] denote the set 1, 2, . . . , a and [a, b] denote the set a, a +
1, . . . , b. For real numbers, α, σ, ε, we use the shorthand σ = α ± ε to denote α − ε 6 σ 6
α + ε. The zero element in Fn is denoted by 0. We will denote by lower case letters, e.g.
x, y, elements of Fn. We use capital letters, e.g. X = (x1, . . . , xk) ∈ (Fn)k, to denote tuples
of variables.
9
CHAPTER 2
HIGHER-ORDER FOURIER ANALYSIS
In this chapter we introduce higher-order Fourier analysis in detail, and prove several tech-
nical tools that will be required throught this dissertation. We discuss several properties of
polynomials over finite fields, and extensions of polynomials that are required when the field
size is small. Notions of regularity for collections of polynomials and norms that capture
these notions of regularity. Algebraic decomposition theorems associated with uniformity
norms are presented, which allow to decompose a given function to a structured part given
by a collection of polynomials and a pseudorandom part. We then show how one can use such
a decomposition to write a higher-order expansion of the given function. We then discuss the
distribution polynomials over collections of linear forms. This is useful because the density
of a linear pattern in an arbitrary function can be related to the distribution of a regular
collection of polynomials over a collection of linear forms via a decomposition theorem.
2.1 Polynomials over Finite Fields
Let d > 0 be an integer. A polynomial P : Rn → R of degree < d can be defined in one of
two equivalent ways:
• (Global Definition) P is a polynomial of degree < d if and only if it can be written in
the form
P (x1, ..., xn) =∑
06i1,...,in;i1+···+in<d
ci1,...,inxi11 · · ·x
inn ,
with coefficients ci1,...,in ∈ R.
• (Local Definition) P is a polynomial of degree < d if it is d times differentiable and its
d-th derivative vanishes.
10
It is not difficult to check that the above two definitions are equivalent, using the Taylor series
expansion to go from the local to the global definition. In finite characteristic, i.e. when
P : Fn → G, the local definition of a polynomial used the notion of additive derivatives.
Definition 2.1.1 (Polynomials over finite fields (local definition)). For an integer d > 0, a
function P : Fn → G is said to be a polynomial of degree 6 d if for all y1, . . . , yd+1, x ∈ Fn,
it holds that
(Dy1 · · ·Dyd+1P )(x) = 0,
where DyP (x) = P (x + y)− P (x) is the additive derivative of P at x and direction y. The
degree of P is the smallest d for which the above holds.
It follows simply from the definition that for any direction y ∈ Fn, deg(DyP ) < deg(P ).
In the “classical” case of polynomials P : Fn → F, it is a well-known fact that the global and
local definitions coincide. However, the situation is different in more general groups. One
way to suspect this is the fact that we are not able to divide by d! in some cases, in order to
be able to make use of Taylor expansions. For example when the range of P is R/Z, it turns
out that the global definition must be refined to the “non-classical polynomials”, which may
have monomials that are different from the classical case. This phenomenon was noted by
Tao and Ziegler [78] in the study of Gowers norms.
2.1.1 Non-classical Polynomials
Definition 2.1.2 (Non-classical Polynomials (local definition)). For an integer d > 0, a
function P : Fn → T is said to be a non-classical polynomial of degree 6 d (or simply a
polynomial of degree 6 d) if for all y1, . . . , yd+1, x ∈ Fn, it holds that
(Dy1 · · ·Dyd+1P )(x) = 0. (2.1)
11
The degree of P is the smallest d for which the above holds. A function P : Fn → T is said
to be a classical polynomial of degree 6 d if it is a non-classical polynomial of degree 6 d
whose image is contained in 1pZ/Z.
The following lemma of Tao and Ziegler [78] shows that a classical polynomial P of degree
d must always be of the form x 7→ |Q(x)|p , where Q : Fn → F is a polynomial (in the usual
sense) of degree d, and | · | is the standard map from F to 0, 1, . . . , p− 1. This lemma also
characterizes the structure of non-classical polynomials.
Lemma 2.1.3 (Lemma 1.7 in [78]). A function P : Fn → T is a polynomial of degree 6 d
if and only if P can be represented as
P (x1, . . . , xn) = α +∑
06d1,...,dn<p;k>0:0<∑i di6d−k(p−1)
cd1,...,dn,k|x1|d1 · · · |xn|dn
pk+1mod 1,
for a unique choice of cd1,...,dn,k ∈ 0, 1, . . . , p− 1 and α ∈ T. The element α is called the
shift of P , and the largest integer k such that there exist d1, . . . , dn for which cd1,...,dn,k 6= 0
is called the depth of P . A depth-k polynomial P takes values in a coset of the subgroup
Uk+1def= 1
pk+1Z/Z. Classical polynomials correspond to polynomials with 0 shift and 0 depth.
Note that Lemma 2.1.3 immediately implies the following important observation1:
Remark 2.1.4. If Q : Fn → T is a polynomial of degree d and depth k, then pQ is a
polynomial of degree max(d − p + 1, 0) and depth k − 1. In other words, if Q is classical,
then pQ vanishes, and otherwise, its degree decreases by p − 1 and its depth by 1. Also, if
λ ∈ [1, p− 1] is an integer, then deg(λQ) = d and depth(λQ) = k.
For convenience of exposition, henceforth we will assume that the shifts of all polynomials
are zero. This can be done without affecting any of the results presented in this thesis. Hence,
all polynomials of depth k take values in Uk+1.
1. Recall that T is an additive group. If n ∈ Z and x ∈ T, then nx is shorthand for x + · · ·+ x if n > 0and −x− · · · − x otherwise, where there are |n| terms in both expressions.
12
2.1.2 The Derivative Polynomial
Given a non-classical polynomial P of exact degree d, it is often useful to consider the
properties of its d-th derivative. Motivated by this, we give the following definition.
Definition 2.1.5 (Derivative Polynomial). Let P : Fn → T be a degree-d polynomial, possi-
bly non-classical. Define the derivative polynomial ∂P : (Fn)d → T by the following formula
∂P (h1, . . . , hd)def= Dh1
· · ·DhdP (0),
where h1, . . . , hd ∈ Fn.2 Moreover for k < d define
∂kP (x, h1, . . . , hk)def= Dh1
· · ·DhkP (x).
The following lemma shows some useful properties of the derivative polynomial.
Lemma 2.1.6. Let P : Fn → T be a degree-d (non-classical) polynomial. Then the polyno-
mial ∂P (h1, . . . , hd) is
(i) multilinear: ∂P is additive in each hi.
(ii) invariant under permutations of h1, . . . , hd.
(iii) a classical nonzero polynomial of degree d.
(iv) homogeneous: all its monomials are of degree d.
Notice that by multilinear we mean additive in each direction hi, which is not the usual
use of the term “multilinear”.
Proof. The proof follows by the properties of the additive derivative Dh. Multilinearity of ∂P
follows from linearity of the additive derivative, namely for every function Q and directions
2. Notice since P is a degree d polynomial, Dh1. . . Dhd
P (x) does not depend on x and thus we have theidentity ∂P (h1, . . . , hd) = Dh1
. . . DhdP (x) for any choice of x ∈ Fn.
13
h1, h2 we have the identity Dh1+h2Q(x) = Dh1
Q(x) +Dh2Q(x+ h1). The invariance under
permutations of h1, . . . , hd is a result of commutativity of the additive derivatives. Since
P is a degree-d (non-classical) polynomial, ∂P is nonzero by definition. Notice that since
D0Q ≡ 0 for any function Q, we have ∂P (h1, . . . , hd) = 0 if any of hi is equal to zero. Hence
every monomial of ∂P must depend on all hi’s. The properties (iii) and (iv) now follow from
this and the fact that deg(∂P ) 6 d and thus each monomial has exactly one variable from
each hi.
2.1.3 Gowers Norms
Gowers norms, which were introduced by Gowers [35], play an important role in additive
combinatorics, more specifically in the study of polynomials of bounded degree. For example,
it is known through an inverse theorem for Gowers norms that a Gowers norm of a function
can be used to test whether it is close to a polynomial of a given degree bound. Let us first
define the multiplicative derivative of a complex valued function.
Definition 2.1.7 (Multiplicative Derivative). Given a function f : Fn → C and an element
h ∈ Fn, define the multiplicative derivative in direction h of f to be the function ∆hf : Fn →
C satisfying ∆hf(x) = f(x+ h)f(x) for all x ∈ Fn.
The Gowers norm of order d for a function f : Fn → C is the expected multiplicative
derivative of f in d random directions at a random point.
Definition 2.1.8 (Gowers norm). Given a function f : Fn → C and an integer d > 1, the
Gowers norm of order d for f is given by
‖f‖Ud =
∣∣∣∣ Ey1,...,yd,x∈Fn
[(∆y1∆y2 · · ·∆ydf)(x)
]∣∣∣∣1/2d .Note that as ‖f‖U1 = |E [f ] | the Gowers norm of order 1 is only a semi-norm. However
for d > 1, it is not difficult to show that ‖ · ‖Ud is indeed a norm, using the following lemma
14
which was first proved in [35] as a key property of Gowers norms.
Lemma 2.1.9 (Gowers Cauchy-Schwarz). Consider a family of functions fS : Fn → C,
where S ⊆ [d]. Then
∣∣∣∣∣∣ Ex,y1,...,yd∈Fn
∏S⊆[d]
Cd−|S|fS(x+∑i∈S
yi)
∣∣∣∣∣∣ 6∏S⊆[d]
‖fS‖Ud , (2.2)
It is a direct consequence of Definition 2.1.2 that a function f : Fn → C with ‖f‖∞ 6 1
satisfies ‖f‖Ud+1 = 1 if and only if f = e (P ) for a (non-classical) polynomial P : Fn → T
of degree 6 d. The inverse theorem for Gowers norms is a robust version of this statement.
We denote by Poly(Fn → T) and Poly6d(Fn → T), respectively, the set of all non-classical
polynomials, and the ones of degree at most d.
Theorem 2.1.10 (Theorem 1.11 of [78]). Suppose δ > 0, and d > 1 is an integer. There
exists an ε = ε2.1.10(δ, d,F) such that the following holds. For every function f : Fn → C
with ‖f‖∞ 6 1 and ‖f‖Ud+1 > δ, there exists a polynomial P ∈ Poly6d(Fn → T) of degree
6 d that is ε-correlated with f , meaning
∣∣∣∣ Ex∈Fn
f(x)e (−P (x))
∣∣∣∣ > ε.
2.2 Rank and Regularity
The rank of a polynomial is a notion of its complexity according to lower degree polynomials.
Definition 2.2.1 (Rank of a polynomial). Given a polynomial P : Fn → T and an integer
d > 1, the d-rank of P , denoted rankd(P ), is defined to be the smallest integer r such that
there exist polynomials Q1, . . . , Qr : Fn → T of degree 6 d − 1 and a function Γ : Tr → T
satisfying P (x) = Γ(Q1(x), . . . , Qr(x)). If d = 1, then 1-rank is defined to be ∞ if P is
non-constant and 0 otherwise.
15
The rank of a polynomial P : Fn → T is its deg(P )-rank. We say that P is r-regular if
rank(P ) > r.
Note that for an integer λ ∈ [1, p−1], rank(P ) = rank(λP ). For future use, we record here
a simple lemma stating that restrictions of high rank polynomials to hyperplanes generally
preserve degree and high rank.
Lemma 2.2.2. Suppose P : Fn → T is a polynomial of degree d and rank > r, where
r > p + 1. Let A be a hyperplane in Fn, and denote by P ′ the restriction of P to A. Then,
P ′ is a polynomial of degree d and rank > r − p, unless d = 1 and P is constant on A.
Proof. For the case d = 1, we can check directly that either P ′ is constant or else, P ′ is a
non-constant degree-1 polynomial and so has rank infinity.
So, assume d > 1. By making an affine transformation, we can assume without loss of
generality that A is the hyperplane x1 = 0. Let π : Fn → Fn−1 be the projection to A.
Let P ′′ = P − P ′ π. Clearly, P ′′ is zero on A. For x ∈ F \ 0, let hx = (x, 0, . . . , 0) ∈ Fn.
Note that DhxP′′ is of degree 6 d − 1 and that (DhxP
′′)(y) = P ′′(y + hx) for all y ∈ A.
Hence, for every x ∈ F \ 0, P ′′ on hx +A agrees with a polynomial Qx of degree 6 d− 1.
So, for a function Γ : Tp+1 → T, we can write P = Γ(ι(x1), P ′, Q1, Q2, . . . , Qp−1), where
ι(x1), Q1, . . . , Qp−1 are of degree 6 d− 1.
Now, if P ′ itself is of degree d− 1, then P is of rank 6 p+ 1 < r, a contradiction. If P ′
is of rank < r − p, then again P is of rank < r − p+ p = r, a contradiction.
A high-rank polynomial of degree d is, intuitively, a “generic” degree-d polynomial. There
are no unexpected ways to decompose it into lower degree polynomials.
2.2.1 Polynomial Factors
Next, we will formalize the notion of a generic collection of polynomials. Intuitively, it should
mean that there are no unexpected algebraic dependencies among the polynomials. First,
16
we need to set up some notation.
Definition 2.2.3 (Factors). If X is a finite set then by a factor B we mean simply a partition
of X into finitely many pieces called atoms.
A function f : X → C is called B-measurable if it is constant on atoms of B. For any
function f : X → C, we may define the conditional expectation
E[f |B](x) = Ey∈B(x)
[f(y)],
where B(x) is the unique atom in B that contains x. Note that E[f |B] is B-measurable.
A finite collection of functions φ1, . . . , φC from X to some other space Y naturally define a
factor B = Bφ1,...,φC whose atoms are sets of the form x : (φ1(x), . . . , φC(x)) = (y1, . . . , yC)
for some (y1, . . . , yC) ∈ Y C . By an abuse of notation we also use B to denote the map
x 7→ (φ1(x), . . . , φC(x)), thus also identifying the atom containing x with (φ1(x), . . . , φC(x)).
Definition 2.2.4 (Polynomial factors). If P1, . . . , PC : Fn → T is a sequence of polynomials,
then the factor BP1,...,PC is called a polynomial factor.
The complexity of B, denoted |B| := C, is the number of defining polynomials. The degree
of B is the maximum degree among its defining polynomials P1, . . . , PC . If P1, . . . , PC are
of depths k1, . . . , kC , respectively, then the number of atoms of B is at most∏Ci=1 p
ki+1.
Definition 2.2.5 (Conditional Expectation over Factors). Let B be a polynomial factor
defined by P1, . . . , PC . For f : Fn → C, the conditional expectation of f with respect to B,
denoted E[f |B] : Fn → C, is
E[f |B](x) = Ey∈Fn
[f(y)
∣∣P1(y) = P1(x), . . . , PC(y) = PC(x)].
Namely E[f |B] is a constant on every atom of B.
17
Following is a simple observation stating that E[f |B] has same correlation with any other
function as f does.
Remark 2.2.6. Let f : Fn → C. Let B be a polynomial factor defined by polynomials
P1, ..., PC . Let g : Fn → C be any B-measurable function. Then
〈f, g〉 = 〈E(f |B), g〉.
Finally, we define the rank of a factor similarly to that of a single polynomial.
Definition 2.2.7 (Rank of a Factor). A polynomial factor B defined by a sequence of poly-
nomials P1, . . . , PC : Fn → T with respective depths k1, . . . , kC is said to have rank r if r
is the least integer for which there exists (λ1, . . . , λC) ∈ ZC , with (λ1 mod pk1+1, . . . , λC
mod pkC+1) 6= 0C , such that rankd(∑Ci=1 λiPi) 6 r, where d = maxi deg(λiPi).
Given a polynomial factor B and a function r : Z>0 → Z>0, we say that B is r-regular
if B is of rank larger than r(|B|).
Notice that by the definition of rank, for a degree-d polynomial P of depth k we have
rank(P) = min
rankd(P ), rankd−(p−1)(pP ), . . . , rankd−k(p−1)(pkP )
,
where P, is a polynomial factor consisting only of one polynomial P .
Regular factors indeed do behave like a generic collection of polynomials, as we shall
establish in a precise sense in Section 2.2.3. Thus, given any factor B that is not regular, it
will often be useful to regularize B, that is, find a refinement B′ of B that is regular up to
our desires. We distinguish between two kinds of refinements:
Next, we define the notion of conditional expectation with respect to a given factor.
Definition 2.2.8 (Expectation over polynomial factor). Given a factor B and a function
f : Fn → 0, 1, the expectation of f over an atom y ∈ T|B| is the average Ex:B(x)=y[f(x)],
18
which we denote by E[f |y]. The conditional expectation of f over B, is the real-valued
function over Fn given by E[f |B](x) = E[f |B(x)]. In particular, it is constant on every
atom of the polynomial factor.
2.2.2 Analytic Measures of Uniformity
Regularity defined by the notion of rank is an algebraic/combinatorial notion of pseudo-
randomness, however there are several cases where an analytic notion would be much more
useful. In many cases, what is needed from the notion of regularity is for every nonzero
linear combination of polynomials in the polynomial factor to be unbiased, where the bias
of a function is defined as follows.
Definition 2.2.9 (Bias). The bias of a function f : Fn → T is defined to be
bias(f) := Ex∈Fn
[eF(f(x))] .
Green and Tao [43], and Kaufman and Lovett [54] proved the following relation between
bias and rank of a polynomial.
Theorem 2.2.10 (d < |F| [43], arbitrary F [54]). For any ε > 0 and integer d > 0, there
exists r = r2.2.10(d, ε) such that the following is true. If P : Fn → T is a degree-d polynomial
with rank greater than r, then |Ex[e (P (x))]| < ε.
Kaufman and Lovett originally proved Theorem 2.2.10 for classical polynomials. But
their proof also works for non-classical ones without modification.
Note that r2.2.10(δ, d), does not depend on the dimension n of Fn. Motivated by Theo-
rem 2.2.10 we define unbiasedness for polynomial factors.
Definition 2.2.11 (γ-unbiased factors). Let F be a factor of degree d, and let γ : N→ R+
be a decreasing function. We say that F is γ-unbiased if for every collection of coefficients
19
ci,j ∈ F16i6d,16j6Mi, we have
bias(∑i,j
ci,jPi,j) 6 γ(dim(F)).
Using Gowers norms, one can define the following analytical notion of uniformity for
polynomials which is stronger than unbiasedness.
Definition 2.2.12 (Uniformity). Let ε > 0 be a real. A degree-d polynomial P : Fn → T is
said to be ε-uniform if
‖e (P )‖Ud < ε.
Tao and Ziegler used Theorem 2.2.10 to show that high rank polynomials have small
Gowers norm.
Theorem 2.2.13 (Theorem 1.20 of [78]). For any ε > 0 and integer d > 0, there exists an
integer r(d, ε) such that the following is true. For any (non-classical) polynomial P : Fn → T
of degree 6 d, if ‖e (P )‖Ud > ε, then rankd(P ) 6 r.
This immediately implies that a regular polynomial is also uniform.
Corollary 2.2.14. Let ε, d, and r(d, ε) be as in Theorem 2.2.13. Every r-regular polynomial
P of degree d is also ε-uniform.
The next claim, which is a standard application of Fourier analysis, shows that the
converse of this is true at least qualitatively.
Claim 2.2.15. For every r, d ∈ N, there is ε(r, d) such that every ε-uniform polynomial of
degree d has rank(P ) > r.
The following is an analytical notion of uniformity for a factor along the same lines as
Definition 2.2.12.
20
Definition 2.2.16 (Uniform Factor). Let γ : N → R+ be a decreasing function. A poly-
nomial factor B defined by a sequence of polynomials P1, . . . , PC : Fn → T with respective
depths k1, . . . , kC is said to be γ-uniform if for every collection (λ1, . . . , λC) ∈ ZC , with (λ1
mod pk1+1, . . . , λC mod pkC+1) 6= 0C
∥∥∥∥∥e(∑
i
λiPi
)∥∥∥∥∥Ud
< γ(dim(F)),
where d = maxi deg(λiPi).
Remark 2.2.17 (Equivalence between rank and uniformity). Similar to Corollary 2.2.14
it also follows from Theorem 2.2.13 that an r-regular degree-d factor B is also ε-uniform
when r = r2.2.13(d, ε) is as in Theorem 2.2.13. The converse of this also holds, since by
Claim 2.2.15, and there is an approximate equivalence between regularity and uniformity.
2.2.3 Equidistribution and Near-Orthogonality of Regular Factors
In this section, we make precise the intuition that a high-rank collection of polynomials
often behaves like a collection of independent random variables. The key technical tool is
the connection between the combinatorial notion of rank and the analytic notion of bias
(Theorem 2.2.10).
Using a standard observation that relates the bias of a function to its distribution on its
range, Theorem 2.2.10 implies the following.
Lemma 2.2.18 (Size of atoms). Given ε > 0, let B be a polynomial factor of degree d > 0,
complexity C, and rank r2.2.10(d, ε), defined by a tuple of polynomials P1, . . . , PC : Fn → T
having respective depths k1, . . . , kC . Suppose b = (b1, . . . , bC) ∈ Uk1+1 × · · · ×UkC+1. Then
Prx
[B(x) = b] =1
‖B‖± ε.
21
In particular, for ε < 1‖B‖ , B(x) attains every possible value in its range and thus has ‖B‖
atoms.
Proof.
Prx
[B(x) = b] = Ex
∏i
1
pki+1
pki+1−1∑λi=0
e (λi(Pi(x)− bi))
=∏i
p−(ki+1) ·∑
(λ1,...,λC)
∈∏i[0,p
ki+1−1]
Ex
[e
(∑i
λi(Pi(x)− bi)
)]
=∏i
p−(ki+1) ·
(1± ε
∏i
pki+1
)=
1
‖B‖± ε.
The first equality uses the fact that Pi(x)−bi is in Uki+1 and that for any nonzero x ∈ Uki+1,∑pk+1−1λ=0 e (λx) = 0. The third equality uses Theorem 2.2.10 and the fact that unless every
λi = 0, the polynomial∑i λi(Pi(x)− bi) has rank at least r2.2.10(d, ε).
An almost identical proof implies a similar statement for unbiased factors instead of
regular factors.
Lemma 2.2.19 (Equidistribution for unbiased factors). Suppose that γ : N → R+ is
a decreasing function. Let F be a γ-unbiased factor of degree d > 0, defined by a tu-
ple of polynomials P1, . . . , PC : Fn → T having respective depths k1, . . . , kC . Suppose
b = (b1, . . . , bC) ∈ Uk1+1 × · · · × UkC+1. Then
Prx∈Fn
[F(x) = b] =1
‖F‖± γ(dim(F)).
2.2.4 Strong Near-Orthogonality and Equidistribution
One of the important and useful properties of the classical Fourier characters is that they
form an orthonormal basis. Theorem 2.2.10, Lemma 2.2.18 and Lemma 2.2.19 provide an
22
approximate version of this phenomenon which is useful for several applications. However,
this is not completely satisfactory, as we will see later that in order to study averages of the
form
EX∈(Fn)k[f(L1(X)) · · · f(Lm(X))
],
for a bounded function f : Fn → R, one needs to understand the distribution of the more
sophisticated random variable
AP,L(X) :=
P1(L1(X)) P2(L1(X)) . . . PC(L1(X))
P1(L2(X)) P2(L2(X)) . . . PC(L2(X))
......
P1(Lm(X)) P2(Lm(X)) . . . PC(Lm(X))
,
where X is the uniform random variable taking values in (Fn)k, P = P1, ..., PC is a family
of non-classical polynomials and L = L1, ..., Lm ⊂ Fk is an arbitrary system of linear
forms.
Since (non-classical) polynomials of a given degree satisfy various linear identities (e.g.
every degree one polynomial P satisfies P (x + y + z) = P (x + y) + P (x + z) − P (x)), it
is no longer possible to choose the polynomials in a way that this random matrix is almost
uniformly distributed over all m×C matrices A ∈∏mi=1
∏Cj=1 Ukj+1. Therefore, in this case
one would like to obtain an almost uniform distribution on the points of the configuration
space that are consistent with these linear identities. Note that [43] and [54] only address
the case where the system of linear forms consists only of a single linear form; in other words
they show that the entries in each row of this matrix are nearly independent. The stronger
near-equidistribution would in particular imply that the columns of this matrix are nearly
independent, with each column uniform modulo the required linear dependencies.
Hatami and Lovett [51] established this strong near-equidistribution in the case where
the characteristic of the field F is greater than the degree of the (classical) polynomial factor.
23
Lemma 2.2.20 ([51]). Let F be a prime field and d < |F|. Let L1, . . . , Lm be a system of
linear forms. Let F = P1, . . . , Pk be a collection of homogeneous classical polynomials of
degree at most d, such that rank(F) > τp,d(ε). For every set of coefficients Λ = λi,j ∈ F :
i ∈ [k], j ∈ [m], and
PΛ(x) :=k∑i=1
m∑j=1
λi,jPi(Lj(x)),
one of the following two cases holds:
PΛ ≡ 0 or E [eF(PΛ)] < ε.
The proof technique used in [51] breaks down once |F| is allowed to be small. Bhat-
tacharyya, et al. [15] later extended the result of [51] to the general characteristic case, but
under the extra assumption that the system of linear forms is affine, i.e. there is a variable
that appears with coefficient 1 in all the linear forms.
Theorem 2.2.21 (Near orthogonality [15]). Given ε > 0, suppose B = (P1, . . . , PC) is a
polynomial factor of degree d > 0 and rank > r2.2.13(d, ε), A = (L1, . . . , Lm) is an affine
constraint on ` variables, and Λ is a tuple of integers (λi,j)i∈[C],j∈[m]. Define
PA,B,Λ(x1, . . . , x`) =∑
i∈[C],j∈[m]
λi,jPi(Lj(x1, . . . , x`)).
Then, one of the two statements below is true.
• For every i ∈ [C], it holds that∑j∈[m] λi,jQi(Lj(·)) ≡ 0 for all polynomials Qi : Fn →
T with the same degree and depth as Pi. Clearly, PA,B,Λ ≡ 0 in this case.
• PA,B,Λ is non-constant. Moreover, |Ex1,...,x` [e(PA,B,Λ(x1, . . . , x`)
)]| < ε.
One can use the derivative technique used in [15] in a straightforward manner to prove
the following near-orthogonality result for when the linear forms are not necessarily affine
24
but they are allowed only to take 0 and 1 entries (See [18] for a proof).
Lemma 2.2.22 (Near orthogonality). For every decreasing function ε : N→ R+ and param-
eter d > 1, there is a decreasing function γ : N→ R+ such that the following holds. Suppose
that F = Pi,j16i6d,16j6Miis a γ-uniform factor of degree d. Let x ∈ Fn be a fixed vector.
For every integer k and set of not all zero coefficients Λ = λi,j,ω16i6d,16j6Mi,ω∈0,1k
define
PΛ(y1..., yk)def=
∑i∈[d],j∈[Mi]
∑ω∈0,1k
λi,j,ω · Pi,j(x+ ω · y),
where y = (y1, ..., yk) ∈ (Fn)k. Then either we have that PΛ is a constant, i.e. it does not
depend on y, or
bias(PΛ) < ε(dim(F))
Finally, in the present dissertation we fully characterize the joint distribution of the
matrix AP,L without any extra assumptions on the linear forms.
Theorem 2.2.23 (Near Orthogonality over Linear Forms). Let L1, . . . , Lm be linear forms
on ` variables and let B = (P1, . . . , PC) be an ε-uniform polynomial factor for some ε ∈ (0, 1]
defined only by homogeneous polynomials. For every tuple Λ of integers (λi,j)i∈[C],j∈[m],
define PΛ : (Fn)` → T as
PΛ(X) =∑
i∈[C],j∈[m]
λi,jPi(Lj(X)).
Then one of the following two statements holds:
• PΛ ≡ 0.
• PΛ is non-constant and∣∣∣EX∈(Fn)` [e (PΛ)]
∣∣∣ < ε.
Furthermore PΛ ≡ 0 if and only if for every i ∈ [C], we have (λi,j)j∈[m] ∈ Φdi,ki(L)⊥ where
di, ki are the degree and depth of Pi, respectively.
25
Homogeneous non-classical polynomials are introduced in Chapter 3, where we also prove
several useful facts about them. Theorem 2.2.23 is discussed in Chapter 4 and proved in
Section 4.2.
2.2.5 Regularization of Factors
Due to the generic properties of regular factors, it is often useful to refine a given polynomial
factor to a regular one [78]. We will first formally define what we mean by refining a
polynomial factor.
Definition 2.2.24 (Refinement). A factor B′ is called a refinement of B, and denoted B′
B, if the induced partition by B′ is a combinatorial refinement of the partition induced by B.
In other words, if for every x, y ∈ Fn, B′(x) = B′(y) implies B(x) = B(y).
One needs to be careful about distinguishing between two types of refinements.
Definition 2.2.25 (Semantic and syntactic refinements). B′ is called a syntactic refinement
of B, and denoted B′ syn B, if the sequence of polynomials defining B′ extends that of
B. It is called a semantic refinement, and denoted B′ sem B if the induced partition is a
combinatorial refinement of the partition induced by B. In other words, if for every x, y ∈ Fn,
B′(x) = B′(y) implies B(x) = B(y).
Remark 2.2.26. Clearly, being a syntactic refinement is stronger than being a semantic
refinement. But observe that if B′ is a semantic refinement of B, then there exists a syntactic
refinement B′′ of B that induces the same partition of Fn, and for which |B′′| 6 |B′| + |B|,
because we can define B′′ by just adding the defining polynomials of B to those of B′.
The following, by now well-known, lemma states that given a classical polynomial factor
one can refine it to a regular one.
26
Lemma 2.2.27 (Regularity Lemma for Polynomials). Let d > 1, F : N → N be a growth
function, and let F be a classical polynomial factor of degree d. Then there exists a semantic
F -regular refinement F ′ sem F of classical polynomials of same degree d, satisfying
dim(F ′) = OF,d,dim(F)(1).
Proof. We shall induct on the dimension vectors (M1, ...,Md). Notice that the dimension
vectors of a factor of degree d takes values in Nd, and we will use the reverse lexicographical
ordering on this space for our induction. The proof for d = 1 is obvious, because non-
zero linear functions have infinite rank0, unless there is a linear dependency between the
polynomials in F , which we can simply discard by removing the polynomials that can be
written as a linear combination of the others.
If F is already F -regular, then we are done. Otherwise, there is a set of coefficients
ci,j16i6d,16j6Mi, not all zero, such that
rankk−1
d∑i=1
Mi∑j=1
ci,jPi,j
< F (dim(F)).
Without loss of generality assume that ck,Mk6= 0. The above inequality means that Pk,Mk
can be written as a function of the rest of the polynomials in the factor and a set of at
most F (dim(F)) polynomials of degree at most k−1. We will replace Pk,Mkwith these new
polynomials. The new factor will have the following dimension vector
(M1, . . . ,Mk−1 + F (dim(F)),Mk − 1,Mk+1, . . . ,Md).
Now by the induction hypothesis the above can be regularized.
The same proof idea can be used to extend this lemma to non-classical polynomial factors.
27
Lemma 2.2.28 (Lemma 9.6 of [78]). Let r : Z>0 → Z>0 be a non-decreasing function and
d > 0 be an integer. Then, there are functions C(r,d) : Z>0 → Z>0 and I(d) : Z>0 →
Z>0 such that the following is true. Suppose B is an extended polynomial factor defined by
polynomials P1, . . . , PC : Fn → T of degree 6 d. Then, there is a subspace V 6 Fn and
an r-regular extended factor B consisting of polynomials Q1, . . . , QC : V → T such that
2 6 deg(Qi) 6 d for each i, B semantically refines the factor defined by P1|V , . . . , PC |V ,
C 6 C(r,d)(C), and dim(V ) > n− I(d)(C).
The nature of the proof in regularity lemmas of the above type results in Ackermann-like
functions even for “reasonable” growth functions.
The following lemma is the workhorse that allows us to construct regular refinements.
Lemma 2.2.29 (Polynomial Regularity Lemma). Let r : Z>0 → Z>0 be a non-decreasing
function and d > 0 be an integer. Then, there is a function C(r,d)2.2.29 : Z>0 → Z>0 such that
the following is true. Suppose B is a factor defined by polynomials P1, . . . , PC : Fn → T of
degree at most d. Then, there is an r-regular factor B′ consisting of polynomials Q1, . . . , QC ′ :
Fn → T of degree 6 d such that B′ sem B and C ′ 6 C(r,d)2.2.29(C).
Moreover, if B is itself a refinement of some B that has rank > (r(C ′) +C ′) and consists
of polynomials, then additionally B′ will be a syntactic refinement of B.
Proof. We can prove this lemma starting from Lemma 9.6 of [78]. To explain, let us define
the notion of an extended factor. We say a polynomial factor B is extended if for any
polynomial Q ∈ B that is not classical, pQ ∈ B also. Note that an extended factor defined
by polynomials P1, . . . , PC is of high rank if for all tuples (λ1, . . . , λC) ∈ [0, p− 1]C , unless
all the λi’s are zero,∑i λiPi is of high (maxi deg(λiPi))-rank.
Let B1 be the extended factor defined bypkPi | 0 6 k 6 depth(Pi), i ∈ [C]
. Apply
Lemma 2.2.28 to B1 in order to obtain a bounded index subspace V1 and an extended
R1-regular factor B1 defined by polynomials Q1, . . . , QC : V1 → T, where R1 is a growth
function (growing even faster than r) we specify later on in the proof and C 6 C(R1,d)(|B1|).28
For a ∈ Fn/V1 and P ∈ B1, define P a : V1 → Fn to be P a(x) = DaP (x) = P (a+ x)−P (x).
Each P a is of degree 6 d−1. Also, since V1 is the intersection of I 6 I(d)(C) hyperplanes, we
can decide which coset in Fn/V1 an element x ∈ Fn belongs to as a function of I 6 I(d)(C)
(classical) linear functions π1, . . . , πI . Let B2 be the extended factor obtained by adding to
B1 all the polynomials P a | P ∈ B1, a ∈ Fn/V1 and π1, . . . , πI . Consider x ∈ Fn and let
x = a + y where y ∈ V1, and a ∈ Fn/V1. Since P (x) = P (y)− P−a(x), each polynomial in
B1 is a function of the polynomials in B2 over all of Fn, and so B2 is a semantic refinement
of B1 (and a syntactic refinement of B1). Note that |B2| 6 C + dCI(d)(C) + I(d)(C) <
C + 2dCI(d)(C).
Now, suppose we repeat the steps in the previous paragraph with B2 taking the place
of B1 and a different function R2 taking the place of R1. We specify R2 later, but we will
choose it so that it grows faster than r. The new application of Lemma 2.2.28 to B2 produces
an extended factor B2 that is R2-regular and a bounded index subspace V2 such that the
polynomials in B2 restricted to V2 are measurable with respect to B2. We argue that B2
differs from B2 only by polynomials of degree 6 d−1. Suppose B2 is not R2-regular to start
off with. The function R1 is chosen so that B1’s rank, R1(|B1|) > R2(|B2|) + |B2|. This
means that if a linear combination of polynomials in B2,∑S∈B2
λSS, has rank 6 R2(|B2|)
and d′ = maxS:λS 6=0 deg(S), then there must be an S 6∈ B1 with λS 6= 0 and degree d′,
since otherwise the rank condition of B1 would be violated. Since all the polynomials in B2
which are not in B1 have degree 6 d− 1, we conclude that d′ 6 d− 1. Inspecting the proof
of Lemma 2.2.28 in [78] shows that this means B2 consists of the polynomials of B1 along
with other polynomials of degree 6 d − 1. In the same way as in the previous paragraph,
we obtain an extended factor B3 syn B2, so that B3 is a semantic refinement of B2 over
all of Fn. Note that since all the polynomials of B1 are already in B2, we only need to addP a | P ∈ B2 \ B1, a ∈ Fn/V2
, together with some linear functions. All these polynomials
have degree at most 6 d− 2.
29
We keep repeating this process to obtain a sequence of extended factors B1,B2,B3, . . .
and B1, B2, B3, . . . . Each Bi+1 semantically refines Bi and syntactically refines Bi. The
process stops at step i if Bi becomes Ri-regular, where the sequence of growth functions
Ri satisfies Ri(m) > Ri+1(m + 2dmI(d)(m)) + m + 2dmI(d)(m) and Rd(m) = r(m). The
functions Ri are chosen so that Ri(|Bi|) > Ri+1(|Bi+1|)+ |Bi+1|, and therefore, by the above
argument, Bi+1 differs from Bi by polynomials of degree 6 d − i. So, we must stop after
obtaining Bd in the sequence. Also, since each Ri grows faster than r, note that Ri-regularity
for any i ∈ [d] implies r-regularity. So, it must be that some Bi for i 6 d already becomes
r-regular.
Given an extended factor B′′ of rank > r, we can get a (standard) factor B′ of rank > r
by letting B′ be defined by the smallest subset of polynomials S such thatpiP | P ∈ S, i ∈ Z>0
⊇ B′′. The last statement of the lemma follows from the same
considerations as used above to argue that Bi syntactically refines Bi.
2.3 Decomposition Theorems
“Decomposition theorems” [36, 75, 43] are important applications of inverse theorems. These
theorems allow one to express a given function f with certain properties as a sum f1 +∑ki=1 gi, where f1 is “structured” and each gi has certain desired quasirandomness proper-
ties. The rough idea is that the structure of f1 is strong enough for us to be able to analyze
it reasonably explicitly, while the quasirandomness of gis is strong enough for many proper-
ties of f1 to be unaffected if we “perturb” it to f = f1 +∑ki=1 gi. We refer the interested
reader to [36] and [40] for a detailed discussion. The following decomposition theorem is a
consequence of an inverse theorem for Gowers norms ([78, Theorem 1.11]).
Theorem 2.3.1 (Strong Decomposition Theorem for Multiple Functions). Let m, d > 1 be
integers, δ > 0 a parameter, and let r : N → N be an arbitrary growth function. Given any
30
functions f1, . . . , fm : Fn → D, there exists a decomposition
fi = gi + hi,
such that for every 1 6 i 6 m,
1. gi = E[fi|B], where B is an r-regular polynomial factor of degree at most d and com-
plexity C 6 Cmax(p,m, d, δ, r(·)),
2. ‖hi‖Ud+1 6 δ.
Remark 2.3.2. In the case of high-characteristic, i.e. d < |F|, B will consist only of
classical polynomials, and it is easily seen that we can further assume that the polynomials
in factor B are homogeneous, which means that all their multinomials are of same degree.
However, the case of non-classical polynomials is tricky and unclear, as for example it is not
a priori clear what a good candidate definition for homogeneous non-classical polynomial is
and whether such homogeneous non-classical polynomials span all polynomials. Chapter 3
addresses this issue, and proves that homogeneous non-classical polynomials span the space
of all non-classical polynomials.
2.3.1 Strong Decomposition Theorems
Often, in order to obtain stronger statements about the structure and the qasirandomness in
a decomposition theorem, one allows also a small L2-error: that is, one writes f as f1+f2+f3
with f1 structured, f2 quasirandom, and f3 small in L2.
The strong decomposition theorem below shows that any Boolean function can be de-
composed into the sum of a conditional expectation over a high rank factor, a function with
small Gowers norm, and a function with small L2-norm.
Theorem 2.3.3 (Strong Decomposition Theorem; Theorem 4.4 of [16]). Suppose δ > 0
and d > 1 are integers. Let η : N → R+ be an arbitrary non-increasing function and
31
r : N → N be an arbitrary non-decreasing function. Then there exist N = N2.3.3(δ, η, r, d)
and C = C2.3.3(δ, η, r, d) such that the following holds.
Given f : Fn → 0, 1 where n > N , there exist three functions f1, f2, f3 : Fn → R and
a polynomial factor B of degree at most d and complexity at most C such that the following
conditions hold:
(i) f = f1 + f2 + f3.
(ii) f1 = E[f |B].
(iii) ‖f2‖Ud+1 6 1/η(|B|).
(iv) ‖f3‖2 6 δ.
(v) f1 and f1 + f3 have range [0, 1]; f2 and f3 have range [−1, 1].
(vi) B is r-regular.
It turns out though that this strong decomposition theorem is not quite sufficient for
some applications. For example, in the study of algebraic property testing we require an
even stronger decomposition theorem. The issue is that the bound on f3 above is a constant
δ. Ideally, we would want δ to decrease as a function of the complexity of the polynomial
factor, but such a decomposition theorem is simply not possible. However, analogous to what
it is shown in [2] in the context graphs, here one can find two polynomial factors B′ syn B
such that, the structured part f1 equals to E[f |B′], but now the L2-norm of f3 can be made
arbitrarily small in terms of the complexity of the coarser factor B. Furthermore for most
atoms c of B, the function f : Fn → 0, 1 have roughly the same density on c and most of
its subatoms in B′. To make this precise, we make the following definition.
Definition 2.3.4 (Polynomial factor represents another factor). Given a function f : Fn →
0, 1, a polynomial factor B′ that refines another factor B and a real ζ ∈ (0, 1), we say
32
B′ ζ-represents B with respect to f if for at most ζ fraction of atoms c of B, more than ζ
fraction of the atoms c′ lying inside c satisfy |E[f |c]− E[f |c′]| > ζ.
We can now state the following “Super Decomposition Theorem” from [16].
Theorem 2.3.5 (Super Decomposition Theorem; Theorem 4.9 of [16]). Suppose ζ > 0 is
a real and d, C0 > 1 are integers. Let η : N → R+ and δ : N → R+ be arbitrary non-
increasing functions, and r : N → N be an arbitrary non-decreasing function. Then there
exist N = N2.3.5(δ, η, r, d, ζ) and C = C2.3.5(δ, η, r, d, ζ) such that the following holds.
Given f : Fn → 0, 1 where n > N , there exist functions f1, f2, f3 : Fn → R, and
polynomial factors B′ syn B of degree at most d and of complexity at most C, such that the
following conditions hold:
(i) f = f1 + f2 + f3.
(ii) f1 = E[f |B′].
(iii) ‖f2‖Ud+1 6 η(|B′|).
(iv) ‖f3‖2 6 δ(|B|).
(v) f1 and f1 + f3 have range [0, 1]; f2 and f3 have range [−1, 1].
(vi) B and B′ are both r-regular.
(vii) B′ ζ-represents B with respect to f .
Although the above Super Decomposition Theorem may be useful by itself for other
applications, we will need a particular variant. The factor B′ is a syntactic refinement of B,
and thus is defined by adding new polynomials Q1, . . . , Q|B′|−|B| to the polynomials defining
B. Then for each atom c in the coarser factor B we will select one atom c′ of B′ such that
the following hold:
33
• There is a fixed s ∈ T|B′|−|B| such that for every atom c in B its corresponding atom
c′ is obtained by requiring (Q1, . . . , Q|B′|−|B|) to be equal to s.
• The L2-norm of f3 conditioned inside every such atom (i.e., Ex∈c′ [|f3(x)|2]) is small.
• Most subatoms c′ will “well-represent” (in the sense of Definition 2.3.4) their corre-
sponding atoms c from B.
Before stating this formally, let us also take this opportunity to remark that it is possible
to adapt the proofs of the above decomposition theorems to decompose several functions
f (1), . . . , f (R) : Fn → 0, 1 simultaneously. Alternatively, this could be thought of as
decomposing a single vector-valued function f : Fn → 0, 1R. Now we finally state the
decomposition theorem that we will use in the proof of our main result.
Theorem 2.3.6 (Subatom Selection; Theorem 4.12 of [16]). Suppose ζ > 0 is a real and
d,R > 1 are integers. Let η, δ : N → R+ be arbitrary non-increasing functions, and let
r : N→ N be an arbitrary non-decreasing function. Then, there exist C = C2.3.6(δ, η, r, ζ, R)
such that the following holds.
Given f (1), . . . , f (R) : Fn → 0, 1, there exist functions f(i)1 , f
(i)2 , f
(i)3 : Fn → R for all
i ∈ [R], a polynomial factor B of degree d with atoms denoted by elements of T|B|, a syntactic
refinement B′ syn B of degree d with complexity at most C and atoms denoted by elements
of T|B| × T|B′|−|B|, and an element s ∈ T|B′|−|B| such that the following is true:
(i) f (i) = f(i)1 + f
(i)2 + f
(i)3 for every i ∈ [R].
(ii) f(i)1 = E[f (i)|B′] for every i ∈ [R].
(iii) ‖f (i)2 ‖Ud+1 < η(|B′|) for every i ∈ [R].
(iv) For every i ∈ [R], f(i)1 and f
(i)1 + f
(i)3 have range [0, 1], and f
(i)2 and f
(i)3 have range
[−1, 1].
34
(v) B and B′ are both r-regular.
(vi) For every atom c ∈ T|B| of B, the subatom c′ = (c, s) ∈ T|B′| satisfy
E[|f (i)
3 |2∣∣∣ (c, s)
]< δ(|B|)2
for every i ∈ [R].
(vii) If c is an atom of B chosen uniformly at random, then
Prc
[maxi∈[R]
(∣∣∣E [f (i)∣∣∣ c]− E
[f (i)∣∣∣ (c, s)]∣∣∣) > ζ
]< ζ.
2.3.2 Higher-order Fourier Expansion
In the previous sections we saw that decomposition theorems can be used to express a
given function f that satisfies certain reasonable properties as f = f1 +∑ki=1 gi, where
f1 = Γ(P1, ..., PC) for a set of (non-classical) polynomials, for some function Γ : TC → D,
and gi’s are small in norms that allow us to regard them as negligible error terms. Let
ki denote the depth of the polynomial Pi so that by Lemma 2.1.3, each Pi takes values in
Uki+1 = 1pki+1Z/Z. Moreover, let Σ := Zpk1+1 × · · · × Z
pkC+1 . Using the Fourier expansion
of Γi we have
f1(x) =∑
Λ=(λ1,...,λC)∈Σ
Γi(Λ) · e
C∑j=1
λjPj(x)
, (2.3)
where Γi(Λ) is the Fourier coefficient of Γi corresponding to Λ. Equation (2.3) implies an
approximate higher-order Fourier expansion of f , a generalization of the classical expansion.
f(x) '∑
Λ=(λ1,...,λC)∈Σ
Γi(Λ) · e
C∑j=1
λjPj(x)
. (2.4)
35
2.4 Systems of Linear forms
Definition 2.4.1. A linear form on k variables is a vector L = (`1, . . . , `k) ∈ Fk and it
maps X = (x1, . . . , xk) ∈ (Fn)k to L(X) =∑ki=1 `ixi ∈ Fn.
For a linear form L = (`1, . . . , `k) ∈ Fk we define |L| def=∑ki=1 |`i|.
Gowers norms are known to control the density of linear patterns in subsets of an Abelian
group. Next, we present a notion of complexity associated with a collection of linear forms
that determines an upper-bound on the order of the required Gowers norm to do so.
2.4.1 Cauchy-Schwarz Complexity
Let A be a subset of Fn with the indicator function 1A : Fn → 0, 1. Then the analytical
average
EX∈(Fn)k
[1A(L1(X)) · · ·1A(Lm(X))] , (2.5)
represents the probability that L1(X), . . . , Lm(X) all fall in A, where X ∈ (Fn)k is chosen
uniformly at random. Then roughly speaking, we will say that A ⊆ Fn is pseudorandom
with regards to L if
EX∈(Fn)k
[m∏i=1
1A(Li(X))
]≈(|A|pn
)m;
That is, the probability that all L1(X), . . . , Lm(X) fall in A is close to what we would expect
if A was a random subset of Fn of cardinality |A|. Let α := |A|/pn be the density of A, and
define f := 1A − α. We have
EX
[m∏i=1
1A(Li(X))
]
= EX
[m∏i=1
(α + f(Li(X)))
]= αm +
∑S⊆[m],S 6=∅
αm−|S|EX
∏i∈S
f(Li(X))
.
36
Therefore, a sufficient condition for A to be pseudorandom with regards to L is that
EX[∏
i∈S f(Li(X))]
is negligible for all nonempty subsets S ⊆ [m]. Green and Tao [46]
showed that a sufficient condition for this to occur is that ‖f‖Us+1 is small enough, where s
is the Cauchy-Schwarz complexity of the system of linear forms.
Definition 2.4.2 (Cauchy-Schwarz complexity [46]). Let L = L1, . . . , Lm be a system of
linear forms. The Cauchy-Schwarz complexity of L is the minimal s such that the following
holds. For every 1 6 i 6 m, we can partition Ljj∈[m]\i into s+ 1 subsets, such that Li
does not belong to the linear span of any of the subsets.
Note that the Cauchy-Schwarz complexity of any system of m linear forms in which any
two linear forms are linearly independent (i.e. one is not a multiple of the other) is at most
m− 2, since we can always partition Ljj∈[m]\i into the m− 1 singleton subsets.
The reason for the term Cauchy-Schwarz complexity is the following lemma due to Green
and Tao [46] the proof of which is based on a clever iterative application of the Cauchy-
Schwarz inequality.
Lemma 2.4.3 ([46], See also [37, Theorem 2.3]). Let f1, . . . , fm : F → D. Let L =
L1, . . . , Lm be a system of m linear forms in ` variables of Cauchy-Schwarz complexity s.
Then ∣∣∣∣∣ EX∈(Fn)`
[m∏i=1
fi(Li(X))
]∣∣∣∣∣ 6 min16i6m
‖fi‖Us+1 .
2.4.2 The True Complexity
The Cauchy-Schwarz complexity of L gives an upper bound on s, such that if ‖f‖Us+1 is
small enough for some function f : Fn → D, then f is pseudorandom with regards to L.
Gowers and Wolf [37] defined the true complexity of a system of linear forms as the minimal
s for which this condition holds for all f : Fn → D.
37
Definition 2.4.4 (True complexity [37]). Let L = L1, . . . , Lm be a system of linear forms
over F. The true complexity of L is the smallest d ∈ N with the following property. For
every ε > 0, there exists δ > 0 such that if f : Fn → D is any function with ‖f‖Ud+1 6 δ,
then ∣∣∣∣∣ EX∈(Fn)k
[m∏i=1
f(Li(X))
]∣∣∣∣∣ 6 ε.
An obvious bound on the true complexity is the Cauchy-Schwarz complexity of the sys-
tem.
38
CHAPTER 3
HOMOGENEOUS NON-CLASSICAL POLYNOMIALS
Recall that a classical polynomial is called homogeneous if all of its monomials are of the
same degree. Trivially a homogeneous classical polynomial P (x) satisfies P (cx) = |c|dP (x)
for every c ∈ F. We will use this property to define the class of nonclassical homogeneous
polynomials.
Definition 3.0.5 (Homogeneity). A (nonclassical) polynomial P : Fn → T is called homo-
geneous if for every c ∈ F there exists a σc ∈ Z such that P (cx) = σcP (x) mod 1 for all
x ∈ Fn.
Remark 3.0.6. It is not difficult to see that P (cx) = σcP (x) mod 1 implies that σc =
|c|deg(P ) mod p, a property that we will use later. Indeed for d = deg(P ), we have 0 =
∂d(P (cx) − σcP (x)) = (|c|d − σc)∂dP (x) mod 1. This, since ∂dP (x) is a nonzero degree-d
classical polynomial, implies σc = |c|d mod p.
Notice that for a polynomial P to be homogeneous it suffices that there exists σ ∈ Z
for which P (ζx) = σP (x) mod 1, where ζ is a generator of F∗. Throughout this chapter,
we fix ζ ∈ F∗ a generator of F∗. If P has depth k, then we can assume that σ ∈ Zpk+1 , as
pk+1P ≡ 0. The following lemma shows that σ is uniquely determined for all homogeneous
polynomials of degree d and depth k. Henceforth, we will denote this unique value by σ(d, k).
Lemma 3.0.7. For every d and k, there is a unique σ = σ(d, k) ∈ Zpk+1, such that for
every homogeneous polynomial P of degree d and depth k, P (ζx) = |σ|P (x) mod 1, where
| · | is the natural map from Zpk+1 to 0, 1, . . . , pk+1 − 1 ⊂ Z.
Proof. Let P be a homogeneous polynomial of degree d and depth k, and let σ ∈ Zpk+1
be such that P (ζx) = |σ|P (x) mod 1. By Remark 3.0.6 we know that |σ| = |ζ|d mod p.
We also observe that P (x) = P (ζp−1x) = |σ|p−1P (x) mod 1 from which it follows that
σp−1 = 1. We claim that σ ∈ Zpk+1 is uniquely determined by the two properties
39
i. |σ| = |c|d mod p, and
ii. σp−1 = 1.
Suppose to the contrary that there are two nonzero values σ1, σ2 ∈ Zpk+1 that satisfy the
above two properties, and choose t ∈ Zpk+1 such that σ1 = tσ2. It follows from (i) that t = 1
mod p and from (ii) that tp−1 = 1. We will show that t = 1 is the only possible such value
in Zpk+1 .
Let a1, . . . , apk ∈ Zpk+1 be all the possible solutions to x = 1 mod p in Zpk+1 . Note that
ta1, . . . , tapk is just a permutation of the first sequence and thus
tpk∏
ai =∏
ai.
Consequently tpk
= 1, which combined with tp = t implies t = 1.
Lemma 2.1.3 allows us to express every (nonclassical) polynomial as a linear span of
monomials of the form|x1|d1 ···|xn|dn
pk+1 . Unfortunately, unlike in the classical case, these mono-
mials are not necessarily homogeneous, and for some applications it is important to express
a polynomial as a linear span of homogeneous polynomials. We show that this is possible as
homogeneous multivariate (nonclassical) polynomials linearly span the space of multivariate
(nonclassical) polynomials. We will present the proof of this theorem in Section 3.3.
Theorem 3.0.8. There is a basis for Poly(Fn → T) consisting only of homogeneous multi-
variate polynomials.
Theorem 3.0.8 allows us to make the extra assumption in the strong decomposition
theorem (Theorem 2.3.1) that the resulting polynomial factor B consists only of homogeneous
polynomials.
Corollary 3.0.9 (Homogeneous Strong Decomposition). Let d > 1 be an integer, δ > 0
a parameter, and let r : N → N be an arbitrary growth function. Given any functions
40
f1, . . . , fm : Fn → D, there exists a decomposition
fi = gi + hi,
such that or every 1 6 i 6 m,
1. gi = E[fi|B], where B is an r-regular polynomial factor of degree at most d and com-
plexity C 6 Cmax(p, d, δ, r(·)), moreover B only consists of homogeneous polynomials.
2. ‖hi‖Ud+1 6 δ.
To simplify the notation, through the rest of this section we will omit writing “mod 1”
in the description of the defined nonclassical polynomials.
3.1 Properties of the Derivative Polynomial
We start by proving the following simple observation.
Claim 3.1.1. Let P : F → T be a univariate polynomial of degree d. Then for every
c ∈ F\0,
deg(P (cx)− |c|dP (x)
)< d.
Proof. By Lemma 2.1.3 it suffices to prove the claim for a monomial q(x) :=|x|spk+1 with
k(p− 1) + s = d. Note that q(cx)− |c|dq(x) takes values in 1pkZ/Z as |c|s − |c|d is divisible
by p. Hence
deg(q(cx)− |c|dq(x)
)6 (p− 1)(k − 1) < d. (3.1)
It is not difficult to show that the above claim holds for any multivariate polynomial
P : Fn → T. We provide a proof for the multivariate case since it is a useful observation,
although the univariate case suffices for our purposes in this section.
41
Claim 3.1.2. Let P : Fn → T be a univariate polynomial of degree d. Then for every
c ∈ F\0,
deg(P (cx)− |c|dP (x)
)< d.
Proof. Notice that the claim is trivial for classical polynomials, since in this case, if R denotes
the homogeneous degree-d part of P , then R(cx) − |c|dR(x) = 0. We prove the statement
for nonclassical polynomials. Let Q(x) := P (cx), and note deg(Q) = d. We will inspect
the derivative polynomial of Q. Recall from Definition 2.1.5 and (Lemma 2.1.6 that the
derivative polynomial of Q,
∂Q(y1, . . . , yd) = Dy1 · · ·DydQ(0),
is a degree-d classical homogeneous multi-linear polynomial which is invariant under permu-
tations of (y1, . . . , yd). In particular
|c|−d∂Q(y1, . . . , yd) = ∂Q(c−1y1, . . . , c−1yd) = Dc−1y1
Dc−1y2· · ·Dc−1yd
Q(x)
=∑S⊆[d]
(−1)|S|Q
c−1∑i∈S
yi
=∑S⊆[d]
(−1)|S|P
∑i∈S
yi
= ∂P (y1, . . . , yd),
This implies that ∂d(Q− |c|dP ) ≡ 0 and thus deg(Q− |c|dP ) < d.
3.2 A Homogeneous Basis, the Univariate Case
First we prove Theorem 3.0.8 for univariate polynomials.
Lemma 3.2.1. There is a basis of homogeneous univariate polynomials for Poly(F→ T).
Proof. We will prove by induction on d that there is a basis h1, . . . , hd of homogeneous
univariate polynomials for Poly6d(F → T) for every d. Let ζ be a fixed generator of F∗.42
For any degree d > 0, we will build a degree-d homogeneous polynomial hd(x) such that
hd(ζx) = σdhd(x) for some integer σd. The base case of d 6 p−1 is trivial as Poly6p−1(F→
T) consists of only classical polynomials, and those are spanned by h0(x) := 1p , h1(x) :=
|x|p , . . . , hp−1(x) :=
|x|p−1
p . Now suppose that d = s + (p − 1)(k − 1) with 0 < s 6 p − 1,
and k > 1. It suffices to show that the degree-d monomial|x|spk
can be expressed as a linear
combination of homogeneous polynomials. Consider the function
f(x) :=|ζx|s
pk− |ζ|
s|x|s
pk.
Claim 3.1.1 implies that deg(f) < d. Using the induction hypothesis, we can express f(x)
as a linear combination of|x|sp`
for ` = 0, . . . , k−1, and he for e < d with e 6= s mod (p−1):
f(x) =k−1∑`=1
a`|x|s
p`+
∑e<d,
e 6=d mod (p−1)
behe(x).
Set A := |ζ|s +∑k−1`=1 a`p
k−`, so that
|ζx|s
pk− A |x|
s
pk=
∑e<d,
e 6=d mod (p−1)
behe(x). (3.2)
By the induction hypothesis, for e < d, he(ζx) = σeh(x) where σe = |ζ|e mod p, and thus
as A = |ζ|s mod p, we have σe 6= A mod p when e 6= s mod (p− 1). Consequently,
∑e<d,
e 6=d mod (p−1)
behe(x) =∑e<d,
e 6=d mod (p−1)
beσe − A
(σe − A)he(x)
=∑e<d,
e 6=d mod (p−1)
beσe − A
(he(ζx)− Ahe(x)).
43
Combing this with (3.2) we conclude that
hd(x) :=|x|s
pk−
∑e<d,
e 6=d mod (p−1)
beσe − A
he(x),
satisfies
hd(ζx) = Ahd(x).
3.3 A Homogeneous Basis, the Multivariate Case
We are now ready to prove the main result of this section.
Theorem 3.0.8 (restated). There is a basis for Poly(Fn → T) consisting only of homoge-
neous multivariate polynomials.
Proof. We will show by induction on the degree d, that every degree d monomial can be
written as a linear combination of homogeneous polynomials. The base case of d < p
is trivial as such monomials are classical and thus homogeneous themselves. Consider a
(nonclassical) monomial M(x1, . . . , xn) =|x1|s1 ···|xn|sn
pkof degree d = s1 + · · · + sn + (p −
1)(k − 1). For every i ∈ [n] let gi(xi) := hsi+(p−1)(k−1)(xi) where hsi+(p−1)(k−1)(·) is the
homogeneous univariate polynomial from Lemma 3.2.1. Every gi takes values in 1pkZ/Z, and
thus corresponds to a polynomial Gi : F→ Zpk . Define F : Fn → Zpk as
F (x1, . . . , xn) := G1(x1) · · ·Gn(xn),
and f : Fn → T as
f(x1, . . . , xn) :=F (x1, . . . , xn)
pk.
It is simple to verify that deg(f) = s1 + . . . + sn + (p − 1)(k − 1) = d, it has only one
44
monomial of deg(f), which is|x1|s1 ···|xn|sn
pk= M(x1, . . . , xn), and it is homogeneous. Thus
M(x1, . . . , xn)− f(x1, . . . , xn) is of degree less than d and by the induction hypothesis can
be written as a linear combination of homogeneous polynomials.
45
CHAPTER 4
DISTRIBUTION OF POLYNOMIAL FACTORS OVER LINEAR
FORMS
A main contribution of this dissertation is a near-orthogonality result for (non-classical)
polynomial factors of high rank 1. Such a statement was proved in [43, 78] for systems of
linear forms corresponding to repeated derivatives or equivalently Gowers norms, and in [51]
for the case when the field is of high characteristic but with arbitrary system of linear forms
and in [15] for (non-classical) polynomials but only in the case of systems of affine linear
forms. In Theorem 2.2.23 we establish the near-orthogonality over any arbitrary system of
linear forms. Before stating this theorem we need to introduce the notion of consistency.
Definition 4.0.1 (Consistency). Let L = L1, . . . , Lm be a system of linear forms. A
vector (β1, . . . , βm) ∈ Tm is said to be (d, k)-consistent with L if there exists a homogeneous
polynomial P of degree d and depth k and a point X such that P (Li(X)) = βi for every
i ∈ [m]. Let Φd,k(L) denote the set of all such vectors.
It is immediate from the definition that Φd,k(L) ⊆ Umk+1 is a subgroup of Tm, or more
specifically, a subgroup of Umk+1. Let
Φd,k(L)⊥ :=
(λ1, . . . , λm) ∈ Zm : ∀(β1, . . . , βm) ∈ Φd,k(L),∑
λiβi = 0.
Equivalently Φd,k(L)⊥ is the set of all (λ1, . . . , λm) ∈ Zm such that∑mi=1 λiP (Li(X)) ≡ 0
for every homogeneous polynomial P of degree d and depth k.
Theorem 2.2.23 (restated). Let L1, . . . , Lm be linear forms on ` variables and let
B = (P1, . . . , PC) be an ε-uniform polynomial factor for some ε ∈ (0, 1] defined only by homo-
geneous polynomials. For every tuple Λ of integers (λi,j)i∈[C],j∈[m], define PΛ : (Fn)` → T
1. The work in this chapter is based on the work [50]
46
as
PΛ(X) =∑
i∈[C],j∈[m]
λi,jPi(Lj(X)).
Then one of the following two statements holds:
• PΛ ≡ 0.
• PΛ is non-constant and∣∣∣EX∈(Fn)` [e (PΛ)]
∣∣∣ < ε.
Furthermore PΛ ≡ 0 if and only if for every i ∈ [C], we have (λi,j)j∈[m] ∈ Φdi,ki(L)⊥
where di, ki are the degree and depth of Pi, respectively. We will present the proof of
Theorem 2.2.23 in Section 4.2.
Remark 4.0.2. By Corollary 2.2.14 the assumption of ε-uniformity in Theorem 2.2.23 is
satisfied for every factor B of rank at least r2.2.13(d, ε). However, we would like to point out
that in Theorem 2.2.23 by using the assumption of ε-uniformity instead of the assumption
of high rank, we are able to achieve the quantitative bound of ε on the bias of PΛ.
Remark 4.0.3. In Theorem 2.2.23 in the second case where PΛ is non-constant, it is possible
to deduce a more general statement that ‖e (PΛ)‖2tU t
< ε for every t 6 deg(PΛ). Indeed
assume that PΛ is non-constant, and consider the derivative
DY1. . . DYtPΛ(X) =
∑i∈[C],j∈[m]
λi,j∑S⊆[t]
(−1)|S|Pi(Lj(X +∑r∈S
Yr)), (4.1)
where Yi := (yi,1, . . . , yi,`) ∈ (Fn)`. Notice that for every choice of j ∈ [m] and S ⊆ [t],
Lj(X +∑r∈S Yr) is an application of a linear form on the vector
(x1, . . . , x`, y1,1, . . . , y1,`, . . . , yt,1, . . . , yt,`
)∈ (Fn)(t+1)`,
47
and since by Lemma 2.1.6 the polynomial ∂PΛ is nonzero, Theorem 2.2.23 implies
‖e (PΛ)‖2t
U t = Eh1,...,ht
[e (∂tPΛ(h1, . . . , ht))] 6 ε.
4.1 Near-equidistribution over Linear Forms
It is well-known that statements similar to that of Theorem 2.2.23 imply “near-
equidistributions” of the joint distribution of the polynomials applied to linear forms. Con-
sider a highly uniform polynomial factor of degree d > 0, defined by a tuple of homogeneous
polynomials P1, . . . , PC : Fn → T with respective degrees d1, . . . , dC and depths k1, . . . , kC ,
and let L = (L1, . . . , Lm) be a collection of linear forms on ` variables. As we mentioned
earlier, we are interested in the distribution of the random matrix
P1(L1(X)) P2(L1(X)) . . . PC(L1(X))
P1(L2(X)) P2(L2(X)) . . . PC(L2(X))
......
P1(Lm(X)) P2(Lm(X)) . . . PC(Lm(X))
, (4.2)
where X is the uniform random variable taking values in (Fn)`. Note that by the definition
of consistency, for every 1 6 i 6 C, the i-th column of this matrix must belong to Φdi,ki(L).
Theorem 4.1.1 below says that Equation (4.2) is “almost” uniformly distributed over the set
of all matrices satisfying this condition. The proof of Theorem 4.1.1 is standard and is iden-
tical to the proof of [15, Theorem 3.10] with the only difference that it uses Theorem 2.2.23
instead of the weaker near-orthogonality theorem of [15].
Theorem 4.1.1 (Near-equidistribution over Linear Forms). Given ε > 0, let B be an ε-
uniform polynomial factor of degree d > 0 and complexity C, that is defined by a tuple of
homogeneous polynomials P1, . . . , PC : Fn → T having respective degrees d1, . . . , dC and
48
depths k1, . . . , kC . Let L = (L1, . . . , Lm) be a collection of linear forms on ` variables.
Suppose (βi,j)i∈[C],j∈[m] ∈ TC×m is such that (βi,1, . . . , βi,m) ∈ Φdi,ki(L) for every
i ∈ [C]. Then
PrX∈(Fn)`
[Pi(Lj(X)) = βi,j ∀i ∈ [C], j ∈ [m]
]=
1
K± ε,
where K =∏Ci=1 |Φdi,ki(L)|.
Proof. We have
Pr[Pi(Lj(X)) = βi,j ∀i ∈ [C],∀j ∈ [m]]
= E
∏i,j
1
pki+1
pki+1−1∑λi,j=0
e(λi,j(Pi(Lj(X))− βi,j
))=
∏i∈[C]
p−(ki+1)
m ∑(λi,j)
e
−∑i,j
λi,jβi,j
E
e ∑i∈[C],j∈[m]
λi,jPi(Lj(X))
,where the outer sum is over (λi,j)i∈[C],j∈[m] with λi,j ∈ [0, pki+1−1]. Let Λi = Φdi,ki(L)⊥∩
[0, pk+1 − 1]m, and note that |Λi||Φdi,ki | = pm(ki+1). Since (βi,1, . . . , βi,m) ∈ Φdi,ki(L) for
every i ∈ [C], it follows that∑i,j λi,jβi,j = 0 if (λi,1, . . . , λi,m) ∈ Λi for all i ∈ [C]. If
the latter holds, then the expected value in the above expression is 0, and otherwise by
Theorem 2.2.23, it is bounded by ε. Hence the above expression can be approximated by
p−m∑Ci=1(ki+1) ·
C∏i=1
|Λi| ± εpm∑Ci=1(ki+1)
=1
K± ε.
49
4.2 Strong near-orthogonality: Proof of Theorem 2.2.23
We prove Theorem 2.2.23 in this section. Our proof uses similar derivative techniques as
used in [15], but in order to handle the general setting we will need a few technical claims
which we present first. Recall that |L| =∑`i=1 |λi| for a linear form L = (λ1, . . . , λ`).
Claim 4.2.1. Let d > 0 be an integer, and L = (λ1, . . . , λ`) ∈ F` be a linear form on
` variables. There exists linear forms Li = (λi,1, . . . , λi,`) ∈ F` for i = 1, . . . ,m, and
coefficients a1, . . . , am ∈ Z with m 6 |F|` such that
• P (L(X)) =∑mi=1 aiP (Li(X)) for every degree-d polynomial P : Fn → T;
• |Li| 6 d for every i ∈ [m];
• λi,j 6 λj for every i ∈ [m] and j ∈ [`].
Proof. The proof proceeds by simplifying P (L(X)) using identities that are valid for every
polynomial P : Fn → T of degree d.
In the case |L| 6 d there is nothing to prove. Assume otherwise that |L| > d. We will
use the fact that for every choice of y1, . . . , y|L| ∈ Fn,
∑S⊆[|L|]
(−1)|S|P
∑i∈S
yi
≡ 0. (4.3)
Let X = (x1, . . . , x`) ∈ (Fn)`. Setting |λi| of the vectors y1, . . . , y|L| to xi for every i ∈ [`],
Equation (4.3) implies
P (L(X)) =∑i
αiP (Mi(X)),
where Mi = (τi,1, . . . , τi,`), |Mi| 6 |L| − 1 and for every j ∈ [`], |τi,j | 6 |λj |. Repeatedly
applying the same process to every Mi with |Mi| > d we arrive at the desired expansion.
50
The next claim shows that we can further simplify the expression given in Claim 4.2.1.
Let Ld ⊆ F` denote the set of nonzero linear forms L with |L| 6 d and with the first
(left-most) nonzero coefficient equal to 1, e.g. (0, 1, 0, 2) ∈ L3 but (2, 1, 0, 0) 6∈ L3.
Claim 4.2.2. For any linear form L ∈ F` and integer d > 0, there is a collection of
coefficients aM,c ∈ ZM∈Ld,c∈F∗ such that for every degree-d polynomial P : Fn → T,
P (L(X)) =∑
M∈Ld,c∈F∗aM,cP (cM(X)). (4.4)
Proof. Similar to the proof of Claim 4.2.1 we simplify P (L(X)) using identities that are valid
for every polynomial P : Fn → T of degree d.
We use induction on the number of nonzero entries of L. The case when L has only one
nonzero entry is trivial. For the induction step, choose c ∈ F so that the leading nonzero
coefficient of L′ = c · L is equal to 1. Assume that L′ = (λ1, . . . , λ`). If |L′| 6 d we are
done. Assume otherwise that |L′| > d. Applying Claim 4.2.1 for the degree-d polynomial
R(x) := P (c−1x) and the linear form L′ we can write
P (L(X)) = P (c−1L′(X)) =∑i
βiP (c−1Mi(X)), (4.5)
where for every i, Mi = (λi,1, . . . , λi,`) satisfies |Mi| 6 d, and for every j ∈ [`], λi,j 6 λj .
Let I denote the set of indices i such that the leading nonzero entry of Mi is one. Then
P (L(x)) =∑i∈I
αiP (c−1Mi(X)) +∑j /∈I
αjP (c−1Mj(X)). (4.6)
Notice that since the leading coefficient of L′ is 1, for every j /∈ I, Mj has smaller support
than L and thus applying the induction hypothesis to the linear forms c−1Mj with j /∈ I
concludes the claim.
51
Claim 4.2.2 applies to all polynomials of degree d. If we also specify the depth then we
can obtain a stronger statement.
Claim 4.2.3. For every d, k, every system of linear forms L1, . . . , Lm, and constants
λi ∈ Zi∈[m], there exists aM ∈ ZM∈Ld such that the following is true for every (d, k)-
homogeneous polynomial P : Fn → T:
•∑mi=1 λiP (Li(X)) ≡
∑M∈Ld aMP (M(X));
• For every M with aM 6= 0, we have |M | 6 deg(aMP ).
Proof. The proof is similar to that of Claim 4.2.2, except that now we repeatedly apply
Claim 4.2.2 to every term of the form λP (L(X)) to express it as a linear combination of
P (cM(X)) for M ∈ Ldeg(λP ) and c ∈ F∗. Then we use homogeneity to replace P (cM(X))
with σcP (M(X)), where if c = ζi for the fixed generator ζ ∈ F∗ then σc = σ(d, k)i. By
repeating this procedure we arrive at the desired expansion.
We are now ready for the proof of our main theorem. For a linear form L = (λ1, . . . , λ`),
let lc(L) denote the index of its first nonzero entry, namely lc(L)def= mini:λi 6=0 i.
Proof of Theorem 2.2.23: Let d′ be the degree of the factor. For every i ∈ [C], by
Claim 4.2.3 we have
m∑j=1
λi,jPi(Lj(X)) =∑
M∈Ld′λ′i,MPi(M(X)) (4.7)
for some integers λ′i,M such that |M | 6 deg(λ′i,MPi) if λ′i,M 6= 0. The simplifications of
Claim 4.2.3 depend only on the degrees and depths of the polynomials. Hence if λ′i,M = 0
for all M ∈ Ld′ , then (λi,1, . . . , λi,m) ∈ Φ(di, ki)⊥. So to prove the theorem, it suffices to
show that PΛ has small bias if λ′i,MPi 6≡ 0 for some M ∈ Ld′ . Suppose this is true, and thus
52
there exists a nonempty set M⊆ Ld′ such that
PΛ(X) =∑
i∈[C],M∈Mλ′i,MPi(M(X)),
and for every M ∈ M, there is at least one index i ∈ [C] for which λ′i,MPi 6= 0. Choose
i∗ ∈ [C] and M∗ ∈M in the following manner.
• First, let M∗ ∈ M be such that lc(M∗) = minM∈M lc(M), and among these, |M∗| is
maximal.
• Then, let i∗ ∈ [C] be such that deg(λ′i∗,M∗Pi∗) is maximized.
Without loss of generality assume that i∗ = 1, lc(M∗) = 1, and let d := deg(λ′1,M∗P1).
We claim that if∑j∈[m] λ1,jP1(Lj(X)) is not the zero polynomial, then deg(PΛ) > d, and
moreover PΛ has small bias. We prove this by deriving PΛ in specific directions in a manner
that all the terms but λ′1,M∗P1(M∗(X)) vanish.
Given a vector α ∈ F`, an element y ∈ Fn, and a function P : (Fn)` → T, define the
derivative of P according to the pair (α, y) as
Dα,yP (x1, . . . , x`)def= P (x1 + α1y, · · · , x` + α`y)− P (x1, . . . , x`). (4.8)
Note that for every M ∈M,
Dα,y(Pi M)(x1, · · · , x`) = Pi(M(x1, . . . , x`) +M(α)y)− Pi(M(x1, . . . , x`))
= (D〈M,α〉·yPi)(M(x1, . . . , x`)).
Thus if α is chosen such that 〈M,α〉 = 0 then Dα,y(Pi M) ≡ 0.
Assume that M∗ = (w1, . . . , w`), where w1 = 1. Let t := |M∗|,
α1 := e1 = (1, 0, 0, . . . , 0) ∈ F`, and let α2, . . . , αt be the set of all vectors of the form
53
(−w, 0, . . . , 0, 1, 0, . . . , 0) where 1 is in the i-th coordinate for i ∈ [2, `] and 0 6 w 6 wi − 1.
In addition, pick αt+1 = · · · = αd = e1.
Claim 4.2.4.
Dα1,y1 · · ·Dαd,ydPΛ(X) =
(D〈M∗,α1〉y1
· · ·D〈M∗,αd〉yd∑i∈[C]:
deg(λ′i,M∗Pi)=d
λ′i,1Pi
)(M∗(X)).
(4.9)
Proof. Deriving according to (α1, y1), . . . , (αt, yt) gives
Dα1,y1 · · ·Dαt,ytPΛ(X) =
D〈M∗,α1〉y1· · ·D〈M∗,αt〉yt
C∑i=1
λ′i,M∗Pi
(M∗(X)). (4.10)
This is because for every M = (w′1, . . . , w′`) ∈ M \ M
∗, either w′1 = 0 in which case
〈M,α1〉 = 0, or otherwise w1 = 1 and |M | 6 t, thus there must be an index ξ ∈ [`]
such that w′ξ < wξ; By our choice of α2, . . . , αt there is e, 2 6 e 6 t, such that αe =
(−w′ξ, 0, . . . , 0, 1, 0, . . . , 0) where 1 is in the ξ-th coordinate and thus 〈Mj , αe〉 = 0. Now, the
claim follows after additionally deriving according to (αt+1, yt+1), . . . , (αd, yd).
Claim 4.2.4 implies that
Ey1,...,yd,x1,...,x`
[e(Dα1,y1 · · ·Dαd,ydPΛ)(x1, . . . , x`)
)]=
∥∥∥∥∥∥∥∥∥∥e
∑i∈[C]:
deg(λ′i,1Pi)=d
λ′i,M∗Pi
∥∥∥∥∥∥∥∥∥∥
2d
Ud
6 ε2d ,
where the last inequality holds by the ε-uniformity of the polynomial factor. Now the theorem
follows from the next claim from [15] which is a repeated application of the Cauchy-Schwarz
inequality. We include a proof for self-containment.
54
Claim 4.2.5 ([15, Claim 3.4]). For any α1, . . . , αd ∈ F`\0,
Ey1,...,yd,x1,...,x`
[e((Dα1,y1 · · ·Dαd,ydPΛ)(x1, . . . , x`)
)]>
(∣∣∣∣ Ex1,...,x`
e (PΛ(x1, . . . , x`))
∣∣∣∣)2d
.
Proof. It suffices to show that for any function P (x1, . . . , x`) and nonzero α ∈ F`,
∣∣∣∣ Ey,x1,...,x`
[e((Dα,yP )(x1, . . . , x`)
)]
∣∣∣∣ > ∣∣∣∣ Ex1,...,x`
[e (P (x1, . . . , x`))]
∣∣∣∣2 .Recall that (Dα,yP )(x1, . . . , x`) = P (x1 + α1y, . . . , x` + α`y)− P (x1, . . . , x`). Without loss
of generality, suppose α1 6= 0. We make a change of coordinates so that α can be assumed
to be (1, 0, . . . , 0). More precisely, define P ′ : (Fn)` → T as
P ′(x1, . . . , x`) = P
(x1,
x2 + α2x1
α1,x3 + α3x1
α1, . . . ,
x` + α`x1
α1
),
so that P (x1, . . . , x`) = P ′(x1, α1x2 − α2x1, α1x3 − α3x1, . . . , α1x` − α`x1), and thus
(Dα,yP )(x1, . . . , x`)
= P ′(x1 + α1y, α1x2 − α2x1, . . . , α1x` − α`x1)− P ′(x1, α1x2 − α2x1, . . . , α1x` − α`x1).
Therefore
∣∣∣∣ Ey,x1,...,x`
[e((Dα,yP )(x1, . . . , x`)
)]
∣∣∣∣=
∣∣∣∣ Ey,x1,...,x`
[e(P ′(x1 + α1y, x2, . . . , x`)− P ′(x1, x2, . . . , x`)
)]
∣∣∣∣= Ex2,...,x`
∣∣∣∣Ex1[e(P ′(x1, x2, . . . , x`)
)]
∣∣∣∣2 > ∣∣∣∣ Ex1,x2,...,x`
[e(P ′(x1, x2, . . . , x`)
)]
∣∣∣∣2=
∣∣∣∣ Ex1,x2,...,x`
[e (P (x1, x2, . . . , x`))]
∣∣∣∣2 .55
The above proof also implies the following proposition just by omitting the application
of Claim 4.2.3.
Proposition 4.2.6. Let L1, . . . , Lm be linear forms on ` variables and let B = (P1, . . . , PC)
be an ε-uniform polynomial factor of degree d > 0 for some ε ∈ (0, 1] which is defined by
only homogeneous polynomials. For every tuple Λ of integers (λi,j)i∈[C],j∈[m], define
PΛ(X) =∑
i∈[C],j∈[m]
λi,jPi(Lj(X)),
where PΛ : (Fn)` → T. Moreover assume that for every i ∈ [C], j ∈ [m], Lj ∈ Ldeg(λi,jPi).
Then, PΛ is of degree d = maxi,j deg(λi,jPi) and ‖e (PΛ)‖Ud < ε.
56
CHAPTER 5
TRUE COMPLEXITY, ON A THEOREM OF GOWERS AND
WOLF
The work in this chapter is based on a previous joint work [50]. As discussed in Section 2.4.2,
Cauchy-Schwarz complexity is an upper-bound on the true complexity of a system of linear
forms. However, there are cases where this is not tight. Gowers and Wolf conjectured that
the true complexity of a system of linear forms can be characterized by a simple linear
algebraic condition. Namely, that it is equal to the smallest d > 1 such that Ld+11 , . . . , Ld+1
m
are linearly independent where the d-th tensor power of a linear form L is defined as
Ld =
d∏j=1
λij : i1, . . . , id ∈ [k]
∈ Fkd.
Later in [38, Theorem 6.1] they verified their conjecture in the case where |F| is sufficiently
large; more precisely when |F| is at least the Cauchy-Schwarz complexity of the system of
linear form. In this chapter we verify the Gowers-Wolf conjecture in full generality by proving
the following stronger theorem.
Theorem 5.0.7. Let L = L1, . . . , Lm be a system of linear forms. Assume that Ld+11 is
not in the linear span of Ld+12 , . . . , Ld+1
m . For every ε > 0, there exists δ > 0 such that for
any collection of functions f1, . . . , fm : Fn → D with ‖f1‖Ud+1 6 δ, we have
∣∣∣∣∣ EX∈(Fn)k
[m∏i=1
fi(Li(X))
]∣∣∣∣∣ 6 ε.
Theorem 5.0.7 was conjectured in [38], and left open even in the case of large |F|. In [51],
a partial near-orthogonality result is proved and used to prove Theorem 5.0.7 in the case
where |F| is greater or equal to the Cauchy-Schwarz complexity of the system of linear form.
Theorem 5.0.7 establishes this in its full generality using our near-orthogonality result for
57
regular non-classical polynomials. The following corollary to Theorem 5.0.7 is very useful
when combined with the decomposition theorems such as Theorem 2.3.1.
Corollary 5.0.8. Let L = L1, . . . , Lm be a system of linear forms. Assume that
Ld+11 , . . . , Ld+1
m are linearly independent. For every ε > 0, there exists δ > 0 such that for
any functions f1, . . . , fm, g1, . . . , gm : Fn → D with ‖fi − gi‖Ud+1 6 δ, we have
∣∣∣∣∣EX[m∏i=1
fi(Li(X))
]− EX
[m∏i=1
gi(Li(X))
]∣∣∣∣∣ 6 ε,
Proof. Choosing δ = δ(ε′) as in Theorem 5.0.7 for ε′ := ε/m, we have
∣∣∣∣∣EX[m∏i=1
fi(Li(X))
]− EX
[m∏i=1
gi(Li(X))
]∣∣∣∣∣=
∣∣∣∣∣∣m∑i=1
EX
(fi − gi)(Li(X)) ·i−1∏j=1
gj(Lj(X)) ·m∏
j=i+1
fj(Lj(X))
∣∣∣∣∣∣6
m∑i=1
∣∣∣∣∣∣EX(fi − gi)(Li(X)) ·
i−1∏j=1
gj(Lj(X)) ·m∏
j=i+1
fj(Lj(X))
∣∣∣∣∣∣ 6 m · δ 6 ε,
where the second inequality follows from Theorem 5.0.7 since ‖fi − gi‖Ud+1 6 δ and Ld+1i
is not in the linear span of Ld+1j j∈[m]\i.
5.1 The Gowers-Wolf Conjecture: Proof of Theorem 5.0.7
Theorem 5.0.7 (restated). Let L = L1, . . . , Lm be a system of linear forms. Assume
that Ld+11 is not in the linear span of Ld+1
2 , . . . , Ld+1m . For every ε > 0, there exists δ > 0
such that for any collection of functions f1, . . . , fm : Fn → D with ‖f1‖Ud+1 6 δ, we have
∣∣∣∣∣ EX∈(Fn)k
[m∏i=1
fi(Li(X))
]∣∣∣∣∣ 6 ε. (5.1)
58
We prove Theorem 5.0.7 in this section. Note that since Ld+11 is not in the linear span
of Ld+12 , . . . , Ld+1
m we have that L1 is linearly independent from each Lj for j > 1. We
claim that we may assume without loss of generality that L2, . . . , Lm are pairwise linearly
independent as well, namely that L1, . . . , Lm has bounded Cauchy-Schwarz complexity.
Assume that there are j, ` ∈ [m]\1 and a nonzero c ∈ F such that L` = cLj . Then we
may define a new function f ′j(x) = fj(x)f`(cx) so that f ′j(Lj(X)) = fj(Lj(X))f`(L`(X))
and remove the linear form L` and functions fj , f` from the system. Now
EX∈(Fn)k
[m∏i=1
fi(Li(X))
]= EX∈(Fn)k
f ′j(Lj(X)) ·
∏i∈[m]\j,`
fi(Li(X))
,and thus it suffices to bound the right hand side of the above identity. We may repeatedly
apply the above procedure in order to achieve a new system of pairwise linearly independent
linear forms along with their corresponding functions while keeping L1 and f1 untouched.
Thus we may assume that L1, . . . , Lm is of finite Cauchy-Schwarz complexity s for
some s 6 m < ∞. The case when d > s follows from Lemma 2.4.3, thus we will consider
the case when d < s. We will use Corollary 3.0.9 to write fi = gi + hi with
1. gi = E[fi|B], where B is an r-regular polynomial factor of degree at most s and
complexity C 6 Cmax(p, s, η, δ, r(·)) defined by only homogeneous polynomials, where
r is a sufficiently fast growing growth function (to be determined later);
2. ‖hi‖Us+1 6 η.
We first show that by choosing a sufficiently small η we may replace fi’s in Equation (5.1)
with gi’s.
59
Claim 5.1.1. Choosing η 6 ε2m we have
∣∣∣∣∣ EX∈(Fn)k
[m∏i=1
fi(Li(X))
]− EX∈(Fn)k
[m∏i=1
gi(Li(X))
]∣∣∣∣∣ 6 ε
2.
Proof. We have
∣∣∣∣∣EX[m∏i=1
fi(Li(X))
]− EX
[m∏i=1
gi(Li(X))
]∣∣∣∣∣=
∣∣∣∣∣∣m∑i=1
EX
hi(Li(X)) ·i−1∏j=1
gj(Lj(X)) ·m∏
j=i+1
fj(Lj(X))
∣∣∣∣∣∣6
m∑i=1
∣∣∣∣∣∣EXhi(Li(X)) ·
i−1∏j=1
gj(Lj(X)) ·m∏
j=i+1
fj(Lj(X))
∣∣∣∣∣∣6
m∑i=1
‖hi‖Us+1 6 m · η 6 ε
2,
where the second inequality follows from Lemma 2.4.3 since the Cauchy-Schwarz complexity
of L is s.
Thus it is sufficient to bound∣∣∣EX∈(Fn)k
[∏mi=1 gi(Li(X))
]∣∣∣ by ε/2. For each i, gi =
E[fi|B] and thus
gi(x) = Γi(P1(x), . . . , PC(x)),
where P1, . . . , PC are the (nonclassical) homogeneous polynomials of degree 6 s defining B
and Γi : TC → D is a function. Let ki denote the depth of the polynomial Pi so that by
Lemma 2.1.3, each Pi takes values in Uki+1 = 1pki+1Z/Z. Moreover let Σ := Zpk1+1 × · · · ×
ZpkC+1 . Using the Fourier expansion of Γi we have
gi(x) =∑
Λ=(λ1,...,λC)∈Σ
Γi(Λ) · e
C∑j=1
λjPj(x)
, (5.2)
60
where Γi(Λ) is the Fourier coefficient of Γi corresponding to Λ. Let PΛ :=∑Cj=1 λjPj(x) for
the sake of brevity so that we may write
EX
[m∏i=1
gi(Li(X))
]=
∑Λ1,...,Λm∈Σ
(m∏i=1
Γi(Λi)
)· EX
[e
(m∑i=1
PΛi(Li(X))
)]. (5.3)
We will show that for a sufficiently fast growing choice of the regularity function r(·) we may
bound each term in Equation (5.3) by σ := ε2|Σ|m , thus concluding the proof by the triangle
inequality. We will first show that the terms for which deg(PΛ1) 6 d can be made small.
Claim 5.1.2. Let Λ ∈ Σ be such that deg(PΛ) 6 d. For a sufficiently fast growing choice of
r(·) and choice of δ 6 σ2 , ∣∣∣Γ1(Λ)
∣∣∣ < σ.
Proof. It follows from Equation (5.2) that
Γ1(Λ) = Ex
[g1(x)e (−PΛ(x))]−∑
Λ′∈Σ\ΛΓ1(Λ′) · E
x[e (PΛ′(x)− PΛ(x))] .
Note that ‖f1‖Ud+1 6 δ and thus
∣∣∣Ex
[g1(x)e (−PΛ(x))]∣∣∣ =
∣∣∣Ex
[f1(x)e (−PΛ(x))]∣∣∣
6 ‖f1e (−PΛ)‖Ud+1 = ‖f1‖Ud+1 6 δ 6 σ/2,
where we used the fact that g1 = E[f1|B] and the fact that Gowers norms are increasing in
d. Finally the terms of the form Γ1(Λ′) · Ex[e (PΛ′(x)− PΛ(x))] with Γ′ 6= Γ can be made
arbitrarily small by choosing a sufficiently fast growing r(·) due to Remark 2.2.17, since
PΛ′ −PΛ = PΛ′−Λ is a nonzero linear combination of the polynomials defining the r-regular
factor B.
The above claim allows us to bound the terms from Equation (5.3) corresponding to
61
tuples (Λ1, . . . ,Λm) ∈ Σm with deg(PΛ1) 6 d. This is because for such terms |Γ1(Λ1)| < σ
by the above claim, and |Γi(Λi)| 6 1 since fi’s take values in D. It remains to bound the
terms for which deg(PΛ1) > d. We will need the following claim.
Lemma 5.1.3. Assume that Ld+11 is not in the linear span of Ld+1
2 , . . . , Ld+1m , and let
(Λ1, . . . ,Λm) ∈ Σm be such that deg(PΛ1) > d+ 1. Then
m∑i=1
PΛi(Li(X)) 6≡ 0.
This combined with Theorem 2.2.23 implies that for a sufficiently fast growing choice of
r(·), EX[e(∑m
i=1 PΛi(Li(X)))]< σ which completes the proof of Theorem 5.0.7. Thus, we
are left with proving Lemma 5.1.3.
Proof of Lemma 5.1.3: Assume to the contrary that∑mi=1 PΛi(Li(X)) ≡ 0. Denoting
the coordinates of Λi by (λi,1, . . . , λi,C) ∈ ΣC we have
m∑i=1
PΛi(Li(X)) =∑
i∈[m],j∈[C]
λi,jPj(Li(X)) ≡ 0.
Since the polynomial factor defined by P1, . . . , PC is r-regular with a sufficiently fast growing
growth function r(·), Theorem 2.2.23 implies that for every j ∈ [C] we must have that
m∑i=1
λi,jPj(Li(X)) ≡ 0. (5.4)
Since deg(PΛ1) > d, there must exist j ∈ [C] such that λ1,j 6= 0 and deg(λ1,jPj) > d.
Let j∗ ∈ [C] be such that deg(λ1,j∗Pj∗) is maximized, and let d∗ := deg(Pj∗) (note that
deg(λ1,j∗Pj∗) 6 d∗). We will first prove that replacing Pj∗ with a classical homogeneous
polynomial Q of the same degree, Equation (5.4) for j = j∗ would still hold.
Claim 5.1.4. Let Q : Fn → T be a classical homogeneous polynomial with deg(Q) = d∗.
62
Thenm∑i=1
λi,j∗Q(Li(X)) ≡ 0.
Proof. Assume to the contrary that∑mi=1 λi,j∗Q(Li(X)) 6≡ 0. By Claim 4.2.2 for degree d∗
and linear forms L1, . . . , Lm, we can find a set of coefficients ai,t ∈ Zi∈[`],t∈[m′], ci,t ∈
F\0i∈[`],t∈[m′] and linear forms Mtt∈[m′] with |Mt| 6 d∗ such that the first nonzero
entry of every Mt is equal to 1, such that
m∑i=1
λi,j∗Q(Li(X)) =∑
i∈[`],t∈[m′]
ai,tλi,j∗Q(ci,tMt(X)) =m′∑t=1
αtQ(Mt(X)) 6≡ 0, (5.5)
where αt =∑`i=1 ai,tλi,j∗|ci,t|d
∗. Here the second equality follows from Q being classical
and homogeneous. Furthermore,
m∑i=1
λi,j∗Pj∗(Li(X)) =∑
i∈[`],t∈[m′]
ai,tλi,j∗Pj∗(ci,tMt(X)) =m′∑t=1
βtPj∗(Mt(X)), (5.6)
where βt =∑`i=1 ai,tλi,j∗σi,t, σi,ti∈[`],t∈[m′] are integers whose existence follows from the
homogeneity of Pj∗ . Moreover, by Remark 3.0.6 we know that σi,t ≡ |ci,t|d∗
mod p, and
hence
αt ≡ βt mod p, ∀t ∈ [m′]. (5.7)
Now notice that Equation (5.5) implies that there exists some t ∈ [m′] for which αtQ 6= 0,
which, since Q is classical, is equivalent to αt 6≡ 0 mod p. Hence also βt 6≡ 0 mod p. Let
T = t ∈ [m′] : βt 6≡ 0 mod p, which we just verified is nonempty. Then for t ∈ T ,
deg(βtPj∗) = deg(Pj∗) = d∗; and for t ∈ [m] \ T , deg(βtPj∗) 6 d∗ − (p− 1) < d∗.
We can now decompose
m∑i=1
λi,j∗Pj∗(Li(X)) =∑t∈T
βtPj∗(Mt(X)) +∑
t∈[m]\TβtPj∗(Mt(X)). (5.8)
63
By Proposition 4.2.6, the first sum in Equation (5.8) is a nonzero polynomial of degree d∗,
and by our previous argument, the sum for t 6∈ T is a polynomial of degree less than d∗.
Hence, we get that∑mi=1 λi,j∗Pj∗(Li(X)) is a nonzero polynomial of degree d∗, which is a
contradiction to our assumption.
We have proved that for every choice of a degree-d∗ classical homogeneous polynomial Q,∑mi=1 λi,j∗Q(Li(X)) ≡ 0. Now choosing the polynomial Q(x(1), . . . , x(n)) = x(1) · · ·x(d∗)
and looking at the coefficients of the monomials of degree d∗, we have
m∑i=1
λi,j∗Ld∗i ≡ 0.
Recalling that λ1,j∗ 6= 0 and that d∗ > d, this means that Ld+11 can be written as a linear
combination of Ld+12 , . . . , Ld+1
m , a contradiction.
64
CHAPTER 6
TESTABLE AFFINE-INVARIANT PROPERTIES
The results in this chapter are based on previous work [15] 1. The field of property testing,
as initiated by [20, 8] and defined formally by [66, 29], is the study of algorithms that query
their input a very small number of times and with high probability decide correctly whether
their input satisfies a given property or is “far” from satisfying that property. A property is
called testable, or sometimes strongly testable or locally testable, if the number of queries can
be made independent of the size of the object without affecting the correctness probability.
Perhaps surprisingly, it has been found that a large number of natural properties satisfy this
strong requirement; see e.g. the surveys [24, 65, 64, 70] for a general overview.
Let [R] denote the set 1, . . . , R. Given a property P of functions in Fn → [R] | n ∈
Z>0, we say that f : Fn → [R] is ε-far from P if
ming∈P
Prx∈Fn
[f(x) 6= g(x)] > ε,
and we say that it is ε-close otherwise.
Definition 6.0.5 (Testability). A property P is said to be testable (with one-sided error)
if there are functions q : (0, 1) → Z>0, δ : (0, 1) → (0, 1), and an algorithm T that, given
as input a parameter ε > 0 and oracle access to a function f : Fn → [R], makes at most
q(ε) queries to the oracle for f , always accepts if f ∈ P and rejects with probability at least
δ(ε) if f is ε-far from P. If, furthermore, q is a constant function, then P is said to be
proximity-obliviously testable (PO testable).
The term proximity-oblivious testing is coined by Goldreich and Ron in [32]. As an
example of a testable (in fact, PO testable) property, let us recall the famous result by
1. This work is based on an earlier work: Every locally characterized affine-invariant property is testable.In Proceedings of the forty-fifth annual ACM symposium on Theory of computing (STOC 13), pp. 429–436.c© ACM, 2013. DOI=10.1145/2488608.2488662
65
Blum, Luby and Rubinfeld [20] which initiated this line of research. They showed that
linearity of a function f : Fn → F is testable by a test which makes 3 queries. This test
accepts if f is linear and rejects with probability Ω(ε) if f is ε-far from linear.
6.1 Background
6.1.1 Algebraic Property Testing, Affine Invariance
We say that a property P ⊆ Fn → [R] is linear-invariant if it is the case that for any
f ∈ P and for any linear transformation L : Fn → Fn, it holds that f L ∈ P . Similarly,
an affine-invariant property is closed under composition with affine transformations A :
Fn → Fn (an affine transformation A is of the form L + c where L is linear and c is a
constant). The property of a function f : Fn → F being affine is testable by a simple
reduction to [20], and is itself affine-invariant. Other well-studied examples of affine-invariant
(and hence, linear-invariant) properties include Reed-Muller codes (in other words, bounded
degree polynomials) [8, 9, 23, 66, 4] and Fourier sparsity [34]. In fact, affine invariance seems
to be a common feature of most interesting properties that one would classify as “algebraic”.
Kaufman and Sudan in [57] made explicit note of this phenomenon and initiated a general
study of the testability of affine-invariant properties (see also [30]). In particular, they asked
for necessary and sufficient conditions for the testability of affine-invariant properties.
6.1.2 Localy Characterized Properties
“Local characterization” is a necessary condition for PO testability. For a PO testable
property P , if a function f does not satisfy P , then by Definition 6.0.5, the tester rejects f
with positive probability. Since the test always accepts functions with the property, there
must be q points x1, . . . , xq ∈ Fn that form a witness for non-membership in P . These are
the queries that cause the tester to reject. Thus, denoting σ = (f(x1), . . . , f(xq)) ∈ [R]q, we
66
say that C = (x1, x2, . . . , xq;σ) forms a q-local constraint for P . This means that whenever
the constraint is violated by a function g, i.e., (g(x1), . . . , g(xq)) = σ, we know that g is not
in P . A property P is q-locally characterized if there exists a collection of q-local constraints
C1, . . . , Cm such that g ∈ P if and only if none of the constraints C1, . . . , Cm are violated.
It follows from the above discussion that if P is PO testable with q queries, then P is q-
locally characterized. We say P is locally characterized if it is q-locally characterized for
some constant q.
We now give some examples of locally characterized affine-invariant properties. Consider
the property of being affine. It is 4-locally characterized because a function f is affine if and
only if f(x) − f(x + y) − f(x + z) + f(x + y + z) = 0 for every x, y, z ∈ Fn. Note that
this characterization automatically suggests a 4-query test: pick random x, y, z ∈ Fn and
check whether the identity holds or not for that choice of x, y, z. More generally, consider
the property of being a polynomial of degree at most d, for some fixed integer d > 0. The
property is known to be PO testable due to independent work of [56, 53], and their test
is based upon a pdd+1p−1e-local characterization. Again, the test is simply to pick a random
constraint and check if it is violated.
Indeed, for any q-locally characterized property P defined by constraints C1, . . . , Cm, one
can design the following q-query test: choose a constraint Ci uniformly at random and reject
only if the input function violates Ci. Clearly, if the input function f is in P , the test always
accepts. The question is the probability with which a function ε-far from P is rejected. We
show that for affine-invariant properties, this test always rejects with probability bounded
away from zero for every constant ε > 0.
Theorem 6.1.1. Every q-locally characterized affine-invariant property is proximity oblivi-
ously testable with q queries.
67
6.2 Locality for Affine-Invariant Properties
In the context of affine-invariant properties, we can define the notion of local characterization
in a more algebraic way than we did in the introduction. Recall that a hyperplane is an affine
subspace of codimension 1.
Definition 6.2.1 (Locally characterized properties). An affine-invariant property P ⊂
Fn → [R] : n > 0 is said to be locally characterized if both of the following hold:
• For every function f : Fn → [R] in P and every hyperplane A 6 Fn, f |A ∈ P.
• There exists a constant K > 1 such that if f : Fn → [R] does not belong to P and
n > K, then there exists a hyperplane B 6 Fn such that f |B 6∈ P.
The constant K is said to be the locality of P.
The following observation shows that an affine-invariant property is locally characterized
if and only if it can be described using a bounded number of induced affine constraints from
the previous section, and hence, is locally characterized in the sense of the introduction.
Lemma 6.2.2. If P ⊆ Fn → [R] : n > 0 is a locally characterized affine-invariant
property with locality K, then P is equivalent to A-freeness, where A is a finite collection of
induced affine constraints, with each constraint of size pK on K + 1 variables. On the other
hand, if P is equivalent to A-freeness, where A is a collection of induced affine constraints
with each constraint on 6 K + 1 variables, then P has locality at most K.
Finally, we also make formal note of the observation in the introduction that if a property
is testable, then it must be locally characterized.
Remark 6.2.3. If K is a fixed integer and P ⊂ Fn → [R] is an affine-invariant property
that is testable with K queries, then P is a locally characterized property with locality K.
So, we can view our main result as a converse statement.
68
6.3 Subspace Hereditary Properties
Just as a necessary condition for PO testability is local characterization, one can formulate
a natural condition that is (almost) necessary for testability in general. In the context of
affine-invariant properties, the condition can be succinctly stated as follows:
Definition 6.3.1 (Subspace hereditary properties). An affine-invariant property P is said
to be (affine) subspace hereditary if for any f : Fn → [R] satisfying P, the restriction of f
to any affine subspace of Fn also satisfies P.
In [17], it is shown that every affine-invariant property testable by a “natural” tester is
very “close” to a subspace hereditary property2. Thus, if we gloss over some technicalities,
subspace hereditariness is a necessary condition for testability. In the opposite direction,
[17] conjectures the following:
Conjecture 6.3.2 ([17]). Every subspace hereditary property is testable.
Resolving Conjecture 6.3.2 would yield a combinatorial characterization of the (natu-
ral) one-sided testable affine-invariant properties, similar to the characterization for testable
dense graph properties [6]. Although we are not yet able to confirm or refute the full Con-
jecture 6.3.2, we can show testability if we make an additional assumption of “bounded
complexity”, defined formally in Section 6.4.2.
Theorem 6.3.3 (Informal). Every subspace hereditary property of “bounded complexity” is
testable.
We will formally define the notion of complexity later on in Section 6.4.2, but for now, it
suffices to know that it is an integer that we will associate with each property (independent of
2. We omit the technical definitions of “natural” and “close”, since they are unimportant here. Informally,the behavior of a “natural” tester is independent of the size of the domain and “close” means that the propertydeviates from an actual affine subspace hereditary property on functions over a finite domain. See [17] fordetails, or [6] for the analogous definitions in a graph-theoretic context.
69
n). Also, q-locally characterized properties are of complexity at most q. All natural affine-
invariant properties that we know of have bounded complexity (in fact, most are locally
characterized). So, the subspace hereditary properties not covered by Theorem 6.3.3 seem
to be mainly of theoretical interest.
6.4 Main Result
In this section, we describe our main result, Theorem 6.3.3, rigorously. Theorem 6.1.1 follows
as a corollary. We first need to set up some notions. Just as a locally characterized property
can be described by a list of constraints, subspace hereditary properties can also be described
similarly, but here, the size of the list can be infinite. For affine-invariant properties, we can
represent the constraints in a very special form, as “induced affine constraints”. We first
describe these, then define the notion of complexity, and finally state the theorem.
6.4.1 Affine constraints
A linear form L = (w1, w2, . . . , wk) is said to be affine if w1 = 1. In this section, linear forms
will always be assumed to be affine.
We specify a partial order among affine forms. We say (w1, . . . , wk) (w′1, . . . , w′k)
if |wi| 6 |w′i| for all i ∈ [k], where | · | is the obvious map from F to 0, 1, . . . , p− 1. An
affine constraint is a collection of affine forms, with the added technical restriction of being
downward-closed with respect to . For future references we state this as the following
definition.
Definition 6.4.1 (Affine constraints). An affine constraint of size m on k variables is a
tuple A = (L1, . . . , Lm) of m affine forms L1, . . . , Lm over F on k variables, where:
(i) L1(x1, . . . , xk) = x1;
(ii) If L belongs to A and L′ L, then L′ also belongs to A.
70
Any subspace hereditary property can be described using affine constraints and forbidden
patterns, in the following way.
Definition 6.4.2 (Properties defined by induced affine constraints).
• An induced affine constraint of size m on ` variables is a pair (A, σ) where A is an
affine constraint of size m on ` variables and σ ∈ [R]m.
• Given such an induced affine constraint (A, σ), a function f : Fn → [R] is said to be
(A, σ)-free if there exist no x1, . . . , x` ∈ Fn such that
(f(L1(x1, . . . , x`)), . . . , f(Lm(x1, . . . , x`))) = σ.
On the other hand, if such x1, . . . , x` exist, we say that f induces (A, σ) at x1, . . . , x`.
• Given a (possibly infinite) collection A = (A1, σ1), (A2, σ2), . . . , (Ai, σi), . . . of in-
duced affine constraints, a function f : Fn → [R] is said to be A-free if it is (Ai, σi)-free
for every i > 1.
As an example consider the property of having degree at most 1 as a polynomial, for
function F : Fn → F. It is easy to see that F satisfies this property if and only if F (x1) −
F (x1 + x2) − F (x1 + x3) + F (x1 + x2 + x3) = 0 for all x1, x2, x3 ∈ F. Consequently the
property can be defined by the set of induced affine constraints that forbid any values for
F (x1), F (x1 + x2), F (x1 + x3), F (x1 + x2 + x3) that do not satisfy the identity F (x1) −
F (x1 + x2)− F (x1 + x3) + F (x1 + x2 + x3) = 0.
The connection between affine subspace hereditariness and affine constraints is given by
the following simple observation.
Observation 6.4.3. An affine-invariant property P is subspace hereditary if and only if
it is equivalent to the property of A-freeness for some fixed collection A of induced affine
constraints.
71
Proof. Given an affine invariant property P , a simple (though inefficient) way to obtain the
set A is to let it be the following: For every n and a function f : Fn → [R] that is not in P ,
we include in A the constraint (Af , σf ), where Af is indexed by members of Fn and contains
Lz(X1, . . . , Xn+1) = X1 +∑ni=1 ziXi+1 : z = (z1, . . . , zn) ∈ Fn, and σf is just set to f .
Setting X1 = 0 and Xi+1 to the ith standard vector ei for every i ∈ [n] shows that f is
not (Af , σf )-free. Hence the property defined by A is contained in P . The containment in
the other direction follows from P being affine-invariant and hereditary.
The other direction of the observation is trivial.
6.4.2 Complexity of Induced Affine Constraints
Recall from Definition 2.4.2, that if L = L1, . . . , Lm ∈ Fk contains two linear forms that
are multiples of each other (that is Li = λLj for i 6= j and λ ∈ F), then the Cauchy-Schwarz
complexity of L is infinity. Otherwise, its Cauchy-Schwarz complexity is at most |L| − 2.
Note, also that sets of affine linear forms are always of finite complexity.
Definition 6.4.4. Given a collection A =
(A1, σ1), (A2, σ2), . . . , (Ai, σi)
of induced affine
constraints, we say that A is of complexity 6 d if for each i, the collection of affine forms
Ai is of Cauchy-Schwarz complexity 6 d according to Definition 2.4.2.
6.4.3 Statement of the main result
Theorem 6.4.5 (Main theorem). For any integer d > 0 and (possibly infinite) fixed col-
lection A = (A1, σ1), (A2, σ2), . . . , (Ai, σi), . . . of induced affine constraints, each of
complexity 6 d, there are functions qA : (0, 1) → Z+, δA : (0, 1) → (0, 1) and a tester T
which, for every ε > 0, makes qA(ε) queries, accepts A-free functions and rejects functions
ε-far from A-free with probability at least δA(ε). Moreover, qA is a constant if A is of finite
size.
72
We do not have any explicit bounds on δA because the analysis depends on previous
work based on ergodic theory. It would of course be interesting to have explicit bounds for
some of the properties described in Section 6.1.2.
Let us finally note that Theorem 6.4.5 is quite nontrivial even if A consists only of a
single induced affine constraint of complexity greater than 1. Indeed, previously it was not
known how to show testability in this case. A more detailed account of previous work is
given in Section 6.4.4.
6.4.4 Comparison with Previous Work
Our testing result is part of, and culminates a sequence of works investigating the relationship
between affine-invariance and testability. As described, Kaufman and Sudan [57] initiated the
program. Subsequently, Bhattacharyya, Chen, Sudan, and Xie [14] investigated monotone
linear-invariant properties of functions f : Fn2 → 0, 1, where a property P is monotone if
it satisfies the condition that for any function g ∈ P , modifying g by changing some outputs
from 1 to 0 does not make it violate P . Kral, Serra and Vena [60] and, independently, Shapira
[69] showed testability for any monotone linear-invariant property characterized by a finite
number of linear constraints (of arbitrary complexity). For general non-monotone properties,
Bhattacharyya, Grigorescu, and Shapira proved in [17] that affine-invariant properties of
functions in Fn2 → 0, 1 are testable if the complexity of the property is 1. Earlier
this year, Bhattacharrya, Fischer and Lovett in [16] generalized [17] to show that affine-
invariant properties of complexity < p are testable. In this paper, we only have to restrict
the complexity to be bounded, but the bound can be independent of p.
In terms of techniques, the general framework of the proof for testability here is very
much the same as in [17] or [16]. However, the main difference here is that we work with
collections of non-classical polynomials, rather than classical ones. Because the degrees of
non-classical polynomials can change when multiplied by constants, the notions of rank and
73
regularity are much more subtle. We need to establish a new version of a “polynomial
regularity lemma” which allows us to decompose a given polynomial collection into a high
rank collection of non-classical polynomials. Also, as discussed earlier, we establish a new
equidistribution theorem for non-classical polynomials. We expect that these results will be
of independent interest.
6.5 Locality
In the context of affine-invariant properties, we can define the notion of local characterization
in a more algebraic way than we did in the introduction. Recall that a hyperplane is an affine
subspace of codimension 1.
Definition 6.5.1 (Locally characterized properties). An affine-invariant property P ⊂
Fn → [R] : n > 0 is said to be locally characterized if both of the following hold:
• For every function f : Fn → [R] in P and every hyperplane A 6 Fn, f |A ∈ P.
• There exists a constant K > 1 such that if f : Fn → [R] does not belong to P and
n > K, then there exists a hyperplane B 6 Fn such that f |B 6∈ P.
The constant K is said to be the locality of P.
The following observation shows that an affine-invariant property is locally characterized
if and only if it can be described using a bounded number of induced affine constraints from
the previous section, and hence, is locally characterized in the sense of the introduction.
Lemma 6.5.2. If P ⊆ Fn → [R] : n > 0 is a locally characterized affine-invariant
property with locality K, then P is equivalent to A-freeness, where A is a finite collection of
induced affine constraints, with each constraint of size pK on K + 1 variables. On the other
hand, if P is equivalent to A-freeness, where A is a collection of induced affine constraints
with each constraint on 6 K + 1 variables, then P has locality at most K.
74
Finally, we also make formal note of the observation in the introduction that if a property
is testable, then it must be locally characterized.
Remark 6.5.3. If K is a fixed integer and P ⊂ Fn → [R] is an affine-invariant property
that is testable with K queries, then P is a locally characterized property with locality K.
So, we can view our main result as a converse statement.
6.6 Strong Near-Orthogonality
We will use the strong near-orthogonality theorem of [15] for affine systems of linear forms,
Theorem 2.2.21. Recall that this is a weaker version of Theorem 2.2.23, however we use
the former for its slightly simpler statement, and the fact that it does not require using
homogeneous non-classical polynomials.
Remark 6.6.1. The proof of Theorem 2.2.21 also shows the following. Suppose, in the
setting of Theorem 2.2.21, that for every Pi ∈ B and Lj ∈ A, either |Lj | 6 deg(λi,jPi) or
λi,j = 0. Then, unless every λi,j = 0 (mod pki+1), we have that PA,B,Λ is non-constant and
|E[e(PA,B,Λ(x1, . . . , x`)
)]| < ε. The only modification needed to the above proof is that the
transformation from Λ to Λ′ can be omitted.
To show equidistribution of (Pi(Lj(x1, . . . , x`)), we can use Theorem 2.2.21 in the same
manner we used Theorem 2.2.10 to show the equidistribution of (Pi(x)) in Lemma 2.2.18.
Before we do so, however, let us give a name to those Λ for which the first case of Theo-
rem 2.2.21 holds.
Definition 6.6.2. Given an affine constraint A = (L1, . . . , Lm) on ` variables and inte-
gers d, k > 0 such that d > k(p − 1), the (d, k)-dependency set of A is the set of tuples
(λ1, . . . , λm) ∈ [0, pk+1 − 1] such that∑mi=1 λiP (Li(x1, . . . , x`)) ≡ 0 for every polynomial
P : Fn → T of degree d and depth k.
75
Theorem 2.2.21 says that if B is a regular factor, PA,B,Λ ≡ 0 exactly when the first
condition holds. In other words:
Corollary 6.6.3. Fix an integer C > 0, tuples (d1, . . . , dC) ∈ ZC>0 and (k1, . . . , kC) ∈ ZC>0,
and an affine constraint (L1, . . . , Lm) on ` variables. For i ∈ [C], let Λi be the (di, ki)-
dependency set of A.
Then, for any polynomial factor B = (P1, . . . , PC), where each Pi has degree di and depth
ki, and B has rank > r2.2.13
(maxi di,
12
), it is the case that a tuple (λi,j)i∈[C],j∈[m] satisfies
C∑i=1
m∑j=1
λi,jPi(Lj(x1, . . . , x`)) ≡ 0
if and only if for every i ∈ [C], (λi,1 (mod pki+1), . . . , λi,m (mod pki+1)) ∈ Λi.
Proof. The “if” direction is obvious. For the “only if” direction, we use Theorem 2.2.21 to
conclude that if∑i,j λi,jPi(Lj(·)) ≡ 0, it must be that for every i ∈ [C],
∑j λi,jQi(Lj(·)) ≡
0 for any polynomial Qi with degree di and depth ki. This is equivalent to saying (λi,1
(mod pki+1), . . . , λi,m (mod pki+1)) ∈ Λi.
Similar to Theorem 4.1.1 we define consistency as following.
Definition 6.6.4 (Consistency). Let A be an affine constraint of size m. A sequence of
elements b1, . . . , bm ∈ T are said to be (d, k)-consistent with A if b1, . . . , bm ∈ Uk+1 and for
every tuple (λ1, . . . , λm) in the (d, k)-dependency set of A, it holds that∑mi=1 λibi = 0.
Given vectors d = (d1, . . . , dC) ∈ ZC>0 and k = (k1, . . . , kC) ∈ ZC>0, a sequence of vectors
b1, . . . , bm ∈ TC are said to be (d,k)-consistent with A if for every i ∈ [C], the elements
b1,i, . . . , bm,i are (di, ki)-consistent with A.
If B is a polynomial factor, the term B-consistent with A is a synonym for (d,k)-
consistent with A where d = (d1, . . . , dC) and k = (k1, . . . , kC) are respectively the degree
and depth sequences of polynomials defining B.
76
Now, the following is proved identically to Theorem 4.1.1.
Theorem 6.6.5. Given ε > 0, let B be a polynomial factor of degree d > 0, complexity C,
and rank r2.2.10(d, ε), that is defined by a tuple of polynomials P1, . . . , PC : Fn → T having
respective degrees d1, . . . , dC and respective depths k1, . . . , kC . Let A = (L1, . . . , Lm) be an
affine constraint on ` variables.
Suppose b1, . . . , bm ∈ TC are atoms of B that are B-consistent with A. Then
Prx1,...,x`
[B(Lj(x1, . . . , x`)) = bj ∀j ∈ [m]] =
∏Ci=1 |Λi|‖B‖m
± ε
where Λi is the (di, ki)-dependency set of A.
6.7 Degree-structural Properties
The conditions required in Theorem 6.1.1 and Theorem 6.3.3 are very general, and so, we
expect that they are satisfied by many interesting algebraic properties. This, in fact, turns
out to be the case. We show that a class of properties that we call degree-structural are all
locally characterized and are, hence, testable by Theorem 6.1.1. We give the definition below
in Definition 6.7.1. First let us list some examples of degree-structural properties. Let d be a
fixed positive integer. Each of the following definitions defines a degree-structural property.
• Degree 6 d: The degree of the function F : Fn → F as a polynomial is at most d;
• Splitting: A function F : Fn → F splits if it can be written as a product of at most d
linear functions;
• Factorization: A function F : Fn → F factors if F = GH for polynomials G,H :
Fn → F such that deg(G) 6 d− 1 and deg(H) 6 d− 1;
• Sum of two products: A function F : Fn → F is a sum of two products if there
77
are polynomials G1, G2, G3, G4 such that F = G1G2 +G3G4 and deg(Gi) 6 d− 1 for
i ∈ 1, 2, 3, 4;
• Having square root: A function F : Fn → F has a square root if F = G2 for a
polynomial G with deg(G) 6 d/2;
• Low d-rank: for a fixed integer r > 0, a function F : Fn → F has d-rank at most
r if there exist polynomials G1, . . . , Gr : Fn → F of degree 6 d − 1 and a function
Γ : Fr → F such that F = Γ(G1, . . . , Gr).
In fact, roughly speaking, any property that can be described as the property of decom-
posing into a known structure of low-degree polynomials is degree-structural.
Definition 6.7.1 (Degree-structural property). Given an integer c > 0, a vector of non-
negative integers d = (d1, . . . , dc) ∈ Zc>0, and a function Γ : Fc → F, define the (c,d,Γ)-
structured property to be the collection of functions F : Fn → F for which there exist
polynomials P1, . . . , Pc : Fn → F satisfying F (x) = Γ(P1(x), . . . , Pc(x)) for all x ∈ Fn and
deg(Pi) 6 di for all i ∈ [c].
We say a property P is degree-structural if there exist integers σ,∆ > 0 and a set of
tuples S ⊂ (c,d,Γ) | c ∈ [σ],d ∈ [0,∆]c,Γ : Fc → F, such that a function F : Fn → F
satisfies P if and only if F is (c,d,Γ)-structured for some (c,d,Γ) ∈ S. We call σ the scope
and ∆ the max-degree of the degree-structural property P.
It is straightforward to see that the examples above satisfy this definition.
6.7.1 Every Degree Structural Property is Testable
In this section, we prove our main result for degree-structural properties stating that if P
is degree-structural (recall Definition 6.7.1), then P is locally characterized. The proof uses
many of the tools established in Section 2.2.3.
78
Theorem 6.7.2. Every degree-structural property with bounded scope and max-degree is a
locally characterized affine-invariant property.
Combining Theorem 6.7.2 with Theorem 6.1.1 implies PO testability for all degree-
structural properties.
Proof. Let P be a degree-structural property with scope σ and max-degree ∆. Denote by
S the set of tuples (c,d,Γ) such that c 6 σ and P is the union over all (c,d,Γ) ∈ S of
(c,d,Γ)-structured functions. It is clear that P is affine-invariant, as having degree bounded
by a constant is an affine-invariant property. It is also immediate that P is closed under
taking restrictions to subspaces, since if F is (c,d,Γ)-structured, then F restricted to any
hyperplane is also (c,d,Γ)-structured. The non-trivial part of the theorem is to show that
the locality is bounded. In other words we need to show that there is a constant K such
that for n > K, if F : Fn → T is a function with F |A ∈ P for every hyperplane A 6 Fn,
then F ∈ P .
First, let us bound the degree of F . We know that F |A ∈ P for every hyperplane A.
Therefore, deg(F |A) 6 pσ∆ for every A, as F |A is a function of at most σ polynomials
each of degree at most ∆ over a field of characteristic p. It follows that F itself is of degree
6 pσ∆.
Let r : Z>0 → Z>0 be a function to be fixed later. Define r2 : Z>0 → Z>0 so that
r2(m) > r(C(r,pσ∆)2.2.29 (m+ σ)) + C
(r,pσ∆)2.2.29 (m+ σ) + p.
We apply Lemma 2.2.29 to F to find an r2-regular polynomial factor B of degree
6 pσ∆, defined by polynomials R1, . . . , RC : Fn → T, where C 6 Cr2,d2.2.29(1). Since F
is measurable with respect to B, there exists a function Σ : TC → F, such that F (x) =
Σ(R1(x), . . . , RC(x)).
From each Ri pick a monomial with degree equal to deg(Ri) and a monomial (possibly
the same one) with depth equal to depth(Ri). By taking K to be sufficiently large, we
can gaurantee the existence of an i0 ∈ [n] such that xi0 is not involved in any of these
79
monomials. Consequently deg(R′i) = deg(Ri) and depth(R′i) = depth(Ri) for all i ∈ [C],
where R′1, . . . , R′C are the restrictions of R1, . . . , RC , respectively, to the hyperplane xi0 =
0. Also by Lemma 2.2.2, R′1, . . . , R′C have rank > r2(C) − p. Since F |xi0=0 ∈ P , by
definition of P , there must exist (c,d,Γ) ∈ S with c 6 σ such that
Σ(R′1, . . . , R′C) = Γ(P1, . . . , Pc),
where deg(Pi) 6 di for all i ∈ [c].
Now, apply Lemma 2.2.29 to find an r-regular refinement of the factor defined by the
tuple of polynomials (R′1, . . . , R′C , P1, . . . , Pc). Because of our choice of r2 and the last part of
Lemma 2.2.29, we obtain a syntactic refinement ofR′1, . . . , R
′C
. That is, we obtain a tuple
B′ of polynomials R′1, . . . , R′C , S1, . . . , SD : Fn → T such that it has degree 6 pσ∆, its rank
> r(C+D), and C+D 6 C(r)2.2.29(C+σ), and for each i ∈ [c], Pi = Γi(R
′1, . . . , R
′C , S1, . . . , SD)
for some function Γi : TC+D → T. So for all x ∈ Fn,
Σ(R′1(x), . . . , R′C(x))
= Γ(Γ1(R′1(x), . . . , R′C(x), S1(x), . . . , SD(x)), . . . ,Γc(R′1(x), . . . , R′C(x), S1(x), . . . , SD(x))).
Applying Lemma 2.2.18, we see that if the rank of B′ is > r2.2.18 (pσ∆, ε) where ε > 0
is sufficiently small (say ε = ‖B′‖/2), then (R′1(x), . . . , R′C(x), S1(x), . . . , SD(x)) acquires
every value in its range. Thus, we have the identity
Σ(a1, . . . , ac) = Γ(Γ1(a1, . . . , aC , b1, . . . , bD), . . . ,Γc(a1, . . . , aC , b1, . . . , bD)),
for every ai ∈ Udepth(R′i)+1 and bi ∈ Udepth(Si)+1. Thus, we can substitute Ri for R′i and 0
80
for Si in the above equation and still retain the identity
F (x) = Σ(R1(x), . . . , RC(x))
= Γ(Γ1(R1(x), . . . , RC(x), 0, . . . , 0), . . . ,Γc(R1(x), . . . , RC(x), 0, . . . , 0))
= Γ(Q1(x), . . . , Qc(x))
where Qi : Fn → T are defined as Qi(x) = Γi(R1(x), . . . , RC(x), 0, . . . , 0). Since for ev-
ery i, deg(Ri) = deg(R′i) and depth(Ri) = depth(R′i), we can apply Theorem 6.7.3 be-
low to conclude that deg(Qi) 6 deg(Pi) 6 di for every i ∈ [c], as long as the rank of
B′ is > r6.7.3(pσ∆). Finally, we show that Q1, . . . , Qc map to U1 = ι(F ) and, so, are
classical. Indeed, since P1, . . . , Pc are classical, Γ1, . . . ,Γc must map to ι(F ) on all of∏Ci=1 Udepth(R′i)+1 ×
∏Di=1 Udepth(Si)+1 ⊇
∏Ci=1 Udepth(Ri)+1 × 0
D. Hence, F ∈ P .
The following theorem, used in the proof above, shows that a function of a high rank
collection of polynomials has the degree one would expect. Thus, it displays yet another way
in which high-rank polynomials behave “generically”. The proof is via another application
of the near-orthogonality result in Theorem 2.2.21.
Theorem 6.7.3. For an integer d > 0, let P1, . . . , PC : Fn → T be polynomials of degree 6 d
and rank > r6.7.3(d, C), and let Γ : TC → T be an arbitrary function. Define the polynomial
F : Fn → T by F (x) = Γ(P1(x), . . . , PC(x)). Then, for every collection of polynomials
Q1, . . . QC : Fn → T with deg(Qi) 6 deg(Pi) and depth(Qi) 6 depth(Pi) for all i ∈ [C], if
G : Fn → T is the polynomial G(x) = Γ(Q1(x), . . . , QC(x)), it holds that deg(G) 6 deg(F ).
Proof. Let f(x) = e (F (x)) and γ(x1, . . . , xC) = e (Γ(x1, . . . , xC)). Let D = deg(F ). Then,
for every x, y1, . . . , yD+1 ∈ Fn,
∆yD+1 · · ·∆y1f(x) = 1.
81
We need to show that g(x) = e (G(x)) also satisfies ∆yD+1 · · ·∆y1g(x) = 1.
Let k1, . . . , kC be the depths of P1, . . . , PC , respectively. Then, each Pi takes values in
Uki+1. Let Σ denote the group Zpk1+1 × · · · ×ZpkC+1 . Considering the Fourier transform of
γ, we have
f(x) = γ(P1(x), . . . , PC(x)) =∑β∈Σ
γ(β)e
C∑i=1
βiPi(x)
.Next, we look at the derivative.
∆yD+1 · · ·∆y1f(x) = ∆yD+1 · · ·∆y1
∑β∈Σ
γ(β)e
C∑i=1
βiPi(x)
=
∑αJ∈Σ:J⊆[D+1]
∏J⊆[D+1]
γ(αJ )e
(−1)|J |+1C∑i=1
αJ,iPi
x+∑j∈J
yj
.Denoting δ(α) =
∏J⊆[D+1] γ(αJ ) for α = (αJ )J⊆[D+1] ∈ ΣP([D+1]), we have
∆yD+1 · · ·∆y1f(x) =∑
α∈ΣP([D+1])
δ(α)e
C∑i=1
∑J⊆[D+1]
(−1)|J |+1αJ,iPi
x+∑j∈J
yj
.(6.1)
For any i, if there is a J such that |J | > deg(αJ,iPi), we can use Eq. (2.1) to rewrite
αJ,iPi
(x+
∑j∈J yj
)as a linear combination (over Z) of
Pi
(x+
∑j∈J ′ yj
): |J ′| < |J |
.
We repeat this process until for every i and J , either αJ,i = 0 or |J | 6 deg(αJ,iPi). Denoting
by A the set of α ∈ ΣP([D+1]) that satisfy this condition, we have obtained a new set of
coefficients δ′(α) such that
∆yD+1 · · ·∆y1f(x) =∑α∈A
δ′(α)e
C∑i=1
∑J⊆[D+1]
αJ,iPi
x+∑j∈J
yj
.Now, the crucial observation is that if instead of P1, . . . , PC , we had Q1, . . . , QC , the same
82
decomposition applies.
∆yD+1 · · ·∆y1g(x) =∑α∈A
δ′(A)e
C∑i=1
∑J⊆[D+1]
αJ,iQi
x+∑j∈J
yj
. (6.2)
The reason is that Eq. (6.1) remains valid as is if f is replaced by g and the Pi’s are
replaced by Qi’s, and furthermore since deg(Pi) > deg(Qi) and depth(Pi) > depth(Qi), the
applications of Eq. (2.1) remain valid also. Therefore, Eq. (6.2) is also valid
But now, we argue that δ′(·) are uniquely determined. Let k = maxi ki 6 d/(p− 1).
Claim 6.7.4. If P1, . . . , PC are of rank > r2.2.13(d, 1|A|) + 1, the functions
e
C∑i=1
∑J⊆[D+1]
αJ,iPi
x+∑j∈J
yj
: α ∈ A
are linearly independent over C.
Proof. Note that all these functions have L2-norm equal to 1. Hence it suffices to show
that their pairwise inner products are all bounded in absolute value by 1/|A|. To prove this
consider α, β ∈ A, and note that by Theorem 2.2.21 and, in particular, Remark 6.6.1, unless
all the αJ,i − βJ,i are zero,
∣∣∣∣∣∣Ee C∑i=1
∑J⊆[D+1]
(αJ,i − βJ,i)Pi
x+∑j∈J
yj
∣∣∣∣∣∣ < 1
|A|.
Therefore, since ∆yD+1 · · ·∆y1f(x) = 1, we must have δ′(α) = 1 when α is the all-
zero tuple, and δ′(α) = 0 for every non-zero α. Plugging this into Eq. (6.2), we get
∆yD+1 · · ·∆y1g(x) = 1.
83
6.8 Big Picture Functions
Suppose we have a function f : Fn → [R], and we want to find out whether it induces a
particular affine constraint (A, σ), where A = (L1, . . . , Lm) is a sequence of affine forms on
` variables and σ ∈ [R]m. Now, suppose Fn is partitioned by a polynomial factor B de-
fined by polynomials P1, . . . , PC of degrees d1, . . . , dC and depths k1, . . . , kC . Then, observe
that if b1, . . . , bm ∈ TC denote the atoms of B containing L1(x1, . . . , x`), . . . , Lm(x1, . . . , x`)
respectively, it must be the case that b1, . . . , bm are B-consistent with A (as defined in Defi-
nition 6.6.4). Thus, to locate where f might induce (A, σ), we should restrict our search to
sequences of atoms consistent with A.
It will be convenient to “blur” the given function f so as to retain only atom-level
information about it. That is, for every atom c of B, we will define fB(c) ⊆ [R] to be the set
of all values that f takes within c.
Definition 6.8.1. Given a function f : Fn → [R] and a polynomial factor B, the big picture
function of f is the function fB : T|B| → P([R]), defined by fB(c) = f(x) : B(x) = c.
On the other hand, given any function g : TC → P([R]), and a vector of degrees d =
(d1, . . . , dC) and depths k = (k1, . . . , kC) (which we think of as corresponding to the degrees
and depths of some polynomial factor of complexity C), we will define what it means for
such a function to “induce” a copy of a given constraint.
Definition 6.8.2 (Partially induce). Suppose we are given vectors d = (d1, . . . , dC) ∈ ZC>0
and k = (k1, . . . , kC) ∈ ZC>0, a function g :∏i∈[C] Uki+1 → P([R]), and an induced affine
constraint (A, σ) of size m. We say that g partially (d,k)-induces (A, σ) if there exist a
sequence b1, . . . , bm ∈ TC that is (d,k)-consistent with A, and σj ∈ g(bj) for each j ∈ [m].
Definition 6.8.2 is justified by the following trivial observation.
84
Remark 6.8.3. If f : Fn → [R] induces a constraint (A, σ), then for a factor B defined by
polynomials of respective degrees (d1, . . . , d|B|) = d and respective depths (k1, . . . , k|B|) = k,
the big picture function fB partially (d,k)-induces (A, σ).
To handle a possibly infinite collection A of affine constraints, we will employ a com-
pactness argument, analogous to one used in [7] to bound the size of the constraint partially
induced by the big picture function. Let us make the following definition:
Definition 6.8.4 (The compactness function). Suppose we are given positive integers C and
d, and a possibly infinite collection of induced affine constraints A = (A1, σ1), (A2, σ2), . . . ,
where (Ai, σi) is of size mi. For fixed d = (d1, . . . , dC) ∈ [d]C and k = (k1, . . . , kC) ∈[0,⌊d−1p−1
⌋]C, denote by G(d,k) the set of functions g :
∏Ci=1 Uki+1 → P([R]) that partially
(d,k)-induce some (Ai, σi) ∈ A. The compactness function is defined as
ΨA(C, d) = maxd,k
maxg∈G(d,k)
min(Ai,σi) partially
(d,k)-induced by g
mi
where the outer max is over vectors d = (d1, . . . , dC) ∈ [d]C and k = (k1, . . . , kC) ∈[0,⌊d−1p−1
⌋]C. Whenever G(d,k) is empty, we set the corresponding maximum to 0.
Note that ΨA(C, d) is indeed finite, as the number of possible degree and depth sequences
are bounded by d2C , and the size of G(d1, . . . , dC) is bounded by 2RpdC
.
Remark 6.8.5. Note that if a function g : TC → P([R]) partially (d,k)-induces some
constraint from A where d ∈ [d]C , then g must belong to G(d,k), and consequently it will
necessarily partially induce some (Ai, σi) ∈ A whose size is at most ΨA(C, d).
6.9 Proof of Testability
We prove the main result, Theorem 6.4.5, in this section. In fact, we will show the following.
85
Theorem 6.9.1. Let d > 0 be an integer. Suppose we are given a possibly infinite collection
of affine constraints A = (A1, σ1), (A2, σ2), . . . where each (Ai, σi) is an affine constraint
of complexity 6 d, and of size mi on `i variables. Then, there are functions `A : (0, 1)→ Z>0
and δA : (0, 1) → (0, 1) such that the following is true for any ε ∈ (0, 1). If a function
f : Fn → [R] is ε-far from being A-free, then f induces at least δA(ε)pn`i many copies of
some (Ai, σi) with `i < `A(ε) .
Moreover, if A is locally characterized, then `A(ε) is a constant independent of ε.
Theorem 6.4.5 immediately follows. Consider the following test: choose uniformly at
random x1, . . . , x`A(ε) ∈ Fn, let H denote the affine spacex1 +
∑`A(ε)j=2 λjxj : λj ∈ F
,
and check whether f restricted to H is A-free or not, thus making 6 p`A(ε) queries. By
Theorem 6.9.1, if f is ε-far from A-freeness, this test rejects with probability at least δA(ε).
Proof Theorem 6.9.1:
Preliminaries. Fix a function f : Fn → [R] that is ε-far from being A-free. For i ∈ [R],
define f (i) : Fn → 0, 1 so that f (i)(x) equals 1 when f(x) = i and equals 0 otherwise.
Additionally, set the following parameters, where ΨA is the compactness function from
Definition 6.8.4:
α(C) = p−2dCΨA(C,d), ρ(C) = r2.2.13(d, α(C)), ζ = ε8R ,
∆(C) = 116ζ
ΨA(C,d), η(C) = 18pdCΨA(C,d)
( ε24R
)ΨA(C,d).
Decomposing by regular factors. Next, apply Theorem 2.3.6 to the functions
f (1), f (2), . . . , f (R) in order to get polynomial factors B′ syn B of complexity at most
C2.3.6(∆, d, ρ, ζ, η), an element s ∈ T|B′|−|B|, and functions f(i)1 , f
(i)2 , f
(i)3 : Fn → R for each
86
i ∈ [R] with the desired properties. The sequence of polynomials generating B′ will be
denoted by P1, . . . , P|B′|. Since B′ is a syntactic refinement, we can assume B is generated
by the polynomials P1, . . . , P|B|. Let C = |B| and C ′ = |B′|. Note that ‖B‖ < p(kmax+1)C 6
pdC , where kmax 6 b(d− 1)/(p− 1)c is the maximum depth of a polynomial in B. Denote
the degree of Pi by di and the depth of Pi by ki.
Cleanup. Based on B′ and B, we construct a function F : Fn → [R] that is ε2 -close to
f and hence, still violates A-freeness. The “cleaner” structure of F will help us locate the
induced constraint violated by f .
The function F is the same as f except for the following: For every atom c of B, let tc =
arg maxj∈[R] Pr[f(x) = j | B′(x) = (c, s)] be the most popular value inside the corresponding
subatom (c, s).
• Poorly-represented atoms: If there exists i ∈ [R] such that |Pr[f(x) = i | B(x) =
c]−Pr[f(x) = i | B′(x) = (c, s)]| > ζ, then set F (z) = tc for every z in the atom c.
• Unpopular values: Otherwise, for any z in the atom c with 0 < Prx[f(x) =
f(z) | B′(x) = (c, s)] < ζ, set F (z) = tc.
A key property of the cleanup function F is that it supports a value inside an atom c of
B only if the original function f acquires the value on at least an ζ fraction of the subatom
(c, s). Furthermore as the following lemma shows it is ε/2-close to f , and therefore, it is not
A-free.
Lemma 6.9.2. The cleanup function F is ε/2-close to f , and therefore, it is not A-free.
Proof. The first step applies to at most ζ‖B‖ atoms, since B′ ζ-represents B with respect to
each f (1), . . . , f (R). By Lemma 2.2.18, each atom occupies at most 1‖B‖ + α(C) fraction of
the entire domain. So, the fraction of points whose values are set in the first step is at most
ζ‖B‖( 1‖B‖ + α(C)) < 2ζ.
87
In the second step, if Pr[f(x) = f(z) | B′(x) = (c, s)] < ζ, then Pr[f(x) = f(z) | B(x) =
c] < Pr[f(x) = f(z) | B′(x) = (c, s)] + ζ < 2ζ. Hence, the fraction of the points whose
values are set in the second step is at most 2ζR = ε/4.
Thus, the distance of F from f is bounded by 2ζ + ε/4 < ε/2.
Locating a violated constraint. We now want to use F to “find” the affine constraint
induced in f . Setting d = (d1, . . . , dC) and k = (k1, . . . , kC), we have by Remark 6.8.3 that
the big picture function FB of F will partially (d,k)-induce some constraint from A, and
hence by Remark 6.8.5, it will partially (d,k)-induce some (A, σ) ∈ A of size m 6 ΨA(C, d)
on ` variables. We will show that the original function f violates many instances of this
constraint.
Denote the affine forms in A by (L1, . . . , Lm) and the vector σ by (σ1, . . . , σm). Since
we can assume ` 6 m (without loss of generality by making a change of variables), we can
now define
`A(ε) = ΨA(C2.3.6(∆, η, ρ, ζ, R), d). (6.3)
Let b1, . . . , bm ∈∏Ci=1 Uki+1 correspond to the atoms of B where (A, σ) is partially
(d,k)-induced by FB. That is, b1, . . . , bm are consistent with A, and σi ∈ FB(bi) for every
i ∈ [m]. Also, let b′1, . . . , b′m ∈
∏C ′i=1 Uki+1 index the associated subatoms in B′, obtained
by letting b′j = (bj , s) for every j ∈ [m].
Lemma 6.9.3. The subatoms b′1, . . . , b′m are consistent with A.
Proof. Since b1, . . . , bm are already consistent with A, we only need to show that for every
i ∈ [C + 1, C ′], the sequence (b′1,i, . . . , b′m,i) = (si−C , si−C , . . . , si−C) is (di, ki)-consistent.
This holds because a constant function is of degree 6 di.
88
The main analysis. Let x = (x1, . . . , x`) where x1, . . . , x` are independent random vari-
ables taking values in Fn uniformly. Our goal is to prove a lower bound on
Prx
[f(L1(x)) = σ1 ∧ · · · ∧ f(Lm(x)) = σm] = Ex
[f (σ1)(L1(x)) · · · f (σm)(Lm(x))
]. (6.4)
The theorem obviously follows if the above expectation is larger than the respective δA(ε).
We rewrite the expectation as
Ex
[(f
(σ1)1 + f
(σ1)2 + f
(σ1)3 )(L1(x)) · · · (f (σm)
1 + f(σm)2 + f
(σm)3 )(Lm(x))
]. (6.5)
We can expand the expression inside the expectation as a sum of 3m terms. The
expectation of any term involving f(σj)2 for any j ∈ [m] is bounded in magnitude by
‖f (σj)2 ‖Ud+1 6 η(|B′|), by Lemma 2.4.3 and the fact that the complexity of A is bounded by
d. Hence, the expression (6.5) is at least
Ex
[(f
(σ1)1 + f
(σ1)3 )(L1(x)) · · · (f (σm)
1 + f(σm)3 )(Lm(x))
]− 3mη(|B′|).
Now, because of the non-negativity of f(σj)1 + f
(σj)3 for every j ∈ [m], this is at least
Ex
(f (σ1)1 + f
(σ1)3
)(L1(x)) · · ·
(f
(σm)1 + f
(σm)3
)(Lm(x))
∏j∈[m]
1[B′(Lj)=b′j ]
− 3mη(|B′|),
(6.6)
where 1[B′(Lj)=b′j ]is the indicator function of the event B′(Lj(x)) = b′j . In other words, now
we are only counting patterns that arise from the selected subatoms b′1, . . . , b′m. We next
expand the product inside the expectation into 2m terms. We will show that the contribution
from each of the 2m−1 terms involving f(σk)3 for any k ∈ [m] is small. Such a term is trivially
89
bounded from above by
Ex
∣∣∣f (σk)3 (Lk(x))
∣∣∣ ∏j∈[m]
1[B′(Lj)=b′j ])
. (6.7)
Without loss of generality, we assume that k = 1. This is convenient as by Definition 6.4.1 (i)
we have L1(x) = x1. (For other values of k, we can do a change of variables, replacing x1
with Lk(x), so that we can assume Lk(x) = x1.) With the assumption k = 1, the square of
(6.7) is equal to the following.
Ex
∣∣∣f (σk)3 (x1)
∣∣∣ ∏j∈[m]
1[B′(Lj)=b′j ]
2
6 Ex1
[|f (σk)
3 (x1)|21[B′(Lk)=b′k]
]Ex1
Ex2,...,x`
∏j∈[m]
1[B′(Lj)=b′j ]
2
. (6.8)
By Theorem 2.3.6 (vi) and Lemma 2.2.18, we have
Ex1
[|f (σk)
3 (x1)|21[B′(Lk)=b′k]
]6 ∆2(C) Pr
x1[B′(x1) = b′k]
6 ∆2(C)
(1
‖B′‖+ α(C ′)
)6
2∆2(C)
‖B′‖. (6.9)
Let y = (y2, . . . , y`) where y2, . . . , y` are independent random variables taking values in Fn
uniformly. The second term in the right hand side of (6.8) is equal to
1
‖B′‖2mEx1
Ex2,...,x`
∏i∈[C ′]j∈[m]
1
pki+1
pki+1−1∑λi,j=0
e(λi,j · (Pi(L′j(x))− b′i,j)
)2
90
=1
‖B′‖2mEx1
∑
(λi,j)∈∏i,j [0,p
ki+1−1]
e
− ∑i∈[C ′]j∈[m]
λi,jb′i,j
Ex2,...,x`
e
∑i∈[C ′]j∈[m]
λi,jPi(Lj(x))
2
61
‖B′‖2m∑
(λi,j),(τi,j)∈∏i,j [0,p
ki+1−1]
∣∣∣∣∣∣∣∣∣ Ex,y
e ∑i∈[C ′]j∈[m]
λi,jPi(Lj(x))
e
− ∑i∈[C ′]j∈[m]
τi,jPi(Lj(x1,y))
∣∣∣∣∣∣∣∣∣ .
(6.10)
We can bound the above using Theorem 2.2.21. Let A′ denote the set of 2m linear forms:
Lj(x1, x2, . . . , x`) | j ∈ [m]
∪Lj(x1, y2, . . . , y`) | j ∈ [m]
in variables x1, . . . , x`, y2, . . . , y`. Let Λi and Λ′i denote the (di, ki)-dependency set of A and
A′ respectively.
Lemma 6.9.4. For each i, |Λ′i| = |Λi|2 · pki+1
Proof. Recall that L1(x) = L1(x1,y) = x1. For any λ, τ ∈ Λi and any α ∈ [0, pki+1 − 1],
note that (λ1 + α (mod pki+1), λ2, . . . , λm, τ1 − α (mod pki+1), τ2, . . . , τm) ∈ Λ′i. Hence,
|Λ′i| > |Λi|2 · pki+1. To show |Λ′i| 6 |Λi|
2 · pki+1, we give a map from Λ′i to Λi × Λi that is
pki+1-to-1. Suppose∑mj=1 λjQ(Lj(x1, x2, . . . , x`)) +
∑mj=1 τjQ(Lj(x1, y2, . . . , y`)) ≡ 0 for
every polynomial Q of degree di and depth ki. Setting x2 = . . . = x` = 0 shows that
m∑j=1
τjQ(Lj(x1, y2, . . . , y`)) = −
m∑j=1
λj
Q(x1),
91
and similarly setting y2 = . . . = y` = 0 shows
m∑j=1
λjQ(Lj(x1, x2, . . . , x`)) = −
m∑j=1
τj
Q(x1).
In particular∑mj=1 λj = −
∑mj=1 τj . Consequently,
(λ, τ) 7→
− m∑j=2
λj (mod pki+1), λ2, . . . , λm
,
− m∑j=2
τi (mod pki+1), τ2, . . . , τm
is a map from Λ′i to Λi × Λi. To see that it is pki+1-to-1, note that
(λ1 + τ1 − γ (mod pki+1), λ2, . . . , λm, γ, τ2, . . . , τm) ∈ Λ′i
for every γ ∈ [0, pki+1 − 1], and these elements are all mapped to the same element in
Λi × Λi.
Applying Theorem 2.2.21 (just as in the proof of Theorem 6.6.5), we get that
(6.10) 61
‖B′‖2m
C ′∏i=1
|Λi|2pki+1 + ‖B′‖2mα(C ′)
6 ∏C ′i=1 |Λi|2
‖B′‖2m+ α(C ′).
Combining this with Eq. (6.9) and Eq. (6.8), we obtain
(6.7) 6 2∆(C)
√∏C ′i=1 |Λi|2‖B′‖2m
+ α(C ′). (6.11)
Finally, we turn to the main term in the expansion of Eq. (6.6). We know from
92
Lemma 6.9.3 that the subatoms b′1, . . . , b′m are consistent with A. Thus
Ex
f (σ1)1 (L1(x)) · · · f (σm)
1 (Lm(x)) ·∏j∈[m]
1[B′(Lj)=b′j ]
= Pr[B′(L1(x)) = b′1 ∧ · · · ∧ B
′(Lm(x)) = b′m]·
Ex
[f
(σ1)1 (L1(x)) · · · f (σm)
1 (Lm(x))|∀j ∈ [m], B′(Lj(x)) = b′j]
>
(∏C ′i=1 |Λi|‖B′‖m
− α(C ′)
)ζm. (6.12)
Let us justify the last line. The first term is due to the lower bound on the probability from
Theorem 6.6.5. The second term in (6.12) follows since each f(σj)1 is constant on the atoms
of B′, and because by construction, the big picture function FB of the cleanup function F ,
on which (A, σ) was partially induced, supports a value inside an atom b of B only if the
original function f acquires the value on at least an ζ fraction of the subatom (c, s).
Setting β = (∏C′i=1 |Λi|‖B′‖ )m and combining the bounds from (6.6), (6.11) and (6.12), we
conclude
(6.4) > (β − α(C ′)) ·( ε
8R
)m− 2m+1∆(C)
√β2 + α(C ′)− 3m · η(C ′)
>β
2·( ε
8R
)ΨA(C,d)− 2ΨA(C,d)+1β ·∆(C)− 3ΨA(C,d) · η(C ′)
Since ‖B′‖ 6 pdC′,
• 1‖B′‖ΨA(C,d) 6 β 6 1,
• ∆(C) = 116( ε
8R)ΨA(d,C),
• η(C ′) < 18‖B′‖ΨA(C,d)
( ε24R
)ΨA(C,d),
93
and both C and C ′ are upper-bounded by C2.3.6(∆, η, ρ, ζ, R), we can now define
δA(ε) =1
4p−dΨA(C2.3.6(∆,η,ρ,ζ,R))C2.3.6(∆,η,ρ,ζ,R) ·
( ε
8R
)ΨA(C2.3.6(∆,η,ρ,ζ,R),d)(6.13)
to conclude the proof.
94
CHAPTER 7
ALGORITHMIC ASPECTS OF REGULARITY
The results in this chapter are based on [18]1. This chapter is concerned with the algorithmic
versions of regularity lemmas for polynomials over finite fields. These regularity lemmas
proved by Green and Tao [43] and by Kaufman and Lovett [54] as discussed in Section 2.2.5
show that one can modify a given collection of polynomials F = P1, . . . , Pm into a new
collection F ′ so that the polynomials in F ′ are regular in the sense of Definition 2.2.7. These
lemmas have various applications, such as (special cases of) Reed-Muller testing and worst-
case to average-case reductions for polynomials. However, as mentioned in the introduction,
the transformation from F to F ′ in these works were not algorithmic. In this chapter, we
show that the analytic notions of regularity such as the ones defined in Section 2.2.2, while
being qualitatively equivalent to the above, allow for efficient algorithms.
In particular, for high enough characteristic, in time polynomial in number of variables,
we can refine F into F ′ where every nonzero linear combination of polynomials in F ′ has
desirably small Gowers norm. In the case of small fields, we give algorithmic regularity lem-
mas for two different notions of regularity. First is the notion of strong regularity introduced
by Kaufman and Lovett [54] that allows one to show that a polynomial approximable by a
strongly regular factor is actually exactly computable by it. The second notion is rank of
a polynomial factor, which Tao and Ziegler [78] showed is equivalent to having the analytic
property of low Gowers norm.
Throughout this chapter a polynomial is always a classical polynomial unless it is other-
wise specified to be a non-classical polynomial.
1. This work is based on an earlier work: Algorithmic Regularity for Polynomials and Applications.In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA15), PRDA15, 16, pp. 1870-1889 c© 2015. by the Society for Industrial and Applied Mathematics..10.1137/1.9781611973730.125
95
7.1 Applications
A regular factor is a basic ingredient of higher-order Fourier analysis, and we believe that our
algorithmic regularity lemmas will therefore be helpful in a variety of future applications.
In the following, we examine a few interesting applications of relevance to coding theory,
complexity theory and algorithmic algebra.
7.1.1 Inverse Theorem and Reed-Muller Decoding in High Characteristic
One of our main motivations is finding an algorithmic version of the inverse Gowers Theorem
for polynomials.
Theorem 7.1.1 (Inverse Gowers Theorem for polynomials in high characteristic, [43]). If a
polynomial P of degree d < |F| satisfies ‖eP‖Uk+1 > ε, then there exists a polynomial Q of
degree at most k, such that
|〈eFP, eFQ〉| = |E [x ∈ Fn] eFP (x)−Q(x)| > η,
for some η depending only on ε and d.
The Inverse Gowers Theorem has also been established over small fields by Tao and
Ziegler [78] (Theorem 2.1.10), but here, one needs to extend the given theorem statement to
non-classical polynomials, which are discussed in Section 2.1.1.
Theorem 7.1.1 has a direct interpretation in terms of Reed-Muller codes over F. The
Reed-Muller code of order k over Fn is simply the set of polynomials of degree at most
k over Fn. If d < |F| and for a given polynomial P of degree d, there exists a degree-k
polynomial Q such that Pr x ∈ FnP (x) = Q(x) > 1|F| + ε, which is the same as saying
that dist(P,Q) 6 1− 1|F|−ε (for dist(P,Q) denoting the normalized Hamming distance), then
it follows from the definition of Gowers norms and the Gowers-Cauchy-Schwarz inequality
96
that ‖eFtP‖Uk+1 > ε for a nonzero t ∈ F. Then, Theorem 7.1.1 gives that there exists
a polynomial Q of degree k such that |⟨
eFtP , eFQ⟩| > η, or equivalently, dist(P,Q′) 6
1− 1|F| − η
′ for some η′ > 0 and Q′ of degree k.
Thus, when d < |F|, the Gowers norm gives an approximate test for checking if for a given
polynomial P , there exists a Q of degree at most k within Hamming distance 1− 1|F| − ε. If
there exists a Q, then the Gowers norm is large and if the Gowers norm is larger than ε, then
there exists a Q′ within distance 1 − 1|F| − η
′. This is remarkable because the list-decoding
radius of Reed-Muller of order k codes is only 1− k|F| for k < |F| [33], and the test works even
beyond that. In fact the Hamming distance of a random P is 1 − 1|F| − o(1) from all Q of
degree k, and the test works all the way up to that distance. Moreover, Tao and Ziegler [78]
later showed that this test works not only when P is a polynomial of degree d < |F| (as in
the case of Theorem 7.1.1) but also when P is an arbitrary function and2 k < |F|.
Given the above, it is natural to consider the decoding analogue of the above question:
Given P of degree d over Fn, if there exists Q of degree k such that dist(P,Q) 6
1 − 1|F| − ε, can one find a Q′ (in time polynomial in n) of degree k such that
dist(P,Q′) 6 1− 1|F| − η for some η depending on ε?
Note that d, k and |F| are assumed to be constants and dependence on these is allowed, but
not on n. Also, observe that there might be exponentially many such Q since we are in the
regime beyond the list-decoding radius (see [55]), but the question turns out to be tractable
since we only ask for one such Q and allow a loss from ε to η. Such a decoding question
was solved for Reed-Muller codes of order 2 over F2, for any given function f (instead of a
polynomial P of bounded degree) by [79].
In Section 7.5.2, we show an algorithmic version of Theorem 7.1.1, meaning we explicitly
find a Q which has correlation at least η. In Section 7.5.3, we show how this algorithmic
2. For fields of low characteristic (|F| 6 k), the only testing results of this flavor (which works beyond thelist-decoding radius for arbitrary input functions) were proved by Samorodnitsky [68] for Reed-Muller codesof order 2 over Fn
2 and by Green and Tao [41] for Reed-Muller codes of order 2 over Fn5 .
97
inverse theorem leads to solving the above decoding question for polynomials P of degree d
and the Reed-Muller code of order k for k 6 d < |F|. This special case can be interpreted
as follows: if P is of degree-d for some degree k 6 d < |F|, then we can think of P being
obtained from some Q of degree k (which is a codeword) by adding the “noise” P − Q of
degree d. Thus, when the noise is “structured” i.e. given by a degree-d polynomial, we can
decode in the above sense (of finding some codeword within a given distance) even beyond
the list-decoding radius. Note that d can be much larger than k, so the noise set is still much
larger than the code.
Both our algorithmic version of the regularity lemma and the decoding algorithm, are
randomized algorithms that run in time O(nd) and output the desired objects with high
probability. This is linear time for regularity since even writing a polynomial of degree d
takes time Ω(nd). The algorithms can be derandomized in polynomial time (at the expense
of a very large exponent) using pseudorandom generators for low-degree polynomials 3. For
the question of finding a degree k polynomial within a given distance of P , it is possible that
one may be able to do this in time O(nk), but we do not achieve this.
7.1.2 Efficient Worst-Case to Average-Case Reductions for Polynomials
The central component in Green and Tao’s proof of Theorem 7.1.1 can be viewed as a worst-
case to average-case reduction that is interesting by itself. Let P : Fn → F be a degree-d
polynomial which can be weakly approximated by a lower degree polynomial Q : Fn → F, of
degree < d. Here by weak approximation we mean 〈eFP, eFQ〉 > δ. Then Green and Tao
show that if d < |F|, P can in fact be computed by lower degree polynomials, i.e. there exist
Q1, . . . , Qm of degree < d and a function Γ : FM → F, such that P = Γ(Q1, . . . , QM ). M
depends only on d, δ and |F|. Our algorithmic regularity lemma for polynomials over high
characteristic, when plugged into the arguments of Green and Tao, show that this result
3. We thank Shachar Lovett for this observation
98
can be made algorithmic, meaning that Q1, . . . , QM and Γ can be found in Poly(n) time
(assuming constant d, δ and |F|).
Green and Tao (as well as Lovett, Meshulam and Samorodnitsky [63]) showed that their
proof techniques break down in the general case when |F| may be small. In particular,
Theorem 7.1.1 is false when |F| 6 d. However, Kaufman and Lovett [54] showed that the
above worst-case to average-case reduction is still true for small fields. In order to prove
this result, they introduced a more involved notion of regularity called strong regularity.
Informally, strong regularity for a factor ensures that not only is the joint distribution of the
polynomials in the factor close to uniform, but in fact, the joint distribution of the derivatives
of the polynomial is also close to uniform.
We prove that given a factor F , we can find a strongly unbiased refinement F ′ in
polynomial time. Inserting this algorithmic regularity lemma into Kaufman and Lovett’s
proof shows that given a polynomial P that is known to be correlated with a lower de-
gree polynomial, we can explicitly find polynomials Q1, . . . , QM and a function Γ such that
P = Γ(Q1, . . . , QM ) and M is bounded. A randomized algorithm for recovering Q1, . . . , QM
and Γ with high probability runs in time O(nd), where d = deg(P ).
7.1.3 Algorithmic Inverse Theorem for Polynomials: Fields of Small
Characteristic
Over small fields, the correspondence between high rank and low Gowers norm, which drives
our proof over large characteristic, breaks down and one needs to consider non-classical
polynomials. (see Section 2.1.1 for formal definition).
The corresponding notion of regularity for factors F over a small field is that no nonzero
linear combination P of polynomials in F can be written as a function of a bounded number
M of non-classical polynomials of degree less than deg(P ) (Definition 2.2.7). We prove an
algorithmic regularity lemma that in polynomial time, finds a regular refinement when given
99
as input an arbitrary non-classical polynomial factor of bounded degree.
Using the notion of high-rank non-classical polynomial factors, we obtain an algorithmic
version of the inverse theorem for bounded degree polynomials (for the case when the degree d
is greater than the order p of the underlying prime field) by Tao and Ziegler [78]. We manage
to somewhat simplify the proof of the inverse theorem for bounded degree polynomials (i.e.
Theorem 2.1.10 when f itself is a phase polynomial), by observing that one of its steps can
be generalized to avoid a few other parts of the proof.
7.1.4 Polynomial decompositions
Given a positive integer k, a vector of positive integers ∆ = (∆1,∆2, . . . ,∆k) and a function
Γ : Fk → F, we say that a function P : Fn → F is (k,∆,Γ)-structured if there exist
polynomials P1, P2, . . . , Pk : Fn → F with each deg(Pi) 6 ∆i such that for all x ∈ Fn,
P (x) = Γ(P1(x), P2(x), . . . , Pk(x)).
The polynomials P1, . . . , Pk are said to form a (k,∆,Γ)-decomposition (or simply, a polyno-
mial decomposition). For instance, an n-variate polynomial over the field F of total degree d
factors nontrivially exactly when it is (2, (d−1, d−1), prod)-structured where prod(a, b) = a·b.
In [13], Bhattacharyya noted that for any fixed k, ∆, Γ and F, our algorithmic version
of the Green-Tao regularity lemma over high characteristic allows one to decide (k,∆,Γ)-
structure in time Poly(n) for all input polynomials P : Fn → F whose degree is strictly
less than |F|. However, [13] left open the question of solving the polynomial decomposition
problem over small fields, such as F2, and input polynomials of degree larger than |F|. Here,
we resolve this issue.
At its core, the proof of correctness in [13] used our algorithmic version of the Green-Tao
regularity lemma that applies only for large characteristic. The algorithmic regularity lemma
100
for non-classical factors can then be used in the proof of [13] to show that over any fixed
prime order field F and for any fixed choice of positive integer k, vector of positive integers
∆ = (∆1,∆2, . . . ,∆k) and function Γ : Fk → F, the (k,∆,Γ)-decomposition problem is
solvable in Poly(n) time when the input is a bounded degree polynomial.
7.2 Algorithmic Bogdanov-Viola Lemma; Approximate
Regularization
The Bogdanov-Viola Lemma [21] states that if a polynomial of degree d is biased, then it
can be approximated by a bounded set of polynomials of lower degree. Recall the definition
of bias of a function P : Fn → F:
bias(P ) =∣∣∣Ex
[eF(P (x))]∣∣∣
The following is an easy to observe algorithmic version of this lemma. We reproduce the
proof by Green and Tao [43] while pointing out that all the steps can be performed efficiently.
Lemma 7.2.1 (Algorithmic Bogdanov-Viola lemma). Let d > 0 be an integer, and δ, σ, β ∈
(0, 1] be parameters. There exists a randomized algorithm, that given a polynomial P : Fn →
F of degree d with
bias(P ) > δ,
runs in time Oδ,β,σ(nd), and with probability 1 − β returns functions P : Fn → F and
Γ : FC → F, and a set of polynomials P1, ..., PC , where C 6 |F|5δ2σβ
and deg(Pi) < d for all
i ∈ [C], for which
• Prx(P (x) 6= P (x)) 6 σ, and
• P (x) = Γ(P1(x), ..., PC(x)).
101
Proof. The proof will be an adaptation of the proof from [43]. Notice that we can compute
the explicit description of P in O(nd) queries. For every a ∈ F define the measure µa(t)def=
Pr(P (x) = a+ t). It is easy to see that if bias(P (x)) > δ then, for every a 6= b,
‖µa − µb‖ >4δ
|F|. (7.1)
We will try to estimate each of these distributions. Let
µa(t)def=
1
C
∑16i6C
1P (xi)=a+t,
where C >|F|5δ·β1
, and x1, x2, ..., xC ∈ Fn are chosen uniformly at random. Therefore by an
application of Chebyshev’s inequality
Pr
(|µa(t)− µa(t)| > δ
2|F|2
)<β1
|F|,
for all t ∈ F and therefore
Pr
(‖µa − µa‖ >
δ
2|F|2
)< β1. (7.2)
Now we will focus on approximating P (x). Remember that DhP (x) = P (x+ h)− P (x)
is the additive derivative of P (x) in direction h. We have
Prh
(DhP (x) = r) = Prh
(P (x+ h)− P (x) = r) = µP (x)(r),
where h ∈ Fn is chosen uniformly at random. Let h = (h1, ..., hC) ∈ (Fn)C be chosen
102
uniformly at random, where C is a sufficiently large constant to be chosen later. Define
µ(x)obs(t)
def=
1
C
∑16i6C
1DhjP (x)=t,
and let
Ph(x)def= arg min
r∈F‖µr − µ
(x)obs‖.
Now choosing C > |F|5δ2σβ2
follows
Prh
(Ph(x) 6= P (x)) 6 Prh
(‖µ(x)
obs − µP (x)‖ >δ
|F|
)6 σβ2, (7.3)
where the first inequality follows from (7.1) and (7.2). Therefore
Prx,h
(Ph(x) 6= P (x)) = Ex
Eh1Ph(x)6=P (x)
6 σβ2,
and thus
Prh
[Prx
(Ph(x) 6= P (x)
)> σ
]6 β2.
Let Pi := DhiP , so that Pi is of degree 6 d and Ph is a function of P1, ..., PC . Now
setting β1 := β2|F|2 and β2 := β
2 finishes the proof.
Remark 7.2.2 (Derandomization). We note that Lemma 7.2.1 can be derandomized in
Poly(n) steps using known pseudorandom generators for low degree polynomials over finite
fields [21, 62, 80], as shown in [13], following a suggestion by Shachar Lovett. In particular,
Viola [80] showed that for any integer d > 1 and 0 < ε < 1, there is an explicit generator
g : Fs → Fn such that s = d log|F| n+O(d2d log(1/ε)) and for every polynomial P : Fn → F,
| EY←g(Us)
e(P (Y ))− EX←Un
e(P (X))| < ε
103
where Us is the uniform distribution on Fs and Un the uniform distribution on Fn. In order
to get a deterministic algorithm, we run over all possible seeds s to the generator g. As the
discussion in [13] shows, one can use this technique to derandomize Lemma 7.2.1 in Poly(n)
time. (The exponent on n becomes a large constant proportional to d and C in the proof
above.)
7.2.1 Unbiased Almost-Refinement
Using Lemma 7.2.1, we can deduce the following which can be thought of as an algorithmic
analogue of the regularity lemma, with the caveat that the refinement is approximate.
Lemma 7.2.3 (Unbiased almost Refinement). Let d > 1 be an integer. Suppose γ : N→ R+
is a decreasing function and σ, ρ ∈ (0, 1]. There is a randomized algorithm that given a factor
F of degree d, runs in time Oγ,ρ,σ,dim(F)(nd) and with probability 1−ρ returns a γ-unbiased
factor F ′ with dim(F ′) = Oγ,ρ,σ,dim(F)(1), such that F ′ is σ-close to being a refinement of
F .
Proof. The proof idea is similar to that of Lemma 2.2.27 in the sense that we do the same
type of induction. The difference is that at each step we will have to control the errors that
we introduce through the use of Lemma 7.2.1 and the probability of correctness. At all steps
in the proof without loss of generality, we will assume that the polynomials in the factor are
linearly independent, because otherwise we can always detect such a linear combination in
Oγ,ρ,σ,dim(F)(nd) time and remove a polynomial that can be written as a linear combination
of the rest of the polynomials in the factor.
The base case for d = 1 is simple, a linearly independent set of non-constant linear
polynomials is not biased at all, namely it is 0-unbiased. Assume that F is γ-biased, then
104
there exists a set of coefficients ci,j ∈ F16i6d,16j6Misuch that
bias(∑i,j
ci,jPi,j) > γ(dim(F)).
To detect this, we will use the following algorithm:
We will estimate bias of each of the |F|dim(F) linear combinations and check whether it
is greater than3γ(dim(F))
4 . To do so, for each linear combination∑i,j ci,jPi,j independently
select a set of vectors x1, ..., xC uniformly at random from Fn, and let bias(∑i,j ci,jPi,j)
def=∣∣∣ 1
C
∑`∈[C] eF(y`)
∣∣∣, with y` =∑i,j ci,j ·Pi,j(x`). Choosing C = Odim(F)
(1
γ(dim(F))2 log(1ρ))
,
we can distinguish bias > γ from bias 6 γ2 , with probability 1 − ρ′, where ρ′ := ρ
4|F|dim(F) .
Let∑i,j ci,jPi,j be such that the estimated bias was above
3γ(dim(F))4 and k be its degree.
We will stop if there is no such linear combination or if the factor is of degree 1. Since by a
union bound with probability at least 1− ρ4 , bias(
∑i,j ci,jPi,j) >
γ(dim(F))2 , by Lemma 7.2.1
we can find, with probability 1− ρ4 , a set of polynomials Q1, ..., Qr of degree k− 1 such that
•∑i,j ci,jPi,j is σ
2 -close to a function of Q1, ..., Qr,
• r 6 16|F|5γ(dim(F))2·σ·ρ .
We replace one polynomial of highest degree that appears in∑i,j ci,jPi,j with polynomials
Q1, ..., Qr.
We will prove by the induction that our algorithm satisfies the statement of the lemma.
For the base case, if F is of degree 1, our algorithm does not refine F by design. Notice that
since we have removed all linear dependencies, F is in fact 0-unbiased in this case.
Now given a factor F , if F is γ-biased, then with probability 1 − ρ′ our algorithm will
refine F . With probability 1− ρ4 the linear combination used for the refinement is
γ(dim(F))2 -
biased. Let F be the outcome of one step of our algorithm. With probability 1 − ρ4 , F is
σ2 -close to being a refinement of F . Using the induction hypothesis with parameters γ, σ2 ,
ρ4
we can find, with probability 1 − ρ4 , a γ-unbiased factor F ′ which is σ
2 -close to being a
105
refinement of F and therefore, with probability at least 1 − (ρ4 + ρ4 + ρ
4 + ρ′) > 1 − ρ, is
σ-close to being a refinement of F .
Remark 7.2.4. Note that Lemma 7.2.3 can be derandomized using PRGs for low-degree
polynomials. Remark 7.2.2 discusses derandomization of Lemma 7.2.1. We can similarly
also make the step for estimating the bias be deterministic, by using a different PRG for
fooling each linear combination of polynomials and then running over all seeds to these PRGs.
The resulting running time is Poly(n) with a large exponent.
Let P : Fn → F be a function, and let F be a polynomial factor such that P is measurable
in F . This means that there exists a function Γ : Fdim(F) → F such that P (x) = Γ(F(x)).
Although we know that such a Γ exists, but we do not have explicit description of Γ. It is a
simple but useful observation that we can provide query access to Γ in case F is unbiased.
It is worth recording this as the following lemma.
Lemma 7.2.5 (Query access). Suppose that β ∈ (0, 1] and let F = P1, . . . , Pm be a
γ-unbiased factor of degree d, where γ : Fn → F is a decreasing function which decreases
suitably fast. Suppose that we are given query access to a function f : Fn → F, where f is
measurable in F , namely there exists Γ : Fdim(F) → F such that f(x) = Γ(F(x)). There is
a randomized algorithm that, takes as input a value a = (a1, . . . , am) ∈ Fm, makes a single
query to f , runs in O(nd), and returns a value y ∈ F such that Γ(a) = y with probability
greater than 1− β.
Proof. Choose γ(x) := 12px . Applying Lemma 2.2.19 for the case of classical polynomials,
we get
|F−1(a)| > 1
‖F‖− 1
2‖F‖=
1
2‖F‖.
Let x1, ..., xK be chosen uniformly at random from Fn, where K > 2‖F‖ log 1β . Then with
probability at least 1 − β, there is i∗ ∈ [K] such that F(xi∗) = (a1, . . . , am). This means
that xi∗ ∈ F−1(a). Notice that we can find this i∗, since we have explicit description of
106
polynomials in F and we can evaluate them on each xi. Our algorithm will query f on input
xi∗ and return f(xi∗).
7.3 Algorithmic Uniform Refinement in high Characteristic
In this section we shall assume that the field F has large characteristic, namely |F| is greater
than the degree of the polynomials and the degree of the factors that we study. We will prove
the following lemma which states that we can efficiently refine a given factor to a uniform
factor.
Lemma 7.3.1 (Uniform Refinement). Suppose d < |F| is a positive integer and ρ ∈ (0, 1]
is a parameter. There is a randomized algorithm that, takes as input a factor F of degree
d over Fn, and a decreasing function γ : N → R+, runs in time Oρ,γ,dim(F)(nd), and with
probability 1 − ρ outputs a γ-uniform factor F , where F is a refinement of F , is of same
degree d, and |dim(F)| σ,γ,dim(F) 1.
Note 7.3.2. Notice that unlike Lemma 7.2.3, here F ′ is an exact refinement of F . This
will be achieved by the use of Proposition 7.3.4 which roughly states that approximation of a
polynomial by a uniform factor implies its exact computation.
The following proposition of Green and Tao states that if a polynomial is approximated
well enough by a regular factor, then it is computed by the factor.
Proposition 7.3.3 ([43]). Suppose that d > 1 is an integer. There exists a constant σd > 0
such that the following holds. Suppose that P : Fn → F is a polynomial of degree d and
that F is an F -regular factor of degree d− 1 for some increasing function F : N→ N which
increases suitably rapidly in terms of d. Suppose that P : Fn → F is an F-measurable
function and that Pr(P (x) 6= P (x)) 6 σd. Then P is itself F-measurable.
Our proof of Lemma 7.3.1 will require the following variant of Proposition 7.3.3 which
follows by Remark 2.2.17.
107
Proposition 7.3.4. Suppose that d > 1 is an integer. There exists σ7.3.4(d) > 0 such that
the following holds. Suppose that P : Fn → F is a polynomial of degree d and that F is
a γ7.3.4-uniform factor of degree d − 1 for some decreasing function γ7.3.4 which decreases
suitably rapidly in terms of d. Suppose that P : Fn → F is an F-measurable function and
that Pr(P (x) 6= P (x)) 6 σ7.3.4(d). Then P is itself F-measurable.
We will prove Lemma 7.3.1 by induction on the dimension vector of F . For the in-
duction step, we check whether there is a linear combination of polynomials in F that has
large Gowers norm. We will use this to replace a polynomial P from F with a set of lower
degree polynomials. To do this, we first approximate P with a few lower degree polynomi-
als Q1, ..., Qr, then use the induction hypothesis to refine Q1, ..., Qr to a uniform factor
Q1, ..., Qr′ and use Proposition 7.3.4 to conclude that P is measurable in Q1, ..., Qr′.
Proof of Lemma 7.3.1: Let F be a factor of degree d. Similar to the proof of Lemma 7.2.3
we will induct on dimension vector (M1, ...,Md). At all steps of the proof we may assume
that all polynomials in F are homogeneous, that is, all multinomials are of same degree. This
is because otherwise, we may replace each polynomial Pi,j with i homogeneous polynomials
that sum up to Pi,j . Moreover, as in the proof of Lemma 7.2.3, without loss of generality
we may assume that the polynomials in F are linearly independent, since otherwise we can
detect and remove linear dependencies in Odim(F)(nd) time.
Given the above assumptions, the base case for d = 1 is simple, a linearly independent
homogeneous factor is 0-uniform. Let d > 1 and assume that F is not γ-uniform, then there
exists a set of coefficients ci,j ∈ F16i6d,16j6Misuch that
‖eF(∑i,j
ci,j · Pi,j)‖Uk > γ(dim(F)),
where k = deg(∑i,j ci,j · Pi,j). We will use the following randomized algorithm to detect
such a linear combination:
108
We will estimate ‖eF(∑i,j ci,j ·Pi,j)‖2
k
Ukfor each of the |F|dim(F) linear combinations and
check whether it is greater than3γ(dim(F))2k
4 . To do this, as in the proof of Lemma 7.2.3, in-
dependently for each linear combination∑i,j ci,j ·Pi,j select C sets of vectors x(`), y
(`)1 , ..., y
(`)k
chosen uniformly at random from Fn for 1 6 ` 6 C, and compute
∣∣∣∣∣∣∑`∈[C]
eF
( ∑I⊆[k]
(−1)|I| ·∑i,j
ci,j · Pi,j(x(`) + y(`)I )
)∣∣∣∣∣∣Choosing C = Odim(F)
(1
γ(dim(F))2 log(1ρ))
, we can distinguish between Gowers norm
6 γ(F)
21/2kand Gowers norm > γ(F), with ρ′ = ρ
4|F|dim(F) probability of error.
Let Q :=∑i,j ci,j · Pi,j , with deg(Q) = k, the detected linear combination for which
the estimated ‖eF(Q)‖2kUk
is greater than3γ(dim(F))2k
4 . By a union bound on all the linear
combinations, with probability at least 1− ρ4 , ‖eF(Q)‖2k
Uk> γ(dim(F))2k
2 . Notice that
bias(DQ) = ‖e(Q)‖2k
Uk>γ (dim(F))
2,
where DQ : (Fn)k → F is defined as
DQ(y1, ..., yk)def= Dy1Dy2 · · ·DykQ(x).
Let σ and γ be as in Proposition 7.3.4. By Lemma 7.2.1 we can find, with probability 1− ρ4 ,
a set of polynomials Q1, ..., Qr of degree k − 1 such that
• DQ is σ-close to being measurable with respect to Q1, ..., Qr.
• r 6 8|F|5
γ(dim(F))2k+1σρ
.
By the induction hypothesis, with probability 1− ρ4 , we can refine Q1, ..., Qr to a new
factor Q1, ..., Qr′, which is γ-uniform. It follows from Proposition 7.3.4 that DQ is in fact
109
measurable in Q1, ..., Qr′. Since |F| > k we can write the following Taylor expansion
Q(x) =DQ(x, ..., x)
k!+R(x), (7.4)
where R(x) is a polynomial of degree 6 k− 1 which can be computed from Q. Thus we can
replace a maximum degree polynomial Pi,j , that appears in Q, with polynomials Q1, ..., Qr′ ,
and R(x). The proof follows by using the induction hypothesis for parameters ρ4 and γ.
Remark 7.3.5. The above proof can be derandomized in polynomial time. Lemma 7.2.1 can
be made deterministic as discussed after its proof. Here, the additional randomness is in
estimating the Gowers norms of each linear combination of polynomials. Again, using pseu-
dorandom generators for degree-d polynomials lets us make this estimation be deterministic.
For more details, see [13].
7.4 Algorithmic Regularity in Low Characteristic
As mentioned in previous sections, Gowers uniformity for polynomial factors fails to address
“bias implies low rank” in the case when the field F has small characteristic. Kaufman and
Lovett [54] introduce a stronger notion of regularity in order to handle the general case. To
define their notion of regularity, we first have to define the derivative space of a factor.
Definition 7.4.1 (Derivative Space). Let F = P1, ..., Pm be a polynomial factor over Fn.
Recall that
DhP (x) = P (x+ h)− P (x),
is the additive derivative of P in direction h, where h is a vector from Fn. We define
Der(F)def= (DhPi)(x) : i ∈ [m], h ∈ Fn,
as the derivative space of F .
110
Definition 7.4.2 (Strong regularity of polynomials [54]). Suppose that F : N → N is an
increasing function. Let F = P1, . . . , Pm be a polynomial factor of degree d and ∆ : F → N
assigns a natural number to each polynomial in F . We say that F is strongly F -regular with
respect to ∆ if
1. For every i ∈ [m], 1 6 ∆(Pi) 6 deg(Pi).
2. For any i ∈ [m] and r > ∆(Pi), there exist a function Γi,r such that
Pi(x+ y[r]) = Γi,r(Pj(x+ yJ ) : j ∈ [m], J ⊆ [r], |J | 6 ∆(Pj)
),
where x, y1, ..., yr are variables in Fn, and yJdef=∑k∈J yk for every J ⊆ [r],
3. For any r > 0, let C = ci,I ∈ Fi∈[m],I⊆[r],|I|6∆(Pi)be a collection of field elements.
Define
QC(x, y1, ..., yr)def=
∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,I · Pi(X + YI).
Let F ′ ⊆ F be the set of all Pi’s which appear in QC . There do not exist polynomials
P1, ..., P` ∈ Der(F ′), and I1, ..., I` ⊆ [r], with l 6 F (dim(F)), for which QC can be
expressed as
Γ(P1(x+ yI1), ..., P`(x+ yI`)
),
for some function Γ : F` → F.
The following lemma states that every polynomial factor can be refined to a strongly
regular factor.
Lemma 7.4.3 (Strong-Regularity Lemma, [54]). Let F : N→ N be an increasing function.
Let F = P1, ..., Pm be a polynomial factor of degree d. There exists a refinement F ′ =
P1, . . . , Pr of same degree d along with a function ∆ : F → N such that
• F ′ is strongly F -regular with respect to ∆,
111
• dim(F ′) = OF,dim(F),d(1).
Moreover they prove that if a polynomial is approximated by a strongly regular factor,
then it is in fact measurable with respect to the factor.
Lemma 7.4.4 ([54]). For every d there exists a constant εd = 2−Ω(d) such that the following
holds. Let P : Fn → F be a polynomial of degree d, P1, ..., Ps be a strongly regular collection
of polynomials of degree d− 1 with degree bound ∆ and Γ : Fs → F be a function such that
Γ(P1, ..., Ps) is εd-close to P . Then there exists Γ′ : Fs → F such that
P (x) = Γ′(P1(x), ..., Ps(x)).
Lemma 7.4.3 and Lemma 7.4.4 combined with Bogdanov-Viola Lemma immediately gives
the following theorem.
Theorem 7.4.5 (Bias implies low rank for general fields [54]). Let P : Fn → F be a
polynomial of degree d such that bias(P ) > δ > 0. Then rankd−1 6 c(d, δ,F).
In Section 7.4.1 we will give an algorithmic version of this theorem. To do this, as in
previous sections, we will have to use a notion of regularity which is well suited for algorithmic
applications. Uniform factors (Definition 2.2.16) fail to address fields of low characteristic,
for the reason that to refine a factor to a uniform factor we make use of division by d! which
is not possible in fields with |F| 6 d. Strong regularity of [54] suggests how to extend the
notion of unbiased factors to a stronger version in quite a straight forward fashion.
Definition 7.4.6 (Strongly γ-unbiased factors). Suppose that γ : N → R+ is a decreasing
function. Let F = P1, ..., Pm be a polynomial factor of degree d over Fn and let ∆ : F → N
assign a natural number to each polynomial in the factor. We say that F is strongly γ-
unbiased with degree bound ∆ if
1. For every i ∈ [m], 1 6 ∆(Pi) 6 deg(Pi).
112
2. For every i ∈ [m] and r > ∆(Pi), there exists a function Γi,r such that
Pi(x+ y[r]) = Γi,r(Pj(x+ yJ ) : j ∈ [m], J ⊆ [r], |J | 6 ∆(Pj)
).
3. For any r 6 0 and not all zero collection of field elements
C = ci,I ∈ Fi∈[m],I⊆[r],|I|6∆(Pi)we have
bias
∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,I · Pi(x+ yI)
6 γ(dim(F)),
for random variables x, y1, . . . , yr ∈ Fn.
We will say F is ∆-bounded if it only satisfies (1) and (2).
In Section 7.4.1 we will show how to refine a given factor to a strongly unbiased one.
This will allow us prove an algorithmic analogue of Lemma 7.4.4, which will be presented in
the next section.
7.4.1 Algorithmic approximate strongly unbiased refinement
In this section we will use strongly unbiased factors (Definition 7.4.6) to prove Proposi-
tion 7.3.3 and Proposition 7.3.4, respectively, without the assumption on |F| being larger
than d. The following claim states that although in part (3) there is no upper bound on r,
choosing a suitably faster decreasing function γ′ = γ2d , one can assume that r 6 d.
Claim 7.4.7. Let F = P1, . . . , Pm be a polynomial factor of degree d, and ∆ : F →
N be a function such that F is ∆-bounded. Suppose that γ : N → R+ is a decreasing
function such that for every r ∈ [d], x ∈ Fn and not all zero collection of coefficients
113
ci,Ii∈[m],I⊆[r],|I|6∆(Pi)
bias
∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,I · Pi(x+ yI)
6 γ(dim(F))2d , (7.5)
where y1, . . . , yr are variables from Fn. Then F is strongly γ-unbiased.
Proof. Let r > d be an integer, x ∈ Fn be fixed and ci,Ii∈[m],I⊆[r],|I|6∆(Pi)be a set of
coefficients, not all of which are zero. We will show that
bias
∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,I · Pi(x+ yI)
6 γ(dim(F)).
Define QC(y1, . . . , yr)def=∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,I · Pi(x + yI), and let I ⊆ [r] be maximal
such that ci,I is nonzero for some i ∈ [m]. Without loss of generality assume that I =
1, 2, . . . , |I|. We will derive QC in directions y1, . . . , y|I|, namely
DQC,y1,...,yr(h1, . . . , h|I|) := Dh1· · ·Dh|I|QC(y1, . . . , yr),
where hi is only supported on yi coordinates. Notice that, since I was maximal, DQC does
not depend on any yi with i ∈ [r]\I, namely, for every such J ⊆ [r] with J 6⊆ I, and every
i ∈ [m], Dh1· · ·Dh|I|Pi(x+ yJ ) ≡ 0. Moreover
Dh1· · ·Dh|I|QC(y1, . . . , yr) =
∑i∈[m],J⊆[r],|J |6∆(Pi)
ci,J ·Dh1· · ·Dh|I|Pi(x+ yJ )
=∑i∈[m]
ci,I ·Dh1· · ·Dh|I|Pi(x+ yI)
=∑
i∈[m],J⊆Ici,I · (−1)|J | · Pi((x+ yI) + hJ ).
Applying the claim below for the random variable x′ := x + yI and taking the bias over
114
random variables x′, h1, . . . , h|I| we have that
γ(dim(F))2d > bias(DQC,y1,...,yr(x
′, h1, . . . , h|I|))> bias (QC(y1, . . . , yr))
2|I| .
Now the claim follows by the fact that |I| 6 d.
Following is a simple repeated application of the Cauchy-Schwarz inequality.
Claim 7.4.8.
bias(PΛ)2r 6 bias(DPΛ).
Proof. It suffices to show that we have
∣∣∣∣ Ey1,...,yk,h∈Fn
eF(PΛ(y1 + h, y2, . . . , yk)− PΛ(y1, y2, . . . , yk)
)∣∣∣∣>
∣∣∣∣ Ey1,...,yk
eF(PΛ(y1, . . . , yk)
)∣∣∣∣2 .This is a simple application of Cauchy Schwarz inequality
∣∣∣∣ Ey1,...,yk,h
eF(PΛ(y1 + h, y2, . . . , yk)− PΛ(y1, . . . , yk)
)∣∣∣∣= Ey2,...,yk
∣∣∣EzeF(PΛ(z, y2, . . . , yk)
)∣∣∣2>
∣∣∣∣ Ez,y2,...,yk
eF(PΛ(z, y2, . . . , yk ∈ Fn))
∣∣∣∣2 .
Claim 7.4.7 suggests a way of refining a factor to a strongly uniform one: We will estimate
the bias of every possible linear combination of the polynomials over all points x+ yI where
I ⊆ [r] and r 6 d, and once we have detected a biased combination, refine using Lemma 7.2.1.
This is made possible by the fact that r and dim(F) are both bounded.
115
Lemma 7.4.9 (Algorithmic strongly γ-unbiased refinement). Suppose that σ, β ∈ (0, 1] are
parameters, and γ : N → R+ is a decreasing function. There is a randomized algorithm
that given a factor F = P1, ..., Pm of degree d, runs in Oγ(nd), with probability 1 − β,
returns a strongly γ-unbiased factor F ′ with degree bound ∆ such that F ′ is σ-close to being
a refinement of F .
Proof. We will design a refinement process which has to stop in a finite number of steps
which only depends on dim(F), σ, β, and γ. We will start with the initial factor being
F and we will set ∆(Pi) := deg(Pi). Notice that this automatically satisfies the first two
properties of Definition 7.4.6, namely ∆-boundedness. This is because for every r > deg(Pi)
we can use the derivative property
Pi(x+ y[r]) =∑I([r]
(−1)r−|I| · Pi(x+ yI).
to write Pi(x+ y[r]) as a function of Pi(x+ yJ )J⊆[r],|J |6d.
As explained in proof of Lemma 7.3.1, at each step, we may assume without loss of
generality that all polynomials in the factor are homogeneous. Moreover, we will assume
that for every r 6 d, the set of polynomials Pi(x + yI)i∈[m],I⊆[r],|I|6∆(Pi)is linearly
independent, where x, y1, . . . , yr ∈ Fn are variables. This is because at each step we have
access to explicit description of the polynomials in the factor, therefore we can compute
each of the |F|B(r) possible linear combinations, where B(r) =∑i∈[m]
∑j∈[∆(Pi)]
(rj
), and
check whether it is equal to zero or not. Suppose that for a set of coefficients C = ci,I ∈
Fi∈[m],I⊆[r],|I|6∆(Pi)we have QC
def=∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,IPi(x + yI) ≡ 0. Let Pi be
a polynomial that appears in QC and let I be maximal such that ci,I 6= 0. We can let
∆(Pi) := |I| − 1, and remove Pi from the factor if ∆(Pi) becomes zero.
We will stop refining if F is of degree 1. Notice that a linearly independent factor
of degree 1 is not biased at all, and therefore strongly 0-unbiased. Assume that F is not
116
strongly γ-unbiased with respect to the current ∆, then there exists r and a set of coefficients
C = ci,I ∈ Fi∈[m],I⊆[r],|I|6∆(Pi)for which
bias(QC) = bias
∑i∈[m],I⊆[r],|I|6∆(Pi)
ci,IPi(x+ yI)
> γ(dim(F)).
To detect this, we will use the following algorithm: Set K := |F|B(r). We can estimate
the bias of each of the K possible linear combinations QC and check whether our estimate
is greater than3γ(dim(F))
4 . Letting β′ := β4K , we can distinguish bias > γ(dim(F)) from
bias 6 γ(dim(F))2 correctly with probability 1 − β′ by computing the average of eF (QC) on
Odim(F))(1
γ(dim(F))2β) random sets of vectors x, y1, . . . , yr. Let C be such that the estimated
bias for QC was greater than3γ(dim(F))
4 , and let k = deg(QC). By a union bound with
probability 1− β4 , bias(QC) >
γ(dim(F))2 , and by Lemma 7.2.1 we can find, with probability
1− β4 , a set of polynomials Q1, . . . , Qs : (Fn)r+1 → F of degree k − 1 such that
• QC is σ2 -close to a function of Q1, . . . , Qs,
• s 6 16|F|5γ(dim(F))·σ·β .
Moreover we know from the proof of Lemma 7.2.1 that for each j ∈ [s] we have
Qj(x, y1, . . . , yr) = QC(x+ hx, y1 + h1, . . . , yr + hr)−QC(x, y1, . . . , yr)
=∑
i∈[m],I⊆[r],|I|6∆(Pi)
ci,I ·Dhx+hIPi(x+ yI).
for some fixed vectors hx, h1, . . . , hr ∈ Fn. Let R(j)i,I := Dhx+hIPi so that deg(R
(j)i,I ) <
deg(Pi) and Qj is measurable in R(j)i,I (x+ yI)i∈[m],I⊆[r],|I|6∆(Pi)
.
Let Pi be a polynomial of maximum degree that appears in QC , and let I be maximal such
that ci,I 6= 0. We will add each R(j)i,I to F with ∆(R
(j)i,I ) = deg(R
(j)i,I ) and let ∆(Pi) := |I|−1
and discard Pi if |I| becomes zero. Notice that the new factor is ∆-bounded because Pi(x+
117
yI) can be written as a function of Qj(x, y1, . . . , yr)j∈[s] and Pj(x + yJ )J⊆I,|J |6∆(Pj).
Moreover each Qj is measurable in R(j)i,I (x+ yI)i∈[m],I⊆[r],|I|6∆(Pi)
.
We can prove that the process stops after a constant number of steps by a strong induction
on the dimension vector (M1, . . . ,Md) of the factor F , whereMi is the number of polynomials
of degree i in the factor and d is the degree of the factor. The induction step follows from the
fact that at every step we discard Pi or decrease the ∆(Pi) for some i and add polynomials
of lower degree. Let F ′ and ∆ : F ′ → N be the output of the algorithm. The claim follows
by our choices for error distances and the probabilities.
Remark 7.4.10. The randomness involved in estimating the bias can be removed by using
PRGs for polynomials over Fn.
7.4.2 Algorithmic exact strongly unbiased refinement
It is not difficult to check that Kaufman and Lovett’s proof of Lemma 7.4.4, can be adapted
to a proof of the following analogue for our notion of strongly unbiased factors.
Lemma 7.4.11 (Approximation by Strongly Unbiased Factor implies Computation). For
every d > 0 there exists a constant σd and a decreasing function γ : N → R+ such that the
following holds. Let P : Fn → F be a polynomial of degree d, F = P1, . . . , Pm be a strongly
γ-unbiased polynomial factor of degree d− 1 with degree bound ∆, and let Γ : Fm → F be a
function such that P is σd-far from Γ(F). Then P is in fact measurable in F .
Remark 7.4.12. (Exact refinement) One can use the above lemma to modify the refine-
ment process of Lemma 7.4.9 to achieve exact refinement instead of approximate refinement.
This can be done by using induction on the degree of the factor and at every round refining
the new polynomials that are to be added to the factor to a strongly unbiased set of polynomi-
als. Then Lemma 7.4.11 ensures that the removed polynomial is measurable in the new set
of polynomials and therefore at every step we have exact refinement of the original factor.
118
7.5 Applications in High Characteristic
In this section we show some applications of the uniformity notion, and our algorithm for
uniform refinement of a factor. Throughout the section we shall assume that the field F has
large characteristic, namely |F| is greater than the degree of the polynomials and the degree
of the factors that we study.
7.5.1 Computing Polynomials with Large Gowers Norm
Following is an immediate corollary of Lemma 7.2.1, Lemma 7.2.5, and Proposition 7.3.4.
Lemma 7.5.1. Suppose that an integer d satisfies 0 6 d < |F|. Let δ, σ, β ∈ (0, 1]. There is
a randomized algorithm that given a polynomial P : Fn → F of degree d such that bias(P ) >
δ > 0, runs in Oδ,β,σ(nd) and with probability 1 − β, returns a polynomial factor F =
Pi,j16i6d−1,16j6Miof degree d− 1 and dim(F) = Oδ,β,σ(1) such that P is measurable in
F , namely there is a function Γ such that
P (x) = Γ((Pi,j(x)
)16i6d−1,16j6Mi
).
Moreover for every query to the value of Γ(·) for a vector from Fdim(F) the algorithm
returns the correct answer with probability 1− β.
Proof. Let σd and γ1 be as in Proposition 7.3.4, and let γ2 be as in Lemma 7.2.5. Set
γd := minγ1, γ2. By Lemma 7.2.1 since bias(P ) > δ, in time Oσd,β(nd), with probability
1− β2 we can find a polynomial factor F of degree d− 1 such that
• P is σd2 -close to a function of F
• dim(F) 6 4|F|5δ2σdβ
By Lemma 7.3.1, with probability 1− β2 , we can find a γd-uniform factor F ′ such that F ′ is
σd2 -close to being a refinement of F .
119
Thus with probability greater than 1 − β, P is σd-close to a function of the strongly
γd-unbiased factor F ′, and by Lemma 7.4.11 P is measurable in F ′. The query access to Γ
can be granted by Lemma 7.2.5.
The following is an algorithmic version of Proposition 6.1 of [43], which states that given
a polynomial P such that eF(P ) has large Gowers norm, then we can compute P by a few
lower degree polynomials.
Proposition 7.5.2 (Computing Polynomials with High Gowers Norm). Suppose that |F| >
d > 2 and that δ, β ∈ (0, 1]. There is a randomized algorithm that given a polynomial
P : Fn → F of degree d with ‖eF(P (x))‖Ud > δ, runs in Oδ,β(nd) and with probability 1−β,
returns a polynomial factor F of degree d− 1 such that
• There is a function Γ : Fdim(F) → F such that P = Γ(F).
• dim(F) = Od,δ,β(1).
Moreover, for every query to the value of Γ(·) for a vector from Fdim(F), the algorithm
returns the correct answer with probability 1− β.
Proof. Write ∂dP (h1, ..., hd) := Dh1· · ·DhdP (x). Since P has degree d, ∂dP does not
depend on x. From the definition of the Ud norm, we have
bias(∂dP ) = ‖e(P )‖2d
Ud> δ2d .
Applying Lemma 7.5.1 to ∂dP , with probability 1− β2 , we can find a factor F of degree
d− 1, such that dim(F) = Oδ,β,d(1) and ∂dP is measurable in F .
It is easy to check that since |F| > d, we have the following Taylor expansion
P (x) =1
d!∂dP (x, ..., x) +Q(x),
120
where Q is a polynomial of degree6 d−1. We can find explicit description of P , and therefore
Q in O(nd). Let F ′ := F ∪ Q, and let γ be as in Lemma 7.2.5 for error parameter β. By
Lemma 7.3.1, with probability 1− β2 , we can refine F ′ to a γ-uniform factor F of same degree
d−1 with dim(F) = Odim(F ′),β,γ(1). Now query access can be granted by Lemma 7.2.5.
7.5.2 Algorithmic Inverse Theorem in High Characteristic
Proposition 7.5.2 allows us to prove the following algorithmic version of Theorem 2.1.10.
Theorem 7.5.3. Suppose that |F| > d > 2 and that ε, β ∈ (0, 1]. There is an ηε,β,d ∈
(0, 1], and a randomized algorithm that given a polynomial P : Fn → F of degree d with
‖eF(P (x))‖Uk+1 > ε, runs in Oδ,β(nd) and with probability 1 − β, returns a polynomial Q
of degree 6 k such that
|〈eF(P ), eF(Q)〉| > η.
Proof. By Proposition 7.5.2, with probability 1 − β4 we can find a polynomial factor F of
degree d−1 such that P is measurable in F . Let γ : N→ R+ be a decreasing function to be
specified later. By Lemma 7.3.1, with probability 1− β4 , we can refine F to a γ-uniform factor
F = P1, ..., Pm of same degree d− 1, with dim(F) = Oγ,β,ε(1). Since P is measurable in
F , there exists Γ : Fdim(F) → F such that P = Γ(F). Letting f := eF(P ), and using the
Fourier decomposition of eF(Γ) we can write
eF(P (x)
)=
L∑i=1
ci eF(〈α(i),F〉(x)
), (7.6)
where L = |F|dim(F) = Oγ,β(1), α(i) ∈ Fdim(F), and
〈α(i),F〉(x)def=
m∑j=1
α(i)j · Pj(x).
Notice that the terms in (7.6), unlike Fourier characters, are not orthogonal. But since the
121
factor F is γ-uniform, Lemma 2.2.22 ensures approximate orthogonality. Let Qi := 〈α(i),F〉.
Choose γ(u) 6 σ|F|2u , so that γ(dim(F)) 6 σ
L2 , where σ := ε2k+1
4 . It follows from the near
orthogonality of the terms in (7.6) by Lemma 2.2.22 that
|ci − 〈f, eF(Qi)〉| 6σ
L, (7.7)
and ∣∣∣∣‖f‖22 − L∑i=1
c2i
∣∣∣∣ 6 σ. (7.8)
Claim 7.5.4. There exists δ′(ε, |F|) ∈ (0, 1] such that the following holds. Assume that f
and F are as above. Then there is i ∈ [L], for which deg(Qi) 6 k and∣∣〈f, eF(Qi)〉
∣∣ > δ′.
Proof. We will induct on the degree of F . Assume for the base case that F is of degree k,
i.e. d = k + 1, thus applying the following Cauchy-Schwarz inequality
ε2k+16 ‖f‖2
k+1
Uk+1 6 ‖f‖22‖f‖2k+1−2∞ , (7.9)
and (7.8) imply that there exists i ∈ [L] such that c2i >ε2k+1−σ
L = 3ε2k+1
4L , which combined
with (7.7) implies that |〈f, eF(Qi)〉| > ε2k+1
2L .
Now for the induction step, assume that d > k + 1. We will decompose (7.6) into two
parts, first part consisting of the terms of degree 6 k and the second part consisting of the
terms of degree strictly higher than k. Namely, letting S := i ∈ [L] : deg(Qi) 6 k we
write f = g + h where g :=∑i∈S cieF(Qi) and h :=
∑i∈[L]\S cieF(Qi). Notice that by the
triangle inequality of Gowers norm, our choice of γ, and the fact that F is γ-uniform
‖h‖Uk+1 6∑
i∈[L]\S|ci| · ‖eF(Qi)‖Uk+1 6 L · ε
2k+1
4L2=ε2k+1
4L,
122
and thus
‖g‖Uk+1 >ε
2.
Now the claim follows by the base case.
Let δ′(ε, |F|) be as in the above claim, and choose η := δ′4 . We will use the following
theorem of Goldreich and Levin [31] which gives an algorithm to find all the large Fourier
coefficients of eF(Γ).
Theorem 7.5.5 (Goldreich-Levin theorem [31]). Let ζ, ρ ∈ (0, 1]. There is a randomized
algorithm, which given oracle access to a function Γ : Fm → F, runs in time O(m2 logm ·
poly(1ζ , log(1
ρ)))
and outputs a decomposition
Γ =∑i=1
bi · eF(〈ηi, x〉) + Γ′,
with the following guarantee:
• ` = O( 1ζ2 ).
• Pr[∃i : |bi − Γ(ηi)| > ζ/2
]6 ρ.
• Pr
[∀α such that |f(α)| > ζ, ∃i ηi = α
]> 1− ρ.
We will use the above theorem with parameters ζ := δ′2 and ρ := β
4 , and use the random-
ized algorithm in Lemma 7.2.5 to provide answer to all its queries to Γ. Choose γ suitably
small, so that with probability at least 1− β4 our answer to all queries to Γ are correct. By
Claim 7.5.4 there is i ∈ [L] such that Γ(α) = ci >3δ′4 . With probability 1− β
4 there is j such
that ηj = αi and with probability at least 1− β4 , |bj − ci| 6 ζ
2 6δ′4 , and therefore ci >
δ′2 .
By a union bound, adding up the probabilities of the errors, with probability at least
1− β, we find Qi such that∣∣〈f, eF(Qi)〉
∣∣ > δ′4 = η.
123
7.5.3 Decoding Reed-Muller Codes with Structured Noise
In this section we discuss the task of decoding Reed-Muller codes when the noise is “struc-
tured”. Recall that the codewords in a Reed-Muller code of order k, correspond to evalua-
tions of all degree k polynomials in n variables over F. We will consider the task of decoding
codewords when the noise is structured in the sense that the applied noise to the codeword
itself is a polynomial of higher degree d. In other words, we are given a polynomial P of
degree d > k with the promise that it is “close” to an unknown degree k polynomial, and
the task is to find a degree k polynomial Q that is somewhat close to P . The assumption of
d > k is made, because the case d 6 k is trivial.
Theorem 7.5.6 (Reed-Muller Decoding). Suppose that d > k > 0 are integers, and let
ε ∈ (0, 1] and β ∈ (0, 1] be parameters. There is δ(ε, d, k) > 0 and a randomized algorithm
that, given a polynomial P : Fn → F of degree d, with the promise that there exists a
polynomial Q of degree k such that Pr(P (x) = Q(x)) > 1|F| + ε, runs in O(nd), and with
probability 1− β returns a polynomial Q of degree k such that
Pr(P (x) = Q(x)) >1
|F|+ δ.
Before presenting the proof of Theorem 7.5.6 we will look at the relations between Ham-
ming distance of P and Q, correlation between eF(P ) and eF(Q), and Gowers norm of
eF(P ). The following claim relates the Hamming distance of two polynomials P and Q to
the correlation between eF(P ) and eF(Q).
Claim 7.5.7. Suppose that ε ∈ (0, 1], and let P,Q : Fn → F be two functions such that
Prx(P (x) = Q(x)) > 1/|F|+ ε. Then there exists a nonzero t ∈ F for which
∣∣〈eF(t · P ), eF(t ·Q)〉∣∣ =
∣∣∣ExeF(t · (P (x)−Q(x))
)∣∣∣ > ε.
124
Proof. We have
1
|F|+ ε 6 Pr
x∈Fn(P (x) = Q(x)
)= Ex∈Fn
[1P (x)=Q(x)
]= Ex∈Fn
1
|F|∑t∈F
eF(t · (P (x)−Q(x))
)=
1
|F|∑c∈F〈eF(t · P ), eF(t ·Q)〉
=1
|F|
1 +∑
t∈F\0〈eF(t · P ), eF(t ·Q)〉
,
and thus there is t ∈ F\0 for which∣∣ 〈eF(t · P ), eF(t ·Q)〉
∣∣ > ε.
Assume that P : Fn → F is a polynomial of degree d and Q : Fn → F is a polynomial of
degree k < d. Using Lemma 2.1.9 with f∅ := eF(P −Q) and fS := 1 for S 6= ∅ implies that
| 〈eF(P ), eF(Q)〉 | = | Ex∈Fn
eF(P (x)−Q(x))| 6 ‖eF(P (x)−Q(x))‖Ud = ‖eF(P (x))‖Ud ,
(7.10)
where the last equality follows from the fact that deg(Q(x)) = k < d.
Proof of Theorem 7.5.6: Since there exists a polynomial Q of degree k with Pr(P (x) 6=
Q(x)) 6 ε, by Claim 7.5.7 there exists a nonzero t ∈ F such that
∣∣〈eF(t · P ), eF(t ·Q)〉∣∣ > ε.
Notice that
‖eF(t · P )‖Ud > ‖eF(t · P )‖Uk+1 > 〈eF(t · P ), eF(t ·Q)〉 > ε,
where the second inequality follows from (7.10) since deg(t · Q) 6 k. Notice that we may
find such a constant t ∈ F\0 with high probability by estimating the Uk+1 norm of tP for
125
all |F| − 1 possible choices of t. Set P := t · P .
Now by Theorem 7.5.3 there exists ηd,β,ε ∈ (0, 1] such that we can find in time O(nd),
with probability 1− β2 , a polynomial Q such that
∣∣∣〈eF(P ), eF(Q)〉∣∣∣ > η.
The following claim shows that, there is a constant shift of eF(Q) that approximates
eF(P ).
Claim 7.5.8. Let P and Q be as above. There is a randomized algorithm that with probability
1− β2 returns an h ∈ F for which
Pr(P (x) = Q(x) + h) >1
|F|+
η
2|F|2
Proof. Since P (x) − Q(x) takes values in F, there must be a choice of r ∈ F such that
Pr(P (x) − Q(x) = r) > 1|F| + η
|F|2 . Similar to the proof of Lemma 7.2.1 defining µ0(r) =
Pr(P (x) − Q(x) = r), we can find the estimate measure µobs by K = O|F|,δ′,β(1) random
queries to P (x)− Q(x) such that
Pr
(∃r ∈ F :
∣∣µ0(r)− µobs(r)∣∣ > η
2|F|2
)6β
2.
Choosing h ∈ F such that µobs(h) is maximized, we have
• Pr(P (x) = Q(x) + h
)> 1|F| + η
2|F|2
• deg(Q(x) + h) = k.
Since P = t · P , the same also holds between P and t−1(Q + h). The probability of
correctness is now ensured by a union bound.
126
Remark 7.5.9. The randomness involved in using the Goldreich-Levin algorithm can be
removed. Note that at the expense of a greater running time, we can simply look at all the
Fourier coefficients of Γ since its domain is of constant size. Finally, Claim 7.5.7 can be
derandomized in the same way as Lemma 7.2.1, by using PRGs with seed length O(log n) for
fooling polynomials over Fn.
7.6 Applications n Low Characterstc
7.6.1 Computing a biased Polynomial
Having Lemma 7.4.9 and Lemma 7.4.11 in hand we immediately have the following analogue
of Theorem 7.4.5, which states that if a polynomial is biased then we can find a factor that
computes it.
Theorem 7.6.1 (Computing a biased polynomial). Let β ∈ (0, 1] be an error parameter.
There is a randomized algorithm that given a polynomial P : Fn → F of degree d such that
bias(P ) > δ > 0, runs in Oδ,β(nd) and with probability 1 − β, returns a polynomial factor
F = P1, ..., Pc(d,δ) of degree d− 1 such that P is measurable in F .
Proof. Let σd and γd be as in Lemma 7.4.11. By Lemma 7.2.1 since bias(P ) > δ, in time
Oσd,β(nd), with probability 1 − β2 we can find a polynomial factor F of degree d − 1 such
that
• P is σd2 -close to a function of F
• dim(F) 6 4|F|5δ2σdβ
By Lemma 7.4.9, with probability 1− β2 , we can find a strongly γd-unbiased factor F ′ with
degree bound ∆ such that F ′ is σd2 -close to being a refinement of F .
Thus with probability greater than 1 − β, P is σd-close to a function of the strongly
γd-unbiased factor F ′, and it follows from Lemma 7.4.11 that P is measurable in F ′.
127
The algorithm can be made deterministic in time Poly(n) (with a large exponent), as
indicated in the previous sections.
7.6.2 Worst Case to Average Case Reduction
Here we will show how Theorem 7.6.1 implies an algorithmic version of worst case to average
case reduction from [54]. To present the result, we first have to define what it means for a
factor to approximate a polynomial.
Definition 7.6.2 (δ-approximation). We say that a function f : Fn → F δ-approximates a
polynomial P : Fn → F if ∣∣∣∣ Ex∈Fn
[eF(P (x)− f(x)
)]∣∣∣∣ > δ.
Kaufman and Lovett use Lemma 7.4.4 to show the following reduction.
Theorem 7.6.3 (Theorem 3 of [54]). Let P (x) be a polynomial of degree k, g1, ..., gc poly-
nomials of degree d, and Λ : Fc → F a function such that composition Λ(g1(x), . . . , gc(x))
δ-approximates P . Then there exist c′ polynomials h1, . . . , hc′ and a function Γ : Fc′ → F
such that
P (x) = Γ(h1(x), . . . , hc′(x)).
Moreover, c′ = c′(d, c, k) and each hi is of the form p(x + a) − p(x) or gj(x + a), where
a ∈ Fn. In particular, if k 6 k − 1 then deg(hi) 6 k − 1 also.
Here we will design a randomized algorithm that given g1, . . . , gk can compute a set of
h1, . . . , hc′ efficiently.
Theorem 7.6.4 (Worst-case to average case reduction). Let δ, β ∈ (0, 1] be parameters.
There is a randomized algorithm that takes as input
• A polynomial P : Fn → F of degree d
128
• A polynomial factor F = P1, . . . , Pm of degree d− 1
• A function Λ such that Λ(F) δ-approximates P
and with probability at least 1 − β, returns a polynomial factor F ′ = R1, . . . , Rm′ and a
function Γ : Fdim(F ′) → F such that
P (x) = Γ(R1(x), . . . , Rm′(x)),
moreover c′ = Odim(F),β,δ,d(1).
Proof. Looking at the Fourier decomposition of eF(Λ)(y1, . . . , ym), since Λ(P1, . . . , Pm) δ-
approximates P , there must exist α = (α1, . . . , αm) ∈ Fc such thatQα(x) =∑i∈[m] αi·Pi(x),
δ′-approximates P , where δ′ > δ|F|m . We will estimate |bias(P − Qα))| for every α ∈ Fm.
For each α, we can distinguish bias(P −Qα)) 6 δ′2 from bias(P −Qα) > δ′, with probability
1− β3|F|m , by evaluating eF(P (x)−Qα(x)) on C = Odim(F)
(1δ′2
log( 1β ))
random inputs. Let
α∗ ∈ Fm be such that our estimate for bias(P −Qα) is greater than 3δ′4 .
With probability at least 1 − β3|F|m we will find such α∗, and by a union bound with
probability at least 1 − β3 , bias(P − Qα∗) > δ′
2 . Now applying Theorem 7.6.1 to P − Qα∗
with parameters β3 and δ′
2 , we find a polynomial factor F ′ = R1, . . . , Rm of degree d− 1,
such that with probability 1− β3 , P −Qα∗ is measurable in F ′. Namely there exists Γ′ such
that P −Qα∗(x) = Γ′(F ′(x)) and therefore
P (x) = Γ′(R1(x), . . . , Rm(x)) +Qa∗(x).
This algorithm can be derandomized in time Poly(n), using the remarks made in the
previous sections.
129
7.7 Algorithmic Inverse Theorem for Polynomials
The proof of the inverse theorem for Gowers norms (Theorem 2.1.10 [78]) is based on ergodic
theory and a pure combinatorial proof of the inverse theorem for general functions is not
yet known, but [78] give an analytical proof for the case of bounded degree polynomials. In
this section we will show how to conclude an algorithmic inverse theorem for polynomials
of bounded degree from the proof presented in [78]. Observing that one can generalize one
of the steps in the proof by [78] we manage to somewhat simplify the proof of the inverse
theorem for bounded degree polynomials by skipping a few of the steps.
Theorem 7.7.1 (Algorithmic Inverse for Polynomials). Let s > 0 be an integer, and let
ε, µ ∈ (0, 1] be parameters. There is a randomized algorithm that given a (non-classical)
degree-s polynomial P : Fn → T with ‖P‖Us > ε, runs in Oε,s,β(nd) and with probability
1− µ returns a polynomial Q with deg(Q) 6 s− 1 such that
∣∣〈e(P ), e(Q)〉∣∣ > δ(ε, s) > 0.
To prove this we first show how to compute such a polynomial by a lower degree poly-
nomial factor.
Theorem 7.7.2 (Computing polynomials with large Gowers norm). Let s > 0 be an integer
and let µ ∈ (0, 1] be a parameter. There is a randomized algorithm that given a (non-
classical) polynomial P : Fn → T with ‖P‖2sUs > ε, runs in Os,β(ns) and with probability
1− µ outputs the following
1. A polynomial Q : Fn → T with
(a) deg(Q) = s.
(b) P and Q have the same s-th derivative. In particular deg(P −Q) 6 s− 1.
130
2. Polynomials (possibly non-classical) Q1, . . . , QC : Fn → T for some C = Os,β,ε(1)
such that Q is measurable in Q1, . . . , QC.
We will prove this theorem in Section 7.7.2. Being able to algorithmically compute
functions of large Gowers norm allows us to algorithmically refine any polynomial factor to
a uniform factor, in the sense of Definition 2.2.16 in low characteristic. Notice that the two
obstacles in the proof of Lemma 7.3.1 were that we were not able to compute polynomials of
high Gowers norm in the low characteristic case and the reason was that we used (7.4) which
is not possible when |F| 6 d. Theorem 7.7.2 allows us to compute each linear combination of
polynomials in the factor that has large Gowers norm, and thus the following is a corollary
to Theorem 7.7.2.
Corollary 7.7.3 (Uniform Refinement in Low Characteristic). Suppose d is a positive integer
and ρ ∈ (0, 1] is a parameter. There is a randomized algorithm that, takes as input a factor
F of degree d over Fn, and a decreasing function γ : N→ R+, runs in time Oρ,γ,dim(F)(nd),
and with probability 1 − ρ outputs a γ-uniform factor F , where F is a refinement of F , is
of same degree d, and |dim(F)| σ,γ,dim(F) 1.
Notice that both F and F ′ are now (non-classical) polynomial factors. Theorem 7.7.1
follows as a corollary by a proof identical to that of Theorem 7.5.6. Theorem 7.7.2 and
Corollary 7.7.3 can be derandomized just as it was possible in the high characteristic case.
7.7.1 Derivative Polynomials as Symmetric Multilinear Polynomials
Let P : Fn → T be a degree-d polynomial, possibly non-classical. Define the derivative
polynomial DP : (Fn)d → T by the following formula
DP (h1, . . . , hd)def= Dh1
· · ·DhdP (x),
131
where h1, . . . , hd ∈ Fn. Lemma 2.1.6 shows some useful properties of the derivative polyno-
mial. Motivated by the properties of the derivative function, we define a class of polynomials.
Definition 7.7.4 (Classical Symmetric Multilinear Polynomials). For every d, denote by
HCSMd(Fn) the set of all polynomials Q : (Fn)s → F, s 6 d, which satisfy properties
(i), (ii), (iii) and (iv) from Lemma 2.1.6. In particular for every degree-d polynomial P ,
DP ∈ HCSMd(Fn). The reason we use the name “HCSM” is that CSM which was defined
in [78] is reserved for the subclass of classical symmetric multilinear polynomials which are
the derivative polynomials of classical polynomials.
Remark 7.7.5. It is observed in [78] that if P : Fn → T is a classical degree-d polynomial
then DP (h1, . . . , hd) vanishes whenever at least p of the h1, . . . , hd are equal. Let CSMd(Fn)
denote the set of polynomials P ∈ HCSMd(Fn) which satisfy this condition.
In what follows we shall study some explicit structural properties of CSMd, HCSMd, and
derivative polynomials.
Definition 7.7.6 (Concatenation). Let P ∈ HCSMk(Fn) and Q ∈ HCSMl(Fn), for integers
l, k > 1. Define the concatenation P ∗Q ∈ HCSMk+l(Fn) as
(P ∗Q)(y1, . . . , yk+l)def=
∑A⊆[k+l],|A|=k
P((yi)i∈A
)·Q((hj)j∈[k+l]\A
).
The following lemma shows how the derivative polynomial acts on product of two poly-
nomials.
Lemma 7.7.7 ([78]). Let P,Q : Fn → T be degree-k and degree-l polynomials respectively.
Then we have the following identity
D(PQ) = DP ∗DQ.
132
The following lemma by Tao and Ziegler shows that every polynomial in CSMd(Fn) has
a d-th root which is a classical polynomial.
Lemma 7.7.8 (Part of Lemma 4.5 in [78]). Let d > 1 be an integer, then for every Q ∈
CSMd(Fn) there is a degree-d classical polynomial P : Fn → T for which DP = Q.
It is not difficult to extend this to the more general case of HCSMd(Fn) and non-classical
polynomials.
Lemma 7.7.9. Let d > 1 be an integer, then for every Q ∈ HCSMd(Fn) there is a degree-d
(non-classical) polynomial P : Fn → T for which DP = Q.
Proof. We know that Q is a classical polynomial so it there is a unique representation of Q
as following
Q(h1, . . . , hd) = ι
∑i1,...,id∈[n]
ci1,...,idh1,i1 · · ·hd,id
,
where hk = (hk,1, . . . , hk,n), and ci1,...,id ∈ F are coefficients. As a consequence of symmetry
of Q, the coefficients ci1,...,id are symmetric with respect to permutations of (i1, . . . , id) and
thus Q must be an integer linear combination of expressions of the form
ι
∑(i1,...,id):i1,...,id=A
h1,i1 · · ·hd,id
(7.11)
where A is a multiset of cardinality d with elements from [n]. For j ∈ [n], let aj denote the
multiplicity of j in the set A. Using this we can rewrite (7.11) as
Q1 ∗Q2 ∗ · · · ∗Qn,
where Qj(y1, . . . , yaj ) = y1,jy2,j · · · yaj ,j . Assume that aj = rj + (p − 1)tj where 1 6 rj 6
133
p− 1. Define
Pj(x) =|xj |rj
ptj+1mod 1.
Notice that deg(Pj) = rj + (p − 1)tj = aj and thus by Lemma 2.1.6 DPj ∈ HCSMaj (Fn).
It is easy to see that DPj(y1, . . . , yaj ) only depends on variables y1,j , . . . , yaj ,j and finaly it
follows by multilinearity of DPj that there exists c ∈ F\0 such that
DPj(y1, . . . , yaj ) = c y1,j · · · yaj ,j .
Now letting P ′j := c−1Pj , it follows by Lemma 7.7.7 that
D(∏j∈[n]
P ′j) = DP ′1 ∗ · · · ∗DP′n = Q1 ∗Q2 ∗ · · · ∗Qn.
We shall define the symmetric power of a polynomial.
Definition 7.7.10 (Symmetric Power). Let Q be a polynomial in HCSMk(Fn) for k > 1.
Define the symmetric power Symm(Q) ∈ HCSMmk(Fn) for m > 1 by
Symm(Q)(y1, . . . , ymk)def=∑A
∏A∈A
Q((yi)i∈A
),
where the sum ranges over all partitions A of 1, . . . ,mk into m subsets of size k.
Lemma 6.2 in [78] shows that for every R ∈ CSMd(Fn) there is a classical degree md
polynomial for which DQ = SymmR. By slight modification of their proof we extend this
to HCSMd(Fn).
Lemma 7.7.11 (Symmetric Power Rule). Let d,m be integers and let R ∈ HCSMd(Fn)
be a classical symmetric multilinear polynomial. Then there exists a degree-md polynomial
134
Q : Fn → T such that
DQ = Symm(R).
Moreover if m > 1 then Q can be written as a function of a polynomial of degree 6 md− 1.
Proof. By Lemma 7.7.9 there is a (non-classical) degree-d polynomial P for which DP = R.
Assume that P has depth k > 0.
Let M > 0 be such that pM 6 m < pM+1. There exists a degree d+M(p−1) polynomial
PM : Fn → T for which pM · PM ≡ P mod 1. Notice that PM is not unique, as pMT ≡ 0
mod 1 for any polynomial T of depth at most M . We can pull PM to Z/pk+M+1Z and obtain
a degree-(d+M(p− 1)) polynomial PM : Fn → Z/pk+M+1Z for which P = PMpk+1 mod 1.
Let Q :=(PMm )pk+1 mod 1. We claim that
(i) deg(Q) 6 md.
(ii) DQ = Symm(R).
Parts (i) and (ii) together imply that deg(Q) = md. The last part of the theorem follows
from the fact that d+M(p−1) < md when m > 1 and that Q can be expressed as a function
of PM which is of degree d+M(p− 1).
Part (i) will be a special case of the following claim.
Claim 7.7.12. Let j > 0 and m′ 6 m be parameters. Then
deg
(Dh1···DhjPMm′
)pk+1
mod 1
6 d− j + (m′ − 1) ·max(d− j, 1).
Proof. We break the claim to two cases.
Case 1, (d−j+(m′−1)·max(d−j, 1) < 0): This in particular implies that m′ < j−d. We
need to show that(Dh1
···DhjPMm′
)
pk+1 mod 1 ≡ 0 or equivalently that(Dh1
···DhjPMm′
)is divisible
by pk+1. Notice that deg(PM ) = d+M(p−1) which implies that deg(Dh1· · ·DhjPM ) 6 d−
135
j+M(p−1) for any fixed choice of vectors h1, · · · , hj ∈ Fn. Thus pM−aDh1· · ·DhjPM = 0
for any choice of a for which d − j + a(p − 1) 6 0. This implies that Dh1· · ·DhjPM is
divisible by pa+k+1. It is not difficult to check that choosing a := bm′−1p−1 c we have that(Dh1
···DhjPMm′
)is divisible by pk+1 since m′ < p
bm′−1p−1 c+1
= pa+1.
Case 2, (d− j+ (m′−1) ·max(d− j, 1) > 0): We will prove this by downward induction
on j and upward induction on m′. It is sufficient to show that
deg
Dhj+1
(Dh1···DhjPMm′
)pk+1
mod 1
6 d− j + (m′ − 1) ·max(d− j, 1)− 1.
We will make use of the following identity(r+sm′)
=∑m′i=0
(ri
)( sm′−i
)and write
Dhj+1
(Dh1···DhjPMm′
)pk+1
=
∑m′i=1
(Dh1···Dhj+1
PMi
)(Dh1···DhjPMm′−i
)pk+1
, (7.12)
where the equalities are modulo 1. Now by the two strong induction hypotheses the degree
of each summand in the right-hand-side summation is at most
d− (j + 1) + (i− 1) max(d− (j + 1), 1) + d− j + (m′ − i) max(d− j, 1)
6 d− j + (m′ − 1) max(d− j, 1)− 1,
which concludes the claim.
Now (i) follows by choosing j = 0 and m′ = m. We move to (ii), namely we want to
prove that DQ = SymmDP = SymmR. Notice that
Dh
(PMm
)pk+1
=
∑mi=1
(DhPMi
)( PMm−i
)pk+1
mod 1,
and thus the terms with i > 2 have degree at most mk − i 6 mk − 2 and thus do not
136
contribute to DQ. Therefore
DQ(h1, . . . , hmd) = D
(Dhmd
(PMm
)pk+1
)(h1, . . . , hmd−1)
= D
(DhmdPM
pk+1
)∗D
(( PMm−1
)pk+1
)
=∑
16i1<···<id−1<md
DP (hi1 , . . . , hid−1, hmd)D
(PMm− 1
)(hj1 , . . . , hj(m−1)d
),
where 1 6 j1 < · · · < j(m−1)d < md are such that j1, . . . , j(m−1)d = 1, . . . ,md −
1\i1, . . . , id−1 and the second equality follows by Lemma 7.7.7. Now the claim DQ =
SymmDP follows by induction on m.
7.7.2 Computing Polynomials with Large Gowers Norm
In this section we will prove Theorem 7.7.2. The next theorem is an algorithmic general-
ization of Theorem 6.6 in [78] from CSM polynomials to the more general class of HCSM
polynomials. The theorem shows what kind of structure can be obtained from biased classical
symmetric multilinear polynomials.
Theorem 7.7.13. Let s > 0 be an integer and ε, β ∈ (0, 1] be parameters. Suppose that
P ∈ HCSMs(Fn) is such that bias(P ) > ε. Then there exists a bounded index subspace V of
Fn such that on V s, P is a linear combination (over F) of a bounded number of expressions
of the form
Symm1(Q1) ∗ · · · ∗ Symmr(Qr), (7.13)
for some m1, . . . ,mr > 1 and 2 6 k1, . . . kr 6 s− 1 and Qi ∈ HCSMki(V ) for i ∈ [r] with
m1k1 + · · ·+mrkr = s.
Moreover, there is a randomized algorithm that given such P , runs in Oβ,ε(ns) and with
137
probability 1− β outputs P as a linear combination of terms of form Eq. (7.13) explicitly.
We are given a polynomial P ∈ HCSMs(Fn) that is biased. We can use Theorem 7.6.1
to compute P with a few polynomials of degree 6 s− 1. We will show that this can be done
using only classical symmetric multilinear polynomials. This requires going into the proofs
of Lemma 7.2.1 and Lemma 7.4.9 and the following claim which states that a derivative of a
classical symmetric multilinear polynomial can be written as linear combination of classical
symmetric multilinear polynomials.
Claim 7.7.14. Let s > 0 be an integer, P ∈ HCSMs(Fn), and h = (h1, . . . , hs) ∈ (Fn)s.
Then the additive derivative of P in direction h, DhP , can be written as a linear combination
of 2s − 1 polynomials(QS)S⊂[s],S 6=∅, where QS ∈ HCSMs−1(Fn).
Proof. We can write
DhP (x1, . . . , xs) = P (x1 + h1, . . . , xs + hs)− P (x1, . . . , xs)
=∑S⊂[s]S 6=∅
P ((xi)i/∈S , (hi)i∈S)
where the second equality follows from multilinearity of P . Now letting
QS := P ((xi)i/∈S , (1n)s−|S|) we have that QS ∈ HCSMs−|S|(Fn) ⊆ HCSMs−1(Fn).
Now we are able to show that a biased HCSM polynomial can be computed by an HCSM
factor of lower degree.
Definition 7.7.15 (HCSM-Factors). Let d > 0 be an integer. A collection
F = Pi,j16i6d,16j6mifor natural numbers m1, . . . ,md with Pi,j ∈ HCSMi(Fn) is called
an HCSM-factor of degree d.
Definition 7.7.16 (Measurability). Let d > k > 0 and let F = Pi,j16i6k,16j6mibe an
HCSM-factor of degree k. A polynomial P ∈ HCSMd(Fn) is said to be measurable in F , if
138
there exists a collection
A = α = (iα, jα, Sα) : 1 6 iα 6 k, jα < Miα , Sα ⊆ [d], |Sα| = iα,
and a function Γ : FA → F such that
P (h1, . . . , hd) = Γ(Piα,jα
((h`)`∈Sα
): α ∈ A
)Claim 7.7.14 allows us to compute a biased HCSM polynomial by an HCSM-factor of
lower degree.
Corollary 7.7.17. Let β ∈ (0, 1] be an error parameter. There is a randomized algorithm
that given a polynomial P ∈ HCSMs(Fn) with bias(P ) > δ > 0, runs in Oδ,β,γ(nd) and with
probability 1− β, returns an HCSM-factor F = Pi,j16i6d−1,16j6miof degree d− 1 such
that P is measurable in F .
Proof. We observe that Lemma 7.2.1 which is used in the proof of Theorem 7.6.1 and in every
step of Lemma 7.4.9, approximates (and therefore computes) a polynomial by a set of its
derivatives. The corollary then follows by using Claim 7.7.14 at every step. The γ-unbiased
factor can be achieved using the similar modification of Lemma 7.2.3.
Definition 7.7.18 (Strongly γ-unbiased HCSM-factors). Let γ : N → R+ be a decreasing
function, and let d > k > 0. Suppose that F = Pi,j16i6d,16j6miis an HCSM-factor of
degree d and ∆ = δi,j ∈ [i]16i6d,16j6miis a collection of degree bounds. We say that F is
strongly γ-unbiased if for every i ∈ [d], Fi = Pi,j(h1, . . . , hi)16j6Mdis strongly γ-unbiased
with degree bounds ∆i = δi,j16j6miin the sense of Definition 7.4.6.
Lemma 7.7.19 (Computation by strongly γ-unbiased factors). Let β ∈ (0, 1] be an error
parameter, and let γ : N → R+ be a decreasing function. There is a randomized algorithm
that given a polynomial P ∈ HCSMd(Fn) with bias(P ) > δ > 0, runs in Oδ,β,γ(nd) and with
139
probability 1− β, returns a strongly γ-unbiased HCSM-factor F = Pi,j16i6d−1,16j6miof
degree d− 1 with degree bound ∆ = δi,j ∈ [i]16i6d−1,16j6misuch that P is measurable in
F .
Proof. By Corollary 7.7.17 we can find an HCSM-factor F = Pi,j16i6d−1,16j6misuch
that P is measurable in F . Now one can use Lemma 7.4.9 and Claim 7.7.14 to repeatedly
refine the factor until we obtain a strongly γ-unbiased HCSM-factor.
Now we are ready to prove Theorem 7.7.13.
Proof of Theorem 7.7.13: We are given a polynomial P ∈ HCSMs(Fn) that is biased.
Let γ : N→ R+ be a decreasing function to be specified later. By Lemma 7.7.19 with high
probability we can find a strongly γ-unbiased HCSM-factor F = Pi,j16i6s−1,16j6misuch
that P is measurable in F . Namely there exists a collection A = α = (iα, jα, Sα) : 1 6
iα 6 s− 1, 1 6 jα < Miα , Sα ⊆ [s], |Sα| = iα and a function Γ : FA → F such that
P (h1, . . . , hs) = Γ(Piα,jα
((h`)`∈Sα
): α ∈ A
)For technical reasons to be stated later linear polynomials are problematic for the argument
and we will pass to a bounded index subspace V of Fn where F has no linear polynomials.
Notice that Γ must be invariant under permutations of h1, . . . , hs. We will show that Γ is
moreover multilinear.
Claim 7.7.20 (HCSM Analogue of Proposition 8.1 from [78]). For a suitably fast decreas-
ing choice of the function γ : N → R+, Γ((xα)α∈A) is a linear combiantion (over F) of
monomials
xα1 · · ·xαr ,
where α` = (iα` , jα` , Sα`) for ` = 1, . . . , r are elements of A for which Sα1 , . . . , Sαr partition
1, . . . , s.
140
Proof. The proof will follow the lines of the proof from [78], except now the factor is an
HCSM-factor instead of CSM-factor and the factor is strongly γ-unbiased instead of regular.
We will deduce the equidistribution properties of the HCSM-factor from Lemma 2.2.19.
Split A = A3s ∪A 63s, where A3s is the set of α = (iα, jα, Sα) ∈ A for which s ∈ Sα and
A63s = A\A3s. We will split FA = FA3s×FA 63s in the natural way, namely for every x ∈ FA
we write x = (x3s, x63s) where x3s ∈ FA3s and x 63s ∈ FA 63s . We will show the following
linearity
Γ(x3s + y3s, x63s) = Γ(x3s, x63s) + Γ(y3s, x63s), (7.14)
for every x3s, y3s ∈ FA3s and x 63s ∈ FA 63s . To prove (7.14), by multilinearity of the poly-
nomials P and Pi,j for i ∈ [s − 1], j ∈ [mi], it suffices to find h1, . . . , hs, h′s ∈ Fn such
that
• Piα,jα((h`)`∈Sα) = x3s, for every α ∈ A3s,
• Piα,jα((h`)`∈Sα) = x 63s, for every α ∈ A63s, and
• Piα,jα((h`)`∈Sα,i 6=s, h′s) = y3s, for every α ∈ A3s.
The existence of such vectors h1, . . . , hd, h′d is implied by the claim below. Now we can write
Γ(x3s, x63s)+Γ(y3s, x63s) = P (h1, . . . , hs)+P (h1, . . . , hs−1, h′s) = P (h1, . . . , hs−1, hs+h′s)
= Γ
((Piα,jα
((h`)`∈Sα\s, hs + h′s
))α∈A3s
,(Piα,jα
((h`)`∈Sα
))α∈A 63s
)= Γ(x3s + y3s, x63s),
where the second equality is multilinearity of P and the last equality is multilinearity of
the polynomials Pi,j . The following claim shows that choosing a suitably fast decreasing γ
implies existence of vectors h1, . . . , hs, h′s.
141
Claim 7.7.21. Let s > 0. There is a sufficiently fast decreasing choice of γ : N → R+
such that for every degree (s − 1) γ-unbiased HCSM-factor F = Pi,j16i6s−1,16j6mithe
following holds. For every collection
A = α = (iα, jα, Sα) : iα ∈ [s− 1], jα ∈ [mi], Sα ⊆ [s+ 1], |Sα| = iα,
and collection of field elements (xα)α∈A, there exists h1, . . . , hs+1 ∈ Fn such that for every
α ∈ A,
Piα,jα((h`)`∈Sα
)= xα.
Proof. We will prove this by showing that the distribution of the vector(Piα,jα
((h`)`∈Sα
))α∈A for uniformly at random h1, . . . , hs+1 can be made arbitrarily close
to the uniform distribution over F|A|. To do this, it is sufficient to prove that for every
collection of coefficients λα ∈ Fα∈A we have
∣∣∣∣∣∣ Eh1,...,hs+1∈Fn
e
∑α∈A
λαPiα,jα((h`)`∈Sα
)∣∣∣∣∣∣ 6 γ(dim(F))2−s+1.
This will be a corollary to Lemma 2.2.19. Let β ∈ A be such that |Sβ | = iβ is maximized as
λβ is nonzero. Without loss of generality assume that Sβ = 1, . . . , k, where iβ = k. We
will derive PA(h1, . . . , hs+1)def=∑α∈A λαPiα,jα
((h`)`∈Sα
)in directions h1, . . . , hk. Notice
that for any α with Sα 6= Sβ , the summand corresponding to α does not depend on at least
one vector among h1, . . . , hk and will therefore vanish. Thus we have
142
DY1DY2· · ·DYkPA(h1, . . . , hs+1) =
∑α∈A:
Sα=1,...,k
λαDy1Dy2 · · ·DykPiα,jα(h1, . . . , hk)
=∑α∈A:
Sα=1,...,k
λαPiα(y1, . . . , yk),
where Y1, . . . , Yk ∈ (Fn)s+1 are vectors that are supported only on h1, . . . , hk respectively
and yi is the restriction of Yi to the hi coordinates. The second equality follows from the
fact that Pi,j are multilinear and classical. Now similar to Claim 7.4.8 we have that
∣∣∣∣∣∣ Eh1,...,hs+1∈Fn
e
∑α∈A
λαPiα,jβ((h`)`∈Sα
)∣∣∣∣∣∣2k
6
∣∣∣∣∣∣∣∣∣ Ey1,...,yk∈Fn
e
∑α∈A:Sα=Sβ
λαPiα,jα(y1, . . . , yk)
∣∣∣∣∣∣∣∣∣ 6 γ(|dim(F)),
where the first inequality is repeated Cauchy-Schwarz inequality and the second inequality
follows by F being γ-unbiased. Now the claim follows by the fact that k = iβ 6 s− 1.
By symmetry (7.14) implies that
Γ(x3j + y3j , x63j) = Γ(x3j , x63j) + Γ(y3j , x63j), (7.15)
for every j ∈ [s], where A3j and A 63j are defined analogously to A3s and A63s. This shows
that Γ is multilinear. Now we show that all monomials of Γ are of the form xα1 · · ·xαr where
Sα1 , . . . , Sαr partition 1, . . . , s. For an arbitrary α ∈ A, we will look at DyαΓ where yα is
only supported on xα. Let β ∈ A be such that Sβ intersects Sα, and let ` be an arbitrary
element of Sβ ∩ Sα. Using (7.15) for j = ` implies that DyαΓ does not depend on xβ .
143
This implies that all the monomials of Γ are of the form xα1 · · · xαr where Sα1 , . . . , Sαr are
disjoint subsets of [s]. The fact that [s] = ∪ri=1Sαr follows from multilinearity of Γ.
By Claim 7.7.20, we have
P (y1, . . . , ys) =∑
α1,...,αr
cα1,...,αr · Piα1 ,jα1((yi)i∈Sα1
) · · ·Piαr ,jαr ((yi)i∈Sαr ),
for field elements cα1,...,αr , where yis are vectors from Fn, α1, . . . , αr are collections of
elements of A for which Sα1 , . . . , Sαr partition [s]. Since P is invariant under permutations
of (y1, . . . , ys), so should be the coefficients cα1,...,αr and therefore we may split this sum into
orbits of the action of the permutation group, and conclude that P is a linear combination
of basic symmetric monomials
Symm1(Piβ1,jβ1
) ∗ · · · ∗ Symml(Piβ` ,jβ`),
with m1iβ1+ · · · + m`iα` = s. Note that iα1 , . . . , iα` > 2 because we had discarded all of
the linear polynomials from our factor by restricting to the bounded index subspace V . This
finishes the proof of the existence of such a decomposition. It is left to show that we can
find the decomposition algorithmically. This is simple using Lemma 7.2.5 for Γ, since the
factor is regular and provided that γ decreases suitably fast. Since Γ : FA → F, we need only
O|A|(1) queries to Γ to find explicit representation as a linear combination of monomials,
and this can be guaranteed with high probabality using Lemma 7.2.5.
Theorem 7.7.2 now follows as a corollary.
Proof of Theorem 7.7.2: By Lemma 2.1.6 we know that DP ∈ HCSMs(Fn) and
bias(DP ) > ε. By Theorem 7.7.13 we can find with probability 1− µ a bounded index V of
Fn and write DP as a linear combination (over F) of a bounded number of expressions of
144
the form
Symm1(Q1) ∗ · · · ∗ Symmr(Qr), (7.16)
for some m1, . . . ,mr > 1 and 2 6 k1, . . . kr 6 s− 1 and Qi ∈ HCSMki(V ) for i ∈ [r] with
m1k1 + · · ·+mrkr = s.
Now notice that by Lemma 7.7.9 for every Qi we can find a degree-ki polynomial Pi such
that DPi = Qi, and by Lemma 7.7.11 we can find a degree miki (non-classical) polynomial
Pi for which Symmi(DPi) = DPi. Moreover if mi > 2, Pi can be written as a function
of a lower degree polynomial P(i)M . Now notice that if r > 1 then by Lemma 7.7.7 for the
polynomial∏i Pi we have
D(r∏i=1
Pi) = Symm1(Q1) ∗ · · · ∗ Symmr(Qr),
and∏ri=1 Pi is measurable in P1, . . . , Pr. In the case when r = 1 then 2 6 k1 6 s − 1
implies that m1 > 2 which concludes the theorem over the subspace V .
Notice that if V = Fn we would be done. We have shown that we can find a degree-
s polynomial Q and degree 6 s − 1 (non-classical) polynomials Q1, . . . , QC such that the
statement of the theorem holds on a bounded index subspace V . We can extend the statement
to all of Fn by a simple derivative trick. Let h1, . . . , hK ∈ Fn be the presentatives of Fn/V ,
where K =|F|n|V | . Suppose that x is a point in Fn, then there is i ∈ [K] such that x−hi ∈ V .
This means that Q(x) = Q(x−hi)−Q(x)−Q(x−hi) = D−hiQ(x)−Q(x−hi). Now notice
that deg(D−hiQ) 6 s − 1 and since x − hi is in V , Q(x − hi) can be written as a function
of Q1, . . . QC , and thus Q is measurable in D−h1Q, . . . , D−hKQ ∪ Q1, . . . , QC.
145
7.8 Application: Polynomial Decompositions
Consider the following family of properties of functions over a finite field F of fixed prime
order p.
Definition 7.8.1. Given a positive integer k, a vector of positive integers
∆ = (∆1,∆2, . . . ,∆k) and a function Γ : Fkp → Fp, we say that a function P : Fn → F is
(k,∆,Γ)-structured if there exist polynomials P1, P2, . . . , Pk : Fn → F with each deg(Pi) 6
∆i such that for all x ∈ Fn,
P (x) = Γ(P1(x), P2(x), . . . , Pk(x)).
The polynomials P1, . . . , Pk are said to form a (k,∆,Γ)-decomposition.
Informally, a degree-structural property refers to a property from the family of (k,∆,Γ)-
structured properties. Our main result in this section is that every degree-structural property
can be decided in polynomial time:
Theorem 7.8.2. For every finite field F of prime order, positive integers d, k, every vec-
tor of positive integers ∆ = (∆1,∆2, . . . ,∆k) and every function Γ : Fk → F, there is a
deterministic algorithm Ak,∆,Γ that takes as input a polynomial P : Fn → F of degree d,
runs in time polynomial in n, and outputs a (k,∆,Γ)-decomposition of P if one exists while
otherwise returning NO.
In [13], a similar result was obtained for the special case of d < |F|, using our algorithmic
Green-Tao regularity lemma that applies in the high characteristic case.
The algorithm itself is quite simple. Given a polynomial P : Fn → F, we first use
Corollary 7.7.3 to write it as a function of a uniform factor Q1, . . . , Qm of non-classical
polynomials, i.e. P (x) = G(Q1(x), . . . , Qm(x)) where m = O(1) and G : Tm → F. Now, the
proof technique of [13] (similar reasoning also in proof of Theorem 1.7 of [15]) shows that the
146
only way P can have a (k,∆,Γ)-decomposition is if there are functions G1, . . . , Gk : Tm → T
such that G(z1, . . . , zm) = Γ(G1(z1, . . . , zm), . . . , Gk(z1, . . . , zm)) and also, for every i ∈ [k],
Gi(Q1(x), . . . , Qm(x)) is a classical polynomial of degree at most ∆i. Since m = O(1),
there are only a constant number of possible G1, . . . , Gk, and so the whole algorithm runs
in polynomial time.
Another natural way to view the algorithm is as an induction on the dimension. The
proof shows that there exists i ∈ [n] such that P is (k,∆,Γ)-structured if and only if P |xi=0 is
(k,∆,Γ)-structured. We can keep doing this until the number of variables reaches a constant
(that is dependent only on F, k, ∆ and Γ), and moreover, at every step, in polynomial time,
we can check whether this base case has been reached. When the base case is reached, in
constant time, we can compute a (k,∆,Γ)-structure if one exists. Finally, the proof also
shows a way to lift up the recovered decomposition back to the original input on n variables.
Thus, the algorithm is another instance of the “restrict-solve-lift” paradigm that is used in
many algebraic algorithms, such as those for factoring.
147
CHAPTER 8
NORMS DEFINED BY LINEAR FORMS
In this chapter we study norms that are defined by averaging over a set of linear forms. The
results in this chapter are based on the joint work with Hamed Hatami and Shachar Lovett.
An example of such norms is the Gowers norms. Such norms can be defined over any abelian
group, but here similar to the rest of this thesis, we will only be interested in the case where
the group is Fn, where F = Fp for a prime p. For the most general averages that we could
define, we also allow the terms f(Li(X)) to have non-negative real valued powers.
Definition 8.0.3. Let L = L1, ..., Ll be a family of linear forms in Fk, and α, β ∈ Rl>0.
For a function f : Fn → C, define
tL,α,β(f)def= E
l∏i=1
f(Li(X))αi f(Li(X))βi
,where X ∈ (Fn)k is chosen uniformly at random. Letting T := (L, α, β), for a function
f : Fn → C we write
‖f‖Tdef= tL,α,β(f)1/|α,β|,
where |α, β| def=∑li=1(αi + βi).
Remark 8.0.4. For ‖·‖T to be well defined over complex valued functions, we will require
αi − βi to be an integer for every i. Notice that in the study of real functions we will not
have conjugations, namely we can assume without loss of generality that β = 0 and we will
not make any further assumptions about α.
The next theorem to be proved in Section 8.4 states that in order for tL,α,β to be a norm,
α and β must be one of two specific forms.
148
Informal Statement (Theorem 8.2.2). Let T = (L, α, β) be such that ‖·‖T is a norm.
Then either there is an s > 1 such that
tL,α,β(f) = E
l∏i=1
|f(Li(X))|s/2 ,
or for every i, αi, βi = 0, 1. We call norms of the former case type (I) norms and
norms of the latter case type (II) norms.
In order to prove Theorem 8.2.2, in Section 8.3 we develope tools such as Gowers type
Cauchy-Schwarz inequality and a Holder type inequality for ‖·‖T norms. This approach
follows the same lines as [48].
Example. Gowers U2 norm over G→ R can be defined by (L, α, β), where
L = (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1), α = (1, 1, 1, 1), and β = (0, 0, 0, 0), and therefore it
is a type (I) norm. Namely
‖f‖U2 = ‖f‖L,M.
Later we investigate the properties of type (I) and type (II) norms over the space of
bounded valued complex functions. Lemma 8.5.2 is a simple observation that type (II) norms
are equivalent to L1 norm. Let D denote the complex unit disc D def= c ∈ C : |c| 6 1.
Informal Statement (Lemma 8.5.2). Let T = (L, α, β), be such that ‖·‖T is a type (II)
norm. Then for every g : Fn → D we have
• ‖g‖1 6 ‖g‖T , and
• ‖g‖T 6 ‖g‖(1/|α,β|)1 .
In Theorem 8.4.3 we prove that every type (I) norm must be invariant under affine
transformations, which means we can assume that the first coordinate of the linear forms
defining the norm is 1. This allows us to find a positive integer z = z(L,M) for which we
prove that every type (I) norm is essentially equivalent to the Gowers norm ‖·‖Uz+1 .
149
Informal Statement (Theorem 8.5.4). Let T = (L, α, β) be such that ‖·‖T is a type (I)
norm and L ∪M is of complexity 6 p. Then there exist an integer z and functions δ1, δ2 :
R>0 → R>0 such that for every n and f : Fn → D,
i. If ‖f‖T 6 ε then ‖f‖Uz+1 6 δ1(ε), and
ii. If ‖f‖Uz+1 6 ε then ‖f‖T 6 δ2(ε).
8.1 Notation
Recall the definition of type (I) and (II) norms from the statement of Theorem 8.2.2 as
stated in the introduction. Since a type (II) norm is defined only by L and s, from now on
we denote a type (II) norm by ‖·‖T , where T = (L, s). In the case of type (I) norms, namely
when for all i, αi, βi = 0, 1, we drop the α and β and work with two collections of linear
forms L = L1, ..., Ll and M = M1, ...,Mm and write
tL,M(f) = EX∈(Fn)k
l∏i=1
f(Li(X))m∏i=1
f(Mi(X))
,moreover for T = (L,M) define
‖f‖T = tL,M(f)1/(l+m).
From now on by T = (L, α, β) we mean that L = L1, ..., Ll is a family of linear
forms and α, β ∈ Rl>0. Also by T = (L,M), we mean that L = L1, ..., Ll and M =
M1, ...,Mm are families of linear forms over k variables where k is a fixed positive integer.
Definition 8.1.1. We will define rL,M(f) and rL,α,β(f) similar to tL,M and tL,α,β,
rL,M(f)(X)def=
l∏i=1
f(Li(X))m∏i=1
f(Mi(X))
,150
and
rL,α,β(f)(X)def=
l∏i=1
f(Li(X))αi f(Li(X))βi
.This allows one to simply write
tL,α,β(f) = EX∈(Fn)k
[rL,α,β(f)(X)
],
and
tL,M(f) = EX∈(Fn)k
[rL,M(f)(X)
].
8.2 Norms defined by linear forms
Let L = L1, ..., Ll be a system of linear forms on k variables. Analytic averages such as
E[∏
i f(Li(X))]
have been subject of interest in several works, see for example [51, 37]. In
this section we will consider a generalized version of such averages and study these averages in
the cases when they obey norm properties. Namely letting T = (L, α, β), where α, β ∈ Rl>0,
we study norms of the following form:
‖f‖∑αi+
∑βj
T = Ex∈(Fn)k
l∏i=1
f(Li(X))αil∏
i=1
f(Li(x))βi
. (8.1)
We will assume that for every i, αi + βi 6= 0, since otherwise Li is redundant.
Remark 8.2.1. For ‖·‖T in Eq. (8.1) to be well defined over complex functions, we need to
assume that for all i ∈ [l], αi − βi ∈ Z. Notice that in the case of real functions there is no
need for β and T = (L, α) would be sufficient to define such norms, where α ∈ R>0.
In the following theorem we show that for ‖·‖T to be a norm, T has to be one of only
two types.
151
Theorem 8.2.2. Let T = (L, α, β) be such that ‖·‖T is a norm. Then one of the following
holds
(i) For every i ∈ [l], αi, βi = 0, 1, or
(ii) There exists s > 1 such that for every i ∈ [l], αi = βi = s/2.
We call norms of the former case type (I) norms and norms of the latter case, type (II)
norms.
We prove the above theorem by extending ideas from the study of hypergraph normings
by the first author [48]. One of the main tools that is used in [48] to prove a characterization
theorem for hypergraph normings is certain Holder type inequalities for such norms. We
prove Theorem 8.2.2 by showing that similar Holder type inequalities also hold for norms
defined by systems of linear forms. We state and prove these inequalities in Section 8.3.
Finally we present the proof of Theorem 8.2.2 in Section 8.4.
8.3 Holder type inequalities
Definition 8.3.1. Let f1, f2 : Fn → C be two functions. The tensor of f1 and f2, denoted
by f1 ⊗ f2 : Fn × Fn → C is defined as follows
f1 ⊗ f2(x1, x2) := f1(x1) · f2(x2).
Moreover by f⊗l we mean the l-th tensor power of f , f⊗l def= f ⊗ f ⊗ . . .⊗ f : Fnl → C.
It is a simple observation that || · ||T norms are multiplicative with respect the above
tensor products.
Claim 8.3.2. Let T = (L, α, β). For functions f1, f2 : Fn → C we have
tL,α,β(f1 ⊗ f2) = tL,α,β(f1) · tL,α,β(f2).
152
Proof. This follows from the independence of entries of x, and the fact that conjugation is
distributive over both addition and multiplication.
A very similar argument shows that
E
l∏i=1
fi ⊗ fi(Li(X))αigi ⊗ gi(Li(X))βi
=
∣∣∣∣∣∣E l∏i=1
fi(Li(X))αigi(Li(X))βi
∣∣∣∣∣∣2
(8.2)
Theorem 8.3.3 (Holder Type Inequality). Let T = (L, α, β). If ‖·‖T is a semi-norm, then
for every a ∈ [l], if αa 6= 0 then
∣∣∣∣∣∣Eg(La(X)) ·
l∏i=1
f(Li(X))αi−ea(i)f(Li(X))βi
∣∣∣∣∣∣ 6 ‖f‖|α,β|−1T ‖g‖T , (8.3)
and if βa 6= 0 then
∣∣∣∣∣∣Eg(La(X)) ·
l∏i=1
f(Li(X))αi f(Li(X))βi−ea(i)
∣∣∣∣∣∣ 6 ‖f‖|α,β|−1T ‖g‖T , (8.4)
where ea is the vector which is equal to 1 on the a-th entry and 0 everywhere else. Moreover
if there exists a ∈ [l] such that at least one of Eq. (8.3) or Eq. (8.4) holds for every pair of
functions f and g, then ‖·‖T satisfies the triangle inequality.
Proof. We first prove the fact that each of Eq. (8.3) or Eq. (8.4) implies the triangle inequal-
ity. Assume that f, g : Fn → C and that Eq. (8.3) holds for a. Then
tL,α,β(f + g) = E
(g(La(X)) + f(La(X))) ·l∏
i=1
(f + g)(Li(X))αi−ea(i)(f + g)(Li(X))βi
6 tL,α,β(f + g)1−1/|α,β| (‖f‖T + ‖g‖T )
which simplifies to the triangle inequality for ‖·‖T . The proof for Eq. (8.4) is identical.
153
For the other direction, assume f, g : Fn → C are functions with ‖f‖T , ‖g‖T 6∞. Since
‖·‖T is a semi-norm, for every δ > 0
‖f + δg‖T 6 ‖f‖T + δ‖g‖T ,
thus
d‖f + δg‖Tdδ
∣∣∣∣∣0
6 ‖g‖T . (8.5)
Now expanding the derivative
d‖f + δg‖Tdδ
=
‖f + δg‖1−|α,β|T
|α, β|·E
l∑i=1
(αig(Li(x))rL,α−ei,β(f + δg)(x) + βig(Li(x))rL,α,β−ei(f + δg)(x)
) .Thus by Eq. (8.5)
‖f‖1−|α,β|T
|α, β|· E
l∑i=1
(αig(Li(x))rL,α−ei,β(f)(x) + βig(Li(x))rL,α,β−ei(f)(x)
) 6 ‖g‖T ,which can be rearranged to
1
|α, β|· E
l∑i=1
(αig(Li(x))rL,α−ei,β(f)(x) + βig(Li(x))rL,α,β−ei(f)(x)
)6 ‖f‖|α,β|−1
T ‖g‖T . (8.6)
Replacing f and g by (f ⊗ f)⊗m and (g ⊗ g)⊗m for an integer m > 0, by Claim 8.3.2 and
154
Eq. (8.2) we have
1
|α, β|·
l∑i=1
(αi∣∣E[g(Li(x))rL,α−ei,β(f)(x)]
∣∣2m + βi∣∣E[g(Li(x))rL,α,β−ei(f)(x)]
∣∣2m)6(‖f‖|α,β|−1
T ‖g‖T)2m
.
The proof follows from the fact that this inequality holds for all m > 0 and that
|α, β| =l∑
i=1
αi +m∑i=1
βi.
One simple corollary to the above theorem is that the absolute value of the average of a
function f is bounded by its ‖·‖T norm.
Corollary 8.3.4. Let T = (L, α, β) be such that ‖·‖T is a norm. For every g : Fn → C we
have
|E[g]| 6 ‖g‖T .
Proof. Define f(x) = 1 for all x ∈ Fn. Assume there exists i ∈ [l] such that αi 6= 0, then we
have
|E[g(x)]| =∣∣∣E [g(Li(x))rL,α−ei,β(f)(x)
] ∣∣∣ 6 ‖g‖T ,where the inequality follows from Theorem 8.3.3. The case when for all i ∈ [l], αi = 0 is
handled similarly using the second inequality from Theorem 8.3.3.
Using the above claim we have the following corollary to Theorem 8.3.3.
155
Corollary 8.3.5. Let T = (L, α, β) be such that ‖·‖T is a norm. Then for every i ∈ [l],
αi + βi > 1.
Proof. Assume for contradiction that αi + βi < 1. By Eq. (8.3) we have
∣∣E [g(Li(X)) · rL,α−ei,β(f)(X)] ∣∣ 6 ‖f‖|α,β|−1
T ‖g‖T . (8.7)
The next claim shows that there is n and Y ∈ (Fn)k such that Li(Y ) 6= Lj(Y ) for all
j 6= i.
Claim 8.3.6. Let L = L1, ..., Ll be a family of linear forms over k variables. For every
i ∈ [l], there exists an integer n and X ∈ (Fn)k such that
Li(X) 6∈Lj(X) : j 6= i
.
Proof. For every j 6= i, letAj denote the set of solutions to Li(X) = Lj(X). Since Li−Lj 6= 0
we have that |Aj | 6 pn(k−1). Thus
∣∣∣∣∣∣⋃j 6=i
Aj
∣∣∣∣∣∣ 6 l · pn(k−1),
which for large enough n is strictly less than∣∣∣(Fn)k
∣∣∣ = pnk.
Let ω = Li(Y ), and for ε > 0 define f : Fn → C as follows
f(x) =
ε x = ω
1 Otherwise
Defining g(x) := 1 we have ‖f‖T 6 1 and ‖g‖T = 1. One the other hand
156
∣∣E [g(Li(X)) · rL,α−ei,β(f)(X)] ∣∣ > εαi+βi−1/pnk.
Choosing ε small enough this contradicts Eq. (8.7).
The next lemma is an analogue of Gowers Cauchy-Schwarz, for averages over arbitrary
linear forms that satisfy the norm properties.
Lemma 8.3.7 (Cauchy-Schwarz Type Inequality). Let T = (L,M), such that ‖·‖T is a
semi-norm. Then for every set of m+ l functions f1, ..., fl, g1, ..., gm : Fn → C, we have
∣∣∣∣∣∣ EX∈(Fn)k
l∏i=1
fi(Li(X))m∏i=1
gi(Mi(X))
∣∣∣∣∣∣ 6l∏
i=1
‖fi‖Tm∏j=1
‖gj‖T .
Proof. Assume for contradiction that there exist functions f1, ..., fl, g1, ..., gm such that the
inequality does not hold. One can normalize the functions such that
∣∣∣∣∣∣ EX∈(Fn)k
l∏i=1
fi(Li(X))m∏i=1
gi(Mi(X))
∣∣∣∣∣∣ = c > 1,
and for every i ∈ [l], tL,M(fi) 6 1, and for every j ∈ [m], tL,M(gj) 6 1. Let h be such
that c2h/(m+l) > m+ l. Taking h-th tensor power of fi ⊗ fi and gi ⊗ gj ,
tL+M(l∑
i=1
(fi ⊗ fi)⊗h +m∑i=1
(gi ⊗ gi)⊗h) =
E
[l∏
i=1
l∑j=1
(fi ⊗ fi)⊗h(Li(X)) +m∑j=1
(gi ⊗ gi)⊗h(Li(X))
·m∏i=1
l∑j=1
(fi ⊗ fi)⊗h(Mi(X)) +m∑j=1
(gi ⊗ gi)⊗h(Mi(X))
],
157
which by expanding is equal to
∑σ,π
∣∣∣∣∣E[ ∏i:σ(i)6m
(fσ(i))⊗h(Li(X))
∏i: σ(i)>m
(gσ(i))⊗h(Li(X))·
∏i:π(i)6m
(fπ(i))⊗h(Mi(X))
∏i: π(i)>m
(gπ(i))⊗h(Mi(X))
]∣∣∣∣∣2
,
where the sum is over σ ∈ [m+l]l and π ∈ [m+l]m. Using Claim 8.3.2, and Eq. (8.2) it is easy
to see that this value is greater than or equal to c2h. On the other hand∣∣∣∣∣∣(fi ⊗ fi)⊗h∣∣∣∣∣∣
T6 1,
and∣∣∣∣∣∣(gi ⊗ gi)⊗h∣∣∣∣∣∣
T6 1. This contradicts the triangle inequality as
∣∣∣∣∣∣∣∣∣∣∣∣l∑
i=1
(fi ⊗ fi)⊗h +m∑i=1
(gi ⊗ gi)⊗h∣∣∣∣∣∣∣∣∣∣∣∣T
> c2h/(m+l) > m+ l
>l∑
i=1
∣∣∣∣∣∣(fi ⊗ fi)⊗h∣∣∣∣∣∣T
+m∑i=1
∣∣∣∣∣∣(gi ⊗ gi)⊗h∣∣∣∣∣∣T.
Lemma 8.3.8 (Cauchy-Schwarz type inequality for nonnegative functions). Let
T = (L, α, β) be such that ‖·‖T is a norm, and Tii∈[m] with Ti = (L, αi, βi), be such that
for each i, j, αij + βij > 0. Moreover let f1, ..., fm : Fn → R>0, then we have
EX
[m∏i=1
rL,αi,βi(fi)(X)
]6
m∏i=1
‖fi‖|αi,βi|T
Proof. The proof is very similar to the proof of Lemma 8.3.7, with the difference that we
use the boost up argument using f⊗hi for large enough h.
8.4 A characterization theorem
Now we have the necessary tools to prove the following characterization theorem.
158
Theorem 8.2.2 (restated). Let T = (L, α, β) be such that ‖·‖T is a norm. Then one of
the following holds
(i) For every i ∈ [l], αi, βi = 0, 1, or
(ii) There exists s > 1 such that for every i ∈ [l], αi = βi = s/2.
Proof. Assume a ∈ [l] and b ∈ [l] contradict the statement of the Theorem, namely
αa, βa 6= 0, 1 and αb 6= βb. Moreover assume that αb > βb. The case when βb > αb can
be handled in a similar fashion. Let ε be a sufficiently small constant. The following lemma
states that for large enough n there exists h : Fn → −1, 1 with ‖h‖T 6 ε and∣∣Ehc
∣∣ 6 ε,
where c = αb − βb.
Lemma 8.4.1. Let T = (L, α, β) be such that ‖·‖T is a norm, and αi 6= βi for some i. For
every ε > 0, there exists n > 0 and h : Fn → (−1)1/|αi−βi|, 1 such that
• ‖h‖T 6 ε, and
• Letting g(x) = h(x)|αi−βi|, we have∣∣E[g]
∣∣ 6 ε.
We defer the proof of this Lemma to later in the section. Define f as
f(x) :=
1 h(x) = 1
0 Otherwise
Since∣∣Ehc
∣∣ 6 ε we have that ‖f‖1 > 1/3. By definition of f and the fact that either
αa − 1 6= 0 or βa 6= 0 we have
∣∣∣EXh(Li(X)) · rL,α−ea,β(f)(X)
∣∣∣ = tL,α,β(f) > 3−|α,β|, (8.8)
where the inequality follows from the following simple observation.
159
Claim 8.4.2. Let T = (L, α, β) be such that ‖·‖T is a norm. Then for every f : Fn → R>0
we have
‖f‖T > ‖f‖1
Proof. This is an immediate corollary to Theorem 8.3.3. Letting g := 1, we have ‖g‖|α,β|−1T =
1, and
‖f‖1 =∣∣E [f(L1(X)) · rL,α−e1,β(g)(X)
]∣∣ 6 ‖f‖T .
On the other hand since αa 6= 0, by Theorem 8.3.3 we have
∣∣∣EXh(Li(X)) · rL,α−ea,β(f)(X)
∣∣∣ 6 ‖h‖T 6 ε,
which for small enough ε contradicts Eq. (8.8).
Now we prove that if α = β then there is a universal s such that for every i ∈ [l],
αi = βi = s/2. Letting
s := maxiαi + βi,
we will prove that ‖·‖T ′ , for T ′ = (L, αs ,βs ), is a norm. This combined with Corollary 8.3.5
proves our claim. To prove that ‖·‖T ′ is a norm we will use the reverse direction of Theo-
rem 8.3.3 for a such that αa + βa = s. Notice that
∣∣∣E [g(La(X))rL,αs−ea,βs(f)(X)
] ∣∣∣ 6 E[rL,s·ea,0(|g|1/s)(X) · rL,α−s·ea,β(|f |1/s)(X)
]6 ‖|f |1/s‖|α,β|−sT ‖|g|1/s‖sT = ‖f‖
∣∣∣αs ,βs ∣∣∣−1
T ′ ‖g‖T ′ ,
where the last inequality follows from Lemma 8.3.8. The reason we could exchange from f
160
to |f | and back is that for every i, αi = βi.
Proof of Lemma 8.4.1. Set a := |αi− βi|. Let Ai be the set of vectors X such that Li(X) is
equal to some Lj(X), namely
A :=X ∈ (Fn)k : (∃j 6= i)
(Li(X) = Lj(X)
).
As we saw in the proof of Corollary 8.3.5,
|A| 6 l · pn(k−1). (8.9)
Let h : Fn → (−1)1/a, 1 be a random function which is a result of an independent Bernouli
process. Since ‖·‖T is a norm, then for every h we know tL,α,β(h) ∈ R>0. We will look at
the following expected value
EhtL,α,β(h) = E
hEXrL,α,β(h)(X)
= Eh
EXrL,α,β(h)(X)1A(X) + E
hEXrL,α,β(h)(X)1A(X).
By Eq. (8.9) we have
Eh
EXrL,α,β(h)(X)1A(X) 6
l
pn.
Moreover
Eh
EXrL,α,β(h)(X)1A(X) = 0,
which follows from the independence of h(Li(X)) from the other terms. Thus
EhtL,α,β(h) 6 l · p−n.
161
Moreover by Chernoff bound we have that
Prh
(∣∣Exh(x)a
∣∣ > p− 1
n1/3
)6 2e
− 1
2n1/3 .
Therefore for large enough n there exists h : Fn → −1, 1 with both ‖h‖T 6 ε and∣∣Ex h(x)∣∣ 6 ε.
Here we prove further that all type (I) norms must be invariant under affine transforma-
tions.
Theorem 8.4.3 (Affine Transformations). Every type (I) norm ‖·‖T defined by a system of
linear forms is invariant under affine transformations.
Proof. For b ∈ Fn define
f+b(x)def= f(x+ b).
We want to show that ‖f‖T = ‖f+b‖T . For every t ∈ F and j ∈ [n] define
fjt (X)
def= f(x+ tej),
then for every α ∈ F by Cauchy-Schwarz type inequality we have
‖f jt ‖mT 6
p−1∏i=0
‖f jt+αi‖ciT , (8.10)
where ci is the number of linear forms La such that La,1 = i plus the number of linear
forms Mb such that Mb,1 = i. Notice that this inequality holds for every t and α, thus for
every f one can move the fjt with maximum ‖f jt ‖T to the left hand side of Eq. (8.10), which
implies invariance of norm under transformations of type tej . Since this holds for every j,
‖·‖T is invariant under any affine transformation.
162
8.5 Topology
Theorem 8.2.2 states that norms defined by linear forms can only be of two types, type
(I) and type (II). In this subsection we study the topology that such norms induce on
bounded functions on finite dimensional vector spaces. We prove every type (I) norm || · ||T
is essentially equivalent to Gowers norm of a proper order, where the order depends only on
the system of linear forms defining || · ||T . We also show that type (II) norms are equivalent
to L1 norm and thus Lp norms. For a formal statement we have to first define what we
mean by equivalence of norms over bounded functions.
Definition 8.5.1 (Equivalence of Norms on Bounded Functions). We say that two norms
‖·‖A and ‖·‖B are equivalent on the space of bounded functions Fn → D, n < ∞, if for
every sequence of functions fi : Fni → Di,ni∈N we have
limi→∞‖fi‖A = 0 if and only if lim
i→∞‖fi‖B = 0.
Type (II) norms turn out to be easy to handle. The following lemma relates type (II)
norms to the L1 norm.
Lemma 8.5.2. Let T = (L, s), be such that ‖·‖T is a type (II) norm. Then for every
g : Fn → D we have
• ‖g‖1 6 ‖g‖T , and
• ‖g‖T 6 ‖g‖(1/ls)1 .
Proof. The first part follows from Theorem 8.3.3 letting f(x) := 1 for all x ∈ Fn, and the
second part is a simple consequence of the fact that ‖g‖∞ 6 1.
This immediately implies the following corollary.
163
Corollary 8.5.3. Let T = (L, s), be such that ‖·‖T is a type (II) norm. Then ‖·‖T is
equivalent to L1 norm on bounded complex functions.
Let || · ||T be a type (I) norm. We show that there exists some integer z > 0 such that for
every bounded function f , ‖f‖T is small if and only if ‖f‖Uz+1 is small. This is the statement
of the next theorem which we will prove in Section 8.6. Unfortunately for technical reasons
we have to make an assumption that p, the characteristic of the underlying field, is larger
than the complexity of the system of linear forms defining T .
Theorem 8.5.4. Let T = (L,M) be such that ‖·‖T is a type (I) norm and L ∪ M is
of Cauchy-Schwarz complexity s 6 p. Then there exist an integer z 6 s and functions
δ1, δ2 : R>0 → R>0 such that for every n and f : Fn → D,
i. If ‖f‖T 6 ε then ‖f‖Uz+1 6 δ1(ε), and
ii. If ‖f‖Uz+1 6 ε then ‖f‖T 6 δ2(ε).
This in particular implies that ‖·‖T is equivalent to Gowers’ Uz+1 norm.
Corollary 8.5.5. Let T = (L,M), be such that ‖·‖T is a type (I) norm. Then there exists
z > 0 such that ‖‖T is equivalent to Gowers’ Uz+1 norm on the space of bounded complex
functions.
8.6 Type (I) norms
In the case of type (I) norms, we may write T = (L,M) where |L| = l, and |M | = m,
defining the norm by
tL,M(f) = E
l∏i=1
f(Li(X))m∏i=1
f(Mi(X))
,and
164
‖f‖m+lT = tL,M(f).
By Theorem 8.4.3 we know that that L and M are systems of affine linear forms. The-
orem 8.5.4 states a relation between type (I) norms and Gowers’ Uz+1 norm of a function
for some integer z > 0. Here we will show how z is derived for two given systems of linear
forms L,M and in Lemma 8.6.2 and Corollary 8.6.3 will show that in the case when the
linear forms are affine such z(L,M) exists.
Definition 8.6.1. Let T = (L,M). Define z(L,M) as maximum d > 1 for which
L⊗d1 + L⊗d2 + · · ·+ L⊗dl −M⊗d1 −M⊗d2 − · · · −M⊗dm = 0 mod p,
where L⊗di is the d-th tensor power of Li. Existence of such z is the subject of Corollary 8.6.3.
Lemma 8.6.2. Let L1, ..., Lm ∈ Fk be a set of distinct linear forms. Assume that α ∈ Fm\0.
There exists z such that ∑i
αiL⊗zi 6= 0 mod p.
Proof. Let αL = αi if L = Li and αL = 0 otherwise, where L ranges over all linear forms over
k variables, namely Fk. So α ∈ Fpk and if no such z exists then∑L∈Fk αL
∏kj=1 L
zjj = 0
for all w ∈ Fk. Let M be the pk × pk matrix over F defined as
ML,w =k∏j=1
Lzjj .
(note that we also allow w = (0, 0, ..., 0); and 00 = 1, 0a = 0 for 1 6 a 6 p− 1). Then if no
such z exists then Mα = 0. However, we can factor M as the k-th tensor power of the p× p
matrix M ′ given by (M ′)a,b = ab mod p. Then M ′ is a Vandermonde matrix and hence
det(M ′) 6= 0. So also det(M) = det(M ′)k 6= 0, hence we must have α = 0.
165
Corollary 8.6.3. Let L1, ..., Ll,M1, ...,Mm ∈ Fk be a set of distinct affine linear forms.
There exists a maximal z > 0 such that for every d 6 z,
∑i
L⊗di −∑j
M⊗dj ≡ 0 mod p.
Proof. Since
∑i
L⊗zi −∑j
M⊗zj ≡∑i
L⊗zi +∑j
(p− 1) ·M⊗zj mod p,
by Lemma 8.6.2 there exists a finite maximal z such that
∑i
L⊗zi −∑j
M⊗zj ≡ 0 mod p.
This completes the proof because for an affine linear form L, L⊗d is a sub-vector of L⊗z for
d 6 z .
Claim 8.6.4. If ‖·‖T is a norm then z(L,M) > 0, namely
l∑i=1
Li −m∑i=1
Mi ≡ 0 mod p.
Proof. Assume for contradiction that z(L,M) = 0. We look at the linear Fourier characters
χq, for q ∈ Fn, notice that since∑li=1 Li −
∑mi=1Mi 6≡ 0 mod p, thus ‖χq‖T = 0, unless
q = 0. This is in contradiction with ‖·‖T being a norm.
Theorem 8.5.4 (restated). Let T = (L,M) be such that ‖·‖T is a type (I) norm and
L∪M is of Cauchy-Schwarz complexity s 6 p. Then there exist an integer z = z(L,M) 6 s
and functions δ1, δ2 : R>0 → R>0 such that for every n and f : Fn → D,
i. If ‖f‖T 6 ε then ‖f‖Uz+1 6 δ1(ε), and
166
ii. If ‖f‖Uz+1 6 ε then ‖f‖T 6 δ2(ε).
Remark 8.6.5. Notice that z(L,M) 6 s 6 p, where s is the Cauchy-Schwarz complexity of
the system of linear forms. This is because for any s < d and a high enough rank polynomial
P of degree 6 d+1, we would have ‖eF(P )‖Us+1 6 ε. But at the same time ‖eF(P )‖L,M = 1
which would contradict Lemma 2.4.3.
We will prove Theorem 8.5.4 in two parts through the next two subsections, namely
Lemma 8.6.6 and Theorem 8.6.8.
8.6.1 Theorem 8.5.4, the Direct Part
Lemma 8.6.6 (Direct Theorem). Let T = (L,M) be such that ‖·‖T is a norm. For every
ε there is a δ1(ε) such that for every n, for every function f : Fn → D, with ‖f‖T 6 δ1(ε)
we have ‖f‖Uz(L,M)+1 6 ε.
Proof. Assume that ‖f‖T 6 ε, then for every polynomial q of degree d 6 z
∣∣〈f, q〉∣∣ 6 ‖f · q‖T = ‖f‖T 6 ε,
where the first inequality follows by Corollary 8.3.4 and the equality is the subject of the
following claim.
Claim 8.6.7. Let T = (L,M). Let f : Fn → D, and q be a non-classical polynomial of
degree d with d 6 z(L,M). We have
tL,M (f · eF(q)) = tL,M(f).
Proof. Since d 6 z(L,M)
167
rL,M(f · eF(q)) = rL,M(f)l∏
i=1
eF(q(Li(X))m∏i=1
eF(−q(Mi(X))
= rL,M(f) · eF
l∑i=1
q(Li(X))−m∑i=1
q(Mi(X))
= rL,M(f),
where the last equality is due to the fact that all the coefficients of monomials of∑li=1 q(Li(X))−
∑mi=1 q(Mi(X)) are zero since d 6 z(L,M).
We conclude by the inverse theorem for Gowers Norms (Theorem 2.1.10).
8.6.2 Theorem 8.5.4, the Inverse Part
Here we will prove the following inverse theorem.
Theorem 8.6.8 (Inverse Theorem). Let T = (L,M) be such that ‖·‖T is a norm. Assume
that L ∪M has Cauchy-Schwarz complexity d 6 p. For every ε there exists δ2(ε) such that
for every function f : Fn → D, with ‖f‖Uz(L,M)+1 6 δ2(ε) we have ‖f‖T 6 ε.
Proof. First of all by Theorem 8.4.3 we may assume that L and M are systems of affine
linear forms, namely we can assume that for every i ∈ [l] and j ∈ [m], Li,1 = 1 and Mj,1 = 1.
Following is a list of the constants we will be working with.
(1) Let z := z(L,M).
(2) Let δ′ = (ε/3)m+l.
(3) Let r(C) := τp,d
((ε/3)m+lp−C
), where τp,d(·) is the function in Lemma 2.2.20.
Let f : Fn → D be a function with ‖f‖Uz+1 6 δ2 = (ε/3)m+lp−C?, where C? =
Cmax(p, d, δ′, r(·)) is the constant from Theorem 2.3.1. By Theorem 2.3.1 and Remark 2.3.2,
we can write
f = f1 + f2,
168
such that
f1 = E(f |B), and ‖f2‖Ud+1 6 δ′,
where B is a polynomial factor of degree at most d, complexity C 6 Cmax(p, d, δ′, r(·)), and
rank at least r(C). Moreover B is defined by homogeneous polynomials.
By Lemma 2.4.3,
‖f2‖T 6 ‖f2‖1/m+lUs+1 6 ε/3. (8.11)
Now we will bound ‖f1‖T . Since f1 = Γ(P1, ..., PC), for a set of homogeneous polynomials
Pi ∈ Poly6s, we can write the degree s Fourier decomposition of f ,
f1(x) =∑γ∈FC
Γ(γ)eF
C∑i=1
γ(i)Pi(x)
.
Therefore
‖f1‖T =
∣∣∣∣∣∣∣∣∣∣∣∣∑γ∈FC
Γ(γ)eF
C∑i=1
γ(i)Pi(x)
∣∣∣∣∣∣∣∣∣∣∣∣T
6∑γ∈FC
∣∣∣Γ(γ)∣∣∣ ·∣∣∣∣∣∣∣∣∣∣∣∣eF(
C∑i=1
γ(i)Pi(X))
∣∣∣∣∣∣∣∣∣∣∣∣T
, (8.12)
where the last inequality is the triangle inequality and positive homogeneity for the ‖·‖T
norm. Looking at the terms in the RHS of Eq. (8.12), for each γ ∈ FC one of the following
holds:
(i) (∑Ci=1 γ(i)Pi(X)) is of degree 6 z.
(ii) (∑Ci=1 γ(i)Pi(X)) has multinomial of degree higher than z.
Case (i): By Remark 2.2.6 and the fact that ‖f‖Uz+1 6 (ε/3)m+lp−C?6 (ε/3)m+lp−C ,
using the direct theorem for Gowers norms we have that the terms satisfying (i) contribute
at most (ε/3)m+l to the RHS of Eq. (8.12).
169
Case (ii): Since the degree of such Polynomials is higher than z, by Lemma 2.2.20 we have
l∑j=1
C∑i=1
γ(i)Pi(Lj(X))−m∑j=1
C∑i=1
γ(i)Pi(Mj(X)) 6≡ 0,
and that the terms satisfying this condition contribute at most (ε/3)m+l to the RHS of
Eq. (8.12).
Finally adding everything up and using the triangle inequality we conclude
‖f‖T 6 ‖f1‖T + ‖f2‖T 6 2(ε/3)m+l + ε/3 6 ε,
since m+ l > 1.
170
CHAPTER 9
CONCLUSION
Algebraic property testing. In Chapter 6 we proved the following characterization the-
orem for affine invariant properties defined by induced affine constraints.
Theorem 6.4.5 (restated). For any integer d > 0 and (possibly infinite) fixed collection
A = (A1, σ1), (A2, σ2), . . . , (Ai, σi), . . . of induced affine constraints, each of complexity
6 d, there are functions qA : (0, 1)→ Z+, δA : (0, 1)→ (0, 1) and a tester T which, for every
ε > 0, makes qA(ε) queries, accepts A-free functions and rejects functions ε-far from A-free
with probability at least δA(ε). Moreover, qA is a constant if A is of finite size.
This implies that every locally characterized affine-invariant property is proximity-obliviously
testable (Theorem 6.1.1), however in the case of testability, our main theorem implies that
every subspace hereditary property of bounded complexity is testable (Theorem 6.3.3). This
leaves the following conjecture of [17] open for the case of unbounded complexity.
Conjecture 6.3.2 [17] (restated). Every subspace hereditary property is testable.
Affine invariant properties of unbounded complexity are those which can only be defined
by an infinite collection of induced affine constraints with unbounded Cauchy-Schwarz com-
plexity. We are not aware of any natural property of unbounded complexity, and moreover
suspect that the answer to the above conjecture should be in the negative.
Another interesting question is whether Theorem 6.4.5 can be extended to linear-invariant
properties. The proof of Theorem 6.4.5 makes use of near-equidistribution of regular poly-
nomial factors over affine collections of linear forms (Theorem 2.2.21). Although this near-
equidistribution property was not known for general collections of linear forms at the time
in [15], we now have such a general equidistribution (Theorem 4.1.1, [50]). However, it is
still unclear how to adapt the rest of our proof to the linear-invariant case.
171
Reed-Muller codes beyond list decoding radius. Our decoder in Section 7.5.3 only
works for a P which is a polynomial of degree d < p. It is a fascinating problem to find out
if this can be extended to general functions. A first step to this program would be to get
a deeper understanding of cubic polynomials. The known testers and decoders in this case
follow the same steps as the proofs of the inverse theorems for the Gowers norms [68, 79, 18].
However, in some cases, even a combinatorial proof of such inverse theorems is not yet
available. It has been seen, in the context of property testing, that one can bypass such
obstacles by considering algorithms that require the inverse theorems and decomposition
theorems only in their analysis [52, 15]. It is therefore plausible to hope for such algorithms
for the decoding question as well.
Norms defined by linear forms. In Chapter 8, we gave necessary conditions for a
collection of linear forms defining a norm (Theorem 8.2.2). A great and possibly hard open
problem is to give a full characterization of collections of linear forms that define a norm,
and we think of our result as a very small step towards this. Theorem 8.2.2 shows that
a system of linear forms that defines a norm must be one of only two types, one of which
(type (II)) is easily seen to be essentially equivalent to Lp norms. Next, we prove under the
assumption of |F| being sufficiently large, that type (I) norms are essentially the same as
some Gowers norm (Theorem 8.5.4). We believe that the assumption on the size of the field
is unnecessary, meaning that every type (I) norm is equivalent to a Gowers norm, and we
leave this as an open problem.
172
REFERENCES
[1] Noga Alon, Richard A. Duke, Hanno Lefmann, Vojtech Rodl, and Raphael Yuster. Thealgorithmic aspects of the regularity lemma. J. Algorithms, 16(1):80–109, 1994.
[2] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing oflarge graphs. Combinatorica, 20(4):451–476, 2000.
[3] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A combinatorial charac-terization of the testable graph properties: it’s all about regularity. SIAM J. Comput.,39(1):143–167, 2009.
[4] Noga Alon, Tali Kaufman, Michael Krivelevich, Simon Litsyn, and Dana Ron. TestingReed-Muller codes. IEEE Trans. Inform. Theory, 51(11):4032–4039, 2005.
[5] Noga Alon and Assaf Naor. Approximating the cut-norm via Grothendieck’s inequality.SIAM J. Comput., 35(4):787–803, 2006.
[6] Noga Alon and Asaf Shapira. A characterization of the (natural) graph propertiestestable with one-sided error. SIAM J. on Comput., 37(6):1703–1727, 2008.
[7] Noga Alon and Asaf Shapira. Every monotone graph property is testable. SIAM J. onComput., 38(2):505–522, 2008.
[8] L. Babai, L. Fortnow, and C. Lund. Addendum to: “Nondeterministic exponentialtime has two-prover interactive protocols” [Comput. Complexity 1 (1991), no. 1, 3–40;MR1113533 (92h:68031)]. Comput. Complexity, 2(4):374, 1992.
[9] Laszlo Babai, Lance Fortnow, Leonid A. Levin, and Mario Szegedy. Checking compu-tations in polylogarithmic time. In Proc. 23rd Annual ACM Symposium on the Theoryof Computing, pages 21–32, New York, 1991. ACM Press.
[10] Eli Ben-Sasson, Noga Ron-Zewi, Madhur Tulsiani, and Julia Wolf. Sampling-basedproofs of almost-periodicity results and algorithmic applications. In Automata, Lan-guages, and Programming, pages 955–966. Springer, 2014.
[11] Vitaly Bergelson, Terence Tao, and Tamar Ziegler. An inverse theorem for the unifor-mity seminorms associated with the action of F∞p . Geom. Funct. Anal., 19(6):1539–1596,2010.
[12] Arnab Bhattacharyya. Guest column: On testing affine-invariant properties over finitefields. ACM SIGACT News, 44(4):53–72, 2013.
[13] Arnab Bhattacharyya. Polynomial decompositions in polynomial time. Technical report,Electronic Colloquium on Computational Complexity (ECCC), 2014.
[14] Arnab Bhattacharyya, Victor Chen, Madhu Sudan, and Ning Xie. Testing linear-invariant non-linear properties. Theory Comput., 7:75–99, 2011.
[15] Arnab Bhattacharyya, Eldar Fischer, Hamed Hatami, Pooya Hatami, and ShacharLovett. Every locally characterized affine-invariant property is testable. In Proceed-ings of the 45th annual ACM symposium on Symposium on theory of computing, STOC’13, pages 429–436, New York, NY, USA, 2013. ACM.
173
[16] Arnab Bhattacharyya, Eldar Fischer, and Shachar Lovett. Testing low complexity affine-invariant properties. In Proc. 24th ACM-SIAM Symposium on Discrete Algorithms,pages 1337–1355, 2013.
[17] Arnab Bhattacharyya, Elena Grigorescu, and Asaf Shapira. A unified framework fortesting linear-invariant properties. In Proc. 51st Annual IEEE Symposium on Founda-tions of Computer Science, pages 478–487, 2010.
[18] Arnab Bhattacharyya, Pooya Hatami, and Madhur Tulsiani. Algorithmic regularity forpolynomials and applications. In Proceedings of the Twenty-Sixth Annual ACM-SIAMSymposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6,2015, pages 1870–1889, 2015.
[19] Abhishek Bhowmick and Shachar Lovett. List decoding Reed-Muller codes over finitefields. STOC ’15, 2015.
[20] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with appli-cations to numerical problems. In Proceedings of the 22nd Annual ACM Symposium onTheory of Computing (Baltimore, MD, 1990), volume 47, pages 549–595, 1993.
[21] Andrej Bogdanov and Emanuele Viola. Pseudorandom bits for polynomials. In Proc.48th Annual IEEE Symposium on Foundations of Computer Science, pages 41–51,Washington, DC, USA, 2007. IEEE Computer Society.
[22] Christian Borgs, Jennifer Chayes, Laszlo Lovasz, Vera T. Sos, Balazs Szegedy, andKatalin Vesztergombi. Graph limits and parameter testing. In STOC’06: Proceedingsof the 38th Annual ACM Symposium on Theory of Computing, pages 261–270. ACM,New York, 2006.
[23] Uriel Feige, Shafi Goldwasser, Laszlo Lovasz, Shmuel Safra, and Mario Szegedy. In-teractive proofs and the hardness of approximating cliques. J. ACM, 43(2):268–292,1996.
[24] Eldar Fischer. The art of uninformed decisions: A primer to property testing. InG. Paun, G. Rozenberg, and A. Salomaa, editors, Current Trends in Theoretical Com-puter Science: The Challenge of the New Century, volume 1, pages 229–264. WorldScientific Publishing, 2004.
[25] Eldar Fischer, Arie Matsliah, and Asaf Shapira. Approximate hypergraph partitioningand applications. SIAM J. Comput., 39(7):3155–3185, 2010.
[26] Eldar Fischer and Ilan Newman. Testing versus estimation of graph properties. SIAMJ. Comput., 37(2):482–501 (electronic), 2007.
[27] Alan Frieze and Ravi Kannan. The regularity lemma and approximation schemes fordense problems. In 37th Annual Symposium on Foundations of Computer Science(Burlington, VT, 1996), pages 12–20. IEEE Comput. Soc. Press, Los Alamitos, CA,1996.
[28] Alan M. Frieze and Ravi Kannan. A simple algorithm for constructing Szemeredi’sregularity partition. Electr. J. Comb., 6, 1999.
174
[29] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connectionto learning and approximation. J. ACM, 45(4):653–750, 1998.
[30] Oded Goldreich and Tali Kaufman. Proximity oblivious testing and the role of invari-ances. In Approximation, randomization, and combinatorial optimization, volume 6845of Lecture Notes in Comput. Sci., pages 579–592. Springer, Heidelberg, 2011.
[31] Oded Goldreich and Leonid A Levin. A hard-core predicate for all one-way functions. InProceedings of the twenty-first annual ACM symposium on Theory of computing, pages25–32. ACM, 1989.
[32] Oded Goldreich and Dana Ron. On proximity-oblivious testing. SIAM J. Comput.,40(2):534–566, 2011.
[33] Parikshit Gopalan. A Fourier-analytic approach to Reed-Muller decoding. IEEE Trans.Inform. Theory, 59(11):7747–7760, 2013.
[34] Parikshit Gopalan, Ryan O’Donnell, Rocco A. Servedio, Amir Shpilka, and Karl Wim-mer. Testing Fourier dimensionality and sparsity. In Proc. 36th Annual InternationalConference on Automata, Languages, and Programming, pages 500–512, 2009.
[35] W. T. Gowers. A new proof of Szemeredi’s theorem. Geom. Funct. Anal., 11(3):465–588,2001.
[36] W. T. Gowers. Decompositions, approximate structure, transference, and the Hahn-Banach theorem. Bull. Lond. Math. Soc., 42(4):573–606, 2010.
[37] W. T. Gowers and J. Wolf. The true complexity of a system of linear equations. Proc.Lond. Math. Soc. (3), 100(1):155–176, 2010.
[38] W. T. Gowers and J. Wolf. Linear forms and higher-degree uniformity for functions onFnp . Geom. Funct. Anal., 21(1):36–69, 2011.
[39] W. T. Gowers and J. Wolf. Linear forms and quadratic uniformity for functions on Fnp .Mathematika, 57(2):215–237, 2011.
[40] Ben Green. Montreal notes on quadratic Fourier analysis. In Additive combinatorics,volume 43 of CRM Proc. Lecture Notes, pages 69–102. Amer. Math. Soc., Providence,RI, 2007.
[41] Ben Green and Terence Tao. An inverse theorem for the Gowers U3-norm. Proc. Edin.Math. Soc., 51:73–153, 2008.
[42] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions.Ann. of Math. (2), 167(2):481–547, 2008.
[43] Ben Green and Terence Tao. The distribution of polynomials over finite fields, withapplications to the Gowers norms. Contrib. Discrete Math., 4(2):1–36, 2009.
[44] Ben Green, Terence Tao, and Tamar Ziegler. An inverse theorem for the Gowers u4-norm. Glasg. Math. J., 53(1):1–50, 2011.
175
[45] Ben Green, Terence Tao, and Tamar Ziegler. An inverse theorem for the Gowers us+1[n]-norm. Ann. of Math. (2), 176(2):1231–1372, 2012.
[46] Benjamin Green and Terence Tao. Linear equations in primes. Ann. of Math. (2),171(3):1753–1850, 2010.
[47] Hamed Hatami. On generalizations of Gowers norms. PhD thesis, University of Toronto,2009.
[48] Hamed Hatami. Graph norms and Sidorenko’s conjecture. Israel Journal of Mathemat-ics, 175(1):125–150, 2010.
[49] Hamed Hatami, Pooya Hatami, and James Hirst. Limits of Boolean functions on Fnp .Electron. J. Combin., 21(4):Paper 4.2, 15, 2014.
[50] Hamed Hatami, Pooya Hatami, and Shachar Lovett. General systems of linear forms:equidistribution and true complexity. arXiv preprint arXiv:1403.7703, 2014.
[51] Hamed Hatami and Shachar Lovett. Higher-order Fourier analysis of Fnp and the com-plexity of systems of linear forms. Geom. Funct. Anal., 21(6):1331–1357, 2011.
[52] Hamed Hatami and Shachar Lovett. Estimating the distance from testable affine-invariant properties. In Foundations of Computer Science (FOCS), 2013 IEEE 54thAnnual Symposium on, pages 237–242. IEEE, 2013.
[53] Charanjit S. Jutla, Anindya C. Patthak, Atri Rudra, and David Zuckerman. Testinglow-degree polynomials over prime fields. In Proc. 45th Annual IEEE Symposium onFoundations of Computer Science, pages 423–432, 2004.
[54] Tali Kaufman and Shachar Lovett. Worst case to average case reductions for poly-nomials. Foundations of Computer Science, IEEE Annual Symposium on, 0:166–175,2008.
[55] Tali Kaufman, Shachar Lovett, and Ely Porat. Weight distribution and list-decodingsize of reed–muller codes. Information Theory, IEEE Transactions on, 58(5):2689–2696,2012.
[56] Tali Kaufman and Dana Ron. Testing polynomials over general fields. SIAM J. onComput., 36(3):779–802, 2006.
[57] Tali Kaufman and Madhu Sudan. Algebraic property testing: the role of invariance. InProc. 40th Annual ACM Symposium on the Theory of Computing, pages 403–412, 2008.
[58] Yoshiharu Kohayakawa, Vojtech Rodl, and Lubos Thoma. An optimal algorithm forchecking regularity. SIAM J. Comput., 32(5):1210–1235, 2003.
[59] Janos Komlos, Ali Shokoufandeh, Miklos Simonovits, and Endre Szemeredi. The reg-ularity lemma and its applications in graph theory. In Theoretical aspects of computerscience (Tehran, 2000), volume 2292 of Lecture Notes in Comput. Sci., pages 84–112.Springer, Berlin, 2002.
176
[60] Daniel Kral, Oriol Serra, and Lluis Vena. A removal lemma for systems of linear equa-tions over finite fields. Israel J. Math., 187:193–207, 2012.
[61] Laszlo Lovasz and Balazs Szegedy. Szemeredi’s lemma for the analyst. Geom. Funct.Anal., 17(1):252–270, 2007.
[62] Shachar Lovett. Unconditional pseudorandom generators for low degree polynomials.Theory of Computing, 5(1):69–82, 2009.
[63] Shachar Lovett, Roy Meshulam, and Alex Samorodnitsky. Inverse conjecture for theGowers norm is false. Theory Comput., 7:131–145, 2011.
[64] Dana Ron. Algorithmic and analysis techniques in property testing. Foundations andTrends in Theoretical Computer Science, 5(2):73–205, 2009.
[65] Ronitt Rubinfeld. Sublinear time algorithms. In Proceedings of International Congressof Mathematicians 2006, volume 3, pages 1095–1110, 2006.
[66] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials withapplications to program testing. SIAM J. Comput., 25(2):252–271, 1996.
[67] Imre Z. Ruzsa and Endre Szemeredi. Triple systems with no six points carrying three tri-angles. In Combinatorics (Proc. Fifth Hungarian Colloq., Keszthely, 1976), Vol. II, vol-ume 18 of Colloq. Math. Soc. Janos Bolyai, pages 939–945. North-Holland, Amsterdam-New York, 1978.
[68] Alex Samorodnitsky. Low-degree tests at large distances. In STOC’07—Proceedings ofthe 39th Annual ACM Symposium on Theory of Computing, pages 506–515. ACM, NewYork, 2007.
[69] Asaf Shapira. Green’s conjecture and testing linear-invariant properties. In STOC’09—Proceedings of the 2009 ACM International Symposium on Theory of Computing, pages159–166. ACM, New York, 2009.
[70] Madhu Sudan. Invariance in property testing. Technical Report 10-051, ElectronicColloquium in Computational Complexity, March 2010.
[71] Balazs Szegedy. Gowers norms, regularization and limits of functions on abelian groups.arXiv preprint arXiv:1010.6211, 2010.
[72] Balazs Szegedy. On higher order fourier analysis. arXiv preprint arXiv:1203.2260, 2012.
[73] E. Szemeredi. On sets of integers containing no k elements in arithmetic progression.Acta Arith., 27:199–245, 1975. Collection of articles in memory of Juriui VladimirovivcLinnik.
[74] Endre Szemeredi. Regular partitions of graphs. In Problemes combinatoires et theoriedes graphes (Colloq. Internat. CNRS, Univ. Orsay, Orsay, 1976), volume 260 of Colloq.Internat. CNRS, pages 399–401. CNRS, Paris, 1978.
[75] Terence Tao. Structure and randomness in combinatorics. In Foundations of ComputerScience, 2007. FOCS’07. 48th Annual IEEE Symposium on, pages 3–15. IEEE, 2007.
177
[76] Terence Tao. Higher order Fourier analysis, volume 142. American Mathematical Soc.,2012.
[77] Terence Tao and Tamar Ziegler. The inverse conjecture for the Gowers norm over finitefields via the correspondence principle. Anal. PDE, 3(1):1–20, 2010.
[78] Terence Tao and Tamar Ziegler. The inverse conjecture for the Gowers norm over finitefields in low characteristic. Ann. Comb., 16(1):121–188, 2012.
[79] Madhur Tulsiani and Julia Wolf. Quadratic Goldreich-Levin theorems. In Foundationsof Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, pages 619–628.IEEE, 2011.
[80] Emmanuele Viola. The sum of D small-bias generators fools polynomials of degree D.Computational Complexity, 18(2):209–217, 2009.
[81] Yuichi Yoshida. A characterization of locally testable affine-invariant properties viadecomposition theorems. In Proceedings of the 46th Annual ACM Symposium on Theoryof Computing, pages 154–163. ACM, 2014.
[82] Yuichi Yoshida. Gowers norm, function limits, and parameter estimation. arXiv preprintarXiv:1410.5053, 2014.
[83] Yufei Zhao. An arithmetic transference proof of a relative Szemeredi theorem. Math.Proc. Cambridge Philos. Soc., 156(2):255–261, 2014.
178