
22 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 1, JANUARY 2003

Hardness of Approximating the Minimum Distance of a Linear Code

Ilya Dumer, Member, IEEE, Daniele Micciancio, Member, IEEE, and Madhu Sudan, Member, IEEE

Abstract—We show that the minimum distance of a linear code is not approximable to within any constant factor in random polynomial time (RP), unless nondeterministic polynomial time (NP) equals RP. We also show that the minimum distance is not approximable to within an additive error that is linear in the block length of the code. Under the stronger assumption that NP is not contained in random quasi-polynomial time (RQP), we show that the minimum distance is not approximable to within the factor 2^{log^{1−ε} n}, for any ε > 0. Our results hold for codes over any finite field, including binary codes. In the process, we show that it is hard to find approximately nearest codewords even if the number of errors exceeds the unique decoding radius d/2 by only an arbitrarily small fraction εd. We also prove the hardness of the nearest codeword problem for asymptotically good codes, provided the number of errors exceeds (2/3)d. Our results for the minimum distance problem strengthen (though using stronger assumptions) a previous result of Vardy who showed that the minimum distance cannot be computed exactly in deterministic polynomial time (P), unless P = NP. Our results are obtained by adapting proofs of analogous results for integer lattices due to Ajtai and Micciancio. A critical component in the adaptation is our use of linear codes that perform better than random (linear) codes.

Index Terms—Approximation algorithms, bounded distance decoding, computational complexity, dense codes, linear codes, minimum-distance problem, NP-hardness, relatively near codeword problem.

I. INTRODUCTION

I N this paper, we study the computational complexity of two central problems from coding theory: 1) the complexity of approximating the minimum distance of a linear code and 2) the complexity of error correction in codes of relatively large minimum distance. An error-correcting code of block length n over a q-ary alphabet is a collection of strings (vectors) from

Manuscript received November 24, 2000; revised September 20, 2002. The work of I. Dumer was supported in part by the National Science Foundation under Grants NCR-9703844 and CCR-0097125. The work of D. Micciancio was supported in part by the National Science Foundation CAREER Award CCR-0093029. The work of M. Sudan was supported in part by a Sloan Foundation Fellowship, an MIT-NEC Research Initiation Grant, and the National Science Foundation CAREER Award CCR-9875511. The material in this paper was presented in part at the 40th IEEE Annual Symposium on Foundations of Computer Science (FOCS), New York, October 1999, and the IEEE International Symposium on Information Theory (ISIT), Sorrento, Italy, June 2000.

I. Dumer is with the College of Engineering, University of California at Riverside, Riverside, CA 92521 USA (e-mail: [email protected]).

D. Micciancio is with the Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093 USA (e-mail: [email protected]).

M. Sudan is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).

Communicated by N. I. Koblitz, Associate Editor for Complexity and Cryptography.

Digital Object Identifier 10.1109/TIT.2002.806118

F_q^n, called codewords. For all codes considered in this paper, the alphabet size q is always a prime power and the alphabet is the finite field F_q with q elements. A code is linear if it is closed under addition and multiplication by a scalar, i.e., the code 𝒜 is a linear subspace of F_q^n over the base field F_q. For such a code, the information content k = log_q |𝒜| (i.e., the number of information symbols that can be encoded with a codeword1) is just its dimension as a vector space, and the code can be compactly represented by a generator matrix A ∈ F_q^{k×n} of rank k such that 𝒜 = {xA : x ∈ F_q^k}. An important property of a code is its minimum distance. For any vector v ∈ F_q^n, the Hamming weight wt(v) of v is the number of nonzero coordinates of v. The weight function is a norm, and the induced metric d(x, y) = wt(x − y) is called the Hamming distance. The (minimum) distance of the code is the minimum Hamming distance d(𝒜) taken over all pairs of distinct codewords x, y ∈ 𝒜. For linear codes, it is easy to see that the minimum distance equals the weight of the lightest nonzero codeword. If 𝒜 is a linear code over F_q with block length n, rank k, and minimum distance d, then it is customary to say that 𝒜 is an [n, k, d]_q linear code. Throughout the paper, we use the following notational conventions: matrices are denoted by boldface uppercase Roman letters (e.g., A, B), the associated linear codes are denoted by the corresponding calligraphic letters (e.g., 𝒜, ℬ), and we write 𝒜 = [n, k, d]_q to mean that 𝒜 is a linear code over F_q with block length n, information content k, and minimum distance d.
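To make the definitions concrete, the following toy sketch (illustrative only, not from the paper; it assumes a prime field so that arithmetic mod q is field arithmetic) computes the minimum distance of a linear code by brute-force enumeration of all q^k codewords xA, using the generator matrix of the binary [7, 4, 3] Hamming code as an example. The exponential cost of this enumeration is exactly what makes computing the distance an interesting computational problem.

```python
import itertools

def codewords(G, q):
    """Enumerate all codewords x*G (mod q) of the linear code generated by G."""
    k, n = len(G), len(G[0])
    for x in itertools.product(range(q), repeat=k):
        yield tuple(sum(x[i] * G[i][j] for i in range(k)) % q for j in range(n))

def min_distance(G, q):
    """For a linear code, minimum distance = weight of lightest nonzero codeword."""
    return min(sum(1 for c in w if c != 0)
               for w in codewords(G, q) if any(w))

# Generator matrix of the binary [7, 4] Hamming code, whose minimum distance is 3.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
print(min_distance(G, 2))  # 3
```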

A. The Minimum-Distance Problem

Three of the four central parameters associated with a linear code, namely, n, k, and q, are evident from its matrix representation. The minimum distance problem (MINDIST) is that of evaluating the fourth: given a generator matrix A, find the minimum distance d(𝒜) of the corresponding code 𝒜. The minimum distance of a code is obviously related to its error-correction capability and, therefore, finding d(𝒜) is a fundamental computational problem in coding theory. The problem gains even more significance in light of the fact that long q-ary codes chosen at random give the best parameters known for any q < 46 (in particular, for q = 2).2 A polynomial-time algorithm to compute the distance would be the ideal solution to the problem, as it could be used to construct good error-correcting codes by choosing a generator matrix at random and checking

1 Throughout the paper, we write log for the logarithm to the base 2, and log_q when the base q is any number possibly different from 2.

2 For square prime powers q ≥ 49, linear algebraic-geometry (AG) codes can perform better than random ones [21] and are constructible in polynomial time. For all other q ≥ 46, it is still possible to do better than random codes; however, the best known procedures to construct them run in exponential time [24].

0018-9448/03$17.00 © 2003 IEEE


if the associated code has a large minimum distance. Unfortunately, no such algorithm is known. The complexity of this problem (can it be solved in polynomial time or not?) was first explicitly questioned by Berlekamp, McEliece, and van Tilborg [5] in 1978, who conjectured it to be nondeterministic polynomial time (NP)-complete. This conjecture was finally resolved in the affirmative by Vardy [23], [22] in 1997, proving that the minimum distance cannot be computed in polynomial time unless P = NP. ([23], [22] also give further motivation and a detailed account of prior work on this problem.)

To advance the search for good codes, one can relax the requirement of computing d(𝒜) exactly in two ways.

• Instead of requiring the exact value of d, one can allow for approximate solutions, i.e., an estimate d̂ that is guaranteed to be at least as big, but not much bigger, than the true minimum distance. (For example, d ≤ d̂ ≤ γ·d for some approximation factor γ.)

• Instead of insisting on deterministic solutions that always produce correct (approximate) answers, one can consider randomized algorithms such that d̂ ≤ γ·d only holds in a probabilistic sense. (Say, with high probability.)3

Such algorithms can still be used to randomly generate relatively good codes as follows. Say we want a code with minimum distance d. We pick a generator matrix at random such that the code is expected to have minimum distance γ·d. Then, we run the probabilistic distance-approximation algorithm many times using independent coin tosses. If all estimates returned by the distance-approximation algorithm are at least γ·d, then the minimum distance of the code is at least d with very high probability.

In this paper, we study these more relaxed versions of the minimum distance problem and show that the minimum distance is hard to approximate (even in a probabilistic sense) to within any constant factor, unless NP = RP (random polynomial time), i.e., every problem in NP has a polynomial-time probabilistic algorithm that always rejects NO instances and accepts YES instances with high probability. Under the stronger assumption that NP does not have random quasi-polynomial4 time algorithms (RQP), we prove that the minimum distance of a code of block length n is not approximable to within a factor of 2^{log^{1−ε} n} for any constant ε > 0. (This is a naturally occurring factor in the study of the approximability of optimization problems; see the survey of Arora and Lund [4].) Our methods adapt the proof of the inapproximability of the shortest lattice vector problem (SVP) due to Micciancio [18] (see also [19]) which, in turn, is based on Ajtai's proof of the hardness of solving SVP [2].

B. The Error-Correction Problem

In the process of obtaining the inapproximability result for the minimum-distance problem, we also shed light on the general

3 One can also consider randomized algorithms with two-sided error, where also the lower bound d̂ ≥ d holds only in a probabilistic sense. All our results can be easily adapted to algorithms with two-sided error.

4 f(n) is quasi-polynomial in n if it grows slower than 2^{log^c n} for some constant c.

error-correction problem for linear codes. The simplest formulation is the nearest codeword problem (NCP) (also known as the "maximum-likelihood decoding problem"). Here, the input instance consists of a generator matrix A and a received word v, and the goal is to find the codeword nearest to v. A more relaxed version is to estimate the minimum "error weight" d(v, 𝒜), that is, the distance from v to the nearest codeword, without necessarily finding that codeword. The NCP is a well-studied problem: Berlekamp, McEliece, and van Tilborg [5] showed that it is NP-hard (even in its weight-estimation version); and more recently, Arora, Babai, Stern, and Sweedyk [3] showed that the error weight is hard to approximate to within any constant factor unless P = NP, and within factor 2^{log^{1−ε} n} for any ε > 0, unless NP ⊆ QP (deterministic quasi-polynomial time). This latter result has recently been improved to inapproximability within n^{1/log log n} under the assumption that P ≠ NP by Dinur, Kindler, Raz, and Safra [9], [8]. On the positive side, NCP can be trivially approximated within a factor n. General, nontrivial approximation algorithms have recently been discovered by Berman and Karpinski [6], who showed that (for any finite field F_q) NCP can be approximated within a factor εn/log n for any fixed ε > 0 in probabilistic polynomial time, and within n/log n in deterministic polynomial time.

However, the NCP only provides a first cut at understanding the error-correction problem. It shows that the error-correction problem is hard if we try to decode every linear code, regardless of the error weight. In contrast, the positive results from coding theory show how to perform error correction in specific linear codes corrupted by errors of small weight (relative to the code distance). Thus, the hardness of the NCP may come from one of two factors: 1) the problem attempts to decode every linear code and 2) the problem attempts to recover from too many errors. Both issues have been raised in the literature [23], [22], but only the former has seen some progress [7], [25], [10], [20].

One problem that has been defined to study the latter phenomenon is the "bounded distance decoding problem" (BDD, see [23], [22]). This is a special case of the NCP, where the error weight is guaranteed (or "promised") to be less than d(𝒜)/2. This case is motivated by the fact that within such a distance, there may be at most one codeword and hence decoding is clearly unambiguous. Also, this is the case where many of the classical error-correction algorithms (for, say, Bose–Chaudhuri–Hocquenghem (BCH) codes, Reed–Solomon (RS) codes, algebraic-geometry (AG) codes, etc.) work in polynomial time.

C. Relatively Near Codeword Problem (RNC)

To compare the general NCP and the more specific BDD problem, we introduce a parameterized family of problems that we call the relatively near codeword problem (RNC). For real ρ > 0, RNC^(ρ) is the following problem.

Given a generator matrix A of a linear code 𝒜 of (not necessarily known) minimum distance d(𝒜), an integer t with the promise that t < ρ·d(𝒜), and a received word


v, find a codeword within distance t from v. (The algorithm may fail if the promise is violated, or if no such codeword exists. In other words, in contrast to the papers [3] and [5], the algorithm is expected to work only when the error weight is limited in proportion to the code distance.)

Both the NCP and the BDD are special cases of RNC: NCP = RNC^(∞) while BDD = RNC^(1/2). Until recently, not much was known about RNC^(ρ) for constant ρ, let alone ρ = 1/2 (i.e., the BDD problem). No finite upper bound on ρ can be easily derived from Arora et al.'s NP-hardness proof for NCP [3]. (In other words, their proof does not seem to hold for RNC^(ρ) for any ρ < ∞.) It turns out, as observed by Jain et al. [14], that Vardy's proof of the NP-hardness of the minimum-distance problem also shows the NP-hardness of RNC^(ρ) for ρ = 1 (and actually extends to some ρ < 1). In this paper, we significantly improve upon this situation by showing NP-hardness (under randomized reductions) of RNC^(ρ) for every ρ > 1/2, bringing us much closer to an eventual (negative?) resolution of the BDD problem.

D. Organization

The rest of the paper is organized as follows. In Section II, we precisely define the coding problems studied in this paper, introduce some notation, and briefly overview the random reduction techniques and coding theory notions used in the rest of the paper. As explained in Section II, our proofs rely on coding-theoretic constructions that might be of independent interest, namely, the construction of codes containing unusually dense clusters. These constructions constitute the main technical contribution of the paper, and are presented in Section III (for arbitrary linear codes) and Section VI (for asymptotically good codes). The hardness of the relatively near codeword problem and the minimum-distance problem are proved in Sections IV and V, respectively. Similar hardness results for asymptotically good codes are presented in Section VII. The reader mostly interested in computational complexity may want to skip Section III at first reading, and jump directly to the NP-hardness results in Sections IV and V. Section VIII concludes with a discussion of the consequences and limitations of the proofs given in this paper, and related open problems.

II. BACKGROUND

A. Approximation Problems

In order to study the computational complexity of coding problems, we formulate them in terms of promise problems. A promise problem is a generalization of the familiar notion of decision problem. The difference is that in a promise problem not every string is required to be either a YES or a NO instance. Given a string with the promise that it is either a YES or a NO instance, one has to decide which of the two sets it belongs to.

The following promise problem captures the hardness of approximating the minimum-distance problem within a factor γ.

Definition 1—Minimum Distance Problem: For prime power q and approximation factor γ ≥ 1, an instance of GAPDIST_γ is a pair (A, d), where A ∈ F_q^{k×n} and d is an integer, such that

• (A, d) is a YES instance if d(𝒜) ≤ d.

• (A, d) is a NO instance if d(𝒜) > γ·d.
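As a toy illustration (not part of the paper), the promise structure of Definition 1 amounts to a three-way classification: an instance is a YES instance, a NO instance, or falls outside the promise, in which case a decision oracle may answer arbitrarily. The helper below is hypothetical and takes the true minimum distance as input, something no efficient algorithm is known to compute.

```python
def gapdist_classify(true_distance, d, gamma):
    """Classify a GAPDIST_gamma instance (A, d), given the true minimum
    distance d(C) of the code generated by A:
    YES if d(C) <= d, NO if d(C) > gamma*d, otherwise outside the promise."""
    if true_distance <= d:
        return "YES"
    if true_distance > gamma * d:
        return "NO"
    return "neither"  # an oracle may answer arbitrarily here

print(gapdist_classify(4, 5, 2))   # YES      (4 <= 5)
print(gapdist_classify(11, 5, 2))  # NO       (11 > 10)
print(gapdist_classify(8, 5, 2))   # neither  (5 < 8 <= 10: promise violated)
```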

In other words, given a code 𝒜 and an integer d with the promise that either d(𝒜) ≤ d or d(𝒜) > γ·d, one must decide which of the two cases holds true. The relation between approximating the minimum distance of 𝒜 and the above promise problem is easily explained. On the one hand, if one can compute a γ-approximation d̂ (with d(𝒜) ≤ d̂ ≤ γ·d(𝒜)) to the minimum distance of the code, then one can easily solve the promise problem above by checking whether d̂ ≤ γ·d or not. On the other hand, assume one has a decision oracle that solves the promise problem above. Then, the minimum distance of a given code can be easily approximated using the oracle as follows. Notice that (A, n) always returns YES while (A, 0) always returns NO. Using binary search, one can efficiently find a number d such that (A, d) returns YES and (A, d − 1) returns NO.5 This means that (A, d) is not a NO instance and (A, d − 1) is not a YES instance, and the minimum distance must lie in the interval [d, γ·d].
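The binary-search argument above can be sketched in code. The following is an illustrative simulation (the oracle here is a stand-in lambda, not an actual GAPDIST solver): maintaining the invariant that the lower endpoint answers NO and the upper endpoint answers YES, the search terminates at a value d with oracle(d) = YES and oracle(d − 1) = NO, so the true minimum distance lies in [d, γ·d] regardless of how the oracle behaves inside the promise gap.

```python
def approximate_distance(oracle, n, gamma):
    """Binary search with a GAPDIST_gamma decision oracle:
    oracle(d) is True whenever d(C) <= d, False whenever d(C) > gamma*d,
    and may be arbitrary in between.  Returns an interval containing d(C)."""
    lo, hi = 0, n          # oracle(n) is always YES, oracle(0) is always NO
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle(mid):
            hi = mid       # keep the invariant: oracle(hi) = YES
        else:
            lo = mid       # keep the invariant: oracle(lo) = NO
    return hi, gamma * hi  # d(C) lies in [hi, gamma*hi]

# Stand-in oracle for a code with true distance 10 and gamma = 2:
print(approximate_distance(lambda d: d >= 10, 100, 2))  # (10, 20)
```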

Similarly, we can define the following promise problem to capture the hardness of approximating RNC^(ρ) within a factor γ.

Definition 2—Relatively Near Codeword Problem: For prime power q and factors γ ≥ 1 and ρ > 0, an instance of GAPRNC^(ρ)_γ is a triple (A, v, t), where A ∈ F_q^{k×n}, v ∈ F_q^n, and t is an integer, such that t < ρ·d(𝒜) and6

• (A, v, t) is a YES instance if d(v, 𝒜) ≤ t.

• (A, v, t) is a NO instance if d(v, 𝒜) > γ·t.

It is immediate that the problem RNC^(ρ) gets harder as ρ increases, since changing ρ has the only effect of weakening the promise t < ρ·d(𝒜). RNC^(ρ) is the hardest when ρ = ∞, in which case the promise is vacuously true and we obtain the familiar (promise version of the) nearest codeword problem.

Definition 3—Nearest Codeword Problem: For prime power q and γ ≥ 1, an instance of GAPNCP_γ is a triple (A, v, t), where A ∈ F_q^{k×n}, v ∈ F_q^n, and t is an integer, such that

• (A, v, t) is a YES instance if d(v, 𝒜) ≤ t.

• (A, v, t) is a NO instance if d(v, 𝒜) > γ·t.

The promise problem GAPNCP_γ is NP-hard for every constant γ ≥ 1 (cf. [3]7), and this result is critical to our hardness result(s). We define one last promise problem to study the

5 By definition, the oracle can give any answer if the input is neither a YES instance nor a NO one. So, it would be wrong to conclude that (A, d − 1) is a NO instance and (A, d) is a YES one.

6 Strictly speaking, the condition t < ρ·d(𝒜) is a promise and hence should be added as a condition in both the YES and NO instances of the problem.

7 To be precise, Arora et al. [3] present the result only for binary codes. However, their proof is valid for any alphabet. An alternate way to obtain the result for any prime power is to use a recent result of Håstad [13] who states his result in linear-algebra (rather than coding-theoretic) terms. We will state and use some of the additional features of the latter result in Section VII.


hardness of approximating the minimum distance of a code with linear additive error.

Definition 4: For ε > 0 and prime power q, let GAPDISTADD_ε be the promise problem with instances (A, d), where A ∈ F_q^{k×n} and d is an integer, such that

• (A, d) is a YES instance if d(𝒜) ≤ d.

• (A, d) is a NO instance if d(𝒜) > d + ε·n.

B. Random Reductions and Techniques

The main result of this paper (see Theorem 22) is that the approximation problem GAPDIST_γ is NP-hard for any constant factor γ ≥ 1 under polynomial reverse unfaithful random reductions (RUR-reductions, [15]), and for γ(n) = 2^{log^{1−ε} n} (for any ε > 0) it is hard under quasi-polynomial RUR-reductions. These are probabilistic reductions that map NO instances always to NO instances and YES instances to YES instances with high probability. In particular, given a security parameter s, all reductions presented in this paper warrant that YES instances are properly mapped with probability at least 1 − 2^{−s} in poly(n, s) time.8 The existence of a (random) polynomial-time algorithm to solve the hard problem would imply NP = RP, i.e., every problem in NP would have a probabilistic polynomial-time algorithm that always rejects NO instances and accepts YES instances with high probability.9 Similarly, hardness for NP under quasi-polynomial RUR-reductions implies that the hard problem cannot be solved in RQP unless NP ⊆ RQP. Therefore, similarly to a proper NP-hardness result (obtained under deterministic polynomial reductions), hardness under polynomial RUR-reductions also gives evidence of the intractability of a problem.

In order to prove these results, we first study the problem GAPRNC^(ρ)_γ. We show that the error weight is hard to approximate to within any constant factor γ and for any ρ > 1/2, unless NP = RP (see Theorem 16). By using ρ = 1/γ, we immediately reduce our error-correction problem GAPRNC^(ρ)_γ to the minimum-distance problem GAPDIST_γ for any constant γ < 2. We then use product constructions to "amplify" the constant and prove the claimed hardness results for the minimum-distance problem.

The hardness of GAPRNC^(ρ)_γ for ρ > 1/2 is obtained by adapting a technique of Micciancio [18], which is, in turn, based on the work of Ajtai [2] (henceforth Ajtai–Micciancio). They consider the analogous problem over the integers (rather

8 Here, and in the rest of the paper, we use the notation poly(n) to denote any polynomially bounded function of n, i.e., any function f(n) such that f(n) = O(n^c) for some constant c independent of n.

9 Notice that we are unable to prove that the existence of a polynomial-time approximation algorithm implies the stronger containment NP ⊆ ZPP (ZPP is the class of decision problems L such that both L and its complement are in RP, i.e., ZPP = RP ∩ coRP), as done, for example, in [12]. The difference is that [12] proves hardness under unfaithful random reductions (UR-reductions [15], i.e., reductions that are always correct on YES instances and often correct on NO instances). Hardness under UR-reductions implies that no polynomial-time algorithm exists unless NP ⊆ coRP, and, therefore, RP ⊆ NP ⊆ coRP. It immediately follows that coRP ⊆ RP and NP = RP = coRP = ZPP. However, in our case where RUR-reductions are used, we can only conclude that RP ⊆ NP ⊆ RP, and therefore NP = RP, but it is not clear how to establish any relation involving the complementary classes coRP and ZPP.

than finite fields) with Hamming distance replaced by Euclidean distance. Much of the adaptation is straightforward; in fact, some of the proofs are even easier in our case due to the use of finite fields. The main hurdle turns out to be in adapting the following combinatorial problem considered and solved by Ajtai–Micciancio.

Given an integer k, construct, in poly(k) time, integers m and d, an m-dimensional lattice L (i.e., a subset of Z^m closed under addition and multiplication by an integer) with minimum (Euclidean) distance d, and a vector v such that the (Euclidean) ball of radius ρ·d around v contains at least 2^k vectors from L (where ρ < 1 is a constant independent of k).

In our case, we are faced with a similar problem with Z^m replaced by F_q^n and Euclidean distance replaced by Hamming distance. The Ajtai–Micciancio solution to the above problem involves number-theoretic methods and does not translate to our setting. Instead, we show that if we consider a linear code whose performance (i.e., tradeoff between rate and distance) is better than that of a random code, and pick a random light vector, then the resulting construction has the required properties. We first solve this problem over sufficiently large alphabets using high-rate RS codes. (See the next subsection for the definition of RS codes. This same construction has been used in the coding theory literature to demonstrate limitations to the "list decodability" of RS codes [16].) We then translate the result to small alphabets using the well-known method of concatenating codes [11].

In the second part of the paper, we extend our methods to address asymptotically good codes. We show that even for such codes, the RNC is hard unless NP = RP (see Theorem 31), though for these codes we are only able to prove the hardness of GAPRNC^(ρ)_γ for ρ > 2/3. Finally, we translate this to a result (see Theorem 32) showing that the minimum distance of a code is hard to approximate to within an additive error that is linear in the block length of the code.

C. Coding Theory

For a thorough introduction to coding theory, the reader is referred to [17]. Here we briefly review some basic results used in the rest of the paper. Readers with an adequate background in coding theory can safely skip to the next section.

A classical problem in coding theory is to determine, for a given alphabet F_q, block length n, and dimension k, the highest possible minimum distance d such that there exists an [n, k, d]_q linear code. The following two theorems give well-known upper and lower bounds for d.

Theorem 5—Gilbert–Varshamov (GV) Bound: For finite field F_q, block length n, and dimension k, there exists an [n, k, d]_q linear code with minimum distance d satisfying

    V_q(n, d) ≥ q^{n−k}    (1)

where

    V_q(n, r) = sum_{i=0}^{r} C(n, i)·(q − 1)^i


is the volume of the n-dimensional Hamming sphere of radius r. For any k, the smallest d such that (1) holds true is denoted d_GV(k). (Notice that the definition of d_GV(k) depends both on the information content k and the block length n. For brevity, we omit n from the notation, as the block length is usually clear from the context.)

See [17] for a proof. The lower bound on d given by the Gilbert–Varshamov theorem (GV bound, for short) is not effective, i.e., even if the theorem guarantees the existence of linear codes with a certain minimum distance, the proof of the theorem employs a greedy algorithm that runs in exponential time, and we do not know any efficient way to find such codes for small alphabets. Interestingly, random linear codes meet the GV bound, i.e., if the generator matrix of the code is chosen uniformly at random, then the minimum distance of the resulting code satisfies the GV bound with high probability. However, given a randomly chosen generator matrix, it is not clear how to efficiently check whether the corresponding code meets the GV bound or not. We now give an upper bound on d.
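For concreteness, the quantities in Theorem 5 are easy to compute directly. The sketch below (illustrative, not from the paper) evaluates the sphere volume V_q(n, r) and finds d_GV(k) as the smallest d satisfying inequality (1) in the form V_q(n, d) ≥ q^{n−k} used above.

```python
from math import comb

def sphere_volume(q, n, r):
    """V_q(n, r): number of vectors of F_q^n within Hamming distance r of a point."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def d_gv(q, n, k):
    """Smallest d with V_q(n, d) >= q^(n - k), i.e., d_GV(k) per inequality (1)."""
    d = 0
    while sphere_volume(q, n, d) < q ** (n - k):
        d += 1
    return d

# Rate-1/2 binary codes of length 100: random linear codes achieve this distance.
print(d_gv(2, 100, 50))  # 12
```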

Theorem 6—Singleton Bound: For every $[n, k]_q$ linear code, the minimum distance is at most

$$d \le n - k + 1.$$

See [17] for a proof. A code is called maximum distance separable (MDS) if the Singleton bound is satisfied with equality, i.e., if $d = n - k + 1$. Reed–Solomon (RS) codes are an important example of MDS codes. For any finite field $\mathbb{F}_q$ and dimension $k \le q - 1$, the RS code $[q-1, k]_q$ can be defined as the set of $(q-1)$-dimensional vectors obtained by evaluating all polynomials

$$p(x) = a_0 + a_1 x + \cdots + a_{k-1}x^{k-1}$$

of degree less than $k$ at all nonzero points $x \in \mathbb{F}_q \setminus \{0\}$. Extended RS codes $[q, k]_q$ are defined similarly, but evaluating the polynomials at all points $x \in \mathbb{F}_q$, including $x = 0$. Since a nonzero polynomial of degree at most $k - 1$ can have at most $k - 1$ zeros, every nonzero (extended) RS codeword has at most $k - 1$ zero positions, so its Hamming weight is at least $q - k$ (resp., $q - k + 1$). This proves that (extended) RS codes are MDS. Extended RS codes can be further extended by considering the homogeneous polynomials

$$p(x, y) = a_0 y^{k-1} + a_1 x\,y^{k-2} + \cdots + a_{k-1} x^{k-1}$$

and evaluating them at all points of the projective line. This increases both the block length and the minimum distance by one, giving twice-extended RS codes $[q+1, k]_q$.
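As a concrete sanity check of the MDS property, the following sketch (our own illustration, using the small prime field $q = 5$ and dimension $k = 2$) builds an extended RS code by direct polynomial evaluation and verifies that its minimum weight equals $n - k + 1$.

```python
from itertools import product

def poly_eval(coeffs, x, q):
    # Evaluate the polynomial with the given coefficients at x over F_q
    return sum(c * x ** i for i, c in enumerate(coeffs)) % q

def extended_rs(q, k):
    # Extended RS code: evaluate every polynomial of degree < k at all q points
    return [tuple(poly_eval(c, x, q) for x in range(q))
            for c in product(range(q), repeat=k)]

code = extended_rs(5, 2)
min_wt = min(sum(v != 0 for v in cw) for cw in code if any(cw))
# Singleton bound met with equality: d = n - k + 1 = 5 - 2 + 1 = 4
```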

Another family of codes we are going to use are the Hadamard codes $H_k = [q^k - 1, k]_q$. In these codes, each codeword corresponds to a vector $x \in \mathbb{F}_q^k$. The codeword associated to $x$ is obtained by evaluating all nonzero linear functions $L \colon \mathbb{F}_q^k \to \mathbb{F}_q$ at $x$. Since there are $q^k$ linear functions in $k$ variables (including the zero function), the code has block length $q^k - 1$. Notice that any nonzero vector $x$ is mapped to $0$ by exactly a fraction $1/q$ of the linear functions (including the identically zero function). So every nonzero codeword has weight exactly $q^k - q^{k-1} = q^{k-1}(q-1)$ (in particular, the code is equidistant), and the minimum distance of the code is $q^{k-1}(q-1)$.
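The equidistance of Hadamard codes is easy to verify computationally; here is a small illustrative sketch (the function name `hadamard` is ours) for $q = 2$ and $k = 3$, which yields the $[7, 3, 4]$ binary simplex code.

```python
from itertools import product

def hadamard(q, k):
    # Hadamard code H_k over F_q: evaluate all nonzero linear functionals at x
    points = [a for a in product(range(q), repeat=k) if any(a)]
    def encode(x):
        return tuple(sum(ai * xi for ai, xi in zip(a, x)) % q for a in points)
    return [encode(x) for x in product(range(q), repeat=k)]

code = hadamard(2, 3)                                    # [7, 3] over F_2
weights = {sum(1 for v in cw if v) for cw in code if any(cw)}
# Every nonzero codeword has weight q^{k-1}(q-1) = 4: the code is equidistant.
```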

In Section VI, we will also use algebraic-geometry (AG) codes. The definition of these codes is beyond the scope of the present paper, and the interested reader is referred to [21].

An important operation used to combine codes is the concatenating code construction of [11]. Let $C_{\mathrm{out}}$ and $C_{\mathrm{in}}$ be two linear codes over alphabets $\mathbb{F}_{q^m}$ and $\mathbb{F}_q$, with generator matrices $G_{\mathrm{out}}$ and $G_{\mathrm{in}}$. $C_{\mathrm{out}}$ is called the outer code, and $C_{\mathrm{in}}$ is called the inner code. Notice that the dimension $m$ of the inner code equals the dimension of the alphabet of the outer code $\mathbb{F}_{q^m}$, viewed as a vector space over base field $\mathbb{F}_q$. The idea is to replace every component of an outer codeword with a corresponding inner codeword. More precisely, let $\alpha_1, \ldots, \alpha_m$ be a basis for $\mathbb{F}_{q^m}$ as a vector space over $\mathbb{F}_q$, and let $\gamma \colon \mathbb{F}_{q^m} \to C_{\mathrm{in}}$ be the (unique) linear function such that $\gamma(\alpha_i)$ equals the $i$th row of $G_{\mathrm{in}}$ for all $i = 1, \ldots, m$. The concatenation function $\sigma$ is given by

$$\sigma(x_1, \ldots, x_N) = (\gamma(x_1), \ldots, \gamma(x_N)).$$

Function $\sigma$ is extended to sets of vectors in the usual way, and the concatenated code $\sigma(C_{\mathrm{out}})$ is just the result of applying the concatenation function to set $C_{\mathrm{out}}$. It is easy to see that if $C_{\mathrm{out}}$ is an $[N, K]_{q^m}$ code and $C_{\mathrm{in}}$ is an $[n, m]_q$ code, then the concatenation $\sigma(C_{\mathrm{out}})$ is an $[nN, mK]_q$ code with generator matrix $G$ whose rows are

$$\sigma(\alpha_j \cdot G_{\mathrm{out}}[i])$$

for all $i = 1, \ldots, K$ and $j = 1, \ldots, m$, where $G_{\mathrm{out}}[i]$ denotes the $i$th row of $G_{\mathrm{out}}$.
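The concatenation function $\sigma$ can be illustrated on a toy example. The sketch below (our own, not from [11]) concatenates a $[2,1]_4$ repetition outer code with a $[3,2]_2$ inner code, representing each $\mathbb{F}_4$ element by its coordinate bits over the basis $\{1, \alpha\}$, and checks that the resulting $[6,2]_2$ code has distance $d_{\mathrm{out}} \cdot d_{\mathrm{in}} = 2 \cdot 2 = 4$.

```python
# Inner [3,2]_2 code: one generator row per basis element {1, alpha} of F_4
G_in = [(1, 0, 1), (0, 1, 1)]

def gamma(x):
    # Map an F_4 element (encoded as the integer b0 + 2*b1) to its inner codeword
    b = (x & 1, x >> 1)
    return tuple(sum(bi * g for bi, g in zip(b, col)) % 2 for col in zip(*G_in))

def concat(outer_codeword):
    # Concatenation function sigma: replace each outer symbol by gamma(symbol)
    return tuple(v for x in outer_codeword for v in gamma(x))

outer = [(x, x) for x in range(4)]           # [2,1]_4 repetition code, d_out = 2
code = [concat(cw) for cw in outer]          # [6,2]_2 concatenated code
d = min(sum(cw) for cw in code if any(cw))   # minimum distance = 2 * 2 = 4
```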

III. DENSE CODES

In this section, we present some general results about codes with a special density property (to be defined) and their algorithmic construction. The section culminates with the proof of Lemma 15, which shows how to efficiently construct a gadget that will be used in Section IV in our NP-hardness proofs. Lemma 15 is, in fact, the only result from this section directly used in the rest of the paper (with the exception of Section VI, which extends the results of this section to asymptotically good codes). The reader mostly interested in computational complexity issues may want to skip this section at first reading, and refer to Lemma 15 when it is used in the proofs.

A. General Overview

Let $B(v, r) = \{w \in \mathbb{F}_q^n : d(v, w) \le r\}$ be the ball of radius $r$ centered in $v$.^{10} In this section, we wish to find codes $C$ that include multiple codewords in some ball(s) $B(v, r)$ of relatively small radius $r = \rho d$, where $\rho$ is some positive real number and $d$ is the minimum distance of $C$. Obviously, the problem is meaningful only for $\rho \ge 1/2$, as any ball of radius $r < d/2$ cannot contain more than a single codeword. Below we prove that for any $\rho > 1/2$ it is actually possible to build such a code. These codes are used in the sequel to prove the hardness of GAPRNC by reduction from the nearest codeword problem. Of particular interest is the case $\rho < 1$ (i.e., $r < d$), as the corresponding

^{10}All proofs in this paper are still valid if $B(v, r)$ is defined as the set of words at distance exactly $r$ from $v$. This property is not used in this paper, but it is needed, for example, in [20].



hardness result for GAPRNC can be translated into an inapproximability result for the minimum distance problem.

We say that a code $C$ is $(\rho, \ell)$-dense around $v$ if the ball $B(v, \rho d)$ contains at least $\ell$ codewords. We say that $C$ is $(\rho, \ell)$-dense if it is $(\rho, \ell)$-dense around $v$ for some $v$. We want to determine for what values of the parameters $\rho$ and $\ell$ there exist $(\rho, \ell)$-dense codes. The technique we use is probabilistic: we show that there exist codes $C$ such that the expected number of codewords in a randomly chosen sphere is at least $\ell$. It easily follows, by a simple averaging argument, that there exists a center $v$ such that the code is $(\rho, \ell)$-dense around $v$. Let $N(C, r)$

be the expected number of codewords in $B(v, r)$ when the center $v$ is chosen uniformly at random from $\mathbb{F}_q^n$. Notice that

$$N(C, r) = \frac{1}{q^n}\sum_{v \in \mathbb{F}_q^n} |C \cap B(v, r)| = \frac{|C| \cdot \mathrm{Vol}_q(n, r)}{q^n}. \qquad (2)$$

This shows that the average number of codewords in a randomly chosen ball does not depend on the specific code $C$ but only on its information content $K = \log_q |C|$ (and the block length $n$, which is usually implicit and clear from the context). So, we can simply write $N(K, r)$ instead of $N(C, r)$.
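Identity (2) can be checked exhaustively on a toy code; the following sketch (our own) averages $|C \cap B(v, r)|$ over all centers $v$ and compares it with $|C| \cdot \mathrm{Vol}_q(n, r)/q^n$.

```python
from itertools import product
from math import comb

def vol(q, n, r):
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def span(G, q):
    # All codewords generated by the rows of G over F_q
    k, n = len(G), len(G[0])
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for m in product(range(q), repeat=k)}

q, n, r = 2, 5, 1
C = span([(1, 1, 0, 1, 0), (0, 1, 1, 0, 1)], q)
counts = [sum(1 for c in C if sum(a != b for a, b in zip(c, v)) <= r)
          for v in product(range(q), repeat=n)]
avg = sum(counts) / q ** n      # equals |C| * Vol_q(n, r) / q^n, as in (2)
```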

In the rest of this subsection, we show that dense codes are closely related to codes that outperform the GV bound. This connection is not explicitly used in the rest of the paper, and it is presented here only for the purpose of illustrating the intuition behind the choice of codes used in the sequel. Using function $N$, the minimum distance $d_{GV}(K)$ (for codes with information content $K$) guaranteed by the GV bound (see Theorem 5) can be rewritten as

$$d_{GV}(K) = \min\{\,d : N(K, d - 1) \ge 1\,\}.$$

In particular, for any code $C$ with information content $K$, the expected number of codewords in a random sphere of radius $d_{GV}(K) - 1$ is at least $1$, and there must exist spheres with multiple codewords. If the distance of the code exceeds the GV bound for codes with the same information content (i.e., $d > d_{GV}(K)$), then $C$ is $(\rho, \ell)$-dense for some $\rho < 1$ and $\ell \ge 2$. Several codes are known for which the minimum distance exceeds the GV bound.

• RS codes or, more generally, MDS codes, whose distances exceed $d_{GV}$ for all code rates. RS codes of rate approaching $1$ will be of particular interest. This is due to the fact that the ratio $d/d_{GV}$ grows with the code rate of RS codes and tends to $2$ for high rates. We implicitly use this fact in the sequel to build $(\rho, \ell)$-dense codes for any $\rho > 1/2$.

• Binary BCH codes, whose distances exceed $d_{GV}$ for very high code rates approaching $1$. These codes are not used in this paper, but they are an important family of codes that beat the GV bound.

• AG codes, meeting the Tsfasman–Vladut–Zink (TVZ) bound, whose distances exceed $d_{GV}$ for most code rates bounded away from $0$ and $1$, for any square $q \ge 49$. These codes are used to prove hardness results for asymptotically good codes, and for approximating the minimum distance up to an additive error.

Remark 7: The relation between dense codes and codes that exceed the GV bound can be given an exact, quantitative characterization. In particular, one can prove that for any positive integer $\ell$ and radius $r$, the expected number of codewords from $C$ in a random sphere of radius $r$ is at least $\ell$ if and only if $r \ge d_{GV}(K + \log_q \ell) - 1$.

B. Dense Code Families From MDS Codes

We are interested in sequences of $(\rho, \ell)$-dense codes with $\rho$ fixed and $\ell$ arbitrarily large. Moreover, parameter $\ell$ should be polynomially related to the block length $n_\ell$, i.e., $n_\ell \le \ell^{c}$ for some constant $c$. (Notice that $\ell \le q^{n_\ell}$ is a necessary condition because code $C_\ell$ contains at most $q^{n_\ell}$ codewords and we want $\ell$ distinct codewords in a ball.) This relation is essential to allow the construction of dense codes in time polynomial in $\ell$. Sequences of dense codes are formally defined as follows.

Definition 8: A sequence of codes $\{C_\ell\}_{\ell \ge 1}$ is called a $(\rho, \ell)$-dense code family if, for every $\ell$, $C_\ell$ is a $(\rho, \ell)$-dense code, i.e., there exists a center $v_\ell$ such that the ball $B(v_\ell, \rho\,d_\ell)$ contains $\ell$ codewords. Moreover, we say that $\{C_\ell\}$ is a polynomial code family if the block length $n_\ell$ is polynomially bounded, i.e., $n_\ell \le \ell^{c}$ for some constant $c$ independent of $\ell$.

In this subsection, we use twice-extended RS codes to build $(\rho, \ell)$-dense code families for any $\rho > 1/2$.

Lemma 9: For any $\rho > 1/2$, $\ell \ge 2$, and sequence of prime powers $q_\ell$ satisfying

$$q_\ell \ge \max\{(2r)^r\,\ell,\ 4r^2\} \qquad (3)$$

(where $d = \lceil 2/(2\rho - 1)\rceil$ and $r = \lfloor \rho d \rfloor$ are constants depending only on $\rho$), there exists a $(\rho, \ell)$-dense code family $\{C_\ell\}$ with block length $n_\ell = q_\ell + 1$; the family is polynomial whenever $q_\ell \le \ell^{c}$ for some constant $c$.

Proof: Fix the value of $\rho$ and $\ell$, and let $q = q_\ell$ be as specified in the lemma. Define $d = \lceil 2/(2\rho - 1)\rceil$ and the radius $r = \lfloor \rho d \rfloor$. Notice that, from the lower bound (3) on the alphabet size, we get

$$q \ge (2r)^r\,\ell \qquad\text{and}\qquad q \ge 4r^2. \qquad (4)$$

In particular, since $\rho > 1/2$, we have $(2\rho - 1)d \ge 2$ and $\rho d > 1$. Since $r = \lfloor \rho d\rfloor$ is an integer, it must be $r \ge 1$.

Consider an MDS code $C$ with minimum distance $d$ and block length $n = q + 1$ meeting the Singleton bound (e.g., twice-extended RS codes, see Section II). From the definition of $r$, we immediately get

$$r \le \rho\,d. \qquad (5)$$

We will prove that $N(K, r) \ge \ell$, i.e., the expected number of codewords in a randomly chosen ball of radius $r$ is at least $\ell$. It immediately follows from (5) that

$$N(K, \rho d) \ge N(K, r) \ge \ell$$

i.e., the code is $(\rho, \ell)$-dense on the average. Since MDS codes meet the Singleton bound with equality, we have

$$K = n - d + 1.$$

Therefore, the expected number of codewords in a random sphere satisfies

$$N(K, r) = \frac{q^K\,\mathrm{Vol}_q(n, r)}{q^n} = \frac{\mathrm{Vol}_q(n, r)}{q^{d-1}}. \qquad (6)$$

We want to bound this quantity. Notice that for all $n$ and $r$ we have

$$\mathrm{Vol}_q(n, r) \ge \binom{n}{r}(q-1)^r \ge \left(\frac{n(q-1)}{r}\right)^{r} \qquad (7)$$

where we have used that $\binom{n}{r} \ge (n/r)^r$ for all $n$ and $r$. A slightly stronger inequality is obtained as follows:

$$\binom{n}{r} \ge \frac{(n - r)^r}{r!}. \qquad (8)$$

Moreover, the reader can easily verify that for all $r \ge 1$

$$r! \le e\,r\left(\frac{r}{e}\right)^{r}. \qquad (9)$$

We use (8) and (9) to bound the volume of the sphere $\mathrm{Vol}_q(n, r)$ as follows:

$$\mathrm{Vol}_q(n, r) \ge \frac{\big((n - r)(q - 1)\big)^r}{r!} \ge \frac{1}{e\,r}\left(\frac{(n - r)(q - 1)\,e}{r}\right)^{r} \ge \frac{q^{2r}}{(2r)^r}$$

where in the last inequality we have used $(q + 1 - r)(q - 1) \ge q^2/2$ (which holds because $q \ge 4r^2$) and $e^{r-1} \ge r$.

Combining the bound $\mathrm{Vol}_q(n, r) \ge q^{2r}/(2r)^r$ with inequality (6) we get

$$N(K, r) \ge \frac{q^{2r - d + 1}}{(2r)^r} \ge \frac{q^{(2\rho - 1)d - 1}}{(2r)^r} \ge \frac{q}{(2r)^r} \ge \ell$$

where we have used $2r > 2\rho d - 2$, so that $2r - d + 1 \ge (2\rho - 1)d - 1 \ge 1$, together with inequality (4). This proves that our MDS codes are $(\rho, \ell)$-dense. Moreover, if $q \le \ell^{c}$ for some constant $c$, then the block length $n = q + 1$ is also polynomial in $\ell$, and $\{C_\ell\}$ is a polynomial $(\rho, \ell)$-dense family of codes.

C. Codes Over a Fixed Alphabet

Lemma 9 gives a $(\rho, \ell)$-dense code family for any fixed $\rho > 1/2$, with alphabet size $q_\ell$ growing with $\ell$. In the sequel, we wish to find a $(\rho, \ell)$-dense family of codes over some fixed alphabet $\mathbb{F}_q$. In order to keep the alphabet size fixed and still get arbitrarily large $\ell$, we take the extension field $\mathbb{F}_{q^m}$ and use the MDS codes from Lemma 9 with alphabet size $q^m$. These codes are concatenated with equidistant Hadamard codes $H_m = [q^m - 1, m]_q$ to obtain a family of dense codes over fixed alphabet $\mathbb{F}_q$. This procedure can be applied to any dense code, as described in the following lemma.

Lemma 10: Let $C$ be an arbitrary code over $\mathbb{F}_{q^m}$, and let $H_m = [q^m - 1, m]_q$ be the equidistant Hadamard code of size $q^m$. If $C$ is $(\rho, \ell)$-dense (around some center $v$), then the concatenated code $\sigma(C)$ is $(\rho, \ell)$-dense (around $\sigma(v)$).

Proof: Let $C$ be an $[N, K]_{q^m}$ code and let $H_m$ be the equidistant Hadamard code of size $q^m$. Define code $C'$ as the concatenation of $C$ and $H_m$. (See Section II for details.) The resulting concatenated code has parameters

$$C' = [\,N(q^m - 1),\ mK\,]_q.$$

Now assume $C$ is $(\rho, \ell)$-dense around some center $v$. Notice that the concatenation function $\sigma$ is injective and satisfies

$$\mathrm{wt}(\sigma(x)) = q^{m-1}(q - 1)\cdot\mathrm{wt}(x)$$

because every nonzero symbol of $x$ is mapped to a Hadamard codeword of weight exactly $q^{m-1}(q-1)$. Therefore, the ball $B(v, \rho d)$ is mapped into ball $B(\sigma(v), \rho\,q^{m-1}(q-1)\,d)$, and the number of $C$-codewords contained in $B(v, \rho d)$ equals the number of $C'$-codewords contained in $B(\sigma(v), \rho\,q^{m-1}(q-1)\,d)$. Since the minimum distance of $C'$ is exactly $q^{m-1}(q-1)\,d$, the code $C'$ is dense for the same $\rho$ and $\ell$.

By increasing the degree $m$ of the extension field $\mathbb{F}_{q^m}$, we obtain an infinite sequence of $q$-ary codes. Combining Lemmas 9 and 10 we get the following proposition.

Proposition 11: For any $\rho > 1/2$ and prime power $q$, there exists a polynomial $(\rho, \ell)$-dense family of codes over a fixed alphabet $\mathbb{F}_q$.

Proof: As in Lemma 9, fix $\rho > 1/2$ and let $r$ and $d$ be the constants (depending only on $\rho$) defined there. For every $\ell$, define $m$ to be the smallest integer such that the alphabet size $q^m$ satisfies condition (3) of Lemma 9. Notice that $q^m \le \ell^{O(1)}$, so we can invoke Lemma 9 with alphabet size $q^m$ and obtain a $(\rho, \ell)$-dense code $C_\ell$. Let $C'_\ell$ be the concatenation of $C_\ell$ with Hadamard code $H_m = [q^m - 1, m]_q$.

By Lemma 10, $C'_\ell$ is $(\rho, \ell)$-dense. Moreover, the block length of $C'_\ell$ is $(q^m + 1)(q^m - 1) = q^{2m} - 1$, which is polynomial in $\ell$, because both $q^m + 1$ and $q^m - 1$ are $\ell^{O(1)}$. This proves that $\{C'_\ell\}$ is a polynomial $(\rho, \ell)$-dense family.

D. Polynomial Construction

We proved that $(\rho, \ell)$-dense families of codes exist for any $\rho > 1/2$. In this subsection, we address two issues related to the algorithmic construction of such codes.

• Can $(\rho, \ell)$-dense codes be constructed in polynomial time? That is, is there an algorithm that on input $\ell$ outputs (in time polynomial in $\ell$) a linear code which is $(\rho, \ell)$-dense?

• Given a $(\rho, \ell)$-dense code, i.e., a code such that some ball $B(v, \rho d)$ contains at least $\ell$ codewords, can we efficiently find the center $v$ of a dense sphere?

Page 8: Hardness of approximating the minimum distance of a linear ...pdfs.semanticscholar.org/e0db/86ffd4c651df658108b19f1cf412a24395c6.pdfA code is linear if it is closed under addition

DUMER et al.: HARDNESS OF APPROXIMATING THE MINIMUM DISTANCE OF A LINEAR CODE 29

The first question is easily answered: all constructions described in the preceding subsection are polynomial in $\ell$, so the answer to the first question is "yes." The second question is not as simple, and to date we do not know any deterministic procedure that efficiently produces dense codes together with a point around which the code is dense. However, we will see that, provided the code is dense "on the average," the center of the sphere can be efficiently found at least in a probabilistic sense. We prove the following algorithmic variant of Proposition 11.

Proposition 12: For every prime power $q$ and real constant $\rho > 1/2$, there exists a probabilistic algorithm that on input two integers $\ell$ and $s$, outputs (in time polynomial in $\ell$ and $s$) three integers $n$, $k$, $r$, a generator matrix $G \in \mathbb{F}_q^{k \times n}$, and a center $v \in \mathbb{F}_q^n$ (of weight at most $r$)^{11} such that

1) $r \le \rho \cdot d(C(G))$;

2) with probability at least $1 - 2^{-s}$, the ball $B(v, r)$ contains $\ell$ or more codewords.

Before proving the proposition, we make a few observations. The codes described in Proposition 11 are the concatenation of twice-extended RS codes with Hadamard codes, and therefore they can be explicitly constructed in polynomial time. While the codes are described explicitly, the proof that the code is dense is nonconstructive: in Lemma 9, we proved that the average number of codewords in a randomly chosen sphere is large, so spheres containing a large number of codewords certainly exist, but it is not clear how to find a dense center. At a first glance, one can try to reduce the entire set of centers used in Lemma 9. In particular, it can be proved that this set can be replaced by any MDS code that includes our original MDS code. However, even with two centers left, we would still need to count the codewords in the two balls to find which one is indeed dense. To date, explicit procedures for finding a unique ball are yet unknown. Therefore, in what follows we use a probabilistic approach and show how to find a dense center with high probability.

First, let the center $v$ be chosen uniformly at random from the entire space $\mathbb{F}_q^n$. Given the expected number of codewords $N$, Markov's inequality yields

$$\Pr\{\,|C \cap B(v, r)| \ge c\,N\,\} \le 1/c$$

showing that spheres containing at most $cN$ codewords can be found with high probability $1 - 1/c$. It turns out that a similar lower bound

$$\Pr\{\,|C \cap B(v, r)| > \delta N\,\} \ge 1 - \delta$$

can be proved if we restrict our choice of $v$ to a uniformly random element of a suitable subset only. This is an instance of a quite general lemma that holds for any pair of groups $H \le G$.^{12} In Lemma 13, we use multiplicative notation for groups $G$ and $H$. However, to avoid any possible confusion, we clarify in advance that in our application $G$ and $H$ will be the groups $\mathbb{F}_q^n$ of vectors and $C$ of codewords with respect to the vector sum operation.

^{11}See footnote 10.

^{12}In fact, it is not necessary to have a group structure on the sets, and the lemma can be formulated in even more general settings, but working with groups makes the presentation simpler.

Lemma 13: Let $G$ be a group, $H$ a subgroup, and $S$ an arbitrary subset of $G$. Given $g \in G$, consider the subset

$$S_g = S \cap gH$$

and let $N$ be the average size of $S_g$ as the element $g$ runs through $G$. Choose $g \in S$ uniformly at random. Then for any $0 < \delta < 1$

$$\Pr\{\,|S_g| > \delta N\,\} \ge 1 - \delta.$$

Proof: Divide the elements of $S$ into equivalence classes, where $s$ and $s'$ are equivalent if $sH = s'H$. Then choose $g \in S$ uniformly at random. If $g$ is chosen, then $S_g$ has the same size as the equivalence class of $g$. (Notice that the equivalence class of $g$ is $S \cap gH = S_g$.) Since the number of equivalence classes is (at most) $[G : H]$, the number of elements that belong to equivalence classes of size $\delta N$ or less is bounded by $[G : H]\cdot\delta N$ (i.e., the maximum number of classes times the maximum size of each class), and the probability that such a class is selected is at most $[G : H]\cdot\delta N/|S|$. The following simple calculation shows that $N = |S|/[G : H]$ and, therefore, the probability to select an element $g$ such that $|S_g| \le \delta N$ is at most $\delta$. The value of $N$ is

$$N = \frac{1}{|G|}\sum_{g \in G}|S \cap gH| = \frac{1}{|G|}\sum_{s \in S}|\{g : s \in gH\}| = \frac{|S|\,|H|}{|G|} = \frac{|S|}{[G : H]}.$$
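The counting argument in Lemma 13 can be simulated on a small abelian group. In the sketch below (our own illustration, with $G = \mathbb{Z}_{12}$, $H$ the subgroup of multiples of $3$, and an arbitrary subset $S$), the fraction of elements of $S$ falling in classes of size at most $\delta N$ is indeed at most $\delta$.

```python
from fractions import Fraction

G = list(range(12))
H = [0, 3, 6, 9]                 # subgroup of Z_12 of index [G : H] = 3
S = {0, 1, 2, 3, 4, 6, 10}       # an arbitrary subset of G

index = len(G) // len(H)
N = Fraction(len(S), index)      # average class size N = |S| / [G : H]

def class_of(g):
    # Equivalence class of g: elements of S in the same H-coset as g
    return {s for s in S if (s - g) % 12 in H}

delta = Fraction(1, 2)
bad = [g for g in S if len(class_of(g)) <= delta * N]
prob_bad = Fraction(len(bad), len(S))    # at most delta, by Lemma 13
```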

We are now ready to prove Proposition 12.

Proof: Fix some $q$ and $\rho$, and let $\ell$ and $s$ be the input to the algorithm. We consider the $(\rho, \ell')$-dense code from Proposition 11, where $\ell' = 2^{s}\ell$. This code is the concatenation of a twice-extended RS code $C$ from Lemma 9 and a Hadamard code, with block length polynomial in $\ell'$. Therefore, a generator matrix for it can be constructed in time polynomial in $\ell$ and $s$.

At this point, we instantiate Lemma 13 with groups $G$ equal to the ambient space of the RS code $C$, $H = C$, and $S = B(0, r)$, where $r$ is the radius from Lemma 9. Notice that for any center $v$, the set $S_v = B(0, r) \cap (v + C)$ is in one-to-one correspondence with the set of codewords in the ball of radius $r$ centered in $-v$ (and, since $B(0, r)$ is symmetric, we may equivalently speak of the ball centered in $v$). From the proof of Lemma 9, the average size of $S_v$ (i.e., the expected number of codewords in a random ball when the center is chosen uniformly at random) is at least $\ell' = 2^{s}\ell$. Following Lemma 13, we choose $v$ uniformly at random from $S = B(0, r)$; in particular, $v$ has weight at most $r$. By Lemma 13 (with $\delta = 2^{-s}$), we get that $S_v$ contains at least $\delta\,\ell' = \ell$ codewords with probability $1 - 2^{-s}$. Finally, by Lemma 10, we get that the corresponding ball in the concatenated code (with radius $q^{m-1}(q-1)\,r$ and center $\sigma(v)$) contains at least $\ell$ codewords. The output of the algorithm is given by a generator matrix for the concatenated code, the block length $n$ and information content $k$ of this matrix, the radius, and the vector $\sigma(v)$.

E. Mapping Dense Balls Onto Full Spaces

In the next section, we will use the codewords inside the ball $B(v, r)$ to represent the solutions to the nearest codeword problem. In order to be able to represent any possible solution,



we first need to project the codewords in $B(v, r)$ onto the set $\mathbb{F}_q^{k'}$ of all strings over $\mathbb{F}_q$ of some shorter length $k'$. This is accomplished in the next lemma by another probabilistic argument. Given a matrix $T \in \mathbb{F}_q^{k' \times n}$, let $f_T(x) = Tx$ denote the associated linear transformation from $\mathbb{F}_q^n$ to $\mathbb{F}_q^{k'}$. Further, let $f_T(Z) = \{f_T(z) : z \in Z\}$ for any $Z \subseteq \mathbb{F}_q^n$.

Lemma 14: Let $Z$ be any fixed subset of $\mathbb{F}_q^n$ of size $\ell$. If matrix $T \in \mathbb{F}_q^{k' \times n}$ is chosen uniformly at random, then with probability at least $1 - q^{2k'}/\ell$ we have $f_T(Z) = \mathbb{F}_q^{k'}$.

Proof: Choose $T$ uniformly at random. We want to prove that with very high probability $f_T(Z) = \mathbb{F}_q^{k'}$. Choose a vector $w \in \mathbb{F}_q^{k'}$ at random and define a new function $g(z) = f_T(z) + w$. Clearly, $g(Z) = \mathbb{F}_q^{k'}$ if and only if $f_T(Z) = \mathbb{F}_q^{k'}$.

Notice that the random variables $g(z)$ (indexed by vector $z \in Z$, and defined by the random choice of $T$ and $w$) are pairwise independent and uniformly distributed. Therefore, for any vector $y \in \mathbb{F}_q^{k'}$, $\Pr\{g(z) = y\} = q^{-k'}$. Let $X_y$ be the number of $z \in Z$ such that $g(z) = y$. By linearity of expectation and pairwise independence of the $g(z)$ we have

$$\mathrm{E}[X_y] = \frac{\ell}{q^{k'}} \qquad\text{and}\qquad \mathrm{Var}[X_y] \le \mathrm{E}[X_y].$$

Applying Chebyshev's inequality we get

$$\Pr\{X_y = 0\} \le \frac{\mathrm{Var}[X_y]}{\mathrm{E}[X_y]^2} \le \frac{q^{k'}}{\ell}.$$

Therefore, for any $y$, the probability that $g(z) \ne y$ for every $z \in Z$ is at most $q^{k'}/\ell$. By the union bound, the probability that there exists some $y$ such that $X_y = 0$ is at most $q^{k'} \cdot q^{k'}/\ell$, i.e., $g(Z) = \mathbb{F}_q^{k'}$ with probability at least $1 - q^{2k'}/\ell$, and, therefore, $f_T(Z) = \mathbb{F}_q^{k'}$.
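The statement of Lemma 14 can be verified exhaustively for very small parameters. The sketch below (our own) enumerates all matrices $T \in \mathbb{F}_2^{1 \times 3}$ and counts how many map a fixed set $Z$ onto all of $\mathbb{F}_2^{1}$; the observed fraction is well above the $1 - q^{2k'}/|Z|$ bound.

```python
from itertools import product

q, n, kp = 2, 3, 1
Z = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1)]

def f(T, z):
    # Linear map f_T(z) = Tz over F_2 (T given as a tuple of kp rows)
    return tuple(sum(t * zi for t, zi in zip(row, z)) % 2 for row in T)

covered = 0
for T in product(product(range(2), repeat=n), repeat=kp):
    image = {f(T, z) for z in Z}
    if len(image) == q ** kp:        # f_T maps Z onto all of F_2^kp
        covered += 1
total = q ** (kp * n)
# Lemma 14 guarantees covered/total >= 1 - q^(2*kp)/|Z| = 1 - 4/5
```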

We combine Proposition 12 and Lemma 14 to build the gadget needed in the NP-hardness proofs in the following section.

Lemma 15: For any $\rho > 1/2$ and finite field $\mathbb{F}_q$, there exists a probabilistic polynomial-time algorithm that on input $k'$ and $s$ outputs, in time polynomial in $k'$ and $s$, integers $n$, $k$, $r$, matrices $G \in \mathbb{F}_q^{k \times n}$ and $T \in \mathbb{F}_q^{k' \times n}$, and a vector $v \in \mathbb{F}_q^n$ (of weight at most $r$) such that

1) $r \le \rho \cdot d(C(G))$;

2) with probability at least $1 - 2^{-s}$, $T(B(v, r) \cap C(G)) = \mathbb{F}_q^{k'}$, i.e., for every $x \in \mathbb{F}_q^{k'}$ there exists a $c \in C(G)$ such that $d(c, v) \le r$ and $Tc = x$.

Proof: Run the algorithm of Proposition 12 on input $\ell = 2^{s+1}q^{2k'}$ and $s + 1$ to obtain integers $n$, $k$, $r$, a matrix $G$ such that $r \le \rho\,d(C(G))$, and a center $v$ such that $B(v, r)$ contains at least $\ell$ codewords with probability $1 - 2^{-(s+1)}$. Let $Z$ be the set of all codewords in $B(v, r)$, and choose $T \in \mathbb{F}_q^{k' \times n}$ uniformly at random. By Lemma 14, the conditional probability, given $|Z| \ge \ell$, that $T(Z) = \mathbb{F}_q^{k'}$ is at least $1 - q^{2k'}/\ell = 1 - 2^{-(s+1)}$. Therefore, the probability that $T(Z) \ne \mathbb{F}_q^{k'}$ is at most $2^{-(s+1)} + 2^{-(s+1)} = 2^{-s}$.

IV. HARDNESS OF THE RELATIVELY NEAR CODEWORD PROBLEM

In this section, we prove that the relatively near codeword problem is hard to approximate within any constant factor $\gamma$ for all $\rho > 1/2$. The proof uses the gadget from Lemma 15.

Theorem 16: For any $\gamma \ge 1$, $\rho > 1/2$, and any finite field $\mathbb{F}_q$, GAPRNC$_{\rho}^{\gamma}$ is hard for NP under polynomial RUR-reductions. Moreover, the error probability can be made exponentially small in a security parameter $s$ while maintaining the reduction polynomial in $s$.

Proof: Fix some finite field $\mathbb{F}_q$. Let $\rho' > 1/2$ be a real such that Lemma 15 holds true, let $\epsilon$ be an arbitrarily small positive real, and let $\gamma' \ge 1$ be such that GAPNCP$_{\gamma'}$ is NP-hard. We prove that GAPRNC$_{\rho}^{\gamma}$ is hard for $\rho = (1 + \epsilon)\rho'$ and $\gamma = \epsilon\gamma'/(2 + \epsilon)$. Since $\epsilon$ can be arbitrarily small, Lemma 15 holds for any $\rho' > 1/2$, and GAPNCP$_{\gamma'}$ is NP-hard for any constant $\gamma' \ge 1$, this proves the hardness of GAPRNC$_{\rho}^{\gamma}$ for any $\rho > 1/2$ and $\gamma \ge 1$.

The proof is by reduction from GAPNCP$_{\gamma'}$. Let $(A, y, t)$ be an instance of GAPNCP$_{\gamma'}$ with $A \in \mathbb{F}_q^{k' \times m}$. We want to define an instance $(A', y', t')$ of GAPRNC$_{\rho}^{\gamma}$ such that if $(A, y, t)$ is a YES instance of GAPNCP$_{\gamma'}$, then $(A', y', t')$ is a YES instance of GAPRNC$_{\rho}^{\gamma}$ with high probability, while if $(A, y, t)$ is a NO instance of GAPNCP$_{\gamma'}$, then $(A', y', t')$ is a NO instance of GAPRNC$_{\rho}^{\gamma}$ with probability $1$. Notice that the main difference between the two problems is that while in GAPNCP the minimum distance of the code can be arbitrarily small, in GAPRNC the minimum distance must be relatively large (compared to the error weight parameter). The idea is to embed the original code in a higher dimensional space to make sure that the new code has large minimum distance. At the same time, we want also to embed the target vector $y$ in this higher dimensional space in such a way that the distance of the target from the code is roughly preserved. The embedding is easily performed using the gadget from Lemma 15. Details follow.

On input GAPNCP instance $(A, y, t)$, we invoke Lemma 15 on input $k'$ (the information content of input code $C(A)$) and security parameter $s$, to find integers $n$, $r$, a generator matrix $G \in \mathbb{F}_q^{k \times n}$, a mapping matrix $T \in \mathbb{F}_q^{k' \times n}$, and a vector $v \in \mathbb{F}_q^n$ such that

1) $r \le \rho' \cdot d(C(G))$;

2) $T(B(v, r) \cap C(G)) = \mathbb{F}_q^{k'}$ with probability at least $1 - 2^{-s}$.

Consider the linear code generated by $G T^{\mathsf T} A$. Notice that all rows of matrix $G T^{\mathsf T} A$ are codewords of $C(A)$. (However, only at most $k'$ are independent.) We define matrix $A'$ by concatenating^{13} $a$ copies of $G$ and $b$ copies of $G T^{\mathsf T} A$, where $a$ and $b$ are positive integers chosen so that $\epsilon\,ar/2 \le b\,t \le \epsilon\,a r$ (such a choice is always possible, with $a$ and $b$ polynomially bounded):

$$A' = [\ \underbrace{G \mid \cdots \mid G}_{a}\ \mid\ \underbrace{G T^{\mathsf T} A \mid \cdots \mid G T^{\mathsf T} A}_{b}\ ] \qquad (10)$$

^{13}Here the word "concatenation" is used to describe the simple juxtaposition of matrices or vectors, and not the concatenating code construction of [11] used in Section III.



and vector $y'$ as the concatenation of $a$ copies of $v$ and $b$ copies of $y$

$$y' = (\ \underbrace{v, \ldots, v}_{a},\ \underbrace{y, \ldots, y}_{b}\ ). \qquad (11)$$

Finally, let $t' = a\,r + b\,t$. The output of the reduction is $(A', y', t')$.

Before we can prove that the reduction is correct, we need to bound the quantity $d' = d(C(A'))$. Using the definition of $A'$ and the fact that $uA' = 0$ whenever $uG = 0$, we get

$$d' \ge a \cdot d(C(G))$$

and, by property 1) of Lemma 15

$$d(C(G)) \ge r/\rho'.$$

So, we always have $d' \ge a\,r/\rho'$. We can now prove the correctness of the reduction. In order to consider $(A', y', t')$ as an instance of GAPRNC$_{\rho}^{\gamma}$, we first prove that $t' \le \rho\,d'$. Indeed, $b\,t \le \epsilon\,a\,r$ by the choice of $a$ and $b$ and, therefore,

$$t' = a r + b t \le (1 + \epsilon)\,a r \le (1 + \epsilon)\,\rho'\,d' = \rho\,d'. \qquad (12)$$

Now, assume $(A, y, t)$ is a YES instance, i.e., there exists $x$ such that $d(xA, y) \le t$. Let $c = uG$ be a codeword in $B(v, r) \cap C(G)$ such that $Tc = x$. We know such a codeword exists with probability at least $1 - 2^{-s}$. In such a case, we have

$$d(uA', y') = a\,d(c, v) + b\,d(xA, y) \le a\,r + b\,t = t' \qquad (13)$$

proving that $(A', y', t')$ is a YES instance.

Conversely, assume $(A, y, t)$ is a NO instance, i.e., the distance of $y$ from $C(A)$ is greater than $\gamma'\,t$. We want to prove that for all $u$ we have $d(uA', y') > \gamma\,t'$. Indeed

$$d(uA', y') = a\,d(uG, v) + b\,d\big((uGT^{\mathsf T})A,\ y\big) > b\,\gamma'\,t \ge \gamma\,(ar + bt) = \gamma\,t' \qquad (14)$$

proving that $(A', y', t')$ is a NO instance. (Notice that NO instances get mapped to NO instances with probability $1$, as required.)

Remark 17: The reduction given here is a randomized many–one reduction (or a randomized Karp reduction) which fails with exponentially small probability. However, it is not a Levin reduction: i.e., given a witness for a YES instance of the source of the reduction, we do not know how to obtain a witness to YES instances of the target in polynomial time. The problem is that given a solution $x$ to the nearest codeword problem, one has to find a codeword $c$ in the sphere $B(v, r)$ such that $Tc = x$. Our proof only asserts that with high probability such a codeword exists, but it is not known how to find it. This was the case also for the Ajtai–Micciancio hardness proof for the shortest vector problem, where the failure probability was only polynomially small.

As discussed in Section II-B, hardness under polynomial RUR-reductions easily implies the following corollary.

Corollary 18: For any $\gamma \ge 1$, $\rho > 1/2$, and any finite field $\mathbb{F}_q$, GAPRNC$_{\rho}^{\gamma}$ is not in RP unless NP $=$ RP.

Since NP is widely believed to be different from RP, Corollary 18 gives evidence that no (probabilistic) polynomial-time algorithm to solve GAPRNC$_{\rho}^{\gamma}$ exists.

V. HARDNESS OF THE MINIMUM-DISTANCE PROBLEM

In this section, we prove the hardness of approximating the minimum-distance problem. We first derive an inapproximability result to within some constant bigger than $1$ by reduction from GAPRNC$_{\rho}^{\gamma}$. Then we use direct product constructions to amplify the inapproximability factor to any constant, and to factors $2^{\log^{1-\epsilon} n}$, for any $\epsilon > 0$.

A. Inapproximability to Within Some Constant

The inapproximability of GAPDIST$_{\gamma}$ to within a constant $\gamma > 1$ immediately follows from the hardness of GAPRNC$_{\rho}^{\gamma}$.

Lemma 19: For every $1 \le \gamma < 2$ and every finite field $\mathbb{F}_q$, GAPDIST$_{\gamma}$ is hard for NP under polynomial RUR-reductions with exponentially small soundness error.

Proof: The proof is by reduction from GAPRNC$_{\rho}^{\gamma'}$, for $\rho > 1/2$ and $\gamma' \ge 1$ such that $\gamma \le \min(1/\rho, \gamma')$. Let $(A', y', t')$ be an instance of GAPRNC$_{\rho}^{\gamma'}$ with distance $d(C(A')) \ge t'/\rho$. Assume without loss of generality that $y'$ does not belong to code $C(A')$. (One can easily check whether $y' \in C(A')$ by solving a system of linear equations. If $y' \in C(A')$ then $(A', y', t')$ is a YES instance because $d(y', C(A')) = 0 \le t'$, and the reduction can output some fixed YES instance of GAPDIST$_{\gamma}$.) Define the matrix

$$M = \begin{bmatrix} A' \\ y' \end{bmatrix} \qquad (15)$$

obtained by appending $y'$ to the rows of $A'$.

Assume $(A', y', t')$ is a YES instance of GAPRNC$_{\rho}^{\gamma'}$, i.e., there exists an $x$ such that $d(xA', y') \le t'$. Then, $(M, t')$ is a YES instance of GAPDIST$_{\gamma}$, since the nonzero vector $xA' - y'$ belongs to code $C(M)$ and has weight at most $t'$.

Conversely, assume $(A', y', t')$ is a NO instance of GAPRNC$_{\rho}^{\gamma'}$. We prove that any nonzero vector $xA' + \alpha y'$ of code $C(M)$ has weight above $\gamma t'$. Indeed, if $\alpha = 0$ then $xA'$ is a nonzero codeword of $C(A')$ and, therefore, has weight at least $t'/\rho \ge \gamma t'$ (since $d(C(A')) \ge t'/\rho$). On the other hand, if $\alpha \ne 0$ then

$$\mathrm{wt}(xA' + \alpha y') = d\big((-\alpha^{-1}x)A',\ y'\big) > \gamma'\,t' \ge \gamma\,t'$$

as $(-\alpha^{-1}x)A' \in C(A')$. Hence, $d(C(M)) > \gamma t'$ and $(M, t')$ is a NO instance of GAPDIST$_{\gamma}$.
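The reduction of Lemma 19 is easy to exercise on a toy instance. In the sketch below (our own; the small code and target are arbitrary), appending the target $y$ as an extra generator row, as in (15), produces a code whose minimum distance is at most the distance from $y$ to the original code.

```python
from itertools import product

def span(G, q=2):
    # All codewords generated by the rows of G over F_q
    k, n = len(G), len(G[0])
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for m in product(range(q), repeat=k)}

A = [(1, 1, 0, 0), (0, 0, 1, 1)]     # small code, d(C(A)) = 2
y = (1, 1, 1, 0)                     # target vector, not in C(A)

dist_y = min(sum(a != b for a, b in zip(c, y)) for c in span(A))
M = A + [y]                          # reduction (15): append y as an extra row
d_M = min(sum(cw) for cw in span(M) if any(cw))
# d(C(M)) <= dist(y, C(A)), because x*A - y is a short nonzero codeword of C(M)
```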



B. Inapproximability to Within Bigger Factors

To amplify the hardness result obtained above, we take the direct product of the code with itself. We first define direct products.

Definition 20: For $i = 1, 2$, let $C_i$ be a linear $[n_i, k_i]_q$ code generated by $G_i$. Then the direct product of $C_1$ and $C_2$, denoted $C_1 \otimes C_2$, is a code over $\mathbb{F}_q$ of block length $n_1 n_2$ and dimension $k_1 k_2$. Identifying the set of $n_1 n_2$-dimensional vectors with the set of all $n_2 \times n_1$ matrices in the obvious way, the direct product $C_1 \otimes C_2$ is conveniently defined as the set of all matrices in $\{G_2^{\mathsf T} M G_1 : M \in \mathbb{F}_q^{k_2 \times k_1}\}$.

Notice that a generator matrix for the product code can be easily computed from $G_1$ and $G_2$, defining a basis codeword $g_2^{\mathsf T}\,g_1$ for every row $g_1$ of $G_1$ and row $g_2$ of $G_2$ (where $g^{\mathsf T}$ denotes the transpose of vector $g$, and $g_2^{\mathsf T}\,g_1$ is the standard "external" product of column vector $g_2^{\mathsf T}$ and row vector $g_1$). Notice that the codewords of $C_1 \otimes C_2$ are matrices whose rows are codewords of $C_1$ and columns are codewords of $C_2$. In our reduction, we will need the following fundamental property of direct product codes. For completeness (see also [17]), we prove it below.

Proposition 21: For linear codes $C_1$ and $C_2$ of minimum distance $d_1$ and $d_2$, their direct product $C_1 \otimes C_2$ is a linear code of distance $d_1 d_2$.

Proof: First, every nonzero $X \in C_1 \otimes C_2$ has at least $d_1 d_2$ nonzero entries. Indeed, consider the matrix $X$, whose rows are codewords from $C_1$. Since this matrix is nonzero, some row is a nonzero codeword of weight $d_1$ or more. Thus, $X$ has at least $d_1$ nonzero columns. Now consider the columns of the matrix $X$. At least $d_1$ columns of this matrix are nonzero codewords of $C_2$, each of weight at least $d_2$, for a total weight of $d_1 d_2$ or more.

Second, we verify that the minimum distance of $C_1 \otimes C_2$ is exactly $d_1 d_2$. To see this, consider vectors $c_1 \in C_1$ and $c_2 \in C_2$ such that $c_i$ has exactly $d_i$ nonzero elements. Then notice that the matrix $c_2^{\mathsf T}\,c_1$ is a codeword of $C_1 \otimes C_2$. Expressing it as $c_2^{\mathsf T}\,c_1$ we see that its $j$th column is zero if the $j$th coordinate of $c_1$ is zero, and the $i$th row is zero if the $i$th coordinate of $c_2$ is zero. Thus, $c_2^{\mathsf T}\,c_1$ is zero on all but $d_1$ columns and $d_2$ rows and, thus, at most $d_1 d_2$ entries are nonzero.
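Proposition 21 can be checked directly on small codes. The sketch below (our own) builds a generator for the direct product of a $[3,1]$ repetition code ($d_1 = 3$) and a $[3,2]$ parity-check code ($d_2 = 2$) from flattened outer products of generator rows, and verifies that the product code has distance $d_1 d_2 = 6$.

```python
from itertools import product

def outer_flat(u, v, q=2):
    # Flattened outer product u^T v: a basis codeword of the product code
    return tuple((ui * vj) % q for ui in u for vj in v)

G1 = [(1, 1, 1)]                  # [3,1] repetition code, d1 = 3
G2 = [(1, 0, 1), (0, 1, 1)]       # [3,2] parity-check code, d2 = 2

Gp = [outer_flat(u, v) for u in G1 for v in G2]   # generator of C1 x C2

codewords = {tuple(sum(m[i] * Gp[i][j] for i in range(len(Gp))) % 2
                   for j in range(len(Gp[0])))
             for m in product(range(2), repeat=len(Gp))}
d = min(sum(cw) for cw in codewords if any(cw))   # = d1 * d2 = 6
```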

We can now prove the following theorem.

Theorem 22: For every finite field $\mathbb{F}_q$ the following holds.

• For every $\gamma \ge 1$, GAPDIST$_{\gamma}$ is hard for NP under polynomial RUR-reductions.

• For every $\epsilon > 0$, GAPDIST$_{\gamma}$ is hard for NP under quasi-polynomial RUR-reductions for $\gamma = 2^{\log^{1-\epsilon} n}$.

In both cases, the error probability is exponentially small in a security parameter $s$.

Proof: Let $\gamma_0 > 1$ be such that GAPDIST$_{\gamma_0}$ is hard by Lemma 19. Given an instance $(M, t)$ of GAPDIST$_{\gamma_0}$, consider the instance $(M_m, t^m)$ of GAPDIST$_{\gamma_0^m}$, where $M_m$ is a generator matrix of the $m$-fold direct product

$$\underbrace{C(M) \otimes C(M) \otimes \cdots \otimes C(M)}_{m}$$

for an integer parameter $m \ge 1$. By Proposition 21, it follows that YES instances map to YES instances and NO instances to NO instances. Setting $m = \lceil \log \gamma / \log \gamma_0 \rceil$ yields the first part of the theorem. Notice that for constant $m$, the size of $M_m$ is polynomial in the size of $M$, and $M_m$ can be constructed in polynomial time (for any fixed $m$ independent of the size of $M$).

To show the second part, we use the first part by setting $m = \lceil \log^{1/\epsilon} n \rceil$ in the previous reduction, where $n$ is the block length of $C(M)$. This time, the block length of the product code will be $n^m = 2^{\log^{1/\epsilon + 1} n}$, which is quasi-polynomial in the block length of the original instance $(M, t)$. The reduction can be computed in quasi-polynomial (in $n$) time, and the approximation factor achieved is

$$\gamma_0^{m} = 2^{m \log \gamma_0} \ge 2^{\log^{1-\epsilon} N}$$

for all sufficiently large $n$, where $N = n^m$ is the new block length.

As for the relatively near codeword problem, the following corollary can be easily derived from the hardness result under RUR-reductions.

Corollary 23: For every finite field $\mathbb{F}_q$ the following holds.

• For every $\gamma \ge 1$, GAPDIST$_{\gamma}$ is not in RP unless NP $=$ RP.

• For every $\epsilon > 0$, GAPDIST$_{\gamma}$ is not in RQP for $\gamma = 2^{\log^{1-\epsilon} n}$ unless NP $\subseteq$ RQP.

VI. ASYMPTOTICALLY GOOD DENSE CODES

Our concatenated codes used in the proof of Proposition 11 employ outer MDS codes over $\mathbb{F}_{q^m}$ and inner Hadamard codes $H_m$, for fixed $q$ and growing $m$. As a result, the overall code rate vanishes for growing lengths, since so does the inner rate $m/(q^m - 1)$. The relative distance also tends to $0$, both in the outer MDS codes and in the overall concatenations. To date, all other codes used to prove NP-hardness also have been asymptotically bad. To get hardness results (in both the relatively near codeword problem and the minimum-distance problem) even when the codes are asymptotically good, we will need constructions of asymptotically good "dense" codes. To get such constructions, we will fix the outer alphabet and the inner code. Then we use AG codes to show that there exist dense families of asymptotically good codes over $\mathbb{F}_q$. In particular, we show that there exist $(\rho, \ell)$-dense code families with relative distance at least $\delta$ and rate at least $\tau$, where $\rho < 1$, $\delta$, and $\tau$ are positive real numbers^{14} independent of $\ell$. These codes will be used to prove the NP-hardness of GAPRNC$_{\rho}^{\gamma}$ restricted to asymptotically good codes, and the hardness of approximating the minimum distance of a code within an additive linear error.

Given a square prime power $q$, we use long AG codes meeting the TVZ bound (see

^{14}Notice that the relative distance satisfies $d/l > (r/\rho)/l \ge \epsilon/\rho$ and the code is asymptotically good. In fact, for all dense codes considered in this paper, we have $\rho < 1$ and, therefore, the relative distance is at least $\epsilon$.



[21]). The generator matrices of these codes can be constructed for any $n$ and $k$ in time polynomial in $n$. As remarked in Section III, the TVZ bound exceeds the GV bound for most code rates and, therefore, allows us to obtain sequences of dense codes with $\rho$ fixed. The resulting code sequence turns out to be $(\rho, \ell)$-dense for some constant $\rho < 1$. We do not know how to get $\rho$ arbitrarily close to $1/2$ using this technique.
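Numerically, the asymptotic TVZ rate $1 - \delta - 1/(\sqrt q - 1)$ can be compared with the asymptotic GV rate $1 - H_q(\delta)$, where $H_q$ is the $q$-ary entropy function. The sketch below (our own; the parameter choices are illustrative) confirms that at $q = 49$ and relative distance $\delta = 0.4$ the TVZ bound lies (slightly) above the GV bound, while at small $q$ it does not.

```python
from math import log

def entropy_q(q, x):
    # q-ary entropy function H_q(x)
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def gv_rate(q, delta):
    # Asymptotic Gilbert-Varshamov rate: R = 1 - H_q(delta)
    return 1 - entropy_q(q, delta)

def tvz_rate(q, delta):
    # Tsfasman-Vladut-Zink rate for square q: R = 1 - delta - 1/(sqrt(q) - 1)
    return 1 - delta - 1 / (q ** 0.5 - 1)

# At q = 49, delta = 0.4 the TVZ bound slightly exceeds the GV bound;
# at q = 9 it does not.
```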

Proposition 24: For every prime power $q$ and every $\rho < 1$ sufficiently close to $1$, there exist $\tau > 0$ and $\delta > 0$ such that there is a polynomial family of asymptotically good $(\rho, \ell)$-dense codes $\{C_\ell\}$ with rate at least $\tau$ and relative distance at least $\delta$.

Proof: Let $\rho$ and $\ell$ be given. Notice that by taking $\rho$ sufficiently close to $1$, we can make the constants $\tau$ and $\delta$ below positive. We first prove the result assuming $q \ge 49$ and that $q$ is a square. In this case, the AG codes (for an appropriate choice of parameters) already give what we want. Later, we will use concatenation to get the proposition for arbitrary $q$.

a) Case 1—( and is a square): Given param-eter , let , , and .Let be a code meeting the TVZ bound, i.e., with

. We show that this code has rate ,

relative distance , , and that the expectednumber of codewords of in a random ball of radius is atleast . We start with bounding the rate

We want to prove that this bound is strictly positive. Using theassumption and we get

Substituting in the bound for the rate of the code we get

Next we bound the minimum distance. From the TVZ bound weget

(16)

Using and dividing by the block length of the code, weget

This proves that the relative distance is bounded away from,and the code is asymptotically good. We still need to prove thatthe code is -dense (on the average), i.e., and theexpected number of codewords in a random ball of radiusisat least .

We use (16) to bound the ratio

This proves that the radius is small relative to the minimum distance of the code. We want to bound , the expected number of codewords in a random ball of radius . Using (2) and (7), we have

We want to prove that this bound is at least . Combining and , we get . Substituting in the last equation

It follows that there exists a vector such that . This concludes the analysis for large square .

b) Case 2—(Arbitrary ): In this case, let be the smallest even integer such that . Given , let be a code and a radius as given in Case 1 with rate , relative distance , , and . We concatenate with the equidistant Hadamard code to get a code of rate



and relative distance

Now, given a vector that has vectors in the ball of radius around it, the vector also has vectors of in the ball of radius around it. Notice that the ratio does not change in this step, and, therefore, holds true. This gives the code as claimed in the proposition.
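The parameter bookkeeping in the concatenation step follows the standard rule: concatenating an outer (n, k, d) code over F_{q^m} with an inner (n′, m, d′) code over F_q gives an (nn′, km, ≥ dd′) code over F_q, so rates multiply and the relative distance is lower-bounded by the product. A sketch with toy parameters of our own choosing (a [7, 3, 5] Reed-Solomon outer code over F_8 and the equidistant binary [7, 3, 4] simplex/Hadamard inner code):

```python
def concatenate_params(outer, inner):
    """Parameters of a concatenated code.

    outer: (n, k, d) over F_{q^m}; inner: (n2, k2, d2) over F_q, where
    k2 = m so that each outer symbol can be encoded by the inner code.
    Returns (length, dimension, distance lower bound) over F_q.
    """
    n, k, d = outer
    n2, k2, d2 = inner
    return (n * n2, k * k2, d * d2)

# Outer [7,3,5] RS code over F_8, inner equidistant [7,3,4] binary code.
N, K, D = concatenate_params((7, 3, 5), (7, 3, 4))
print(N, K, D)       # 49 9 20
print(K / N, D / N)  # rate and relative-distance lower bound
```

Both the rate K/N = (k/n)(k2/n2) and the relative-distance bound D/N = (d/n)(d2/n2) factor through the two component codes, which is why the ratio r/d is preserved in this step.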

The relative distance of the codes built in Proposition 24 tends to for large alphabets. We remark that using a similar construction it is possible to build dense codes (with and ) for any . The following lemma stresses the algorithmic components of the code construction above, and in particular that we know how to construct the generator matrix of the th code in the sequence and how to sample dense balls from this code in polynomial time. (We also add one more parameter to the algorithm to ensure that the block length of code is at least . This additional parameter will be used in the proofs of Section VII and is added for purely technical reasons related to those proofs.)

Proposition 25: For every prime power and every , there exist , , and a probabilistic algorithm that on input three integers and , outputs (in time polynomial in ) three other integers , , , a generator matrix of a code , and a center (of weight ) such that

1) ;
2) ;
3) ;
4) with probability at least , the ball contains at least codewords of ;
5) .

Proof: Given inputs to the algorithm, we consider the -dense code from Proposition 24, where . This code is the concatenation of an AG code and a Hadamard code . Therefore, a generator matrix for can be constructed in time polynomial in . Let , , choose uniformly at random from , and let . From the proof of Proposition 24, we know that the expected number of codewords in a randomly chosen ball is at least . We now apply Lemma 13, instantiating it with groups , , and set . Notice that for any center , the set is just the ball of radius centered in . From the proof of Proposition 24, the average size of (i.e., the expected number of codewords in a random ball) is at least . Setting in Lemma 13, and noting that , we immediately get that contains at least codewords with probability . Finally, by Lemma 10, we get that the corresponding ball in contains at least codewords from .

We conclude with the following lemma, analogous to Lemma 15 of Section III.

Lemma 26: For every and prime power , there exist , , and a probabilistic polynomial-time algorithm that on input outputs (in time polynomial in ) integers , and matrices , , and a vector such that

1) ;
2) ;
3) ;
4) with probability at least , , i.e., for every there exists a such that and ;
5) .

The proof is identical to that of Lemma 15, with the use of Proposition 12 replaced by Proposition 25.

VII. HARDNESS RESULTS FOR ASYMPTOTICALLY GOOD CODES

In this section, we prove hardness results for the decoding problem even when restricted to asymptotically good codes, and the hardness of approximating the minimum-distance problem with linear (in the block length) additive error. First, we define a restricted version of GAPRNC where the code is required to be asymptotically good.

Definition 27: For , let -restricted GAPRNC be the restriction of GAPRNC to instances satisfying and . We call the relative error weight.

Notice that for every constant , the minimum distance of the code satisfies ; therefore, the code is asymptotically good with information rate at least and relative distance at least . In particular, when , the relative distance is strictly bigger than .

In the following subsections, we prove that -restricted GAPRNC is NP-hard (under RUR-reductions). Then we use this result to prove that the minimum distance of a linear code is hard to approximate additively, even to within a linear additive error relative to the block length of the code. The proofs are essentially the same as those for codes that are not asymptotically good. The main differences are as follows.

• We use the asymptotically good dense code construction from Section VI. Previously, we relied on dense codes that were not asymptotically good.

• We reduce from a restricted version of the nearest codeword problem in which the error weight is required to be a constant fraction of the block length.

In the following subsection, we observe that a result of Håstad [13] gives us hardness of NCP with this additional property. Then we use this result to prove the hardness of the -restricted GAPRNC problem, and the hardness of GAPDISTADD.

A. The Restricted Nearest Codeword Problem

We define a restricted version of the nearest codeword problem in which the target distance is required to be at least a linear fraction of the block length.



Definition 28: For , let -restricted GAPNCP ( -GAPNCP for short) be the restriction of GAPNCP to instances satisfying .

We will need the fact that -GAPNCP is NP-hard. This result does not appear to follow from the proof of [3]. Instead, we observe that it follows easily from a recent and extremely powerful result of Håstad [13].

Theorem 29—[13]: For every Abelian group and for every , given a system of linear equations over , it is NP-hard to distinguish instances in which equations can be satisfied from those in which at most equations can be satisfied.15

Notice that a system of linear equations where is the maximum number of simultaneously satisfiable equations corresponds to an instance of the nearest codeword problem where the distance of the target from the code is . So, applied to the group , and phrased in coding-theoretic terms, the theorem says that for every prime power it is hard to tell whether the distance of the target from the code is at most or more than . In other words, -GAPNCP is NP-hard for every constant approximation factor . In particular, -GAPNCP is NP-hard for . This gives the following version of Theorem 29.
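The correspondence between maximally satisfiable linear systems and nearest codewords is easy to check computationally: for a system Ax = b over F_q, the minimum number of unsatisfied equations over all assignments x equals the Hamming distance from b to the code {Ax}. A brute-force sketch over F_2, on a toy instance of our own choosing:

```python
from itertools import product

def min_unsatisfied(A, b, q=2):
    """Minimum number of unsatisfied equations of Ax = b over F_q,
    by brute force over all assignments x."""
    rows, cols = len(A), len(A[0])
    best = rows
    for x in product(range(q), repeat=cols):
        unsat = sum(1 for i in range(rows)
                    if sum(A[i][j] * x[j] for j in range(cols)) % q != b[i])
        best = min(best, unsat)
    return best

def distance_to_code(A, b, q=2):
    """Hamming distance from b to the code {Ax : x in F_q^cols}."""
    rows, cols = len(A), len(A[0])
    codewords = {tuple(sum(A[i][j] * x[j] for j in range(cols)) % q
                       for i in range(rows))
                 for x in product(range(q), repeat=cols)}
    return min(sum(1 for i in range(rows) if cw[i] != b[i])
               for cw in codewords)

A = [[1, 0], [0, 1], [1, 1], [1, 1]]  # 4 equations in 2 unknowns
b = [1, 1, 1, 0]
assert min_unsatisfied(A, b) == distance_to_code(A, b)
print(min_unsatisfied(A, b))  # 1: x = (1, 1) violates only the last equation
```

Each assignment x corresponds to the codeword Ax, and the equations it violates are exactly the coordinates where Ax differs from b, so the two quantities coincide.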

Corollary 30: For every , there exists a such that -GAPNCP is NP-hard for all prime powers .

Notice that the in the corollary is independent of the alphabet size.

B. Inapproximability of the Restricted Relatively Near Codeword Problem

In this subsection, we prove the hardness of the -restricted GAPRNC problem. The proof is the same as that of Theorem 16 with the following modifications: 1) instead of using Lemma 15 (i.e., the dense code family based on asymptotically bad MDS codes) we use the asymptotically good dense code family from Lemma 26; 2) we consider only , so that Lemma 26 holds true; 3) we use Corollary 30 to select a such that -GAPNCP is NP-hard; 4) we assume that the GAPNCP instance we are reducing from is in fact a -GAPNCP instance. Details follow.

Theorem 31: For every prime power , and constants and , there exist and such that -restricted GAPRNC is hard for NP under RUR-reductions with error probability exponentially small in a security parameter.

Proof: Fix a prime power and real , and let and be the corresponding constants such that Lemma 26 holds true. Let be an arbitrarily small constant, and the approximation factor for which we want to prove the NP-hardness of the restricted GAPRNC problem. Let , and let be the constant from Corollary 30 such that -GAPNCP is NP-hard, e.g., .

15 Håstad’s result has the further property that every linear equation involves only three variables, but we do not need this extra property.

We prove that -restricted GAPRNC is NP-hard for , , and

Since can be arbitrarily close to , and arbitrarily close to , this proves the NP-hardness of -restricted GAPRNC for any .

We prove the NP-hardness of -restricted GAPRNC by reduction from -GAPNCP . Let be an instance of -GAPNCP , where . We invoke the algorithm of Lemma 26 on input and , to find integers , a generator matrix , a mapping matrix , and a vector such that

1) ;
2) for every there exists a satisfying and ;
3) ;
4) ;
5) , and therefore too;

where the second condition holds with probability . Notice that, since any sphere containing more than a single codeword must have radius at least , the relative radius must be at least

The reduction proceeds as in the proof of Theorem 16. Let , , , and define code and target as in (10) and (11). The output of the reduction is . We want to prove that the map is a valid (RUR) reduction from -GAPNCP to -restricted GAPRNC . The proof of the following facts is exactly the same as in Theorem 16 (see (12)–(14)):

• ;
• if , then ;
• if , then .

It remains to prove that the GAPRNC instance satisfies the restriction. Notice that the code has block length . Therefore, the relative error weight satisfies

Finally, the information content of is , i.e., the same as code . Therefore, the rate is , and in order to show that the code is asymptotically good we need to prove that . In fact

and



This proves that the rate of the code is at least

C. Minimum Distance With Linear Additive Error

In Lemma 19, we proved that the minimum-distance problem is hard to approximate within some constant factor. The proof was by reduction from GAPRNC. The same reduction, when applied to the restricted GAPRNC problem, gives an inapproximability result for the minimum-distance problem with linear additive error. In particular, Lemma 19 reduces -restricted GAPRNC to GAPDISTADD with

This result is formally proved in the following theorem.

Theorem 32: For every prime power , there exists a such that GAPDISTADD is hard for NP under polynomial RUR-reductions with soundness error exponentially small in a security parameter.

Proof: The proof is by reduction from -restricted GAPRNC with , , and . The reduction is the same as in the proof of Lemma 19. On input , the reduction outputs , where is the code defined in (15). We know from the proof of Lemma 19 that

is a valid reduction from GAPRNC to GAPDIST . We show that if is a YES (resp., NO) instance of GAPDIST , then it is also a YES (resp., NO) instance of GAPDISTADD . This proves that is also a valid reduction from GAPRNC to GAPDISTADD .

The case of YES instances is trivial, because the definition of YES instances in GAPDIST and GAPDISTADD is the same, namely . So, assume is a NO instance of GAPDIST . Then

and is a NO instance of GAPDISTADD .

VIII. OTHER REDUCTIONS AND OPEN PROBLEMS

We proved that approximating the minimum-distance problem is hard for NP under RUR-reductions, i.e., probabilistic reductions that map NO instances to NO instances, and YES instances to YES instances with high probability. (The same is true for all other reductions presented in this paper, as well as the hardness proof for the shortest vector problem in [2], [18].)

An obvious question is whether it is possible to remove the randomization and make the reduction deterministic. (Observe that a deterministic NP-hardness result is known for the exact version of the minimum-distance problem [23], [22]. In contrast, for the shortest vector problem, even the exact version is known to be NP-hard only under randomized reductions [2].) We notice that our reductions (as well as the Ajtai–Micciancio ones for SVP) use randomness in a very restricted way. Namely, the only part of the reduction where randomness is used is the proof of Lemma 15 (and Lemma 26 for asymptotically good codes). The construction in the lemma depends only on the input size, and not on the particular input instance we are reducing. So, if the construction succeeds, the reduction will faithfully map all YES instances (of the appropriate size) to YES instances. Therefore, the statements in Lemmas 15 and 26 can be easily modified to obtain hardness results for NP under deterministic nonuniform reductions, i.e., reductions that take a polynomially sized advice string that depends only on the input size.16 (The advice is given by the sphere center and transformation matrix satisfying Lemmas 15 and 26.)

Corollary 33: For every finite field the following holds.

• For every and , GAPRNC is hard for NP under nonuniform polynomial reductions. Therefore, no nonuniform polynomial time (P/poly) algorithm exists for GAPRNC unless NP P/poly.

• For every , GAPDIST is hard for NP under nonuniform polynomial reductions. Therefore, no P/poly algorithm exists for GAPDIST unless NP P/poly.

• For every and , GAPDIST is hard for NP under nonuniform quasi-polynomial reductions. Therefore, no nonuniform quasi-polynomial time (QP/poly) algorithm exists for GAPDIST unless NP QP/poly.

• For every and , there exist and such that -restricted GAPRNC is hard for NP under nonuniform polynomial reductions. Therefore, no P/poly algorithm exists for -restricted GAPRNC unless NP P/poly.

• There exists a such that GAPDISTADD is hard for NP under nonuniform polynomial reductions. Therefore, no P/poly algorithm exists for GAPDISTADD unless NP P/poly.

We also notice that a uniform deterministic construction satisfying the properties of Lemmas 15 and 26 would immediately give a proper NP-hardness result (i.e., hardness under deterministic Karp reductions) for all problems considered in this paper. Interestingly, for the shortest vector problem, Micciancio [18] (see also [19]) showed that a deterministic NP-hardness result is possible under a reasonable (but unproven) number-theoretic conjecture regarding the distribution of smooth numbers (i.e., numbers with no large prime factors). No deterministic proof under similar number-theoretic conjectures is known for the coding problems studied in this paper.

Finally, we notice that all our results rely on the fact that the code is given as part of the input. Thus, it is conceivable that for every error-correcting code there exists a fast algorithm to correct errors (say, up to the distance of the code); however, this algorithm may be hard to find (given a description of the code). A result along the lines of the one of Bruck and Naor [7],

16 Since our reduction achieves exponentially small error probability, hardness under nonuniform reductions also follows from general results about derandomization [1]. However, the ad hoc derandomization method we just described is more efficient and intuitive.



showing that there are specific sequences of codes for which nearest codewords are hard to find (even if the code can be arbitrarily preprocessed) would be desirable to fill this gap in our knowledge. Feige and Micciancio [10] recently showed that the nearest codeword problem with preprocessing is hard to approximate within any factor less than . Inapproximability results for NCP with preprocessing are interesting because they can be combined with our reduction (Theorem 16) to prove the inapproximability of RNC with preprocessing. Specifically, Theorem 16 gives a reduction from GAPNCP to GAPRNC with and , with the property that the GAPRNC code depends only on the GAPNCP code (and not the target vector). So, if GAPNCP is hard to solve even when the code can be preprocessed, then GAPRNC with preprocessing also cannot be efficiently solved. Unfortunately, in order to get nontrivial hardness results for GAPRNC with preprocessing (i.e., results with ) using this method, one needs to start from inapproximability results for GAPNCP with preprocessing with . So, the inapproximability factors proved in [10] do not suffice. Very recently, the results of [10] have been improved by Regev [20] to , giving the first inapproximability results for RNC with preprocessing. Notice that applying the reduction from Theorem 16 to GAPNCP with allows one to establish the inapproximability (within some constant factor ) of GAPRNC for . Notice that in order to get , one needs , and, therefore, . Interestingly, [20] also shows how to modify the reduction in Theorem 16 to get inapproximability of RNC with preprocessing for any . The improvement is based on the following two observations. (The reader is referred to [20] for details.)

1) The lower bound in (14) can be strengthened to

2) The center of the sphere is at distance at least

from the code .

Based on these two observations, Regev shows that (with a different choice of parameters and ) the reduction given in the proof of Theorem 16 maps GAPNCP to GAPRNC for any

. Combined with the improved inapproximability results for GAPNCP with preprocessing, this shows that GAPRNC with preprocessing is NP-hard (under RUR-reductions, as usual) for any and .

We conclude with a brief discussion of the main problem left open in this work. Our hardness results for RNC hold only for . For the case of general (not asymptotically good) codes, we showed that one can make arbitrarily close to . However, we cannot achieve equality . RNC (also known as BDD) is arguably the most relevant decoding problem in practice, and the one for which most known polynomial-time algorithms (for specific families of codes, e.g., RS codes) work. The problems RNC and RNC for are qualitatively different, because for , the solution to the decoding problem is guaranteed to be unique. Our reduction relies on the construction of a code and a sphere of radius such that contains several (in fact, exponentially many) codewords. Clearly, for , no sphere of radius can contain more than a single codeword. So, the bound seems an intrinsic limitation of our reduction technique.17 We leave determining the computational complexity of BDD as an open problem.
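The obstruction described here is the triangle inequality: if two distinct codewords lay in one ball of radius r, the minimum distance would be at most 2r, so for r < d/2 every ball contains at most one codeword. A brute-force check on a toy binary code (our own illustration):

```python
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# Toy binary [5, 2] code with minimum distance 3.
gens = [(1, 0, 1, 1, 0), (0, 1, 0, 1, 1)]
code = [tuple((a * g0 + b * g1) % 2 for g0, g1 in zip(*gens))
        for a in (0, 1) for b in (0, 1)]

d = min(hamming(u, v) for u in code for v in code if u != v)
r = (d - 1) // 2  # unique decoding radius

# Every ball of radius r contains at most one codeword.
for center in product((0, 1), repeat=5):
    inside = [c for c in code if hamming(center, c) <= r]
    assert len(inside) <= 1
print("minimum distance", d, "unique decoding radius", r)
```

By contrast, the reduction above needs a single ball that traps exponentially many codewords, which is only possible once the radius crosses d/2; this is exactly why the technique stalls at the unique decoding bound.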

REFERENCES

[1] L. M. Adleman, “Two theorems on random polynomial time,” in Proc. 19th Symp. Foundations of Computer Science, 1978, pp. 75–83.

[2] M. Ajtai, “The shortest vector problem in L_2 is NP-hard for randomized reductions (extended abstract),” in Proc. 30th Annu. ACM Symp. Theory of Computing, Dallas, TX, May 23–26, 1998, pp. 10–19.

[3] S. Arora, L. Babai, J. Stern, and E. Z. Sweedyk, “The hardness of approximate optima in lattices, codes, and systems of linear equations,” J. Comput. Syst. Sci., vol. 54, no. 2, pp. 317–331, Apr. 1997. Preliminary version in FOCS’93.

[4] S. Arora and C. Lund, Approximation Algorithms for NP-Hard Problems. Boston, MA: PWS, 1996, ch. 10, Hardness of Approximation.

[5] E. Berlekamp, R. McEliece, and H. van Tilborg, “On the inherent intractability of certain coding problems,” IEEE Trans. Inform. Theory, vol. IT-24, pp. 384–386, May 1978.

[6] P. Berman and M. Karpinski, “Approximating minimum unsatisfiability of linear equations,” in Proc. 13th Annu. ACM-SIAM Symp. Discrete Algorithms, SODA’02, San Francisco, CA, 2002, pp. 74–83.

[7] J. Bruck and M. Naor, “The hardness of decoding linear codes with preprocessing,” IEEE Trans. Inform. Theory, vol. 36, pp. 381–385, Mar. 1990.

[8] I. Dinur, G. Kindler, R. Raz, and S. Safra, “An improved lower bound for approximating CVP,” Combinatorica, to be published.

[9] I. Dinur, G. Kindler, and S. Safra, “Approximating CVP to within almost-polynomial factors is NP-hard,” in Proc. 39th Annu. Symp. Foundations of Computer Science, Palo Alto, CA, Nov. 7–10, 1998.

[10] U. Feige and D. Micciancio, “The inapproximability of lattice and coding problems with preprocessing,” in Proc. IEEE Conf. Computational Complexity—CCC 2002, Montreal, QC, Canada, May 21–23, 2002, pp. 44–52.

[11] G. D. Forney, Jr., Concatenated Codes, ser. Research Monograph, no. 37. Cambridge, MA: MIT Press, 1966.

[12] J. Håstad, “Clique is hard to approximate within n^{1−ε},” Acta Math., vol. 182, pp. 105–142, 1999.

[13] ——, “Some optimal inapproximability results,” J. ACM, vol. 48, pp. 798–859, 2001. Preliminary version in Proc. STOC’97.

[14] K. Jain, M. Sudan, and V. Vazirani, private communication, May 1998.

[15] D. S. Johnson, Handbook of Theoretical Computer Science. Amsterdam, The Netherlands: Elsevier, 1990, vol. A (Algorithms and Complexity), ch. 2, A catalog of complexity classes, pp. 67–161.

[16] J. Justesen and T. Høholdt, “Bounds on list decoding of MDS codes,” IEEE Trans. Inform. Theory, vol. 47, pp. 1604–1609, May 2001.

[17] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1981.

[18] D. Micciancio, “The shortest vector problem is NP-hard to approximate to within some constant,” SIAM J. Computing, vol. 30, no. 6, pp. 2008–2035, Mar. 2001. Preliminary version in FOCS 1998.

[19] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems: A Cryptographic Perspective. Boston, MA: Kluwer, Mar. 2002, vol. 671, The Kluwer International Series in Engineering and Computer Science.

[20] O. Regev, “Improved inapproximability of lattice and coding problems with preprocessing,” manuscript, Aug. 2002.

[21] M. Tsfasman and S. Vladut, Algebraic–Geometric Codes. Dordrecht, The Netherlands: Kluwer, 1991.

[22] A. Vardy, “Algorithmic complexity in coding theory and the minimum distance problem,” in Proc. 29th Annu. ACM Symp. Theory of Computing, STOC’97, El Paso, TX, May 4–6, 1997, pp. 92–109.

[23] ——, “The intractability of computing the minimum distance of a code,” IEEE Trans. Inform. Theory, vol. 43, pp. 1757–1766, Nov. 1997.

[24] V. Zinoviev and S. Litsyn, “On codes exceeding the Gilbert bound” (in Russian), Problems of Information Transmission, vol. 21, no. 1, pp. 109–111, 1985.

[25] A. Lobstein, “The hardness of solving subset sum with preprocessing,” IEEE Trans. Inform. Theory, vol. 36, pp. 943–946, July 1990.

17 The same problem on integer lattices is also the current bottleneck to extending the inapproximability factor of the shortest vector problem beyond √2 [18].