A generalized model for understanding evasiveness



Information Processing Letters 30 (1989) 205-208 North-Holland

27 February 1989

A GENERALIZED MODEL FOR UNDERSTANDING EVASIVENESS

Alok AGGARWAL and Don COPPERSMITH

IBM Research Division, T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, U.S.A.

Dan KLEITMAN

Department of Applied Mathematics, MIT, Cambridge, MA 02139, U.S.A.

Communicated by David Gries Received 15 June 1988 Revised 3 August 1988

We consider the following problem that is related to the notion of evasiveness: Suppose an oracle contains a symmetric $n \times n$ matrix in which all entries are either 0 or 1. And suppose an algorithm can ask the oracle how many 1s are present in the submatrix formed by rows and columns indexed $i_1, i_2, \ldots, i_k$, for any $1 \le k \le n$. Then, determine the minimum number of questions that must be asked by the algorithm in order to correctly output the entire matrix.

We show that $n^2/(4 \log n)$ questions are sometimes necessary and that there exists an algorithm that correctly outputs the matrix by asking at most $(2n^2/\log n) + o(n^2/\log n)$ questions. In the corresponding generalized decision tree model, we observe an upper bound on the number of questions asked for determining the connected components of an n-vertex graph; this upper bound is away from the straightforward lower bound by a $\log n$ factor.

Keywords: Evasive, symmetric matrix, monotone graph property, undirected graph, connectivity.

1. Introduction

Let $C(P)$ denote the minimum number of entries that need be examined in the worst case by any algorithm for computing an n-vertex graph property $P$, when the input graph is given by its adjacency matrix. Then, $P$ is said to be evasive if $C(P) = \Omega(n^2)$ and, in 1978, Rivest and Vuillemin [2] established the Aanderaa-Rosenberg conjecture [3] by showing that every monotone graph property is evasive. In this paper, we consider a variant of evasiveness in which the kinds of questions allowed are more general than those allowed by the Aanderaa-Rosenberg conjecture. In particular, if $G = (V, E)$ and if $U \subseteq V$, then the algorithm can inquire about the number of edges that are in the subgraph that is induced by the vertices of $U$. Motivated by this, in Sections 2 and 3, we first consider the following question:

Suppose an oracle contains a symmetric $n \times n$ matrix in which all entries are either 0 or 1. And suppose an algorithm can ask the oracle how many 1s are present in the submatrix formed by rows and columns indexed $i_1, i_2, \ldots, i_k$, for any $1 \le k \le n$. Then, determine the minimum number of questions that must be asked by the algorithm in order to correctly output the entire matrix.

Section 2 considers the corresponding problem for one-dimensional arrays and, in Section 3, we show that $n^2/(4 \log n)$ questions are necessary and that there exists an algorithm that correctly outputs the matrix by asking at most $(2n^2/\log n) + o(n^2/\log n)$ questions. In Section 4, we observe an upper bound of $O(n \log n)$ questions asked in this model for determining the connected components of an n-vertex undirected graph. This upper bound is away from the straightforward lower bound by a $\log n$ factor. Section 4 also discusses some open problems.

0020-0190/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)

2. Tight bounds on the number of questions for linear arrays

2.1. Proposition. Suppose an oracle contains a linear array that has n cells containing either 0 or 1. And suppose an algorithm can give the oracle subsets $q \subseteq \{1, \ldots, n\}$ and ask the number of 1s among the cells of $q$. The algorithm may be adaptive (i.e., the t-th question may depend on the answers to the first $t - 1$ questions). Any such algorithm must ask at least $n/\log(n + 1)$ questions to correctly output this array, in the worst case.

Proof. For every question asked by the algorithm, the oracle's answer is an integer between 0 and n. If the algorithm asks at most m questions, it can get at most $(n + 1)^m$ distinct sequences of answers, and since there are $2^n$ distinct linear arrays, $(n + 1)^m \ge 2^n$. Taking logarithms gives $m \ge n/\log(n + 1)$. □
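The counting argument can be evaluated exactly for small n. The following is a minimal sketch, not part of the original paper; the function name `min_questions_lower_bound` is hypothetical. It finds the smallest m for which the number of possible answer sequences, $(n + 1)^m$, reaches the number of arrays, $2^n$.

```python
def min_questions_lower_bound(n: int) -> int:
    """Smallest m with (n + 1)**m >= 2**n, using exact integer arithmetic."""
    m = 0
    reachable = 1      # (n + 1)**m: distinct answer sequences after m questions
    target = 2 ** n    # distinct 0/1 arrays of length n
    while reachable < target:
        reachable *= n + 1
        m += 1
    return m
```

For example, `min_questions_lower_bound(15)` compares powers of 16 with $2^{15}$, so any algorithm needs at least four questions for some array of length 15.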

2.2. Proposition. There is an adaptive deterministic algorithm that asks the oracle at most $4n/\log n + o(n/\log n)$ questions and correctly outputs the array.

Proof. Let $C_t$ denote the set of possible configurations of the array that are consistent with the first t answers, and let $K_t$ denote its size, so that $K_0 = 2^n$. An element of $C_t$ will be called a codeword. A question will be denoted by a subset $q$ of $\{1, 2, \ldots, n\}$, and its answer will be $A_q(x) = \sum_{i \in q} x_i$, where $x \in C_t$ is the actual configuration of the array. The first question will be $q = \{1, 2, \ldots, n\}$; its answer reveals $m = \mathrm{wt}(x)$, the total number of 1s in the entire array. Consequently,

$$K_1 = \binom{n}{m} \le \binom{n}{\frac{1}{2}n}.$$

We will show that by using the answers of the first t questions, we can determine the (t + 1)st question such that the answer to this question will yield

$$K_{t+1} \le \begin{cases} K_t / n^{1/4 - \epsilon} & \text{if } K_t \ge 2^{n^{1-\epsilon}}, \\ \frac{3}{4} K_t & \text{otherwise} \end{cases}$$

for any $\epsilon$ between 0 and 1/68, and for sufficiently large values of n. Using this, it can be readily seen that the total number of questions is at most

$$1 + \frac{\log \binom{n}{\frac{1}{2}n} - \log\bigl(2^{n^{1-\epsilon}}\bigr)}{\log\bigl(n^{1/4-\epsilon}\bigr)} + 1 + \frac{\log\bigl(2^{n^{1-\epsilon}}\bigr)}{\log\bigl(\frac{4}{3}\bigr)},$$

i.e., at most $(4 + 17\epsilon) n/\log n$ for sufficiently large values of n.
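The role of the threshold 1/68 can be checked with a short calculation. This worked step is an added sanity check under the reading of the bound given here, not text from the original proof. The dominant contribution comes from the phase in which each question shrinks $K_t$ by a factor of $n^{1/4-\epsilon}$ starting from $K_1 \le 2^n$, which takes at most $n/((1/4 - \epsilon) \log n)$ questions. The remaining phase starts from at most $2^{n^{1-\epsilon}}$ codewords and shrinks $K_t$ by a constant factor per question, so it contributes only $O(n^{1-\epsilon}) = o(n/\log n)$ questions. For the leading coefficient,

$$\frac{1}{1/4 - \epsilon} = \frac{4}{1 - 4\epsilon} \le 4 + 17\epsilon \iff 4 \le (4 + 17\epsilon)(1 - 4\epsilon) = 4 + \epsilon - 68\epsilon^2 \iff \epsilon (1 - 68\epsilon) \ge 0,$$

which holds exactly when $0 \le \epsilon \le 1/68$, with strict inequality for $0 < \epsilon < 1/68$. The strict inequality leaves room to absorb the lower-order terms once n is sufficiently large.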

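To prove the above claim regarding $K_t$, select a subset $q$ of $\{1, \ldots, n\}$ randomly, with the uniform distribution. Define $p(x, x) = 1$. For two different elements $x$, $y$ of $C_t$, we compute the probability $p(x, y)$ that $A_q(x) = A_q(y)$. Suppose that $x$ and $y$ differ in $2\Delta = 2\Delta(x, y)$ positions, i.e., there are $\Delta$ positions where $x$ has 1 and $y$ has 0, and there are $\Delta$ positions where $x$ has 0 and $y$ has 1, since $\mathrm{wt}(x) = \mathrm{wt}(y)$. Then,

$$\begin{aligned} p(x, y) &= \mathrm{Prob}\bigl( |\{i \in q : x_i = 1 \wedge y_i = 0\}| = |\{i \in q : x_i = 0 \wedge y_i = 1\}| \bigr) \\ &= P(\Delta) \\ &= \sum_{j} \binom{\Delta}{j}^2 \Big/ 2^{2\Delta} = \binom{2\Delta}{\Delta} \Big/ 2^{2\Delta} \\ &\le 1/\sqrt{\Delta}. \end{aligned}$$

Now, if $N(x, \Delta)$ denotes the number of codewords $y$ with $\mathrm{wt}(x) = \mathrm{wt}(y)$ at Hamming distance $2\Delta$ from a given codeword $x$, then

$$N(x, \Delta) \le n^{2\Delta}.$$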
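The collision probability $P(\Delta)$ can be checked by brute force for small $\Delta$. The sketch below is an added illustration with hypothetical names. It enumerates every way a random $q$ can meet the $2\Delta$ positions where two equal-weight codewords differ, and compares the result with the closed form $\binom{2\Delta}{\Delta}/2^{2\Delta}$ and with the bound $1/\sqrt{\Delta}$.

```python
from itertools import product
from math import comb, sqrt

def collision_probability(delta: int) -> float:
    """Exact P(delta): chance that a uniform random q picks equally many
    positions from the two groups of delta differing positions."""
    hits = 0
    for picks in product((0, 1), repeat=2 * delta):
        # picks[:delta]  -> positions where x has 1 and y has 0
        # picks[delta:]  -> positions where x has 0 and y has 1
        if sum(picks[:delta]) == sum(picks[delta:]):
            hits += 1
    return hits / 4 ** delta

for delta in range(1, 8):
    closed_form = comb(2 * delta, delta) / 4 ** delta
    print(delta, collision_probability(delta), closed_form, 1 / sqrt(delta))
```

Positions where the two codewords agree do not affect whether the answers collide, which is why only the $2\Delta$ differing positions are enumerated.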

Clearly, the answer to a question $q$ will break $C_t$ into several blocks, of sizes $b_0, b_1, \ldots, b_{|q|}$, where $\sum_i b_i = K_t$. Then, $K_{t+1} \le \max_i b_i$. Consider the random variable $\sum_i b_i^2$. Its expectation (over the various choices of $q$) is

$$\begin{aligned} E\Bigl(\sum_i b_i^2\Bigr) &= \sum_{x \in C_t} \sum_{y \in C_t} p(x, y) \\ &= \sum_{x \in C_t} \Bigl( 1 + \sum_{\Delta \ge 1} N(x, \Delta) P(\Delta) \Bigr) \\ &\le \sum_{x \in C_t} \Bigl( 1 + \sum_{1 \le \Delta < T} n^{2\Delta} \times 1 + \sum_{\Delta \ge T} N(x, \Delta) P(T) \Bigr) \\ &\le K_t \bigl( 1 + n^{2T} + K_t/\sqrt{T} \bigr) \end{aligned}$$

for a suitably chosen value of T. On selecting $T = n^{1-4\epsilon}$, we find that, for $K_t \ge 2^{n^{1-\epsilon}}$ and for sufficiently large values of n, this expression is bounded by $K_t^2/n^{1/2-2\epsilon}$.

Given the above bound on the expectation $E(\sum_i b_i^2)$, select a particular question $q$ for which $\sum_i b_i^2$ is no larger than the expected value. Clearly, for this value of $q$, the largest block $b_i$ is bounded by

$$\max_i b_i \le \Bigl(\sum_i b_i^2\Bigr)^{1/2} \le K_t/n^{1/4-\epsilon}.$$

For smaller values of $K_t$, the fact that $p(x, y) \le \frac{1}{2}$ for $x \ne y$ implies that

$$E\Bigl(\sum_i b_i^2\Bigr) \le K_t + \tfrac{1}{2} K_t (K_t - 1) = \tfrac{1}{2} K_t (K_t + 1).$$

Now, the case $K_t \in \{2, 3\}$ can be checked immediately. And, for $K_t \ge 4$, if we select a question $q$ such that the expected value given above is not exceeded, then we find that $K_{t+1}/K_t \le \frac{3}{4}$. (For $K_t \ge 8$, $K_{t+1}/K_t \le \frac{3}{4}$ can be checked easily and a simple case analysis shows that $K_{t+1}/K_t \le \frac{3}{4}$ also holds for $K_t = 4, 5, 6, 7$.) □
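The selection step in the proof picks a question whose sum of squared block sizes does not exceed the expectation. For very small n such a question can be found by exhaustive search, because the minimum over all questions is never larger than the average. The sketch below is an added illustration with hypothetical function names, not the authors' procedure. It runs in exponential time and is meant only to make the adaptive refinement of the candidate set concrete.

```python
from collections import Counter
from itertools import combinations, product

def answer(x, q):
    """Oracle reply: number of 1s of x among the cells indexed by q."""
    return sum(x[i] for i in q)

def best_question(candidates, n):
    """Exhaustively pick the subset q minimising the sum of squared block sizes."""
    best_score, best_q = None, None
    for k in range(1, n + 1):
        for q in combinations(range(n), k):
            sizes = Counter(answer(x, q) for x in candidates).values()
            score = sum(b * b for b in sizes)
            if best_score is None or score < best_score:
                best_score, best_q = score, q
    return best_q

def recover(secret):
    """Return (recovered array, number of questions asked)."""
    n = len(secret)
    candidates = list(product((0, 1), repeat=n))  # C_0: all 2**n arrays
    q = tuple(range(n))                           # first question: whole array
    asked = 0
    while True:
        reply = answer(secret, q)
        asked += 1
        # Keep only the codewords consistent with the reply (C_t -> C_{t+1}).
        candidates = [x for x in candidates if answer(x, q) == reply]
        if len(candidates) == 1:
            return candidates[0], asked
        q = best_question(candidates, n)
```

A call such as `recover((1, 0, 1, 1))` returns the secret together with the number of questions used. Taking the maximum of that count over all secrets of a given length gives the worst case for this particular search strategy.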

3. Tight bounds on the number of questions for two-dimensional arrays

3.1. Proposition. Suppose an oracle contains a symmetric $n \times n$ array whose (i, j)-th entry $x_{ij}$ is 0 or 1. And suppose that an algorithm may supply the oracle with a subset $q$ of $\{1, 2, \ldots, n\}$, and receive the number of 1s among the cells whose row and column indices both lie in $q$: $\sum_{i, j \in q} x_{ij}$. Then, $n^2/(4 \log n)$ questions are always necessary, and there exists an algorithm that can correctly output the matrix by asking at most $(2n^2/\log n) + o(n^2/\log n)$ questions.

Proof. The proof of the lower bound is similar to that given in Proposition 2.1. For every question, the answer of the oracle has to lie between 0 and $n^2$. So if the algorithm asks at most m questions, then

$$(n^2 + 1)^m \ge 2^{n(n+1)/2}.$$

This implies that $m \ge n^2/(4 \log n)$. For the upper bound, we present below an algorithm that asks at most $F(n)$ questions to correctly output the array, where

$$F(n) = F(\lceil \tfrac{1}{2}n \rceil) + F(\lfloor \tfrac{1}{2}n \rfloor) + \Bigl( \lceil \tfrac{1}{2}n \rceil \times (4 + o(1)) \times \bigl( (\tfrac{1}{2}n)/\log(\tfrac{1}{2}n) \bigr) \Bigr).$$

Since $F(1) = 1$, this recurrence can be easily seen to yield

$$F(n) \le (2n^2/\log n) + o(n^2/\log n).$$
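The leading constant 2 can be explored numerically by iterating the recurrence with the $o(1)$ term dropped. The sketch below is an added illustration under stated assumptions: logarithms are taken to base 2, and for $n \le 4$ the value $n(n+1)/2$ is used as a base case (one question per diagonal entry and per unordered pair), which agrees with $F(1) = 1$ but is otherwise a choice made here to avoid dividing by $\log 1 = 0$.

```python
from functools import lru_cache
from math import ceil, floor, log2

@lru_cache(maxsize=None)
def F(n: int) -> float:
    """Recurrence for the number of questions, with the o(1) term dropped."""
    if n <= 4:
        # Assumed base case: query {i} for each diagonal entry and {i, j}
        # for each unordered pair, i.e. n(n+1)/2 questions in total.
        return n * (n + 1) / 2
    half = n / 2
    return F(ceil(half)) + F(floor(half)) + ceil(half) * 4 * half / log2(half)

for k in (8, 12, 16, 20):
    n = 2 ** k
    print(n, F(n) * log2(n) / n ** 2)  # ratio F(n) / (n^2 / log n)
```

The printed ratio approaches 2 only slowly, which is consistent with the lower-order term $o(n^2/\log n)$ in the bound.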

The algorithm first calls itself recursively to deduce the upper left-hand quarter and the lower right-hand quarter of the given matrix. In doing so, it uses $F(\lceil \frac{1}{2}n \rceil) + F(\lfloor \frac{1}{2}n \rfloor)$ queries. Next, for each $i \in \{1, 2, \ldots, \lceil \frac{1}{2}n \rceil\}$, the algorithm asks questions that are restricted to the set $\{i, \lceil \frac{1}{2}n \rceil + 1, \lceil \frac{1}{2}n \rceil + 2, \ldots, n\}$, and that contain $i$. Clearly, this reduces the two-dimensional problem to the one-dimensional case: the entries in the lower right and upper left are known, and the subset of the ith column in the query simply duplicates the subset of the ith row. So, after at most

$$(4 + o(1)) \times (\lfloor \tfrac{1}{2}n \rfloor)/\log(\lfloor \tfrac{1}{2}n \rfloor)$$

queries, it can determine the contents of the ith row. Hence, by repeating this for $i \in \{1, 2, \ldots, \lceil \frac{1}{2}n \rceil\}$, the algorithm can determine the contents of the given matrix, and a simple analysis shows that it asks at most $F(n)$ questions where $F(n)$ obeys the recurrence relation given above. □
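The reduction to the one-dimensional case can be made concrete as follows. This is a minimal sketch with hypothetical names, using 0-based indices. The key step is `row_subset_sum`, which turns one matrix query on $\{i\} \cup S$ into the one-dimensional answer $\sum_{j \in S} x_{ij}$. To keep the example short, the one-dimensional solver is the naive one that asks about a single cell at a time, so this sketch uses $n(n+1)/2$ questions rather than the number achieved with the algorithm of Proposition 2.2.

```python
def make_oracle(matrix):
    """Oracle for a symmetric 0/1 matrix: count the 1s in the q x q submatrix."""
    def oracle(q):
        return sum(matrix[i][j] for i in q for j in q)
    return oracle

def row_subset_sum(oracle, known, i, subset):
    """Recover sum of x[i][j] over j in subset from one query on {i} | subset.

    Assumes known[i][i] and known[j][k] for all j, k in subset are correct.
    """
    total = oracle([i] + list(subset))
    inner = sum(known[j][k] for j in subset for k in subset)
    # By symmetry, row i and column i each contribute the wanted sum once.
    return (total - known[i][i] - inner) // 2

def reconstruct(oracle, lo, hi, known):
    """Fill in known[r][c] for lo <= r, c < hi."""
    if hi - lo == 1:
        known[lo][lo] = oracle([lo])
        return
    mid = lo + (hi - lo + 1) // 2         # upper-left block gets ceil(n/2) indices
    reconstruct(oracle, lo, mid, known)   # upper left-hand quarter
    reconstruct(oracle, mid, hi, known)   # lower right-hand quarter
    for i in range(lo, mid):
        for j in range(mid, hi):          # naive 1-D solver: one cell per query
            known[i][j] = known[j][i] = row_subset_sum(oracle, known, i, [j])
```

A typical call allocates `known` as an $n \times n$ list of zeros and runs `reconstruct(make_oracle(matrix), 0, n, known)`. Swapping the inner loop for a better one-dimensional strategy would change only the number of calls to `row_subset_sum`, not the reduction itself.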

4. Discussion

In Sections 2 and 3, we obtained optimal bounds (within constant factors) on the number of questions asked to determine the contents of a one-dimensional array and of a two-dimensional symmetric array containing 0s and 1s only. However, in both cases, we proved the upper bound by showing that there exists an optimal algorithm, even though we could not exhibit one. Consequently, it remains challenging to exhibit algorithms that ask the optimal numbers of questions for determining the contents of one- and two-dimensional arrays.


Given an undirected graph with n vertices, suppose an algorithm can inquire the number of edges in the subgraph induced by any set of vertices. Following Rosenberg [3], it would be of great interest to obtain tight bounds in this model on the number of questions asked for finding the connected components and for other graph problems. In this regard, we observe that the result of Hajnal, Maass, and Turan [1] can be modified in the following manner to show that $O(n \log n)$ queries are sufficient to determine the connected components.

Hajnal, Maass, and Turan assume a model in which an algorithm can inquire whether there is at least one edge between subsets of vertices $A$, $B$ (where $A \subseteq V$, $B \subseteq V$ and $A \cap B = \emptyset$). And they show that $O(n \log n)$ queries are sufficient to determine the connected components in their model. Now, we can simulate a question asked by their algorithm by asking three questions in our model, namely, the number of edges in $A \subseteq V$, the number of edges in $B \subseteq V$, and the number of edges in $A \cup B \subseteq V$. Thus, in our model also, $O(n \log n)$ questions are sufficient to determine the connected components of an n-vertex undirected graph. Now, for the model considered in [1], $\Omega(n \log n)$ is also a lower bound on the number of queries asked for determining the connected components, since the corresponding decision tree has bounded degree and since there exists a positive constant $c$ such that there are $\Omega(2^{cn \log n})$ ways in which the n vertices could be assigned to their connected components. (In fact, Hajnal et al. show that $\Omega(n \log n)$ is a lower bound in their model even for determining whether a graph is connected.) However, unlike the model given by Hajnal et al., in our model, the above algorithm is a $\log n$ factor away from the straightforward lower bound.
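The three-question simulation can be written down directly. The sketch below is an added illustration with hypothetical names: `count_edges` plays the role of the oracle in our model, and `has_edge_between` answers the question of the model in [1] by comparing the edge count of $A \cup B$ with the counts of $A$ and $B$ taken separately.

```python
def make_edge_counter(edges):
    """Oracle for our model: number of edges induced by a vertex set."""
    edge_set = {frozenset(e) for e in edges}
    def count_edges(vertices):
        vs = set(vertices)
        return sum(1 for e in edge_set if e <= vs)
    return count_edges

def has_edge_between(count_edges, a, b):
    """Simulate 'is there an edge between A and B?' with three counting queries."""
    a, b = set(a), set(b)
    assert not (a & b), "A and B must be disjoint"
    # Edges inside A | B are those inside A, those inside B, and those crossing.
    crossing = count_edges(a | b) - count_edges(a) - count_edges(b)
    return crossing > 0
```

For the path 0-1-2-3, for instance, `has_edge_between(count, {0, 1}, {2, 3})` is true because the edge between 1 and 2 crosses, while `has_edge_between(count, {0}, {3})` is false.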

Acknowledgment

We are grateful to the referees for improving the readability of this paper.

References

[1] A. Hajnal, W. Maass and G. Turan, On the communication complexity of graph properties, in: Proc. 20th Ann. ACM Symp. on Theory of Computing (1988) 186-191.

[2] R. Rivest and J. Vuillemin, On recognizing graph properties from adjacency matrices, Theoret. Comput. Sci. 3 (1978) 371-384.

[3] A. Rosenberg, On the time required to recognize properties of graphs: A problem, SIGACT News 5 (1973) 15-16.