The 2-Catalog

Post on 04-Jan-2016


1

Joint work with Shmuel Safra

2

Motivation

3

Motivation

4

The Catalog Problem

Input: A set of customers $C$. A set of pages $P$. A function $\pi : C \to 2^P$ giving the pages each customer is interested in. The catalog size $r$.

Output: A catalog $P' \subseteq P$ of size $r$ s.t. $\sum_{c \in C} |\pi(c) \cap P'|$ is maximal.

5

The Catalog Problem (cont.)

Algorithm: Take the r most popular pages, i.e. the r pages that the largest number of customers are interested in.
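The single-catalog algorithm can be sketched in a few lines of Python. This is only an illustration: the dictionary `interests` (customer to set of pages), the function name `best_catalog` and the toy data are assumptions made for the example and do not come from the slides.

```python
from collections import Counter

def best_catalog(interests, r):
    """Return the r most popular pages and the number of hits they collect.

    interests: dict mapping each customer to the set of pages they like
    (a stand-in for the function from C to subsets of P).
    """
    popularity = Counter(p for pages in interests.values() for p in pages)
    # Sort by descending popularity; ties are broken by page name for determinism.
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    catalog = set(ranked[:r])
    hits = sum(len(pages & catalog) for pages in interests.values())
    return catalog, hits

# Hypothetical toy instance.
interests = {
    "c1": {"a", "b"},
    "c2": {"a", "c"},
    "c3": {"a", "b", "d"},
}
catalog, hits = best_catalog(interests, r=2)
print(sorted(catalog), hits)
```

Because the objective is a sum of per-page popularities, choosing the r largest terms is optimal for a single catalog.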

6

Catalog Segmentation

7

The k-Catalog Segmentation

Input: A set of customers $C$. A set of pages $P$. A function $\pi : C \to 2^P$. The catalog size $r$.

Output: k catalogs $P_1,…,P_k \subseteq P$ of size $r$ each, s.t. $\sum_{c \in C} \max_{1 \le i \le k} |\pi(c) \cap P_i|$ is maximal.
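The segmentation objective (each customer is served by whichever of the k catalogs suits them best) can be evaluated as follows. A minimal sketch: the name `segmentation_hits` and the toy data are illustrative assumptions, not part of the slides.

```python
def segmentation_hits(interests, catalogs):
    """Total hits when every customer receives the catalog that suits them best."""
    return sum(
        max(len(pages & catalog) for catalog in catalogs)
        for pages in interests.values()
    )

# Hypothetical toy instance with two natural customer segments.
interests = {
    "c1": {"a", "b"},
    "c2": {"a", "b"},
    "c3": {"c", "d"},
    "c4": {"c", "d"},
}
one = segmentation_hits(interests, [{"a", "b"}])              # a single catalog
two = segmentation_hits(interests, [{"a", "b"}, {"c", "d"}])  # k = 2 catalogs
print(one, two)
```

On this instance a second catalog doubles the number of hits, which is the effect segmentation is meant to capture.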

8

Representation as a Graph

We can consider the input as a bipartite graph $G = (C, P, E)$, where $E = \{ (c,p) \mid c \in C,\ p \in \pi(c) \}$.

Then, our goal is to find k sets of vertices $P_1,…,P_k \subseteq P$ of size r each, and a partition of C into k sets $C_1,…,C_k$ s.t. $|E \cap ((C_1 \times P_1) \cup \dots \cup (C_k \times P_k))|$ is maximal.

9

Uniform Catalog Problem

Definition: A catalog problem is called uniform if there exists a number d such that the degree of every vertex $p \in P$ is d.

The maximum possible number of hits for a uniform catalog problem is $k \cdot r \cdot d$.

Thus, we can normalize the number of hits and define

$\mathrm{sat}(G) = \dfrac{\max |E \cap ((C_1 \times P_1) \cup \dots \cup (C_k \times P_k))|}{k \cdot r \cdot d}$
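For very small uniform instances the normalized value sat(G) can be computed by exhaustive search. The sketch below is exponential in the instance size and is meant only to make the definition concrete; the function name `sat_uniform` and the toy instance are assumptions introduced for the example.

```python
from itertools import combinations, combinations_with_replacement

def sat_uniform(interests, pages, k, r):
    """Brute-force sat(G) for a small uniform instance (every page has degree d)."""
    degree = {p: sum(p in liked for liked in interests.values()) for p in pages}
    d_values = set(degree.values())
    if len(d_values) != 1:
        raise ValueError("instance is not uniform")
    d = d_values.pop()
    all_catalogs = [set(c) for c in combinations(sorted(pages), r)]
    best = 0
    # Try every choice of k catalogs; each customer takes the best of the k.
    for choice in combinations_with_replacement(all_catalogs, k):
        hits = sum(max(len(liked & cat) for cat in choice)
                   for liked in interests.values())
        best = max(best, hits)
    return best / (k * r * d)

# Hypothetical instance: every page is liked by exactly d = 2 customers.
interests = {
    "c1": {"a", "c"},
    "c2": {"a", "d"},
    "c3": {"b", "c"},
    "c4": {"b", "d"},
}
value = sat_uniform(interests, pages={"a", "b", "c", "d"}, k=2, r=2)
print(value)
```

Here at most two customers can have their whole interest set inside a catalog, so the best segmentation collects 6 of the 8 possible hits.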

10

Hardness

Theorem (Kleinberg, Papadimitriou and Raghavan): It is NP-hard to compute the optimal k catalogs exactly.

11

Approximation

Proposition: Taking the r most popular pages in all k catalogs gives an approximation factor of 1/k.

Proof: In the optimal solution, there is a catalog that gives at least 1/k of the hits. Thus, using only this catalog for all customers leaves us with at least 1/k of the hits. Replacing this catalog by the r most popular pages can only increase the number of hits.
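The two steps of the proof can be checked numerically on a small example. This is a sanity check under assumed toy data, not a proof; the helper names `hits` and `top_r` are hypothetical. The argument works for any k-catalog solution, so the sketch uses an arbitrary one rather than the optimum.

```python
from collections import Counter

def hits(interests, catalogs):
    """Hits when each customer takes the best of the given catalogs."""
    return sum(max(len(liked & cat) for cat in catalogs)
               for liked in interests.values())

def top_r(interests, r):
    """The r most popular pages (ties broken by page name)."""
    popularity = Counter(p for liked in interests.values() for p in liked)
    ranked = sorted(popularity.items(), key=lambda item: (-item[1], item[0]))
    return {p for p, _ in ranked[:r]}

interests = {
    "c1": {"a", "c"}, "c2": {"a", "d"}, "c3": {"b", "c"},
    "c4": {"b", "d"}, "c5": {"a", "b"},
}
solution = [{"a", "c"}, {"b", "d"}]      # some solution with k = 2, r = 2
k = len(solution)

total = hits(interests, solution)
best_single = max(hits(interests, [cat]) for cat in solution)
popular = hits(interests, [top_r(interests, 2)])

# Step 1: one catalog alone keeps at least 1/k of the hits.
assert best_single * k >= total
# Step 2: the r most popular pages do at least as well as any single catalog.
assert popular >= best_single
print(total, best_single, popular)
```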

12

Dense Instances

Kleinberg, Papadimitriou and Raghavan gave an approximation scheme for dense instances, i.e. instances in which each customer is interested in at least an $\epsilon$ fraction of the pages.

13

The PCP

A SAT instance $\Psi = (\psi_1,…,\psi_n)$ over 2 types of variables: X and Y.

The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.

The range of the variables $y \in Y$ is $\{0,1\}$.

Each $\psi_i$ depends on exactly one $x \in X$ and one $y \in Y$, s.t. the value assigned to x determines the value of y. Thus, we can write it as a function $\psi_{x \to y} : R_X \to \{0,1\}$.

14

The PCP (cont.)

It is NP-hard to distinguish between the following 2 cases:

Good: There exists an assignment A s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment A, $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] < \tfrac12 + \epsilon$.

15

The Reduction

Given an instance $\Psi$ for the above PCP, let G be the following instance for the 2-catalog segmentation problem:

$P = \{ (x, a, s) \mid x \in X,\ a \in R_X,\ s \in \{0,1\} \}$

$C = \{ (y, b) \mid y \in Y,\ b \in \{0,1\} \}$

$(x, a, s) \sim (y, b) \iff \psi_{x \to y} \in \Psi$ and $\psi_{x \to y}(a) = b \oplus s$

$r = |X|$

16

Completeness

Theorem: If $\Psi$ is satisfiable then sat(G) = 1.

Proof: Let A be a satisfying assignment and consider the following segmentation:

$\forall i \in \{0,1\}$, $P_i = \{ (x, A(x), i) \mid x \in X \}$.

$\forall y \in Y$, $(y, A(y))$ gets $P_0$ and $(y, \neg A(y))$ gets $P_1$.

Thus, for every page in the catalogs, all the customers that are interested in it get it, and hence sat(G) = 1.
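The reduction and the completeness argument can be replayed on a tiny hand-made instance. Everything concrete here is an assumption made for illustration: the variable names, the choice l = 1, the four toy tests and the satisfying assignment `A`. The edge rule used is that page (x, a, s) is adjacent to customer (y, b) exactly when the test on (x, y) exists and maps a to b XOR s.

```python
from itertools import product

l = 1
R_X = list(product([0, 1], repeat=l))      # range of the x-variables
X, Y = ["x0", "x1"], ["y0", "y1"]
# tests[(x, y)] maps a value a in R_X to the bit it forces on y (toy choice).
tests = {
    ("x0", "y0"): lambda a: a[0],
    ("x0", "y1"): lambda a: 1 - a[0],
    ("x1", "y0"): lambda a: 1 - a[0],
    ("x1", "y1"): lambda a: a[0],
}

pages = [(x, a, s) for x in X for a in R_X for s in (0, 1)]
customers = [(y, b) for y in Y for b in (0, 1)]
edges = {
    ((x, a, s), (y, b))
    for (x, a, s) in pages for (y, b) in customers
    if (x, y) in tests and tests[(x, y)](a) == b ^ s
}
r = len(X)

# A satisfying assignment of the toy instance.
A = {"x0": (1,), "x1": (0,), "y0": 1, "y1": 0}
assert all(test(A[x]) == A[y] for (x, y), test in tests.items())

# The segmentation from the completeness proof.
catalogs = [{(x, A[x], i) for x in X} for i in (0, 1)]
assigned = {(y, b): (0 if b == A[y] else 1) for (y, b) in customers}

# Every page has the same degree d, so the instance is uniform.
degrees = {p: sum(1 for (q, _) in edges if q == p) for p in pages}
d = len(Y)
hit = sum(1 for (p, c) in edges if p in catalogs[assigned[c]])
sat = hit / (2 * r * d)
print(hit, sat)
```

Each page in a catalog has all of its interested customers assigned to that catalog, so the segmentation reaches the maximum of k·r·d hits.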

17

Soundness

We would like to show that for every $\epsilon > 0$ there exists $\delta = \delta(\epsilon)$ s.t. if $\mathrm{sat}(G) > \tfrac12 + \epsilon$ then there exists an assignment A s.t.

$\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \delta$.

We would like to construct an assignment according to the catalogs.

Problem: A catalog might contain many pages for the same x with different assignments.

18

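Refining the PCP

Solution: Changing the PCP.

Good: There exists an assignment A s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment A, the condition $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] < \tfrac12 + \epsilon$ is replaced by

$\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \epsilon \right] < \delta$.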

19

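Choosing One Catalog

Now, assume $\mathrm{sat}(G) > \tfrac12 + \epsilon$. Thus, for one of the catalogs, $P_{i'}$,

$\Pr_{p \in P_{i'},\ c:\, p \in \pi(c)}[c \in C_{i'}] > \tfrac12 + \epsilon$

and hence

$\Pr_{p \in P_{i'}}\left[ \Pr_{c:\, p \in \pi(c)}[c \in C_{i'}] > \tfrac12 + \tfrac{\epsilon}{2} \right] > \tfrac{\epsilon}{2}$.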

20

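Choosing a Subset of Pages

Let $P'_{i'} = \{ p \in P_{i'} \mid \Pr_{c:\, p \in \pi(c)}[c \in C_{i'}] > \tfrac12 + \tfrac{\epsilon}{2} \}$.

Thus, $|P'_{i'}| \ge \tfrac{\epsilon}{2} |X|$.

Now, let us keep only one page in $P'_{i'}$ for each $x \in X$, and denote the set by $P''_{i'}$. Then $|P''_{i'}| \ge 2^{-l} \cdot \tfrac{\epsilon}{2} |X|$.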

21

Enforcing the Same s

$\exists s' \in \{0,1\}$ s.t. $|\{ (x, a, s') \mid (x, a, s') \in P''_{i'} \}| \ge 2^{-(l+1)} \cdot \tfrac{\epsilon}{2} |X|$.

Denote the set of the corresponding x's by X'.

For an appropriate value of $\delta$, $|X'| \ge \delta |X|$.

22

Constructing an Assignment

We would like to construct an assignment as follows:

$\forall x \in X'$, assign the value of the appropriate page.

$\forall y \in Y$, if (y, b) gets the catalog $P_{i'}$, assign the value $b \oplus s'$ to y.

Thus, $\forall x \in X'$, $\tfrac12 + \tfrac{\epsilon}{2}$ of the clauses $\psi_{x \to y}$ are satisfied.

23

Problem

For a variable $y \in Y$, both (y, 0) and (y, 1) might get the same catalog. Thus, we cannot obtain an assignment to Y as we would like to.


25

Taking Subsets of x's

Instead of taking one page for each (x, a, s), we take a page for every tuple of:

A subset $\bar{x}$ of m x's

An assignment $A(\bar{x})$ to $\bar{x}$

A bit s

26

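The PCP

$\Psi = (\psi_1,…,\psi_n)$ over variables X and Y, s.t. it is NP-hard to distinguish between:

Good: There exists an assignment A s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment A, $\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \epsilon \right] < \delta$.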

27

par[$\Phi$,k] - Definitions

For a 3SAT formula $\Phi$ over boolean variables Y, let $Y^{(k)}$ be the set of all k-subsets of Y, and let $\Phi^{(k)}$ be the set of all k-subsets of $\Phi$.

$\forall V \in Y^{(k)}$, let $S_V$ be the set of all assignments to V.

$\forall C \in \Phi^{(k)}$, let $S_C$ be the set of all satisfying assignments to C.

28

par[$\Phi$,k] - Definitions (cont.)

$\forall V \in Y^{(k)}$, $C \in \Phi^{(k)}$, let $V \lhd C$ if V is a choice of one variable of each clause in C.

$\forall V \in Y^{(k)}$, $C \in \Phi^{(k)}$ s.t. $V \lhd C$, let $a|_V$ denote the natural restriction of an $a \in S_C$ to $S_V$.

29

par[$\Phi$,k]

Definition: For a 3SAT formula $\Phi$ over boolean variables Y, denote by par[$\Phi$,k] the following instance:

There are 2 types of variables:

W : x[V] for every $V \in Y^{(k)}$, over $S_V$

Z : x[C] for every $C \in \Phi^{(k)}$, over $S_C$

There is a local test $\varphi[C,V]$ for every $V \lhd C$ that accepts iff $x[C]|_V = x[V]$.
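The definition above can be made concrete by enumerating the tests of the parallel instance for a tiny formula. This is a sketch under stated assumptions: the formula, the literal encoding (+v for a variable, -v for its negation) and all helper names are hypothetical, and a choice of one variable per clause is kept only when the k picks are distinct, so that V really is a k-subset.

```python
from itertools import combinations, product

# Toy 3SAT formula over variables 1..4.
formula = [(1, 2, 3), (-1, 2, 4), (-2, -3, 4)]
k = 2

def clause_vars(clauses):
    return sorted({abs(lit) for clause in clauses for lit in clause})

def satisfying_assignments(clauses):
    """S_C: assignments to the variables of C that satisfy every clause of C."""
    variables = clause_vars(clauses)
    result = []
    for bits in product([0, 1], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if all(any(a[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            result.append(a)
    return result

def variable_choices(clauses):
    """All k-subsets V obtained by picking one variable from each clause."""
    picks = product(*[[abs(lit) for lit in clause] for clause in clauses])
    return {frozenset(p) for p in picks if len(set(p)) == len(clauses)}

def local_test(a_C, V, a_V):
    """Accept iff the restriction of x[C] to V equals x[V]."""
    return all(a_C[v] == a_V[v] for v in V)

tests = [(C, V) for C in combinations(formula, k) for V in variable_choices(C)]

# Restricting one satisfying assignment of the formula passes every local test.
full = {1: 1, 2: 1, 3: 1, 4: 1}
accepted = 0
for C, V in tests:
    a_C = {v: full[v] for v in clause_vars(C)}
    assert a_C in satisfying_assignments(C)
    accepted += local_test(a_C, V, {v: full[v] for v in V})
print(len(tests), accepted)
```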

30

par[$\Phi$,k] (cont.)

Definition: For a set of boolean clauses $\Phi$, let sat($\Phi$) denote the maximal fraction of clauses of $\Phi$ that can be satisfied simultaneously.

Theorem:

If sat($\Phi$) = 1 then sat(par[$\Phi$,k]) = 1.

sat(par[$\Phi$,k]) $\le$ sat($\Phi$)$^{c \cdot k}$ for some c > 0.

31

Long Code

Definition: An R-long-code has one bit for each boolean $f : [R] \to \{0,1\}$.
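In the standard long-code encoding, the codeword of an element a of [R] sets the bit indexed by f to f(a). The sketch below spells this out for a small R; representing each function by its truth table and indexing [R] from 0 are choices made for the example.

```python
from itertools import product

def long_code(a, R):
    """Long-code encoding of a in range(R): the bit at position f is f(a)."""
    # A function f : [R] -> {0,1} is represented by its truth table, a tuple of R bits.
    functions = list(product([0, 1], repeat=R))
    return {f: f[a] for f in functions}

codeword = long_code(a=1, R=2)
print(len(codeword))   # one bit per boolean function on [R], i.e. 2**R bits
print(codeword)
```

The length is exponential in R, which is why the long code is only used over constant-size ranges.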

32

The PCP of [ST]

For any bipartite graph G = ([k], [k], E) we construct a SAT instance $\Psi(G)$ that contains one boolean function $\psi$ for every choice of:

$z \in Z$

$v_1,…,v_k \in LC[z]$

$w_1,…,w_k \in W$, s.t. $\forall 1 \le i \le k$, $w_i \lhd z$

$\forall 1 \le i \le k$, $u_i \in LC[w_i]$

$k^2$ perturbation functions $p_{1,1},…,p_{k,k}$

33

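The PCP of [ST] (cont.)

$\psi(v_1,…,v_k,u_1,…,u_k,p_{1,1},…,p_{k,k}) = \mathrm{TRUE} \iff \forall (i,j) \in E$, $v_i \oplus u_j$ = '$v_i \oplus u_j \oplus p_{i,j}$'.

Denote $p = \Pr_{v_i,\, u_j,\, p_{s,t}}\left[ \psi(v_1,…,v_k,u_1,…,u_k,p_{1,1},…,p_{k,k}) = \mathrm{TRUE} \right]$.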

34

The PCP of [ST] (cont.)

Theorem: $\forall \delta > 0$, it is NP-hard to distinguish between the following 2 cases:

Good: $\forall G = ([k], [k], E)$, $p > (1 - \delta)^{|E|}$

Bad: $\forall G = ([k], [k], E)$, $p < 2^{-|E|}$

35

Our PCP

A SAT instance $\Psi = (\psi_1,…,\psi_n)$ over 2 types of variables: X and Y.

The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.

The range of the variables $y \in Y$ is $\{0,1\}$.

Each $\psi_i$ is of the type $\psi_{x \to y} : R_X \to \{0,1\}$.

36

Our PCP (cont.)

Let k = l/2.

Given an instance $\Psi(G)$ as above, we construct an instance $\Psi$ as follows:

There is a variable $x \in X$ for every test $\psi \in \Psi(G)$. An assignment to x is an assignment to the bits $v_1,…,v_k,u_1,…,u_k$.

Y = LC[W].

37

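Our PCP (cont.)

Theorem: $\forall \epsilon, \delta > 0$ and for some constant $c = c(\epsilon) > 0$, it is NP-hard to distinguish between:

Good: There exists an assignment A s.t. $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] = 1$.

Bad: For any assignment A, $\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \epsilon \right] < 2^{-c l^2}$.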

38

Our PCP (cont.)

Lemma: If there exists an assignment A s.t.

$\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \epsilon \right] \ge \delta$,

then there exists a graph G = (V, U, E) and an assignment to LC[W] and LC[Z] s.t. $p \ge 2^{-|E|}$.

39

Our PCP (cont.)

Proof: Assume there exists an assignment A s.t.

$\Pr_{x \in X}\left[ \Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \epsilon \right] \ge \delta$.

We assign the bits of LC[W] the values assigned to them by A, and the bits of LC[Z] are assigned random values.

40

Our PCP (cont.)

We now have to construct a graph G that would satisfy the lemma.

We call an x good if $\Pr_{\psi_{x \to y}}[\psi_{x \to y}(A(x)) = A(y)] > \tfrac12 + \epsilon$.

Let x be good and let $V_0$, $U_0$ be the corresponding vertices.

41

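Our PCP (cont.)

[Figure: the vertex sets $V_0$ and $U_0$, with $V_1$ inside $V_0$ and $U_0$ split into $U_1$ and $U_2$.]

$V_1$: The set of vertices in $V_0$ for which at least $\tfrac12 + \tfrac{\epsilon}{2}$ of their edges are consistent with x. $|V_1| \ge \tfrac{\epsilon}{2} k$.

$U_1$: The set of vertices in $U_0$ that are consistent with x.

$U_2$: $U_0 \setminus U_1$.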

42

Our PCP (cont.)

Proposition: There exists $i \in \{1,2\}$ s.t. $|U_i| \ge \tfrac{\epsilon}{4} k$, and at least $\tfrac12 + \tfrac{\epsilon}{4}$ of the edges between $U_i$ and $V_1$ are consistent with x.

43

Our PCP (cont.)

[Figure: the sets $V_1$, $U_1$ and $U_2 = U_0 \setminus U_1$, with the subsets V' and U' marked.]

44

Our PCP (cont.)

[Figure: the sets $V_1$, $U_1$ and $U_2 = U_0 \setminus U_1$.]

45

Our PCP (cont.)

Let $U' \subseteq U_i$, $V' \subseteq V_1$, s.t. $|U'| = |V'| = \tfrac{\epsilon}{4} k$, and at least $\tfrac12 + \tfrac{\epsilon}{4}$ of the edges between U' and V' are consistent with x.

There are fewer than $2^{2k}$ possibilities to choose U' and V' $\Rightarrow$ there is a subset X' of at least a $2^{-2k}$ fraction (and thus of size at least $2^{-2k} |X|$) of the good x's with the same choice of U' and V'.

46

Our PCP (cont.)

Let X'' be the subset of variables $x \in X'$ that are consistent with the random assignment to LC[Z].

The probability that A(x) is consistent with a random assignment to LC[Z] is $2^{-k}$ $\Rightarrow$ the expected size of X'' is $2^{-k} |X'|$.

Therefore, there exists an assignment to LC[Z] s.t. $|X''| \ge 2^{-3k} |X|$.

47

Our PCP (cont.)

Let $\mathcal{G}$ be the multi-set of all graphs G = (V', U', E) corresponding to the variables $x \in X''$, where E is the set of all edges between U' and V' that are consistent with x.

$|\mathcal{G}| \ge 2^{-3k} |X|$.

$\forall G \in \mathcal{G}$, $|E| \ge (\tfrac12 + \tfrac{\epsilon}{4}) (\tfrac{\epsilon}{4} k)^2$.

48

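Our PCP (cont.)

Lemma: Let $\mathcal{G}$ be a multi-set of bipartite graphs on $[k'] \times [k']$, s.t. each graph in $\mathcal{G}$ has at least $(\tfrac12 + \epsilon') k'^2$ edges. Then, $\forall t \le \tfrac{\epsilon'}{2} k'^2$, $\exists G = ([k'], [k'], E)$, s.t. $|E| \ge t$ and

$\Pr_{G' = ([k'], [k'], E') \in \mathcal{G}}[E \subseteq E'] \ge \left( \tfrac{\epsilon'}{2} \right)^t$.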

49

Our PCP (cont.)

By the above lemma, for $k' = \tfrac{\epsilon}{4} k$ and $\epsilon' = \tfrac{\epsilon}{2}$, $\exists G = ([\tfrac{\epsilon}{4} k], [\tfrac{\epsilon}{4} k], E)$, s.t. $|E| = t = c' (\tfrac{\epsilon}{4} k)^2$, where $c' < \tfrac{\epsilon}{4}$, and all the edges of this graph are consistent in at least a $2^{-3k} (\tfrac{\epsilon}{4})^t$ fraction of the variables in X.

Considering this graph over the vertex sets U and V gives the desired result.