Satisfiable k-CNF formulas above the threshold
Danny Vilenchik
SAT – Basic Notions
3CNF form:
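$F = (x_1\lor x_2\lor \bar{x}_5)\land(x_3\lor \bar{x}_4\lor \bar{x}_1)\land(\bar{x}_1\lor x_2\lor x_6)\land\dots$
Assignment $\psi$: $x_1x_2x_3x_4x_5x_6$ = FFTFFT
Under $\psi$: $F = (F\lor F\lor T)\land(T\lor T\lor T)\land(T\lor F\lor T)\land\dots$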
Goal: an efficient algorithm that produces an optimal result and works for all inputs
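The evaluation of the example formula can be reproduced mechanically. The following is a minimal Python sketch, not part of the talk. The signed-integer clause encoding and the function names are assumptions made for illustration, and the negation pattern is the one implied by the truth values shown on the slide.

```python
# Hypothetical encoding (not from the slides): +i stands for x_i, -i for NOT x_i.
def literal_value(lit, assignment):
    """Truth value of one literal under the assignment."""
    value = assignment[abs(lit)]
    return value if lit > 0 else not value

def clause_values(clause, assignment):
    """Truth values of the literals of a clause, in order."""
    return [literal_value(lit, assignment) for lit in clause]

def satisfies(formula, assignment):
    """A CNF is satisfied when every clause has at least one true literal."""
    return all(any(clause_values(c, assignment)) for c in formula)

# The three clauses shown on the slide, polarities inferred from the evaluated row.
formula = [(1, 2, -5), (3, -4, -1), (-1, 2, 6)]
# The assignment FFTFFT for x1..x6.
assignment = dict(zip(range(1, 7), [False, False, True, False, False, True]))

rows = [clause_values(c, assignment) for c in formula]
```

Each entry of `rows` corresponds to one parenthesised group of the evaluated formula, and `satisfies(formula, assignment)` checks the conjunction of the three clauses.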
SAT - More Background …
Finding a satisfying assignment is NP-hard [Cook’71]
No approximation for MAX-SAT with factor better than 7/8 [Hastad’01]
How to proceed?
Hardness results only show that there exist hard instances
Many researchers take the heuristic approach
Typical instance? One possibility: random models, average case analysis
A heuristic is a polynomial-time algorithm that produces optimal results on typical instances
Random SAT Distributions
Most popular k-SAT distribution is the uniform distribution:
Fix c, k and choose u.a.r. m=cn clauses out of the $2^k\binom{n}{k}$ possible clauses
[Fri99] Phase transition: there exists a number d=d(k,n) such that
m/n>d: most k-CNFs are not satisfiable (k=3, d<4.506, [DBM00])
m/n<d: most k-CNFs are satisfiable (k=3, d>3.42, [KKL02])
Simple upper bound on d: $2^k\ln 2$
Proof idea: pick $(2^k\ln 2)n$ random clauses; the expected number of satisfying assignments goes from $\omega(1)$ to $o(1)$
Too far off for small k: $2^3\ln 2 \approx 5.545$ (in particular, no tight concentration)
Major open question: what is the correct value of d? Conjectured: $2^k\ln 2 - c$, c some universal constant
Proven so far: at least $2^k\ln 2 - k/2$ [AP05]
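The proof idea for the simple upper bound is a first-moment calculation. As a sketch only: assume the m=cn clauses are drawn independently (with replacement), which simplifies the model, and write X for the number of satisfying assignments (X is a symbol introduced here, not in the slides). A fixed assignment satisfies a random clause with probability $1-2^{-k}$, so using $1-t\le e^{-t}$:

```latex
\[
E[X] \;=\; 2^n\left(1-2^{-k}\right)^{cn}
\;\le\; \exp\!\Big(n\big(\ln 2 - c\,2^{-k}\big)\Big),
\]
\[
\text{hence } E[X]=o(1) \text{ once } c > 2^k\ln 2,
\qquad\text{and by Markov's inequality}\qquad
\Pr[X\ge 1]\;\le\;E[X]\;\to\;0 .
\]
```

For k=3 this gives the value $2^3\ln 2\approx 5.545$ quoted on the slide.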
Random SAT cont.
The below-threshold regime has been studied extensively (satisfiability):
Rigorous analysis of heuristics: Pure Literal [BFU93], RWalkSAT [AB03]
Experimental results: Survey Propagation [BMZ05]
The typical structure of the solution space (clustering) [AR06, MMZ05]
Focus on near-threshold formulas: trying to figure out the threshold
The above-threshold regime is interesting mathematically and algorithmically
The above-threshold regime is not necessarily “easy”
Why not consider a satisfiable 3CNF instance with 7n clauses?
The uniform distribution with m/n>d is not suitable for average case analysis of satisfiability heuristics (it is for refutation)
Above Threshold SAT Distributions
Average case analysis of above threshold k-CNF formulas is scarce (no more than 5 papers until 2003)
Why is that? How to define a “meaningful” distribution over a negligible fraction of k-CNFs?
Maybe such a distribution will not be approachable using current techniques
Maybe such a distribution will not even be efficiently-sampleable
Our main contribution:
a rigorous average-case study (algorithmic and structural properties) of the above-threshold satisfiable k-SAT regime
$(x_1\lor x_2\lor \bar{x}_5)\land(x_3\lor \bar{x}_4\lor \bar{x}_1)\land(\bar{x}_1\lor x_2\lor x_6)\land\dots$
What was known so far?
The only distribution that was studied is the planted distribution
Fix some assignment $\psi$ to the n variables
Fix c and include m=cn clauses u.a.r. out of the $(2^k-1)\binom{n}{k}$ clauses which are satisfied by $\psi$
Planted models are also “fashionable” for graph coloring, max clique, max independent set, min bisection …
What was known so far? (for k=3)
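[Figure: known results for planted 3SAT along the m/n axis, from dense to sparse]
$m/n=\Theta(n)$: Greedy Algorithm [KP92]
$m/n=\Theta(\log n)$: Majority Vote (simple exercise)
m/n a sufficiently large constant: Spectral Algorithm [Fla03]; RWalkSAT fails [AB03]
m/n=O(1): experimental, a variant of RWalkSAT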
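The Majority Vote entry can be made concrete. Among the 7 sign patterns that satisfy a planted 3-clause, each literal agrees with the planted assignment in 4, so with enough clauses per variable the more frequent polarity reveals the planted value. The Python sketch below is illustrative only: the sampler draws clauses with replacement (a simplification of the model), and the function names and the parameters n=20, m=20000 are assumptions, not values from the talk.

```python
import random

def planted_3sat(n, m, planted, rng):
    """Sample m clauses u.a.r. (with replacement) among those satisfied by `planted`."""
    formula = []
    while len(formula) < m:
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        # Rejection step: keep only clauses with at least one true literal.
        if any(planted[abs(l)] == (l > 0) for l in clause):
            formula.append(clause)
    return formula

def majority_vote(n, formula):
    """Set each variable to the polarity in which it appears more often."""
    score = [0] * (n + 1)
    for clause in formula:
        for lit in clause:
            score[abs(lit)] += 1 if lit > 0 else -1
    return {v: score[v] > 0 for v in range(1, n + 1)}

rng = random.Random(0)
n, m = 20, 20000                      # illustrative, very dense instance
planted = {v: rng.random() < 0.5 for v in range(1, n + 1)}
formula = planted_3sat(n, m, planted, rng)
guess = majority_vote(n, formula)
recovered = (guess == planted)
```

At this density each variable occurs in about 3000 clauses, so the 4/7 versus 3/7 bias dominates the sampling noise by a wide margin; at constant m/n the same vote would misclassify a constant fraction of variables.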
Our Results
[Krivelevich, Coja-Oghlan, V. 07]
We characterize the structure of the solution space of a typical planted formula with m/n>c, c some sufficiently large constant:
1. All satisfying assignments are within Hamming distance $e^{-\Omega(m/n)}n$ of each other
2. All satisfying assignments agree on all but $e^{-\Omega(m/n)}n$ variables
Below threshold: “complicated” clustering of random k-SAT
Part of this was proven in [AR06]
When $m/n=\Omega(\log n)$, item 2 implies there is only one satisfying assignment
The Uniform Distribution
Pick m clauses u.a.r. out of all $2^k\binom{n}{k}$ possibilities, conditioned on satisfiability
What makes this distribution harder to analyze?
Clauses are no longer independent
Not clear how (if at all possible) to sample it efficiently
Long-standing open question: is this distribution tractable for m/n=O(1)?
Was shown to be tractable for $m/n=\Omega(\log n)$ [BBG03]
[Krivelevich, Coja-Oghlan, V. 07]
We describe a deterministic polynomial time algorithm that finds a satisfying assignment for almost all satisfiable k-CNF formulas
with m/n>C, C(k) a sufficiently large constant
The Uniform Distribution
Improving upon the exponential time algorithm for uniform satisfiable 3CNFs in this regime (only one known so far, [Chen03])
We show that the planted and uniform distributions share many structural properties (“close”)
In particular, same single-cluster structure of the solution space
Flaxman’s algorithm [Fla03] works for the uniform distribution as well
Justifying the somewhat unnatural usage of planted-solution models
How to approach the uniform distribution?
A: a “bad” structural property
$\mu$: expected number of satisfying assignments of uniform k-CNF
Lemma [KCOV’07]: $\Pr_{\text{uniform}}[A] < \mu\cdot\Pr_{\text{planted}}[A]$
In the sparse regime, $\mu$ is exponential in n
This approach works only for extremely rare bad properties
How about bad properties that occur w.p. 1/poly(n)?
Exclude a fixed graph on 10 vertices with 40 edges
(tedious) counting argument …
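One way to see why an inequality of this form holds (with $\le$) is a double-counting argument. This is a sketch under assumptions that are not stated on the slide: the planted assignment is chosen u.a.r. among the $2^n$ assignments, S denotes the set of satisfiable formulas with m clauses, N the number of m-clause formulas satisfied by a fixed assignment (the same for every assignment by symmetry), and s(F) the number of satisfying assignments of F. The symbols S, N and s are introduced here.

```latex
\[
\Pr_{\text{planted}}[F] \;=\; \sum_{\varphi}2^{-n}\,\frac{\mathbf{1}[\varphi \text{ satisfies } F]}{N}
\;=\; \frac{s(F)}{2^{n}N},
\qquad
\mu \;=\; \frac{1}{|S|}\sum_{F\in S}s(F) \;=\; \frac{2^{n}N}{|S|}.
\]
\[
\Pr_{\text{uniform}}[A]
\;=\; \frac{|A\cap S|}{|S|}
\;\le\; \frac{1}{|S|}\sum_{F\in A\cap S}s(F)
\;=\; \mu\sum_{F\in A\cap S}\frac{s(F)}{2^{n}N}
\;=\; \mu\cdot\Pr_{\text{planted}}[A].
\]
```

The middle step uses only $s(F)\ge 1$ for satisfiable F, which is why the bound loses a factor of $\mu$ and is useful only when $\Pr_{\text{planted}}[A]$ is much smaller than $1/\mu$.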
Case study: proving the existence of a single cluster
Consider the planted 3SAT distribution (m clauses included u.a.r.)
m/n sufficiently large constant
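Every variable x is expected to support 3m/(7n) clauses w.r.t. the planted assignment
x supports a clause C if x is the only true literal in C, e.g. $(x\lor y\lor z) = (T\lor F\lor F)$
$\Pr[x \text{ supports } C]=\Pr[x \text{ supports } C \mid x \text{ appears in } C]\cdot\Pr[x \text{ appears in } C]=\frac{1}{7}\cdot\frac{3}{n}=\frac{3}{7n}$
$\Rightarrow E[\text{support of } x]=3m/(7n)$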
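The expected support can be checked empirically. The Python sketch below samples a planted 3SAT formula and compares the average support with 3m/(7n). It is illustrative only: clauses are drawn with replacement, and the function names and the parameters n=50, m=5000 are assumptions rather than values from the talk.

```python
import random

def planted_3sat(n, m, planted, rng):
    """Sample m clauses u.a.r. (with replacement) among those satisfied by `planted`."""
    formula = []
    while len(formula) < m:
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if any(planted[abs(l)] == (l > 0) for l in clause):
            formula.append(clause)
    return formula

def support_counts(n, formula, planted):
    """support[v-1] = number of clauses in which v's literal is the only true one."""
    support = [0] * (n + 1)
    for clause in formula:
        true_lits = [l for l in clause if planted[abs(l)] == (l > 0)]
        if len(true_lits) == 1:
            support[abs(true_lits[0])] += 1
    return support[1:]

rng = random.Random(1)
n, m = 50, 5000
planted = {v: True for v in range(1, n + 1)}
formula = planted_3sat(n, m, planted, rng)
support = support_counts(n, formula, planted)
average = sum(support) / n
expected = 3 * m / (7 * n)            # the slide's 3m/(7n)
```

Individual supports fluctuate around the mean; when m/n is only a constant, some variables end up with support 0, which is the situation the later slides handle with the core.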
Typical planted 3CNF instances
Fact 1: whp there is no set H of h variables s.t. h<n/1000 and there are at least hm/(10n) clauses containing two variables from H
[Figure: the variable set V with a small subset H, and example clauses $(x_4\lor x_7\lor x_{16})$, $(x_{43}\lor x_{10}\lor x_{41})$, $(x_1\lor x_4\lor x_6)$, $(x_{22}\lor x_7\lor x_{54})$, $(x_{21}\lor x_4\lor x_{88})$; clauses containing two variables from H are marked in orange]
$\Pr[\text{orange clause}]\le 3(h/n)^2$
$E[\#\text{ orange clauses}]\le 3m\cdot(h/n)^2=(hm/n)\cdot(3h/n)\le hm/(300n)$
Typical planted 3CNF instances
Fact 2: whp there are no two satisfying assignments at distance > n/1000
$\varphi$: T T T T T T T T T T … T (the planted assignment)
$\psi$: F F F F F F F F T T … T T (differs from $\varphi$ on more than n/1000 variables)
Consider a clause such as $(x_4\lor x_7\lor x_{16})$:
1. Unsatisfied under $\psi$ but satisfied by $\varphi$: it can potentially be included, but is not
2. There are $\Omega(n^3)$ such clauses: very small probability for none to be included
Clustering cont.
Claim: suppose that F is typical and every variable has the expected support; then F is uniquely satisfiable
Proof: suppose not
Let $\varphi$ be the planted assignment and $\psi$ some other satisfying assignment
Take x s.t. $\psi(x)\ne\varphi(x)$; x supports 3m/(7n) clauses w.r.t. $\varphi$
Consider such a clause $(T\lor F\lor F)$: its only true literal under $\varphi$ is false under $\psi$, so $\psi$ must also flip another variable of the clause
Define $H=\{x : \psi(x)\ne\varphi(x)\}$, h=|H|<n/1000 (Fact 2)
There exist 3hm/(7n) clauses containing two variables from H
This contradicts Fact 1
Clustering cont.
This picture is whp the case when m/n > C log n
When m/n=O(1): whp not the case (some variables have 0 support)
Definition: Given a 3CNF F and a satisfying assignment $\psi$, a set $C\subseteq V$ is called a core of F if $\forall x\in C$, x supports at least m/(4n) clauses in F[C]
Claim: For F in the planted distribution with m/n a sufficiently large constant, there exists a core C w.r.t. the planted assignment s.t.
$|V(C)|>(1-e^{-m/n})n$
C is frozen in F
Corollary: one-cluster structure
[Figure: clauses $(x\lor y\lor z)$ and $(x\lor y\lor w)$ drawn over the variables x, y, z, w]
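One natural way to find a set with the core property is iterative peeling: start from all variables and repeatedly discard those with too little support inside the current set. The Python sketch below illustrates that idea and is not necessarily the construction used in the talk. It assumes F[C] means the clauses of F whose variables all lie in C, and the `threshold` parameter stands in for the slide's m/(4n); the function name and the toy formula are hypothetical.

```python
def find_core(variables, formula, assignment, threshold):
    """Peel variables until every survivor supports >= threshold clauses in F[C]."""
    core = set(variables)
    while True:
        support = {v: 0 for v in core}
        for clause in formula:
            # Only clauses entirely inside the current set belong to F[C].
            if all(abs(l) in core for l in clause):
                true_lits = [l for l in clause if assignment[abs(l)] == (l > 0)]
                if len(true_lits) == 1:       # exactly one true literal: it supports the clause
                    support[abs(true_lits[0])] += 1
        weak = {v for v in core if support[v] < threshold}
        if not weak:
            return core
        core -= weak

# Toy instance: all variables True; +i is x_i, -i is NOT x_i.
assignment = {v: True for v in range(1, 6)}
formula = [(1, -2, -3), (2, -1, -3), (3, -1, -2), (4, -5, -1), (5, 4, -1)]
core = find_core(range(1, 6), formula, assignment, threshold=1)
```

In the toy instance x5 supports nothing and is peeled first; that removes the only clause x4 supported, so x4 is peeled next, leaving {x1, x2, x3}. The cascade is the reason the definition measures support inside F[C] rather than in F.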
Moving to the Uniform Case
A: a “bad” structural property (in our case: “no big core”)
$\mu$: expected number of satisfying assignments of uniform 3CNF
Lemma: $\Pr_{\text{uniform}}[A] < \mu\cdot\Pr_{\text{planted}}[A]$
Claim: $\Pr_{\text{uniform}}[\text{no big core}] < \mu\cdot\Pr_{\text{planted}}[\text{no big core}] < \mu\cdot e^{-\alpha n}$
Claim: $\mu<e^{\beta n}$, $\beta<\alpha$
Corollary: $\Pr_{\text{uniform}}[\text{no big core}] \le e^{(\beta-\alpha)n} = o(1)$
The Average-Case Complexity of k-SAT
[Figure: the m/n axis for k-SAT, marking $2^k/k$ (Unit Clause [CF86]), the threshold near $2^k\ln 2$, and $100\cdot 2^k\ln 2$ ([KCOV’07], the conditioned uniform distribution)]
Planted k-SAT – closer to the threshold
Joint work with U. Feige and A. Flaxman
What happens when m/n=O(1), not necessarily a large constant?
Let F be a random planted k-CNF with m clauses
Set $f_x$ = # sat. assignments at distance xn from the planted assignment
Set $p_x$ = the probability that an assignment $\psi$ at distance xn satisfies F
$E[f_x]=\binom{n}{xn}p_x$
$p_x$ depends only on x
$p_x$ decreases with x; the binomial coefficient is maximized at xn=n/2
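The exponent $\ln E[f_x]/n$ can be approximated numerically. The Python sketch below makes simplifying assumptions that are not from the talk: clauses are drawn independently, and a clause misses all xn flipped variables with probability about $(1-x)^k$. Under these assumptions a planted-satisfied clause is falsified by an assignment at distance xn with probability $q=(1-(1-x)^k)/(2^k-1)$, so $p_x\approx(1-q)^{cn}$, while $\frac{1}{n}\ln\binom{n}{xn}$ tends to the binary entropy of x. The function names are hypothetical.

```python
import math

def entropy(x):
    """Natural-log binary entropy; (1/n) ln C(n, xn) tends to this value."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def first_moment_rate(x, k, c):
    """Approximate ln E[f_x] / n for planted k-SAT with m = c*n clauses."""
    # Probability that a planted-satisfied clause is falsified at distance xn.
    q = (1 - (1 - x) ** k) / (2 ** k - 1)
    # log1p keeps precision, since q is tiny for large k.
    return entropy(x) + c * math.log1p(-q)

k = 35
c = 1.05 * 2 ** k * math.log(2)       # the density used in the slide's first plot
curve = [(i / 100, first_moment_rate(i / 100, k, c)) for i in range(1, 100)]
```

A negative rate at distance x means the expected number of satisfying assignments at that distance is exponentially small, which is the first-moment evidence for a single cluster around the planted assignment.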
[Plots: $\ln f_x/n$ and $\ln E[f_x]/n$ as functions of x, for k=35, $c=1.05\cdot 2^k\ln 2$]
Single cluster regime
This is actually true for $c=2^k\ln 2+k$
[Plot: $\ln E[f_x]/n$ as a function of x, for k=35, $c=(1+0.0000…)2^k\ln 2$]
This may imply that there is more than one cluster; to verify, one can use the second moment method (similar things were done by [AR06, MMZ05])
We show a regime with the same plot in which there is only one cluster (counting minimal satisfying assignments)
The Uniform Distribution
Define $D_x$ = # of pairs of sat. assignments at distance xn
A similar phenomenon occurs with $D_x$ (single cluster) near the threshold
Need to estimate events of the form $\Pr[A_i \text{ and } A_j \text{ satisfy } F]$
$A_i$ and $A_j$ are assignments at distance xn
It is not even clear how to estimate $\Pr[A_i \text{ satisfies } F]$
This is easy in the non-conditioned uniform distribution: $(1-2^{-k})^m$
Future Challenges
The algorithmic understanding of sparse instances is lacking
m/n=O(1), not necessarily large enough constant
Experimental results for algorithms that work for planted 3SAT
The geometry of the solution space is simple – adjust current algorithms
Thanks !
The Uniform Distribution – “Online” Version
Joint work with M. Krivelevich, B. Sudakov
Randomly permute all $2^k\binom{n}{k}$ possible clauses
Start with $F=\emptyset$; for each clause $C_i$ in turn, set $F=F\cup C_i$ if F remains satisfiable
Similar models were studied for graph problems
[ESW92] for random triangle-free graphs
Easy fact: at the end of the process F has only one satisfying assignment and contains $(2^k-1)\binom{n}{k}$ clauses
What happens when the process is stopped after m iterations?
We describe a deterministic polynomial time algorithm that finds whp a satisfying assignment for such k-CNF formulas with m/n>c, c(k) a sufficiently large constant
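The online process and the “easy fact” can be simulated for very small n. The Python sketch below decides satisfiability by brute force, tracking every assignment that still satisfies F, so it is only a toy illustration. The function names and the `stop_after` parameter (stopping after m iterations) are assumptions introduced here.

```python
import itertools
import random

def all_clauses(n, k):
    """All 2^k * C(n, k) clauses over variables 1..n; literal -v is the negation of v."""
    for variables in itertools.combinations(range(1, n + 1), k):
        for signs in itertools.product((1, -1), repeat=k):
            yield tuple(s * v for s, v in zip(signs, variables))

def satisfies(assignment, clause):
    # assignment is a tuple of booleans; variable v is stored at index v - 1
    return any(assignment[abs(l) - 1] == (l > 0) for l in clause)

def online_process(n, k, rng, stop_after=None):
    clauses = list(all_clauses(n, k))
    rng.shuffle(clauses)                      # random permutation of all clauses
    # Brute force: every assignment that still satisfies F (viable only for tiny n).
    models = list(itertools.product((False, True), repeat=n))
    formula = []
    for clause in clauses[:stop_after]:
        remaining = [a for a in models if satisfies(a, clause)]
        if remaining:                         # F stays satisfiable: accept the clause
            formula.append(clause)
            models = remaining
    return formula, models

formula, models = online_process(n=5, k=3, rng=random.Random(0))
```

With n=5 and k=3 the full run ends with a single surviving assignment and exactly the 7·10=70 clauses it satisfies, whatever the permutation. Passing `stop_after=m` gives the stopped process asked about on the slide.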
SAT and Message Passing
Graphical Models for SAT and Warning Propagation:
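[Figure: bipartite graph with variable nodes $x_1,\dots,x_i,\dots,x_n$ on one side and clause nodes $C_1,\dots,C_j,\dots,C_m$ on the other; an edge joins $x_i$ and $C_j$ when $x_i$ appears in $C_j$]
$x_i$ tells $C_j$ its current preferred assignment (-1,0,1)
$C_j$ sends $x_i$ a warning if all other literals indicate that they falsify $C_j$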
Message passing algorithms are widely used in many areas of CS (also practically): AI, Coding Theory, CSP
Warning Propagation is the “primal ancestor” of the Belief Propagation algorithm [Pearl88] and of Survey Propagation [MMZ05]
Survey Propagation seems powerful in solving “hard” 3SAT instances (where other methods fail) [MMZ05]
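The two message rules for Warning Propagation can be turned into a short program. The Python sketch below is a minimal synchronous version under stated assumptions: warnings start at random values, each variable occurs at most once per clause, and a variable whose warnings cancel out is left unassigned (None). The function name, the `max_rounds` cap and the tiny test formula are hypothetical and are not the instances analysed in the talk.

```python
import random

def warning_propagation(n, formula, rng, max_rounds=100):
    """Run WP on a CNF (+v is x_v, -v its negation); return {v: True/False/None}."""
    # warn[(j, v)] = 1 if clause j currently warns variable v (random start).
    warn = {(j, abs(l)): rng.randint(0, 1) for j, c in enumerate(formula) for l in c}
    sign = {(j, abs(l)): (1 if l > 0 else -1) for j, c in enumerate(formula) for l in c}
    occurs = {v: [] for v in range(1, n + 1)}
    for (j, v) in warn:
        occurs[v].append(j)

    def preference(v, excluded=None):
        # -1, 0 or 1: the value v currently prefers, ignoring clause `excluded`.
        total = sum(sign[(i, v)] * warn[(i, v)] for i in occurs[v] if i != excluded)
        return (total > 0) - (total < 0)

    for _ in range(max_rounds):
        new_warn = {}
        for j, clause in enumerate(formula):
            for lit in clause:
                v = abs(lit)
                others = [abs(l) for l in clause if abs(l) != v]
                # Clause j warns v if every other literal reports that it falsifies j.
                new_warn[(j, v)] = int(all(
                    preference(u, excluded=j) == -sign[(j, u)] for u in others))
        if new_warn == warn:                  # fixed point reached
            break
        warn = new_warn

    return {v: (None if p == 0 else p > 0)
            for v, p in ((v, preference(v)) for v in range(1, n + 1))}

# Implication chain: x1, (NOT x1 OR x2), (NOT x2 OR NOT x3).
result = warning_propagation(3, [(1,), (-1, 2), (-2, -3)], random.Random(0))
```

On this chain the unit clause always warns x1, the warning then propagates to x2 and x3, and the run reaches the same fixed point from any starting warnings. On general formulas convergence is not guaranteed, which is why the sketch caps the number of rounds.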
SAT and Message Passing
Joint work with U. Feige and E. Mossel
Warning Propagation solves whp (planted/uniform) 3CNF formulas with m/n>c, c some sufficiently large constant, in O(log n) iterations
Reinforces the following “folklore” view:
When clustering is complicated $\Rightarrow$ formulas are hard $\Rightarrow$ sophisticated algorithms needed: Survey Propagation
When clustering is simple $\Rightarrow$ formulas are easy $\Rightarrow$ naïve algorithms work: Warning Propagation