Sampling, Counting, and Probabilistic Inference. Wei Wei, joint work with Bart Selman.

Date posted: 21-Dec-2015


Sampling, Counting, and Probabilistic Inference

Wei Wei, joint work with Bart Selman


The problem: counting solutions

(¬a ∨ b ∨ c) ∧ (¬a ∨ ¬b) ∧ (¬b ∨ ¬c) ∧ (c ∨ d)
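As a small illustration of the counting problem, the sketch below enumerates all assignments by brute force. It assumes the slide's clauses read (¬a ∨ b ∨ c), (¬a ∨ ¬b), (¬b ∨ ¬c), (c ∨ d); the integer encoding of literals and the helper names are illustrative choices, not part of the talk.

```python
from itertools import product

# Assumed reading of the slide's formula, with a, b, c, d numbered 1..4.
# A positive integer is a variable, a negative integer its negation.
CLAUSES = [[-1, 2, 3], [-1, -2], [-2, -3], [3, 4]]

def satisfies(assignment, clauses):
    """assignment[i] is the truth value of variable i + 1."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def count_models(clauses, num_vars):
    """Count satisfying assignments by checking all 2**num_vars candidates."""
    return sum(
        satisfies(bits, clauses)
        for bits in product([False, True], repeat=num_vars)
    )

print(count_models(CLAUSES, 4))  # 6
```

Brute force is only feasible for a handful of variables, which is why the rest of the talk looks for better methods.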


Motivation

Consider standard logical inference for a knowledge base Δ and a query α:

Δ ⊨ α iff (Δ ∧ ¬α) is unsat

There does not exist a model of Δ in which ¬α is true.

In all models of Δ, the query α holds.

α holds with absolute certainty.


Degree of belief

Natural generalization: the degree of belief in a query α, given a knowledge base Δ, is defined as P(α | Δ) (Roth, 1996)

In the absence of statistical information, the degree of belief can be calculated as

M(Δ ∧ α) / M(Δ), where M(·) is the number of models
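The ratio of model counts can be illustrated on a toy case. This is a minimal sketch that counts models by enumeration; the knowledge base (x1 ∨ x2), the query x1, and the function names are hypothetical examples, not taken from the talk.

```python
from fractions import Fraction
from itertools import product

def holds(assignment, clauses):
    """True if every clause has a literal made true by `assignment`."""
    return all(
        any(assignment[abs(l) - 1] == (l > 0) for l in c) for c in clauses
    )

def models(clauses, num_vars):
    """All satisfying assignments, found by enumeration."""
    return [
        bits
        for bits in product([False, True], repeat=num_vars)
        if holds(bits, clauses)
    ]

def degree_of_belief(kb, query, num_vars):
    """M(kb AND query) / M(kb), with the query given as extra clauses."""
    kb_models = models(kb, num_vars)
    if not kb_models:
        raise ValueError("knowledge base is unsatisfiable")
    both = [m for m in kb_models if holds(m, query)]
    return Fraction(len(both), len(kb_models))

# Hypothetical knowledge base (x1 or x2) and query x1.
print(degree_of_belief([[1, 2]], [[1]], 2))  # 2/3
```

When the knowledge base entails the query, the ratio is exactly 1, which recovers the certainty of standard logical inference.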


Bayesian Nets to Weighted Counting (Sang, Beame, and Kautz, 2004)

Introduce new vars so all internal vars are deterministic

Network: A → B

Pr(A) = .1

Pr(B | A) = .2, Pr(B | ~A) = .6

Query: Pr(A ∧ B)

= Pr(A) * Pr(B|A)

= .1 * .2 = .02
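The query can be reproduced as a weighted count. The sketch below is one way to make B deterministic by adding two chance variables; the names `p` and `q`, the weight table, and the enumeration are assumptions made for illustration and are not the encoding of Sang, Beame, and Kautz.

```python
from itertools import product

# Weight of each (variable, value) pair. A is a root node; p and q are
# hypothetical chance variables standing for "B given A" and "B given not A".
WEIGHTS = {
    "A": {True: 0.1, False: 0.9},
    "p": {True: 0.2, False: 0.8},
    "q": {True: 0.6, False: 0.4},
}

def weighted_count(evidence):
    """Sum the weights of all worlds that agree with `evidence`."""
    total = 0.0
    for a, p, q in product([False, True], repeat=3):
        b = p if a else q  # B is a deterministic function of A, p, q
        world = {"A": a, "B": b, "p": p, "q": q}
        if all(world[name] == value for name, value in evidence.items()):
            total += WEIGHTS["A"][a] * WEIGHTS["p"][p] * WEIGHTS["q"][q]
    return total

print(weighted_count({"A": True, "B": True}))  # approximately 0.02
```

With empty evidence the weights sum to 1, so the weighted count of a query is directly its probability.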


Complexity

SAT is NP-complete. 2-SAT is solvable in linear time.

Counting assignments (even for 2-CNF, Horn logic, etc.) is #P-complete, and is NP-hard to approximate to within a factor of 2^(n^(1-ε)) (Valiant 1979, Roth 1996).

Approximate counting and sampling are equivalent if the problem is “downward self-reducible”.


(Roth, 1996)


Existing method: DPLL (Davis, Logemann and Loveland, 1962)

(x1 x2 x3) (x1 x2 x3) (x1 x2)

DPLL was first proposed as a basic depth-first tree search.

[Figure: DPLL search tree branching on x1 and then x2 with T/F edges; one leaf is marked "null", another "solution".]
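A minimal sketch of DPLL as a depth-first search with unit propagation follows. The example formula is hypothetical (the signs of the slide's literals are not recoverable), and modern solvers add many refinements that are omitted here.

```python
def simplify(clauses, literal):
    """Make `literal` true: drop satisfied clauses, shorten the rest."""
    return [
        [l for l in c if l != -literal] for c in clauses if literal not in c
    ]

def dpll(clauses, chosen=()):
    """Return a tuple of literals that satisfies `clauses`, or None."""
    if not clauses:
        return chosen  # every clause is satisfied
    if any(len(c) == 0 for c in clauses):
        return None  # an empty clause is a dead end ("null" leaf)
    # Unit propagation: a one-literal clause forces its literal.
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    if unit is not None:
        candidates = [unit]
    else:
        candidates = [clauses[0][0], -clauses[0][0]]  # branch both ways
    for literal in candidates:
        result = dpll(simplify(clauses, literal), chosen + (literal,))
        if result is not None:
            return result
    return None

# Hypothetical formula: (x1 v x2 v x3) (~x1 v ~x2 v x3) (~x1 v x2)
formula = [[1, 2, 3], [-1, -2, 3], [-1, 2]]
print(dpll(formula))  # (1, 2, 3)
```

Variables that never appear in the returned tuple are unconstrained and can take either value.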


Existing Methods for Counting

CDP (Birnbaum and Lozinskii, 1999)

Relsat (Bayardo and Pehoushek, 2000)
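Exact counters of this kind extend DPLL from "find one model" to "count all models". The sketch below shows the basic idea only and is not the CDP or Relsat code: when a branch satisfies every clause with k variables still unassigned, it contributes 2^k models. The function names are illustrative.

```python
def simplify(clauses, literal):
    """Make `literal` true: drop satisfied clauses, shorten the rest."""
    return [
        [l for l in c if l != -literal] for c in clauses if literal not in c
    ]

def count_models(clauses, num_free):
    """Number of models over `num_free` still-unassigned variables."""
    if any(len(c) == 0 for c in clauses):
        return 0  # conflict: no model on this branch
    if not clauses:
        return 2 ** num_free  # remaining variables are unconstrained
    var = abs(clauses[0][0])
    return (
        count_models(simplify(clauses, var), num_free - 1)
        + count_models(simplify(clauses, -var), num_free - 1)
    )

print(count_models([[1, 2]], 2))  # 3
```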


Existing Methods

Cachet (Sang, Beame, and Kautz, 2004)

1. Component caching

2. Clause learning
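The idea behind component caching can be sketched as follows: clause sets that share no variables are counted independently, their counts are multiplied, and each component's count is memoized. This is an illustrative sketch only, not Cachet's implementation; clause learning is omitted and all names are assumptions.

```python
from functools import lru_cache

def simplify(clauses, literal):
    """Make `literal` true: drop satisfied clauses, shorten the rest."""
    return tuple(
        tuple(l for l in c if l != -literal)
        for c in clauses
        if literal not in c
    )

def split_components(clauses):
    """Group clauses that share variables, directly or transitively."""
    components = []  # list of (clauses, variables) pairs, pairwise disjoint
    for clause in clauses:
        merged_clauses, merged_vars = [clause], {abs(l) for l in clause}
        rest = []
        for comp_clauses, comp_vars in components:
            if comp_vars & merged_vars:
                merged_clauses += comp_clauses
                merged_vars |= comp_vars
            else:
                rest.append((comp_clauses, comp_vars))
        components = rest + [(merged_clauses, merged_vars)]
    # Sort so that equal components get the same cache key.
    return [tuple(sorted(cs)) for cs, _ in components]

@lru_cache(maxsize=None)
def count_component(clauses):
    """Models of one component, over exactly the variables it mentions."""
    variables = {abs(l) for c in clauses for l in c}
    var = abs(clauses[0][0])
    return sum(
        count(simplify(clauses, literal), len(variables) - 1)
        for literal in (var, -var)
    )

def count(clauses, num_vars):
    """Models of `clauses` over `num_vars` variables."""
    if any(len(c) == 0 for c in clauses):
        return 0
    mentioned = {abs(l) for c in clauses for l in c}
    result = 2 ** (num_vars - len(mentioned))  # unmentioned vars are free
    for component in split_components(clauses):
        result *= count_component(component)
    return result

print(count(((1, 2), (3, 4)), 4))  # 9: two independent components, 3 * 3
```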


Conflict Graph

Known clauses: (p q a) (a b t) (t x1) (t x2) (t x3) (x1 x2 x3 y) (x2 y)

Current decisions: p = false, q = false, b = true

Decision scheme: (p q b)

1-UIP scheme: (t)

[Figure: conflict graph over the nodes p, q, b, a, t, x1, x2, x3, and y, ending in a conflict on y.]


Existing Methods

Pro: get exact count

Cons: 1. Cannot predict execution time

2. Cannot halt execution to get an approximation

3. Cannot handle large formulas


Our proposal: counting by sampling

The algorithm works as follows (Jerrum and Valiant, 1986):

1. Draw K samples from the solution space.
2. Pick a variable X in the current formula.
3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
5. The number of solutions of the original formula is the product of all multipliers.
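The five steps can be sketched in a few lines. To keep the example self-contained, the sampler here is an exact uniform sampler that enumerates all models, which is an assumption made purely for illustration; in practice that role is played by a SAT-based sampler. Function and parameter names are hypothetical.

```python
import random
from itertools import product

def models(clauses, variables, fixed):
    """All models that extend the partial assignment `fixed` (brute force)."""
    free = [v for v in variables if v not in fixed]
    found = []
    for bits in product([False, True], repeat=len(free)):
        world = {**fixed, **dict(zip(free, bits))}
        if all(any(world[abs(l)] == (l > 0) for l in c) for c in clauses):
            found.append(world)
    return found

def approx_count(clauses, variables, k, rng):
    """Estimate the model count as a product of per-variable multipliers."""
    fixed, estimate = {}, 1.0
    for var in variables:                                # step 2 (fixed order)
        pool = models(clauses, variables, fixed)
        if not pool:
            return 0.0                                   # unsatisfiable
        samples = [rng.choice(pool) for _ in range(k)]   # step 1
        true_hits = sum(s[var] for s in samples)
        value = 2 * true_hits >= k                       # most sampled value
        hits = true_hits if value else k - true_hits
        estimate *= k / hits                             # step 3: in [1, 2]
        fixed[var] = value                               # step 4
    return estimate                                      # step 5

clauses = [[-1, 2, 3], [-1, -2], [-2, -3], [3, 4]]       # assumed example
print(approx_count(clauses, [1, 2, 3, 4], 1000, random.Random(0)))
```

Because the most sampled value always appears in at least half of the samples, each multiplier lies between 1 and 2, so the estimate for n variables is between 1 and 2^n.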


[Figure: the space of assignments split into the halves X1=T and X1=F, with the models marked inside.]


Research issues

How well can we estimate each multiplier?

We'll see that sampling works quite well.

How do errors accumulate? (Note that a formula can have hundreds of variables, so this could potentially be very bad.)

Surprisingly, we will see that errors often cancel each other out.


Standard Methods for Sampling - MCMC

Based on setting up a Markov chain with a predefined stationary distribution.

Draw samples from the stationary distribution by running the Markov chain for sufficiently long.

Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.


Simulated Annealing

Simulated annealing uses the Boltzmann distribution as the stationary distribution.

At low temperature, the distribution concentrates around minimum-energy states.

For the satisfiability problem, each satisfying assignment (with 0 cost) gets the same probability.

Again, reaching such a stationary distribution takes exponential time for interesting problems (shown in a later slide).
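A single fixed-temperature move can be sketched with the Metropolis rule, taking the number of violated clauses as the energy. This is an assumed, simplified formulation for illustration; the function names and the single-flip proposal are not taken from the talk.

```python
import math
import random

def cost(assignment, clauses):
    """Number of clauses violated by `assignment` (a list of booleans)."""
    return sum(
        not any(assignment[abs(l) - 1] == (l > 0) for l in c)
        for c in clauses
    )

def sa_step(assignment, clauses, temperature, rng):
    """One Metropolis move: propose flipping one randomly chosen variable."""
    i = rng.randrange(len(assignment))
    proposal = assignment[:i] + [not assignment[i]] + assignment[i + 1:]
    delta = cost(proposal, clauses) - cost(assignment, clauses)
    # Downhill and sideways moves are always taken; uphill moves are taken
    # with probability exp(-delta / temperature), so temperature must be > 0.
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return proposal
    return assignment

rng = random.Random(0)
state = [False, False, False, False]
for _ in range(1000):
    state = sa_step(state, [[-1, 2, 3], [-1, -2], [3, 4]], 0.5, rng)
print(state, cost(state, [[-1, 2, 3], [-1, -2], [3, 4]]))
```

Since the proposal is symmetric, this chain has the Boltzmann distribution over costs as its stationary distribution; how long it takes to get there is the issue raised on the slide.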


Question: Can state-of-the-art local search procedures be used for SAT sampling (as alternatives to standard Markov chain Monte Carlo)?

Yes! Shown in this talk


Our approach: biased random walk

Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al, 1994), effective on SAT.

Can we use it to sample from solution space?

– Does WalkSat reach all solutions?

– How uniform is the sampling?
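For reference, a biased random walk of this kind can be sketched as below. It is a simplified WalkSat-style loop, not the original solver: the greedy step minimizes the total number of violated clauses after the flip, and the `noise` parameter and function names are illustrative assumptions. Clauses are assumed to be non-empty.

```python
import random

def violated(assignment, clauses):
    """Clauses not satisfied by `assignment` (a list of booleans)."""
    return [
        c for c in clauses
        if not any(assignment[abs(l) - 1] == (l > 0) for l in c)
    ]

def cost_after_flip(assignment, clauses, var):
    """Number of violated clauses if variable `var` (1-based) is flipped."""
    flipped = list(assignment)
    flipped[var - 1] = not flipped[var - 1]
    return len(violated(flipped, clauses))

def walksat(clauses, num_vars, max_flips, noise, rng):
    """Return a satisfying assignment, or None if none was found in time."""
    assignment = [rng.random() < 0.5 for _ in range(num_vars)]
    for _ in range(max_flips):
        unsat = violated(assignment, clauses)
        if not unsat:
            return assignment
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))        # pure random walk move
        else:                                    # greedy move
            var = min(
                (abs(l) for l in clause),
                key=lambda v: cost_after_flip(assignment, clauses, v),
            )
        assignment[var - 1] = not assignment[var - 1]
    return assignment if not violated(assignment, clauses) else None

clauses = [[-1, 2, 3], [-1, -2], [-2, -3], [3, 4]]  # assumed example
print(walksat(clauses, 4, 1000, 0.5, random.Random(0)))
```

Running the loop many times from fresh random starts and recording which solution each run ends in is one way to measure how uniformly the solutions are visited.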


WalkSat (50,000,000 runs in total)

[Figure: solutions plotted against Hamming distance. Annotations: one solution visited 500,000 times, another visited 60 times.]


Probability Ranges in Different Domains

Instance            Runs       Hits Rarest  Hits Common  Common-to-Rare Ratio
Random              50 × 10^6  53           9 × 10^5     1.7 × 10^4
Logistics planning  1 × 10^6   84           4 × 10^3     50
Verif.              1 × 10^6   45           318          7


Improving the Uniformity of Sampling

SampleSat:
– With probability p, the algorithm makes a biased random walk move
– With probability 1-p, the algorithm makes an SA (simulated annealing) move

WalkSat + SA = SampleSat

WalkSat: nonergodic; quickly reaches sinks

SA: ergodic; slow convergence

SampleSat: ergodic; does not satisfy DBC (the detailed balance condition)
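One step of such a mixed strategy might look like the sketch below. It is a simplified illustration under stated assumptions, not the authors' implementation: the random-walk move flips a random variable from a random violated clause (without the greedy bias), and the SA move uses the Metropolis rule on the number of violated clauses. All names are hypothetical.

```python
import math
import random

def violated(assignment, clauses):
    """Clauses not satisfied by `assignment` (a list of booleans)."""
    return [
        c for c in clauses
        if not any(assignment[abs(l) - 1] == (l > 0) for l in c)
    ]

def flip(assignment, var):
    """Copy of `assignment` with variable `var` (1-based) flipped."""
    flipped = list(assignment)
    flipped[var - 1] = not flipped[var - 1]
    return flipped

def samplesat_step(assignment, clauses, p, temperature, rng):
    """With probability p take a random-walk move, otherwise an SA move."""
    unsat = violated(assignment, clauses)
    if unsat and rng.random() < p:
        # Random-walk move (simplified): flip a variable of a violated clause.
        return flip(assignment, abs(rng.choice(rng.choice(unsat))))
    # SA move: Metropolis acceptance on the number of violated clauses.
    proposal = flip(assignment, rng.randrange(len(assignment)) + 1)
    delta = len(violated(proposal, clauses)) - len(unsat)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return proposal
    return assignment

rng = random.Random(0)
clauses = [[-1, 2, 3], [-1, -2], [-2, -3], [3, 4]]  # assumed example
state = [False] * 4
for _ in range(200):
    state = samplesat_step(state, clauses, 0.5, 0.1, rng)
print(state, len(violated(state, clauses)))
```

When every clause is satisfied there is no violated clause to pick from, so the sketch falls through to the SA move.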


Comparison Between WalkSat and SampleSat

[Figure: side-by-side plots for WalkSat and SampleSat, labeled 10^4 (WalkSat) and 10 (SampleSat).]


WalkSat (50,000,000 runs in total)

[Figure: solutions plotted against Hamming distance.]


SampleSat

[Figure: solution clusters plotted against Hamming distance, annotated as follows.]

174 sols, r = 11, total hits = 5.3m, average hits = 30.1k
704 sols, r = 14, total hits = 11.1m, average hits = 15.8k
39 sols, r = 7, total hits = 5.1m, average hits = 131k
212 sols, r = 11, total hits = 2.9m, average hits = 13.4k
192 sols, r = 11, total hits = 5.7m, average hits = 29.7k
24 sols, r = 5, total hits = 0.6m, average hits = 25k
1186 sols, r = 14, total hits = 17.3m, average hits = 14.6k


Instance            Runs       Hits Rarest  Hits Common  Common-to-Rare Ratio (WalkSat)  Ratio (SampleSat)
Random              50 × 10^6  53           9 × 10^5     1.7 × 10^4                      10
Logistics planning  1 × 10^6   84           4 × 10^3     50                              17
Verif.              1 × 10^6   45           318          7                               4


Analysis

c1  c2  c3  …  cn  a  b
F   F   F   …  F   F  F
F   F   F   …  F   F  T


Property of F*

Proposition 1: SA with fixed temperature takes exponential time to find a solution of F*.

This shows that even for some simple formulas in 2-CNF, SA cannot reach a solution in polynomial time.


Analysis, cont.

c1  c2  c3  …  cn  a
T   T   T   …  T   T
F   F   F   …  F   T
F   F   F   …  F   F

Proposition 2: pure RW reaches this solution with exponentially small probability.


SampleSat

In the SampleSat algorithm, we can divide the search into 2 stages. Before SampleSat reaches its first solution, it behaves like WalkSat.

instance      WalkSat     SampleSat    SA
random        382         677          24667
logistics     5.7 × 10^4  15.5 × 10^5  > 10^9
verification  36          65           10821


SampleSat, cont.

After reaching a solution, the random walk component is turned off because all clauses are satisfied. SampleSat then behaves like SA.

Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly.

This 2-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.


Back to Counting: ApproxCount

The algorithm works as follows (Jerrum and Valiant, 1986):

1. Draw K samples from the solution space.
2. Pick a variable X in the current formula.
3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
5. The number of solutions of the original formula is the product of all multipliers.


Random 3-SAT, 75 Variables (Sang, Beame, and Kautz, 2004)

[Figure: curves for CDP, Relsat, and Cachet, with the sat/unsat threshold marked.]


Within the Capacity of Exact Counters

We compare the results of ApproxCount with those of the exact counters.

instance          #variables  Exact count  ApproxCount  Average error per step
prob004-log-a     1790        2.6 × 10^16  1.4 × 10^16  0.03%
wff.3.200.810     200         3.6 × 10^12  3.0 × 10^12  0.09%
dp02s02.shuffled  319         1.5 × 10^25  1.2 × 10^25  0.07%
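The last column is consistent with one plausible reading of "average error per step": the n-th root of the ratio between the two counts, minus 1, where n is the number of variables. That definition is an assumption (the slide does not state it), and the helper name below is hypothetical.

```python
def error_per_step(exact, approx, num_vars):
    """Geometric-mean per-variable error, assuming one multiplier per variable."""
    ratio = max(exact, approx) / min(exact, approx)
    return ratio ** (1 / num_vars) - 1

rows = [
    ("prob004-log-a", 1790, 2.6e16, 1.4e16),
    ("wff.3.200.810", 200, 3.6e12, 3.0e12),
    ("dp02s02.shuffled", 319, 1.5e25, 1.2e25),
]
for name, n, exact, approx in rows:
    print(f"{name}: {100 * error_per_step(exact, approx, n):.2f}%")
```

Under this reading, a tiny per-variable error compounds over hundreds or thousands of variables into the factor-of-two gap visible between the exact and approximate counts.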


And beyond …

We developed a family of formulas whose solutions are hard to count.
– The formulas are based on SAT encodings of the following combinatorial problem.
– Given n different items, choose from them a list (order matters) of m items (m ≤ n). Let P(n,m) be the number of different lists that can be constructed. P(n,m) = n!/(n-m)!
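The closed form for P(n,m) gives a known exact count to compare against. A short check of the formula against direct enumeration (the function name is illustrative):

```python
from itertools import permutations
from math import factorial

def num_lists(n, m):
    """P(n, m) = n! / (n - m)!, the number of ordered lists of m of n items."""
    return factorial(n) // factorial(n - m)

# Cross-check the closed form against explicit enumeration for small n.
for n in range(6):
    for m in range(n + 1):
        assert num_lists(n, m) == len(list(permutations(range(n), m)))

print(num_lists(5, 2))  # 20
```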


Conclusion and Future Work

The results show a good opportunity to extend SAT solvers to develop algorithms for sampling and counting tasks.

Next step: Use our methods in probabilistic reasoning and Bayesian inference domains.


The end.
