Lower Bounds for Exact Model Counting and Applications in Probabilistic Databases


Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu

University of Washington


Model Counting

• Model Counting Problem: Given a Boolean formula F, compute #F = #Models (satisfying assignments) of F

e.g. F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z): count the assignments to x, y, u, w, z that make F true

• Probability Computation Problem: Given F and independent Pr(x), Pr(y), Pr(z), …, compute Pr(F)
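As a concrete illustration, both problems can be solved by brute-force enumeration. This is only a toy baseline (it enumerates all 2^n assignments), not one of the practical counters discussed later; the negated literal in the third clause of F is reconstructed from the probabilities shown on later slides.

```python
from itertools import product

VARS = ['x', 'y', 'u', 'w', 'z']

def F(a):
    # F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)
    return ((a['x'] or a['y'])
            and (a['x'] or a['u'] or a['w'])
            and (not a['x'] or a['u'] or a['w'] or a['z']))

def model_count(f, names):
    """#F = number of satisfying assignments."""
    return sum(f(dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))

def probability(f, names, pr):
    """Pr(F) under independent Pr(v) for each variable v."""
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        if f(a):
            w = 1.0
            for v in names:
                w *= pr[v] if a[v] else 1.0 - pr[v]
            total += w
    return total
```

With uniform Pr(v) = ½, Pr(F) = #F / 2^n, which matches the 5/8 computed on the DPLL slides below.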


Model Counting

• #P-hard

▫ Even for formulas where satisfiability is easy to check

• Applications in probabilistic inference ▫ e.g. Bayesian net learning

• There are many practical model counters that can compute both #F and Pr(F)


Exact Model Counters

Search-based / DPLL-based (explore the assignment space and count the satisfying assignments):
• CDP [Birnbaum et al. '99]
• Relsat [Bayardo Jr. et al. '97, '00]
• Cachet [Sang et al. '05]
• SharpSAT [Thurley '06]

Knowledge compilation-based (compile F into a "computation-friendly" form):
• c2d [Darwiche '04]
• Dsharp [Muise et al. '12]
• …

[Survey by Gomes et al. '09]

Both techniques explicitly or implicitly:
• use DPLL-based algorithms
• produce FBDD or Decision-DNNF compiled forms (output or trace) [Huang-Darwiche '05, '07]


Model Counters Use Extensions to DPLL

• Caching Subformulas
  ▫ Cachet, SharpSAT, c2d, Dsharp

• Component Analysis
  ▫ Relsat, c2d, Cachet, SharpSAT, Dsharp

• Conflict Directed Clause Learning
  ▫ Cachet, SharpSAT, c2d, Dsharp

• DPLL + caching + (clause learning) → FBDD
• DPLL + caching + component analysis + (clause learning) → Decision-DNNF

How much more does component analysis add? I.e., how much more powerful are decision-DNNFs than FBDDs?


Main Result

Theorem:
• A decision-DNNF of size N can be converted into an FBDD of size N^(log N + 1)
• If the formula is a k-DNF, then into an FBDD of size N^k
• The conversion algorithm runs in time linear in the size of its output


Consequence: Running Time Lower Bounds

Model counting algorithm running time ≥ compiled form size

Lower bound on compiled form size ⟹ lower bound on running time

▫ Note: the running time may be much larger than the size
▫ e.g. an unsatisfiable CNF formula has a trivial compiled form


Our quasipolynomial conversion + known exponential lower bounds on FBDDs [Bollig-Wegener '00, Wegener '02]
⟹ exponential lower bounds on decision-DNNF size
⟹ exponential lower bounds on the running time of exact model counters


Outline

• Review of DPLL-based algorithms
  ▫ Extensions (Caching & Component Analysis)
  ▫ Knowledge Compilation (FBDD & Decision-DNNF)

• Our Contributions
  ▫ Decision-DNNF to FBDD conversion
  ▫ Implications of the conversion
  ▫ Applications to Probabilistic Databases

•Conclusions

DPLL Algorithms

Davis, Putnam, Logemann, Loveland [Davis et al. '60, '62]

[Figure: the DPLL search tree for F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z). The root branches on x; the x = 1 subtree evaluates u ∨ w ∨ z (probability 7/8) and the x = 0 subtree evaluates y ∧ (u ∨ w) (probability 3/8), giving Pr(F) = 5/8. A uniform distribution is assumed for simplicity.]

// basic DPLL:
Function Pr(F):
  if F = false then return 0
  if F = true then return 1
  select a variable x, return ½·Pr(F[x=0]) + ½·Pr(F[x=1])
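The basic recursion can be sketched as runnable code. CNF is represented here as a list of clauses, each clause a list of (variable, polarity) literals; this representation is ours, chosen for illustration, and uniform Pr(x) = ½ is assumed as on the slide.

```python
# A minimal sketch of basic DPLL probability computation (no caching,
# no component analysis). Uniform Pr = 1/2 assumed.

def condition(cnf, var, val):
    """Set var := val: drop satisfied clauses, shrink the others."""
    return [[lit for lit in c if lit[0] != var]
            for c in cnf if (var, val) not in c]

def pr(cnf):
    if any(len(c) == 0 for c in cnf):    # an empty clause: F = false
        return 0.0
    if not cnf:                          # no clauses left: F = true
        return 1.0
    x = cnf[0][0][0]                     # select a variable
    return 0.5 * pr(condition(cnf, x, False)) \
         + 0.5 * pr(condition(cnf, x, True))

# F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)
F = [[('x', True), ('y', True)],
     [('x', True), ('u', True), ('w', True)],
     [('x', False), ('u', True), ('w', True), ('z', True)]]
```

Running `pr(F)` reproduces the 5/8 shown at the root of the tree.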


The trace is a Decision-Tree for F


Extensions to DPLL

• Caching Subformulas

• Component Analysis

• Conflict Directed Clause Learning
  ▫ Affects the efficiency of the algorithm, but not the final "form" of the trace

Traces:
• DPLL + caching + (clause learning) → FBDD
• DPLL + caching + component analysis + (clause learning) → Decision-DNNF

Caching



[Figure: the search tree for F again. The subformula u ∨ w arises on two branches (x = 1, z = 0 and x = 0, y = 1), so its probability is computed once, cached, and looked up the second time.]

// DPLL with caching:
// cache F and Pr(F); look the cache up before computing
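Caching can be sketched by memoizing on a canonical form of the residual CNF. This is a simplified illustration, not the actual caching scheme of Cachet or SharpSAT (which hash components together with learned clauses); the CNF representation (clauses as lists of (variable, polarity) literals, uniform Pr = ½) is ours.

```python
# Sketch of DPLL with caching: memoize Pr on a canonical, hashable form
# of the residual CNF, so a subformula reached along several branches
# (like u ∨ w in the running example) is solved only once.

def condition(cnf, var, val):
    """Set var := val: drop satisfied clauses, shrink the others."""
    return [[lit for lit in c if lit[0] != var]
            for c in cnf if (var, val) not in c]

def pr_cached(cnf, cache=None):
    if cache is None:
        cache = {}
    if any(len(c) == 0 for c in cnf):    # empty clause: F = false
        return 0.0
    if not cnf:                          # no clauses left: F = true
        return 1.0
    key = frozenset(frozenset(c) for c in cnf)   # canonical form
    if key not in cache:
        x = cnf[0][0][0]                 # select a variable
        cache[key] = 0.5 * pr_cached(condition(cnf, x, False), cache) \
                   + 0.5 * pr_cached(condition(cnf, x, True), cache)
    return cache[key]

# F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)
F = [[('x', True), ('y', True)],
     [('x', True), ('u', True), ('w', True)],
     [('x', False), ('u', True), ('w', True), ('z', True)]]
```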

Caching & FBDDs


[Figure: with caching, the two occurrences of u ∨ w are merged and the trace becomes a DAG rather than a tree.]

The trace is a decision-DAG for F

FBDD (Free Binary Decision Diagram), also called ROBP (Read-Once Branching Program):

• Every variable is tested at most once on any path
• All internal nodes are decision nodes

Component Analysis


[Figure: under x = 0 the residual formula y ∧ (u ∨ w) splits into the components y and u ∨ w, which share no variables.]


// DPLL with component analysis (and caching):
// if F = G ∧ H where G and H have disjoint sets of variables,
//   Pr(F) = Pr(G) × Pr(H)
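The component rule can be sketched as follows: partition the residual clauses into variable-disjoint groups and multiply their probabilities. Again this is a toy version of what Relsat/Cachet/SharpSAT do (caching is omitted to isolate the rule), with our clause-list representation and uniform Pr = ½.

```python
# Sketch of DPLL with component analysis: if the residual CNF splits
# into clause groups over disjoint variables, Pr factors as a product.

def condition(cnf, var, val):
    return [[lit for lit in c if lit[0] != var]
            for c in cnf if (var, val) not in c]

def components(cnf):
    """Group clauses that are connected through shared variables."""
    vars_of = [set(v for v, _ in c) for c in cnf]
    seen, groups = [False] * len(cnf), []
    for i in range(len(cnf)):
        if seen[i]:
            continue
        seen[i], stack, group, gvars = True, [i], [], set()
        while stack:
            j = stack.pop()
            group.append(cnf[j])
            gvars |= vars_of[j]
            for k in range(len(cnf)):   # pull in transitively connected clauses
                if not seen[k] and vars_of[k] & gvars:
                    seen[k] = True
                    stack.append(k)
        groups.append(group)
    return groups

def pr_comp(cnf):
    if any(len(c) == 0 for c in cnf):
        return 0.0
    if not cnf:
        return 1.0
    groups = components(cnf)
    if len(groups) > 1:                 # F = G ∧ H over disjoint variables
        p = 1.0
        for g in groups:
            p *= pr_comp(g)             # Pr(F) = Pr(G) × Pr(H)
        return p
    x = cnf[0][0][0]
    return 0.5 * pr_comp(condition(cnf, x, False)) \
         + 0.5 * pr_comp(condition(cnf, x, True))

# F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z)
F = [[('x', True), ('y', True)],
     [('x', True), ('u', True), ('w', True)],
     [('x', False), ('u', True), ('w', True), ('z', True)]]
```

On the running example, the x = 0 branch immediately splits into the components y and u ∨ w.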

Components & Decision-DNNF


[Figure: the trace for F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z) with component analysis: the x = 0 branch leads to a decomposable AND-node whose children are the components y and u ∨ w.]

The trace is a Decision-DNNF [Huang-Darwiche ’05, ’07]

FBDD + “Decomposable” AND-nodes

(Two sub-DAGs do not share variables)


How much power do they add?


Main Technical Result

Decision-DNNF → FBDD, by an efficient construction:

• size N → size N^(log N + 1) (quasipolynomial)
• size N → size N^k (polynomial) for k-DNFs, e.g. a 3-DNF: (x ∧ y ∧ z) ∨ (w ∧ y ∧ z)



A Simple Idea

Decision-DNNF → FBDD: we need to convert all AND-nodes into decision nodes while still evaluating the same formula F


[Figure: converting an AND-node with children G and H: run G first, redirecting G's accepting 1-sink into H. G and H do not share variables, so every variable is still tested at most once on any path, and the result is an FBDD.]

But what if sub-DAGs are shared?


[Figure: when a sub-DAG is shared between G and H (reached via the edges g′ and h in the figure), the simple rewiring creates a conflict: a path through the shared part can violate the read-once discipline.]


Obvious Solution: Replicate Nodes

[Figure: replicate the shared sub-DAG so that G and H have private copies; with no sharing there is no conflict, and the simple idea applies.]

But this may require recursive replication, which can cause an exponential blowup!

Main Idea: Replicate Smaller Sub-DAG


[Figure: an AND-node with a smaller sub-DAG and a larger sub-DAG; edges from other nodes in the decision-DNNF may also enter either sub-DAG.]

Each AND-node creates a private copy of its smaller sub-DAG.

Light and Heavy Edges


[Figure: at each AND-node, the edge into the smaller sub-DAG is called a light edge; the edge into the larger sub-DAG is a heavy edge.]

Each AND-node creates a private copy of its smaller sub-DAG:
⟹ recursively, each node u is replicated once per smaller sub-DAG containing it
⟹ #copies of u = #sequences of light edges leading to u

Quasipolynomial Conversion


L = max #light edges on any path

Each light edge enters the smaller half: N = N_small + N_big ≥ 2·N_small, so a path with L light edges requires N ≥ 2^L, i.e. L ≤ log N

#Copies of each node ≤ N^L ≤ N^(log N)

#Nodes in the FBDD ≤ N · N^(log N) = N^(log N + 1)

We also show that our analysis is tight.
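The counting argument above can be written out as a short derivation (with log taken base 2, as in the slide's bound):

```latex
% Each light edge enters the smaller sub-DAG, which at least halves the size:
\[
  N \;=\; N_{\mathrm{small}} + N_{\mathrm{big}} \;\ge\; 2\,N_{\mathrm{small}},
\]
% so a path with L light edges forces 2^L \le N, i.e. L \le \log_2 N. Hence
\[
  \#\text{copies of a node} \;\le\; N^{L} \;\le\; N^{\log_2 N},
  \qquad
  \#\text{nodes in the FBDD} \;\le\; N \cdot N^{\log_2 N} \;=\; N^{\log_2 N + 1}.
\]
```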


Polynomial Conversion for k-DNFs

• L = max #light edges on any path ≤ k − 1
• #Nodes in the FBDD ≤ N · N^L = N^k


Separation Results

[Diagram: FBDD ⊂ Decision-DNNF; Decision-DNNF ⊂ AND-FBDD; Decision-DNNF ⊂ d-DNNF]

• FBDD: Decision-DAG, each variable is tested once along any path

• Decision-DNNF: FBDD + decomposable AND-nodes (disjoint sub-DAGs)

Exponential Separation: there are formulas with poly-size AND-FBDDs or d-DNNFs for which every decision-DNNF has exponential size

• AND-FBDD: FBDD + AND-nodes (not necessarily decomposable) [Wegener’00]

• d-DNNF: Decomposable AND nodes + OR-nodes with sub-DAGs not simultaneously satisfiable [Darwiche ’01, Darwiche-Marquis ’02]


Probabilistic Databases

AsthmaPatient        Friend                Smoker
  Ann (x1)             Ann  Joe  (y1)        Joe (z1)
  Bob (x2)             Ann  Tom  (y2)        Tom (z2)
                       Bob  Tom  (y3)

Boolean query Q: ∃x ∃y AsthmaPatient(x) ∧ Friend(x, y) ∧ Smoker(y)

• Tuples are probabilistic (and independent)
  ▫ "Ann" is present with probability 0.3
• What is the probability that Q is true on D?
  ▫ Assign a unique Boolean variable to each tuple
• Boolean lineage formula F_Q,D = (x1 ∧ y1 ∧ z1) ∨ (x1 ∧ y2 ∧ z2) ∨ (x2 ∧ y3 ∧ z2)
  ▫ Q is true on D ⟺ F_Q,D is true

Tuple probabilities: Pr(x1) = 0.3, Pr(x2) = 0.1, Pr(y1) = 0.9, Pr(y2) = 0.5, Pr(y3) = 0.7, Pr(z1) = 0.5, Pr(z2) = 1.0

Probabilistic Databases

• F_Q,D = (x1 ∧ y1 ∧ z1) ∨ (x1 ∧ y2 ∧ z2) ∨ (x2 ∧ y3 ∧ z2)

• Probability Computation Problem: Compute Pr(F_Q,D) given Pr(x1), Pr(x2), …

• F_Q,D can be written as a k-DNF ▫ for fixed, monotone queries Q

For an important class of queries Q, we get exponential lower bounds on decision-DNNFs and model counting algorithms
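For the example instance, Pr(F_Q,D) can be sketched by enumeration over the independent tuple variables. The probability values below follow the table slide as best it can be recovered from the transcript, so treat the specific numbers as illustrative.

```python
from itertools import product

# Tuple probabilities, as recovered from the example tables (illustrative).
PR = {'x1': 0.3, 'x2': 0.1,             # AsthmaPatient: Ann, Bob
      'y1': 0.9, 'y2': 0.5, 'y3': 0.7,  # Friend: (Ann,Joe), (Ann,Tom), (Bob,Tom)
      'z1': 0.5, 'z2': 1.0}             # Smoker: Joe, Tom

def lineage(a):
    # F_Q,D = (x1 ∧ y1 ∧ z1) ∨ (x1 ∧ y2 ∧ z2) ∨ (x2 ∧ y3 ∧ z2)
    return ((a['x1'] and a['y1'] and a['z1'])
            or (a['x1'] and a['y2'] and a['z2'])
            or (a['x2'] and a['y3'] and a['z2']))

def pr_query():
    """Pr(Q is true on D) = Pr(F_Q,D) by brute-force enumeration."""
    names = list(PR)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        if lineage(a):
            w = 1.0
            for v in names:
                w *= PR[v] if a[v] else 1.0 - PR[v]
            total += w
    return total
```

For a fixed monotone query the lineage is a k-DNF, which is exactly the case where the lower bounds above apply; the brute-force loop here is only for checking small instances.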



Summary

• Quasi-polynomial conversion of any decision-DNNF into an FBDD (polynomial for k-DNF)

• Exponential lower bounds on model counting algorithms
• d-DNNFs and AND-FBDDs are exponentially more powerful than decision-DNNFs

• Applications in probabilistic databases


Open Problems

• A polynomial conversion of decision-DNNFs to FBDDs?

• A more powerful syntactic subclass of d-DNNFs than decision-DNNFs?
  ▫ d-DNNF is a semantic concept
  ▫ There is no efficient algorithm to test whether two sub-DAGs of an OR-node are simultaneously satisfiable

• Approximate model counting?


Thank You

Questions?