Enumeration Algorithms: Introduction and Techniques

Andrea Marino

Dipartimento di Informatica, University of Pisa

Sapporo, Hokkaido University, Japan, June 5th, 2017

The aim of enumeration is to list all the solutions of a problem.

Enumeration algorithms¹ are particularly useful whenever the goal of a problem is not clear and all its solutions need to be checked.

¹ We consider enumeration and listing as synonyms.

Uncertain Structural Analysis and Dynamical Properties
For instance, in Biological Networks

Bias in the model: arc dependencies are neglected, and the underlying hypergraph behaviour is forced into a simple graph to avoid intractability.

Available biological network reconstructions are potential networks, where all the possible connections are indicated.

Structural analysis should select among the structures.
Efficiently list all the structures, then select a posteriori the good ones from a biological point of view.

Definition

Let R ⊆ Σ∗ × Σ∗ be a binary relation. Given a string x ∈ Σ∗, we define Y(x) := {y : (x, y) ∈ R}. The following problems are associated to relation R:

Decision: given x, decide whether Y(x) ≠ ∅.

Search: given x, output some y ∈ Y(x) (if any).

Counting: given x, output |Y(x)|.

Listing: given x, output all strings in Y(x) (if any).

Moreover:

R is polynomially balanced if for any (x, y) ∈ R, |y| = poly(|x|).

R is polynomially decidable if there exists a polynomial-time algorithm that, given x and y, decides whether (x, y) ∈ R.

The class of decision problems associated to polynomially balanced, polynomially decidable relations is exactly NP.
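The four problems in the definition can be sketched on a toy relation. The relation below is an assumption made only for illustration (it is not from the slides): x is a pair (numbers, target) and y is a tuple of indices whose values sum to target. All function names are hypothetical, and Y(x) is generated by brute force.

```python
from itertools import combinations

def in_relation(x, y):
    """Polynomial-time check of (x, y) in R for the toy relation."""
    numbers, target = x
    return sum(numbers[i] for i in y) == target

def Y(x):
    """Generate Y(x) by brute force over all index subsets (exponential time)."""
    numbers, _ = x
    for size in range(len(numbers) + 1):
        for y in combinations(range(len(numbers)), size):
            if in_relation(x, y):
                yield y

def decision(x):
    """Decide whether Y(x) is non-empty."""
    return next(Y(x), None) is not None

def search(x):
    """Return one element of Y(x), or None if there is none."""
    return next(Y(x), None)

def counting(x):
    """Return |Y(x)|."""
    return sum(1 for _ in Y(x))

def listing(x):
    """Return all elements of Y(x)."""
    return list(Y(x))
```

For x = ([1, 2, 3, 4], 5) the listing contains the index tuples (0, 3) and (1, 2), so counting gives 2 and decision gives True.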

Part I

Measuring the Efficiency


Worst case overall time

Bounding the total time with a function depending on the input size.

Usually bounds of the following kind, for some constant c:

O(c^n)

These bounds do not depend on the number of solutions.

In the case of Tomita et al. for maximal clique enumeration, they consider the graph with n nodes with the maximum number of cliques as the worst case.

Often we have fewer solutions than the worst case, and the number of solutions is variable, from exponential to polynomial, e.g. st-paths in a graph vs. in a tree.

This motivates output sensitive enumeration.
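The gap between exponentially many and polynomially many solutions can be seen on st-paths. The sketch below is illustrative only: the two graph families (a chain of k "diamonds" and a simple path, which is a tree) and all function names are assumptions, not taken from the slides.

```python
def count_st_paths(adj, s, t):
    """Count simple s-t paths by plain DFS (exponential time in general)."""
    def dfs(u, visited):
        if u == t:
            return 1
        total = 0
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                total += dfs(v, visited)
                visited.remove(v)
        return total
    return dfs(s, {s})

def diamond_chain(k):
    """k diamonds in series: 3k + 1 vertices, 2**k simple paths from 0 to 3k."""
    adj = {i: [] for i in range(3 * k + 1)}
    for i in range(k):
        left, right = 3 * i, 3 * i + 3
        for mid in (3 * i + 1, 3 * i + 2):
            adj[left].append(mid)
            adj[mid].append(left)
            adj[mid].append(right)
            adj[right].append(mid)
    return adj

def path_graph(n):
    """A path on n vertices (a tree): exactly one simple path from 0 to n - 1."""
    adj = {i: [] for i in range(n)}
    for i in range(n - 1):
        adj[i].append(i + 1)
        adj[i + 1].append(i)
    return adj
```

With 10 vertices in both cases, diamond_chain(3) has 8 st-paths while path_graph(10) has a single one; a bound of the form O(c^n) cannot distinguish the two.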

Total Time

Bounding the total time with a function depending on the input size and the number of solutions.

A listing algorithm runs in polynomial total time if its time complexity is polynomial in the input size and the output size. That is:

O(α^h n^k)

where h and k are constants, α is the number of solutions, and n is the size of the instance.

R.E. Tarjan: Enumeration of the elementary circuits of a directed graph. SIAM Journal on Computing (1973)

Enumerability

Measuring the average cost per solution.

A listing algorithm P-enumerates the solutions of a relation R if the time required to output them is bounded by p(n)α, where p(n) is a polynomial in the input size n and α is the number of solutions to output. That is:

O(α n^k)

L.G. Valiant: The complexity of computing the permanent. Theoretical Computer Science (1979)

Delay

Measuring the delay between consecutive solutions.

A listing algorithm has polynomial delay if it lists all solutions one after the other in some order, in such a way that:

it takes polynomial time in the input size before producing the first solution or halting.

after returning any solution, it takes polynomial time before producing another solution or halting.

D.S. Johnson, M. Yannakakis, and C.H. Papadimitriou: On generating all maximal independent sets. Information Processing Letters (1988)
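A minimal sketch of a polynomial delay listing, on a toy problem chosen only for illustration (it is not from the slides): the solutions are all binary strings of length n, listed in lexicographic order. The function names are hypothetical. Each solution is computed from the previous one in O(n) time, which is the delay.

```python
def next_bitstring(bits):
    """Return the lexicographic successor of bits, or None if bits is the last one.

    Runs in O(n) time: this is the work done between two consecutive solutions.
    """
    bits = list(bits)
    i = len(bits) - 1
    while i >= 0 and bits[i] == 1:
        bits[i] = 0          # trailing 1s become 0s
        i -= 1
    if i < 0:
        return None          # bits was 11...1: no successor, halt
    bits[i] = 1
    return bits

def list_bitstrings(n):
    """Yield all binary strings of length n, one after the other."""
    current = [0] * n        # first solution, produced in O(n) time
    while current is not None:
        yield tuple(current)
        current = next_bitstring(current)
```

Although there are 2^n solutions in total, the time before the first solution, between two solutions, and before halting is always O(n).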

Part II

Techniques


What should our algorithm guarantee?

This is the usual correctness theorem.

Theorem (CORRECTNESS PROTOTYPE)

All the solutions are listed.

Only solutions are listed.

There are no duplicated solutions.

Brute Force

Generate a new solution Y by enlarging a partial solution X and check whether Y has already been generated.

The generation process could lead to duplication.

All the solutions are maintained in a database.
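The brute force scheme can be sketched on a toy problem chosen for illustration (not from the slides): listing all non-empty cliques of a graph, where every part of a solution is itself a solution. The graph encoding and the function name are assumptions. The same clique is reached from several smaller cliques, so the database is what prevents duplicates.

```python
def brute_force_cliques(adj):
    """List all non-empty cliques of an undirected graph {vertex: set_of_neighbors}."""
    database = set()            # every solution generated so far
    frontier = [frozenset()]    # partial solutions still to be enlarged
    while frontier:
        X = frontier.pop()
        for v in adj:
            # v enlarges X if it is new and adjacent to every vertex of X
            if v not in X and X <= adj[v]:
                Y = X | {v}
                if Y not in database:    # has Y already been generated?
                    database.add(Y)
                    frontier.append(Y)
    return database
```

The drawback is visible in the code: the database holds all the solutions, so the space used grows with the output size.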

Recursive Binary Partition Scheme

Example: binary partition of the paths.
Paths from s to t, i.e. Paths(G, s, t), are either the paths using the edge (s, f) or the ones not using that edge.

[Figure: example graph on the vertices s, e, z, f, d, h, g, c, j, i, a, b, l, k, t.]

To solve Paths(G, s, t) with e = (s, f), we reduce to Paths(G − e, s, t) and Paths(G − s, f, t).

Recursive Binary Partition Scheme

Binary partition checking for useful calls.

Algorithm 1: Paths(G, s, t, S)

Input: A graph G, the vertices s and t, a sequence of vertices S
Output: All the paths from s to t in G
if s = t then output S; return;
choose an arc e = (s, f)
if ∃ (s, t)-path in G − e then Paths(G − e, s, t, S);
if ∃ (f, t)-path in G − s then Paths(G − s, f, t, 〈S, s〉);

The algorithm runs in polynomial total time if the cost of each node of the recursion tree is polynomial.

If the height of the recursion tree is also polynomial, then the algorithm has polynomial delay.
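Algorithm 1 can be sketched in Python as follows. This is an illustrative version under stated assumptions: the graph is a dict mapping each vertex to the set of its out-neighbours (an undirected graph can be encoded with both arcs), G − e and G − s are built by copying, and the helper names are hypothetical. Unlike the pseudocode, the output includes the final vertex t.

```python
def has_path(adj, s, t):
    """Iterative DFS: is t reachable from s?"""
    if s not in adj or t not in adj:
        return False
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def remove_arc(adj, u, v):
    """Return a copy of the graph without the arc (u, v)."""
    new = {w: set(nb) for w, nb in adj.items()}
    new[u].discard(v)
    return new

def remove_vertex(adj, u):
    """Return a copy of the graph without the vertex u."""
    return {w: nb - {u} for w, nb in adj.items() if w != u}

def paths(adj, s, t, prefix=()):
    """Yield every simple s-t path as a tuple of vertices."""
    if s == t:
        yield prefix + (t,)
        return
    if not adj[s]:
        return                                  # can only happen in the top-level call
    f = next(iter(adj[s]))                      # choose an arc e = (s, f)
    without_e = remove_arc(adj, s, f)
    if has_path(without_e, s, t):               # useful call: paths not using e
        yield from paths(without_e, s, t, prefix)
    without_s = remove_vertex(adj, s)
    if has_path(without_s, f, t):               # useful call: paths using e
        yield from paths(without_s, f, t, prefix + (s,))
```

Every recursive call is made only after a reachability check, so each call outputs at least one path and no database of solutions is needed.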

Backtracking (’70 style)

Exploring all the ways of enlarging a partial solution: given a set S ⊆ U, a call returns all the solutions containing S.

The raw schema works well when any part of a solution is a solution. Otherwise it should be more sophisticated:

Bron-Kerbosch Algorithm for Maximal Clique (1973).

Johnson Algorithm for listing paths (1975).
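As an illustration of the more sophisticated schema, here is a sketch of the basic Bron-Kerbosch recursion (the textbook version without pivoting). The graph encoding and parameter names are assumptions: R is the current clique, P the candidates that can still enlarge it, and X the vertices already processed, which are used to recognise non-maximal cliques.

```python
def bron_kerbosch(adj, R=frozenset(), P=None, X=frozenset()):
    """Yield every maximal clique of the graph {vertex: set_of_neighbors}."""
    if P is None:
        P = frozenset(adj)      # initially every vertex is a candidate
    if not P and not X:
        yield R                 # R cannot be enlarged: it is maximal
        return
    for v in list(P):
        # enlarge R with v; only common neighbours of v stay in P and X
        yield from bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v])
        P = P - {v}             # v is done: never a candidate again here
        X = X | {v}             # but it still witnesses non-maximality
```

A call that ends with P empty but X non-empty produces nothing: its clique R is contained in a larger clique listed elsewhere.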

Backtracking for paths

Exploring all the ways of enlarging a partial solution, e.g. the path s, f, e, d, g.

[Figure: example graph on the vertices s, e, z, f, d, h, g, c, j, i, a, b, l, k, t.]

In the case of Johnson's algorithm, some vertices are blocked (e.g. f) because they are in the prefix or because some vertex in the prefix is blocking their only way to go to t.

Reverse Search (made in Japan)

Define a parent-child relationship between solutions that induces a tree, and apply depth first search.

Cost per solution equal to cost per iteration.

The delay is dominated by the cost of an iteration when using alternative output (output before the recursive calls at even depths and after them at odd depths); otherwise, returning from the recursive calls is expensive.

Algorithm 2: ReverseSearch(S)

output S
foreach child S′ of S do
    ReverseSearch(S′)
end
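Reverse search can be sketched on a toy problem chosen for illustration (not from the slides): listing all independent sets of a graph on vertices 0..n−1. The parent rule, the graph encoding (a list of neighbour sets) and the function name are assumptions. The parent of a non-empty set is the set without its largest vertex, so every solution has exactly one parent and the empty set is the root. The sketch also applies alternative output.

```python
def reverse_search(adj, S=(), depth=0):
    """Yield all independent sets of a graph given as a list of neighbour sets.

    The children of S are the sets S + (v,) with v larger than every vertex
    of S and not adjacent to any vertex of S.
    """
    if depth % 2 == 0:
        yield S                           # even depth: output before the children
    start = S[-1] + 1 if S else 0
    for v in range(start, len(adj)):
        if all(u not in adj[v] for u in S):
            yield from reverse_search(adj, S + (v,), depth + 1)
    if depth % 2 == 1:
        yield S                           # odd depth: output after the children
```

No database is needed: duplicates are impossible because the recursion follows a tree over the solutions.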

Analysis by Amortization (made in Japan)

Average cost per solution by studying recursion tree shape.

Some examples:

[Figure: three example recursion trees, with node labels ranging from n down to 1. Their amortized costs are O(n), O(n), and O(1), respectively.]

Basic amortization: if the height is n and each internal node has degree at least 2, then there are many leaves to charge.

Amortization by children: express the cost per node with respect to the number of its children.

Push out amortization: if the children grow faster than the sum of their cost; or if there is a sufficient quantity of output or children per node.

Algorithm & Analysis: Amortization with Certificate (made in Italy)

Recursive approach ensuring that each call is productive, using a data structure helping the decision, i.e. the certificate.

The cost to update the data structure while recurring is amortized by the number of solutions.

Applied for instance to paths and cycles enumeration.

Algorithm 3: Paths(G, u, t, S)

Input: A graph G, the vertices u and t, a sequence of vertices S
Output: All the paths from u to t in G
if u = t then output S; return;
for each useful neighbor v of u do
    Paths(G − u, v, t, 〈S, u〉);
end
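The structure of Algorithm 3 can be sketched as follows. Note the swapped-in technique: this sketch does not maintain the certificate described in the slides. It decides whether a neighbor is useful with a fresh reachability search in G − u, which is simpler but costs more per call. The graph encoding and the helper names are assumptions.

```python
def reaches(adj, removed, v, t):
    """True if t is reachable from v without entering any vertex in removed."""
    stack, seen = [v], {v}
    while stack:
        w = stack.pop()
        if w == t:
            return True
        for z in adj[w]:
            if z not in removed and z not in seen:
                seen.add(z)
                stack.append(z)
    return False

def paths(adj, u, t, prefix=(), removed=frozenset()):
    """Yield every simple u-t path of the graph {vertex: set_of_neighbors}."""
    if u == t:
        yield prefix + (t,)
        return
    removed = removed | {u}                       # recur on G - u
    for v in adj[u]:
        # v is useful if it can still reach t once u is gone
        if v not in removed and reaches(adj, removed, v, t):
            yield from paths(adj, v, t, prefix + (u,), removed)
```

Because only useful neighbors are followed, every recursive call outputs at least one path; the certificate serves to make this test cheap enough to be amortized over the solutions.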

The certificate is the set of biconnected components of the current graph, updated while recurring.

[Figure: example graph on the vertices s, e, z, f, d, h, g, c, j, i, a, b, l, k, t.]

The useful neighbors of s are the ones in the biconnected component of s leading to t, e.g. z is a useless neighbor for s.

[Figure: example graph on the vertices s, e, f, d, h, g, c, j, i, a, b, l, k, t.]

The parent s will compute the biconnected components of G − s for all its children in one shot.

When s is deleted, the recursions starting from f, e, c, a will use the same decomposition.

[Figure: example graph on the vertices s, e, f, d, h, g, c, j, i, a, b, l, k, t.]

When s is deleted, just the first biconnected component changes; we can reuse the others.

We can pay for this cost, since we have a lower bound on the number of solutions a branch will produce.

Measure & Conquer (made in Norway)

Given the instance I, define µ(I) as the measure.

Denote with T(µ) an upper bound on the number of leaves of the search trees modeling the recursive calls of the algorithm for all I with µ(I) < µ.

Define rules reducing I and observe how the measure and T(µ) change on the new instances.

Part III

Positive Results


Enumeration of Enumeration Algorithms and Its Complexity, by Kunihiro Wasa

Part IV

Negative Results


The LP class

#P is the set of the counting problems associated with the decision problems in the set NP.

Let us call LP the class of the listing versions of the problems in NP.²

LP is the listing analogue of the class #P for counting.

² See the PhD Thesis by Marco Rospocher (University of Udine, Italy).

A listing problem E ∈ LP belongs to

Ptot if it admits a polynomial total time (PTT) listing algorithm³, i.e. O(α^h n^k).

Penu if E is P-enumerable (polynomial average cost per solution), i.e. O(α n^k).

Pdel if E admits a polynomial delay algorithm.

Lemma

Pdel ⊆ Penu ⊆ Ptot ⊆ LP

³ This class is called EP by K. Fukuda (Note on new complexity classes ENP, EP and CEP. Technical Report).
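The first two inclusions of the lemma follow from a short calculation. The sketch below assumes α ≥ 1 (for α = 0 the usual convention is that the algorithm may still take polynomial time to halt) and writes p(n) = O(n^k) for the delay bound: the α solutions and the final halt are each preceded by at most p(n) steps.

```latex
\begin{align*}
\text{delay} \le p(n)
  &\;\Longrightarrow\; \text{total time} \le (\alpha + 1)\,p(n) \le 2\,\alpha\,p(n) = O(\alpha\, n^{k})
  && (P_{del} \subseteq P_{enu}),\\
O(\alpha\, n^{k}) &= O(\alpha^{h} n^{k}) \quad \text{with } h = 1
  && (P_{enu} \subseteq P_{tot}).
\end{align*}
```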

On the possibility to get output sensitive algorithms: on the relationship between Ptot and LP.

In which cases can we design an algorithm whose cost is polynomial in the size of the instance and in the number of solutions?

Listing versions of NP-complete problems cannot be solved with output sensitive algorithms unless P = NP.ᵃ

ᵃ Otherwise, you can use it to decide the problem. Indeed, let A be an output sensitive listing algorithm with cost O(α^h n^k); then you can run A for Ω(n^k) steps: if it stops within this time there are no solutions, otherwise there is at least one solution.

Many problems whose decision problem is easy have a listing version which is difficult.

1-valid CNF formula

1-valid CNF formulas are the SAT formulas that are satisfied by mapping all the variables to 1.

Listing all the satisfying truth assignments of a 1-valid formula in an output sensitive fashion cannot be done unless P = NP.

The decision version is easy.

Maximal Independent Sets in a Set System

Given (E, F), with F ⊆ 2^E, output each maximal set S in F (no other set in F contains S).

Listing all such S cannot be done in an output sensitive way unless P = NP.

The decision version is easy.

For many problems whose decision problem is easy, we do not know whether an output sensitive algorithm exists.

Minimal Hitting Set

Also called minimal hypergraph transversal.

Consider a family S of sets E1, . . . , Ek. A hitting set of S is a set H with the property that H intersects every Ei. H is minimal if no proper subset of H is a hitting set.

More than thirty years of research: no output sensitive algorithm is known.

Several important contributions, for example by Boros, Elbassioni, Makino, Gurvich, Khachiyan, Kratsch etc.

Relation with minimal dominating sets (Kanté et al., SIAM J. Discrete Math. 2014).
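The definition of minimal hitting set can be made concrete with a brute force sketch. It is illustrative only and far from output sensitive: it tries every subset of the ground set. The function names and the encoding of the family as a list of Python sets are assumptions.

```python
from itertools import combinations

def is_hitting_set(H, family):
    """H intersects every set of the family."""
    return all(H & E for E in family)

def is_minimal_hitting_set(H, family):
    """H is a hitting set and removing any single element breaks that.

    Checking single removals is enough: every superset of a hitting set
    is a hitting set, so a proper hitting subset implies one of size |H| - 1.
    """
    return is_hitting_set(H, family) and not any(
        is_hitting_set(H - {v}, family) for v in H)

def minimal_hitting_sets(family):
    """Yield all minimal hitting sets by trying every subset (exponential time)."""
    universe = sorted(set().union(*family))
    for size in range(len(universe) + 1):
        for H in map(frozenset, combinations(universe, size)):
            if is_minimal_hitting_set(H, family):
                yield H
```

For the family {1, 2}, {2, 3}, {3, 4} the minimal hitting sets are {1, 3}, {2, 3} and {2, 4}.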

Reducibility among listing problems: One-to-one reductions

A one-to-one certificates reduction f from an NP relation R1 to an NP relation R2 is a reduction providing a one-to-one mapping between {y : (x, y) ∈ R1} and {z : (f(x), z) ∈ R2}.

If for R1 there cannot be an output sensitive algorithm and there exists a polynomial one-to-one certificates reduction from R1 to R2, then for R2 there cannot be an output sensitive algorithm.

For both counting and listing, one-to-one reductions are not always possible:

Consider double-SAT (SAT formulas like F ∧ (x ∨ ¬x), where x does not appear in F): there is no one-to-one reduction between double-SAT and SAT formulas, since the number of satisfying assignments of a double-SAT formula is always even.

Also there is no one-to-one reduction for 3-colourability and 4-colourability.
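The parity argument can be checked by brute force on small formulas. The sketch below is illustrative; the clause encoding (a clause is a list of non-zero integers, k for the k-th variable and −k for its negation) and the function names are assumptions.

```python
from itertools import product

def count_models(clauses, n_vars):
    """Count the satisfying assignments of a CNF over variables 1..n_vars."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        # a literal l is true when variable |l| has the value (l > 0)
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            count += 1
    return count

def double_sat(clauses, n_vars):
    """Return F AND (x OR NOT x) for a fresh variable x, with the new variable count."""
    x = n_vars + 1
    return clauses + [[x, -x]], n_vars + 1
```

The formula (x1 ∨ x2) has 3 satisfying assignments, an odd number, while its double-SAT version has 6: the fresh variable is free, so every assignment of F appears twice.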

A more powerful reduction tool

R1 reduces to R2: let σ(x) be a function transforming instances of R1 into instances of R2.

Given a solution in Sol_R2(σ(x)), we can find in an output sensitive way the corresponding solutions in Sol_R1(x).

The mapping between solutions is such that:

Creignou et al. LATA 2017


Getting a PTT algorithm for R1, if R2 has a PTT algorithm

1. Transform the instance x of R1 into σ(x), an instance of R2.
2. Run a polynomial total time algorithm for σ(x) and, for each new y you get, compute the corresponding solutions of R1 for y and, for each of them, check whether it has already been generated.

It can happen that a duplicate z is associated with different y. However, each z is associated with at most polynomially many y.
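The two steps above can be sketched as a generic wrapper. Everything in this sketch is hypothetical: the callables sigma, list_r2 and solutions_for stand for the reduction, a listing algorithm for R2 and the back-mapping, and the toy relations below them exist only to make the example runnable.

```python
def list_via_reduction(x, sigma, list_r2, solutions_for):
    """List the solutions of R1 on x through a listing algorithm for R2."""
    generated = set()                      # solutions of R1 already output
    for y in list_r2(sigma(x)):            # enumerate the solutions of R2 on sigma(x)
        for z in solutions_for(x, y):      # corresponding solutions of R1
            if z not in generated:         # skip duplicates coming from other y
                generated.add(z)
                yield z

# Toy instance: x = (numbers, target).
# R2: ordered pairs of distinct indices whose values sum to target.
# R1: unordered sets of values {a, b} with a + b = target.
def ordered_index_pairs(x):
    numbers, target = x
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i != j and a + b == target:
                yield (i, j)

def values_of(x, y):
    numbers, _ = x
    i, j = y
    return [frozenset((numbers[i], numbers[j]))]
```

In the toy instance each solution of R1 is hit by several solutions of R2, but only polynomially many, so the duplicate checks do not break the polynomial total time bound.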

This is based on the declarative style reduction shown (for delay purposes) in:

Nadia Creignou, Markus Kröll, Reinhard Pichler, Sebastian Skritek, Heribert Vollmer: On the Complexity of Hard Enumeration Problems. LATA 2017: 183-195

Something similar has been used to prove the relationship between Dominating Sets and Hitting Sets in:

Mamadou Moustapha Kanté, Vincent Limouzy, Arnaud Mary, Lhouari Nourine: On the Enumeration of Minimal Dominating Sets and Related Notions. SIAM J. Discrete Math. 28(4): 1916-1929 (2014)

Open problems: relationships among the classes

Lemma

Pdel ⊆ Penu ⊆ Ptot ⊆ LP

What is the relationship between Pdel, Penu, and Ptot?

Some problems have been shown to be in Ptot but not in Penu unless P = NP.

Some completeness results for problems in Pdel use procedural-style reductions rather than declarative-style reductions.

Open problems: how to prove lower bounds?

Can we lower bound the cost per solution? We are not able to use the SETH conjecture for this.⁴

Theorem

Given an instance of size n for a problem P, the cost per solution of P is Ω(p(n)) unless some CONJECTURE is false.

Can we lower bound the delay or the overall time?

⁴ SETH: there is no algorithm for solving k-SAT in time O((2 − ε)^n), where ε does not depend on k.

Thanks
