The computation of hitting sets: Review and new algorithms

Information Processing Letters 86 (2003) 177–184
www.elsevier.com/locate/ipl
doi:10.1016/S0020-0190(02)00506-9

Li Lin a,b,*, Yunfei Jiang b

a Department of Mathematics, Jinan University, 510632, Guangzhou, PR China
b Institute of Computer Software, Sun Yat-sen University, 510275, Guangzhou, PR China
* Corresponding author. E-mail address: [email protected] (L. Lin).

Received 12 December 2001; received in revised form 20 November 2002
Communicated by H. Ganzinger

Abstract

In model-based diagnosis and other research fields, the hitting sets of a set cluster are frequently needed. In this paper we introduce some algorithms, including the new BHS-tree and Boolean algebraic algorithms. In the BHS-tree algorithm, a binary tree is used for the computation of hitting sets. In the Boolean algebraic algorithm, components are represented by Boolean variables, and a single run yields the minimal hitting sets. We implemented the algorithms and present empirical results in order to show their superiority over other algorithms for computing hitting sets.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Model-based diagnosis; Set cluster; Hitting set; BHS-tree; Boolean algebraic algorithms; Algorithms

1. Introduction

A lot of theoretical and practical problems can be partly reduced to an instance of the minimal hitting set problem, or to one of its relatives such as the minimum set cover problem, especially in model-based diagnosis and Reiter's [1] diagnosis from first principles. Technical systems that are composed of several components will probably cease to operate as designed. This discrepancy between the expected behavior of the system and the observed behavior of the system results from the malfunctioning of one or more components. The diagnostic problem is to identify the faulty components responsible for the malfunctioning of the system. When the faulty components are identified, they must be replaced by non-faulty components in order to obtain a system that works correctly. Thus, the conflict sets should be known, and an efficient method of transforming conflict sets into diagnoses should be presented.

2. Backgrounds

Reiter [1], who used the notion of minimal hitting sets, showed that diagnoses are minimal hitting sets of conflict sets. Hitting sets can be computed by so-called HS-trees. The problem with complete HS-trees is that the size of the tree grows exponentially with the size of the incoming collection of sets. Minimal hitting sets can be efficiently obtained by a pruned HS-tree. The idea is that subtrees producing only non-minimal hitting sets are pruned away. The construction of pruned HS-trees, however, is difficult
due to the fact that unnecessary subtrees are usually detected only after the entire subtree has been generated. The problem is that Reiter's method does not specify the order of the incoming collection of sets. If the construction of the tree starts with an unfavorable set, the method will always produce some non-minimal results regardless of whether the tree is constructed breadth-first or depth-first. Moreover, in a pruned HS-tree, hitting sets may be deleted by pruning; therefore, Greiner [2] revised the HS-tree into an HS-DAG in which the minimal hitting sets are not deleted.

As we know, the set-covering problem is also related to computing hitting sets from minimal conflict sets (Reggia [3]). In fact, a diagnosis is a minimal set cover of the collection of conflict sets. The set-covering problem is to find a solution of minimal size. Therefore, algorithms that solve the set-covering problem will not be useful if all diagnoses have to be found.

Haenni [4] presented a more general framework for the corresponding problem of computing hypergraph inversions. The collection of minimal conflict sets is considered as a hypergraph on conflict sets. By sequentially eliminating the leaves of the hypergraph, it is guaranteed that no unnecessary result is generated. The inversions of the hypergraph are the minimal hitting sets.

Vinterbo [5] presented approximate hitting sets, i.e., the sets with weights. He also used genetic algorithms for computing the hitting sets.

Wotawa [6] presented the HST-tree algorithm. This algorithm constructs the tree in a unique manner where nodes are ordered from left to right depending on the size of their edge labels lying on the path from the root to the node. The implementation of the algorithm can be done in a straightforward way. Since arrays are used, and almost all accesses are numeric index values, an efficient implementation is possible. In Wotawa's paper, the index values are used for computing the minimal hitting sets.

The BHS-tree (see Section 3) computes the hitting sets with a binary tree. It does not need to prune them away, because the function µ is used for the deletion of non-minimal conflict/hitting sets [4]. When some new (possibly non-minimal) conflict sets are added, the new hitting sets can be computed based on the old hitting sets and the new conflict sets.

3. BHS-tree

Definition 3.1 (Binary Hitting Set Tree, in brief BHS-tree). Given a minimal set cluster $MCS = \{C_1, C_2, \ldots, C_n\}$, a BHS-tree is a recursive binary tree defined as follows: each node is a tuple $(C, H)$, where $C$ and $H$ are set clusters. The root node is denoted by $(C = MCS, H = \{\,\})$; the left and right children of a node are denoted by $(C_l, H_l)$ and $(C_r, H_r)$, respectively. The tree is defined recursively as follows:

(1) if $C = \{\,\}$, then the BHS-tree is an empty tree;
(2) else select any element $a \in \bigcup C_i$, and set $(C_l = \{C_i - \{a\} \mid a \in C_i\},\ H_l = \{a\})$ and $(C_r = \{C_i \mid a \notin C_i\},\ H_r = \{\,\})$.

In a BHS-tree, set $C$ is denoted by $\langle \cdot \rangle$, while $H$ is denoted by $[\,\cdot\,]$.

Example 3.1. The BHS-tree of conflict sets {2,4,5}, {1,2,3}, {1,3,5}, {2,4,6}, {2,4}, {2,3,5}, {1,6} is shown in Fig. 1. The minimal conflict sets are {1,2,3}, {1,3,5}, {2,4}, {2,3,5}, {1,6}.

Fig. 1. BHS-tree (see Example 3.1).

Fig. 2. Computing hitting sets recursively with a BHS-tree.

Fig. 3. BHS-tree after ⟨5,6⟩ is inserted in the conflict sets (Example 3.1 continued).

Algorithm 3.1. Computing the minimal hitting sets of a BHS-tree.

Step 1. If a node is a leaf node, then the minimal hitting set of this node is $H$; else run Steps 2 and 3 recursively.
Step 2. Replace every parent node $H$ with $\{H, \{m_l \cup m_r \mid m_l \in H_l,\ m_r \in H_r\}\}$. Notice that $H$ may be the empty set.
Step 3. Minimize $H$ at the root node with the function µ until it comprises all minimal hitting sets [4].
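The split of Definition 3.1 and the combination of Algorithm 3.1 can be illustrated with a minimal Python sketch. It is not the authors' ANSI C program: the names `mu` and `bhs_minimal_hitting_sets` are hypothetical, the tree is never stored explicitly (each recursive call plays the role of one node), the element `a` is simply the smallest one, and µ is applied at every level rather than only at the root.

```python
from typing import FrozenSet, Iterable, Set

Cluster = Set[FrozenSet[int]]


def mu(sets: Iterable[FrozenSet[int]]) -> Cluster:
    """Minimization: drop every set that strictly contains another set."""
    pool = set(sets)
    return {s for s in pool if not any(t < s for t in pool)}


def bhs_minimal_hitting_sets(conflicts: Iterable[Iterable[int]]) -> Cluster:
    """Minimal hitting sets via the binary split of Definition 3.1 (sketch)."""
    c = mu(frozenset(s) for s in conflicts)
    if not c:
        return {frozenset()}   # empty cluster: the empty set hits it
    if frozenset() in c:
        return set()           # an empty conflict set cannot be hit
    a = min(frozenset().union(*c))             # "select any element"
    c_left = {s - {a} for s in c if a in s}    # C_l, paired with H_l = {a}
    c_right = {s for s in c if a not in s}     # C_r, paired with H_r = { }
    h_left = {frozenset({a})} | bhs_minimal_hitting_sets(c_left)
    h_right = bhs_minimal_hitting_sets(c_right)
    # Step 2: combine the children; Step 3: minimize (here at every level).
    return mu(l | r for l in h_left for r in h_right)


example_3_1 = [{2, 4, 5}, {1, 2, 3}, {1, 3, 5}, {2, 4, 6}, {2, 4}, {2, 3, 5}, {1, 6}]
for hs in sorted(sorted(h) for h in bhs_minimal_hitting_sets(example_3_1)):
    print(hs)
```

Minimizing at every level keeps the intermediate clusters small; minimizing only at the root, as Step 3 states, should give the same final cluster at the cost of larger intermediate results.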

This algorithm is shown in Fig. 2. A "*" next to a set indicates that the set is not a minimal hitting set and will be eliminated in the next step.

When a new measurement is added to the conflict sets (Hou [8]), it is not necessary to recompute the old conflict sets; it is only necessary to add a new branch to the BHS-tree. See Fig. 3 for an example of this algorithm.
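The idea of reusing the old result when a conflict set arrives can be sketched as follows. This is the standard incremental update on the set of minimal hitting sets, not the branch insertion into the BHS-tree shown in Fig. 3, and the function name `add_conflict_set` is an assumption made for illustration.

```python
from typing import FrozenSet, Iterable, Set


def add_conflict_set(old_hitting_sets: Iterable[Iterable[int]],
                     new_conflict: Iterable[int]) -> Set[FrozenSet[int]]:
    """Update minimal hitting sets after one new conflict set is added (sketch)."""
    new_c = frozenset(new_conflict)
    candidates: Set[FrozenSet[int]] = set()
    for h in map(frozenset, old_hitting_sets):
        if h & new_c:
            candidates.add(h)      # h already hits the new conflict set
        else:
            # h misses it: extend h by each element of the new set in turn
            candidates.update(h | {e} for e in new_c)
    # keep only the subset-minimal candidates
    return {s for s in candidates if not any(t < s for t in candidates)}


old = [{1, 2}, {1, 3, 4}, {1, 4, 5}, {2, 5, 6}, {2, 3, 6}, {3, 4, 6}]
updated = add_conflict_set(old, {5, 6})
```

Starting from the six minimal hitting sets of Example 3.1, the old sets that already contain 5 or 6 are kept unchanged, and only {1,2} and {1,3,4} need to be extended.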

In the BHS-tree algorithm, the tree does not need to be pruned; instead, the function µ is used to minimize the conflict sets and hitting sets [4]. Conflict sets therefore do not need to be pruned away, but they do need to be minimized before the hitting sets are computed.

4. Boolean algorithm

If the components are represented as Boolean variables, the minimal hitting sets, and hence the diagnoses, can be computed with the properties of Boolean algebra.

Definition 4.1 (Conflict-set Boolean formula (CSF)). The conflict sets CS are presented as a sum of products in which each atom is negative: every conflict set becomes the product of the negations of its elements.

Example 4.1. Suppose $CS = \{C_1, C_2, \ldots, C_m\}$ is a conflict set cluster, where $C_i = \{e_{i1}, e_{i2}, \ldots, e_{in_i}\}$, $i = 1, 2, \ldots, m$. CSF is defined by the conflict-set Boolean formula:

$$CSF = (\bar{e}_{11}\bar{e}_{12}\cdots\bar{e}_{1n_1} + \bar{e}_{21}\bar{e}_{22}\cdots\bar{e}_{2n_2} + \cdots + \bar{e}_{m1}\bar{e}_{m2}\cdots\bar{e}_{mn_m}).$$

Definition 4.2 (Hitting-set Boolean formula (HF)). A hitting set $H$ is presented as the product of its elements. HF is defined by a hitting-set Boolean formula.

Example 4.2. $H = \{h_1, h_2, \ldots, h_n\}$, $HF = h_1 h_2 \cdots h_n$.

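Theorem 4.1. Suppose $CS = \{C_1, C_2, \ldots, C_m\}$ is a set cluster and $H = \{h_1, h_2, \ldots, h_n\}$ is a set. CSF and HF are their Boolean formulas, respectively, with

$$CSF = (\bar{e}_{11}\bar{e}_{12}\cdots\bar{e}_{1n_1} + \bar{e}_{21}\bar{e}_{22}\cdots\bar{e}_{2n_2} + \cdots + \bar{e}_{m1}\bar{e}_{m2}\cdots\bar{e}_{mn_m}).$$

If CSF and HF satisfy

$$CSF \cdot HF = (\bar{e}_{11}\bar{e}_{12}\cdots\bar{e}_{1n_1} + \bar{e}_{21}\bar{e}_{22}\cdots\bar{e}_{2n_2} + \cdots + \bar{e}_{m1}\bar{e}_{m2}\cdots\bar{e}_{mn_m}) \cdot (h_1 h_2 \cdots h_n) = 0,$$

then $H$ is a hitting set of CS.

Theorem 4.1 can easily be proved with Boolean algebraic distributive properties and partial order properties [9]; the proof is omitted here.

Example 4.3.

CS = {⟨2,4,5⟩, ⟨1,2,3⟩, ⟨1,3,5⟩, ⟨2,4,6⟩, ⟨2,4⟩, ⟨2,3,5⟩, ⟨1,6⟩}, H = {1,2};

$$CSF = (\bar{2}\bar{4}\bar{5} + \bar{1}\bar{2}\bar{3} + \bar{1}\bar{3}\bar{5} + \bar{2}\bar{4}\bar{6} + \bar{2}\bar{4} + \bar{2}\bar{3}\bar{5} + \bar{1}\bar{6}),$$

$$HF = (12);$$

$$(\bar{2}\bar{4}\bar{5} + \bar{1}\bar{2}\bar{3} + \bar{1}\bar{3}\bar{5} + \bar{2}\bar{4}\bar{6} + \bar{2}\bar{4} + \bar{2}\bar{3}\bar{5} + \bar{1}\bar{6}) \cdot (12) = 0.$$

So, $H$ is a hitting set of CS.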
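The condition of Theorem 4.1 can be checked mechanically on small inputs. The sketch below evaluates CSF · HF over every truth assignment and compares the outcome with the set-theoretic definition of a hitting set. It assumes that each conflict set contributes the product of the negations of its elements; both function names are hypothetical.

```python
from itertools import product
from typing import List, Set


def csf_times_hf_is_zero(conflicts: List[Set[int]], h: Set[int]) -> bool:
    """True if CSF * HF evaluates to 0 under every truth assignment."""
    atoms = sorted(set().union(*conflicts) | h)
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        # CSF: sum over conflict sets of the product of negated atoms
        csf = any(all(not v[e] for e in c) for c in conflicts)
        # HF: product of the elements of H
        hf = all(v[x] for x in h)
        if csf and hf:
            return False
    return True


def is_hitting_set(conflicts: List[Set[int]], h: Set[int]) -> bool:
    """Set-theoretic definition: H shares an element with every conflict set."""
    return all(h & c for c in conflicts)


cs = [{2, 4, 5}, {1, 2, 3}, {1, 3, 5}, {2, 4, 6}, {2, 4}, {2, 3, 5}, {1, 6}]
print(csf_times_hf_is_zero(cs, {1, 2}), is_hitting_set(cs, {1, 2}))
print(csf_times_hf_is_zero(cs, {1, 4}), is_hitting_set(cs, {1, 4}))
```

The truth table grows exponentially with the number of atoms, so this check is only meant to confirm the algebra on examples of the size used in this section.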

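Definition 4.3 (H(C) function). $C$ is a Boolean formula, $\bar{e}$ is the negation of $e$ (both are atoms of the Boolean formula), and $H(C)$ is defined recursively:

(1) $H(0) = 1$, $H(1) = 0$;
(2) $H(\bar{e}) = e$;
(3) $H(\bar{e} \cdot C) = e + H(C)$;
(4) $H(\bar{e} + C) = e \cdot H(C)$.

If $C$ does not meet the four cases above, then:

(5) $H(C) = e \cdot H(C_1) + H(C_2)$,

where $\bar{e}$ is an arbitrary atom of $C$; $C_1 \subseteq C$ consists of the terms of $C$ that do not contain $\bar{e}$, so $\bar{e} \notin C_1$; and $C_2 = \{c \mid c \cup \{\bar{e}\} \in C,\ \text{or } c \in C \text{ and } \bar{e} \notin c\}$. ($C$, $C_1$, $C_2$ may be Boolean formulas or equivalent sets; $e$ may be an atom or its equivalent element.)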
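Case (5) of Definition 4.3 can be sketched in Python by storing a formula as a set of terms, each term being the set of elements whose negated atoms it multiplies. This is an illustrative reading of the definition, not the authors' linked-list program: the names `absorb` and `hitting_formula` are hypothetical, the atom is chosen as the smallest element, and cases (2) to (4) are not coded separately because the general split covers them in this representation.

```python
from typing import FrozenSet, Iterable, Set

Formula = Set[FrozenSet[int]]


def absorb(terms: Iterable[FrozenSet[int]]) -> Formula:
    """Boolean absorption x + x*y = x: drop terms that contain another term."""
    pool = set(terms)
    return {t for t in pool if not any(u < t for u in pool)}


def hitting_formula(csf: Iterable[Iterable[int]]) -> Formula:
    """H(C) for a sum of products of negated atoms (sketch of Definition 4.3)."""
    c = absorb(frozenset(t) for t in csf)
    if not c:
        return {frozenset()}   # H(0) = 1: the empty product
    if frozenset() in c:
        return set()           # the formula is 1, and H(1) = 0
    e = min(frozenset().union(*c))        # an arbitrary atom of C
    c1 = {t for t in c if e not in t}     # terms without the atom
    c2 = {t - {e} for t in c}             # every term with the atom removed
    with_e = {h | {e} for h in hitting_formula(c1)}   # e * H(C1)
    return absorb(with_e | hitting_formula(c2))       # ... + H(C2)


cs = [{2, 4, 5}, {1, 2, 3}, {1, 3, 5}, {2, 4, 6}, {2, 4}, {2, 3, 5}, {1, 6}]
for term in sorted(sorted(t) for t in hitting_formula(cs)):
    print(term)
```

Each term of the result is read as the product of its elements, that is, as one hitting set; absorption removes the non-minimal ones.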

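Theorem 4.2. Suppose CSF is a Boolean formula of CS; then $H(CSF)$ is a Boolean formula of the hitting sets of CS.

Proof. The proof of Theorem 4.2 can be done by mathematical induction over the size of the conflict sets, $k = |CS|$. Now, we give the proof of case (5). The other cases are straightforward.

Suppose that, when $k \le n$, situation (5) is proved. Now take $k = n + 1$. For an arbitrary element $e \in \bigcup_{S \in CS} S$,

$$H(CSF) = e \cdot H(CSF_1) + H(CSF_2),$$

$$CS_1 \subseteq CS \text{ and } e \notin CS_1; \quad CS_2 \cup \{e\} \subseteq CS \text{ and } e \notin CS_2,$$

there must exist $|CS_1| \le n$. (If $|CS_1| > n$, then $e \notin \bigcup_{S \in CS} S$, a contradiction.) By assumption, $H(CSF_1)$ is the hitting set of $CS_1$, and since $CS_1 \subseteq CS$ and $e \notin CS_1$, $e \cdot H(CSF_1)$ must be the hitting set of CS. $H(CSF_2)$ is the hitting set of $CS_2$, and, for all $C \in CS_2$, either $C \in CS$ or $C \cup \{e\} \in CS$. Obviously, $H(CSF_2)$ is the hitting set of CS also. Hence $H(CSF)$ must be the hitting set of CS; the minimum property can be proved by Boolean absorption properties. □

Example 4.4. Suppose

CS = {⟨2,4,5⟩, ⟨1,2,3⟩, ⟨1,3,5⟩, ⟨2,4,6⟩, ⟨2,4⟩, ⟨2,3,5⟩, ⟨1,6⟩};

$$CSF = (\bar{2}\bar{4}\bar{5} + \bar{1}\bar{2}\bar{3} + \bar{1}\bar{3}\bar{5} + \bar{2}\bar{4}\bar{6} + \bar{2}\bar{4} + \bar{2}\bar{3}\bar{5} + \bar{1}\bar{6});$$

$$H(CSF) = 2 \cdot H(\bar{1}\bar{3}\bar{5} + \bar{1}\bar{6}) + H(\bar{4}\bar{5} + \bar{1}\bar{3} + \bar{4}\bar{6} + \bar{4} + \bar{3}\bar{5} + \bar{1}\bar{3}\bar{5} + \bar{1}\bar{6});$$

$$\begin{aligned}
2 \cdot H(\bar{1}\bar{3}\bar{5} + \bar{1}\bar{6}) &= 2 \cdot (1 + H(\bar{3}\bar{5} + \bar{6})) \\
&= 2 \cdot (1 + 36 + 56) \\
&= 12 + 236 + 256,
\end{aligned}$$

$$\begin{aligned}
H(\bar{4}\bar{5} + \bar{1}\bar{3} + \bar{4}\bar{6} + \bar{4} + \bar{3}\bar{5} + \bar{1}\bar{3}\bar{5} + \bar{1}\bar{6}) &= 4 \cdot H(\bar{1}\bar{3} + \bar{3}\bar{5} + \bar{1}\bar{3}\bar{5} + \bar{1}\bar{6}) \\
&= 4 \cdot (1 \cdot H(\bar{3}\bar{5}) + H(\bar{3} + \bar{3}\bar{5} + \bar{6})) \\
&= 4 \cdot (1 \cdot (3 + 5) + 3 \cdot H(\bar{6})) \\
&= 4 \cdot ((13 + 15) + 36) \\
&= 413 + 415 + 436.
\end{aligned}$$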

Fig. 4. Comparison of the HS-tree (the number |CS| is less than or equal to 7), BHS-tree (+) and Boolean algebraic (O) algorithms. The abscissa gives the number of conflict sets, the ordinate gives the running time. Each conflict set is composed of 1 to 20 random elements, shown as Program 1.

So the minimal hitting sets are:

{1,2}, {1,3,4}, {1,4,5}, {2,5,6}, {2,3,6}, {3,4,6}.

The Boolean algebraic algorithm runs in just one step and obtains the minimal hitting sets through the absorption properties of Boolean formulas, so no pruning is needed. This algorithm does not need a tree structure; it just needs a linked data structure, which is simpler than a tree structure. The program is also simpler than that of the BHS-tree algorithm (see Program 1).

5. Empirical results

We implemented the programs in ANSI C (UNIX) and used them to compare the standard HS-tree algorithms with the BHS-tree and Boolean algebraic algorithms (Jiang and Lin [7]). The results are shown in Figs. 4 and 5. (SGI 2200 Origin, CPU 4400 MHz, MIPS R12000 (IP27) processors, main memory 2 GB, OS IRIX 64 Release 6.5.)

Program 1.
For i := 1 to MaxElement do
BEGIN
    r := random( );
    if odd(r) then i ∈ C;
    else i ∉ C;
END;

From the empirical results, we can conclude that the Boolean algebraic algorithm needs less memory and running time. It is not sensitive to the order in which elements are selected (see Fig. 6). That is to say, only $|CS|$ and $|\bigcup_{C \in CS} C|$ influence the efficiency.
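For readers who want to reproduce inputs of the kind Program 1 describes, the loop can be sketched in Python. The function names and the `seed` parameter are assumptions, `getrandbits(1)` stands in for the parity test `odd(r)`, and redrawing empty sets is a choice made here (an empty conflict set cannot be hit), not something Program 1 states.

```python
import random
from typing import List, Optional, Set


def random_conflict_set(max_element: int, rng: random.Random) -> Set[int]:
    """Include each element 1..max_element with probability 1/2, as in Program 1."""
    return {i for i in range(1, max_element + 1) if rng.getrandbits(1)}


def random_cluster(n_sets: int, max_element: int,
                   seed: Optional[int] = None) -> List[Set[int]]:
    """Draw n_sets non-empty random conflict sets (sketch)."""
    rng = random.Random(seed)
    cluster: List[Set[int]] = []
    while len(cluster) < n_sets:
        c = random_conflict_set(max_element, rng)
        if c:   # assumption: redraw empty sets
            cluster.append(c)
    return cluster


sample = random_cluster(n_sets=7, max_element=20, seed=0)
```

With `max_element=20` this corresponds to the setting of Fig. 4, and with `max_element=15` to that of Fig. 5.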

Fig. 5. Comparison of the HS-tree (the number |CS| is less than or equal to 8), BHS-tree (+) and Boolean algebraic (O) algorithms. The abscissa gives the number of conflict sets, the ordinate gives the running time. Each conflict set is composed of 1 to 15 random elements.

Fig. 6. The selection of the maximum frequency (+, dash line), and the minimum frequency (O, solid line). Each conflict set is composed of 1 to 20 random elements. P2 CPU 667M, 128M main memory.

6. Conclusions

The BHS-tree and Boolean algebraic algorithms share some properties. Both are recursive, and both eliminate one element in each step. The differences between them are:

(1) Binary tree structures are necessary in the BHS-tree algorithm, while list structures are enough in the Boolean algebraic algorithm.
(2) Two steps are necessary in the BHS-tree algorithm: constructing the binary tree and using it to compute the hitting sets recursively. In the Boolean algebraic algorithm, one step is enough to compute the minimal hitting sets recursively.

In the future, we will improve the algorithms in detail. The above results are just for comparison.

Acknowledgements

The authors would like to thank the anonymous referees and H. Ganzinger for carefully reading the paper and for their comments and suggestions, which helped us to substantially improve the paper. This work was partially supported by the China Natural Science Fund (NSF) and the Guangdong Natural Science Fund.

References

[1] R. Reiter, A theory of diagnosis from first principles, Artificial Intelligence 32 (1987) 57–95.

[2] R. Greiner, B.A. Smith, R.W. Wilkerson, A correction to the algorithm in Reiter's theory of diagnosis (Research Note), Artificial Intelligence 41 (1989) 79–88.

[3] J.A. Reggia, D.S. Nau, P.Y. Wang, Diagnostic expert systems based on a set covering model, Internat. J. Man-Machine Stud. 19 (1983) 437–460.

[4] R. Haenni, Generating diagnosis from conflict sets, 1997; http://www.aaai.org/.

[5] S. Vinterbo, A. Ohrn, Minimal approximate hitting sets and rule templates, Internat. J. Approximate Reasoning 25 (2000) 123–143.

[6] F. Wotawa, A variant of Reiter's hitting set algorithm, Inform. Process. Lett. 79 (2001) 45–51.

[7] Y.F. Jiang, L. Lin, Computing minimal hitting set from first principles with BHS-tree, J. Software, to appear (in Chinese).

[8] A.M. Hou, A theory of measurement in diagnosis from first principles, Artificial Intelligence 65 (1994) 281–328.

[9] B. Kolman, C.B. Robert, S. Ross, Discrete Mathematical Structures, 3rd edn., Prentice-Hall, Englewood Cliffs, NJ, 1997, pp. 250–252.