Checking query containment with the CQC method

Carles Farré, Ernest Teniente *, Toni Urpí

Universitat Politècnica de Catalunya, 08034 Barcelona, Catalonia, Spain

Received 27 January 2004; received in revised form 19 July 2004; accepted 3 August 2004

Available online 11 September 2004

Abstract

We present the Constructive Query Containment (CQC) method to check query containment and query containment under constraints for queries over databases with safe negation in both IDB and EDB subgoals and with or without built-in predicates. The aim of the CQC method is to construct a counterexample that proves that the query containment relationship being checked does not hold. The method uses different Variable Instantiation Patterns (VIPs) to generate only relevant counterexamples according to the syntactic properties of the queries and the databases considered in each test.

The main contribution of the CQC method is threefold: it handles broader cases of queries and database schemas than most previous methods, it checks "true" containment instead of uniform containment (which is a sufficient but not necessary condition for containment) and it is not less efficient than other methods for the cases that they handle. Moreover, we also prove the soundness and completeness of our method, both for success and for failure.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Deductive databases; Query containment; Constraint management

* Corresponding author.

E-mail addresses: [email protected] (C. Farré), [email protected] (E. Teniente), [email protected] (T. Urpí).

www.elsevier.com/locate/datak

Data & Knowledge Engineering 53 (2005) 163–223

0169-023X/$ - see front matter © 2004 Elsevier B.V. All rights reserved.

doi:10.1016/j.datak.2004.08.002

1. Introduction

There are several reasoning tasks involving queries that are supported by DBMSs. Among them, checking query containment is one of the most important. Roughly, a query is said to be contained in another query when the answers that the former obtains as a result are a subset of the answers obtained by the latter, for every database state.

Query containment checking is applied as a base reasoning technique in several contexts: query optimization [2,13,48], rewriting queries using views [14,26], detecting independence of queries from database updates [36], constraint verification [25,35], etc. Therefore, it is fair to consider query containment as one of the most fundamental database reasoning tasks.

We illustrate the query containment problem with the following example. Consider a database consisting of two relations. Emp(E-NAME) indicates that E-NAME is an employee. WorksFor(E-NAME, SUP-NAME) indicates that an employee E-NAME works for another employee SUP-NAME. There are also two views. Boss(E-NAME) holds when E-NAME has someone else working for him/her. Chief(E-NAME) holds when E-NAME has some boss working for him/her.

CREATE VIEW Boss(E-NAME) AS
    SELECT SUP-NAME FROM WorksFor

CREATE VIEW Chief(E-NAME) AS
    SELECT WorksFor.SUP-NAME FROM WorksFor, Boss
    WHERE WorksFor.E-NAME = Boss.E-NAME

Then we define two different queries:

Qa: SELECT E-NAME FROM Boss

Qb: SELECT E-NAME FROM Chief

Clearly, Qa and Qb retrieve the same kind of information: employees occurring in the WorksFor relation. However, Qb is more restrictive than Qa, in the sense that Qb requires those employees to have some boss working for them whereas Qa requires only that they have someone else working for them. Therefore, the answer that Qb obtains from the database will always be, independently of the concrete state of the database, a subset of the answer that Qa would obtain. Thus, Qb is contained in Qa. This is commonly denoted by Qb ⊑ Qa. On the contrary, Qa is not contained in Qb, denoted Qa ⋢ Qb.

Reasoning on query containment will not always be as easy as in the case of the preceding queries. Consider the following two new queries:

Q1: SELECT E-NAME FROM Emp WHERE Emp.E-NAME NOT IN (SELECT E-NAME FROM Chief)

Q2: SELECT E-NAME FROM Emp WHERE Emp.E-NAME NOT IN (SELECT E-NAME FROM Boss)

Intuitively, Q1 retrieves employees that are not chiefs, that is, those employees that do not have any boss working for them. Instead, Q2 retrieves employees that are not bosses, that is, those employees that have nobody working for them. In this sense, Q1 is less restrictive than Q2 because Q2 does not allow anyone to work for E-NAME, while Q1 only applies this restriction to the ones that are Boss. For these reasons, we say Q1 ⋢ Q2 whereas Q2 ⊑ Q1.
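The four queries above can be tried out on a concrete database state. The following is a minimal sketch using Python's built-in sqlite3 module. It rests on two assumptions that are not part of the paper: the columns are renamed E_NAME and SUP_NAME (hyphens are not valid in unquoted SQL identifiers), and the sample rows (ann, joan, mary) are hypothetical. A single state can only illustrate containment or refute it; it cannot prove that containment holds for every state.

```python
import sqlite3

# Assumption: E-NAME / SUP-NAME are renamed E_NAME / SUP_NAME, since hyphens
# are not allowed in unquoted SQL identifiers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp(E_NAME TEXT);
    CREATE TABLE WorksFor(E_NAME TEXT, SUP_NAME TEXT);
    CREATE VIEW Boss(E_NAME) AS
        SELECT SUP_NAME FROM WorksFor;
    CREATE VIEW Chief(E_NAME) AS
        SELECT WorksFor.SUP_NAME FROM WorksFor, Boss
        WHERE WorksFor.E_NAME = Boss.E_NAME;
    -- Hypothetical sample state: ann works for joan, joan works for mary.
    INSERT INTO Emp VALUES ('ann'), ('joan'), ('mary');
    INSERT INTO WorksFor VALUES ('ann', 'joan'), ('joan', 'mary');
""")


def answers(sql):
    """Run a single-column query and return its answer as a set."""
    return {row[0] for row in conn.execute(sql)}


qa = answers("SELECT E_NAME FROM Boss")
qb = answers("SELECT E_NAME FROM Chief")
q1 = answers(
    "SELECT E_NAME FROM Emp "
    "WHERE Emp.E_NAME NOT IN (SELECT E_NAME FROM Chief)"
)
q2 = answers(
    "SELECT E_NAME FROM Emp "
    "WHERE Emp.E_NAME NOT IN (SELECT E_NAME FROM Boss)"
)

# On this state the answer to Qb is a subset of the answer to Qa,
# and joan is an answer to Q1 that is not an answer to Q2.
print(qb <= qa, q2 <= q1, q1 - q2)
```

On this state joan is not a chief (ann, who works for joan, is not a boss) but she is a boss, so she answers Q1 without answering Q2. That one state is enough to refute the containment of Q1 in Q2.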

The more complex the queries are, the more difficult it is to answer them. This holds for humans as well as for machines. Query complexity correlates with the expressive power provided by the query language that is used for formulating queries. Query languages define the distinct classes to which the queries handled by each specific query containment checking method belong, and their differences in expressive power lead to different results on decidability and complexity bounds.

Query containment (QC) was first studied for the class of conjunctive queries [13,15,44]. QC of conjunctive queries with order comparisons was studied in [31,36,42,50,51,56]. QC between unions of conjunctive queries was addressed in [48]. Conjunctive QC with safe negated extensional atoms was investigated in [36,50,53].

The methods that deal with negated intensional subgoals can be classified into two different approaches. The first one is taken by those methods that check QC for query classes where negation is used in a restrictive way [37]. The second approach is represented by those methods that do not check "true" QC but another related property called Uniform QC [36,47], which is a sufficient but not necessary condition for QC [45].

The problem broadens when integrity constraints are taken into account. Integrity constraints are part of the database schema and they are defined to hold for all instances of the database that correctly model the real world. When considering integrity constraints, the containment relationship between two queries does not need to hold for every state of the database but only for those that satisfy the integrity constraints. Query Containment under Constraints (QCuC) checking was investigated for conjunctive queries under integrity constraints expressing functional dependencies [2,30], inclusion dependencies [30], disjunctive datalog constraints [19], implication and disjunctive referential constraints [52], disjunctive constrained tuple-generating dependencies [54] or object database schemas [10,38]. Query Containment for datalog queries, without negation, under integrity constraints expressing tuple-generating dependencies was addressed in [45,47] by taking the uniform containment approach. QCuC was also investigated in the context of hybrid systems combining conjunctive or conjunctive queries and constraints expressed in a Description Logic language [4,9,29,34].

The main goal of this paper is to address "real" query containment in the absence/presence of integrity constraints in a rather rich setting, namely the one where both queries and integrity constraints may contain the following features:

• Negation on extensional and intensional predicates.

• Equality/inequality comparisons (=, ≠).

• Order comparisons (<, ≤, ≥, >) interpreted on either a dense or a discrete order domain.

Given this combination of features, the setting in which query containment is addressed subsumes those considered in the literature up to now, as we will see in Section 7.

Intuitively, the aim of the CQC method is to construct a counterexample that proves that the query containment relationship being checked does not hold. The method uses different Variable Instantiation Patterns (VIPs), according to the syntactic properties of the queries and the databases considered in each test. It is worth noting that such a customization does not undermine the generality and uniformity of the calculus defined by the CQC method; it only affects the concrete way in which the EDB facts that will be part of the counterexample are generated. The aim here is to prune the search of possible counterexamples by generating only the relevant ones but, at the same time, without diminishing the requirement of completeness.

In the paper, we show that the CQC method is sound and complete in the following terms:

Failure soundness. If the method terminates without building any counterexample, then containment holds.

Finite success soundness. If the method builds a finite counterexample when queries contain no recursively-defined derived predicates then containment does not hold.

Failure completeness. If containment holds between two queries then the method terminates reporting its failure to build a counterexample when queries contain no recursively-defined derived predicates.

Finite success completeness. If there exists a finite counterexample, the method finds it and terminates when either recursively-defined derived predicates are not considered or recursion and negation occur together in a strict-stratified manner [11].

Unfortunately, the high expressiveness of the queries and databases considered in the paper makes the problem of query containment undecidable in the general case [1]. Hence, the CQC method does not always terminate. More concretely, it does not terminate when there are no finite counterexamples but infinite ones. However, if there is a finite counterexample our method finds it and terminates and if containment holds our method fails finitely and terminates.

Besides the issue of decidability, the paper also tackles that of efficiency. In this sense, the paper shows that the CQC method is not less efficient than other methods that deal with conjunctive queries with or without safe extensional negation, namely those of [36,50,53]. It follows from these results that the CQC method improves previously proposed algorithms since it provides an efficient decision procedure for known decidable cases and can also be applied to more general forms of queries that were not handled by those algorithms.

Summarizing, the main contribution of the CQC method is threefold: it handles broader cases of queries and database schemas than most previous methods, it checks "true" containment instead of uniform containment and it is not less efficient than other methods for the cases that they handle. Moreover, we also show the soundness and completeness of our method, both for success and for failure.

The work reported here extends our previous work published in [22,24] in several directions. First, it is the first time we provide illustrative examples that focus on all relevant features of the method. Second, we provide a detailed and formal definition of the VIPs, which allows us to complete the formal definition of the method as well as the proofs regarding its correctness. Third, an exhaustive and detailed comparison of the CQC method with the related ones already presented in the literature is provided.

This paper is organized as follows. Section 2 states formally the framework within which the queries and databases addressed in this paper will be defined. Section 3 describes our method for checking query containment; the description is introduced through illustrative examples that focus on the main relevant features of the method. A detailed and formal definition of the method is then provided, together with the proofs regarding the correctness of the method, in Sections 4 and 5. Decidability issues are discussed in Section 6.

In Section 7, the CQC method is compared with other algorithms and approaches that have addressed query containment in the presence or absence of constraints. Moreover, that section also provides comparisons with methods that address similar problems in other research areas, namely concept subsumption in Description Logics [9,17,28,29,41] and theorem proving in Automated Reasoning [3,7,8].

2. Base concepts

This section sets the formal background required for the technical development of the rest of this paper. In particular, we recall some basic concepts and notation of logic programming, databases, queries and query containment. The reader is referred to [1,49] for a more detailed discussion of the notation and theory of those topics.

Throughout the paper, a, b, c, a1, b1, ... are constants. The symbols X, Y, Z, X1, Y1, ... denote variables. ā, b̄, c̄, ā1, b̄1 denote sets of constants. X̄, Ȳ, Z̄, X̄1, ... denote sets of variables. p, q, r, p1, q1, ... are predicate symbols. A term is either a variable or a constant. If p is an n-ary predicate and T1, ..., Tn are terms, then p(T1, ..., Tn) is an atom, which can also be written as p(T̄) when n is known from the context. An atom is ground if every Ti is a constant. An ordinary literal is defined as either an atom or a negated atom, i.e. ¬p(T̄). A built-in literal has the form A1 θ A2, where A1 and A2 are terms. The operator θ is either <, ≤, >, ≥, = or ≠.

Throughout this paper, a normal clause has the form

A ← L1 ∧ ... ∧ Lm with m ≥ 0

where A is an atom and each Li is a literal, either ordinary or built-in. All the variables occurring in A, as well as in each Li, are assumed to be universally quantified over the whole formula. A is often called the head and L1 ∧ ... ∧ Lm is the body of the clause.

A set of normal clauses is called a normal program. The definition of a predicate symbol r in anormal program P is the set of all clauses in P that have r in their head.

A normal program P is stratified if there is a partition P = P1 ∪ ... ∪ Pn such that the following two conditions hold for i = 1, 2, ..., n:

1. If an atom r(T̄) occurs positively in the body of a clause in Pi then the definition of r is contained within Pj with j ≤ i.

2. If an atom r(T̄) occurs negatively (as ¬r(T̄)) in the body of a clause in Pi then the definition of r is contained within Pj with j < i.

Likewise, a normal program P is hierarchical if there is a partition P = P1 ∪ ... ∪ Pn such that the following condition holds for i = 1, 2, ..., n: if an atom r(T̄) occurs positively or negatively in the body of a clause in Pi then the definition of r is contained within Pj with j < i. Note that every hierarchical program is stratified as well.
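Both definitions can be checked mechanically by looking only at which predicates occur in which rule bodies. The sketch below is one standard way of doing so and is not taken from the paper: it raises the level of each derived predicate until the conditions above hold, and gives up once a level exceeds the number of derived predicates. The rule encoding (head predicate plus a list of body predicates with a negation flag) and the function name `level_mapping` are assumptions made for illustration.

```python
def level_mapping(rules, hierarchical=False):
    """Return {derived predicate: level}, or None if no valid partition exists.

    rules: list of (head_predicate, [(body_predicate, negated), ...]).
    Predicates that never occur in a head are treated as EDB predicates.
    With hierarchical=True, positive occurrences must also come from a
    strictly lower level.
    """
    derived = {head for head, _ in rules}
    level = {pred: 1 for pred in derived}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                if pred not in derived:
                    continue  # EDB predicates impose no constraint
                strict = negated or hierarchical
                required = level[pred] + (1 if strict else 0)
                if level[head] < required:
                    level[head] = required
                    changed = True
                    if level[head] > len(derived):
                        return None  # a cycle through a strict dependency
    return level


# Rules for boss, chief and the two query predicates of the Boss/Chief example.
example = [
    ("boss", [("worksFor", False)]),
    ("chief", [("worksFor", False), ("boss", False)]),
    ("sub1", [("emp", False), ("chief", True)]),
    ("sub2", [("emp", False), ("boss", True)]),
]
# A recursive definition: stratified, but not hierarchical.
recursive = [
    ("anc", [("par", False)]),
    ("anc", [("par", False), ("anc", False)]),
]
# Negation through recursion: not even stratified.
unstratified = [("p", [("q", True)]), ("q", [("p", True)])]

print(level_mapping(example))
print(level_mapping(recursive), level_mapping(recursive, hierarchical=True))
print(level_mapping(unstratified))
```

The returned dictionary can be read as a level mapping in the sense defined next: predicates with level k have their definitions in Pk.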

A level mapping of a normal program P is a mapping from its set of derived predicates to the non-negative integers. The value of a derived predicate r under this mapping is the level of r. If there is a partition P = P1 ∪ ... ∪ Pn such that r is defined within Pk with 1 ≤ k ≤ n, then the level of r is k.

Terms, literals and the syntactic structures made of them, such as rule bodies, whole rules or facts, are expressions. If E is an expression, then constants(E) and variables(E) are the sets containing the constants and variables, respectively, occurring in E.


A substitution θ is a set of the form {X1/T1, ..., Xn/Tn}, where each variable Xi is unique and each term Ti is different from Xi. The term Ti is called a binding for Xi. θ is called a ground substitution if each Ti is a ground term, that is, a constant.

Likewise, let E be an expression and θ = {X1/T1, ..., Xn/Tn} a substitution. Then Eθ, the instance of E by θ, is the expression obtained from E by simultaneously replacing each occurrence of the variable Xi in E by the term Ti.
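As a small illustration of these two definitions, the sketch below applies a substitution to an atom. The representation is an assumption made for the example, not the paper's: an atom is a pair (predicate, tuple of terms), a variable is a string starting with an uppercase letter, any other value is a constant, and a substitution is a dictionary from variables to terms.

```python
def is_variable(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def instantiate(atom, theta):
    """Return the instance of `atom` by the substitution `theta`.

    Each term is looked up once, so all replacements happen simultaneously.
    """
    pred, terms = atom
    return (pred, tuple(theta.get(t, t) if is_variable(t) else t for t in terms))


def is_ground_substitution(theta):
    """True if every binding of `theta` is a constant."""
    return all(not is_variable(term) for term in theta.values())


# worksFor(Z, 0) instantiated by {Z/1} gives worksFor(1, 0).
print(instantiate(("worksFor", ("Z", 0)), {"Z": 1}))
# Simultaneous replacement: p(X, Y) with {X/Y, Y/a} gives p(Y, a), not p(a, a).
print(instantiate(("p", ("X", "Y")), {"X": "Y", "Y": "a"}))
```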

A fact is a normal clause of the form p(ā) ←, where p(ā) is a ground atom. Less formally, p(ā) may also denote the fact p(ā) ←.

A deductive rule is a normal clause of the form p(T̄) ← L1 ∧ ... ∧ Lm with m ≥ 1, where p is the derived predicate defined by the deductive rule.

A condition is a formula of the (denial) form ← L1 ∧ ... ∧ Lm with m ≥ 1.

A database schema S is a tuple (DR, IC) where DR is a finite set of deductive rules and IC is a finite set of conditions. Literals occurring in the body of deductive rules and conditions in S are either ordinary or built-in. The predicate symbols in ordinary literals range over the extensional database (EDB) predicates, which are the relations that will be stored directly in the database, and the intensional database (IDB) predicates, which are the relations defined by the deductive rules in DR. EDB predicates cannot be derived. Conditions in IC define the integrity constraints of the schema S.

Deductive rules as well as conditions are required to be safe, that is, every variable occurring in the head or in negated or built-in atoms of their body must also occur in an ordinary positive literal of the same body.
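The safety requirement is purely syntactic and can be sketched in a few lines. The encoding below is hypothetical: a body is a list of (kind, terms) pairs with kind in {"pos", "neg", "builtin"}, variables are strings starting with an uppercase letter, and a condition is checked by passing an empty head.

```python
def variables(terms):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return {t for t in terms if isinstance(t, str) and t[:1].isupper()}


def is_safe(head_terms, body):
    """Check the safety requirement for a deductive rule or a condition.

    body: list of (kind, terms) with kind in {"pos", "neg", "builtin"}.
    For a condition, pass an empty tuple as head_terms.
    """
    covered = set()
    for kind, terms in body:
        if kind == "pos":
            covered |= variables(terms)
    must_be_covered = variables(head_terms)
    for kind, terms in body:
        if kind in ("neg", "builtin"):
            must_be_covered |= variables(terms)
    return must_be_covered <= covered


# sub(X) <- emp(X), not chief(X): X occurs in the positive literal emp(X).
print(is_safe(("X",), [("pos", ("X",)), ("neg", ("X",))]))
# sub(X) <- not chief(X): X occurs in no positive literal.
print(is_safe(("X",), [("neg", ("X",))]))
```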

For a database schema S = (DR, IC), a database state, database instance, or just database, D is a tuple (E,S) where E is an EDB, that is, a set of ground facts about EDB predicates. DR(E) denotes the whole set of ground facts about EDB and IDB predicates that are inferred from a database state D = (E,S). DR(E) corresponds to the fixpoint model of DR ∪ E. Moreover, D can also refer to DR(E) itself.

A database D violates (does not satisfy) a condition ← L1 ∧ ... ∧ Ln if there exists a ground substitution θ such that D ⊨ (L1 ∧ ... ∧ Ln)θ, in other words, when {L1θ, ..., Lnθ} ⊆ DR(E). A database D is consistent, or sound, when it violates no condition in IC.

A query Q is a finite set of deductive rules that define the same n-ary query predicate q. Without loss of generality, predicate symbols other than q occurring in ordinary literals in the bodies of deductive rules in Q are either EDB or IDB predicate symbols. A conjunctive query is a query with no IDB predicates. A non-recursive datalog query is a query that does not contain recursively-defined IDB predicates, that is, when the query and the database involved are hierarchical. A datalog query is a query that may involve recursive IDB predicates. A plain datalog query is a datalog query with neither negation nor built-in predicates.

Let S = (DR, IC) be a database schema and D = (E,S) a database. The answer to a query Q on D, written as AQ(D), is the set of all ground facts about q obtained as a result of evaluating the deductive rules from both Q and DR on E: AQ(D) = {q(ā) | q(ā) ∈ (Q ∪ DR)(E)}. If the deductive rules in Q contain no IDB predicates then AQ(D) = {q(ā) | q(ā) ∈ Q(E)}.

A query Q1 is contained in another query Q2 when the set of ground facts answering Q1 is a subset of the set of ground facts answering Q2, regardless of the current content of D. More formally, let Q1 and Q2 be two queries defining the same n-ary query predicate q. Q1 is contained in Q2, written Q1 ⊑ Q2, if AQ1(D) ⊆ AQ2(D) for any D.


When considering integrity constraints, the containment relationship between two queries need not hold for every database but only for the consistent ones. This idea is captured by the notion of query containment under constraints. Let Q1 and Q2 be two queries defining the same n-ary query predicate q and IC be a finite set of conditions defining the schema's integrity constraints. Q1 is contained in Q2 wrt. IC, written Q1 ⊑IC Q2, if AQ1(D) ⊆ AQ2(D) for any consistent D.

3. Introduction to the Constructive Query Containment method

The containment relationship between two queries must hold for the whole set of possible databases in the general case, or for those that satisfy the integrity constraints when these constraints must be enforced. A suitable way of checking query containment is to check the lack of containment, that is, to find just one database D where the containment relationship being checked does not hold:

• Q1 is not contained in Q2, written Q1 ⋢ Q2, if there is at least one database D such that AQ1(D) ⊈ AQ2(D).

• Q1 is not contained in Q2 wrt. IC, written Q1 ⋢IC Q2, if there is at least one database D satisfying IC such that AQ1(D) ⊈ AQ2(D).

Given two queries Q1 and Q2 defining the same n-ary query predicate q, the Constructive Query Containment method (CQC method for short) aims to construct the extensional part (EDB) of a database for which the containment relationship does not hold.

The CQC-method requires two main inputs. The first one is the definition of the goal to attain, which must be achieved on the database that the method will attempt to obtain by constructing its EDB. This goal is defined as G0 = q1(X1, ..., Xn) ∧ ¬q2(X1, ..., Xn), meaning that it is required to construct a database where (X1, ..., Xn) can be instantiated in such a way that q1(X1, ..., Xn) is true and q2(X1, ..., Xn) is false. Notice that the predicate symbols q1 and q2 must replace q in G0 as well as in the head of the deductive rules in Q1 and Q2, respectively, in order to properly distinguish their sources.

The second main input for the CQC method is the set of conditions to enforce, which must not be violated by the constructed database. This set is defined as F0 = IC to prove that Q1 ⋢IC Q2, where IC is the set of conditions that defines the schema's integrity constraints. Otherwise, when proving Q1 ⋢ Q2, the initial set of conditions to enforce is empty.

Positive literals in the goal to attain define the information that must be made true by the database to be obtained. Since some of the predicates of these literals, like q1, are defined in terms of other EDB/IDB predicates, they will be "unfolded" using the deductive rules that define those predicates, in an attempt to "reduce" the initial goal in terms of the underlying EDB facts that can make it true. These EDB facts will be the ones to be included in the EDB to be constructed.

Moreover, the initial goal and its unfolded derivatives may also contain negative literals. ¬q2(X1, ..., Xn), for instance, will always be one of such literals but other negative literals may also appear during the unfolding process. A negative literal ¬Li corresponds to a new condition ← Li that must be enforced to guarantee that the initial goal G0 remains satisfied. For instance, in the particular case of ¬q2(X1, ..., Xn), it must be guaranteed that the EDB facts required for making q1(X1, ..., Xn) true also satisfy the condition ← q2(X1, ..., Xn), that is, they do not make q2(X1, ..., Xn) true. In addition, condition enforcement may produce "collateral effects" in terms of new subgoals that must also be attained by the resultant database.

Therefore, the work performed by the CQC method can be seen as an interleaving of two activities:

1. including EDB facts in the EDB under construction to attain the initial goal as well as the ones generated in 2;

2. enforcing that the initial set of conditions as well as the ones "triggered" during 1 are not violated by the constructed EDB.

The rest of this section is devoted to showing, by means of several examples, how the CQC-method operates in different scenarios. In Section 3.1, the first example is used to explain step by step a successful application of the CQC-method to prove a non-containment case, from the initial problem statement to the final resolution. Section 3.2 introduces the concept of Variable Instantiation Patterns (VIPs), which is a basic feature of the method. Section 3.3 presents an example that shows clearly that query containment under constraints can be checked by the CQC method by simply stating that the initial set of conditions to enforce, which is always empty when checking "general" containment, is the set of integrity constraints defined for the database. Section 3.4 shows how the CQC method proves either non-containment or containment by performing a single derivation when the Simple VIP is applicable. Finally, in Section 3.5, a new example illustrates the application of two alternative VIPs, Dense Order VIP and Discrete Order VIP, when order comparisons occur in deductive rules defining queries and IDB relations.

3.1. Example of Q1 ⋢ Q2

Recall the example presented in the introduction, but now using the logical notation defined in Section 2. We had two queries, Q1 and Q2, defined as follows:

Q1 = {sub(X) ← emp(X) ∧ ¬chief(X)}

Q2 = {sub(X) ← emp(X) ∧ ¬boss(X)}

where emp is an EDB predicate and chief and boss are IDB predicates defined by a set DR of deductive rules

DR = {boss(X) ← worksFor(Z, X),
      chief(X) ← worksFor(Y, X) ∧ boss(Y)}

where worksFor is another EDB predicate.

Q1 is less restrictive than Q2 because Q2 does not allow anyone to work for X, while Q1 only applies this restriction to the ones that are boss. Consider, for instance, a database containing just emp(joan) and worksFor(ann, joan), and let sub1 and sub2 denote the query predicate sub of Q1 and Q2, respectively. In this database, sub1(joan) is true because chief(joan) is false, since boss(ann) is false, whereas sub2(joan) is false because boss(joan) is true. Therefore, Q1 is not contained in Q2. Conversely, since Q1 is less restrictive than Q2, all the answers to Q2 will also be answers to Q1. Therefore, Q2 is contained in Q1.
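The claim about this database can be checked by evaluating the rules directly, and a brute-force search over very small EDBs gives a feel for what "finding one database where containment fails" means. The sketch below is not the CQC method: it hard-codes the rules for boss, chief and the two queries, and simply enumerates every EDB over a given finite set of constants. Finding no counterexample over such a bounded domain is only evidence, not a proof, that containment holds. All function names are hypothetical.

```python
from itertools import chain, combinations, product


def answers(emp, works_for):
    """Evaluate Q1 and Q2 on an EDB; works_for holds (employee, superior) pairs."""
    boss = {sup for (_, sup) in works_for}                # boss(X) <- worksFor(Z, X)
    chief = {sup for (e, sup) in works_for if e in boss}  # chief(X) <- worksFor(Y, X), boss(Y)
    q1 = {x for x in emp if x not in chief}               # sub1(X) <- emp(X), not chief(X)
    q2 = {x for x in emp if x not in boss}                # sub2(X) <- emp(X), not boss(X)
    return q1, q2


def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def databases(domain):
    """Enumerate every EDB (emp, worksFor) over the given constants."""
    for emp in powerset(domain):
        for works_for in powerset(product(domain, repeat=2)):
            yield set(emp), set(works_for)


def refuting(domain, difference):
    """Return the EDBs on which `difference(q1, q2)` is non-empty."""
    return [(emp, wf) for emp, wf in databases(domain) if difference(*answers(emp, wf))]


# The database of the text: joan answers Q1 but not Q2.
print(answers({"joan"}, {("ann", "joan")}))

# Bounded search over the constants 0 and 1.
against_q1_in_q2 = refuting([0, 1], lambda q1, q2: q1 - q2)
against_q2_in_q1 = refuting([0, 1], lambda q1, q2: q2 - q1)
print(against_q1_in_q2[0], against_q2_in_q1)
```

The enumeration grows exponentially with the number of constants and predicates, which is one reason why a goal-directed construction of the counterexample, as described next, is preferable.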


The 14 steps of a CQC-derivation that constructs an EDB that proves Q1 ⋢ Q2 are shown in Fig. 3.1. Each row in the figure corresponds to a CQC-node that contains the following information (columns):

1. The goal to attain. The literals that must be made true by the EDB under construction. When the goal is [ ] it means that no literal needs to be satisfied. The initial CQC-node contains the goal G0 = sub1(X) ∧ ¬sub2(X). That is, we want the CQC method to construct a database where there exists at least one constant k such that both sub1(k) and ¬sub2(k) are true.

2. The conditions to be enforced. The set of conditions that the constructed EDB is required to satisfy. Recall that a condition is violated whenever all of its literals are evaluated as true. Here, the initial CQC-node contains the set of conditions to enforce F0 = ∅.

3. The EDB under construction. Initial CQC-nodes always have empty EDBs.

4. The conditions to be maintained. A set containing those conditions that are known to be satisfied in the current CQC-node and that must remain satisfied until the end of the CQC-derivation. Initial CQC-nodes always have this set empty.

5. The account of constants introduced in the current and/or the ancestor CQC-nodes to instantiate the EDB facts in the EDB under construction. Initially, such a set always contains the constants already appearing in DR ∪ Q1 ∪ Q2 ∪ G0 ∪ F0.

Fig. 3.1. A successful CQC-derivation that proves Q1 ⋢ Q2.
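The five components of a CQC-node map naturally onto a small record type. The following sketch only illustrates that structure; the class name, the field names and the literal encoding (predicate, terms, positive flag) are assumptions made here and do not come from the paper. The success test reflects the description of a successful derivation: an empty goal and no conditions left to enforce.

```python
from dataclasses import dataclass, field


@dataclass
class CQCNode:
    goal: list                                        # 1. literals still to be made true
    to_enforce: list                                  # 2. conditions still to be checked
    edb: set = field(default_factory=set)             # 3. ground EDB facts built so far
    to_maintain: list = field(default_factory=list)   # 4. conditions known to hold
    constants: set = field(default_factory=set)       # 5. constants used so far

    def is_success(self):
        """A derivation succeeds at a node with empty goal and nothing to enforce."""
        return not self.goal and not self.to_enforce


# Initial node for G0 = sub1(X) and not sub2(X), with F0 empty.
# Literals are encoded as (predicate, terms, positive).
initial = CQCNode(
    goal=[("sub1", ("X",), True), ("sub2", ("X",), False)],
    to_enforce=[],
)

# A node with the EDB {emp(0), worksFor(1,0)} and nothing left to do.
final = CQCNode(
    goal=[],
    to_enforce=[],
    edb={("emp", (0,)), ("worksFor", (1, 0))},
    constants={0, 1},
)

print(initial.is_success(), final.is_success())
```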

The transition between two consecutive CQC-nodes, i.e. between an ancestor node and its successor, is a CQC-step that is performed by applying a CQC-expansion rule on a selected literal of the ancestor CQC-node. There are two types of CQC-expansion rules, depending on the source of the selected literal. If the selected literal comes from the goal part of the CQC-node, then an A#-rule is applied. Otherwise, if the source of the selected literal is one of the conditions, then the CQC-expansion rule to apply is a B#-rule. Depending on the specific characteristics of the selected literal (if it is EDB, derived or built-in; if it is negative or positive; if it is ground or not) a concrete rule of the corresponding type is applied. The CQC-method defines four distinct A#-rules and five distinct B#-rules, which are formalized in Section 4. In Fig. 3.1, the CQC-steps are labeled with the name of the CQC-expansion rule that is applied.

The selection of literals in the CQC-derivation of Fig. 3.1 is arbitrary. The selected literal in each step is underlined. In general, the only necessary criterion for selecting a literal is to avoid choosing a non-ground literal that is negative or built-in. If the queries and databases are safe, negative and built-in literals will always become fully grounded at some stage of the derivation after first processing some positive literals.

The first step unfolds the selected literal, the IDB atom sub1(X) from the goal part, by substituting it with the body of its defining rule. At the second step, the selected literal from the goal part is emp(X), which is a positive EDB literal. To get a successful derivation, i.e. to obtain an EDB satisfying the initial goal, emp(X) must be true on the constructed EDB. Hence, the method instantiates X with a constant and includes the new ground EDB fact in the EDB under construction. The procedure assigns an arbitrary constant to X, e.g. 0. So emp(0) is the first fact included in the EDB under construction.

¬chief(0) is the selected literal in step 3. To get success for the derivation, chief(0) must not be true on the EDB. This is guaranteed by adding ← chief(0) as a new condition to be enforced. Step 4 is similar to step 3, yielding ← sub2(0) as another condition to be enforced. After performing this latter step, we get a CQC-node whose goal is [ ]. However, the work is not done yet, since we must ensure that the two conditions ← sub2(0) and ← chief(0) are not violated by the current EDB. In other words, we must make both chief(0) and sub2(0) false.

Step 5 unfolds the selected literal chief(0) from one of the two conditions, getting ← worksFor(Y, 0) ∧ boss(Y) as a new condition that replaces ← chief(0). At least one of the two literals of this condition must be false. In step 6, the selected literal is the positive EDB literal worksFor(Y, 0). Since it matches no EDB atom in the EDB under construction, worksFor(Y, 0) is false and, consequently, the whole condition ← worksFor(Y, 0) ∧ boss(Y) is not violated by the current EDB. For this reason, such a condition is moved from the set of conditions to enforce to the set of conditions to maintain.


Step 7 unfolds the selected IDB atom sub2(0) from the remaining condition to enforce. The EDB atom emp(0) is the selected literal in step 8. Since emp(0) is also present in the EDB under construction, it cannot be false. So this literal is dropped from the condition because it does not help to enforce the condition. In step 9 the selected literal is the negative literal ¬boss(0). Since it is the only literal of the condition, it must necessarily be made false. So boss(0) becomes a new (sub)goal to achieve and is thus transferred to the goal part.

Step 10 unfolds the selected literal boss(0) from the goal part as in step 1. worksFor(Z, 0) is the selected literal in step 11. As in step 2, the method should instantiate Z with a constant. In this case, the chosen constant is 1, so worksFor(1,0) is added to the EDB under construction. Moreover, the condition ← worksFor(Y, 0) ∧ boss(Y) is moved back to the set of conditions to enforce to prevent the new inclusion of worksFor(1,0) from violating it.

In step 12, the selected literal is the positive EDB literal worksFor(Y, 0) from the remaining condition to enforce. Now, it matches the current contents of the EDB with Y = 1. As in step 8, such a literal is dropped from the condition, leaving ← boss(1) to be enforced. However, the whole condition ← worksFor(Y, 0) ∧ boss(Y) is moved again to the set of conditions to maintain in order to prevent further inclusions of new facts about worksFor from violating it.
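Steps 6 and 12 both hinge on whether a positive EDB literal of a condition matches some fact of the EDB under construction. The sketch below illustrates only that matching test, not the B#-rules themselves, which are formalized in Section 4. The encoding is assumed for the example: atoms and facts are (predicate, terms) pairs, and variables are strings starting with an uppercase letter.

```python
def is_variable(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def matches(atom, edb):
    """Return every substitution that maps `atom` onto a fact of `edb`."""
    pred, terms = atom
    found = []
    for fact_pred, fact_terms in edb:
        if fact_pred != pred or len(fact_terms) != len(terms):
            continue
        theta = {}
        for term, value in zip(terms, fact_terms):
            if is_variable(term):
                if theta.setdefault(term, value) != value:
                    break  # same variable bound to two different constants
            elif term != value:
                break      # constant mismatch
        else:
            found.append(theta)
    return found


edb_step_6 = {("emp", (0,))}
edb_step_12 = {("emp", (0,)), ("worksFor", (1, 0))}

# Step 6: worksFor(Y, 0) matches nothing, so the condition cannot be violated yet.
print(matches(("worksFor", ("Y", 0)), edb_step_6))
# Step 12: after adding worksFor(1, 0), the same literal matches with Y = 1.
print(matches(("worksFor", ("Y", 0)), edb_step_12))
```

An empty result means the literal is false on the current EDB, so the condition it belongs to holds for now; a non-empty result yields the bindings under which the remaining literals of the condition still have to be checked.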

Step 13 unfolds the selected literal boss(1). In step 14, the selected literal is the only literal of the remaining condition: worksFor(Z, 1). This EDB atom does not match the content of the current EDB = {emp(0), worksFor(1,0)}. This means that worksFor(Z, 1) is false and, thus, the condition ← chief(0), from which it derives, is not violated.

The CQC-derivation ends successfully since it reaches a CQC-node where the goal to attain is [ ] and the set of conditions to satisfy is empty. In other words, the resulting EDB, {emp(0), worksFor(1,0)}, contains a set of facts that makes the database satisfy the goals and conditions of all preceding CQC-nodes, including, naturally, the first CQC-node. Then the conclusion is Q1 ⋢ Q2.

3.2. Variable instantiation patterns

Before introducing more examples, it is necessary to review more accurately how the CQC-method assigns constants to variables in order to obtain ground EDB facts to be included in the EDB under construction. Regarding the CQC-derivation depicted in Fig. 3.1, the question is which criterion determined the choice of the constants used to instantiate variables in steps 2 and 11.

In step 2, the selected literal was the non-ground EDB atom emp(X). The constant 0 was assigned to X to get a ground fact to be included in the EDB. The choice of this constant is completely arbitrary. This is possible because (1) X is the first variable instantiated in the derivation and (2) no constant appears in the initial goal G0 or in the set of deductive rules to consider. Moreover, no built-in predicate such as < or ≤ is present in G0 or in the database's deductive rules, so the constant assigned to X need not belong to a domain with a defined order relation. However, for the sake of simplicity, a value from the natural number domain is selected, and 0 was arbitrarily taken as the first.

A second variable instantiation is required in step 11 of the CQC-derivation in Fig. 3.1. The selected literal was worksFor(Z, 0) and the constant 1 was assigned to Z. In this case, there were two alternatives to consider: either “reuse” 0 and make Z = 0, or select any other constant, say 1. As shown in Fig. 3.1, the CQC-derivation opted for the second alternative. Fig. 3.2 shows how the derivation would have continued if 0 had been selected instead of 1 in step 11.

Steps 12–13 in this alternative CQC-derivation are performed like steps 12–13 in the derivation from Fig. 3.1. In contrast, there is no step 14 because no CQC-expansion rule is applicable to the CQC-node obtained after performing step 13. worksFor(Z, 0) is the only literal eligible by the method at this stage. B2 is the candidate CQC-expansion rule to be applied to this EDB atom, as it was in step 14 of the previous CQC-derivation. Recall that the B2-rule tries to match the selected EDB atom with the current content of the EDB in order to determine whether the atom is already true or false. In this particular case, worksFor(Z, 0) can clearly be unified with worksFor(0,0) in the EDB. However, the B2-rule is never applied when the selected literal unifies with the EDB and there is no other literal in the condition that could be made false. The derivation therefore ends unsuccessfully, since there remains a condition, worksFor(Z, 0), that obviously cannot be satisfied. Thus, the selection of 0 instead of 1 in step 11 leads to an EDB, {emp(0), worksFor(0,0)}, where both queries, Q1 and Q2, obtain the same empty answer.

When a CQC-derivation terminates successfully, there is a proof, the constructed database, that the containment relationship does not hold. In contrast, when a derivation ends unsuccessfully, that is, when it fails, the conclusion that containment holds cannot be established on the basis of that single result. The question is then how many derivations must be considered before reaching a reliable conclusion. Rather than counting all possible derivations, which may also depend on the order in which literals are selected, the real point is to identify how many variable instantiation alternatives must be considered when adding new facts to the EDB under construction, since the set of alternative constant assignments used to instantiate the EDB facts is what determines the set of different EDBs that can be constructed.

The aim of the CQC method is to test only those variable instantiations that are relevant, without losing completeness. The “strategy” for instantiating the EDB facts to be included in the EDB under construction is inspired by the concept of canonical databases found in [31,36,44,50]. This concept is based on the idea that it is not necessary to check the whole (infinite) set of possible EDBs to prove containment but only a (finite) subset of them, the set of canonical EDBs. In this way, if it is proved that a containment relationship holds on every canonical EDB, then query containment holds for any EDB. The soundness of this approach is guaranteed by proving that any possible EDB is represented by one canonical EDB and that this correspondence preserves the containment relationship.

Fig. 3.2. A failed CQC-derivation with an alternative variable instantiation.

Returning to the example, other constants, say 2, 5, or 1000, could be assigned to Z instead of 1 when instantiating worksFor(Z, 0) in step 11 of the CQC-derivation shown in Fig. 3.1. However, they would all lead to an equivalent EDB. The real options in this step are either to select the previously introduced constant 0 or to use a new one, regardless of its concrete value.

The approach based on canonical databases was initially applied to check query containment between conjunctive queries containing only EDB and/or built-in atoms [31,44,50]. In these cases, the whole set of canonical databases is bounded a priori and is easily generated before performing the containment tests. The containment tests performed with the CQC method, in turn, end as soon as a successful CQC-derivation leading to a canonical counterexample is found. Only in the worst case, when no counterexample exists, does the complete tree of failed CQC-derivations have to test every canonical database. Even in this case, each canonical database need not always be completely recreated before it is discarded since, for instance, it may be detected early that some condition is violated with no possible repair.

Such a “dynamic-canonical” approach is more suitable for dealing with negated derived atoms, since in these cases it is more difficult or even impossible to determine a priori the whole set of relevant EDBs to be tested. In this way, the CQC method can be considered an extension of the canonical-database approach beyond the class of conjunctive queries.

Since the canonical databases to be taken into account depend on the concrete subclass of queries considered, four different variable instantiation patterns (VIPs for short) are defined. Each of them determines how EDB facts can be instantiated in order to be added to the EDB under construction. The following four VIPs are formalized in Section 4.4: Simple VIP, Negation VIP, Dense Order VIP and Discrete Order VIP.

The Simple VIP is applicable when checking query containment in the absence of integrity constraints. Moreover, the deductive rules defining query predicates as well as IDB predicates must satisfy the following conditions: they must not have any negative or built-in literal¹ in their rule bodies; they must not have constants in their heads; and they must not have any variable appearing more than once in their heads. According to the Simple VIP, each distinct variable is bound to a distinct new constant.

The Negation VIP is applicable when checking containment in the presence of integrity constraints and/or negated IDB subgoals, negated EDB subgoals and/or (in)equality comparisons (=, ≠). In any case, order comparisons (<, ≤, >, ≥) are not allowed. The EDBs generated and tested with this VIP correspond to the canonical EDBs considered in [36,50] for the conjunctive query case with negated EDB subgoals. The intuition behind this VIP is clear: each variable appearing in an EDB fact to be included in the EDB under construction is instantiated with either some constant previously used or a constant never used before. This is the pattern used in the CQC-derivations shown in Figs. 3.1 and 3.2.

¹ However, note that if a deductive rule has a literal of the form Z = k or Z = X in its body such that Z does not appear in the head, then that literal can be omitted by replacing each occurrence of Z in the rule body by k or X, respectively.


The other two VIPs, Dense Order VIP and Discrete Order VIP, are applied when there are order comparisons (<, ≤, >, ≥) in the deductive rules, with or without negation. In this case, each distinct variable must be bound to a constant according to either a former or a new location in the total linear order of constants introduced previously [31,36,40,50]. The choice between the Dense Order VIP and the Discrete Order VIP depends on whether the comparisons are interpreted on a dense order domain or on a discrete order domain.

Section 3.5 provides a concrete example that illustrates how the choice between the dense order interpretation and the discrete one may determine the final result of a containment test.
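The applicability conditions of the four VIPs can be summarized as a small decision procedure. The following is only a sketch of the informal description given above, not the formalization of Section 4.4: the record `Features` and the function `select_vip` are hypothetical names, and falling back to the Negation VIP when a head contains constants or repeated variables is an assumption (such heads behave like equality comparisons).

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical summary of the syntactic features of R = DR ∪ Q1 ∪ Q2 [∪ IC]."""
    has_integrity_constraints: bool = False
    has_negation: bool = False
    has_equality_comparisons: bool = False   # = or ≠
    has_order_comparisons: bool = False      # <, ≤, >, ≥
    head_has_constants: bool = False
    head_repeats_variable: bool = False
    dense_domain: bool = True                # interpretation chosen for order comparisons
    discrete_constants: int = 0              # number of discrete constants in R


def select_vip(f: Features) -> str:
    if f.has_order_comparisons:
        # The Dense Order VIP also suits a discrete domain with at most one constant in R.
        if f.dense_domain or f.discrete_constants <= 1:
            return "Dense Order VIP"
        return "Discrete Order VIP"
    simple_ok = not (f.has_integrity_constraints or f.has_negation
                     or f.has_equality_comparisons
                     or f.head_has_constants or f.head_repeats_variable)
    # Assumption: anything that is not "simple" and has no order comparison
    # is handled by the Negation VIP.
    return "Simple VIP" if simple_ok else "Negation VIP"
```

For the example of Section 3.1, which uses negated subgoals and no order comparisons, `select_vip(Features(has_negation=True))` returns the Negation VIP, in agreement with Figs. 3.1 and 3.2.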

3.3. Example with integrity constraints

This section shows how the CQC method tackles the problem of checking query containment under constraints in a uniform way, without adding any extra feature to the way the method works when integrity constraints are not considered. Indeed, the CQC method is the same in both cases; the only difference lies in whether the initial set of constraints to enforce is empty or contains the set of integrity constraints defined for the database.

Consider again the same database schema and queries used in Section 3.1. Now two integrity constraints are defined to ensure that people who work for someone, or who have someone working for them, are employees:

IC = { ← worksFor(X, Y) ∧ ¬emp(Y)   (Ic1)
       ← worksFor(X, Y) ∧ ¬emp(X)   (Ic2) }

Notice that the EDB constructed by the CQC-derivation that proved Q1 ⋢ Q2 in Section 3.1, {emp(0), worksFor(1,0)}, does not satisfy the integrity constraints that have just been introduced. In particular, the EDB violates Ic2 because emp(1) is not true. In this particular case, the violation can be repaired without violating Ic1 by just including emp(1) in the EDB. The resulting EDB is {emp(0), worksFor(1,0), emp(1)}, which proves that Q1 ⋢ Q2 can hold without violating any integrity constraint. In other words, Q1 ⋢_IC Q2.
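The violation and its repair can be checked mechanically. The sketch below hard-codes Ic1 and Ic2 as Python tests over a set of ground facts; the function name `violations` and the encoding of facts as `(predicate, args)` pairs are choices made for this illustration only.

```python
def violations(edb):
    """Return the (constraint, witness) pairs violated by the EDB."""
    emp = {args[0] for pred, args in edb if pred == "emp"}
    found = []
    for pred, args in sorted(edb):
        if pred != "worksFor":
            continue
        x, y = args
        if y not in emp:      # Ic1: <- worksFor(X, Y) ∧ ¬emp(Y)
            found.append(("Ic1", (x, y)))
        if x not in emp:      # Ic2: <- worksFor(X, Y) ∧ ¬emp(X)
            found.append(("Ic2", (x, y)))
    return found


edb = {("emp", (0,)), ("worksFor", (1, 0))}
before = violations(edb)              # Ic2 is violated because emp(1) is missing
repaired = edb | {("emp", (1,))}
after = violations(repaired)          # no violation remains
```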

Obviously, goal satisfaction and constraint enforcement are not always compatible and, thus, in many cases no repair will be able to satisfy both simultaneously.

A CQC-derivation that constructs an EDB proving Q1 ⋢_IC Q2 is partially shown in Fig. 3.3. Steps 1–14 of this CQC-derivation are the same as the corresponding steps of the CQC-derivation in Section 3.1, shown in Fig. 3.1. After step 14, the goal sub1(X) ∧ ¬sub2(X) has been attained with the EDB = {emp(0), worksFor(1,0)}, but Ic1 and Ic2 still await enforcement. In addition, the set of conditions to maintain after performing step 14 is C13 = {← worksFor(Y, 0) ∧ ¬boss(Y), ← worksFor(Z, 1)}.

In step 15, the selected literal is worksFor(X,Y) from condition Ic1. Since worksFor(X,Y) matches the current content of the EDB, {emp(0), worksFor(1,0)}, the derivation looks for another literal to be made false. Moreover, the condition worksFor(X,Y) ∧ ¬emp(Y) is added to the set of conditions to maintain, so that the condition is enforced again if new facts about worksFor are later included in the EDB.


In step 16, the selected literal is ¬emp(0). In order to enforce Ic1, emp(0) must be made true. This is achieved by making emp(0) a new (sub)goal to attain. Since emp(0) already belongs to the EDB, emp(0) is true (step 17) and, consequently, Ic1 is not violated.

In step 18, the selected literal is worksFor(X,Y) from Ic2, which unifies with worksFor(1,0) in the EDB. Since further fact inclusions in the EDB may also unify with worksFor(X,Y), the whole condition Ic2 is recorded in the set of conditions to maintain.

In step 19, the selected literal is ¬emp(1). In order to ensure that Ic2 is satisfied, emp(1) must be made true by making emp(1) a new (sub)goal to attain. Step 20 adds emp(1) to the EDB. Therefore, emp(1) becomes true and, consequently, Ic2 is enforced.

Since both Ic1 and Ic2 are satisfied, the CQC-derivation ends successfully with the resulting EDB = {emp(0), emp(1), worksFor(1,0)}.

3.4. Example with the simple variable instantiation pattern

The purpose of the following example is to show how the CQC method operates when the Simple Variable Instantiation Pattern is applicable. The difference with respect to the previous examples lies in the way in which EDB facts are instantiated. Let A and B be two (conjunctive) queries:

A = {p(X, Y) ← r(X, W) ∧ b(W, Z) ∧ r(Z, Y)}
B = {p(X, Y) ← r(X, W) ∧ b(W, W) ∧ r(W, Y)}

where r and b are the EDB relations stored in the database.

Fig. 3.3. A fragment of a successful CQC-derivation that proves Q1 ⋢_IC Q2.


A successful CQC-derivation that proves A ⋢ B is depicted in Fig. 3.4. The final EDB is obtained by assigning a distinct constant to each distinct variable occurring in the body of A, as prescribed by the Simple VIP. The resulting EDB facts are added to the EDB. It is straightforward to see that pA(0,3) is true whereas pB(0,3) is not on the constructed EDB.

The Simple VIP does not offer many alternatives when instantiating variables. Indeed, the only option is to assign a new constant each time a distinct variable must be instantiated. This explains why, when applying the Simple VIP, a single failed CQC-derivation suffices to prove that the containment relation to be refuted actually holds. This is the case of the CQC-derivation shown in Fig. 3.5, which proves B ⊑ A.

Containment of conjunctive queries, like A and B, is a well-known problem that can be solved with the method of Containment Mappings [13,49]. According to this method, to prove B ⊑ A it is necessary to show that there is a mapping, a homomorphism, from the variables of A to the variables of B. Such a mapping must turn the head of A into the head of B and each subgoal of A into some subgoal of B (it is not necessary that every subgoal of B be the target of some subgoal of A). In the particular case of A and B, there exists a containment mapping from A to B: XA → XB (variable X of A maps to variable X of B), YA → YB, WA → WB and ZA → WB. Consequently, B ⊑ A.
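A brute-force search for such a mapping is easy to sketch for small queries. The code below is an illustration, not the algorithm of [13,49]: it assumes that all terms are variables (as in A and B), encodes a query as a `(head_args, body)` pair, and uses the hypothetical function name `containment_mapping`.

```python
from itertools import product

# A query is (head_args, body); body is a list of (predicate, args) atoms.
A = (("X", "Y"), [("r", ("X", "W")), ("b", ("W", "Z")), ("r", ("Z", "Y"))])
B = (("X", "Y"), [("r", ("X", "W")), ("b", ("W", "W")), ("r", ("W", "Y"))])


def variables(query):
    """Distinct variables of a query, in order of first occurrence."""
    head, body = query
    ordered = []
    for term in list(head) + [a for _, args in body for a in args]:
        if term not in ordered:
            ordered.append(term)
    return ordered


def containment_mapping(src, dst):
    """Search a mapping from the variables of src to those of dst that sends the
    head of src to the head of dst and every subgoal of src to a subgoal of dst."""
    src_vars, dst_vars = variables(src), variables(dst)
    dst_body = set(dst[1])
    for image in product(dst_vars, repeat=len(src_vars)):
        m = dict(zip(src_vars, image))
        head_ok = tuple(m[v] for v in src[0]) == tuple(dst[0])
        body_ok = all((p, tuple(m[a] for a in args)) in dst_body
                      for p, args in src[1])
        if head_ok and body_ok:
            return m
    return None
```

`containment_mapping(A, B)` finds the mapping listed in the text, whereas `containment_mapping(B, A)` finds none, because b(W, W) has no image among the subgoals of A.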

The other “classical” approach, due to [44], for checking B ⊑ A is to construct a canonical EDB by “freezing” the EDB subgoals in the body of B, that is, by assigning a distinct constant to each distinct variable. The EDB D = {r(0,1), b(1,1), r(1,2)}, for instance, is a valid canonical EDB constructed from the body of B by using the following constant assignments: 0 → XB, 1 → WB and 2 → YB. The frozen head of B is pB(0,2), according to these constant assignments. Since pA(0,2) ∈ A(D) = {r(0,1), b(1,1), r(1,2), pA(0,2)}, the canonical database method concludes that B ⊑ A.

Fig. 3.4. A successful CQC-derivation using the Simple VIP.

Although the constructed canonical EDB D fails as a counterexample that would prove B ⋢ A, it becomes an indirect proof of the existence of a containment mapping from the variables of A to the variables of B. When A is evaluated on D, pA(0,2) is obtained as a result of performing the following variable instantiations: XA → 0, WA → 1, ZA → 1, and YA → 2. In this way, the constants of D relate the variables of A to those of B, so their mapping follows easily: XA → 0 → XB, WA → 1 → WB, ZA → 1 → WB, and YA → 2 → YB.

Finally, note that the CQC method, in conjunction with the Simple VIP, acts just as a variant of the canonical database method when it is used to check containment between conjunctive queries: A2-rule applications in the CQC-derivations construct the canonical EDB with the subgoals of the first query, whereas B2-rule applications perform the evaluation of the second query on the constructed EDB.
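The freezing test for conjunctive queries can be sketched in a few lines. This is a plain rendering of the canonical database idea under the Simple VIP conditions, not of the CQC-expansion rules themselves; `freeze`, `answers` and `contained` are hypothetical names, and the evaluation is a naive enumeration that is only meant for small examples such as A and B.

```python
from itertools import product

A = (("X", "Y"), [("r", ("X", "W")), ("b", ("W", "Z")), ("r", ("Z", "Y"))])
B = (("X", "Y"), [("r", ("X", "W")), ("b", ("W", "W")), ("r", ("W", "Y"))])


def body_variables(query):
    ordered = []
    for _, args in query[1]:
        for a in args:
            if a not in ordered:
                ordered.append(a)
    return ordered


def freeze(query):
    """Bind each distinct variable to a distinct new constant (0, 1, 2, ...)."""
    binding = {v: i for i, v in enumerate(body_variables(query))}
    edb = {(p, tuple(binding[a] for a in args)) for p, args in query[1]}
    frozen_head = tuple(binding[v] for v in query[0])
    return edb, frozen_head


def answers(query, edb):
    """Evaluate a conjunctive query by brute force over the constants of the EDB."""
    constants = sorted({c for _, args in edb for c in args})
    vs = body_variables(query)
    result = set()
    for values in product(constants, repeat=len(vs)):
        m = dict(zip(vs, values))
        if all((p, tuple(m[a] for a in args)) in edb for p, args in query[1]):
            result.add(tuple(m[v] for v in query[0]))
    return result


def contained(q1, q2):
    """q1 ⊑ q2 iff the frozen head of q1 is an answer of q2 on the frozen body of q1."""
    edb, head = freeze(q1)
    return head in answers(q2, edb)
```

Freezing A yields the EDB {r(0,1), b(1,2), r(2,3)} with frozen head (0,3), on which B has no answer, so `contained(A, B)` is false; freezing B yields the EDB D of the text, and `contained(B, A)` is true.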

3.5. Example with the dense order and discrete order variable instantiation patterns

The simple example used in this section is adapted from [36]. Let C and D be two queries

C = {q(X) ← p(X) ∧ X < 10}
D = {q(X) ← p(X) ∧ X < 20 ∧ X ≥ 5}

where p is an IDB predicate defined by a set DR of deductive rules:

DR = {p(X) ← e(X) ∧ X > 4}

where e is an EDB predicate. Suppose also that no integrity constraint is defined.

Fig. 3.5. A failed CQC-derivation with the Simple VIP. It suffices to prove B ⊑ A.


As usual, the way to prove C ⋢ D is to construct an EDB that satisfies the goal G0 = ← qC(X) ∧ ¬qD(X).

Clearly, the EDB satisfying G0 should contain a ground fact about e. Therefore, the problem here is to find the right constant to instantiate this fact. In this case, either the Dense Order VIP or the Discrete Order VIP is applicable. The EDBs generated and tested with these VIPs resemble the canonical EDBs considered in [31,36,40,50] for the conjunctive query case with order constraints. Each canonical EDB, called representative in [31], represents a different allowable arrangement of variable instantiations according to the total order relationship of the considered value domain.

The difference between a discrete order and a dense order shows up when the variable instantiation procedure attempts to introduce a new constant knew “between” two “old” constants ki and ki+1 such that ki < knew < ki+1. Under a dense order interpretation, this is always possible since there are infinitely many values between any pair of numbers. In contrast, only a finite number of new constants can be placed between two constants of a discrete domain.

For this reason, the CQC method considers two different VIPs according to the selected comparison interpretation. The Dense Order VIP is applied when order comparisons are interpreted on a dense order domain such as the rational or real numbers. Moreover, this VIP is also suitable on a discrete domain interpretation when there is no discrete constant, or just one, in R = DR ∪ C ∪ D [∪ IC], where DR is the set of deductive rules defining the IDB relations. The reason is that a single discrete constant in R, or none, does not by itself delimit an interval for which the method would need to know how many constants can be placed inside before including a new one.

The Discrete Order VIP is applied when order comparisons are interpreted on a discrete order domain such as the integers. It is required when there are two or more discrete constants in R = DR ∪ C ∪ D [∪ IC].

In the example, the range of possible values for X to be tested is determined by the initial set of constants appearing in R, {4, 5, 10, 20}, and the intervals that they define.

When the discrete domain of integers is applied, the total set of constants that should be tested to instantiate e(X) is {d1, 4, 5, d2, 10, d3, 20, d4}. Each di is a virtual constant [40] whose value is not determined by a numeric quantity but by its relative position in the total linear ordering. In this way, d1 < 4, 5 < d2 < 10, 10 < d3 < 20 and 20 < d4. Notice that there cannot be any integer virtual constant dj such that 4 < dj < 5. Unfortunately, none of these constants, virtual or not, leads to a successful EDB. Therefore, C ⊑ D in this example, provided that the values used to instantiate the EDB facts range over the integer domain.

Note that virtual constants can never have fixed absolute values in the discrete case. For instance, consider the following set of integer constants: {1, d1, 4}, where d1 is a virtual constant such that 1 < d1 < 4. Hence, the two possible absolute integer values that d1 can take are 2 and 3. There is room for another distinct virtual constant, d2, inside the interval [1,4], with two alternative placements: 1 < d2 < d1 and d1 < d2 < 4.

According to the first alternative, d2 and d1 would correspond to the integer values 2 and 3, respectively. According to the second alternative, d2 and d1 would interchange those values. Therefore, the possible absolute value of d1 after placing d2 may be either 2 or 3, depending on where d2 is actually placed. In general, the set of absolute values that might correspond to a virtual constant placed in an absolute interval [k,q] will vary over time, as new virtual constants are included in the interval and depending on where those new virtual constants are placed.
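The way the admissible absolute values of a virtual constant shrink can be reproduced by enumerating integer assignments that respect a given ordering. The helper below is only an illustration of this paragraph; the name `placements` and the encoding of virtual constants as strings are assumptions of the sketch.

```python
def placements(chain, lo, hi):
    """All assignments of integers to the virtual constants in `chain`
    (a strictly increasing sequence of names) such that lo < chain[0] < ... < hi."""
    if not chain:
        return [{}]
    first, rest = chain[0], chain[1:]
    results = []
    # `first` must exceed lo and leave room for the remaining constants below hi.
    for value in range(lo + 1, hi - len(rest)):
        for tail in placements(rest, value, hi):
            results.append({first: value, **tail})
    return results


alone = placements(["d1"], 1, 4)            # d1 may still be 2 or 3
d2_first = placements(["d2", "d1"], 1, 4)   # 1 < d2 < d1 < 4
d1_first = placements(["d1", "d2"], 1, 4)   # 1 < d1 < d2 < 4
no_room = placements(["dj"], 4, 5)          # no integer lies strictly between 4 and 5
```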

Returning to the example, if the dense interpretation is taken for the order comparisons, then C ⋢ D. In this case, the total set of constants to be considered is {k1, 4, k2, 5, k3, 10, k4, 20, k5}. As in the discrete case, each ki may be a virtual constant, but the difference now is that the method can place one of them, k2, between 4 and 5.

Another difference with respect to the discrete case is that a “dense” virtual constant may correspond to an absolute value that remains unchanged. Indeed, virtual constants can be replaced by absolute values. For the sake of simplicity, k1 = (system_min_value + 4)/2, k2 = 4.5 = (4 + 5)/2, k3 = 7.5 = (5 + 10)/2, k4 = 15 = (10 + 20)/2 and k5 = (20 + system_max_value)/2.
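The midpoint choice can be written down directly. In the sketch below, `dense_candidates` is a hypothetical helper and the bounds -100 and 100 are arbitrary stand-ins for system_min_value and system_max_value, which the text leaves unspecified.

```python
from fractions import Fraction


def dense_candidates(static_constants, system_min, system_max):
    """Interleave the static constants with one midpoint per open interval."""
    points = ([Fraction(system_min)]
              + sorted(Fraction(c) for c in static_constants)
              + [Fraction(system_max)])
    candidates = []
    for left, right in zip(points, points[1:]):
        candidates.append((left + right) / 2)   # virtual constant k_i as a fixed value
        candidates.append(right)
    return candidates[:-1]                      # drop system_max itself


ks = dense_candidates([4, 5, 10, 20], -100, 100)   # assumed system bounds
```

With these assumed bounds, `ks` is the sequence k1, 4, k2, 5, k3, 10, k4, 20, k5 with k2 = 9/2, k3 = 15/2 and k4 = 15, as in the text.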

Taking those values into account, the only successful CQC-derivation is the one that leads to an EDB containing just e(4.5). Therefore, the conclusion is C ⋢ D, provided that the values used to instantiate the EDB facts range over a dense domain. Fig. 3.6 shows this successful CQC-derivation. Since 4 < 4.5 < 5 and no integer lies strictly between 4 and 5, this also explains why C ⊑ D when a discrete domain is considered.
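Both outcomes can be verified by evaluating the two queries on single-fact EDBs {e(k)}. The sketch below encodes C, D and DR as Python predicates; the function names are hypothetical, and the candidate values 3 and 25 merely stand in for the virtual constants below 4 and above 20.

```python
from fractions import Fraction


def p(x, edb):
    return x in edb and x > 4                 # p(X) <- e(X) ∧ X > 4


def q_c(x, edb):
    return p(x, edb) and x < 10               # C: q(X) <- p(X) ∧ X < 10


def q_d(x, edb):
    return p(x, edb) and x < 20 and x >= 5    # D: q(X) <- p(X) ∧ X < 20 ∧ X ≥ 5


def counterexamples(candidates):
    """Constants k such that the EDB {e(k)} satisfies qC(k) ∧ ¬qD(k)."""
    return [k for k in candidates if q_c(k, {k}) and not q_d(k, {k})]


integer_witnesses = counterexamples(range(-50, 51))
dense_witnesses = counterexamples(
    [Fraction(3), 4, Fraction(9, 2), 5, Fraction(15, 2), 10, 15, 20, 25])
```

No integer in the tested range is a witness, while 9/2 is the only witness among the dense candidates, which matches the EDB {e(4.5)} of the successful derivation.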

Fig. 3.6. A successful CQC-derivation that applies the Dense Order VIP.


In steps 4 and 5 of the CQC-derivation shown in Fig. 3.6, the selected literals are 4.5 > 4 and 4.5 < 10, respectively. Since these order expressions are fully instantiated, they are simply evaluated to determine whether or not they are true. In this case, both literals are evaluated true and, thus, the CQC-derivation continues.

In steps 10 and 11, the selected literals are 4.5 > 4 and 4.5 < 20, respectively. As before, these expressions are evaluated. Since both are true, they cannot make the CQC-derivation succeed yet. Conversely, the selected literal in step 12, 4.5 ≥ 5, is evaluated false. It follows that qD(4.5) is false and thus C ⋢ D. Note that the derivation would have ended successfully earlier if the literal 4.5 ≥ 5 had been selected right after unfolding qD(4.5) in step 7.

4. Formalization of the CQC method

Let S = (DR, IC) be the database schema and Q1 and Q2 two queries defining the same n-ary query predicate q. If the CQC method performs a successful CQC-derivation from (← q1(X1, ..., Xn) ∧ ¬q2(X1, ..., Xn) ∅ ∅ ∅ K) to ([ ] ∅ T C K′) then Q1 ⋢ Q2, where K is the set of constants appearing in DR ∪ Q1 ∪ Q2. Moreover, if the CQC method performs a successful CQC-derivation from (← q1(X1, ..., Xn) ∧ ¬q2(X1, ..., Xn) IC ∅ ∅ K″) to ([ ] ∅ T C K‴) then Q1 ⋢_IC Q2, where K″ is the set of constants appearing in DR ∪ Q1 ∪ Q2 ∪ IC.

CQC-derivations start from a 5-tuple (G0 F0 T0 C0 K0) consisting of the goal G0 = ← q1(X1, ..., Xn) ∧ ¬q2(X1, ..., Xn), the set of conditions to enforce F0 = ∅ or IC, the initially empty EDB T0 = ∅, the empty set of conditions to maintain C0 = ∅ and the set K0 of constant values appearing in DR ∪ Q1 ∪ Q2 [∪ IC].

A successful CQC-derivation reaches a 5-tuple (Gn Fn Tn Cn Kn) = ([ ] ∅ T C K′), where the empty goal Gn = [ ] means that the initial goal G0 is attained. The empty set Fn = ∅ means that no condition is waiting to be satisfied. Tn = T is an EDB that satisfies G0 as well as F0. Cn = C is a set of conditions recorded along the derivation that T also satisfies. Kn = K′ is the set of constant values appearing in DR ∪ Q1 ∪ Q2 [∪ IC] ∪ T.

In contrast, if every “fair” CQC-derivation starting from (← q1(X1, ..., Xn) ∧ ¬q2(X1, ..., Xn) ∅ [∪ IC] ∅ ∅ K) is finite but does not reach ([ ] ∅ T C K′), then no EDB satisfies the goal G0 = ← q1(X1, ..., Xn) ∧ ¬q2(X1, ..., Xn) together with the set of conditions F0 = ∅ [∪ IC], and the conclusion is that Q1 ⊑ Q2 (Q1 ⊑_IC Q2). Section 5 provides the results and proofs regarding the soundness and completeness of the CQC method.

4.1. CQC-nodes, CQC-Trees and CQC-derivations

Let S = (DR, IC) be the database schema and Q1 and Q2 be two queries defining the same n-ary query predicate q. A CQC-node is a 5-tuple of the form (Gi Fi Ti Ci Ki), where

• Gi is a goal to attain;
• Fi is a set of conditions to enforce;
• Ti is a set of ground EDB atoms, an EDB under construction;
• Ci is a set of conditions that are currently satisfied on Ti and that must be maintained; and
• Ki is the set of constants appearing in R = DR ∪ Q1 ∪ Q2 [∪ IC] and Ti.

A CQC-tree is inductively defined as follows:

(1) The tree consisting of the single CQC-node (G0 F0 ∅ ∅ K) is a CQC-tree.
(2) Let E be a CQC-tree, and (Gn Fn Tn Cn Kn) a leaf CQC-node of E such that Gn ≠ [ ] or Fn ≠ ∅. Then the tree obtained from E by appending one or more descendant CQC-nodes according to a CQC-expansion rule applicable to (Gn Fn Tn Cn Kn) is again a CQC-tree.

It may happen that the application of a CQC-expansion rule to a leaf CQC-node (Gn Fn Tn Cn Kn) does not obtain any new descendant CQC-node to be appended to the CQC-tree because some necessary constraint defined on the CQC-expansion rule is not satisfied. In such a case, (Gn Fn Tn Cn Kn) is a failed CQC-node.

Each branch in a CQC-tree is a CQC-derivation consisting of a (finite or infinite) sequence (G0 F0 T0 C0 K0), (G1 F1 T1 C1 K1), ... of CQC-nodes.

A CQC-derivation is finite if it consists of a finite sequence of CQC-nodes; otherwise it is infinite. A CQC-derivation is successful if it is finite and its last (leaf) CQC-node has the form ([ ] ∅ Tn Cn Kn). That is, both the goal to attain and the set of conditions to satisfy are empty. A CQC-derivation is failed if it is finite and its last (leaf) CQC-node is failed.

A CQC-tree is successful when at least one of its branches is a successful CQC-derivation. A CQC-tree is finitely failed when each one of its branches is a failed CQC-derivation.
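These definitions translate naturally into a small data structure. The sketch below is one possible encoding, not part of the method's formalization: the class `CQCNode`, the helper `classify` and the representation of literals and conditions as opaque strings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CQCNode:
    goal: tuple = ()                       # G_i; the empty tuple stands for [ ]
    to_enforce: frozenset = frozenset()    # F_i
    edb: frozenset = frozenset()           # T_i
    to_maintain: frozenset = frozenset()   # C_i
    constants: frozenset = frozenset()     # K_i


def initial_node(goal, constraints=(), constants=()):
    """Root node (G0 F0 ∅ ∅ K); F0 is ∅ or IC."""
    return CQCNode(goal=tuple(goal), to_enforce=frozenset(constraints),
                   constants=frozenset(constants))


def classify(derivation, leaf_is_failed=False):
    """Classify a finite derivation (a list of CQCNode) by its last node."""
    leaf = derivation[-1]
    if leaf.goal == () and not leaf.to_enforce:
        return "successful"
    return "failed" if leaf_is_failed else "open"


start = initial_node(["sub1(X)", "not sub2(X)"])
end = CQCNode(edb=frozenset({"emp(0)", "worksFor(1,0)"}),
              to_maintain=frozenset({"worksFor(Y,0) ∧ ¬boss(Y)", "worksFor(Z,1)"}),
              constants=frozenset({0, 1}))
```

Here `end` mirrors the last node of the derivation of Section 3.1, so `classify([start, end])` reports a successful derivation; the intermediate nodes are omitted.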

4.2. Resolution principle and unification

Before describing the CQC-expansion rules, some basic concepts regarding the resolution principle applied in some of these rules are required.

A unifier of two expressions E and E′ is a substitution θ such that Eθ is syntactically identical to E′θ. If two expressions do not have a unifier then they are not unifiable. A unifier θ is called a most general unifier (mgu) for the two expressions if for each unifier α of E and E′ there exists a substitution β such that α = θβ.

Let G be a goal or a condition of the form ← L1 ∧ ... ∧ p(X̄1) ∧ ... ∧ Lp, p ≥ 1, and C be an input clause of the form p(X̄2) ← M1 ∧ ... ∧ Mq, q ≥ 1. Then S is a resolvent for G and C using mgu θ if the following conditions hold: (1) p(X̄1) is an atom, called the selected atom, in G; (2) θ is an mgu of p(X̄1) and p(X̄2); and (3) S is the clause ← (L1 ∧ ... ∧ M1 ∧ ... ∧ Mq ∧ ... ∧ Lp)θ.
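For function-free atoms, as used throughout the examples, unification and resolution reduce to a few lines. The sketch below makes several assumptions: variables are strings starting with an uppercase letter, constants are anything else, literals are `(sign, predicate, args)` triples, and rule variables are already renamed apart from goal variables. The rule boss(X) ← worksFor(Z, X) is inferred from steps 10 and 11 of the example in Section 3.1.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, theta):
    """Follow variable bindings in theta until a constant or unbound variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t


def mgu(args1, args2):
    """Most general unifier of two argument tuples of function-free atoms, or None."""
    if len(args1) != len(args2):
        return None
    theta = {}
    for a, b in zip(args1, args2):
        a, b = walk(a, theta), walk(b, theta)
        if a == b:
            continue
        if is_var(a):
            theta[a] = b
        elif is_var(b):
            theta[b] = a
        else:
            return None          # two distinct constants clash
    return theta


def apply(literal, theta):
    sign, pred, args = literal
    return (sign, pred, tuple(walk(a, theta) for a in args))


def resolvent(goal, index, rule):
    """Resolve the selected positive atom goal[index] with rule = (head, body)."""
    (_, pred, args), ((_, hpred, hargs), body) = goal[index], rule
    if pred != hpred:
        return None
    theta = mgu(args, hargs)
    if theta is None:
        return None
    new_goal = goal[:index] + list(body) + goal[index + 1:]
    return [apply(lit, theta) for lit in new_goal]


boss_rule = ((True, "boss", ("X",)), [(True, "worksFor", ("Z", "X"))])
unfolded = resolvent([(True, "boss", (0,))], 0, boss_rule)
```

Unfolding boss(0) in this way yields the single literal worksFor(Z, 0), the literal selected in step 11.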

4.3. The CQC-expansion rules

The nine CQC-expansion rules are listed in Tables 4.1 and 4.2.

For the sake of notation, if Gi = L1 ∧ ... ∧ Lj−1 ∧ Lj ∧ Lj+1 ∧ ... ∧ Lm then Gi \ Lj = L1 ∧ ... ∧ Lj−1 ∧ Lj+1 ∧ ... ∧ Lm. If Gi = L1 ∧ ... ∧ Lm then Gi ∧ p(X̄) = L1 ∧ ... ∧ Lm ∧ p(X̄).


The application of a CQC-expansion rule to a given CQC-node (Gi Fi Ti Ci Ki) may result in none, one or several alternative (branching) descendant CQC-nodes depending on the selected literal P(Ji) = L. Here, Ji is either the goal Gi or any of the conditions Fi,j in Fi. L is selected according to a safe computation rule P [33], which selects negative and built-in literals only when they are fully grounded. To guarantee that such literals are selected sooner or later, deductive rules and goals are required to be safe.

Once a literal is selected, only one of the CQC-expansion rules can be applied. Two classes of rules are defined: A#-rules and B#-rules. A#-rules are those where the selected literal belongs to the goal Gi, whereas B#-rules are those where the selected literal belongs to one of the conditions Fi,j in Fi. Within each class, the rules are distinguished by the type of the selected literal.

In each CQC-expansion rule, the part above the horizontal line presents the CQC-node to which the rule is applied. Below the horizontal line is the description of the resulting descendant CQC-nodes. Vertical bars separate alternatives corresponding to different descendants. Some rules, like A1, A4, B2 and B4, also include an “only if” condition that constrains the circumstances under which the expansion is possible. If such a condition is evaluated false, the CQC-node to which the rule is applied becomes a failed CQC-node.

Table 4.1
CQC-expansion rules: the A#-rules

(A1) P(Gi) = d(X̄) is a positive atom of a derived predicate:

(Gi Fi Ti Ci Ki)
--------------------------------------------------
(Gi+1,1 Fi Ti Ci Ki) | ... | (Gi+1,m Fi Ti Ci Ki)

only if m ≥ 1 and each Gi+1,j is the resolvent for Gi and some deductive rule d(X̄) ← M1 ∧ ... ∧ Mq in DR.

(A2) P(Gi) = b(X̄) is a positive EDB atom:

(Gi Fi Ti Ci Ki)
--------------------------------------------------
((Gi \ b(X̄))σ1 Fi+1,1 Ti+1,1 Ci+1,1 Ki+1,1) | ... | ((Gi \ b(X̄))σm Fi+1,m Ti+1,m Ci+1,m Ki+1,m)

such that Fi+1,j = Fi ∪ Ci, Ti+1,j = Ti ∪ {b(X̄)σj} and Ci+1,j = ∅ if b(X̄)σj ∉ Ti; otherwise Fi+1,j = Fi, Ti+1,j = Ti and Ci+1,j = Ci. Each σj is one out of m possible distinct ground substitutions, obtained via a variable instantiation procedure from (variables(X̄), ∅, Ki) to (∅, σj, Ki+1,j) according to the appropriate variable instantiation pattern, that assigns a constant from Ki+1,j to each variable of X̄. See more details in Section 4.4.

(A3) P(Gi) = ¬p(X̄) is a ground negated atom:

(Gi Fi Ti Ci Ki)
--------------------------------------------------
(Gi \ ¬p(X̄) Fi ∪ {← p(X̄)} Ti Ci Ki)

(A4) P(Gi) = L is a ground built-in literal:

(Gi Fi Ti Ci Ki)
--------------------------------------------------
(Gi \ L Fi Ti Ci Ki)

only if L is evaluated true.
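The bookkeeping performed by rule A2 for a single ground substitution σj can be sketched as follows. This is a reading of the rule as stated in Table 4.1, not the authors' implementation: the function `apply_a2` is hypothetical, the caller is assumed to have already chosen σj with a VIP and applied it to the remaining goal, and conditions are kept as opaque labels.

```python
def apply_a2(node, remaining_goal, fact):
    """Sketch of rule A2 for one ground substitution.

    node           -- (G, F, T, C, K); F, T, C and K are frozensets
    remaining_goal -- the goal without the selected atom, already instantiated
    fact           -- the selected EDB atom after instantiation, as (predicate, args)
    """
    _, to_enforce, edb, to_maintain, constants = node
    new_constants = constants | frozenset(fact[1])
    if fact in edb:
        # Nothing new is added: F, T and C are left unchanged.
        return (remaining_goal, to_enforce, edb, to_maintain, new_constants)
    # A new fact may violate maintained conditions, so C is moved back into F.
    return (remaining_goal, to_enforce | to_maintain, edb | {fact},
            frozenset(), new_constants)


cond = "worksFor(Y,0) ∧ ¬boss(Y)"      # condition kept as an opaque label
node = (["worksFor(Z,0)"], frozenset(), frozenset({("emp", (0,))}),
        frozenset({cond}), frozenset({0}))
child = apply_a2(node, [], ("worksFor", (1, 0)))   # remaining goal taken as empty here
```

The call reproduces the effect described for step 11 in Section 3.1: worksFor(1,0) enters the EDB and the maintained condition returns to the set of conditions to enforce.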


4.4. Variable instantiation patterns

Four different variable instantiation patterns (VIPs for short) are defined. Each of them determines how the CQC method has to instantiate the EDB facts to be added to the EDB under construction. The application of a VIP in each case depends on the syntactic properties of the databases and queries considered. The four VIPs are: Simple VIP, Negation VIP, Dense Order VIP and Discrete Order VIP.

The CQC method uses the Simple VIP when checking query containment in the absence of integrity constraints. Moreover, the deductive rules defining query predicates as well as IDB predicates must satisfy the following conditions: they must not have any negative or built-in literal in their rule bodies; they must not have constants in their heads; and they must not have any variable appearing more than once in their heads. According to the Simple VIP, each distinct variable is bound to a distinct new constant.

Table 4.2
CQC-expansion rules: the B#-rules

(B1) P(Fi,j) = d(X̄) is a positive atom of a derived predicate:

({Gi} {Fi,j} ∪ Fi Ti Ci Ki) is written below as (Gi {Fi,j} ∪ Fi Ti Ci Ki).

(Gi {Fi,j} ∪ Fi Ti Ci Ki)
--------------------------------------------------
(Gi S ∪ Fi Ti Ci Ki)

where S is the set of all resolvents Su for clauses in R and Fi,j on d(X̄). S may be empty.

(B2) P(Fi,j) = b(X̄) is a positive EDB atom:

(Gi {Fi,j} ∪ Fi Ti Ci Ki)
--------------------------------------------------
(Gi S ∪ Fi Ti Ci+1 Ki)

only if [ ] ∉ S. Ci+1 = Ci if X̄ contains no variables and b(X̄) ∈ Ti; otherwise, Ci+1 = Ci ∪ {Fi,j}. S is the set of all resolvents of clauses in Ti with Fi,j on b(X̄). S may be empty, meaning that b(X̄) cannot be unified with any atom in Ti.

(B3) P(Fi,j) = ¬p(X̄) is a ground negative ordinary literal:

(Gi {Fi,j} ∪ Fi Ti Ci Ki)
--------------------------------------------------
(Gi {← p(X̄)} ∪ {Fi,j \ ¬p(X̄)} ∪ Fi Ti Ci Ki) only if Fi,j \ ¬p(X̄) ≠ [ ] | (Gi ∧ p(X̄) Fi Ti Ci Ki)

(B4) P(Fi,j) = L is a ground built-in literal that is evaluated true:

(Gi {Fi,j} ∪ Fi Ti Ci Ki)
--------------------------------------------------
(Gi {Fi,j \ L} ∪ Fi Ti Ci Ki)

only if Fi,j \ L ≠ [ ].

(B5) P(Fi,j) = L is a ground built-in literal that is evaluated false:

(Gi {Fi,j} ∪ Fi Ti Ci Ki)
--------------------------------------------------
(Gi Fi Ti Ci Ki)
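Rule B2 can be illustrated with the three situations discussed in Section 3. The sketch below is an interpretation of the rule as stated in Table 4.2: `apply_b2` and `match` are hypothetical names, a condition is a tuple of `(sign, predicate, args)` literals, and the function returns only the set S of resolvents and the new set of maintained conditions (or `None` when [ ] ∈ S, that is, when the node fails).

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(args, fact_args):
    """Substitution making args equal to the ground fact_args, or None."""
    theta = {}
    for a, c in zip(args, fact_args):
        if is_var(a):
            if theta.setdefault(a, c) != c:
                return None
        elif a != c:
            return None
    return theta


def apply_b2(condition, index, edb, to_maintain):
    """condition[index] is the selected positive EDB atom of the condition."""
    _, pred, args = condition[index]
    rest = condition[:index] + condition[index + 1:]
    resolvents = set()
    for fpred, fargs in edb:
        if fpred != pred or len(fargs) != len(args):
            continue
        theta = match(args, fargs)
        if theta is None:
            continue
        resolvents.add(tuple((s, p, tuple(theta.get(a, a) for a in la))
                             for s, p, la in rest))
    if () in resolvents:
        return None                       # [ ] ∈ S: the rule is not applicable
    ground_and_true = not any(is_var(a) for a in args) and (pred, args) in edb
    c_next = to_maintain if ground_and_true else to_maintain | {condition}
    return resolvents, c_next


edb_31 = {("emp", (0,)), ("worksFor", (1, 0))}        # EDB of Fig. 3.1
edb_32 = {("emp", (0,)), ("worksFor", (0, 0))}        # EDB of Fig. 3.2
cond_a = ((True, "worksFor", ("Y", 0)), (False, "boss", ("Y",)))
cond_b = ((True, "worksFor", ("Z", 1)),)
cond_c = ((True, "worksFor", ("Z", 0)),)

step_12 = apply_b2(cond_a, 0, edb_31, frozenset())    # leaves ¬boss(1) to enforce
step_14 = apply_b2(cond_b, 0, edb_31, frozenset())    # no match: S is empty
failed = apply_b2(cond_c, 0, edb_32, frozenset())     # unifies with worksFor(0,0)
```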


The CQC method uses the Negation VIP when checking containment in the presence of integrity constraints and/or negation and/or (in)equality comparisons (=, ≠). In any case, order comparisons (<, ≤, >, ≥) are not allowed.

The two other VIPs, Dense Order VIP and Discrete Order VIP, are applied when there are order comparisons (<, ≤, >, ≥) in the deductive rules, with or without negation, interpreted on a dense or a discrete order domain, respectively. In both cases, each distinct variable must be bound to a constant according to either a former or a new location in the total linear (dense or discrete) order of constants.

A variable instantiation procedure from ({X1, X2, ..., Xn} θ0 K0) to (∅ θn Kn) is a sequence ({X1, X2, ..., Xn} θ0 K0), ({X2, ..., Xn} θ1 K1), ..., (∅ θn Kn) such that for each 0 ≤ i ≤ n, θi is a ground substitution and Ki is a set of constants.

A variable instantiation step performs a transition from (X̄i θi Ki) to (X̄i+1 θi+1 Ki+1) that instantiates the variable Xi+1 of X̄i according to one of the VIP-rules defined by the selected variable instantiation pattern (VIP). The application of the appropriate VIP to a given class of queries and databases ensures the completeness of the CQC method with respect to that class.
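As an illustration of such a procedure, the generator below enumerates every final pair (θn, Kn) reachable under the informal description of the Negation VIP: each variable is bound either to a constant already in K or to one new constant. It does not reproduce the VIP-rules of Table 4.3; the name `negation_vip` and the choice of the next natural number as the new constant are assumptions of the sketch.

```python
def negation_vip(variables, theta=None, constants=frozenset()):
    """Yield every (θn, Kn) reachable from (variables, θ0, K0)."""
    theta = dict(theta or {})
    if not variables:
        yield theta, constants
        return
    first, rest = variables[0], variables[1:]
    fresh = max(constants, default=-1) + 1     # assumption: constants are naturals
    for k in sorted(constants) + [fresh]:
        # One instantiation step: bind `first` and extend K with the chosen constant.
        yield from negation_vip(rest, {**theta, first: k}, constants | {k})


step_11 = list(negation_vip(["Z"], {}, frozenset({0})))
```

For the literal worksFor(Z, 0) of step 11 in Section 3.1, with K = {0}, the two results are Z = 0 (the alternative of Fig. 3.2) and Z = 1 (the alternative of Fig. 3.1).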

The formalization of the VIP-rules is given in Table 4.3. Constants are denoted as $k$, $k_{new}$ and $k_i$. Max and min are two functions that range over sets of constants and return the constants having the greatest value and the least value, respectively, of those sets.

The formalization of the rules for the Discrete Order VIP requires supplementary definitions. A virtual constant [40] is a discrete constant whose value is not determined by a numeric quantity but by its relative position in a linear ordering of constants. From now on, let static constant stand for a discrete constant that is not a virtual constant, that is, it has a numeric value that already appeared in $R = DR \cup Q_1 \cup Q_2\,[\cup\, IC]$. $d_{new}$ and $d_i$ denote virtual constants and $c_i$, $c_{min}$ and $c_{max}$ denote static constants from $R$.
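As an illustration of how the VIP-rules enumerate candidate bindings for the next variable, the following minimal sketch covers the Simple, Negation and Dense Order VIPs only. The function names, the use of integers and `Fraction` values as constants, and the treatment of an empty set of constants (one fresh constant) are assumptions made for this sketch; the Discrete Order VIP, which needs virtual constants, is left out.

```python
from fractions import Fraction

def simple_vip(K):
    """Rule S: bind the variable to one fresh constant not in K."""
    k_new = max(K, default=-1) + 1  # assumption: "fresh" means max(K) + 1
    return [(k_new, set(K) | {k_new})]

def negation_vip(K):
    """Rules N1 and N2: reuse any constant of K, or take a fresh one."""
    reuse = [(k, set(K)) for k in sorted(K)]  # N1
    return reuse + simple_vip(K)              # N2

def dense_order_vip(K):
    """Rules Den1 to Den4 over a dense order (rationals here)."""
    ks = sorted(K)
    if not ks:
        return simple_vip(K)  # assumption: empty K yields one fresh constant
    steps = [(k, set(K)) for k in ks]                 # Den1: reuse a constant
    steps.append((ks[0] - 1, set(K) | {ks[0] - 1}))   # Den2: below min(K)
    for lo, hi in zip(ks, ks[1:]):                    # Den3: between neighbours
        mid = (Fraction(lo) + Fraction(hi)) / 2
        steps.append((mid, set(K) | {mid}))
    steps.append((ks[-1] + 1, set(K) | {ks[-1] + 1})) # Den4: above max(K)
    return steps

# Each entry is (constant bound to the variable, resulting set of constants).
candidates = [c for c, _ in dense_order_vip({0, 1})]
```

With two known constants, the Negation VIP proposes three bindings while the Dense Order VIP proposes five, since the position of the new constant relative to the old ones matters.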

5. Correctness results for the CQC method

In this section, we summarize and sketch the proofs of correctness of the CQC method. We refer the reader to [23] for the detailed proofs. We also state the class of queries that can actually be decided by the CQC method. Before proving these results, we need to make explicit the model-theoretic semantics with respect to which those results are established.

We view the CQC method as an extension of SLDNF-resolution [12,33]. However, there is an important difference between both methods. When applying SLDNF-resolution, the input set of information, the logic program and the goal to attain, is closed, that is, neither new facts nor rules are added in the course of an SLDNF-resolution procedure. Instead, the CQC method enforces the addition of new information, in terms of EDB facts, whenever it is considered necessary for assuring the satisfaction of the non-containment goal. This key difference will be reflected in the semantics underlying each method.

The model-theoretic counterpart of the procedural semantics of SLDNF-resolution is Clark's completion [12,33]. In this way, the soundness and completeness results of SLDNF-resolution are established with respect to the semantics of the completed logic programs taken as input. In a similar way, we introduce the notion of partial completion of deductive rules to provide a model-theoretic foundation to prove the soundness and completeness of the CQC method.

Let R be a set of deductive rules. We define the partial completion of R, denoted by pComp(R), as the collection of completed definitions [12,33] of IDB predicates in R together with an equality theory. The latter includes a set of axioms stating explicitly the meaning of the built-in predicate = introduced in the completed definitions.

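Our partial completion is defined similarly to Clark's completion, Comp(R), but without including the axioms of the form $\forall \bar{X}\,(\neg b_i(\bar{X}))$ for each predicate $b_i$ which only appears in the body of the clauses in R. We assume that these predicates are EDB predicates that, obviously, are not defined in R.

Table 4.3
Formalization of the VIP-rules

VIP-rule for the Simple VIP

S. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k_{new}\}$ and $K_{i+1} = K_i \cup \{k_{new}\}$, where $k_{new} \notin K_i$.

VIP-rules for the Negation VIP

N1. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k\}$ and $K_{i+1} = K_i$, where $k \in K_i$.

N2. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k_{new}\}$ and $K_{i+1} = K_i \cup \{k_{new}\}$, where $k_{new} \notin K_i$.

VIP-rules for the Dense Order VIP

Den1. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k\}$ and $K_{i+1} = K_i$, where $k \in K_i$.

Den2. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k_{new}\}$ and $K_{i+1} = K_i \cup \{k_{new}\}$, where $k_{new} < \min(K_i)$.

Den3. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k_{new}\}$ and $K_{i+1} = K_i \cup \{k_{new}\}$, where $k_j < k_{new} < k_{j+1}$, $\{k_j, k_{j+1}\} \subseteq K_i$ and there is no $k_h \in K_i$ such that $k_j < k_h < k_{j+1}$.

Den4. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\theta_{i+1} = \theta_i \cup \{X_{i+1}/k_{new}\}$ and $K_{i+1} = K_i \cup \{k_{new}\}$, where $\max(K_i) < k_{new}$.

VIP-rules for the Discrete Order VIP

Dis1. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\sigma_{i+1} = \sigma_i \cup \{X_{i+1}/k\}$ and $K_{i+1} = K_i$, where $k \in K_i$.

Dis2. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\sigma_{i+1} = \sigma_i \cup \{X_{i+1}/d_{new}\}$ and $K_{i+1} = K_i \cup \{d_{new}\}$, where $d_{new}$ is a new virtual constant such that $d_{new} < \min(K_i)$.

Dis3. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\sigma_{i+1} = \sigma_i \cup \{X_{i+1}/d_{new}\}$ and $K_{i+1} = K_i \cup \{d_{new}\}$, where $d_{new}$ is a new virtual constant such that $\min(K_i) \le d_j < d_{new} < k_{j+1} \le c_{min}$, $\{d_j, k_{j+1}, c_{min}\} \subseteq K_i$, there is no virtual constant $d_h \in K_i$ such that $d_j < d_h < k_{j+1}$ and there is no static constant $c_p \in K_i$ such that $c_p < c_{min}$.

Dis4. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\sigma_{i+1} = \sigma_i \cup \{X_{i+1}/d_{new}\}$ and $K_{i+1} = K_i \cup \{d_{new}\}$, where $d_{new}$ is a new virtual constant such that $c_j \le k_j < d_{new} < k_{j+1} \le c_{j+1}$, $\{c_j, k_j, k_{j+1}, c_{j+1}\} \subseteq K_i$, there is no virtual constant $d_h \in K_i$ such that $k_j < d_h < k_{j+1}$, there is no static constant $c_p \in K_i$ such that $c_j < c_p < c_{j+1}$ and $|\{d_q \mid d_q \in K_i \text{ and } c_j < d_q < c_{j+1}\}| < |c_{j+1} - c_j| - 1$.

Dis5. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\sigma_{i+1} = \sigma_i \cup \{X_{i+1}/d_{new}\}$ and $K_{i+1} = K_i \cup \{d_{new}\}$, where $d_{new}$ is a new virtual constant such that $c_{max} \le k_j < d_{new} < d_{j+1} \le \max(K_i)$, $\{c_{max}, k_j, d_{j+1}\} \subseteq K_i$, there is no virtual constant $d_h \in K_i$ such that $k_j < d_h < d_{j+1}$ and there is no static constant $c_p \in K_i$ such that $c_{max} < c_p$.

Dis6. $\bar{X}_{i+1} = \bar{X}_i \setminus X_{i+1}$; $\sigma_{i+1} = \sigma_i \cup \{X_{i+1}/d_{new}\}$ and $K_{i+1} = K_i \cup \{d_{new}\}$, where $d_{new}$ is a new virtual constant such that $\max(K_i) < d_{new}$.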

If $Q_1$ and $Q_2$ are two queries, DR is the set of deductive rules defining the database IDB relations and IC is a finite set of conditions expressing the database integrity constraints, we consider that the problem of deciding whether $Q_1 \sqsubseteq Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) is equivalent to the problem of proving that $pComp(R)\,[\cup\, \forall IC] \models \forall X_1, \ldots, X_n\; q_1(X_1, \ldots, X_n) \rightarrow q_2(X_1, \ldots, X_n)$ is true, where $R = DR \cup Q_1 \cup Q_2$. If we define the initial goal $G_0 = q_1(X_1, \ldots, X_n) \wedge \neg q_2(X_1, \ldots, X_n)$ then testing $Q_1 \sqsubseteq Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) is equivalent to proving $pComp(R)\,[\cup\, \forall IC] \models\; \leftarrow G_0$. This proof is tackled by the CQC method, which tries to refute $pComp(R)\,[\cup\, \forall IC] \models\; \leftarrow G_0$ by constructing an EDB $T$ such that $R(T)$ is a model for $pComp(R)\,[\cup\, \forall IC] \cup \{\exists X_1 \ldots \exists X_n\, (q_1(X_1, \ldots, X_n) \wedge \neg q_2(X_1, \ldots, X_n))\}$.

In the following theorems, let $G_0 = q_1(X_1, \ldots, X_n) \wedge \neg q_2(X_1, \ldots, X_n)$ be the initial goal, $F_0 = \emptyset\,[\cup\, IC]$ be the initial set of conditions to enforce and $K$ be the set of constants appearing in $DR \cup Q_1 \cup Q_2 \cup F_0$.

Before proving results related to failure of the CQC method, we review the results related to finite success already stated in [22].

Theorem 5.1. (Finite success soundness). If there exists a finite successful CQC-derivation starting from $(G_0 \;\; F_0 \;\; \emptyset \;\; \emptyset \;\; K)$ then $Q_1 \not\sqsubseteq Q_2$ ($Q_1 \not\sqsubseteq_{IC} Q_2$) provided that $\{G_0\} \cup F_0 \cup DR \cup Q_1 \cup Q_2$ is safe and hierarchical.

Theorem 5.2. (Finite success completeness). If $Q_1 \not\sqsubseteq Q_2$ (or $Q_1 \not\sqsubseteq_{IC} Q_2$) then there exists a successful CQC-derivation from $(G_0 \;\; F_0 \;\; \emptyset \;\; \emptyset \;\; K)$ to $([\,] \;\; \emptyset \;\; T \;\; C \;\; K')$ provided that $\{G_0\} \cup F_0 \cup DR \cup Q_1 \cup Q_2$ is safe and either hierarchical or strict-stratified [11].

These results ensure that, in the absence of recursive IDB predicates, if the method builds a finite counterexample, then containment does not hold (Theorem 5.1); and that if there exists a finite counterexample, then our method finds it and terminates (Theorem 5.2). In [24] we extended these results by assessing the properties regarding failure of our method. In this sense, we proved failure soundness (Theorem 5.3), which guarantees that if the method terminates without building any counterexample then containment holds; and failure completeness (Theorem 5.5), which states that if containment holds between two queries then our method fails finitely.

Theorem 5.3. (Failure soundness). If there exists a finitely failed CQC-Tree rooted at $(G_0 \;\; F_0 \;\; \emptyset \;\; \emptyset \;\; K)$ then $Q_1 \sqsubseteq Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) provided that the deductive rules and conditions in $DR \cup Q_1 \cup Q_2\,[\cup\, IC]$ are safe.

The proof of Theorem 5.3 is by contradiction and may be intuitively explained as follows. Let us suppose that we have a finitely failed CQC-tree but $Q_1 \not\sqsubseteq Q_2$ ($Q_1 \not\sqsubseteq_{IC} Q_2$). If $Q_1 \not\sqsubseteq Q_2$ ($Q_1 \not\sqsubseteq_{IC} Q_2$), then $pComp(R)\,[\cup\, \forall IC] \cup \{\exists X_1 \ldots \exists X_n\, (q_1(X_1, \ldots, X_n) \wedge \neg q_2(X_1, \ldots, X_n))\}$ has a model. However, if this is true, we prove that there is at least one CQC-derivation that is not finitely failed.

Lemma 5.4 is needed for proving Theorem 5.5. Before stating it, we need some new definitions. A CQC-derivation is open when it is not failed, that is, when the derivation is either infinite or finite with its last (leaf) CQC-node having the form $([\,] \;\; \emptyset \;\; T_n \;\; C_n \;\; K_n)$. A CQC-derivation $\theta$ is saturated for the CQC-expansion rules if for every CQC-node $(G_i \;\; F_i \;\; T_i \;\; C_i \;\; K_i)$ in $\theta$ the following properties hold:

1. For each literal $L_{i,j} \in G_i$ there exists a node $(G_n \;\; F_n \;\; T_n \;\; C_n \;\; K_n)$, $n \ge i$, such that $P(G_n) = L_{i,j}\sigma_{i+1} \ldots \sigma_n$ is the selected literal on that node to apply a CQC A-rule, where $\sigma_{i+1} \ldots \sigma_n$ is the composition of the substitutions used in the intermediate nodes.

2. For each condition $F_{i,j_i} \in F_i$ there exists a node $(G_n \;\; F_n \;\; T_n \;\; C_n \;\; K_n)$, $n \ge i$, such that $F_{n,j_n} \in F_n$ is the selected condition on that node to apply a CQC B-rule and $F_{n,j_n} = F_{i,j_i}$.

A CQC-derivation is said to be fair when it is either failed or open and saturated for the CQC-expansion rules. A CQC-tree is fair if each one of its CQC-derivations (branches) is fair. Note that a finitely failed CQC-tree is always fair, but the converse is not necessarily true.

A set of deductive rules $P$ is hierarchical if there is a partition $P = P_1 \cup \cdots \cup P_n$ such that for any ordinary atom $r(\bar{X})$ occurring positively or negatively (as $\neg r(\bar{X})$) in the body of a clause in $P_i$, the definition of $r$ is contained within $P_j$ with $j < i$. Note that a hierarchical set of deductive rules contains no recursive definitions of IDB relations.
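The hierarchical condition amounts to the predicate dependency graph being acyclic, which can be checked mechanically. The following is a minimal sketch under the assumption that each rule is given as a pair (head predicate, list of body predicates), with the sign of body literals ignored since it plays no role in the definition; the function name and the rule encoding are illustrative only.

```python
def is_hierarchical(rules):
    """Return True if no derived predicate depends on itself, directly or not.

    rules: iterable of (head_predicate, [body_predicates]).
    Predicates that never occur as a head are treated as EDB predicates.
    """
    deps = {}
    for head, body in rules:
        deps.setdefault(head, set()).update(body)

    state = {}  # predicate -> "visiting" or "done"

    def acyclic_from(pred):
        if pred not in deps or state.get(pred) == "done":
            return True          # EDB predicate, or already explored
        if state.get(pred) == "visiting":
            return False         # back edge: a recursive definition
        state[pred] = "visiting"
        ok = all(acyclic_from(r) for r in deps[pred])
        state[pred] = "done"
        return ok

    return all(acyclic_from(p) for p in deps)

# Encoding of a small non-recursive rule set and of a recursive one.
non_recursive = [("boss", ["worksFor"]),
                 ("chief", ["worksFor", "boss"]),
                 ("sub", ["emp", "chief"])]
recursive = [("path", ["edge"]), ("path", ["edge", "path"])]
```

A depth-first traversal is used here because a partition $P_1, \ldots, P_n$ with the required property exists exactly when the traversal finds no cycle.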

Lemma 5.4. Let $R$ be a set of deductive rules, $G_0 = L_1 \wedge \cdots \wedge L_k$ be a goal, $F_0$ be a set of conditions and $K$ be the set of constants in $\{G_0\} \cup F_0 \cup R$. If there exists a saturated open CQC-derivation starting from $(G_0 \;\; F_0 \;\; \emptyset \;\; \emptyset \;\; K)$ then $pComp(R) \cup \{\exists (L_1 \wedge \cdots \wedge L_k)\} \cup \forall F_0$ has a model provided that $\{G_0\} \cup F_0 \cup R$ is safe and hierarchical.

Theorem 5.5. (Failure completeness). If $Q_1 \sqsubseteq Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) then every fair CQC-Tree rooted at $(G_0 \;\; F_0 \;\; \emptyset \;\; \emptyset \;\; K)$ is finitely failed provided that $\{G_0\} \cup F_0 \cup DR \cup Q_1 \cup Q_2$ is safe and hierarchical.

The proof is by contradiction and may be intuitively explained as follows. If $Q_1 \sqsubseteq Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) then $pComp(R)\,[\cup\, \forall IC] \cup \{\exists X_1 \ldots \exists X_n\, (q_1(X_1, \ldots, X_n) \wedge \neg q_2(X_1, \ldots, X_n))\}$ cannot have a model. Assuming that this is true, let us suppose that we have a non-failed CQC-derivation starting from $(G_0 \;\; F_0 \;\; \emptyset \;\; \emptyset \;\; K)$. However, Lemma 5.4 shows that this derivation would indeed construct a model for $pComp(R)\,[\cup\, \forall IC] \cup \{\exists X_1 \ldots \exists X_n\, (q_1(X_1, \ldots, X_n) \wedge \neg q_2(X_1, \ldots, X_n))\}$.

6. Decidability

Most of the past research in query containment has focused on conjunctive queries, and different results are obtained according to the syntactic features that each proposal considers. Rather than providing a more efficient algorithm for those cases, the aim of the CQC method is to cover and extend the classes of queries and databases for which query containment has been investigated. Indeed, the CQC method addresses those cases that we believe not to have been dealt with properly before, in particular negated IDB atoms, order comparison literals under the two possible interpretations, discrete or dense, and integrity constraints (see Section 7 for a more detailed comparison with the methods that handle such features).

Query containment is undecidable for the general case of queries and databases that the CQC method covers [1]. One possible source of undecidability is the presence of recursively-defined derived predicates that could make the CQC method build and test an infinite number of EDBs. Indeed, the proofs of the failure completeness (Theorem 5.5) and the finite success soundness (Theorem 5.1) of the CQC method explicitly exclude the presence of any type of recursion.

Another reason for undecidability is the presence of "axioms of infinity" [6] or "embedded TGDs" [45]. In this case, the initial goal to attain could only be satisfied on an EDB with an infinite number of facts because each new addition of a fact to the EDB under construction triggers a condition to be repaired with another insertion on the EDB.

For this reason, the CQC method is semidecidable for the general case, in the sense that if either there exist one or more finite EDBs for which containment does not hold or there is no such EDB (finite or infinite), the CQC method terminates according to the completeness results demonstrated in the previous section.

One way to ensure termination when the CQC method is used is to delimit a priori the type of schemas and queries for which an infinite EDB can be a non-containment counterexample if and only if a finite subset of that EDB is a counterexample. Therefore, it is important here to review those cases defined in the previous literature in which it is not necessary to construct infinite EDBs for the sake of completeness:

• Conjunctive query containment in the absence of integrity constraints, where conjunctive queries may include negated EDB atoms or built-in atoms [13,31,50].

• Query containment in the absence of integrity constraints, where queries may include negation on EDB and IDB atoms provided that all the EDB relations are unary [27].

• Conjunctive query containment under constraints, where conjunctive queries may include built-in atoms in some cases [52,54,56] and integrity constraints have the form of tuple generating dependencies if they are full [19,54] or acyclic [52,54,56] (see Section 7.2.1.1 for more detail).

Another approach that always ensures termination is to directly prevent the CQC method from constructing infinite EDBs. This can be done by restricting the maximum number of constants used in CQC-derivations. Once that maximum is reached, the CQC-derivation would be "closed" and considered failed since it probably does not lead to a finite counterexample. Such a solution could be regarded as some kind of "trickery" that puts the correctness of the CQC method in an awkward situation. However, it is a pragmatic solution in the sense that no "real" database is supposed to store an infinite number of facts.

7. Related work

The goal of this section is to compare the CQC method with the methods aimed at checking containment in the Database area. Moreover, the CQC method is compared with other methods that address similar problems in other research areas. These problems are Concept Subsumption in Description Logics and Theorem Proving in Automated Reasoning.

The following comparisons are intended to point out that

• The CQC method performs containment tests for more and broader classes of queries and integrity constraints than most previous methods, featuring negation on both EDB and IDB subgoals as well as built-in subgoals.

• The CQC method checks "true" query containment instead of uniform query containment.

• The CQC method is not less efficient than other methods for those cases that those methods already cover.

This section is organized as follows. Section 7.1 compares the CQC method with those methods that check query containment in the absence of integrity constraints. In Section 7.2, the comparison is made with respect to those methods that check query containment in the presence of integrity constraints.

Section 7.3 relates query containment with concept subsumption in Description Logics and compares the CQC method with methods defined in that research area. Finally, Section 7.4 compares the CQC method with a Tableaux-based theorem-proving method that has also been proposed to check database-schema satisfiability.

7.1. Methods for checking query containment in the absence of integrity constraints

The CQC method checks query containment in the absence or presence of integrity constraints for queries defined over database schemas that have safe negation on IDB and EDB subgoals as well as equality and order comparison built-in subgoals in the rule bodies. The expressiveness of such a class of queries and schemas allows the CQC method to handle most of the query/schema classes for which containment checking methods have been proposed in the database literature.

In general, methods that have been proposed to check query containment in the absence of integrity constraints can be classified into two main groups:

1. Methods that check query containment for very restricted classes of queries, where
   • negated and built-in subgoals are not considered at all [13,15,16,37,44,48];
   • built-in subgoals are allowed without negated subgoals [31,42,55];
   • negated EDB subgoals are allowed with [36,50] or without [53] built-in subgoals.

2. Methods that address queries with negated EDB and IDB subgoals, as well as built-in subgoals, but checking uniform containment instead of "true" containment [36,47].

It is worth noticing that no method from the first group handles queries having negated IDB subgoals. Allowing subgoals about IDB predicates in the bodies of deductive queries increases the expressive power of such queries. If these IDB subgoals can also be negated, expressiveness is enhanced dramatically. Therefore, negation is handled in a very restrictive way in the methods from the first group above. The simple example used in Sections 3.1 and 3.2, for instance, does not fall into the query classes that those methods cover. Remember that in that example there were two queries defining the same query predicate sub:

$Q_1 = \{\mathit{sub}(X) \leftarrow \mathit{emp}(X) \wedge \neg \mathit{chief}(X)\}$
$Q_2 = \{\mathit{sub}(X) \leftarrow \mathit{emp}(X) \wedge \neg \mathit{boss}(X)\}$

where emp is an EDB predicate and chief and boss are IDB predicates defined by a set DR of deductive rules

$DR = \{\mathit{boss}(X) \leftarrow \mathit{worksFor}(Z,X),\;\; \mathit{chief}(X) \leftarrow \mathit{worksFor}(Y,X) \wedge \mathit{boss}(Y)\}$

where worksFor is another EDB predicate. Since worksFor is not unary, neither $Q_1$ nor $Q_2$ could be rewritten in terms of a union of conjunctive queries, which is feasible only when all EDB subgoals are unary [27].
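To make the role of the negated IDB subgoals concrete, the two queries can be evaluated bottom-up on a small EDB. The sketch below is only an illustration: the EDB contents (the names "ann" and "bob") and the function name are hypothetical, and a pair (z, x) in `works_for` stands for the fact worksFor(z, x).

```python
def evaluate(emp, works_for):
    """Evaluate Q1 and Q2 over an EDB given as two sets of facts."""
    # boss(X) <- worksFor(Z, X)
    boss = {x for (_, x) in works_for}
    # chief(X) <- worksFor(Y, X) and boss(Y)
    chief = {x for (y, x) in works_for if y in boss}
    # Q1: sub(X) <- emp(X) and not chief(X)
    q1 = {x for x in emp if x not in chief}
    # Q2: sub(X) <- emp(X) and not boss(X)
    q2 = {x for x in emp if x not in boss}
    return q1, q2

# Hypothetical EDB: ann is an employee and bob works for ann.
q1, q2 = evaluate(emp={"ann"}, works_for={("bob", "ann")})
```

On this EDB ann is a boss but not a chief, so she is an answer of $Q_1$ and not of $Q_2$, which means that this particular EDB is a counterexample to $Q_1 \sqsubseteq Q_2$.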

For the sake of fairness, it must also be noted that Refs. [16,37,44] cover some decidable cases of queries with recursively-defined IDB predicates for which failure completeness, finite success soundness and finite success completeness of the CQC method have not been proven yet. For the rest of the methods from that group [13,15,31,36,42,48,50,55],$^2$ the CQC method is clearly an alternative that addresses a large superset of all the cases that they cover. Moreover, Section 3.4 shows that the CQC method with the Simple VIP obtains the same results as, and is not less efficient than, the methods of [13,44] when plain conjunctive queries are considered. Section 7.1.1 below provides an example that reveals the clear correspondence between the CQC method, with the Negation VIP, and the algorithm defined in [36,50], when considering conjunctive queries with negated EDB subgoals. Section 7.1.2 reviews the novel work of [53], where the authors provide new results that allow checking conjunctive query containment with negated EDB subgoals in a more efficient way than [36,50]. Fortunately, this improvement in efficiency is also applicable to the CQC method, as is also shown in Section 7.1.2. Finally, Section 7.1.3 discusses uniform-containment-based methods.

7.1.1. The method of Levy, Sagiv and Ullman [36,50]

The algorithm of [50] is an adaptation of the uniform equivalence checking method of [36] for the class of conjunctive queries with negated EDB subgoals. Recall that conjunctive queries are those queries that do not have any literal about IDB predicates in their defining rule bodies. Therefore, this is a simple case of query containment with negation that the CQC method covers broadly.

Nevertheless, the goal of this section is to show, by means of an example, that the performance of the CQC method is as good as that of [50]. In particular, the example will show that both methods generate and test the same EDBs to check whether a query is contained in another one.

Example 7.1. Let $Q_1$ and $Q_2$ be two queries defining the same 2-ary query predicate p:

$Q_1 = \{p(X,Y) \leftarrow a(X,Z) \wedge a(Z,Y) \wedge \neg a(X,Y)\}$
$Q_2 = \{p(X,Y) \leftarrow a(X,Z) \wedge a(Z,Y) \wedge a(Z,W) \wedge \neg a(X,W)\}$

where a fact like a(0,1) is an EDB fact that is true whenever an arc connects 0 with 1.

As stated before, the CQC method is intended to prove that $Q_1 \sqsubseteq Q_2$ is not true by constructing an EDB for which such a relationship does not hold.

Different CQC-derivations starting from $G_0 = p_1(X,Y) \wedge \neg p_2(X,Y)$ and considering all the relevant variable instantiations are partially shown in Fig. 7.1. Note that the CQC method applies the Negation VIP. Since no CQC-derivation ends successfully, $Q_1 \sqsubseteq Q_2$.

$^2$ [42] also addresses query containment under bag semantics, which is a feature that the CQC method does not consider at all.


The CQC-(sub)derivation ST1, which continues with $\neg a(0,0) \wedge \neg p_2(0,0)$ as the goal to attain and $\{a(0,0)\}$ as the EDB constructed by then, is shown in Fig. 7.2. This derivation fails mainly because the content of the EDB itself cannot satisfy $p_1$ even before enforcing $p_2$ to be false, because $\neg a(0,0)$ cannot be made true with $\{a(0,0)\}$. ST2 and ST4 fail in a similar way for the same reason.

The CQC-(sub)derivation ST5, which continues with $\neg a(0,2) \wedge \neg p_2(0,2)$ as the goal to satisfy and $\{a(0,1), a(1,2)\}$ as the EDB constructed by then, is shown in Fig. 7.3. Although an EDB satisfying $p_1$ is constructed, e.g. $p_1(0,2)$ holds on it, the CQC-derivation fails because $p_2(0,2)$ cannot be made false. Note that the derivation attempts to make $p_2(0,2)$ false by adding $a(0,2)$, but such an inclusion would also make $p_1(0,2)$ false. ST3 fails in a similar way for the same reason.

Fig. 7.2. The failed subderivation ST1.

Fig. 7.1. The top level CQC-nodes of a failed CQC-Tree rooted at $p_1(X,Y) \wedge \neg p_2(X,Y)$. [Figure: the root goal is unfolded into $a(X,Z) \wedge a(Z,Y) \wedge \neg a(X,Y) \wedge \neg p_2(X,Y)$ (step 1). Steps 2a and 2b instantiate $X = Z = 0$ with $T = \{a(0,0)\}$ and $X = 0, Z = 1$ with $T = \{a(0,1)\}$. The third-level steps build $T_{3aa} = \{a(0,0)\}$ ($X = Z = Y = 0$), $T_{3ab} = \{a(0,0), a(0,1)\}$ ($X = Z = 0, Y = 1$), $T_{3ba} = \{a(0,1), a(1,0)\}$ ($X = Y = 0, Z = 1$), $T_{3bb} = \{a(0,1), a(1,1)\}$ ($X = 0, Z = Y = 1$) and $T_{3bc} = \{a(0,1), a(1,2)\}$ ($X = 0, Z = 1, Y = 2$), each followed by a failed subderivation.]


Table 7.1 summarizes the steps followed to check that $Q_1 \sqsubseteq Q_2$ holds according to the procedure described in [50]:

Table 7.1
Summary of the steps defined by the algorithm of [50]

| | Step 1: variable partitions | Step 1: canonical databases $CD_i$ | Step 2: $p(X,Y)\sigma_i \in Q_1(CD_i) \Rightarrow p(X,Y)\sigma_i \in Q_2(CD_i)$ | Step 3: extended canonical databases $ECD_i$ s.t. $a(X,Y)\sigma_i \notin ECD_i$ | Step 4: $p(X,Y)\sigma_i \in Q_1(ECD_i) \Rightarrow p(X,Y)\sigma_i \in Q_2(ECD_i)$ |
|---|---|---|---|---|---|
| (1) | {X,Y,Z} | {a(0,0)} | $p(0,0) \notin Q_1(CD_1)$ | - | - |
| (2) | {X,Z}{Y} | {a(0,0),a(0,1)} | $p(0,1) \notin Q_1(CD_2)$ | - | - |
| (3) | {X,Y}{Z} | {a(0,1),a(1,0)} | $p(0,0) \in Q_1(CD_3)$ and $p(0,0) \in Q_2(CD_3)$ | {a(0,1),a(1,0),a(1,1)} | OK: $p(0,0) \in Q_1(ECD_3)$ and $p(0,0) \in Q_2(ECD_3)$ |
| (4) | {X}{Z,Y} | {a(0,1),a(1,1)} | $p(0,1) \notin Q_1(CD_4)$ | - | - |
| (5) | {X}{Z}{Y} | {a(0,1),a(1,2)} | $p(0,2) \in Q_1(CD_5)$ and $p(0,2) \in Q_2(CD_5)$ | {a(0,1),a(1,2),a(0,0),a(1,0),a(1,1),a(2,0),a(2,1),a(2,2)} | OK: $p(0,2) \in Q_1(ECD_5)$ and $p(0,2) \in Q_2(ECD_5)$ |

Fig. 7.3. The failed subderivation ST5.


1. Construct the set of basic canonical EDBs that correspond to all the partitions of the set of variables in $Q_1$. For each variable partition, define a variable substitution $\sigma_i$ that assigns a unique constant to each block of the partition. For each resultant $\sigma_i$ construct a canonical database by applying $\sigma_i$ to the positive atoms of the $Q_1$ rule body. In this example, a canonical EDB is obtained for each one of the five possible partitions of the variables X, Z, and Y from $Q_1$.

2. For each canonical EDB $CD_i$ check that if $Q_1(CD_i)$ contains the frozen head of $Q_1$, $p(X,Y)\sigma_i$, then so does $Q_2(CD_i)$. On $CD_1$, $CD_2$ and $CD_4$, $p(X,Y)\sigma_i$ does not hold for $Q_1$ because the negative literal $\neg a(X,Y)$ becomes false according to $\sigma_i$, i.e. $a(X,Y)\sigma_i$ is true. These three $CD_i$ are discarded for the following steps.

3. For the remaining canonical EDBs, $CD_3$ and $CD_5$, construct the set of extended canonical EDBs, $ECD_3$ and $ECD_5$, by adding to these $CD_i$ other ground facts about a formed from all possible combinations of the constant values in $\sigma_i$, but not those that would make $a(X,Y)\sigma_i$ true. For instance, a(1,1) but not a(0,0) has been added to $ECD_3$.

4. For $ECD_3$ and $ECD_5$, check that if $Q_1(ECD_i)$ contains $p(X,Y)\sigma_i$ then so does $Q_2(ECD_i)$. In this case, it is true for the two extended canonical EDBs and it proves that $Q_1 \sqsubseteq Q_2$ is true (see [36,50] for more details).
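The four steps above can be sketched as a brute-force test. This is a minimal illustration rather than the algorithm of [50] as published: the query encoding (head variables, positive atoms, negated atoms), all function names, and the choice to test every extension of a canonical EDB that avoids the frozen negated atoms of the first query are assumptions of the sketch, and its cost is exponential.

```python
from itertools import combinations, product

def partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def ground(atoms, sub):
    return {(p, tuple(sub[v] for v in args)) for p, args in atoms}

def answers(query, db, consts):
    """Answers of a safe conjunctive query with negation over db."""
    head, pos, neg = query
    variables = sorted({v for _, args in pos for v in args})
    result = set()
    for values in product(consts, repeat=len(variables)):
        sub = dict(zip(variables, values))
        if ground(pos, sub) <= db and not (ground(neg, sub) & db):
            result.add(tuple(sub[v] for v in head))
    return result

def contained(q1, q2):
    head1, pos1, neg1 = q1
    vars1 = sorted({v for _, args in pos1 for v in args})
    preds = {(p, len(args)) for q in (q1, q2) for p, args in q[1] + q[2]}
    for part in partitions(vars1):                       # step 1
        sub = {v: i for i, block in enumerate(part) for v in block}
        consts = range(len(part))
        cd, forbidden = ground(pos1, sub), ground(neg1, sub)
        if cd & forbidden:
            continue                                     # step 2: discard
        frozen_head = tuple(sub[v] for v in head1)
        extra = [(p, args) for p, n in preds
                 for args in product(consts, repeat=n)
                 if (p, args) not in cd and (p, args) not in forbidden]
        for r in range(len(extra) + 1):                  # steps 3 and 4
            for added in combinations(extra, r):
                if frozen_head not in answers(q2, cd | set(added), consts):
                    return False
    return True

# Queries of Example 7.1.
Q1 = (("X", "Y"), [("a", ("X", "Z")), ("a", ("Z", "Y"))],
      [("a", ("X", "Y"))])
Q2 = (("X", "Y"), [("a", ("X", "Z")), ("a", ("Z", "Y")), ("a", ("Z", "W"))],
      [("a", ("X", "W"))])
```

For the queries of Example 7.1, `contained(Q1, Q2)` examines the five partitions of {X, Y, Z}, discards three of them in step 2 and finds no counterexample among the extensions of the other two.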

It is easy to see that there is a clear correspondence between the CQC method and the procedure described in [50] for this example. In particular, looking at Fig. 7.1, it can be noticed that each EDB constructed at the third-level steps of the CQC-derivations corresponds to one of the canonical EDBs built at the first step of Table 7.1. Moreover, CQC-derivations ST1, ST2 and ST4 fail because $\neg a(X,Y)\sigma_i$ is false and it makes $p_1(X,Y)\sigma_i$ false too, as happens when $Q_1$ is evaluated on canonical EDBs $CD_1$, $CD_2$ and $CD_4$ in step 2 in Table 7.1.

In addition, the CQC-derivations for ST3 and ST5 correspond to steps 2 to 4 followed for the canonical EDBs $CD_3$ and $CD_5$, respectively. In particular, both methods use the concept of extended EDBs, but in a slightly different way. The method of [50] extends its canonical EDBs by adding new facts that keep $p_1(X,Y)\sigma_i$ true to check if $p_2(X,Y)\sigma_i$ still holds. In contrast, since the CQC method wants to prove the non-containment relationship, it tries to extend $T$ by adding a new fact that will make $p_2(X,Y)\sigma_i$ false. However, such an addition also makes $p_1(X,Y)\sigma_i$ false, and thus it cannot be performed. Therefore, ST3 and ST5 fail while $p_2(X,Y)\sigma_i$ still holds on $ECD_3$ and $ECD_5$.

The previous comparison illustrates that both the CQC method and the algorithm of [50] achieve the same results for conjunctive queries with negated EDB atoms, but their strategies are different. In the CQC method, EDB construction and validation can be interleaved. In some cases, it might be unnecessary to generate the whole set of complete canonical EDBs to prove either containment or non-containment. Instead, the method of [50] first builds all the canonical EDBs and then tests whether each of them satisfies the containment relationship.

Finally, the approach of [50] can be easily extended to consider order predicates in the rule bodies of conjunctive queries over the two types of interpretations, dense or discrete. In this case, the canonical databases that would be built in step 1 of the algorithm should take into account every possible total ordering of variables appearing in $Q_1$. Again, the CQC method not only covers this class of queries but also constructs similar (canonical) EDBs.


7.1.2. The method of Wei and Lausen [53]

Recently, a new method for conjunctive query containment with safe negated EDB atoms has been published [53]. The algorithm proposed there is aimed at proving either $Q_1 \not\sqsubseteq Q_2$ or $Q_1 \sqsubseteq Q_2$ without necessarily generating the complete set of canonical EDBs that the method of [36,50] needs to construct. As shown in the previous section, the CQC method uses the Negation VIP to construct those canonical EDBs and it must recreate them all when $Q_1 \sqsubseteq Q_2$ actually holds. In contrast, when $Q_1 \not\sqsubseteq Q_2$ holds, the CQC method can construct the counterexample without generating all the EDBs required by [36,50].

The results obtained in [53] allow defining an algorithm that extends the method of Containment Mappings, reviewed in Section 3.4, to the more general case of conjunctive queries with negated EDB atoms. In this section, instead of analyzing the concrete algorithm proposed in [53], the theoretical results on which this algorithm is based will be presented and then applied to the CQC method. More specifically, it will be shown that the Simple VIP can replace the Negation VIP when using the CQC method to check QC for conjunctive queries with negated EDB atoms, without any loss of completeness. This would explain why in most cases there is no need to generate all the canonical EDBs that the CQC method with the Negation VIP and [36,50] would generate to prove $Q_1 \sqsubseteq Q_2$.

Let $Q_1$ and $Q_2$ be two conjunctive queries with negated EDB atoms:

$Q_1 = \{q(\bar{X}) \leftarrow p_1(\bar{X}_1) \wedge \cdots \wedge p_n(\bar{X}_n) \wedge \neg s_1(\bar{Y}_1) \wedge \cdots \wedge \neg s_m(\bar{Y}_m)\}$

$Q_2 = \{q(\bar{U}) \leftarrow r_1(\bar{U}_1) \wedge \cdots \wedge r_h(\bar{U}_h) \wedge \neg t_1(\bar{W}_1) \wedge \cdots \wedge \neg t_k(\bar{W}_k)\}$

According to [53, Theorem 2], $Q_1 \sqsubseteq Q_2$ if and only if the following two conditions are satisfied:

1. There is a containment mapping $\rho$ from $Q_2^+$ to $Q_1^+$ such that $Q_1^+ \sqsubseteq Q_2^+$, where

$Q_1^+ = \{q(\bar{X}) \leftarrow p_1(\bar{X}_1) \wedge \cdots \wedge p_n(\bar{X}_n)\}$

$Q_2^+ = \{q(\bar{U}) \leftarrow r_1(\bar{U}_1) \wedge \cdots \wedge r_h(\bar{U}_h)\}$

2. For each $j$, $1 \le j \le k$, $P_j \sqsubseteq Q_2$ holds, where

$P_j = \{q(\bar{X}) \leftarrow p_1(\bar{X}_1) \wedge \cdots \wedge p_n(\bar{X}_n) \wedge \rho(t_j(\bar{W}_j)) \wedge \neg s_1(\bar{Y}_1) \wedge \cdots \wedge \neg s_m(\bar{Y}_m)\}$

Notice that this result has an intrinsic recursive structure: each test for $P_j \sqsubseteq Q_2$ may require re-evaluating the two conditions just defined. There are two base cases that stop recursion. The first one occurs when $P_j$ is unsatisfiable since $\rho(t_j(\bar{W}_j)) = s_i(\bar{Y}_i)$, for some $i$, $1 \le i \le m$. Consequently, $P_j \sqsubseteq Q_2$ holds trivially. The second base case occurs when $P_j \not\sqsubseteq Q_2$ since for each $\pi_i$ being a containment mapping from $Q_2^+$ to $P_j^+$ there exists at least one $g$, $1 \le g \le k$, such that $\pi_i(t_g(\bar{W}_g)) \in \{p_1(\bar{X}_1), \ldots, p_n(\bar{X}_n), \rho(t_j(\bar{W}_j))\}$ [53, Theorem 1]. When the latter occurs, $Q_1 \not\sqsubseteq Q_2$.

In the previous section, Example 7.1 helped to show that the CQC method with the Negation VIP operates similarly to the method of [36,50] when checking conjunctive query containment with negated EDB atoms. Now, the same example will help to grasp these new results from [53] as well as to show how they are applied inherently by the CQC method with the Simple VIP.


Example 7.2. Recall $Q_1$ and $Q_2$, two queries defining the same 2-ary query predicate p:

$Q_1 = \{p(X,Y) \leftarrow a(X,Z) \wedge a(Z,Y) \wedge \neg a(X,Y)\}$
$Q_2 = \{p(X,Y) \leftarrow a(X,Z) \wedge a(Z,Y) \wedge a(Z,W) \wedge \neg a(X,W)\}$

where a is, of course, an EDB relation.

In this example, there is just one containment mapping from $Q_2^+$ to $Q_1^+$, which proves $Q_1^+ \sqsubseteq Q_2^+$:

$\rho = \{X_{Q_2} \rightarrow X_{Q_1},\; Y_{Q_2} \rightarrow Y_{Q_1},\; Z_{Q_2} \rightarrow Z_{Q_1},\; W_{Q_2} \rightarrow Y_{Q_1}\}$

According to [53, Theorem 2] only one of all the possible containment mappings suffices to accomplish the second condition of the theorem. Obviously, any algorithm claiming completeness must systematically test all the containment mappings before concluding that none is the "elected" one. Fortunately, in this example, there are not many alternatives to explore, just one. Moreover, since $Q_2$ contains only one negated atom, $\neg a(X,W)$, only one new conjunctive query $P_1$ needs to be generated:

$P_1 = \{p(X,Y) \leftarrow a(X,Z) \wedge a(Z,Y) \wedge a(X,Y) \wedge \neg a(X,Y)\}$

where $a(X_{Q_1}, Y_{Q_1})$ comes from $\rho(a(X_{Q_2}, W_{Q_2}))$.

Clearly, $P_1$ is unsatisfiable since it contains both $a(X,Y)$ and $\neg a(X,Y)$ and, thus, it computes no answer in any database. Consequently, $P_1 \sqsubseteq Q_2$, so $Q_1 \sqsubseteq Q_2$.
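The two observations of this example, the uniqueness of the containment mapping and the unsatisfiability of $P_1$, can be reproduced with a brute-force search. The sketch below is an illustration only and not the algorithm of [53]: the function name and the encoding of atoms as (predicate, argument tuple) pairs are assumptions, and only the first base case of the recursion is checked.

```python
from itertools import product

def containment_mappings(src, dst):
    """All mappings from the variables of src to those of dst that send the
    head of src to the head of dst and each positive atom of src to a
    positive atom of dst. Queries are given as (head_vars, positive_atoms)."""
    (head_s, pos_s), (head_d, pos_d) = src, dst
    variables = sorted({v for _, args in pos_s for v in args} | set(head_s))
    targets = sorted({v for _, args in pos_d for v in args} | set(head_d))
    found = []
    for image in product(targets, repeat=len(variables)):
        rho = dict(zip(variables, image))
        if tuple(rho[v] for v in head_s) != tuple(head_d):
            continue
        if all((p, tuple(rho[v] for v in args)) in pos_d for p, args in pos_s):
            found.append(rho)
    return found

# Positive and negated parts of Q1 and Q2 from Example 7.2.
q1_pos = [("a", ("X", "Z")), ("a", ("Z", "Y"))]
q1_neg = [("a", ("X", "Y"))]
q2_pos = [("a", ("X", "Z")), ("a", ("Z", "Y")), ("a", ("Z", "W"))]
q2_neg = [("a", ("X", "W"))]

maps = containment_mappings((("X", "Y"), q2_pos), (("X", "Y"), q1_pos))
rho = maps[0]

# Atom added to the body of Q1 to obtain P1: the image of Q2's negated atom.
added = [(p, tuple(rho[v] for v in args)) for p, args in q2_neg]

# P1 is unsatisfiable when the added atom is also negated in Q1.
p1_unsat = any(atom in q1_neg for atom in added)
```

The search returns a single mapping, which sends W to Y and fixes the other variables, and the atom it adds to the body of $Q_1$ is exactly the atom negated there.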

Now, consider again the CQC-derivation depicted partially in Fig. 7.1, as the rightmost branch in the CQC-Tree sketched there and then concluded in Fig. 7.3. This derivation introduces a new constant each time that a distinct variable needs to be instantiated, so it can be thought of as implementing the Simple VIP, in the same way that the CQC-derivation shown in Fig. 3.4, Section 3.4, did. As was already pointed out there, constructing and then testing a canonical EDB with the Simple VIP is an indirect method to find containment mappings.

In this way, steps 2b and 3bc in the CQC-tree shown in Fig. 7.1 construct a canonical EDB for $Q_1^+$ according to the Simple VIP: $\{a(0,1), a(1,2)\}$. Conversely, steps 8bc, 9bc and 10bc in Fig. 7.3 successfully match the atoms in $Q_2^+$ with the constructed EDB, so the containment mapping $\rho$ can be derived straightforwardly: $\{X_{Q_2} \rightarrow 0 \rightarrow X_{Q_1},\; Y_{Q_2} \rightarrow 2 \rightarrow Y_{Q_1},\; Z_{Q_2} \rightarrow 1 \rightarrow Z_{Q_1},\; W_{Q_2} \rightarrow 2 \rightarrow Y_{Q_1}\}$.

The second part of the test, that is, whether $P_1 \sqsubseteq Q_2$ holds, can also be tracked easily on the CQC-derivation in Fig. 7.3. The generation itself of the new query $P_1$ by adding $\rho(a(X_{Q_2}, W_{Q_2})) = a(X_{Q_1}, Y_{Q_1})$ to the body of $Q_1$ has its "CQC-counterpart" in the addition of $a(0,2) = a(X_{Q_2}, W_{Q_2})\{X_{Q_2} \backslash 0, W_{Q_2} \backslash 2\}$ to the goal part, in step 11bc, which was previously "inhabited" by the atoms coming from the body of $Q_1$. The unsatisfiability of $P_1$ is detected as soon as $a(0,2)$ is added to the EDB and condition $C_1$ is triggered and evaluated. Notice that the EDB constructed by then is nothing but the "frozen" body of $P_1^+$. Moreover, if $P_1$ had not been unsatisfiable then the triggering and later evaluation of condition $C_2 = a(0,Z) \wedge a(Z,2) \wedge a(Z,W) \wedge \neg a(X,W)$ would have determined whether there existed a containment mapping from $Q_2^+$ to $P_1^+$ in order to prove $P_1^+ \sqsubseteq Q_2^+$ as a first step towards proving $P_1 \sqsubseteq Q_2$.

7.1.3. Uniform query containment with negated IDB subgoals

In contrast to the previous methods aimed at query containment checking, other research works tackle the general class of datalog queries with safe negated IDB and built-in subgoals from a different approach: they check uniform query containment instead of "true" query containment.


The notion of uniform query containment was introduced in [45] as an alternative concept to query containment and it was proved decidable for datalog queries. Let $S = (DR, IC)$ be the database schema and $Q_1$ and $Q_2$ be two queries defining the same n-ary query predicate $q$. $Q_1$ is uniformly contained in $Q_2$, written $Q_1 \sqsubseteq_u Q_2$, if, for every finite set $I$ of ground facts, it holds that

$$\{q(\bar{a}_i) \mid q(\bar{a}_i) \in (Q_1 \cup DR)(I)\} \subseteq \{q(\bar{a}_j) \mid q(\bar{a}_j) \in (Q_2 \cup DR)(I)\}$$

In this case, $I$ is an arbitrary set of ground facts about EDB and derived (query or IDB) predicates. It is important to note that, in contrast to "true" QC, derived facts in $I$ are independent from and may not be related to the ones computed by applying the rules in $DR$ (and/or the ones from the queries) on the EDB facts only.

In [36], a method to check Uniform Query Equivalence (that is, whether $Q_1 \sqsubseteq_u Q_2$ and $Q_2 \sqsubseteq_u Q_1$ hold at the same time) for queries with stratified negation is provided. $Q_2$ and $Q_1$ are distinct programs, i.e. sets of deductive rules, which contain not only the rules defining the queries but also the derived predicate definitions that those rules need.

The algorithm exploits the stratified structure of the queries that it considers. A (derived) predicate belongs to a stratum $S_i$ if its defining rules contain positive literals about predicates belonging to strata $S_j$, $j \le i$, and/or negative literals about predicates belonging to strata $S_k$, $k < i$. The lowest stratum, $S_0$, corresponds to the EDB predicate level. The highest one corresponds to the level of the query predicate. $Q_1 \sqsubseteq_u Q_2$ if and only if the derived predicates used to define $Q_1$ contain the ones used to define $Q_2$ for every stratum $S_i$. For this reason, the algorithm requires the queries to have the same predicates at each stratum.

The containment of the derived predicates belonging to a certain stratum is checked according to the approach of the canonical databases already explained in Section 7.1.1 [50]. In this way, suppose that $p_i$ is a (derived) predicate in $Q_1$ and $Q_2$ belonging to the same stratum $S_i$ in both queries. Let $P_1^i$ and $P_2^i$ be the subsets of deductive rules that define $p_i$ in $Q_1$ and $Q_2$, respectively. To check whether $P_1^i$ contains $P_2^i$ at stratum $S_i$, $P_1^i \sqsubseteq_i P_2^i$, the method described in [50] is applied. Accordingly, $P_1^i$ and $P_2^i$ are handled like conjunctive queries and the canonical databases that are built for $P_1^i$ contain facts about derived predicates belonging to the same or a lower stratum.

The method proposed in [47] also checks uniform QC for queries with safe stratified negation. This method is more efficient than the one of [36], but it is incomplete since there are cases where it reports $Q_1 \not\sqsubseteq_u Q_2$ when $Q_1 \sqsubseteq_u Q_2$ actually holds.

The main difference between the CQC method and the methods of [36,47] is that they check a different property. The methods of [36,47] do not check containment but uniform containment. As pointed out in [45], uniform QC provides a sufficient but not necessary condition for QC. If they prove, for a given pair of queries $Q_1$ and $Q_2$, that $Q_1 \sqsubseteq_u Q_2$, then $Q_1 \sqsubseteq Q_2$ also holds. In contrast, if the result is that $Q_1 \not\sqsubseteq_u Q_2$, then nothing can be said about whether or not $Q_1 \sqsubseteq Q_2$ holds.

Next, Example 7.3 will show that checking uniform containment instead of checking "true" containment does not help much even in simple cases of queries with negation.

Example 7.3. Review again the examples used in Sections 3.1 and 3.2. In Section 3.1, $Q_1 \not\sqsubseteq Q_2$ was proved because a CQC-derivation constructed an EDB where such a relationship was true.

A uniform-containment based method, either [36] or [47], endeavors to demonstrate that $Q_1 \sqsubseteq_u Q_2$ holds in order to prove that $Q_1 \sqsubseteq Q_2$ is true. However, the fact is that $Q_1 \sqsubseteq_u Q_2$ does not hold in this example. For example, consider $I$ = {emp(ann), boss(ann)}, according to the definition of uniform containment that allows $I$ to contain also ground facts about derived predicates. The answers for each query on $I$ are computed as follows.

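$(Q_1 \cup DR)(I) = \{\mathit{emp}(ann), \mathit{boss}(ann), \mathit{sub}(ann)\}$ is obtained by applying

$DR = \{\mathit{boss}(X) \leftarrow \mathit{worksFor}(Z,X),\;\; \mathit{chief}(X) \leftarrow \mathit{worksFor}(Y,X) \wedge \mathit{boss}(Y)\}$ and

$Q_1: \mathit{sub}(X) \leftarrow \mathit{emp}(X) \wedge \neg \mathit{chief}(X)$

so the answer to $Q_1$ on $I$ is {sub(ann)}. Note that the single rule from $Q_1$ produces the fact sub(ann) because chief(ann) does not appear in $I$.

$(Q_2 \cup DR)(I) = \{\mathit{emp}(ann), \mathit{boss}(ann)\}$ is obtained by applying $DR$ and

$Q_2: \mathit{sub}(X) \leftarrow \mathit{emp}(X) \wedge \neg \mathit{boss}(X)$

so the answer to $Q_2$ on $I$ is $\emptyset$. Note that here the fact boss(ann) in $I$ does not allow the query rule from $Q_2$ to produce sub(ann).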

Therefore, in this example, any uniform containment based method fails to prove that $Q_1 \sqsubseteq_u Q_2$ and, thus, it would not be able to show that $Q_1 \not\sqsubseteq Q_2$ actually holds. $Q_2 \not\sqsubseteq_u Q_1$ can also be proved by using $I'$ = {emp(mary), chief(mary)}. However, $Q_2 \sqsubseteq Q_1$ holds, as explained in Section 3.1. Therefore, the methods that check uniform containment are not a good alternative to the CQC method.
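The computation in Example 7.3 can be replayed mechanically. The following Python sketch is only illustrative: the function name `answers` and the encoding of facts as (predicate, arguments) pairs are assumptions made here, not part of the CQC method or of the methods of [36,47]. It applies DR and one query rule to an instance that may already contain derived facts, as the definition of uniform containment allows.

```python
def answers(instance, negated_pred):
    """Answers to sub(X) <- emp(X) and not negated_pred(X), evaluated after
    applying DR to `instance`. Facts are (predicate, args) pairs; the
    instance may mix EDB facts and derived facts (hypothetical encoding)."""
    facts = set(instance)
    works_for = {args for pred, args in facts if pred == "worksFor"}
    # boss(X) <- worksFor(Z, X)
    facts |= {("boss", (x,)) for (_z, x) in works_for}
    # chief(X) <- worksFor(Y, X) and boss(Y)
    facts |= {("chief", (x,)) for (y, x) in works_for if ("boss", (y,)) in facts}
    # The query rule is evaluated last because it negates a lower-stratum predicate.
    emps = {args[0] for pred, args in facts if pred == "emp"}
    derived = {x for x in emps if (negated_pred, (x,)) not in facts}
    given = {args[0] for pred, args in facts if pred == "sub"}
    return derived | given


I = {("emp", ("ann",)), ("boss", ("ann",))}
q1_on_I = answers(I, "chief")            # Q1 negates chief
q2_on_I = answers(I, "boss")             # Q2 negates boss

I_prime = {("emp", ("mary",)), ("chief", ("mary",))}
q1_on_I_prime = answers(I_prime, "chief")
q2_on_I_prime = answers(I_prime, "boss")
```

Under these assumptions, `q1_on_I` contains ann while `q2_on_I` is empty, and the roles are reversed on `I_prime`, which matches the two failures of uniform containment discussed in the example.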

7.2. Methods for checking query containment under constraints

The CQC method handles integrity constraints, defined in the denial form of conditions, in a natural way by incorporating them in the initial set of conditions to enforce in every constraint-aware CQC derivation. Section 7.1 above points out that the queries considered by the CQC method belong to a large superset of the cases covered by the previous related work in query containment without constraints. When addressing query containment under constraints, the contribution of the CQC method is fourfold:

1. The query classes considered by the CQC method are larger.

2. The constraint classes considered by the CQC method are larger too, since they admit the same syntactic features that are found in query and IDB definitions. In this way, integrity constraints may contain negated atoms about IDB and EDB predicates, equalities, inequalities and order comparisons, allowing them to capture and extend the class of integrity constraints covered in previous methods.

3. The CQC method checks "true" containment, instead of uniform containment.

4. The CQC method checks query containment in the presence or the absence of integrity constraints in a uniform way, without needing to add any extra processing to check query containment under constraints. Indeed, the CQC method is the same in both cases and the difference between considering or not considering the integrity constraints is expressed in terms of the content of the initial set of conditions to enforce.


In general, the methods that have been proposed to check query containment under constraints can be classified into three main groups:

1. Methods that check query containment for conjunctive queries under dependencies expressed as:
   • functional dependencies [2],
   • functional and inclusion dependencies [30],
   • disjunctive datalog constraints [19],
   • implication and referential constraints [56],
   • implication and disjunctive referential constraints [52],
   • disjunctive constrained tuple-generating dependencies [54].

2. Methods that check uniform query containment for plain datalog queries under dependencies expressed as:
   • (full and embedded) tuple generating dependencies [45],
   • (full and embedded) tuple generating dependencies and equality generating dependencies [47].

3. Methods that check query containment for conjunctive queries under integrity constraints expressed as:
   • object taxonomies [10,38],
   • assertions in Description Logics [4,9,29,34].

Among the methods belonging to the first group above, the one defined in [54] is the method that handles the broadest class of queries and integrity constraints. Section 7.2.1 reviews this method in more detail and shows that the setting considered by the CQC method clearly subsumes the queries and integrity constraints addressed in [54].

Regarding the methods belonging to the second group above [45,47], it is true that they address queries with recursively-defined IDB. However, it should be pointed out that (1) such queries allow neither negation nor built-in comparisons; (2) the class of integrity constraints that those methods consider is largely subsumed by the one considered by the CQC method (see Section 7.2.1 below); and (3), most importantly, those methods do not check containment but uniform containment. Section 7.1.3 above has already highlighted the drawbacks arising from this latter approach.

The issue of Description Logics and its relationship with the problem of query containment is addressed in Section 7.3 below. In particular, Section 7.3.2 reviews the methods of [9,29], which check query containment for plain conjunctive queries under integrity constraints expressed in terms of a Description Logic language called DLR. The setting addressed in [9,29] clearly subsumes the ones considered by the other methods of that group [4,10,34,38].

7.2.1. The method of Wang, Topor and Maher [54]

The so-called disjunctive constrained tuple-generating dependencies (dctgd's) considered by [54] define a class of integrity constraints that generalizes other classes of dependencies addressed by previous research. In particular, as pointed out by [1], almost all dependencies (functional, multivalued, join and inclusion) studied in the framework of the relational model can be expressed using dctgd's. However, Section 7.2.1.1 below shows that dctgd's can be translated into conditions that the CQC method can process, whereas the converse does not hold.

To check conjunctive query containment under dctgd's, [54] proposes not one method but two. Both are reviewed in Section 7.2.1.2.

7.2.1.1. Disjunctive constrained tuple-generating dependencies. Dctgd's are first-order-logic sentences that have the general form

$$\forall x_1 \cdots \forall x_n \left[ \varphi(x_1,\ldots,x_n) \rightarrow \bigvee_{i=1}^{h} \left[ \exists y_1 \ldots \exists y_{m_i}\; \psi_i(x_1,\ldots,x_{k_i},y_1,\ldots,y_{m_i}) \right] \right] \qquad (7.1)$$

where $k_i \le n$ and $\varphi$ and $\psi_i$ are conjunctions of atoms, which are either EDB atoms or built-in atoms. The latter may express equality, inequality or dense order comparisons. $\varphi$ is also called the antecedent and each $\psi_i$ is a consequent.

In particular, the results given in [54] are applicable only to regular dctgd's. A dctgd is regular if (1) $\varphi$ contains at least one EDB atom and (2) the variables occurring in the built-in atoms of $\varphi$, if any, are a subset of the variables in the EDB atoms of $\varphi$.

Ref. [39] defines a procedure that allows arbitrary first-order-logic sentences to be expressed as conditions in denial form. According to that procedure, dctgd's can be reformulated in terms of conditions having the general form

$$\leftarrow \varphi(X_1,\ldots,X_n) \wedge \neg aux_1(X_1,\ldots,X_{k_1}) \wedge \cdots \wedge \neg aux_h(X_1,\ldots,X_{k_h})$$

where each $aux_i$, $i \le h$, is a new (auxiliary) IDB predicate defined by a deductive rule:

$$aux_i(X_1,\ldots,X_{k_i}) \leftarrow \psi_i(X_1,\ldots,X_{k_i},Y_1,\ldots,Y_{m_i})$$

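For instance, a dctgd such as

$$\forall x \forall y\, [\mathit{worksFor}(x,y) \rightarrow [\exists s \exists t\; \mathit{sal}(x,s) \wedge \mathit{sal}(y,t) \wedge s < t] \vee [\exists w\; \mathit{worksFor}(y,w)]]$$

is equivalent to the condition

$$\leftarrow \mathit{worksFor}(X,Y) \wedge \neg aux_1(X,Y) \wedge \neg aux_2(Y)$$

where $aux_1$ and $aux_2$ are defined, respectively, by

$$aux_1(X,Y) \leftarrow \mathit{sal}(X,S) \wedge \mathit{sal}(Y,T) \wedge S < T$$

$$aux_2(X) \leftarrow \mathit{worksFor}(X,Z)$$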
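The shape of this translation can be sketched in a few lines of Python. The function `dctgd_to_condition`, its argument layout and the textual syntax (`<-`, `&`, `not`) are hypothetical choices made for illustration only; the sketch merely mirrors the pattern described above (one auxiliary predicate per consequent, negated in the condition) and is not the procedure of [39].

```python
def dctgd_to_condition(antecedent, consequents):
    """Translate a dctgd into a denial-form condition plus auxiliary rules.

    antecedent  -- list of literals (strings) forming the antecedent
    consequents -- list of (shared_vars, literals) pairs, one per disjunct;
                   shared_vars are the universally quantified variables
                   used by that disjunct (hypothetical encoding)
    """
    rules, negated = [], []
    for i, (shared_vars, literals) in enumerate(consequents, start=1):
        head = f"aux{i}({', '.join(shared_vars)})"
        # Each consequent becomes the body of a new auxiliary IDB rule.
        rules.append(f"{head} <- {' & '.join(literals)}")
        negated.append(f"not {head}")
    # The condition states that no match of the antecedent may falsify every consequent.
    condition = "<- " + " & ".join(antecedent + negated)
    return condition, rules


condition, rules = dctgd_to_condition(
    ["worksFor(X, Y)"],
    [(["X", "Y"], ["sal(X, S)", "sal(Y, T)", "S < T"]),
     (["Y"], ["worksFor(Y, W)"])],
)
```

For the worksFor dependency this yields the condition and the two auxiliary rules of the example, up to the renaming of variables in the rule for the second auxiliary predicate.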

Note that disallowing negated EDB atoms in the conjunct $\varphi$ of a (7.1)-shaped dctgd is not a problem, since they can be expressed as single-positive-atom disjuncts on the other side of the implication. However, the following first-order-logic sentence, for instance, cannot be expressed as a dctgd:

$$\forall x \forall y\, [\mathit{worksFor}(x,y) \rightarrow \exists w\; \mathit{worksFor}(y,w) \wedge \neg \mathit{emp}(w)]$$

since disjunctions are not allowed in $\varphi$. In contrast, this sentence can be straightforwardly translated into an equivalent condition.


Moreover, if dctgd's are required to be regular, then this restriction is equivalent to forcing conditions to contain at least one positive EDB subgoal. This latter restriction is not demanded when using the CQC method to check query containment under constraints.

In general, what makes dctgd's less expressive than the conditions addressed by the CQC method is that (7.1)-shaped dctgd's do not allow IDB atoms to occur in either $\varphi$ or $\psi_i$.

7.2.1.2. Two methods to check conjunctive query containment under dctgd's. Let $Q_1$ and $Q_2$ be two conjunctive queries, possibly having built-in subgoals but not negated EDB subgoals, which define the same n-ary query predicate $q$. Let also $IC$ be a set of regular dctgd's. [54] proposes two possible methods to check whether $Q_1 \sqsubseteq_{IC} Q_2$ holds.

The first method proceeds in two phases. The first phase expands $Q_1$ according to $IC$, in order to obtain a new query, $Q' = Expand(Q_1, IC)$, which is semantically equivalent to $Q_1$. Roughly, the expansion procedure "enlarges" the query body of $Q_1$ with the consequent of a dctgd whose antecedent matches the body of the query. Since there may be more than one consequent in a dctgd, the expansion procedure of the given query may result in a union of expanded queries. Moreover, the expansion procedure is recursive: new expansions are performed on the queries obtained from the expansion process. This process ends when new different matches between dctgd antecedents and query bodies are no longer identified.

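Consider the following query and dctgd, for the sake of an example:

$$Q_1 = \{q(X) \leftarrow \mathit{worksFor}(X,Y)\}$$

$$IC = \{\forall x \forall y\, [\mathit{worksFor}(x,y) \rightarrow \mathit{emp}(x) \vee \mathit{emp}(y)]\}$$

then $Q' = Expand(Q_1, IC)$ is

$$\{q(X) \leftarrow \mathit{worksFor}(X,Y) \wedge \mathit{emp}(X),\;\; q(X) \leftarrow \mathit{worksFor}(X,Y) \wedge \mathit{emp}(Y)\}$$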

Note that the antecedent of the dctgd in $IC$ also matches the body of the two deductive rules in $Q'$, but such a mapping is not new: it is the same mapping that held between that dctgd antecedent and the former single deductive rule of $Q_1$.
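A single expansion round of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the Expand procedure of [54]: it handles one rule body and one dctgd without built-in atoms, it does not rename existentially quantified variables apart, and it does not iterate until no new match exists. The names `matches` and `expand_once` and the tuple encoding of atoms are hypothetical.

```python
def matches(antecedent, body):
    """Yield every substitution that maps all antecedent atoms into `body`.
    Atoms are (predicate, args) pairs; dctgd variables are lowercase strings."""
    def extend(subst, atoms):
        if not atoms:
            yield subst
            return
        (pred, args), rest = atoms[0], atoms[1:]
        for body_pred, body_args in body:
            if body_pred != pred or len(body_args) != len(args):
                continue
            candidate = dict(subst)
            # setdefault binds an unbound variable or returns its earlier binding.
            if all(candidate.setdefault(v, t) == t for v, t in zip(args, body_args)):
                yield from extend(candidate, rest)
    yield from extend({}, list(antecedent))


def expand_once(body, antecedent, consequents):
    """One expansion round: one new body per match and per consequent."""
    expanded = []
    for subst in matches(antecedent, body):
        for consequent in consequents:
            extra = [(p, tuple(subst.get(v, v) for v in args)) for p, args in consequent]
            expanded.append(body + [atom for atom in extra if atom not in body])
    return expanded


body_q1 = [("worksFor", ("X", "Y"))]
new_bodies = expand_once(
    body_q1,
    antecedent=[("worksFor", ("x", "y"))],
    consequents=[[("emp", ("x",))], [("emp", ("y",))]],
)
```

Applied to the worksFor example, the sketch produces the two rule bodies of the expanded query, one for each disjunct of the dependency.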

Ref. [54, Theorem 1] states that $Q_1 \sqsubseteq_{IC} Q_2$ if and only if $\{R_i\} \sqsubseteq Q_2$ for each deductive rule $R_i \in Expand(Q_1, IC)$. Therefore, the second phase of the first method proposed in [54] consists in performing a containment test for each $R_i \in Expand(Q_1, IC)$ to verify whether $\{R_i\} \sqsubseteq Q_2$ actually holds. No new algorithm for checking containment for conjunctive queries with built-in subgoals is defined in [54], so it is necessary to select and use one of the several methods proposed in the literature to perform such containment tests [31,36,42,50,55], including, of course, the CQC method.

The procedure Expand defined in [54] may derive an infinite set of new deductive rules and, hence, it does not terminate when this happens. The cause is the same one that may make the CQC method build infinite EDBs: the presence of axioms of infinity [6]. For this reason, [54] identifies some restricted subclasses of dctgd's for which termination always holds:


• Non-recursive regular dctgd's. A set of regular dctgd's $IC$ is said to be recursive if there is a cycle in the graph defined by connecting each pair of dctgd's in $IC$ such that their respective consequent and antecedent have some EDB predicates in common.

• Full dependencies. A dctgd is said to be a full dependency if (1) it is regular and (2) every variable occurring in an EDB subgoal of a consequent appears also in an EDB subgoal of the antecedent.

The two restrictions that define these subclasses suffice to guarantee termination but they are not necessary in general. Moreover, if procedure Expand does not terminate then it does not follow necessarily that there exists no finite EDB proving that $Q_1 \not\sqsubseteq_{IC} Q_2$, as shown in Example 7.4.

This first method is quite similar to the ones defined in [52,56] to check conjunctive query containment under constraints that are less expressive than dctgd's. In the case of [52,56], the integrity constraints handled there must be acyclic, that is, non-recursive, to ensure termination of the expansion procedure.

The second method proposed in [54] to verify whether $Q_1 \sqsubseteq_{IC} Q_2$ holds is an alternative to its first one. Indeed, this other method, called Ctest, consists in combining in a single procedure the two distinct phases of the first method. In this way, for each new deductive rule $R_i$ obtained in an intermediate expansion round it is checked whether $\{R_i\} \sqsubseteq Q_2$ holds before performing the next expansion round and

1. if $\{R_i\} \sqsubseteq Q_2$ actually holds, then $R_i$ will not be included in the next expansion round;

2. if $\{R_i\} \not\sqsubseteq Q_2$ and there is no new mapping between the antecedent of a dctgd in $IC$ and the body of $R_i$, then the procedure ends just then, reporting $Q_1 \not\sqsubseteq_{IC} Q_2$;

3. otherwise, that is, $\{R_i\} \not\sqsubseteq Q_2$ and such a new mapping exists, $R_i$ is included in the next expansion round.

Therefore, Ctest may terminate reporting either $Q_1 \not\sqsubseteq_{IC} Q_2$ or $Q_1 \sqsubseteq_{IC} Q_2$ without needing to derive every deductive rule belonging to $Expand(Q_1, IC)$. In the case of $Q_1 \sqsubseteq_{IC} Q_2$, this happens when no intermediate deductive rule remains to be expanded after successively discarding those rules $R_i$ such that $\{R_i\} \sqsubseteq Q_2$. Consequently, in some cases, Ctest may terminate even when procedure Expand does not, as Example 7.4 shows.

Example 7.4. This example has been taken from [54]. Let $Q_1$ and $Q_2$ be two queries defining the same 1-ary query predicate $q$:

$$Q_1 = \{q(X) \leftarrow p(X)\}$$

$$Q_2 = \{q(X) \leftarrow p(X) \wedge r(X,Y)\}$$

where $p$ and $r$ are EDB relations. Consider also the following set of dctgd's $IC$:

$$IC = \{\forall x\, [p(x) \wedge x > 0 \rightarrow \exists y\; r(x,y)],\;\; \forall x \forall y\, [r(x,y) \wedge y > 0 \rightarrow p(y)]\}$$


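According to these definitions, procedure Expand does not terminate for $Q_1$ and $IC$:

$$\begin{aligned}
Expand_1(Q_1, IC) = \{\; & q(X) \leftarrow p(X) \wedge X \le 0 \qquad [R_1] \\
& q(X) \leftarrow p(X) \wedge r(X,Y) \;\}
\end{aligned}$$

$$\begin{aligned}
Expand_2(Q_1, IC) = \{\; & q(X) \leftarrow p(X) \wedge X \le 0 \\
& q(X) \leftarrow p(X) \wedge r(X,Y) \wedge Y \le 0 \\
& q(X) \leftarrow p(X) \wedge r(X,Y) \wedge p(Y) \;\}
\end{aligned}$$

$$\begin{aligned}
Expand_3(Q_1, IC) = \{\; & q(X) \leftarrow p(X) \wedge X \le 0 \\
& q(X) \leftarrow p(X) \wedge r(X,Y) \wedge Y \le 0 \\
& q(X) \leftarrow p(X) \wedge r(X,Y) \wedge p(Y) \wedge Y \le 0 \\
& q(X) \leftarrow p(X) \wedge r(X,Y) \wedge p(Y) \wedge r(Y,Z) \;\}
\end{aligned}$$

etc.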

However, after performing the first expansion round, $Expand_1(Q_1, IC)$, procedure Ctest checks whether $\{R_1\} \sqsubseteq Q_2$ holds using one of the available containment checkers proposed in the literature to deal with conjunctive queries with built-in subgoals. Clearly, $\{R_1\} \not\sqsubseteq Q_2$ in this case. Since there is no new match between the antecedents of the two dctgd's in $IC$ and the body of $R_1$, Ctest terminates reporting $Q_1 \not\sqsubseteq_{IC} Q_2$, without needing to perform the next expansion round, namely $Expand_2(Q_1, IC)$.

From $\{R_1\} \not\sqsubseteq Q_2$, it follows that {p(0)}, for instance, is clearly a finite EDB that satisfies $IC$ and makes $q_1(0)$ true whereas $q_2(0)$ does not hold. Fig. 7.4 below shows a CQC derivation that constructs such an EDB. Beforehand, $IC$ must be rewritten into the appropriate form

$$IC' = \{\leftarrow p(X) \wedge X > 0 \wedge \neg aux(X),\;\; \leftarrow r(X,Y) \wedge Y > 0 \wedge \neg p(Y)\}$$

where $aux$ is a new IDB predicate whose definition is

$$aux(X) \leftarrow r(X,Y)$$

The CQC derivation shown in Fig. 7.4 has $K_0 = \{0\}$ as the initial set of constants to take into account, since 0 occurs in $IC'$. The VIP to be considered is either the Discrete Order VIP or the Dense Order VIP.

In step 2, there are three possible choices to instantiate the selected EDB atom p(X), namely $X = c < 0$, $X = 0$ and $X = d > 0$. The second option, $X = 0$, is selected in the depicted CQC-derivation, leading to the successful EDB {p(0)}. The first option, $X = c < 0$, would also quickly yield a successful CQC-derivation with {p(c)} as the constructed EDB. Instead, the third option, $X = d > 0$, may only lead to a finitely failed CQC derivation, if that derivation is fair. The reason is that the provisional EDB constructed by that failed derivation would necessarily contain p(d) and r(d,e) to "repair" condition $C_1$, making both $q_1(d)$ and $q_2(d)$ true whatever the value of $e$ is.
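The claims about these candidate EDBs can be checked directly on small finite instances. The Python sketch below is illustrative only: the helper names `satisfies_ic`, `q1` and `q2`, the choice of concrete constants (-1 for c, 1 for d, 0 for e) and the set-based encoding of the relations p and r are assumptions made here, and the code evaluates the constraints and queries by brute force rather than reproducing a CQC derivation.

```python
def satisfies_ic(p, r):
    """Check both dctgd's of Example 7.4 on a finite EDB
    (p: set of values, r: set of pairs)."""
    # forall x [p(x) and x > 0 -> exists y r(x, y)]
    first = all(any(a == x for a, _ in r) for x in p if x > 0)
    # forall x, y [r(x, y) and y > 0 -> p(y)]
    second = all(y in p for _, y in r if y > 0)
    return first and second


def q1(p, r):
    """q(X) <- p(X)"""
    return set(p)


def q2(p, r):
    """q(X) <- p(X) and r(X, Y)"""
    return {x for x in p if any(a == x for a, _ in r)}


edb_zero = ({0}, set())           # option X = 0
edb_negative = ({-1}, set())      # option X = c < 0, with c = -1
edb_positive = ({1}, set())       # option X = d > 0, with d = 1, before any repair
edb_repaired = ({1}, {(1, 0)})    # r(d, e) added with e = 0
```

With these instances, the first two EDBs satisfy the constraints and separate the two queries, the third one violates the first constraint, and the repaired one satisfies the constraints but gives both queries the same answer.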


Ref. [54, Theorem 2] claims finite-success and failure soundness for Ctest. That is, if Ctest terminates reporting $Q_1 \not\sqsubseteq_{IC} Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) then $Q_1 \not\sqsubseteq_{IC} Q_2$ ($Q_1 \sqsubseteq_{IC} Q_2$) actually holds. Unfortunately, no completeness results are given for Ctest.

7.3. Description logics and query containment

Description logics (DLs) are formalisms for representing and reasoning about classes of objects (concepts) and their relationships (roles). DLs are descendants of the KL-ONE language and they can be seen as a new generation of knowledge representation languages with a formal model-theoretic semantics. In recent years, DLs have gone beyond their traditional scope in the Artificial Intelligence world to provide new alternatives and solutions to many topics of the Database literature [32].

A DL is formed by three basic components [32]:

• A description language, which specifies how to create complex concept and role expressions by means of several language constructors. The basic building blocks are atomic concepts, which can be thought of as unary predicates, and atomic roles, which can be thought of as binary predicates in most cases.

• A knowledge specification mechanism, which specifies how to construct a DL knowledge base. Normally, such a specification consists of a set of constraints defined in terms of inclusion assertions over DL concepts.

• A set of reasoning procedures, mainly concept subsumption and knowledge base satisfiability checking. Indeed, the notion of concept subsumption recalls that of query containment: given two concept expressions, $C_1$ and $C_2$, $C_1$ is subsumed by $C_2$ iff all the objects described by $C_1$ fall also into the description of $C_2$.

Fig. 7.4. Using the CQC method with the example taken from [54].

Example 7.5. Atomic concepts are Female, Employee and Department, for instance. An atomic, binary, role is works-in.

Complex concepts are $\mathit{Employee} \sqcap \neg \mathit{Female}$, which describes the class of individuals that are "employees but not female", and $\forall \mathit{works\text{-}in}.\mathit{Department}$, which describes the class of individuals that "only work in departments".

An inclusion assertion is $\mathit{Employee} \sqsubseteq \exists \mathit{works\text{-}in}.\mathit{Department}$, which states that every "employee works in, at least, one department".

In this way, $\mathit{Employee} \sqcap \neg \mathit{Female}$ is subsumed by $\exists \mathit{works\text{-}in}.\mathit{Department}$ if the assertion $\mathit{Employee} \sqsubseteq \exists \mathit{works\text{-}in}.\mathit{Department}$ is considered.

Over the years, a considerable variety of DLs have been proposed, studied and implemented. They differ, basically, in the set of language constructors that they provide, in an attempt to achieve a satisfactory trade-off between language expressiveness and decidability and complexity of reasoning in different application domains. The DLs proposed so far can be reduced to subclasses of first-order logic formulas with equality [5]. Hence, they are expressible ultimately in terms of the inputs that the CQC method requires. Nevertheless, the contribution of DL research to subsumption/containment checking is significant:

1. They have provided efficient algorithms to check subsumption/containment that take advantage of their richer set of language constructors.

2. They have identified new subclasses of concepts/queries and knowledge bases/databases for which subsumption/containment is decidable.

It is in applications where DLs are used as query languages for existing data or knowledge bases that such formalisms show their weakness, since even the most expressive DL cannot fully express conjunctive queries [5]. To bypass this, several hybrid systems combining DL schemes and datalog-like rules have been proposed [9,18,29,34] to take maximum advantage of their respective expressive power.

In order to compare the contribution of this work with the huge amount of research done in concept subsumption, two works, which are believed to be the most representative and significant, have been selected. The first one, published in [17,28,41], presents a calculus to check subsumption between concepts described in a DL called ALCNR. In this case, a clear correspondence between this concept subsumption checking calculus and the CQC method will be shown. The second one, [9,29], addresses a hybrid system consisting of a knowledge base schema and the queries that can be made on it. This schema is a set of integrity constraints defined as inclusion assertions of $DLR_{reg}$ concepts and relationships. Queries are expressed as plain conjunctive queries over the concepts and relationships defined in the schema. Moreover, [9,29] provide two different methods for checking containment of these queries with respect to the schema, that is, query containment under $DLR_{reg}$ constraints.

7.3.1. Subsumption of ALCNR concepts

A family of DLs, called AL-languages, with their corresponding reasoning mechanism is introduced in [17,28,41]. ALCNR is the most expressive language of that family and subsumes all the rest. Moreover, it is also one of the most expressive DLs with a decidable reasoning mechanism. This reasoning mechanism is intended to check whether a concrete ALCNR concept is satisfiable, that is, whether there exists some domain with, at least, one individual belonging to the class that this concept describes. Since the ALCNR language allows for general complements, that is, negation, concept satisfiability and subsumption can be reduced to each other in linear time. In this way, $C$ is subsumed by $D$ iff $C \sqcap \neg D$ is not satisfiable.

This section examines this work from two complementary points of view. First, the expressiveness of ALCNR concepts is discussed with respect to the class of deductive queries and views that the CQC method covers. Second, a close relationship between the ALCNR-concept subsumption method and the CQC method will be shown.

Refs. [17,28,41] prove that their calculus for checking subsumption of ALCNR concepts is decidable. From the correspondence between this calculus and the CQC method, it follows that the CQC method terminates when it is used to check QC for cases that are also expressible as ALCNR concept subsumption problems.

7.3.1.1. The ALCNR language. What makes a DL language more or less expressive than any other is the set of language constructors that it provides. The set of constructors for ALCNR is listed in the left half of Table 7.2. Rather than reproducing their formal semantics, also found in [17,28,41], their corresponding translations in terms of first-order-logic formulas are provided, according to the encoding rules defined in [5,17,41]. These translations are listed in the right half of Table 7.2.

As is also suggested in Table 7.2, the concept definition mechanism can be thought of as a kind of query, or IDB, definition mechanism. Moreover, this concept definition mechanism is restricted in [17,28,41] by two additional requirements:

• There is no more than one definition for a concept name.

• The definitions are acyclic, in the sense that concepts are defined in terms neither of themselves nor of other concepts that refer to them via a chain of definitions.

From a careful examination of Table 7.2 and the constraints just mentioned, it follows that the class of deductive queries and deductive rules that is expressible in ALCNR has the following limitations:

• Query and IDB predicates may only be unary.

• EDB predicates may be unary or binary.

• Negation may only be applied to unary predicates, which can be query, IDB or EDB predicates.

• Recursive predicate definitions are not allowed.

• In every rule definition, for each unary atom c(T) in the body either T is the variable X appearing in the rule head or T is connected to X via a chain of binary atoms $r_1(X,S_1) \wedge r_2(S_1,S_2) \wedge \cdots \wedge r_n(S_{n-1},T)$.

• In every rule definition, for each binary atom $r(T_1,T_2)$ in the body either $T_1$ is the variable X appearing in the rule head or $T_1$ is connected to X via a chain of binary atoms $r_1(X,S_1) \wedge r_2(S_1,S_2) \wedge \cdots \wedge r_n(S_{n-1},T_1)$.

• There are no order predicates ($<$, $\le$, $\ge$, $>$) in the rule bodies.

Example 7.6. If the occurrences of $\mathit{worksFor}(T_1,T_2)$ are substituted by $\mathit{hasW}(T_2,T_1)$ in the initial example used in Sections 3.1 and 3.2, then the schema and the queries defined in that example can be translated in terms of an ALCNR terminology by means of the following concept definitions:

$$\mathit{Boss} := \exists \mathit{HasW}.\top \qquad \mathit{Chief} := \exists \mathit{HasW}.\mathit{Boss}$$

$$\mathit{Sub}_1 := \mathit{Emp} \sqcap \neg \mathit{Chief} \qquad \mathit{Sub}_2 := \mathit{Emp} \sqcap \neg \mathit{Boss}$$

However, the schema and the conjunctive queries defined in Example 7.1 do not have ALCNR counterparts, since all the predicates, even the query ones, are binary.

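Table 7.2

ALCNR language constructors and their corresponding FOL formulas

| Concept $C$ | FOL formula $\varphi_C(X)$ |
|---|---|
| $A$ (atomic concept) | $a(X)$ |
| $\top$ (universal concept) | $dom(X)$ ($\cong$ true) |
| $\bot$ (empty concept) | $\neg dom(X)$ ($\cong$ false) |
| $\neg C$ (negation) | $\neg \varphi_C(X)$ |
| $C \sqcap D$ (concept intersection) | $\varphi_C(X) \wedge \varphi_D(X)$ |
| $C \sqcup D$ (concept union) | $\varphi_C(X) \vee \varphi_D(X)$ |
| $\exists R.C$ (existential role quantification) | $\exists Y.\; \varphi_R(X,Y) \wedge \varphi_C(Y)$ |
| $\forall R.C \equiv \neg(\exists R.\neg C)$ (universal role quantification) | $\forall Y.\; \varphi_R(X,Y) \rightarrow \varphi_C(Y)$ |
| $(\geq n\,R)$ (at-least role restriction) | $\exists Y_1 \ldots Y_n\; \varphi_R(X,Y_1) \wedge \cdots \wedge \varphi_R(X,Y_n) \wedge (\bigwedge_{i \neq j}^{n} Y_i \neq Y_j)$ |
| $(\leq n\,R)$ (at-most role restriction) | $\forall Y_1 \cdots Y_n\; \varphi_R(X,Y_1) \wedge \cdots \wedge \varphi_R(X,Y_n) \rightarrow \bigwedge_{i \neq j}^{n} Y_i = Y_j \;\; [\equiv \neg \varphi_{(\geq n+1\,R)}(X)]$ |
| Role $R$ | FOL formula $\varphi_R(X,Y)$ |
| $P$ (atomic role) | $p(X,Y)$ |
| $R \sqcap Q$ (role intersection) | $\varphi_R(X,Y) \wedge \varphi_Q(X,Y)$ |
| Concept definition | Query (IDB) definition |
| $A := C$ | $a(X) \leftarrow \varphi_C(X)$ |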
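The translation of Table 7.2 is compositional, which the following Python sketch tries to make concrete. It is a minimal illustration under stated assumptions: the function `to_fol`, the nested-tuple encoding of concepts and the naming of fresh variables (Y1, Y2, ...) are hypothetical, and number restrictions and role intersections are left out.

```python
from itertools import count


def to_fol(concept, var="X", fresh=None):
    """Render a concept as a first-order formula string, following the rows
    of Table 7.2 for atoms, top, bottom, negation, intersection, union and
    role quantification. Concepts are nested tuples (hypothetical encoding):
    ("atom", name), ("top",), ("bot",), ("neg", C), ("and", C, D),
    ("or", C, D), ("some", role, C), ("all", role, C)."""
    if fresh is None:
        fresh = count(1)
    tag = concept[0]
    if tag == "atom":
        return f"{concept[1].lower()}({var})"
    if tag == "top":
        return f"dom({var})"
    if tag == "bot":
        return f"¬dom({var})"
    if tag == "neg":
        return f"¬({to_fol(concept[1], var, fresh)})"
    if tag in ("and", "or"):
        op = " ∧ " if tag == "and" else " ∨ "
        left = to_fol(concept[1], var, fresh)
        right = to_fol(concept[2], var, fresh)
        return f"({left}{op}{right})"
    # Role quantification introduces a fresh variable for the role successor.
    _, role, filler = concept
    y = f"Y{next(fresh)}"
    inner = to_fol(filler, y, fresh)
    if tag == "some":
        return f"∃{y}.({role}({var},{y}) ∧ {inner})"
    return f"∀{y}.({role}({var},{y}) → {inner})"


boss = ("some", "hasW", ("top",))
chief = ("some", "hasW", boss)
boss_formula = to_fol(boss)
chief_formula = to_fol(chief)
```

Here `boss` and `chief` encode the concept definitions of Example 7.6 with Boss already unfolded inside Chief; the resulting strings correspond to the bodies of the respective IDB rules, with explicit dom atoms kept.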


7.3.1.2. The ALCNR-concept subsumption checking calculus. Indeed, the calculus proposed in [17,28,41] is intended to check the satisfiability of concept expressions. As has already been pointed out in the introduction of this section, since the ALCNR language allows for general complements, concept satisfiability and subsumption can be reduced to each other in linear time. In this way, $C$ is subsumed by $D$ iff $C \sqcap \neg D$ is not satisfiable. Conversely, $C$ is satisfiable iff $C$ is not subsumed by $\bot$.

That satisfiability checking calculus consists of inference rules that decompose complex concepts according to the top-level ALCNR construct. As the authors admit, such rules can be simulated by applications of several rules of the Semantic Tableau calculus [21,46] using a special control strategy to guarantee termination. It is important to note that the only inputs that this calculus takes into account are the concept expressions themselves, and that any other source of knowledge, like inclusion assertions over ALCNR concepts, is not considered. Therefore, this method cannot perform anything similar to checking query containment under constraints.

The data structures underlying the calculus are constraints that state

1. either that an individual is a member of a concept, $x : C$;

2. or that two individuals are related through a role, $xPy$;

3. or that two individuals are distinct, $x \neq y$.

A concept $C$ is satisfiable iff the constraint system $\{x : C\}$ is satisfiable. If the concept satisfiability checking calculus starts with $\{x : C\}$, every derivation terminates after finitely many steps, in each of which some inference rule is applied. If all terminal sets of constraints that are derivable contain an "obvious contradiction", or clash, then $\{x : C\}$ is unsatisfiable. Otherwise, $\{x : C\}$ is satisfiable, since a clash-free terminal set describes a model of $C$.

To keep the number of inference rules small, concepts are assumed to be in negation normal form. A concept is in negation normal form if it contains only complements of the form $\neg A$ where $A$ is a primitive concept.

Example 7.6 (cont.). Continuing with the ALCNR terminology derived in Example 7.6, to check whether $\mathit{Sub}_1$ is subsumed by $\mathit{Sub}_2$, it should be checked whether the constraint system $\{x : \mathit{Sub}_1 \sqcap \neg \mathit{Sub}_2\}$ is not satisfiable. However, before applying the calculus, such a constraint system must be normalized. The normalization procedure leads to the constraint system $\{x : \mathit{Emp} \sqcap \forall \mathit{HasW}.\forall \mathit{HasW}.\bot \sqcap (\neg \mathit{Emp} \sqcup \exists \mathit{HasW}.\top)\}$. Notice that non-base concepts are substituted by their defining concept expressions and that negation has been shifted to the base concepts.
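The normalization step can be reproduced with a short Python sketch. It is given here only as an illustration: the functions `unfold` and `nnf`, the table `DEFS` and the nested-tuple encoding of concepts are assumptions introduced for this sketch, and number restrictions are not covered.

```python
# Concept definitions of Example 7.6 (hypothetical tuple encoding).
DEFS = {
    "Boss": ("some", "HasW", ("top",)),
    "Chief": ("some", "HasW", ("atom", "Boss")),
    "Sub1": ("and", ("atom", "Emp"), ("neg", ("atom", "Chief"))),
    "Sub2": ("and", ("atom", "Emp"), ("neg", ("atom", "Boss"))),
}


def unfold(c):
    """Replace every defined concept name by its (acyclic) definition."""
    tag = c[0]
    if tag == "atom":
        return unfold(DEFS[c[1]]) if c[1] in DEFS else c
    if tag in ("top", "bot"):
        return c
    if tag == "neg":
        return ("neg", unfold(c[1]))
    if tag in ("and", "or"):
        return (tag, unfold(c[1]), unfold(c[2]))
    return (tag, c[1], unfold(c[2]))        # "some" / "all"


def nnf(c):
    """Push negation inwards until it only applies to base concepts."""
    tag = c[0]
    if tag in ("atom", "top", "bot"):
        return c
    if tag in ("and", "or"):
        return (tag, nnf(c[1]), nnf(c[2]))
    if tag in ("some", "all"):
        return (tag, c[1], nnf(c[2]))
    d = c[1]                                 # tag == "neg"
    if d[0] == "atom":
        return c
    if d[0] == "top":
        return ("bot",)
    if d[0] == "bot":
        return ("top",)
    if d[0] == "neg":
        return nnf(d[1])
    if d[0] == "and":
        return ("or", nnf(("neg", d[1])), nnf(("neg", d[2])))
    if d[0] == "or":
        return ("and", nnf(("neg", d[1])), nnf(("neg", d[2])))
    if d[0] == "some":
        return ("all", d[1], nnf(("neg", d[2])))
    return ("some", d[1], nnf(("neg", d[2])))   # d[0] == "all"


normalized = nnf(unfold(("and", ("atom", "Sub1"), ("neg", ("atom", "Sub2")))))
```

Under this encoding, `normalized` is the tuple form of the normalized concept of the example: Emp, the doubly nested universal restriction ending in the empty concept, and the disjunction of the negated Emp with the existential restriction on HasW.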

The completion calculus for $\{x : \mathit{Emp} \sqcap \forall \mathit{HasW}.\forall \mathit{HasW}.\bot \sqcap (\neg \mathit{Emp} \sqcup \exists \mathit{HasW}.\top)\}$ is shown in Fig. 7.5.

Notice that one rule is applied at each derivation step in an attempt to construct a model by generating new individuals as required by the constraints. The rule conditions guarantee that a rule can be applied to a constraint system only if its application changes the system. In steps 1, 2, 3 and 3′ the $\rightarrow_\sqcap$-rule and the $\rightarrow_\sqcup$-rule decompose constraints of the form $x : C_1 \sqcap C_2$ and $x : C_1 \sqcup C_2$, respectively. In step 4, the constraint $x : \exists \mathit{HasW}.\top$ states that $x$ has a HasW-successor. To satisfy this constraint, and since such a HasW-successor does not yet exist, the $\rightarrow_\exists$-rule generates a new HasW-successor of $x$, $y$.


In step 5, the →∀-rule is applied to the constraint x : ∀HasW.∀HasW.⊥ obtained in step 1, since there is a HasW-successor of x, namely y, in S4. The result of this application is a constraint system, S5, with a new constraint for the HasW-successor of x, y : ∀HasW.⊥. Since y has no HasW-successor in S5, neither the →∀-rule nor any other rule can be applied.

Notice that steps 3 and 3′ illustrate the nondeterministic application of the →⊔-rule. The constraint system S3 contains a clash, because it includes both x : Emp and x : ¬Emp. In contrast, S5 is a complete constraint system, that is, no further derivation rule can be applied to it. Moreover, since S5 does not contain any clash, S0, and thus {x : Sub1 ⊓ ¬Sub2}, are satisfiable. Therefore, Sub1 is not subsumed by Sub2.
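The derivation just described can be mimicked by a small backtracking procedure. The following is only a minimal sketch under stated assumptions: it covers the plain ALC fragment (no number restrictions and no role conjunction, which ALCNR also has), it encodes concepts as nested tuples, and the function name `satisfiable` is hypothetical. It is not the calculus of the cited works.

```python
# Concepts in negation normal form, encoded as nested tuples (an encoding
# chosen for this sketch only):
#   ("atom", "Emp"), ("not", ("atom", "Emp")), ("top",), ("bot",),
#   ("and", C, D), ("or", C, D), ("some", "R", C), ("all", "R", C)

def satisfiable(label):
    """Return True if the set of NNF concepts `label` has a model."""
    label = set(label)

    # Conjunction rule: add both conjuncts until nothing changes.
    changed = True
    while changed:
        changed = False
        for c in list(label):
            if c[0] == "and" and not {c[1], c[2]} <= label:
                label |= {c[1], c[2]}
                changed = True

    # Clash: bottom, or both A and not-A for the same individual.
    if ("bot",) in label:
        return False
    if any(c[0] == "not" and c[1] in label for c in label):
        return False

    # Disjunction rule: nondeterministic choice, explored by backtracking.
    for c in label:
        if c[0] == "or" and c[1] not in label and c[2] not in label:
            return any(satisfiable(label | {d}) for d in (c[1], c[2]))

    # Existential and universal rules: each existential restriction creates
    # a fresh successor that inherits the matching universal restrictions.
    for c in label:
        if c[0] == "some":
            successor = {c[2]} | {d[2] for d in label
                                  if d[0] == "all" and d[1] == c[1]}
            if not satisfiable(successor):
                return False
    return True


EMP = ("atom", "Emp")
TOP, BOT = ("top",), ("bot",)
# Emp AND (ALL HasW. ALL HasW. bottom) AND (NOT Emp OR SOME HasW. top)
S0 = ("and", EMP,
      ("and", ("all", "HasW", ("all", "HasW", BOT)),
              ("or", ("not", EMP), ("some", "HasW", TOP))))
print(satisfiable({S0}))
```

In this sketch the first disjunct (¬Emp) clashes with Emp and the second one succeeds, which mirrors the two branches ending in S3 and S5.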

The next step is to show that there is a clear correspondence between the concept satisfiability checking calculus, or completion calculus, and the CQC method.

To compare the completion calculus with the CQC method, consider, on the one hand, the completion calculus derivation that has proved that Sub1 is not subsumed by Sub2. On the other hand, consider a CQC-derivation that proves that sub1 is not contained in sub2. The CQC-derivation shown in Section 3.1 proved that result for the original schema and queries defined in Sections 3.1 and 3.2. However, for the sake of clarity, consider a new CQC derivation that makes use of a new set of deductive rules, which have been obtained after rewriting the normalized constraint system S0:

\[ G_0 = \mathit{emp}(X) \land \lnot \mathit{aux1}(X) \land \mathit{aux2}(X) \]

and

\[ R = \{\; \mathit{aux1}(X) \leftarrow \mathit{hasW}(X,Y) \land \mathit{hasW}(Y,Z), \quad \mathit{aux2}(X) \leftarrow \lnot \mathit{emp}(X), \quad \mathit{aux2}(X) \leftarrow \mathit{hasW}(X,Y) \;\} \]

Notice that this translation avoids unnecessary literals such as dom(Z) for z : ⊤. Moreover, rules such as aux2(X) ← ¬emp(X) are not safe according to the definition of the deductive framework introduced in Chapter 2. However, in this particular case this fact does not mislead CQC-derivations. Figs. 7.6 and 7.7 show, respectively, a failed CQC-derivation and a successful one for the initial goal G0 = emp(X) ∧ ¬aux1(X) ∧ aux2(X). Such derivations correspond to the two branches of the tree constructed by the completion calculus in Fig. 7.5.

Fig. 7.5. Completion calculus for {x : Emp ⊓ ∀HasW.⊥ ⊓ (¬Emp ⊔ ∃HasW.⊤)}.

Since conjunction is the basic logical connective in the deductive rules, the →⊓-rule of the completion calculus can be considered to be applied implicitly by the CQC method when it selects a literal of the current goal. It may happen that the →⊓-rule derives a constraint of the form x : A, where A is a primitive concept, for example x : Emp. This particular case corresponds to the application of the A2 rule of the CQC method, where an EDB fact is added to the EDB, as happens in step 1 in Fig. 7.6.

The application of the →⊔-rule in step 3 in Fig. 7.5 corresponds clearly to the application of the rule A1 in steps 3 and 3′ in Figs. 7.6 and 7.7, respectively. Note that the auxiliary derived predicate aux2(X) translates x : ¬Emp ⊔ ∃HasW.⊤ by encoding each disjunct as the body of a different defining rule.

Fig. 7.6. The failed CQC-counterpart of the completion-calculus example.

Fig. 7.7. The successful CQC-counterpart of the completion-calculus example.


The →∃-rule, as applied in step 4 of the completion calculus, corresponds directly to the application of the A2 rule in step 3′ of the CQC derivation in Fig. 7.7, where an EDB fact about the binary EDB predicate hasW is included in T.

The application of the →∀-rule in step 5 in Fig. 7.5 corresponds to the application of the B2-rule in step 6 in Fig. 7.7.

The results are quite similar. On the one hand, the completion calculus obtains a complete clash-free constraint system S5 that contains three terminal constraints: x : Emp, xHasWy and y : ∀HasW.⊥. On the other hand, a CQC derivation ends successfully with an EDB {emp(0), hasW(0,1)} and a final set of conditions to maintain that includes the condition ← hasW(1,Z).
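As a quick sanity check, the resulting EDB can be evaluated against the deductive rules R and the goal G0 given earlier. The sketch below is only illustrative: the function name `goal_holds` and the set-based encoding of the EDB relations are assumptions made here, not part of the CQC method.

```python
def goal_holds(x, emp, has_w):
    """Evaluate G0 = emp(X) AND NOT aux1(X) AND aux2(X) for X = x."""
    # aux1(X) <- hasW(X, Y) AND hasW(Y, Z)
    aux1 = any(a == x and any(b == c for (c, _z) in has_w)
               for (a, b) in has_w)
    # aux2(X) <- NOT emp(X)    and    aux2(X) <- hasW(X, Y)
    aux2 = (x not in emp) or any(a == x for (a, _y) in has_w)
    return (x in emp) and not aux1 and aux2


emp = {0}
has_w = {(0, 1)}
print(goal_holds(0, emp, has_w))              # EDB built by the derivation
print(goal_holds(0, emp, has_w | {(1, 2)}))   # EDB violating <- hasW(1, Z)
```

The second call adds a fact hasW(1,2), which makes aux1(0) derivable and the goal fail; this is why the condition on hasW(1,Z) has to be maintained.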

Example 7.6 has also illustrated a concrete case where the Simple VIP may be used instead of the Negation VIP without loss of completeness, even though there are negated literals in the rule bodies. In this concrete case, the justification for using the Simple VIP is that it is not necessary to explore all the alternatives that the Negation VIP expands when the non-containment goals and deductive rules come from translations of ALCNR constraint systems that do not contain subexpressions of the form (≤ n R). That is, if there are no constraints restricting the number of R-successors of a variable in the original constraint system, each R-successor can be considered different, as the Simple VIP does.

When a constraint system has (≤ n R)-subexpressions, the corresponding completion calculus rule, as well as the CQC method, should consider as a possible alternative that two initially different R-successors of a variable might need to be identified. In the case of the CQC method, this can be handled by using a special VIP that assigns to each new R-successor y of t either a new constant or the constant formerly assigned to another R-successor z of t.

In summary, Table 7.3 relates each completion calculus rule to its corresponding CQC steps.

Table 7.3
Completion-calculus expansion rules vs. CQC-expansion rules

| Completion calculus | CQC method |
|---|---|
| →⊓-rule for \(x : C_1 \sqcap C_2\) | Implicit in the selection of literals. A2 for \(c_i(X)\), if \(C_i\) is a primitive concept |
| →⊔-rule for \(x : C_1 \sqcup C_2\) | A1 for \(\mathit{aux}_{C_1 \sqcup C_2}(X)\), where \(\mathit{aux}_{C_1 \sqcup C_2}(X) \leftarrow \mathit{aux}_{C_1}(X)\) and \(\mathit{aux}_{C_1 \sqcup C_2}(X) \leftarrow \mathit{aux}_{C_2}(X)\) |
| →∃-rule for \(\exists R.C\), where \(R := \sqcap_{i \geq 1}^{m} P_i\) and \(P_i\) is a primitive role | \(\mathrm{A2}^{m}\) for \(p_1(c,Y) \land \cdots \land p_m(c,Y)\) |
| →∀-rule for \(\forall R.C\), where \(R := \sqcap_{i \geq 1}^{m} P_i\) and \(P_i\) is a primitive role | \(\mathrm{B2}^{m}\) for a condition \(\leftarrow p_1(c,Y) \land \cdots \land p_m(c,Y) \land \phi_{\lnot C}(Y)\) |
| →≥-rule for \((\geq n\,R)\), where \(R := \sqcap_{i \geq 1}^{m} P_i\) | \(\mathrm{A2}^{n \cdot m}\) for \(p_1(c,Y_1) \land \cdots \land p_m(c,Y_1) \land \cdots \land p_1(c,Y_n) \land \cdots \land p_m(c,Y_n)\) and \(\mathrm{A4}^{n}\) for \((\bigwedge_{i \neq j} Y_i \neq Y_j)\) |
| →≤-rule for \((\leq n\,R)\), where \(R := \sqcap_{i \geq 1}^{m} P_i\) | \(\mathrm{B2}^{m}\) and \((\mathrm{B5}\ \text{or}\ \mathrm{B6})^{n}\) for the condition \(\leftarrow p_1(c,Y_1) \land \cdots \land p_m(c,Y_1) \land \cdots \land p_1(c,Y_{n+1}) \land \cdots \land p_m(c,Y_{n+1}) \land (\bigwedge_{i \neq j}^{n+1} Y_i \neq Y_j)\) |

7.3.2. Conjunctive query containment under DLRreg constraints

Ref. [9] proposed a new method for checking query containment under a set IC of integrity constraints that is defined in terms of a DL language called DLRreg. The queries are plain conjunctive queries whose atoms are complex DLRreg expressions. Such a combination of conjunctive queries and DL makes this system very expressive and able to capture a great variety of data models, such as the relational, the entity-relationship and the object-oriented ones.

This section, first and mainly, compares the expressiveness of the hybrid system with that of the class of deductive queries and views covered by the CQC method. The method to check conjunctive query containment under DLRreg constraints provided in [9] is also reviewed, as well as the approach of [29]. The latter proposes an alternative method that its authors claim to be more practical than the one of [9]. However, the method of [29] does not consider DLRreg but a restricted version of it.

7.3.2.1. The hybrid system. In [9], a hybrid system is proposed consisting of

• A schema, defined in terms of inclusion assertions (constraints) of the form C1 ⊑ C2 and R1 ⊑ R2, where C1 and C2 are possibly complex DLRreg concepts and R1 and R2 are possibly complex DLRreg relationships of the same arity. Such a knowledge base specification mechanism allows views to be defined on concepts and relationships: A ⊑ C1 and C1 ⊑ A define the atomic concept A as a view concept; and P ⊑ R1 and R1 ⊑ P define the atomic relationship P as a view relationship.

• Plain conjunctive queries written in the form q(X̄) ← Body1(X̄, Ȳ1, c̄1) ∨ ··· ∨ Bodym(X̄, Ȳm, c̄m), where each Bodyi(X̄, Ȳi, c̄i) is a conjunction of atoms, X̄, Ȳi are variables and c̄i are constants appearing in the conjunct. Each atom has one of the forms R(T̄), C(T) or E(T, T′), where T̄, T and T′ are terms in X̄, Ȳi or c̄i; and R, C and E are, respectively, DLRreg relations, concepts and regular expressions (roles) over the schema.

DLRreg is a DL with even more expressive power than ALCNR. Its main feature is that there is a clear distinction between n-ary roles (n ≥ 2), called relationships, and regular expressions, which are founded on binary projections defined over relationships. The set of constructors for DLRreg is listed in the left column of Table 7.4. Notice that expressions such as C1 ⊔ C2, ∀E.C and R1 ⊔ R2, which do not appear in Table 7.4, are equivalent to ¬(¬C1 ⊓ ¬C2), ¬(∃E.¬C) and ¬(¬R1 ⊓ ¬R2), respectively.

As in the case of the ALCNR language, translations of DLRreg expressions into first-order-logic formulas are provided in Table 7.4.

Moreover, the concept/relation assertion mechanism does not have the acyclicity restriction imposed in [17,28,41] for ALCNR concept definitions. That is, the same concept/relation subexpression may appear on both sides of one assertion.

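Example 7.7. This example illustrates the high degree of expressiveness that can be reached by combining conjunctive queries with DLRreg schemas. Consider the query

\[ \mathit{sub}(X,Y) \leftarrow \mathit{Works\_for}(X,Y,Z) \land ((\lnot \mathit{Works\_for})|_{\$2,\$1})^{*}(X,Y) \]

over the schema

\[ S : \mathit{Works\_for} \sqsubseteq (\$1/3 : \mathit{Emp}) \sqcap (\$2/3 : \mathit{Emp}) \sqcap (\$3/3 : \mathit{Dept}) \]

By translating the query and the schema to a formalism more suitable for the CQC method, consider the query

\[ \mathit{sub}(X,Y) \leftarrow \mathit{worksFor}(X,Y,Z) \land \mathit{aux1}(X,Y) \]

for a schema S′ = (DR, IC), where dom, worksFor and emp are EDB relations:

\[
\begin{aligned}
DR : \{\; & \mathit{aux1}(X,Y) \leftarrow \mathit{dom}(X) \land Y = X \\
& \mathit{aux1}(X,Y) \leftarrow \mathit{aux2}(Z,X) \land \mathit{aux1}(Z,Y) \\
& \mathit{aux2}(X,Y) \leftarrow \mathit{dom3}(X,Y,Z) \land \lnot \mathit{worksFor}(X,Y,Z) \;\}
\end{aligned}
\]

\[
\begin{aligned}
IC : \{\; & \leftarrow \mathit{worksFor}(X,Y,Z) \land \lnot \mathit{emp}(X) && \text{[Ic1]} \\
& \leftarrow \mathit{worksFor}(X,Y,Z) \land \lnot \mathit{emp}(Y) && \text{[Ic2]} \\
& \leftarrow \mathit{worksFor}(X,Y,Z) \land \lnot \mathit{dept}(Z) && \text{[Ic3]} \\
& \leftarrow \mathit{worksFor}(X,Y,Z) \land \lnot \mathit{dom3}(X,Y,Z) && \text{[Ic4]} \\
& \leftarrow \mathit{dom3}(X,Y,Z) \land \lnot \mathit{dom}(X) && \text{[Ic5]} \\
& \leftarrow \mathit{dom3}(X,Y,Z) \land \lnot \mathit{dom}(Y) && \text{[Ic6]} \\
& \leftarrow \mathit{dom3}(X,Y,Z) \land \lnot \mathit{dom}(Z) && \text{[Ic7]} \\
& \leftarrow \mathit{emp}(X) \land \lnot \mathit{dom}(X) && \text{[Ic8]} \\
& \leftarrow \mathit{dept}(Y) \land \lnot \mathit{dom}(Y) && \text{[Ic9]} \;\}
\end{aligned}
\]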

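Table 7.4
DLRreg language constructs and their corresponding FOL formulas

| Concept C | FOL formula \(\phi_C(X)\) |
|---|---|
| \(A\) (atomic concept) | \(A(X)\) |
| \(\top_1\) (universal concept) | \(\mathit{dom}(X)\) (\(\cong\) true) |
| \(\lnot C\) (negation) | \(\lnot \phi_C(X)\) |
| \(C_1 \sqcap C_2\) (concept intersection) | \(\phi_{C_1}(X) \land \phi_{C_2}(X)\) |
| \(\exists E.C\) (existential reg. exp. quantification) | \(\exists Y\, \phi_E(X,Y) \land \phi_C(Y)\) |
| \(\exists[\$i]R\) (existential relation quantification) | \(\exists \bar{Y}\, \phi_R(\bar{Y}) \land Y_i = X\) |
| \((\leq k\,[\$i]R)\) (number restriction) | \(\forall \bar{Y}_1, \ldots, \bar{Y}_k\; \phi_R(\bar{Y}_1) \land Y_{1i} = X \land \cdots \land \phi_R(\bar{Y}_k) \land Y_{ki} = X \to \bigvee_{h \neq j} \bar{Y}_h = \bar{Y}_j\) |

| Regular expression E | FOL formula \(\phi_E(X,Y)\) |
|---|---|
| \(\varepsilon\) (identity) | \(X = Y\) |
| \(R\vert_{\$i,\$j}\) (relation projection) | \(\exists \bar{Z}\, \phi_R(\bar{Z}) \land Z_i = X \land Z_j = Y\) |
| \(E_1 \circ E_2\) (role composition) | \(\exists Z\, \phi_{E_1}(X,Z) \land \phi_{E_2}(Z,Y)\) |
| \(E_1 \sqcup E_2\) (role union) | \(\phi_{E_1}(X,Y) \lor \phi_{E_2}(X,Y)\) |
| \(E^{*}\) (transitive closure) | \(\bigvee_{n=0}^{\infty} (\exists Z_0, \ldots, Z_n\; \phi_E(Z_0,Z_1) \land \cdots \land \phi_E(Z_{n-1},Z_n) \land Z_0 = X \land Z_n = Y)\) |

| Relation R | FOL formula \(\phi_R(\bar{X})\) |
|---|---|
| \(\top_n\) (\(\top_n \subseteq \top_1 \times \cdots \times \top_1\)) | \(\mathit{dom}_n(\bar{X})\) |
| \(P\) (atomic relation) | \(P(\bar{X})\) |
| \((\$i/n : C)\) (selection) | \(\mathit{dom}_n(\bar{X}) \land \phi_C(X_i)\) |
| \(\lnot R\) (negation) | \(\mathit{dom}_n(\bar{X}) \land \lnot \phi_R(\bar{X})\) |
| \(R_1 \sqcap R_2\) (relation intersection) | \(\phi_{R_1}(\bar{X}) \land \phi_{R_2}(\bar{X})\) |

| Concept/relation assertion | Integrity constraint definition |
|---|---|
| \(C_1 \sqsubseteq C_2\) | \(\forall X\; \phi_{C_1}(X) \to \phi_{C_2}(X)\) |
| \(R_1 \sqsubseteq R_2\) | \(\forall \bar{X}\; \phi_{R_1}(\bar{X}) \to \phi_{R_2}(\bar{X})\) |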


aux1 is a recursive binary predicate that translates ((¬Works_for)|$2,$1)*, and it uses another derived predicate, aux2, to translate (¬Works_for)|$2,$1. Notice that ¬Works_for must be translated as dom3(X,Y,Z) ∧ ¬worksFor(X,Y,Z) according to the semantics of negation in DLRreg. Integrity constraints Ic1–Ic3 translate the schema assertion Works_for ⊑ ($1/3 : Emp) ⊓ ($2/3 : Emp) ⊓ ($3/3 : Dept). The remaining integrity constraints, Ic4–Ic9, make explicit the underlying semantics of DLRreg relations and concepts. It is important to note that dom3 does not denote the 3-fold Cartesian product of the domain, dom × dom × dom, but only a subset of it that covers all relations of arity 3. Therefore an integrity constraint of the form ← dom(X) ∧ dom(Y) ∧ dom(Z) ∧ ¬dom3(X,Y,Z) is not included in IC.
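To make the role of the recursive predicate aux1 concrete, the rules in DR can be evaluated bottom-up on a small extension. This is only a sketch under stated assumptions: the function name `evaluate_sub`, the sample facts and the naive fixpoint loop are illustrative choices made here, and the integrity constraints Ic1–Ic9 are not checked.

```python
def evaluate_sub(dom, dom3, works_for):
    """Naive bottom-up evaluation of the rules of Example 7.7."""
    # aux2(X, Y) <- dom3(X, Y, Z) AND NOT worksFor(X, Y, Z)
    aux2 = {(x, y) for (x, y, z) in dom3 if (x, y, z) not in works_for}
    # aux1(X, Y) <- dom(X) AND Y = X
    aux1 = {(x, x) for x in dom}
    # aux1(X, Y) <- aux2(Z, X) AND aux1(Z, Y), iterated up to the fixpoint
    while True:
        new = {(x, y) for (z, x) in aux2 for (z2, y) in aux1 if z == z2}
        if new <= aux1:
            break
        aux1 |= new
    # sub(X, Y) <- worksFor(X, Y, Z) AND aux1(X, Y)
    sub = {(x, y) for (x, y, _z) in works_for if (x, y) in aux1}
    return sub, aux1, aux2


# Hypothetical sample extension, chosen only for illustration.
dom = {"a", "b", "d"}
dom3 = {("a", "b", "d"), ("b", "a", "d")}
works_for = {("a", "b", "d")}
sub, aux1, aux2 = evaluate_sub(dom, dom3, works_for)
print(sub, aux2)
```

Because negation is applied only to the EDB relation worksFor, aux2 can be computed first and the recursion over aux1 is a plain positive fixpoint.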

The expressive power of the hybrid system presented in [9] is quite high. The presence of transitive closures of regular expressions, like ((¬Works_for)|$2,$1)* in Example 7.7 above, in queries and/or assertions requires the introduction of recursively defined IDB predicates in order to translate those expressions appropriately into the logical framework that the CQC method addresses.

However, the hybrid system has two important restrictions when it is compared against the class of queries and integrity constraints for which the CQC method is proved sound and complete:

1. It does not admit order predicates in either queries or schema.
2. It cannot express even trivial cases of negation on derived predicates. This point is illustrated in the following example.

Example 7.8. Consider the query

\[ Q_1 = \{\; q(X,Y) \leftarrow a(X,Y) \land \lnot p(Y,X) \;\} \]

and the schema S = (DR, ∅), where a is an EDB relation and

\[ IDB = \{\; p(U,V) \leftarrow a(U,W) \land a(W,V) \;\} \]

The hybrid system defined in [9] cannot properly capture such a combination of query and schema. The problem here is that the view definition mechanism of [9], which uses DLRreg assertions in the schema, is unable to express views such as the one defined by the IDB predicate p. Since it is not possible to define negation on regular expressions, a query such as q(X,Y) ← A(X,Y) ∧ ¬(A|$1,$2 ∘ A|$1,$2)(Y,X) cannot be defined either to express the intended semantics of Q1 in the hybrid system.
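For reference, the intended semantics of Q1 is straightforward to state operationally once p is treated as a derived predicate. The sketch below evaluates Q1 over a given extension of a; the function name `evaluate_q1` and the sample extensions are assumptions made for illustration only.

```python
def evaluate_q1(a):
    """Evaluate q(X, Y) <- a(X, Y) AND NOT p(Y, X) over the EDB relation a."""
    # p(U, V) <- a(U, W) AND a(W, V)
    p = {(u, v) for (u, w) in a for (w2, v) in a if w == w2}
    return {(x, y) for (x, y) in a if (y, x) not in p}


print(evaluate_q1({(1, 2), (2, 3)}))           # no two-step path leads back
print(evaluate_q1({(1, 2), (2, 3), (3, 1)}))   # every edge closes a triangle
```

In the second extension every pair (Y, X) is reachable in two steps, so the negated derived literal removes all answers.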

In [29], a restricted version of the hybrid framework defined in [9] is considered. Such a restriction consists in disallowing regular expressions in the queries and in the integrity constraints. In this way, DLRreg becomes DLR.

Therefore, the queries and integrity constraints addressed in [29] can be translated into queries and integrity constraints belonging to the classes for which the CQC method is proven sound and complete, since no recursive deductive rule needs to be introduced.

7.3.2.2. Methods for checking query containment under DLRreg constraints. The aim of this section is not to compare the methods that check query containment under DLRreg constraints defined in [9,29] with the CQC method but just to give a rough idea of how they operate.


Given two queries Q1 and Q2 with the same arity defined over a schema S, Q1 ⊑_S Q2 (or S ⊨ Q1 ⊑ Q2) iff

\[ \{ q_1(\bar{c}_1) \mid q_1(\bar{c}_1) \in Q_1(KB) \} \subseteq \{ q_2(\bar{c}_2) \mid q_2(\bar{c}_2) \in Q_2(KB) \} \]

for any knowledge base KB satisfying (that is, being a model of) S. In other words, Q1 ⊑_S Q2 iff there is no model of S that satisfies the non-containment formula

\[ \Big( \bigvee_{i \leq m_1} \mathit{Body}_{1i}(\bar{a}, \bar{b}_{1i}, \bar{c}_{1i}) \Big) \land \Big( \bigwedge_{j \leq m_2} \lnot \exists \bar{Z}_{2j}.\, \mathit{Body}_{2j}(\bar{a}, \bar{Z}_{2j}, \bar{c}_{2j}) \Big), \]

where ā and b̄1i are Skolem constants without the unique name assumption.

In [9], the problem of checking the unsatisfiability of a non-containment formula is reduced to the problem of checking the unsatisfiability of a corresponding converse propositional dynamic logic (CPDL) formula. Such a CPDL formula has the form

\[ \Phi_S \land \Big( \bigvee_{i \leq m_1} \Phi_{\mathit{Body}_{1i}} \Big) \land \Big( \bigwedge_{j \leq m_2} \Phi_{\mathit{Body}_{2j}} \Big) \land \Phi_{\mathit{aux}}. \]

Φ_S encodes the schema S, Φ_Body1i encodes Body1i(ā, b̄1i, c̄1i), Φ_Body2j encodes ¬∃Z̄2j. Body2j(ā, Z̄2j, c̄2j), and Φ_aux encodes constants and variables. Such encodings differ depending on whether either the disjunction of the Body1i(ā, b̄1i, c̄1i) contains no regular expressions (E(T,T′) atoms) or there are no number restrictions, of the form (≤ k [$i]R), in both S and the queries. Although there exists a straightforward correspondence between DLRreg and CPDL, the translation from the initial non-containment formula to the set of CPDL formulas requires a considerable amount of sophistication to move from a formalism that admits variables, e.g. the queries, to another one that does not.

Moreover, [29] points out that the technique proposed by [9] does not result in a practical decision procedure for checking Q1 ⊑_S Q2, since there is no known implementation of a CPDL-satisfiability checker.

The approach of [29] consists in reducing the problem of checking conjunctive query containment under DLR constraints to the problem of checking satisfiability of knowledge bases in the SHIQ description logic. SHIQ is quite similar to ALCNR, but it is more expressive than the latter because it supports reasoning with inverse roles, transitive roles and role inclusion assertions. As in the case of [9], such a problem "reduction" is not trivial at all and it must be done in three complex steps:

1. Queries Q1 and Q2 are transformed into two canonical DLR ABoxes such that Q1 ⊑_S Q2 is reduced to an ABox inclusion problem. An ABox is to a DL schema what an EDB is to a database schema.

2. The ABox inclusion problem is then transformed into one or more knowledge-base satisfiability problems in DLR.

3. Finally, those DLR knowledge bases are transformed into SHIQ knowledge bases, and then their satisfiability is checked.

7.4. The extended positive tableau method for constraint satisfiability checking

Semantic Tableau methods [21,46] are well-known automated proof procedures for first-order logic formulas, although resolution-based methods [43] are the most used and implemented in that field nowadays. Both resolution and tableaux are refutation systems. That is, rather than proving directly that a certain formula is valid, they prove that its negation is unsatisfiable. Semantic tableaux not only detect unsatisfiability of a formula but can also generate models for it. However, semantic tableaux are often less efficient than resolution-based methods and they may construct infinite models, even if finite ones exist.


In [3,7,8], a new method, the Extended Positive Tableau method (EP Tableaux for short), is presented with the aim of overcoming these drawbacks of the "classical" semantic tableaux. In particular, an EP Tableau constructs a finite model for a given set of formulas if such a finite model exists.

One of the fields where EP Tableaux are applied is constraint satisfiability checking in databases. In this case, database constraints are expressed in terms of fully quantified (closed) first-order formulas, and the models that EP Tableaux construct are prototypical (canonical) databases over which such constraints are satisfied.

The CQC method can also be used to check schema satisfiability. In fact, we only need to apply the CQC method to the initial CQC node ([ ] IC ∅ ∅ K0), where K0 is the set of constants occurring in the deductive rules and IC. Notice that the initial goal is empty and IC is the full set of integrity constraints. It is in this context, schema satisfiability, that the CQC method and the EP Tableau method are compared.

This section is organized as follows. Section 7.4.1 introduces the subclass of first-order-logic formulas to which the EP Tableau method can be applied. Section 7.4.2 reviews the main features of the EP Tableau method and shows how to use it to check schema/constraint satisfiability. Finally, Section 7.4.3 relates the CQC method and the EP Tableau method to each other.

7.4.1. PRQ formulas

Given a set of first-order-logic formulas, the EP Tableau method tries to construct a finite model that satisfies it. This set may not contain arbitrary first-order-logic formulas, but only formulas of a special fragment of first-order logic, that of positive formulas with restricted quantifications, PRQ formulas for short. The intuition behind the concept of restricted quantification is similar to that of range restriction or safeness, which is found and often required in databases.

However, any arbitrary set F of first-order-logic formulas can be transformed into another set F′ containing only PRQ formulas without loss of expressiveness. Such a transformation may require the explicit use of the special predicate dom: dom(a) is true whenever a belongs to the domain of reference.

Another characteristic of PRQ formulas is that the negation operator "¬" is not used explicitly. Thus, a (sub)formula ¬X must be rewritten as X → ⊥, where ⊥ means "false".

As stated above, first-order-logic formulas and, thus, PRQ formulas can be used to express the integrity constraints defined over a database. In this case, the predicates that occur in such constraints represent the relations of the database. PRQ constraints contain atoms about EDB and/or IDB predicates, although EP Tableaux are unable to differentiate one kind from the other semantically, as will be shown later on.

Moreover, PRQ constraints do not admit order predicates such as <, ≤, ≥ and >.

7.4.2. The EP tableau method for finite satisfiability checking

The EP Tableau method is intended to construct a finite term model that satisfies a given set of PRQ formulas. Term models are similar to Herbrand models except that their universes are not necessarily infinite [21]. The domain of a Herbrand model consists of all possible ground terms. In contrast, the domain of a term model M contains only those ground terms (here, constants) that occur in the ground atoms satisfied by M, plus one additional constant.


The method proceeds by expanding the initial set of PRQ formulas in successive steps, so that simpler formulas are obtained. At each expansion step, an expansion rule is applied to the selected PRQ formula. These EP Tableau expansion rules are listed in the right column of Table 7.5. The left column lists their corresponding "Classic" Semantic Tableau expansion rules.

A Tableau expansion procedure takes the form of a tree. In this way, an expansion rule describes, above the horizontal line, the selected PRQ formula of the current tree node to which the rule is applied. Below the line are the additional formulas to be added to the child nodes. Vertical bars separate alternatives corresponding to different children.

An EP Tableau for a set S of PRQ formulas is inductively defined as follows:

1. The tree consisting of the single node S is an EP Tableau for S.
2. Let T be an EP Tableau for S, L a leaf of T and φ a formula in L that is not satisfied. Then the tree obtained from T by appending one or more children to L according to the expansion rule applicable to φ is an EP Tableau for S.

The PUHR rule is the only expansion rule that can be applied more than once to the same formula along a branch of an EP tableau. Every other rule is applied to an unsatisfied formula in a branch only once. Such a restriction was not applied in the classic Semantic Tableau definition.

Note that the PUHR rule (Positive Unit Hyper-Resolution rule) handles universally quantified and implicative formulas at the same time. If N is the set of PRQ formulas in a given EP Tableau node, the application of the PUHR rule to a PRQ formula ∀x̄(R(x̄) → F) of N is comparable to a resolution proof procedure for {R(x̄)} ∪ {F} with N as input set.
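The matching part of a PUHR step, that is, finding the constant tuples c̄ for which R[c̄/x̄] is satisfied by the ground atoms of the node, can be sketched as a conjunctive match. The code below is an illustration only: the representation of atoms as (predicate, arguments) pairs, the convention that variables start with an upper-case letter, and the function names are assumptions made here, and the sketch does not expand the conclusion F.

```python
def match_atom(atom, fact, binding):
    """Extend `binding` so that `atom` matches the ground `fact`, or return None."""
    pred, args = atom
    fact_pred, fact_args = fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    binding = dict(binding)
    for arg, value in zip(args, fact_args):
        if arg[:1].isupper():                       # variable (by convention)
            if binding.setdefault(arg, value) != value:
                return None
        elif arg != value:                          # constant must coincide
            return None
    return binding


def puhr_instances(range_atoms, facts):
    """All substitutions under which every atom of the range R is a known fact."""
    bindings = [{}]
    for atom in range_atoms:
        bindings = [extended
                    for b in bindings
                    for fact in facts
                    if (extended := match_atom(atom, fact, b)) is not None]
    return bindings


facts = {("a", ("c1", "c2")), ("a", ("c2", "c3"))}
# Range R(X, Y, Z) = a(X, Y) AND a(Y, Z)
print(puhr_instances([("a", ("X", "Y")), ("a", ("Y", "Z"))], facts))
```

Each returned substitution corresponds to one instance F[c̄/x̄] that the rule would add to the node.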

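A branch of an EP tableau is closed if it contains ⊥; otherwise, it is open. An EP tableau is open if at least one of its branches is open; otherwise, it is closed. An EP tableau is fair if, in each of its open branches, every possible application of an expansion rule to a formula in the branch takes place after finitely many expansion steps.

Table 7.5
Classic Semantic Tableaux vs. EP Tableaux

| "Classic" Semantic Tableaux | EP Tableaux |
|---|---|
| \(\dfrac{\lnot\lnot E}{E}\), \(\dfrac{\lnot\top}{\bot}\), \(\dfrac{\lnot\bot}{\top}\) | Not applicable |
| \(\dfrac{E_1 \land E_2}{E_1,\; E_2}\) | Same rule as in the classic case |
| \(\dfrac{E_1 \lor E_2}{E_1 \mid E_2}\) | Same rule as in the classic case |
| \(\dfrac{E_1 \to E_2}{\lnot E_1 \mid E_2}\) and \(\dfrac{\forall x\, F(x)}{F[c_i/x]}\) for any constant \(c_i\) occurring in that node | PUHR rule: \(\dfrac{\forall \bar{x}\,(R(\bar{x}) \to F)}{F[\bar{c}/\bar{x}]}\), where \(R[\bar{c}/\bar{x}]\) is satisfied by the set of ground atoms occurring in that node |
| \(\dfrac{\exists x\, F(x)}{F[c_{new}/x]}\), where \(c_{new}\) had not occurred in that node yet | \(\dfrac{\exists x\, F(x)}{F[c_1/x] \mid \cdots \mid F[c_k/x] \mid F[c_{new}/x]}\), where \(c_i\), \(i = 1, \ldots, k\), occurs in the expanded node and \(c_{new}\) had not occurred yet |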

The main property of EP Tableaux is that they are complete for finite satisfiability. That is, if there exists a finite term model for S, then every fair EP tableau for S has a finite open branch that contains the finite term model, up to a renaming of constants.

7.4.3. The EP Tableau method vs. the CQC method

What makes the EP Tableau method different from other Tableau approaches is that it is a method conceived mainly to construct finite models for formulas. Such a property is very important in many applications, such as constraint satisfiability checking in databases. This property relies on a specific characteristic of the EP method: it reuses constants introduced at earlier steps when it expands existentially quantified formulas.

In other words, the EP Tableau method uses the Negation VIP. Therefore, in those situations where the Negation VIP is applicable, both methods, CQC and EP Tableau, are expected to obtain similar results. Note that the CQC method is complete for finite satisfiability, as can easily be proved from the results stated in Section 5.
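The difference between the classic and the EP treatment of an existential formula lies only in the set of constants that are tried as witnesses. A minimal sketch, assuming constants are represented by non-negative integers and using hypothetical function names:

```python
def classic_exists_choices(constants):
    """Classic rule: the witness is always a constant not used so far."""
    fresh = max(constants, default=-1) + 1
    return [fresh]


def ep_exists_choices(constants):
    """EP rule: try every constant of the node first, then a fresh one."""
    fresh = max(constants, default=-1) + 1
    return sorted(constants) + [fresh]


used = {0, 1}
print(classic_exists_choices(used))
print(ep_exists_choices(used))
```

Each element of the returned list corresponds to one child branch. Reusing existing constants is what allows a finite model to be found when one exists, at the price of a wider search.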

With respect to language expressiveness, order predicates are not included in the language in which valid PRQ formulas can be written. Moreover, [3] admits that "there may also be rules defining views" in the databases whose constraints are checked for satisfiability. However, [3] provides very little information on how IDB atoms must be handled. It only says that IDB rules are considered as integrity constraints, assuming that there is no "strict separation between extensional and intensional database".

If IDB rules must be expressed as integrity constraints, it is not clear how this translation must be done. Consider, for instance, a simple database schema S = (DR, IC), where a is an EDB relation:

\[ DR = \{\; p(X,Y) \leftarrow a(X,Y) \land \lnot a(Y,X) \;\} \qquad IC = \{\; \leftarrow \lnot p(a,a) \;\} \]

Clearly, this schema is not satisfiable under the database semantics. The single integrity constraint in IC requires that every database instance of S make the IDB fact p(a,a) true. However, that will never be possible due to the definition of p.

If IDB ∪ IC is expressed in terms of a set of PRQ formulas

\[ S' = \{\; \forall x\, y\, ((A(x,y) \land (A(y,x) \to \bot)) \to P(x,y)), \quad \exists x\, (P(x,x) \land x = a) \;\} \]

then S′ is satisfiable since {P(a,a)}, for instance, is a model of S′. Obviously, this model is not acceptable under the intended semantics of databases.
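The gap between the two semantics can be checked mechanically on this example. The following sketch is illustrative only: the function names are hypothetical, relations are encoded as sets of pairs, and the exhaustive search is restricted to EDBs over a small finite domain chosen here.

```python
from itertools import chain, combinations, product


def derive_p(a_rel):
    """Database semantics: p(X, Y) <- a(X, Y) AND NOT a(Y, X)."""
    return {(x, y) for (x, y) in a_rel if (y, x) not in a_rel}


def is_model_of_prq(a_rel, p_rel, a="a"):
    """First-order semantics: do the interpretations of A and P satisfy S'?"""
    rule_ok = all((x, y) in p_rel for (x, y) in a_rel if (y, x) not in a_rel)
    return rule_ok and (a, a) in p_rel


def satisfiable_as_database(domain, a="a"):
    """Is there an EDB over `domain` whose derived p contains p(a, a)?"""
    pairs = list(product(domain, repeat=2))
    edbs = chain.from_iterable(combinations(pairs, n)
                               for n in range(len(pairs) + 1))
    return any((a, a) in derive_p(set(edb)) for edb in edbs)


print(is_model_of_prq(set(), {("a", "a")}))   # {P(a,a)} is a model of S'
print(satisfiable_as_database(["a", "b"]))    # no EDB derives p(a,a)
```

Under first-order semantics P may contain facts that no rule derives, whereas under database semantics p(a,a) would require a(a,a) to be both true and false.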

Refer to [20] for a more detailed comparison between state-satisfiability (database semantics) and model-satisfiability (first-order-logic semantics). Indeed, these semantic differences are similar to the differences between "true" query containment and uniform query containment, explained in Section 7.1.3. Recall that uniform containment tests may consider models containing IDB facts that are not derivable from the evaluation of their defining rules.


In a personal communication, Heribert Schütz, one of the authors of [3], suggested that the EP Tableau method could properly handle literals about IDB predicates, that is, under database semantics, if their defining rules are encoded according to a special adaptation of Clark's completion [12]. However, apart from some concrete examples where this approach worked effectively, this proposal has, to the best of our knowledge, been neither formalized nor proved correct.

Note that if derived (IDB or query) predicates are not handled appropriately, it is not worthwhile to use the EP Tableau method to check the satisfiability of constraints that include literals about these predicates. The same applies if this method is to be used to check query containment by expressing it in terms of a satisfiability problem.

8. Conclusions

In this paper we have presented the Constructive Query Containment (CQC) method for QC checking, which checks "true" QC and QC under constraints for queries over databases with safe negation in both IDB and EDB subgoals and with or without built-in predicates. As far as we know, ours is the first proposal that covers all these features in a single method and in a uniform and integrated way.

Although the CQC method manages all these features in a uniform and integrated way, it cannot be said that it is "immune" to the particular cases appearing in each containment test. On the contrary, the CQC method is "sensitive" to certain well-defined subcases of queries and/or integrity constraints, not to limit its capacity but rather to improve its performance. This is exactly the rationale that underlies the concept of Variable Instantiation Pattern (VIP), which is one of the most noticeable original contributions of the CQC method. The different VIPs defined in this paper allow keeping the generality of the CQC method and, at the same time, searching for possible counterexamples by generating only those relevant to each particular case, without loss of completeness.

Moreover, the CQC method is not less efficient than previous methods for the cases that those methods cover, as has been shown in the concrete case of conjunctive query containment with safe negated EDB subgoals. This latter case is really revealing, since the CQC method can compete with the most efficient algorithm recently proposed [53] simply by replacing the Negation VIP with the Simple VIP.

We have proved several properties regarding the correctness of the CQC method: finite success soundness for hierarchical queries and databases, failure soundness, finite success completeness for strict-stratified queries and databases, and failure completeness for hierarchical queries and databases. From these results, and from previous results showing that infinite non-containment counterexamples never exist in the particular case of checking QC for conjunctive queries with safe EDB negation and built-in predicates, we can ensure termination, and thus decidability, of our method for those cases.

There are some open problems that should be addressed. One of these refers to the treatment of recursively defined IDB predicates. Indeed, the calculus defined in the CQC method admits recursive predicates. However, we cannot yet guarantee the correctness of the results in the presence of recursion. Another open problem is to define new decidable cases that guarantee the termination of the CQC method. These new cases could be found by a careful analysis of the elements and circumstances that concur when the CQC method constructs an infinite EDB.

References

[1] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley, Reading, MA, 1995.

[2] A.V. Aho, Y. Sagiv, J.D. Ullman, Equivalences among relational expressions, SIAM Journal on Computing 8 (2) (1979) 218–246.

[3] F. Bry, N. Eisinger, H. Schütz, S. Torge, SIC: Satisfiability checking for integrity constraints, in: Proceedings of the 6th International Workshop on Deductive Databases and Logic Programming (DDLP'98), 1998, pp. 25–36.

[4] M. Buchheit, M.A. Jeusfeld, W. Nutt, M. Staudt, Subsumption of queries in object-oriented databases, Information Systems 19 (1) (1994) 33–54.

[5] A. Borgida, On the relative expressiveness of description logics and predicate logics, Artificial Intelligence 82 (1–2) (1996) 353–367.

[6] F. Bry, R. Manthey, Checking consistency of database constraints: a logical basis, in: Proceedings of the 12th International Conference on Very Large Data Bases (VLDB'86), 1986, pp. 13–20.

[7] F. Bry, S. Torge, A deduction method complete for refutation and finite satisfiability, in: Proceedings of Logics in Artificial Intelligence, JELIA'98, LNCS 1489, Springer, Berlin, 1998, pp. 122–138.

[8] F. Bry, S. Torge, Solving database satisfiability problems, in: Proceedings of the Workshop Grundlagen von Datenbanken, Friedrich Schiller Universität Jena, 1999, pp. 122–126.

[9] D. Calvanese, G. De Giacomo, M. Lenzerini, On the decidability of query containment under constraints, in: Proceedings of the 17th ACM Symposium on Principles of Database Systems (PoDS'98), 1998, pp. 149–158.

[10] E.P.F. Chan, Containment and minimization of positive conjunctive queries in OODB's, in: Proceedings of the 11th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PoDS'92), 1992, pp. 202–211.

[11] L. Cavedon, J.W. Lloyd, A completeness theorem for SLDNF resolution, Journal of Logic Programming 7 (3) (1989) 177–191.

[12] K.L. Clark, Negation as failure, in: Logic and Data Bases, Plenum Press, New York, 1977, pp. 293–322.

[13] A.K. Chandra, P.M. Merlin, Optimal implementation of conjunctive queries in relational data bases, in: Proceedings of the 9th ACM SIGACT Symposium on Theory of Computing, 1977, pp. 77–90.

[14] S. Cohen, W. Nutt, A. Serebrenik, Rewriting aggregate queries using views, in: Proceedings of the 18th ACM

SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PoDS’99), 1999, pp. 155–166.

[15] C. Chekuri, A. Rajaraman, Conjunctive query containment revisited, in: Proceedings of the 6th International

Conference on Database Theory (ICDT’97), LNCS 1186, Springer, Berlin, 1997, pp. 56–70.

[16] S. Chaudhuri, M.Y. Vardi, On the equivalence of recursive and nonrecursive datalog programs, Journal of

Computer and System Sciences 54 (1) (1997) 61–78.

[17] F. Donini, M. Lenzerini, D. Nardi, W. Nutt, The complexity of concept languages, Information and Computation

134 (1) (1997) 1–58.

[18] F. Donini, M. Lenzerini, D. Nardi, A. Schaerf, AL-log: integrating datalog and description logics, Journal of

Intelligent Information Systems 10 (3) (1998) 227–252.

[19] G. Dong, J. Su, Conjunctive query containment with respect to views and constraints, Information Processing

Letters 57 (2) (1996) 95–102.

[20] H. Decker, E. Teniente, T. Urpí, How to tackle schema validation by view updating, in: Proceedings of the 5th

International Conference on Extending Database Technology (EDBT’96), LNCS 1057, Springer, 1996,

pp. 535–549.

[21] M. Fitting, First-Order Logic and Automated Theorem Proving, Springer, Berlin, 1990.

[22] C. Farre, E. Teniente, T. Urpí, The constructive method for query containment checking, in: Proceedings of the

10th International Conference on Database and Expert Systems Applications (DEXA’99), LNCS 1677, 1999,

pp. 583–593.


[23] C. Farre, A new method for query containment checking in databases, Ph.D. Thesis, Universitat Politecnica de

Catalunya, 2003.

[24] C. Farre, E. Teniente, T. Urpı, Query containment with negated IDB predicates, in: Proceedings of the 7th

European Conference on Advances in Databases and Information Systems (ADBIS 2003), 2003, pp. 411–429.

[25] A. Gupta, Y. Sagiv, J.D. Ullman, J. Widom, Constraint checking with partial information, in: Proceedings of the

13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PoDS’94), 1994, pp. 45–55.

[26] A.Y. Halevy, Answering queries using views: A survey, VLDB Journal 10 (4) (2001) 270–294.

[27] A.Y. Halevy, I.S. Mumick, Y. Sagiv, O. Shmueli, Static analysis in datalog extensions, Journal of the ACM 48 (5)

(2001) 971–1012.

[28] B. Hollunder, W. Nutt, M. Schmidt-Schauß, Subsumption algorithms for concept description languages, in:

Proceedings of the 9th European Conference on Artificial Intelligence (ECAI’90), 1990, pp. 348–353.

[29] I. Horrocks, U. Sattler, S. Tessaris, S. Tobies, How to decide query containment under constraints using a

description logic, in: Proceedings of the 7th International Workshop on Knowledge Representation meets

Databases (KRDB 2000), CEUR Workshop Proceedings.

[30] D.S. Johnson, A. Klug, Testing containment of conjunctive queries under functional and inclusion dependencies,

Journal of Computer and System Sciences 28 (1) (1984) 167–189.

[31] A. Klug, On conjunctive queries containing inequalities, Journal of the ACM 35 (1) (1988) 146–160.

[32] M. Lenzerini, Description logics and their relationships with databases, in: Proceedings of the 7th International

Conference on Database Theory (ICDT’99), LNCS 1540, Springer, Berlin, 1999, pp. 32–38.

[33] J.W. Lloyd, Foundations of Logic Programming, Springer, 1987.

[34] A. Levy, M.-C. Rousset, CARIN: a representation language combining Horn rules and description logics, in:

Proceedings of the 12th European Conference on Artificial Intelligence (ECAI’96), 1996, pp. 323–327.

[35] A. Levy, M.-C. Rousset, Verification of knowledge bases based on containment checking, Artificial Intelligence

101 (1–2) (1998) 227–250.

[36] A. Levy, Y. Sagiv, Queries independent of updates, in: Proceedings of the 19th International Conference on Very

Large Data Bases (VLDB’93), 1993, pp. 171–181.

[37] A. Levy, Y. Sagiv, Semantic query optimization in datalog programs, in: Proceedings of the 14th ACM SIGACT-

SIGMOD-SIGART Symposium on Principles of Database Systems (PoDS’95), 1995, pp. 163–173.

[38] A. Levy, D. Suciu, Deciding containment for queries with complex objects, in: Proceedings of the 16th ACM

SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PoDS’97), 1997, pp. 20–31.

[39] J.W. Lloyd, R.W. Topor, Making prolog more expressive, Journal of Logic Programming 1 (3) (1984) 225–240.

[40] W. Nutt, Y. Sagiv, S. Shurin, Deciding equivalences among aggregate queries, in: Proceedings of the 17th ACM

SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PoDS’98), 1998, pp. 214–223.

[41] W. Nutt, Algorithms for constraints in deduction and knowledge representation, Ph.D. Thesis, University of

Saarland, 1993.

[42] M. Penabad, N. Brisaboa, H. Hernandez, J. Param, A general procedure to check conjunctive query containment,

Acta Informatica 38 (7) (2002) 489–529.

[43] J.A. Robinson, A machine-oriented logic based on the resolution principle, Journal of the ACM 12 (1) (1965) 23–41.

[44] R. Ramakrishnan, Y. Sagiv, J.D. Ullman, M.Y. Vardi, Proof-tree transformation theorems and their applications,

in: Proceedings of the 8th ACM Symposium on Principles of Database Systems (PoDS’89), 1989, pp. 172–181.

[45] Y. Sagiv, Optimizing datalog programs, in: Foundations of Deductive Databases and Logic Programming,

Morgan Kaufmann, 1988, pp. 659–698.

[46] R. Smullyan, First-Order Logic, Springer, 1968.

[47] M. Staudt, K.v. Thadden, A generic subsumption testing toolkit for knowledge base queries, in: Proceedings of the 7th

International Conference on Database and Expert Systems Applications (DEXA’96), LNCS 1134, 1996, pp. 834–844.

[48] Y. Sagiv, M. Yannakakis, Equivalences among relational expressions with the union and difference operators,

Journal of the ACM 27 (4) (1980) 633–655.

[49] J.D. Ullman, Principles of Database and Knowledge-Base Systems, vol. 2, Computer Science Press, 1989.

[50] J.D. Ullman, Information integration using logical views, in: Proceedings of the 6th International Conference on

Database Theory (ICDT’97), 1997, pp. 19–40.


[51] R. van der Meyden, The complexity of querying indefinite data about linearly ordered domains, in:

Proceedings of the 11th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

(PoDS’92), 1992, pp. 331–345.

[52] F. Wei, G. Lausen, Conjunctive query containment in the presence of disjunctive integrity constraints, in:

Proceedings of the 9th International Workshop on Knowledge Representation meets Databases (KRDB 2002),

CEUR Workshop.

[53] F. Wei, G. Lausen, Containment of conjunctive queries with safe negation, in: Proceedings of the 9th

International Conference on Database Theory (ICDT 2003), LNCS 2572, Springer, 2003, pp. 346–360.

[54] J. Wang, R.W. Topor, M.J. Maher, Reasoning with disjunctive constrained tuple-generating dependencies, in:

Proceedings of the 12th International Conference on Database and Expert Systems Applications (DEXA 2001),

LNCS 2113, Springer, Berlin, 2001, pp. 963–973.

[55] X. Zhang, M.Z. Ozsoyoglu, On efficient reasoning with implication constraints, in: Proceedings of the Third

International Conference on Deductive and Object-Oriented Databases (DOOD’93), LNCS 760, Springer, Berlin,

1993, pp. 236–252.

[56] X. Zhang, M.Z. Ozsoyoglu, Implication and referential constraints: a new formal reasoning, IEEE Transactions on

Knowledge and Data Engineering 9 (6) (1997) 894–910.

Carles Farre is currently an associate lecturer at the Department of Software of the Technical

University of Catalonia in Barcelona. He received his Ph.D. degree from the Technical University

of Catalonia in 2003. His research interests include conceptual modelling, schema

validation, abduction and query containment.

Ernest Teniente is currently an associate professor at the Department of Software of the Technical

University of Catalonia in Barcelona. He received his Ph.D. degree from the Technical University

of Catalonia in 1992. He has also been a visiting researcher at the Politecnico di Milano and at the

Universita di Roma Tre. He has worked on deductive databases, database updates and integrity

constraint maintenance. His current research interests include conceptual modelling,

schema validation, abduction and query containment.

Toni Urpí is currently an associate professor at the Department of Software of the Technical

University of Catalonia in Barcelona. He received his Ph.D. degree from the Technical University of

Catalonia in 1993. He has worked on deductive databases, database updates and integrity constraint

maintenance. His current research interests include conceptual modelling, schema

validation, abduction and query containment.

