Query Containment Using a DLR ABox

Query containment under constraints is the problem of determining whether the result of one query is contained in the result of another query for every database satisfying a given set of constraints. This problem is of particular importance in information integration and warehousing where, in addition to the constraints derived from the source schemas and the global schema, inter-schema constraints can be used to specify relationships between objects in different schemas. A theoretical framework for tackling this problem using the DLR logic has been established, and in this paper we show how the framework can be extended to a practical decision procedure. The proposed technique is to extend DLR with an ABox (a set of assertions about named individuals and tuples), and to transform query subsumption problems into DLR ABox satisfiability problems. We then show how such problems can be decided, via a reification transformation, using a highly optimised reasoner for the SHIQ description logic.


Introduction
Query containment under constraints is the problem of determining whether the result of one query is contained in the result of another query for every database satisfying a given set of constraints (derived, for example, from a schema). This problem is of particular importance in information integration (see [9]) and data warehousing where, in addition to the constraints derived from the source schemas and the global schema, inter-schema constraints can be used to specify relationships between objects in different schemas (see [6]).
This problem has been studied by Calvanese et al. [4]; they have established a theoretical framework using the logic DLR (set semantics is assumed in this framework), presented several (un)decidability results, and described a method for solving the decidable cases using an embedding in the propositional dynamic logic CPDL [12,11]. However, this method does not lead directly to a practical decision procedure as there is no (known) implementation of a CPDL reasoner. Moreover, even if such an implementation were to exist, similar embedding techniques [10] have resulted in severe tractability problems when used, for example, to embed the SHIF description logic in SHF by eliminating inverse roles [13].
In this paper we present a practical decision procedure for the case where neither the queries nor the constraints contain regular expressions. This is a restriction with respect to the framework described in Calvanese et al., where it was shown that the problem is still decidable if regular expressions are allowed in the schema and the (possibly) containing query, but it seems to be acceptable when modelling classical relational information systems, where regular expressions are seldom used [7,6].
Moreover, the use of DLR in both schema and queries still allows for relatively expressive queries, and by staying within a strictly first order setting we are able to use a decision procedure that has demonstrated good empirical tractability.
The procedure is based on the method described by Calvanese et al., but extends DLR by defining an ABox, a set of axioms that assert facts about named individuals and tuples of named individuals (see [5]). This leads to a much more natural encoding of queries (there is a direct correspondence between variables and individuals), and allows the problem to be reduced to that of determining the satisfiability of a DLR knowledge base (KB), i.e., a combined schema and ABox. This problem can in turn be reduced to a KB satisfiability problem in the SHIQ description logic, with $n$-ary relations reduced to binary ones by reification. In [16], a similar approach is presented.
However, the underlying description logic (ALCNR) is less expressive than DLR and SHIQ (for example, it is not able to capture Entity-Relationship schemas).
We have good reasons to believe that this approach represents a practical solution. In the FaCT system [13] we already have an (optimised) implementation of the decision procedure for SHIQ schema satisfiability described in [15], and using FaCT we have been able to reason very efficiently with a realistic schema derived from the integration of several Entity-Relationship schemas using DLR inter-schema constraints. In Section 4 we show how this algorithm can be straightforwardly extended to deal with ABox axioms. As the number of individuals generated by the encoding of realistic problems will be relatively small, this extension should not compromise the empirical tractability.

Preliminaries
In this section we will (briefly) define the key components of our framework, namely the logic DLR, (conjunctive) queries, and the logic SHIQ.

The Logic DLR
We will begin with DLR as it is used in the definition of both schemas and queries.
DLR is a description logic (DL) extended with the ability to describe relations of any arity.
Definition 2.1.1 Given a set of atomic concept names $NC$ and a set of atomic relation names $NR$, every $C \in NC$ is a concept and every $\mathbf{R} \in NR$ is a relation, with every $\mathbf{R}$ having an associated arity. If $C, D$ are concepts, $\mathbf{R}, \mathbf{S}$ are relations of arity $n$, $i$ is an [...]
Note that $\top_n$ does not need to be interpreted as the set of all tuples of arity $n$, but only as a subset of them, and that the negation of a relation $\mathbf{R}$ with arity $n$ is relative to $\top_n$.
In our framework, a schema consists of a set of logical inclusion axioms expressed in DLR. These axioms could be derived from the translation into DLR of schemas expressed in some other data modelling formalism (such as Entity-Relationship modelling [3,8]), or could directly stem from the use of DLR to express, for example, inter-schema constraints to be used in data warehousing (see [6]). [...]
Additionally, the interpretation function $\cdot^{\mathcal{I}}$ maps every individual to an element of $\Delta^{\mathcal{I}}$. An interpretation $\mathcal{I}$ satisfies an axiom $w : C$ (written $\mathcal{I} \models w : C$) iff $w^{\mathcal{I}} \in C^{\mathcal{I}}$, and it satisfies an axiom $\vec{w} : \mathbf{R}$ (written $\mathcal{I} \models \vec{w} : \mathbf{R}$) iff $\vec{w}^{\mathcal{I}} \in \mathbf{R}^{\mathcal{I}}$. An interpretation $\mathcal{I}$ satisfies an ABox $\mathcal{A}$ iff $\mathcal{I}$ satisfies every axiom in $\mathcal{A}$.
A knowledge base (KB) $\mathcal{K}$ is a pair $\langle \mathcal{S}, \mathcal{A} \rangle$, where $\mathcal{S}$ is a schema and $\mathcal{A}$ is an ABox. An interpretation $\mathcal{I}$ satisfies a KB $\mathcal{K}$ iff it satisfies both $\mathcal{S}$ and $\mathcal{A}$.
If an interpretation $\mathcal{I}$ satisfies a concept, schema, or ABox, then we say that $\mathcal{I}$ is a model of it, call it satisfiable, and write $\mathcal{I} \models$ followed by the concept, schema, or ABox in question. Note that it is not assumed that individuals with different names are mapped to different elements in the domain (the so-called unique name assumption).
Definition 2.1.4 If $\mathcal{K}$ is a KB, $\mathcal{I}$ is a model of $\mathcal{K}$, and $\mathcal{A}$ is an ABox, then $\mathcal{I}'$ is called an extension of $\mathcal{I}$ to $\mathcal{A}$ iff $\mathcal{I}'$ satisfies $\mathcal{A}$, [...], and all concepts, roles and individuals occurring in $\mathcal{K}$ are interpreted identically by $\mathcal{I}$ and $\mathcal{I}'$.

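Given two ABoxes $\mathcal{A}$, $\mathcal{A}'$ and a schema $\mathcal{S}$, $\mathcal{A}$ is included in $\mathcal{A}'$ w.r.t. $\mathcal{S}$ (written $\langle \mathcal{S}, \mathcal{A} \rangle \mathrel{|\!\approx} \mathcal{A}'$) iff every model $\mathcal{I}$ of $\langle \mathcal{S}, \mathcal{A} \rangle$ can be extended to $\mathcal{A}'$.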

Queries
In this extended abstract we will consider only conjunctive queries (see [1, chap. 4]).
A conjunctive query $q$ is an expression
$q(\vec{x}) \leftarrow term_1(\vec{x}, \vec{y}, \vec{c}) \wedge \ldots \wedge term_n(\vec{x}, \vec{y}, \vec{c})$
where $\vec{x}$, $\vec{y}$, and $\vec{c}$ are tuples of distinguished variables, variables, and constants, respectively (distinguished variables appear in the answer, "ordinary" variables are used only in the query expression, and constants are fixed values). Each term $term_i(\vec{x}, \vec{y}, \vec{c})$ is called an atom in $q$ and is in one of the forms $C(w)$ or $\mathbf{R}(\vec{w})$, where $w$ (resp. $\vec{w}$) is a variable or constant (resp. tuple of variables and constants) in $\vec{x}$, $\vec{y}$, $\vec{c}$, $C$ is a DLR concept and $\mathbf{R}$ is a DLR relation.
For example, a query designed to return the bus number of the city buses travelling in both directions between two stops is:
$BUS(nr) \leftarrow bus\_route(nr, stop_1, stop_2) \wedge bus\_route(nr, stop_2, stop_1) \wedge city\_bus(nr)$
where $nr$ is a distinguished variable (it appears in the answer), $stop_1$ and $stop_2$ are non-distinguished variables, $city\_bus$ is a DLR concept and $bus\_route$ is a DLR relation.
In this framework, the evaluation of a query $q$ with $n$ distinguished variables w.r.t. [...]
For example, the schema containing the axioms
$(bus\_route \sqcap (\$1/3 : city\_bus)) \sqsubseteq city\_bus\_route$
and
$city\_bus\_route \sqsubseteq (bus\_route \sqcap (\$1/3 : city\_bus))$
states that the relation $city\_bus\_route$ contains exactly the bus route information that concerns city buses. It is easy to see that the following CITY BUS query
$CITY\_BUS(nr) \leftarrow city\_bus\_route(nr, stop_1, stop_2) \wedge city\_bus\_route(nr, stop_2, stop_1)$
is equivalent to the previous BUS query w.r.t. the given schema. In an information integration scenario, for example, this could be exploited by reformulating the BUS query as a CITY BUS query ranging over a smaller database without any loss of information.
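The bus example can be tried out on a concrete database. The following is a minimal sketch, not part of the procedure described in this paper: it evaluates conjunctive queries by brute force under set semantics and compares the two answers on a single hand-made instance. The function name `evaluate`, the data, and the treatment of every argument as a variable (no constants) are assumptions made for illustration. Agreement on one database is only a sanity check, since containment must hold for every database satisfying the schema.

```python
from itertools import product

def evaluate(head, atoms, db):
    """Brute-force evaluation of a conjunctive query under set semantics.

    head  : tuple of distinguished variable names
    atoms : list of (predicate, argument-tuple); all arguments are variables
    db    : dict mapping a predicate name to a set of tuples
    """
    variables = sorted({a for _, args in atoms for a in args})
    domain = sorted({v for rows in db.values() for row in rows for v in row})
    answers = set()
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[a] for a in args) in db.get(pred, set())
               for pred, args in atoms):
            answers.add(tuple(env[v] for v in head))
    return answers

# Hypothetical database instance (illustrative data only).
db = {
    "bus_route": {("b1", "a", "b"), ("b1", "b", "a"),
                  ("b2", "a", "b"),
                  ("b3", "c", "d"), ("b3", "d", "c")},
    "city_bus": {("b1",), ("b2",)},
}
# Enforce the schema: city_bus_route holds exactly the routes of city buses.
db["city_bus_route"] = {r for r in db["bus_route"] if (r[0],) in db["city_bus"]}

BUS = evaluate(("nr",),
               [("bus_route", ("nr", "s1", "s2")),
                ("bus_route", ("nr", "s2", "s1")),
                ("city_bus", ("nr",))], db)
CITY_BUS = evaluate(("nr",),
                    [("city_bus_route", ("nr", "s1", "s2")),
                     ("city_bus_route", ("nr", "s2", "s1"))], db)
```

On this instance both queries return the same single bus, as the schema-level equivalence predicts.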

The Logic SHIQ
SHIQ is a standard DL, in the sense that it deals with concepts and (only) binary relations (called roles), but it is unusually expressive in that it supports reasoning with inverse roles, qualifying number restrictions on roles, transitive roles, and role inclusion axioms.
Definition 2.3.1 Given a set of atomic concept names $NC$ and a set of atomic role names $NR$ with transitive role names $NR_+ \subseteq NR$, every $C \in NC$ is a concept, every $R \in NR$ is a role, and every $R \in NR_+$ is a transitive role. If $R$ is a role, then $R^-$ is also a role (and if $R \in NR_+$ then $R^-$ is also a transitive role). If $S$ is a (possibly inverse) role, $C, D$ are concepts, and $k$ is a non-negative integer, then $\top$, $\neg C$, $C \sqcap D$, $\exists S.C$, $\leqslant k\,S.C$ are also SHIQ concepts.
The semantics of SHIQ is given in terms of interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, where $\Delta^{\mathcal{I}}$ is the domain (a non-empty set), and $\cdot^{\mathcal{I}}$ is an interpretation function that maps every concept to a subset of $\Delta^{\mathcal{I}}$ and every role to a subset of $(\Delta^{\mathcal{I}})^2$ such that the following equations are satisfied. [...]
SHIQ schemas, ABoxes, and KBs are defined similarly to those for DLR: if $C, D$ are concepts, $R, S$ are roles, and $v, w$ are individuals, then a schema $\mathcal{S}$ consists of axioms of the form $C \sqsubseteq D$ and $R \sqsubseteq S$, and an ABox $\mathcal{A}$ consists of axioms of the form $w : C$ and $\langle v, w \rangle : R$. Again, a KB $\mathcal{K}$ is a pair $\langle \mathcal{S}, \mathcal{A} \rangle$, where $\mathcal{S}$ is a schema and $\mathcal{A}$ is an ABox.
The definitions of interpretations, satisfiability, and models also parallel those for DLR, and there is again no unique name assumption.
Note that, in order to maintain decidability, the roles that can appear in number restrictions are restricted [15]: if a role $S$ occurs in a number restriction, then neither $S$ nor any of its sub roles may be transitive (i.e., if the schema contains a $\sqsubseteq$-path from $S'$ to $S$, then $S'$ is not transitive).

Determining Query Containment
In this section we will describe how the problem of deciding whether one query is contained in another one w.r.t. a schema can be reduced to the problem of deciding KB satisfiability in the SHIQ description logic. There are three steps to this reduction.
Firstly, the queries are transformed into DLR ABoxes $\mathcal{A}_1$ and $\mathcal{A}_2$ such that $\mathcal{S} \models q_1 \sqsubseteq q_2$ iff $\langle \mathcal{S}, \mathcal{A}_1 \rangle \mathrel{|\!\approx} \mathcal{A}_2$ (see Definition 2.1.4). Secondly, the ABox inclusion problem is transformed into one or more KB satisfiability problems. Finally, we show how a DLR KB can be transformed into an equisatisfiable SHIQ KB.

Transforming Query Containment into ABox Inclusion
We will first show how a query can be transformed into a canonical DLR ABox.
Such an ABox represents a generic pattern that must be matched by all tuples in the evaluation of the query.
Definition 3.1.1 Let $q$ be a conjunctive query. The canonical ABox for $q$ is defined by
$\mathcal{A}_q = \{ \vec{w} : \mathbf{R} \mid \mathbf{R}(\vec{w}) \text{ is an atom in } q \} \cup \{ w : C \mid C(w) \text{ is an atom in } q \}$.
We introduce a new atomic concept $P_w$ for every individual $w$ in $\mathcal{A}_q$ and define the completed canonical ABox for $q$ by
$\widehat{\mathcal{A}}_q = \mathcal{A}_q \cup \{ w : P_w \mid w \text{ is an individual in } \mathcal{A}_q \} \cup \{ w_i : \neg P_{w_j} \mid w_i, w_j \text{ are constants in } q \text{ and } i \neq j \}$.
The axioms $w : P_w$ in $\widehat{\mathcal{A}}_q$ introduce representative concepts for each individual $w$ in $\mathcal{A}_q$. They are used (in the axioms $w_i : \neg P_{w_j}$) to ensure that individuals corresponding to different constants in $q$ cannot have the same interpretation, and will also be useful in the transformation to KB satisfiability.
By abuse of notation we will say that an interpretation $\mathcal{I}$, and an assignment $\rho$ of distinguished variables, non-distinguished variables and constants to elements in the domain of $\mathcal{I}$ such that $\mathcal{I} \models \rho(q)$, define a model for $\widehat{\mathcal{A}}_q$ with the interpretation of the individuals corresponding with $\rho$ and the interpretation $P_w^{\mathcal{I}} = \{ w^{\mathcal{I}} \}$. We can use this definition to transform the query containment problem into a (very similar) problem involving DLR ABoxes. We can assume that the names of the non-distinguished variables in $q_2$ differ from those in $q_1$ (arbitrary names can be chosen without affecting the evaluation of the query), and that the names of distinguished variables and constants appear in both queries (if a name is missing in one of the queries, it can be simply added using a term like $\top(v)$).
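The construction of a canonical ABox and its completion can be sketched in a few lines. This is an illustrative sketch only: the representation of an assertion as a `(subject, predicate)` pair, the string encoding `"not P_w"` for a negated representative concept, and the rule that a one-argument atom is a concept assertion are all assumptions, not notation from the paper.

```python
def canonical_abox(atoms):
    """Map each query atom to an ABox assertion (subject, predicate).

    A one-argument atom C(w) becomes the concept assertion (w, C);
    any other atom R(w1, ..., wn) becomes the tuple assertion ((w1, ..., wn), R).
    """
    return {(args[0] if len(args) == 1 else tuple(args), name)
            for name, args in atoms}

def completed_canonical_abox(atoms, constants=()):
    """Add a representative concept P_w for every individual, and assert
    that distinct constants cannot share an interpretation."""
    completed = set(canonical_abox(atoms))
    individuals = sorted({a for _, args in atoms for a in args})
    for w in individuals:
        completed.add((w, f"P_{w}"))
    for c in constants:
        for d in constants:
            if c != d:
                completed.add((c, f"not P_{d}"))
    return completed

# The BUS query, with individuals n, y1, y2 standing for nr, stop1, stop2.
bus_atoms = [("bus_route", ("n", "y1", "y2")),
             ("bus_route", ("n", "y2", "y1")),
             ("city_bus", ("n",))]
bus_abox = completed_canonical_abox(bus_atoms)
```

For the BUS query this yields three assertions taken from the atoms plus one representative-concept assertion per individual.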
The following Theorem shows that a canonical ABox really captures the structure of a query, allowing the query containment problem to be restated as an ABox inclusion problem.
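[...] $P_w^{\mathcal{I}'} = \{ w^{\mathcal{I}} \}$ for all individuals $w$ occurring in $\mathcal{A}$ and their corresponding representative concepts $P_w$, $\cdot^{\mathcal{I}'}$ is the same as $\cdot^{\mathcal{I}}$ in every other respect, and $\mathcal{I}' \models \langle \mathcal{S}, \widehat{\mathcal{A}} \rangle$.
PROOF: From the semantics it is clear that the interpretation of a concept $C$ depends only on the interpretations of the atomic concepts and roles that appear syntactically in $C$, and from Definition 3.1.1, $P_w$ appears only in axioms of the form $w : P_w$ and $w_i : \neg P_{w_j}$ in $\widehat{\mathcal{A}} \setminus \mathcal{A}$. Therefore $\mathcal{I}'$ satisfies all the axioms $C \sqsubseteq D$ and $\mathbf{R} \sqsubseteq \mathbf{S}$ in $\mathcal{S}$ and all the axioms $w : C$ in $\mathcal{A}$, because $\mathcal{I} \models \langle \mathcal{S}, \mathcal{A} \rangle$ and all the $C$, $D$, $\mathbf{R}$, $\mathbf{S}$ and $w$ are identically interpreted by $\mathcal{I}$ and $\mathcal{I}'$.
Moreover, $\mathcal{I}'$ also satisfies both kinds of axiom in $\widehat{\mathcal{A}} \setminus \mathcal{A}$. It obviously satisfies the [...]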

Transforming ABox Inclusion into ABox Satisfiability
Next, we will show how to transform the ABox inclusion problem into one or more KB satisfiability problems. In order to do this, there are two main difficulties that must be overcome. The first is that, in order to transform inclusion into satisfiability, we would like to be able to "negate" axioms. This is easy for axioms of the form $w : C$, because an interpretation satisfies $w : \neg C$ iff it does not satisfy $w : C$. However, we cannot deal with axioms of the form $\vec{w} : \mathbf{R}$ in this way, because DLR only has a weak form of negation for relations relative to $\top_n$. Our solution is to transform all axioms in $\mathcal{A}_{q_2}$ into the form $w : C$.
The second difficulty is that $\mathcal{A}_{q_2}$ may contain individuals corresponding to non-distinguished variables in $q_2$ (given the symmetry between queries and ABoxes, we will refer to them from now on as non-distinguished individuals). These individuals introduce an extra level of quantification that we cannot deal with using our standard reasoning procedures: $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle \mathrel{|\!\approx} \mathcal{A}_{q_2}$ iff for all models $\mathcal{I}$ of $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ there exists some extension of $\mathcal{I}$ to $\mathcal{A}_{q_2}$. We deal with this problem by eliminating the non-distinguished individuals from $\mathcal{A}_{q_2}$.
We will begin by exploiting some general properties of ABoxes that allow us to compact $\mathcal{A}_{q_2}$ so that it contains only one axiom $\vec{w} : \mathbf{R}$ for each tuple $\vec{w}$, and one axiom $w : C$ for each individual $w$ that is not an element in any tuple. It is obvious from the semantics that we can combine all ABox axioms relating to the same individual or tuple: [...]
The following lemma shows that we can also absorb $w_i : C$ into $\vec{w} : \mathbf{R}$ when $w_i$ is an element of $\vec{w}$.
[...], where $w_i$ is the $i$-th element of $\vec{w}$.
The ABox resulting from exhaustive application of Lemma 3.2.1 can be represented as a graph, with a node for each tuple, a node for each individual, and edges connecting tuples with the individuals that compose them. The graph will consist of one or more connected components, where each component is either a single individual (representing an axiom $w : C$, where $w$ is not an element in any tuple) or a set of tuples linked by common elements (representing axioms of the form $\vec{w} : \mathbf{R}$). As they do not have any individuals in common, we can deal independently with the inclusion problem for each connected set of axioms: [...] for every connected set of axioms in $\mathcal{A}'$.
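The graph view of an ABox can be made concrete with a small sketch. The code below builds the bipartite graph of tuple nodes and individual nodes and returns its connected components. The `(subject, predicate)` representation of assertions, where a subject is either an individual name or a Python tuple of individual names, is an assumption made for illustration.

```python
from collections import defaultdict

def connected_components(abox):
    """Split an ABox into connected components of its tuple/individual graph.

    abox: iterable of (subject, predicate); a subject is a str (individual)
    or a tuple of str (tuple of individuals).
    Returns a list of sets of graph nodes.
    """
    adjacency = defaultdict(set)
    for subject, _ in abox:
        adjacency[subject]  # make sure isolated individuals get a node
        if isinstance(subject, tuple):
            for w in subject:
                adjacency[subject].add(w)
                adjacency[w].add(subject)
    adjacency = dict(adjacency)  # freeze: no new keys while traversing

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

# Two tuples sharing n, z1, z2 form one component; u stands alone.
example = [(("n", "z1", "z2"), "city_bus_route"),
           (("n", "z2", "z1"), "city_bus_route"),
           ("u", "C")]
components = connected_components(example)
```

Each returned component can then be handled as an independent inclusion problem, as described above.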
Returning to our original problem, we will now show how we can collapse a connected set of axioms into a single axiom of the form $w : C$, where $w$ (the "root" individual) is an element of one of the tuples $\vec{w}_1, \ldots, \vec{w}_n$ occurring in the connected set, $C$ is a concept that describes the connected set from the point of view of $w$, and $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ includes the connected set iff $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle \mathrel{|\!\approx} \{ w : C \}$. The collapsing procedure works by replacing each axiom $\vec{w} : \mathbf{R}$ with an axiom of the form $w_i : C$ (where $w_i$ is an element of $\vec{w}$), which can then be absorbed into another axiom $\vec{w}' : \mathbf{R}'$ (where $w_i$ is an element of $\vec{w}'$) using Lemma 3.2.1. A recursive traversal of the graphical representation of the connected set is used to choose the order in which to apply the replacements and absorptions so that the set is collapsed into a single axiom (a similar technique is used in [4] to transform queries into concepts). During the collapsing procedure, new concepts $Q_w$ may be introduced to represent non-distinguished individuals $w$ that occur in the connected set. These concepts serve only as "place-holders", and will be replaced when the set is completely collapsed.
A traversal starts at an (arbitrary) individual node Û (the "root") and proceeds as follows.
- At an individual node $x$, the node is first marked as visited. Then, while there remains an unmarked tuple node connected to $x$, one of these, $\vec{w}$, is selected, visited, and the axiom $\vec{w} : \mathbf{R}$ is replaced with the axiom [...] where $\vec{w} = \langle w_1, \ldots, w_n \rangle$, $x$ is the $i$-th element of $\vec{w}$, $w_j$ is the $j$-th element of $\vec{w}$, and $C_j$ is either the representative concept $P_{w_j}$, if $w_j$ is an individual occurring in $\widehat{\mathcal{A}}_{q_1}$, or a concept $Q_{w_j}$ otherwise. Finally, any axioms $x : C_1, \ldots, x : C_n$ resulting from visiting the unmarked tuples connected to $x$ are merged into a single axiom $x : (C_1 \sqcap \ldots \sqcap C_n)$.
- At a tuple node $\vec{w}$, the node is first marked as visited. Then, while there remains an unmarked individual node connected to $\vec{w}$, one of these, $x$, is selected, visited, and any axiom $x : C$ that results from the visit is merged into the axiom $\vec{w} : \mathbf{R}$ using Lemma 3.2.1.
After the traversal, the connected set has been reduced to a single axiom $w : C$, but $C$ may contain concepts $Q_{w_i}$ that were introduced during the collapsing procedure as representatives for non-distinguished individuals. As these concepts do not occur in $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$, they must be eliminated if the inclusion relationship is to be preserved. This is easy for concepts $Q_{w_i}$ that occur only once in $C$, and where $w_i$ is not the root individual (i.e., $w_i \neq w$): as $w_i$ is "referred to" only once in the collapsed axiom, and can be freely interpreted when a model $\mathcal{I}$ of $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ is extended to the connected set, $Q_{w_i}$ can simply be replaced with $\top$ (this will be shown more formally in Lemma 3.2.2). This solution cannot be adopted for a concept $Q_{w_i}$ that occurs more than once in $C$, or that occurs at least once in $C$ when $w_i = w$, because $w_i$ must have the same interpretation everywhere it is "referred to" in the collapsed axiom. However, in this case we can deal with $Q_{w_i}$ by exploiting the fact that the individual $w_i$ must occur in a cycle in the graph representing the connected set. An individual $w_i$ is in a cycle in the graph if there is a path leading from the node representing $w_i$ back to itself in which the same edge is never traversed (in either direction) more than once. As the marking of nodes during the traversal ensures that the same edge is never traversed more than once, $w_i$ must have been in such a cycle.
Given the correspondence between the graph and the axioms in the connected set, it is obvious that the set can only be satisfied by an interpretation $\mathcal{I}$ in which $w_i^{\mathcal{I}}$ is also in a relational cycle (the cycle is explicitly asserted by the axioms in the set). Moreover, given that $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ includes the connected set, and that extending an interpretation of $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ to the set cannot extend the interpretation of any relation, such a cycle must already exist in every interpretation of $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$. Finally, the properties of DLR mean that an interpretation $\mathcal{I}$ of $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ can only be guaranteed to contain a relational cycle if the cycle is explicitly asserted in axioms of the form $\vec{w} : \mathbf{R}$ in $\widehat{\mathcal{A}}_{q_1}$, so that each element in the cycle must be the interpretation of one of the individuals forming the tuples in these axioms. We can therefore conclude that the individual $w_i$ must have the same interpretation as some individual $w_j$ occurring in $\widehat{\mathcal{A}}_{q_1}$, and that $Q_{w_i}$ can be replaced with the representative concept $P_{w_j}$ (and that if $w_i$ is the root individual, the axiom $w_i : C$ can be replaced by $w_j : C$). Of course we do not know which individual occurring in $\widehat{\mathcal{A}}_{q_1}$ corresponds to a given $Q_{w_i}$, but we can simply try all possible replacements (of which there can only be finitely many), so that $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$ includes the connected set iff, for one of these replacements, $\langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle \mathrel{|\!\approx} \{ w : C \}$.
An extra level of non-determinism is thus added to the procedure, but this should be manageable as the numbers of such $Q_{w_i}$ will typically be very small. (This represents a useful refinement over the procedure described in [4], where all $z_i$ that occur in cycles are non-deterministically replaced with one of the $w_j$, regardless of whether or not they are used to enforce a co-reference.) These replacements can obviously be performed either before or after the collapsing procedure without affecting the result. In practice, it will be more efficient to delay the replacement as long as possible, but in the following Lemma (Lemma 3.2.2) we will assume that the replacements have been performed before the collapsing procedure.
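The non-deterministic choice can be simulated by plain enumeration. The sketch below is only an illustration of the search space: the function names and the `check` callback, which stands in for one run of an inclusion test under a fixed replacement, are hypothetical and not part of any reasoner's API.

```python
from itertools import product

def candidate_replacements(placeholders, individuals):
    """Yield every mapping from place-holder individuals to individuals
    of the left-hand ABox (len(individuals) ** len(placeholders) mappings)."""
    placeholders = list(placeholders)
    for choice in product(individuals, repeat=len(placeholders)):
        yield dict(zip(placeholders, choice))

def holds_for_some_replacement(placeholders, individuals, check):
    """True iff the (hypothetical) inclusion test succeeds for at least
    one replacement; `check` receives one mapping at a time."""
    return any(check(mapping)
               for mapping in candidate_replacements(placeholders, individuals))

# One co-referenced individual z1 and three candidates n, y1, y2.
mappings = list(candidate_replacements(["z1"], ["n", "y1", "y2"]))
```

With few co-referenced individuals the number of candidate mappings stays small, which matches the expectation stated above.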
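The correctness of the collapsing procedure does not depend on the traversal (whose purpose is simply to choose a suitable ordering), but only on the correctness of the individual transformations. We have already shown that the compacting and absorbing transformations preserve (un)satisfiability, and so obviously preserve the implication relationship; it only remains to show that the implication relationship is also preserved by each replacement of an axiom of the form $\vec{w} : \mathbf{R}$ with one of the form $w : C$.
Lemma 3.2.2 Let $\mathcal{S}$ be a schema, $\widehat{\mathcal{A}}$ a completed canonical ABox and $\mathcal{A}_1$ an ABox where $\vec{w} : \mathbf{R} \in \mathcal{A}_1$, $\vec{w} = \langle w_1, \ldots, w_n \rangle$, $w_i$ is the $i$-th element of $\vec{w}$, and every other element of $\vec{w}$ is either an individual that occurs in $\widehat{\mathcal{A}}$ or an individual that occurs nowhere else in either $\widehat{\mathcal{A}}$ or $\mathcal{A}_1$. Let $C$ be the concept [...] where $C_j$ is the representative concept $P_{w_j}$ when $w_j$ is an individual that occurs in $\widehat{\mathcal{A}}$, and $\top$ otherwise. If $\mathcal{A}_2$ is the ABox that results from the replacement of $\vec{w} : \mathbf{R} \in \mathcal{A}_1$ with the axiom $w_i : C$, then $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle \mathrel{|\!\approx} \mathcal{A}_1$ iff $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle \mathrel{|\!\approx} \mathcal{A}_2$.
[...] If $\mathcal{A}_1$ contains other axioms, then any interpretation that satisfies these axioms will still satisfy them after the replacement. For the only if direction, it is easy to show that if $\mathcal{I} \models \langle \mathcal{S}, \widehat{\mathcal{A}} \rangle$, and $\mathcal{I}'$ is an extension of $\mathcal{I}$ that satisfies $\vec{w} : \mathbf{R}$, then $\mathcal{I}'$ also satisfies $w_i : C$.
Obviously, $w_i^{\mathcal{I}'}$ is the $i$-th element of $\vec{w}^{\mathcal{I}'}$, and $\vec{w}^{\mathcal{I}'} \in \mathbf{R}^{\mathcal{I}'}$. For each component $(\$j/n : C_j)$ in $C$ there are two possible cases:
1. When $w_j$ is an individual occurring in $\widehat{\mathcal{A}}$, $C_j$ is $P_{w_j}$, the representative concept for $w_j$. In this case, $w_j : P_{w_j}$ is an axiom in $\widehat{\mathcal{A}}$, so [...]
The converse direction is more complicated. Let $\mathcal{I}$ be an interpretation such that $\mathcal{I} \models \langle \mathcal{S}, \widehat{\mathcal{A}}_{q_1} \rangle$, and $\mathcal{I}$ cannot be extended to satisfy $\vec{w} : \mathbf{R}$. From Lemma 3.1.3 we can assume, without loss of generality, that $P_w^{\mathcal{I}} = \{ w^{\mathcal{I}} \}$ for every representative concept $P_w$ occurring in $\widehat{\mathcal{A}}$. Assume that $\mathcal{I}$ can be extended to an interpretation $\mathcal{I}'$ that satisfies $w_i : C$. Then there must be some [...], and for each $j$ with [...], [...] $\in (\$j/n : C_j)^{\mathcal{I}'}$. Again, for each component $(\$j/n : C_j)$ in $C$ there are two possible cases.
1. When $w_j$ is an individual occurring in $\widehat{\mathcal{A}}$, $C_j$ is $P_{w_j}$, the representative concept for $w_j$. In this case, $P_{w_j}^{\mathcal{I}'}$ [...]
2. Otherwise, $C_j$ is $\top$. In this case, $w_j$ occurs nowhere else in either $\widehat{\mathcal{A}}$ or $\mathcal{A}_1$, so when $\mathcal{I}$ was extended to $\mathcal{I}'$, $w_j$ could have been interpreted as any element in $\Delta^{\mathcal{I}}$ without affecting the satisfiability of any other axiom. We can therefore assume, without loss of generality, that in this particular interpretation $w_j^{\mathcal{I}'}$ [...] (obviously, [...] $\in \top^{\mathcal{I}'}$).
We therefore have [...]
PROOF: In the case where $w$ is an individual in $\widehat{\mathcal{A}}$, there are no longer any non-distinguished individuals in $w : C$, so $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle \mathrel{|\!\approx} \{ w : C \}$ iff every model of $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle$ is also a model of $\{ w : C \}$. This is obviously true iff there are no models of $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle$ that are also models of $\{ w : \neg C \}$, i.e., iff $\langle \mathcal{S}, (\widehat{\mathcal{A}} \cup \{ w : \neg C \}) \rangle$ is not satisfiable.
In the case where $w$ is not an individual in $\widehat{\mathcal{A}}$, $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle \mathrel{|\!\approx} \{ w : C \}$ iff for every model $\mathcal{I}$ of $\langle \mathcal{S}, \widehat{\mathcal{A}} \rangle$, $\mathcal{I}$ can be extended to $\{ w : C \}$. As $w$ is the only remaining non-distinguished individual in $w : C$, $\mathcal{I}$ can be extended to $\{ w : C \}$ iff $C^{\mathcal{I}} \neq \emptyset$ (equivalently, $(\neg C)^{\mathcal{I}} \neq \Delta^{\mathcal{I}}$), i.e., iff $\langle (\mathcal{S} \cup \{ \top \sqsubseteq \neg C \}), \widehat{\mathcal{A}} \rangle$ is not satisfiable.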
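To illustrate the inclusion to satisfiability transformation, we will refer to the example given in Section 2.2. The containment of BUS in CITY BUS w.r.t. the schema is demonstrated by the inclusion $\langle \mathcal{S}, \mathcal{A}_1 \rangle \mathrel{|\!\approx} \mathcal{A}_2$, where $\mathcal{S}$, $\mathcal{A}_1$ and $\mathcal{A}_2$ are the schema and two canonical ABoxes (completed in the case of $\mathcal{A}_1$) corresponding to the given queries:
$\mathcal{S} = \{ (bus\_route \sqcap (\$1/3 : city\_bus)) \sqsubseteq city\_bus\_route,\ city\_bus\_route \sqsubseteq (bus\_route \sqcap (\$1/3 : city\_bus)) \}$
$\mathcal{A}_1 = \{ \langle n, y_1, y_2 \rangle : bus\_route,\ \langle n, y_2, y_1 \rangle : bus\_route,\ n : city\_bus,\ n : P_n,\ y_1 : P_{y_1},\ y_2 : P_{y_2} \}$
$\mathcal{A}_2 = \{ \langle n, z_1, z_2 \rangle : city\_bus\_route,\ \langle n, z_2, z_1 \rangle : city\_bus\_route \}$
The two axioms in $\mathcal{A}_2$ are connected, and can be collapsed into a single axiom using the described procedure. If $z_1$ is chosen as the root, and the traversal visits $\langle n, z_1, z_2 \rangle$, $z_2$, and $\langle n, z_2, z_1 \rangle$, in that order, then the resulting axiom (describing $\mathcal{A}_2$ from the point of view of $z_1$) is $z_1 : C$, where $C$ is the concept [...] and $P_{z_1}$, $P_{z_2}$ are "place-holders" for $z_1$, $z_2$. As $z_2$ is referred to only once, $P_{z_2}$ can be replaced with $\top$. However, as $z_1$ is referred to twice (as $P_{z_1}$ and as the root), it must be replaced (non-deterministically) with one of the individuals in $\mathcal{A}_1$, and $\langle \mathcal{S}, \mathcal{A}_1 \rangle \mathrel{|\!\approx} \mathcal{A}_2$ iff $\langle \mathcal{S}, \mathcal{A}_1 \rangle \mathrel{|\!\approx} \{ z_1 : C \}$ for one of these replacements. Substituting $P_{z_2}$ with $\top$, $z_1$ with $y_1$ and $P_{z_1}$ with $P_{y_1}$ results in an axiom $y_1 : C'$, and $\langle \mathcal{S}, \mathcal{A}_1 \rangle \mathrel{|\!\approx} \{ y_1 : C' \}$ holds because $\langle \mathcal{S}, (\mathcal{A}_1 \cup \{ y_1 : \neg C' \}) \rangle$ is not satisfiable.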
Summing up, we thus have:

Dealing with disjunctive queries
In this section we will show how the technique can be extended in order to decide the containment of disjunctive queries.
where all the terms are defined exactly as in the conjunctive queries of Section 2.2.The query evaluation is defined as the union of all the evaluations for any disjunct.
Given a query $q$ with $n$ distinguished variables, its evaluation w.r.t. the interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ is the set of $n$-tuples: [...]
Without loss of generality we can assume that all the variable names in $\vec{y}_1, \ldots, \vec{y}_m$ are distinct, and that distinguished variables and constant names appear in every disjunct (see Section 3.1). The query containment problem is defined as in the conjunctive case.
The basic idea is to consider each conjunctive subexpression as a canonical ABox, and to extend the inclusion relation of Section 2.1 to take into account the "disjunction" of ABoxes. We will first extend the definition of DLR ABoxes to disjunctive DLR ABoxes (in order to avoid ambiguity, we will sometimes refer to the kind of ABox defined in Section 2.1 as a conjunctive ABox).
The definition of interpretation and satisfiability for each conjunctive ABox is the same as that given in Section 2.1. An interpretation $\mathcal{I}$ satisfies a disjunctive ABox $\mathbb{A}$ (written $\mathcal{I} \models \mathbb{A}$) iff $\mathcal{I}$ satisfies at least one of the conjunctive ABoxes in $\mathbb{A}$.
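The satisfaction condition for disjunctive ABoxes translates directly into code. The sketch below handles only assertions over atomic concepts and relations; the dictionary layout of an interpretation (`individuals`, `concepts`, `relations`) and the `(subject, predicate)` form of an assertion are assumptions made for illustration.

```python
def satisfies_axiom(interp, axiom):
    """Check one assertion over an atomic concept or relation."""
    subject, predicate = axiom
    ind = interp["individuals"]
    if isinstance(subject, tuple):  # tuple assertion: (w1, ..., wn) : R
        return tuple(ind[w] for w in subject) in interp["relations"].get(predicate, set())
    return ind[subject] in interp["concepts"].get(predicate, set())  # w : C

def satisfies_abox(interp, abox):
    """A conjunctive ABox is satisfied iff every axiom is satisfied."""
    return all(satisfies_axiom(interp, axiom) for axiom in abox)

def satisfies_disjunctive_abox(interp, aboxes):
    """A disjunctive ABox is satisfied iff at least one of its
    conjunctive ABoxes is satisfied."""
    return any(satisfies_abox(interp, abox) for abox in aboxes)

# Hypothetical interpretation and two conjunctive ABoxes.
interp = {
    "individuals": {"n": "b1", "y1": "a", "y2": "b"},
    "concepts": {"city_bus": {"b1"}},
    "relations": {"bus_route": {("b1", "a", "b")}},
}
abox_ok = [(("n", "y1", "y2"), "bus_route"), ("n", "city_bus")]
abox_bad = [(("n", "y2", "y1"), "bus_route")]
```

Note that an empty disjunctive ABox is satisfied by no interpretation under this reading, because `any` over an empty collection is false.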
The notion of a disjunctive KB and its satisfiability is built on top of the definition of a disjunctive ABox. All the definitions given in Section 2.1 can be naturally extended to the disjunctive case, in particular the fundamental notion of the inclusion relation between ABoxes.
To simplify the notation, we define an operator which adds a set of axioms to each element of a disjunctive KB. The meaning of the operator is given by the following equations: [...] with the natural extension to finite sets of axioms: [...]
Now we will proceed as in Sections 3.1 and 3.2 by first showing how to reduce the query containment problem to ABox inclusion, and then to ABox satisfiability.
First, we will extend the definition of canonical ABox to deal with disjunctive queries. [...] $\{ \vec{w} : \mathbf{R} \mid \mathbf{R}(\vec{w}) = term^i_j(\vec{x}, \vec{y}_i, \vec{c}) \text{ is an atom in } q \text{ for some } j \} \cup \{ w : C \mid C(w) = term^i_j(\vec{x}, \vec{y}_i, \vec{c}) \text{ is an atom in } q \text{ for some } j \}$
The completed canonical disjunctive ABox for $q$ is defined in a similar way to the non-disjunctive case (see Definition 3.1.1), the difference being that the new axioms are added to each of the conjunctive ABoxes making up the disjunction. Given the disjunctive ABox $\mathcal{A}_q = \{ \mathcal{A}_1, \ldots, \mathcal{A}_m \}$, its completed version (written as $\widehat{\mathcal{A}}_q$) is defined as: [...]
As in the non-disjunctive case, there is a natural correspondence between database instances and interpretations of disjunctive KBs. Each element of a query evaluation corresponds to an interpretation satisfying the canonical ABox and vice versa. (We will consider a database as a standard DLR interpretation in which an individual corresponding to a constant is taken to be interpreted as the actual constant.)
Proposition 3.3.4 Given a database $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ and a disjunctive query $q(\vec{x})$, then the tuple $\vec{d}$ is in the evaluation $q(\mathcal{I})$ iff there is an extension $\mathcal{I}'$ of $\mathcal{I}$ satisfying $\mathcal{A}_q$ such that $x^{\mathcal{I}'}$ [...] for each $x$ in $\vec{x}$.
PROOF: For the "only if" direction, let $\vec{d}$ be in $q(\mathcal{I})$, then it satisfies at least one of the disjuncts in $q$: [...], which means that there is an assignment for the variables in $\vec{y}_i$ that makes the expression true. If $\mathcal{A}_i \in \mathcal{A}_q$ is the corresponding conjunctive ABox, then an extension $\mathcal{I}'$ of $\mathcal{I}$ can be defined by adding to $\cdot^{\mathcal{I}}$ a mapping from each individual in $\mathcal{A}_i$ to the corresponding element of [...]. It is easy to see that $\mathcal{I}'$ satisfies $\mathcal{A}_i$ and thus satisfies $\mathcal{A}_q$.
For the "if" direction, let $\mathcal{I}'$ be an extension of $\mathcal{I}$ satisfying $\mathcal{A}_q$ such that $x^{\mathcal{I}'}$ [...] for each $x$ in $\vec{x}$. Then, from the definition of satisfiability of a disjunctive ABox, there is some [...] appearing in [...]; it therefore defines an assignment for the variables $\vec{y}_i$ in the corresponding disjunct of $q$. It is easy to see that this assignment satisfies the formula [...]
Given the Proposition 3. [...]
The next step consists of reducing ABox inclusion to ABox satisfiability. As in the conjunctive case, we only consider a particular kind of ABox on the r.h.s. of the inclusion, namely those containing only axioms of the form $w : C$. This assumption can be made without loss of generality because the connected components of each conjunctive ABox can be collapsed into a single concept assertion, as shown in Section 3.
[...] ABox in $\mathcal{A}'$. Therefore $\mathcal{I} \models$ [...], and $\mathcal{I}'' \models w : C$. Moreover, as both $\mathcal{I}$ and $\mathcal{I}''$ are extensions of $\mathcal{I}'$ (see 2.1.4), they differ only in the interpretation of non-distinguished variables. There are two cases, depending on whether or not $w$ is a non-distinguished individual.
- For the "if" direction, assume that there is an interpretation $\mathcal{I}$ satisfying $\langle \mathcal{S}, \mathcal{A} \rangle$ which cannot be extended to one satisfying $\{ \mathcal{A}_1, \ldots, \mathcal{A}_m \}$. For each $\mathcal{A}_i$ there must be at least one axiom $(w_i : C_i) \in \mathcal{A}_i$ that $\mathcal{I}$ cannot be extended to satisfy. Therefore, there is a KB [...] for some $(w_i : C_i) \in \mathcal{A}_i$ such that the interpretation $\mathcal{I}$ cannot be extended to satisfy any of the selected axioms $w_i : C_i \in \mathcal{A}_i$. The interpretation $\mathcal{I}$ satisfies $\langle \mathcal{S}, \mathcal{A} \rangle$, and it also satisfies all the axioms [...] added in $\mathcal{K}'$. Again, there are two cases, depending on whether or not $w_i$ is a non-distinguished individual.
  - If $w_i$ is a non-distinguished variable, then $C_i^{\mathcal{I}}$ must be empty, otherwise an extension satisfying $w_i : C_i$ can be defined by mapping $w_i$ to one of the elements of $C_i^{\mathcal{I}}$. Therefore $\mathcal{I} \models \top \sqsubseteq \neg C_i$.
[Figure 1: the mapping that reifies DLR expressions into SHIQ concepts]

Transforming DLR satisfiability into SHIQ satisfiability
We decide satisfiability of DLR knowledge bases by means of a satisfiability-preserving translation $\sigma(\cdot)$ from DLR KBs to SHIQ KBs. This translation deals with the fact that DLR allows for arbitrary $n$-ary relations while SHIQ only allows for unary predicates and binary relations; this is achieved by a process called reification. The main idea behind this is easily described: each $n$-ary tuple in a DLR-interpretation is represented by an individual in a SHIQ-interpretation that is linked via the dedicated functional relations $f_1, \ldots, f_n$ to the elements of the tuple.
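The idea of reification can be shown on plain data. The following sketch turns a set of $n$-ary relation extensions into unary concept extensions over fresh tuple individuals plus binary role extensions `f1`, `f2`, and so on. The naming scheme (`t0`, `t1`, ..., `f1`, `f2`, ...) and the dictionary layout are assumptions made for illustration; the sketch works on extensions, not on DLR expressions or axioms.

```python
def reify(relations):
    """Reify n-ary relation extensions into unary and binary ones.

    relations: dict mapping a relation name to a set of tuples.
    Returns (concepts, roles) where concepts[name] is a set of tuple
    individuals and roles['f<i>'] is a set of (tuple individual, element)
    pairs linking each tuple individual to its i-th element.
    """
    concepts, roles, names = {}, {}, {}
    for name, tuples in relations.items():
        concepts[name] = set()
        for tup in tuples:
            # One individual per distinct tuple, shared across relations.
            t = names.setdefault(tup, f"t{len(names)}")
            concepts[name].add(t)
            for i, element in enumerate(tup, start=1):
                roles.setdefault(f"f{i}", set()).add((t, element))
    return concepts, roles

relations = {
    "bus_route": {("b1", "a", "b"), ("b1", "b", "a")},
    "city_bus_route": {("b1", "a", "b")},
}
concepts, roles = reify(relations)
```

Because each distinct tuple receives exactly one individual, every role produced here is functional, and a tuple shared by two relations is represented by the same individual in both concepts.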
For ÄÊ without regular expressions, the mapping ´¡µ shown in Figure 1 (given by Calvanese et al. [4]) reifies ÄÊ expressions into ËÀÁÉ-concepts.This mapping can be extended to a knowledge base as follows.Definition 3.4.1 Let Ã ´Ë µ be a ÄÊ knowledge base.The reification of Ë is given by To reify the ABox , we have to reify all tuples appearing in the axioms.For each distinct tuple Û Û ½ Û Ò occurring in we chose a distinct individual Ø Û (called the "reification of Û") and define: We need a few additional inclusion and ABox axioms to guarantee that any model of ´ ´Ëµ ´ µµ can be "un-reified" into a model of ´Ë µ.Let Ò max denote the max- imum arity of the ÄÊ relations appearing in Ã.We define ´Ëµ to consist of the following axioms (where Ü Ý is an abbreviation for Ü Ú Ý and Ý Ú Ü): for each atomic relation P of arity Ò Ú ½ for each atomic concept These are the standard axioms needed for reification in schema reasoning, and can already be found in [4].
We introduce a new atomic concept É Û for every individual Û in and define ´ µ to consist of the following axioms: ´ These axioms are crucial when dealing with the problem of tuple-admissibility (see below) in the presence of ABoxes.
The proof of Theorem 3.4.2 is rather involved and technical.We first give a sketch of the proof.PROOF (sketch): The same techniques that were used in [2] can be adapted to the DL ËÀÁÉ, and extended to deal with ABox axioms.The only-if direction is straightforward.A model Á of Ã can be transformed into a model of ´Ãµ by introducing, for every arity Ò with ¾ Ò Ò max and every Ò-tuple of elements ¾ ´¡Á µ Ò , a new element Ø that is linked to the elements of by the functional relations ½ Ò .If we interpret ½ by ¡ Á , Ò by the reifications of all elements in Á Ò , and, for every Û that occurs in , É Û by Û Á , then it is easy to show that we have constructed a model of ´Ãµ.
The converse direction is more complicated since a model of $\sigma(\mathcal{K})$ is not necessarily tuple-admissible, i.e., in general there may be distinct elements $t, t'$ that are reifications of the same tuple $\bar{d}$. In the "un-reification" of such a model, $\bar{d}$ would only appear once, which may conflict with assertions in the DLR KB about the number of tuples in certain relations. However, it can be shown that every satisfiable KB $\sigma(\mathcal{K})$ also has a tuple-admissible model. It is easy to show that such a model, by "un-reification", induces a model for the original KB $\mathcal{K}$.
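To make the notion of tuple-admissibility concrete, the following sketch checks a finite structure for pairs of distinct elements that reify the same tuple. It is an illustration under assumptions, not part of the decision procedure: the function name `conflicting_elements` and the encoding of each functional relation $f_i$ as a Python dictionary `roles[i]` are hypothetical.

```python
def conflicting_elements(roles, arity):
    """Return the groups of distinct reified elements that share all components.

    `roles[i]` is a dict encoding the functional relation f_i (hypothetical
    encoding): it sends a reified element to its i-th component. The structure
    is tuple-admissible for `arity` exactly when the returned list is empty.
    """
    by_tuple = {}
    for element in roles.get(1, {}):
        components = tuple(
            roles.get(i, {}).get(element) for i in range(1, arity + 1)
        )
        if None in components:
            continue  # fewer than `arity` components: not a tuple of this arity
        if element in roles.get(arity + 1, {}):
            continue  # more than `arity` components: a tuple of higher arity
        by_tuple.setdefault(components, set()).add(element)
    return [elements for elements in by_tuple.values() if len(elements) > 1]


# Two distinct elements t0 and t1 both reify the pair (a, b).
example_roles = {1: {"t0": "a", "t1": "a"}, 2: {"t0": "b", "t1": "b"}}
conflicts = conflicting_elements(example_roles, 2)
```

In the example, `t0` and `t1` form one conflicting group; un-reifying them would yield a single pair, which is exactly the situation that can invalidate counting constraints.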
Theorem 3.4.2 will be an immediate consequence of the following Lemmata 3.4.3 and 3.4.5.
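Lemma 3.4.3 Let $\mathcal{K} = (\mathcal{S}, \mathcal{A})$ be a DLR knowledge base. If $\mathcal{K}$ is satisfiable, then the SHIQ-KB $\sigma(\mathcal{K})$ is satisfiable.

PROOF: Let $\mathcal{I}$ be a model of $(\mathcal{S}, \mathcal{A})$. We will reify it into a model of $\sigma(\mathcal{K})$.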
Let Ò max denote the maximum arity of relations in Ë and .The set of individuals of Á is the set of individuals of Á plus a distinct individual for each possible Ò-tuple We have to fix the interpretation of the atomic ËÀÁÉ-concepts and roles.The only roles that occur in ´ ´Ëµ ´Ã ´ µµ are the Ò with ½ Ò Ò max .For each role For every atomic ÄÊ-concept , we set

Á Á
For every atomic ËÀÁÉ-concept P that corresponds to an Ò-ary atomic ÄÊ-relation with Ò ¾ we define P Á Ø ¾ ´¡Á µ Ò and ¾ P Á Finally, we have to define the interpretation of the newly-introduced atomic concepts Ò for ½ Ò Ò max .This is done as follows: By induction of the structure of ÄÊ-concepts and relations one can show, for every ÄÊ-concept , every ÄÊ-relation R, every ¾ ¡ Á , and every ¾ ´¡Á µ Ò for ¾ Ò Ò max , that From this it immediately follows that Á Ë implies Á ´Ëµ and hence Á ´Ëµ ´Ãµ.It remains to show that also Á ´ µ.
We fix the interpretation of the auxiliary concepts É Û that have been introduced in At first, we have to define the interpretation of the individuals in ´Ëµ.For any individual Û that appears also in we set Û Á Û Á .For each newly introduced this definition it is easy to see that Á ´ µ.
It remains to show that Á ´ µ.Á Û É Û follows by construction of É Á Û for every individual Û that occurs in .Let Û ½ Û Ò be a tuple that occurs in .We have to show that Û Á ¾ ´ ½ ½ ´ Ò Ù ¾ É Û¾ Ù Ù Ò É ÛÒ µµ Á .By construction we have that Ò for some Ü ¾ ¡ Á and hence ´ Trivially, for every Ü ¾ ¡ Á , there is at most one Ò-tuple that starts with Ü and continues with Û Á ¾ Û Á Ò .Hence, we get, for every tuple Û ½ Û Ò that occurs in , that PROOF: Let Á be a model of ´Ãµ.We will transform Á into a tuple-admissible model Á for ´Ãµ.Since Á ´Ãµ, we have that Á ´Ãµ and hence Á Ò is the graph of a partial function.To this function we will refer by Á Ò ´¡µ.For ¾ Ò Ò max and Ò-tuple ½ Ò ¾ ´ Á ½ µ Ò , we define the set of all reifications of this tuple by Each set Ì which contains more than one element violates ´£µ.For any such set we pick an arbitrary element Ø ¾ Ì and say that the other elements are conflicting with Ø .With Conf we denote the set of all elements that are conflicting with other elements.
We will now transform $\mathcal{I}$ into an interpretation that contains no conflicts.
We start by describing this transformation for the simple case that we have only a single conflicting element Ø.This conflict can be resolved as follows.Let Á ¼ be the interpretation consisting of two disjoint copies of Á (we will forget about the interpretation of individuals at the moment).Á ¼ contains the conflicting element Ø and a copy Ø ¼ of Ø.We define Á from Á ¼ by setting and preserving the interpretation of all other atomic concepts and roles.The result is an interpretation that contains no more conflicting elements.
The construction in the general case is a little bit more complicated because in general Conf may be of arbitrary cardinality and we have to take care of the ABox axioms.To prevent interference of the later construction with the ABox axioms we will use a little bit more care when choosing Conf.Firstly, we show that the interpretation of two different ABox individuals may never conflict.´ µ we have, for each ½ Ò, Ú Á Û Á and hence as the first component of two distinct reified tuples that satisfy Ò Ù ¾ É Û¾ Ù Ù Ò É ÛÒ .This is a contradiction to the assumption that Á ´ µ.
Using Claim 1, we make sure that we do not have any conflicting elements that appear in the interpretation of ABox individuals. There are no two ABox individuals $t_{w_1}, t_{w_2}$ such that $t_{w_1}^{\mathcal{I}}$ and $t_{w_2}^{\mathcal{I}}$ are conflicting. From this it follows that, in each set of reifications of a tuple, there is at most one element that appears as the image of an ABox individual under the interpretation $\mathcal{I}$ (it may appear as the image of several ABox individuals). Hence, we can choose Conf in such a way that it contains no elements that appear as images of ABox individuals of $\mathcal{I}$.
Let Á ¼ denote the disjoint union of ℄´¾ Conf µ copies of Á.For a set Conf we denote the copy of ¾ ¡ Á in the -th copy of Á by .For two distinct sets ¼ and elements ¼ , we call exchanging Á ¼ ½ ´ µ with Á ¼ ½ ´ ¼ µ the operation on Á ¼ which changes the interpretation of ½ under Á ¼ as follows: -If ¾ Conf, then we have changed the relation ½ for .Strictly speaking, we have to distinguish the two cases ¾ and ¾ but these are dual.
-If ¾ Conf, then we have not changed the relation ½ for and hence we have Á ½ ´ µ Á ¼ ½ ´ µ.At the same time, we have exchanged ½ ´ µ by ½ ´ Ò µ and hence Á ½ ´ µ Á ¼ ½ ´ µ which is a contradiction.
In both cases, we have Á ½ ´ µ Á ½ ´ µ because these elements are in different disjoint copies.Hence, and cannot be conflicting.
¯Now assume that we have created a new conflict between elements Á.This implies that, w.l.o.g., the function ½ has been modified for during the exchange (otherwise the conflict would already be present in Á ¼ ).Since we only change the interpretation of the role ½ , and Ò, and hence and ¼ must reside in the same disjoint copy because we do not have -links between the disjoint copies in Á ¼ for ¾.
Hence we have ¼ .Since and do not conflict in Á ¼ , we must have -Finally, if ¾ Conf, then we have to distinguish between the following cases: Hence, Ò and Ò refer to the same disjoint copy and we have and thus and are the same element and can not conflict.
£ if ¾ , then follows analogously and hence and cannot conflict.CLAIM 3: Let be a ËÀÁÉ-concept, ¾ Á and Conf.Then ¾ Á .PROOF OF CLAIM 3. We use a simple induction over the stucture of ËÀÁÉ-concepts.
The claim obviously holds for all atomic concepts. By induction, it also holds immediately for Boolean combinations of concepts. For the existential, value, and number restrictions it follows from the fact that we start with disjoint copies and only change the interpretation of roles by exchanging elements that are copies of the same element. Hence, we do not change the number of successors of any element, and we only exchange links to elements which, by the induction hypothesis, cannot be distinguished by "smaller" concepts.
From Claim 3 it follows that Á Ë.It remains to show that we can fix the interpretation of the ABox individuals in under Á such that Á .This can be done by interpreting all individuals in a single copy, e.g., by setting, for every ABox individual Û, Û Á Û Á Again, from Claim 3, we get that, for every ABox assertion Û ¾ we have that Û Á ¾ Á implies Û Á ¾ Á .Furthermore, since, for every individual Û that appears in , we have Û Á ¾ Conf and hence the interpretation of is not changed for Û Á .For any assertion Û ½ Û ¾ , we have Û Á ½ Û Á ¾ ¾ Á and hence Û Á ½ Û Á ¾ ¾ Á .Thus, we also have Á and thus Á Ã.
Together with Claim 2, which yields that the constructed interpretation satisfies $(*)$, we have that it is a tuple-admissible interpretation that satisfies $\sigma(\mathcal{K})$.
Once we have solved the problem of tuple-admissibility, it is fairly straightforward to show the following lemma.

Lemma 3.4.5 Let $\mathcal{K} = (\mathcal{S}, \mathcal{A})$ be a DLR knowledge base. If the SHIQ-KB $\sigma(\mathcal{K})$ is satisfiable, then $\mathcal{K}$ is satisfiable.
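PROOF: If $\sigma(\mathcal{K})$ is consistent, then, by Lemma 3.4.4, there is a tuple-admissible model $\mathcal{I}$ for $\sigma(\mathcal{K})$. We will "un-reify" the reified tuples in $\mathcal{I}$ into ordinary tuples, obtaining an interpretation $\mathcal{I}'$. We use the auxiliary function $ur$ that maps a reified tuple to its un-reified counterpart. If $t \in \top_n^{\mathcal{I}}$ and $f_i^{\mathcal{I}}(t) = d_i$ for $1 \leq i \leq n$, then we define $ur(t) = \langle d_1, \dots, d_n \rangle$. The atomic concepts and relations will be defined as follows:

$A^{\mathcal{I}'} = A^{\mathcal{I}}$ for each atomic concept $A$
$P^{\mathcal{I}'} = \{ ur(t) \mid t \in P^{\mathcal{I}} \}$ for each atomic relation $P$ of arity $n$

We also have to define the interpretation of the ABox individuals in $\mathcal{A}$. For every individual $w$ that appears in $\mathcal{A}$ we set $w^{\mathcal{I}'} = w^{\mathcal{I}}$. Note that, even if $w$ appears only inside a tuple of a relation assertion in $\mathcal{A}$, $w$ will appear in $\sigma(\mathcal{A})$ and hence $w^{\mathcal{I}}$ is defined.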
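The un-reification step can be sketched on finite data as follows. This is a hedged illustration rather than the construction of the proof: the name `unreify`, the `arities` table, and the dictionary encoding of the functional relations are assumptions. The inner function `ur` reads off the components of a reified element, and each relation is rebuilt as the set of tuples obtained this way.

```python
def unreify(concepts, roles, arities):
    """Map each reified relation back to a set of ordinary tuples.

    `concepts[name]` is the set of reified elements of relation `name`,
    `roles[i]` is a dict encoding the functional relation f_i, and
    `arities[name]` is the arity of `name` (all hypothetical encodings).
    """

    def ur(t, n):
        # Follow f_1, ..., f_n from the reified element to its components.
        return tuple(roles[i][t] for i in range(1, n + 1))

    return {
        name: {ur(t, arities[name]) for t in elements}
        for name, elements in concepts.items()
    }


# t0 and t1 are distinct elements reifying the same pair (a, b).
example_roles = {1: {"t0": "a", "t1": "a"}, 2: {"t0": "b", "t1": "b"}}
relation = unreify({"R": {"t0", "t1"}}, example_roles, {"R": 2})
```

Here the two reified elements collapse into the single pair `("a", "b")`, so a relation with two reified members ends up with one tuple. This is why the proof needs a tuple-admissible model before un-reifying: otherwise counting expressions could change their truth value.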
Since $\mathcal{I}$ satisfies $\sigma(\mathcal{K})$, we have that $\mathcal{I}'$ is indeed a well-defined interpretation. The following can easily be shown:

CLAIM: For every DLR-concept $C$ and DLR-relation $R$,
$d \in \sigma(C)^{\mathcal{I}}$ implies $d \in C^{\mathcal{I}'}$
$t \in \sigma(R)^{\mathcal{I}}$ implies $ur(t) \in R^{\mathcal{I}'}$

PROOF OF THE CLAIM. The claim is obvious for atomic concepts and relations by the definition of $\mathcal{I}'$. By induction it follows easily also for complex concepts and relations. We need the fact that $\mathcal{I}$ is tuple-admissible to ensure that the claim holds for concepts and relations involving counting expressions.
From this it follows that $\mathcal{I}'$ satisfies $\mathcal{S}$ and also $\mathcal{A}$; hence we have shown that $\mathcal{K}$ is consistent.
We now have the machinery to transform a query containment problem into one or more SHIQ schema and ABox satisfiability problems. In the next section we will present a decision procedure that will enable us to solve such problems.

Deciding Satisfiability of SHIQ Knowledge Bases
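To test satisfiability of a knowledge base $\mathcal{K} = (\mathcal{S}, \mathcal{A})$, we first internalise the schema $\mathcal{S}$ into the ABox $\mathcal{A}$, i.e., we add, for each individual $w$ that occurs in $\mathcal{A}$, an axiom $w : C_{\mathcal{S}}$, where $C_{\mathcal{S}} := \sqcap_{(C \sqsubseteq D) \in \mathcal{S}} (\neg C \sqcup D) \sqcap \forall U. \sqcap_{(C \sqsubseteq D) \in \mathcal{S}} (\neg C \sqcup D)$ and $U$ is a new transitive role with $R \sqsubseteq U$ for all roles $R$ occurring in $\mathcal{K}$. Since $U$ functions as a universal role, the ABox resulting from this internalisation is satisfiable iff $\mathcal{K}$ is satisfiable. Thus it only remains to decide satisfiability of SHIQ-ABoxes.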
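The internalisation step can be sketched as a purely syntactic transformation. The code below is an illustration under stated assumptions: concepts are plain strings, inclusion axioms are pairs `(C, D)` read as "C is included in D", each axiom is encoded in the usual way as the concept `not C or D`, and the function name `internalise` and the textual concept syntax are hypothetical. It returns one ABox assertion per individual together with the role inclusions that make `U` behave as a universal role; declaring `U` transitive is left to the caller.

```python
def internalise(schema, individuals, roles, universal_role="U"):
    """Turn schema axioms into ABox assertions plus role inclusions.

    `schema` is a list of (C, D) pairs for inclusion axioms, `individuals`
    the ABox individuals, and `roles` the role names occurring in the
    knowledge base (all hypothetical encodings using plain strings).
    """
    implications = [f"(not {c} or {d})" for c, d in schema]
    conjunction = " and ".join(implications) if implications else "Top"
    # The schema concept must hold at the individual itself and at
    # everything reachable from it via the universal role.
    c_s = f"({conjunction}) and all {universal_role}.({conjunction})"
    assertions = [(w, c_s) for w in individuals]
    role_inclusions = [(r, universal_role) for r in roles]
    return assertions, role_inclusions


assertions, role_inclusions = internalise([("A", "B")], ["w"], ["f1"])
```

Attaching the concept to every ABox individual, rather than to a single one, matters because individuals in different connected components of the ABox are not reachable from each other through `U`.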
Satisfiability of SHIQ-ABoxes can be decided by a tableaux algorithm that tries to construct a model for the input ABox $\mathcal{A}$ by breaking down concepts occurring in $\mathcal{A}$ into sub-concepts, possibly introducing new individual variables, and thus making explicit the constraints imposed on individuals in models of $\mathcal{A}$. To this end, it works on a completion forest (i.e., a collection of trees whose root nodes are possibly connected to each other), some of whose nodes correspond to individuals in a model. The forest's edges denote role-successorships, and each node is labelled with the concepts it must be an instance of. This algorithm is similar to the one that decides satisfiability of SHIQ-concepts presented in Horrocks et al. [15]. Due to lack of space, we can neither describe the algorithm in detail nor prove its soundness and completeness, and refer the reader to [14], pages 38-49. Instead, we will simply point out the differences between the concept- and the ABox-satisfiability algorithms.
Firstly, instead of working on a completion tree, it works on a completion forest, that is, a collection of completion trees whose nodes correspond to individuals of a model of the input ABox and whose root nodes correspond to those individuals that occur explicitly in the ABox. Secondly, the rules of the algorithm had to be modified to correctly handle completion forests. This mainly involves the rule that identifies some of the neighbours of a node $x$ whenever it has $n$ neighbour nodes with respect to a role $R$, and we learn that, due to an at-most number restriction, $x$ may have at most $n - 1$ of these "$R$-successors". Here, we must take special care when root nodes are involved in this identification. Thirdly, the blocking condition which guarantees termination had to be modified in order to deal properly with root nodes. Basically, this means that root nodes can never be blocked.

Discussion
In this paper we have shown how the problem of query containment under constraints can be decided using a KB (schema plus ABox) satisfiability tester for the SHIQ description logic, and we have indicated how a SHIQ schema satisfiability testing algorithm can be extended to deal with an ABox. We have only talked about conjunctive queries, but extending the procedure to deal with disjunctions of conjunctive queries should be straightforward. Although there is some loss of expressive power with respect to the framework presented in [4], this seems to be acceptable when modelling classical relational information systems, where regular expressions are seldom used.
Given that the FaCT implementation of the SHIQ schema satisfiability algorithm has been shown to work well with realistic problems, and that the number of individuals generated by query containment problems will be relatively small, there is good reason to believe that a combination of the ABox encoding and the extended algorithm will lead to a practical decision procedure for query containment problems. Work is underway to test this hypothesis by extending the FaCT system to deal with SHIQ ABoxes.

Definition 3.3.3
Let $q$ be a disjunctive query. The canonical disjunctive ABox for $q$ is defined by conjunct in the query:

Figure 1: Reification of DLR concepts and relations

CLAIM 1 :
Let Ø Ú Ø Û be two distinct ABox individuals.There is no conflict between Ø Á Ú and Ø Á Û .PROOF OF CLAIM 1: If Ø Á Ú Ø Á Û then there cannot be a conflict because no element conflicts with itself.Assume Ø Á Ú Ø Á Û but, for each ½ Ò, Á ´ØÁ Ú µ Á ´ØÁ Û µ (a conflict).Since Á and cannot conflict in Á.-If ¾ Conf and ¾ Conf, then we have that Á ½ ´ µ lies in the -th disjoint copy for , while Á ½ ´ µ lies in the -th disjoint copy.Thus, we cannot have a conflict between and .

Theorem 3.1.2 Given a schema Ë and two queries
Õ ½ and Õ ¾ , Ë Õ ½ Ú Õ ¾ iff Ë Õ½ Õ¾ .PROOF: For the if direction, assume Ë Õ ½ Ú Õ ¾ .Then there exists a model Á of Ë Let Ë be a schema, a canonical ABox and the completed version of .If Á is an interpretation such that Á Ë , then there exists an interpretation in contradiction of the assumption.Having collapsed , and (non-deterministically) replaced the É Û , we finally have a problem that we can decide using KB satisfiability.If Ë is a schema, is a completed canonical ABox and is a concept composed only of relations and concepts occurring in Ë or , then Ë Ûiff Û is an individual in and Ë ´ Û µ is not satisfiable, or Û is not an individual in and ´Ë Ú µ is not satisfiable.
For a DLR KB $\mathcal{K} = (\mathcal{S}, \mathcal{A})$ and a DLR ABox $\mathcal{A}'$, the problem whether $\mathcal{A}$ is included in $\mathcal{A}'$ w.r.t. $\mathcal{S}$ can be reduced to (possibly several) DLR ABox satisfiability problems.

2. Proposition 3.3.6
Let Ë be a schema, a completed canonical disjunctive ABox and ¼ a disjunctive ABox.Then Ë , also apply in the disjunctive case, and can be used to transform each conjunctive ABox in ¼ so that it contains only such axioms.In the following Lemma (Lemma 3.3.7),which provides the reduction to ABox satisfiability, we use the notation Û¤ to describe the axiom which forces the interpretation of the individual Û not to be in the extension of .If Û is a non-distinguished individual, then it is the schema axiom Ú ; otherwise it is the ABox axiom Û . is satisfiable.Let Á be an interpretation satisfying Ã ¼ , and Á ¼ the restriction of this interpretation to exclude the non-distinguished individuals in ½ Á ¼ satisfies Ë .Therefore there is an extension Á ¼¼ of Á ¼ satisfying ¼¼ .By construction of Ã ¼ , there must be an assertion Û ¾ such that if ¼ iff there is a disjunctive ABox ¼¼ containing only axioms of the form Û , such that Ë ¼¼ PROOF: (SKETCHED) The same considerations set out in Section 3.2, which enable us to "collapse" connected components into single axioms of the form Û Ñ .Obviously Á The proof of the converse direction of Lemma 3.4.3 is more involved.The problem arises from the fact that a model Á of ´Ãµ may not be tuple-admissible, i.e., there may be two distinct elements Ø Ø ¼ ¾ ¡ Á that are reifications of the same tuple The next lemma shows that any consistent ËÀÁÉ knowledge base always has a tuple-admissible model.Let Ã ´Ë µ be ÄÊ-KB and ´Ãµ ´ ´Ëµ ´Ãµ ´ µµ its reified ËÀÁÉ-counterpart.If ´Ãµ is consistent, then there exists a tuple-admissible model Á for ´Ãµ, i.e., a model where, for every ¾ Ò Ò max and Ø Ø ¼ ¾ Á for ½ Ò.Ò it holds that as the result of simultaneously exchanging, for each ¾ Conf and each Conf with ¾ , ½ ´ µ with ½ ´ Ò µ.¯A conflict in Á ¼ can only involve two elements in the same disjoint copy.Let denote the conflicting elements which reside in the -th copy.W.o.l.g.,