Terminological Cycles in a Description Logic with Existential Restrictions*

Cyclic definitions in description logics have until now been investigated only for description logics allowing for value restrictions. Even for the most basic language FL₀, which allows for conjunction and value restrictions only, deciding subsumption in the presence of terminological cycles is a PSPACE-complete problem. This report investigates subsumption in the presence of terminological cycles for the language EL, which allows for conjunction and existential restrictions. In contrast to the results for FL₀, subsumption in EL remains polynomial, independent of whether we use least fixpoint semantics, greatest fixpoint semantics, or descriptive semantics. These results are shown via a characterization of subsumption through the existence of certain simulation relations between nodes of the description graph associated with a given cyclic terminology.

* Partially supported by the DFG under grant BA 1122/4-3.
[...] of technology rather than need. In fact, the Galen medical knowledge base contains many cyclic dependencies [Rector and Horrocks, 1997]. Also, even in the case of acyclic terminologies, our polynomial subsumption algorithm improves on the usual approach that first unfolds the TBox (a potentially exponential step) and then applies the polynomial subsumption algorithm for EL-concept descriptions [Baader et al., 1999].
The first thorough investigation of cyclic terminologies in description logics (DL) is due to Nebel [1991], who introduced three different semantics for such terminologies: least fixpoint (lfp) semantics, which considers only the models that interpret the defined concepts as small as possible; greatest fixpoint (gfp) semantics, which considers only the models that interpret the defined concepts as large as possible; and descriptive semantics, which considers all models.
In [Baader, 1990; 1996], subsumption w.r.t. cyclic terminologies in the small DL FL₀, which allows for conjunction and value restrictions only, was characterized with the help of finite automata. This characterization provided PSPACE decision procedures for subsumption in FL₀ with cyclic terminologies for the three types of semantics introduced by Nebel. In addition, it was shown that subsumption is PSPACE-hard. The results for cyclic FL₀-terminologies were extended by Küsters [1998] [...]. The fact that the DL ALC (which extends FL₀ by full negation) is a syntactic variant of the multi-modal logic K opens a way for treating cyclic terminologies and more general recursive definitions in more expressive languages like ALC and extensions thereof by a reduction to the modal μ-calculus [Schild, 1994; De Giacomo and Lenzerini, 1994]. In this setting, one can use a mix of the three types of semantics introduced by Nebel. However, the subsumption problem is then EXPTIME-complete.
In spite of these very general results for cyclic definitions in expressive languages, there are still good reasons to look at cyclic terminologies in less expressive (in particular sub-Boolean) description logics. One reason is, of course, that one can hope for a lower complexity of the subsumption problem. For DLs with value restrictions, this hope is not fulfilled, though. Even in the inexpressive DL FL₀, subsumption becomes PSPACE-complete if one allows for cyclic definitions. This is still better than the EXPTIME-completeness that one has in ALC with cyclic definitions, but from the practical point of view it still means that the subsumption algorithm may need exponential time.
In contrast, the subsumption problem in EL can be decided in polynomial time w.r.t. the three types of semantics introduced by Nebel. The main tool used to show these results is a characterization of subsumption through the existence of so-called simulation relations.
In the next section we will introduce the DL EL as well as cyclic terminologies and the three types of semantics for these terminologies. Then we will show in Section 3 how such terminologies can be translated into description graphs. In this section, we will also define the notion of a simulation between nodes of a description graph, and mention some useful properties of simulations. The next three sections are then devoted to the characterization of subsumption in EL w.r.t. gfp, lfp, and descriptive semantics, respectively.
[...] The semantics of EL-TBoxes we have just defined is called descriptive semantics by Nebel [1991].

For some applications, it is more appropriate to interpret cyclic concept definitions with the help of an appropriate fixpoint semantics. Before defining least and greatest fixpoint semantics formally, let us illustrate their effect on an example.
Example 1 Assume that our interpretations are graphs where we have nodes (elements of the concept name Node) and edges (represented by the role edge), and we want to define the concept Inode of all nodes lying on an infinite (possibly cyclic) path of the graph. The following is a possible definition of Inode:
Inode ≡ Node ⊓ ∃edge.Inode
Now consider the following interpretation of the primitive concepts and roles: [...]. All these models are admissible w.r.t. descriptive semantics, whereas the first is the gfp-model and the last is the lfp-model of the TBox. Obviously, only the gfp-model captures the intuition underlying the definition (namely, nodes lying on an infinite path) correctly.
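The effect of the two fixpoint semantics in Example 1 can be replayed on a small finite graph. The following Python sketch is only an illustration: the names Node, edge, and Inode, the definition Inode ≡ Node ⊓ ∃edge.Inode, and the five-element graph are assumptions made for this example. On a finite domain, the gfp-model is reached by iterating the definition downward from the set of all elements, and the lfp-model by iterating upward from the empty set.

```python
def step(candidate, node, edge):
    """Apply the (assumed) definition Inode = Node AND EXISTS edge.Inode once.

    `candidate` is the current interpretation of Inode.
    """
    return {x for x in node if any((x, y) in edge for y in candidate)}


def iterate_to_fixpoint(start, node, edge):
    """Repeat `step` until the interpretation of Inode no longer changes."""
    current = set(start)
    while True:
        following = step(current, node, edge)
        if following == current:
            return current
        current = following


# Hypothetical primitive interpretation: a -> b -> c is a dead-end chain,
# d has a self-loop, and e has an edge into d.
node = {"a", "b", "c", "d", "e"}
edge = {("a", "b"), ("b", "c"), ("d", "d"), ("e", "d")}

gfp_model = iterate_to_fixpoint(node, node, edge)   # start from the largest candidate
lfp_model = iterate_to_fixpoint(set(), node, edge)  # start from the smallest candidate

print(sorted(gfp_model))  # nodes lying on an infinite path
print(sorted(lfp_model))  # empty: nothing is forced into Inode
```

In this small graph the gfp-model keeps exactly the nodes d and e, which lie on an infinite path, while the lfp-model interprets Inode as the empty set.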
It should be noted, however, that in other cases descriptive semantics appears to be more appropriate. For example, consider the definitions [...]. With respect to gfp-semantics, the two defined concepts must always be interpreted as the same set, whereas this is not the case for descriptive semantics.
Before we can define lfp- and gfp-semantics formally, we must introduce some notation. Let T be an EL-TBox and J a primitive interpretation [...]. Interpretations based on J can be compared by the following ordering ⪯_J, which realizes a pairwise inclusion test between the respective interpretations of the defined concepts: I₁ ⪯_J I₂ iff A^{I₁} ⊆ A^{I₂} for all defined concepts A. Using Tarski's fixpoint theorem [Tarski, 1955] for complete lattices, it is not hard to show [Nebel, 1991] that, for a given primitive interpretation J, there is always a greatest and a least (w.r.t. ⪯_J) model of T based on J. We call these models respectively the greatest fixpoint model (gfp-model) and the least fixpoint model (lfp-model) of T. Greatest (least) fixpoint semantics considers only gfp-models (lfp-models) as admissible models.
We will show in the following that all three subsumption problems are decidable in polynomial time. To do this, we represent EL-TBoxes as graphs.

Description graphs and simulations
EL-TBoxes as well as primitive interpretations can be represented as description graphs. Before we can translate EL-TBoxes into description graphs, we must normalize them. In the following, let [...] We say that the [...] Since there is a polynomial translation of general TBoxes into normalized ones [Baader, 2002], one can restrict the attention to normalized TBoxes. Thus, we will assume that all TBoxes are normalized. Normalized EL-TBoxes can be viewed as graphs whose nodes are the defined concepts, which are labeled by sets of primitive concepts, and whose edges are given by the existential restrictions. For the rest of this section, we fix a normalized EL-TBox [...]
[...] are finite, then this greatest simulation can be computed in polynomial time [Henzinger et al., 1995]. The following proposition is an easy consequence of this fact (see [Baader, 2002]). [...]
This proposition (whose proof can be found in [Baader, 2002]) can now be used to prove the following characterization of subsumption [...]
[...] i.e., CV consists of the nodes in the description graph of the TBox that can reach a cyclic path in this graph. The following proposition is an easy consequence of the definition of lfp-semantics and of CV (see [Baader, 2002]). [...] In Example 9, all the defined concepts belong to CV, and thus they are all unsatisfiable w.r.t. lfp-semantics.
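As a rough illustration of the set CV, the following Python sketch computes the nodes of a finite graph that can reach a cyclic path, by repeatedly discarding nodes that have no remaining successor. The representation of edges as (source, role, target) triples, the function name, and the example TBox in the comments are assumptions made for this sketch, not notation taken from [Baader, 2002].

```python
from collections import deque


def nodes_reaching_cycle(nodes, edges):
    """Return the nodes from which some cyclic path is reachable.

    In a finite graph these are exactly the nodes that start an infinite
    path, so we peel off nodes whose successors have all been discarded.
    """
    remaining_successors = {v: 0 for v in nodes}
    predecessors = {v: [] for v in nodes}
    for source, _role, target in edges:
        remaining_successors[source] += 1
        predecessors[target].append(source)

    # Nodes without any outgoing edge cannot reach a cycle.
    queue = deque(v for v in nodes if remaining_successors[v] == 0)
    discarded = set()
    while queue:
        v = queue.popleft()
        discarded.add(v)
        for p in predecessors[v]:
            remaining_successors[p] -= 1
            if remaining_successors[p] == 0:
                queue.append(p)
    return set(nodes) - discarded


# Hypothetical normalized TBox:
#   A = EXISTS r.B,  B = P AND EXISTS r.A,  C = P AND EXISTS s.A,  D = P
nodes = {"A", "B", "C", "D"}
edges = {("A", "r", "B"), ("B", "r", "A"), ("C", "s", "A")}

cv = nodes_reaching_cycle(nodes, edges)
print(sorted(cv))
```

Here A and B lie on a cycle and C reaches it, so all three would be unsatisfiable w.r.t. lfp-semantics, whereas D is unaffected. Each edge is processed a constant number of times, so the sketch runs in time linear in the size of the graph.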
Since all the defined concepts in CV are unsatisfiable, their definitions can be removed from the TBox without changing the meaning of the concepts not belonging to CV. (Their definitions cannot refer to an element of CV.) This leaves us with an acyclic terminology, on which gfp- and lfp-semantics coincide [Nebel, 1991]. [...]
We will see below that the reason for [...] and [...] not being equivalent is that in the infinite path in the description graph starting with [...], one reaches [...] with an odd number of edges, whereas [...] is reached with an even number; for the path starting with [...], it is just the opposite. In contrast, the infinite paths starting respectively with [...] and [...] "synchronize" after a finite number of steps. To formalize this intuition, we must introduce some notation.
We are now ready to state our characterization of subsumption w.r.t. descriptive semantics (see [Baader, 2002] for the proof). [...]
It remains to be shown that property (2) of the theorem can be decided in polynomial time. To this purpose, we construct a simulation such that (2) of Theorem 15 is equivalent to the membership of a certain pair of nodes in this simulation (see [Baader, 2002]). [...] a simulation satisfying [...] to the strategy problem for a certain two-player game with a positional winning condition. The existence of a winning strategy is in this case a polynomial time problem [Grädel, 2002].

Future and related work
We have seen that subsumption in EL with cyclic terminologies is polynomial for the three types of semantics introduced by Nebel [1991]. In some applications, it would be interesting to have a mix of all three semantics, and it remains to be seen whether the polynomiality results also hold in such a setting (which would correspond to a restriction of the modal μ-calculus [Kozen, 1983]).
Sub-Boolean DLs (like EL) have attracted renewed attention in the context of so-called non-standard inferences [Küsters, 2001] like computing the least common subsumer and the most specific concept. In [Baader, 2003] we have shown that the characterization of subsumption in EL w.r.t. gfp-semantics also yields an approach for computing the least common subsumer in EL w.r.t. gfp-semantics. In addition, we have extended the characterization of subsumption in EL w.r.t. gfp-semantics to the instance problem, and have shown how this can be used to compute the most specific concept.
Simulations and bisimulations play an important role in modal logics (and thus also in description logics). However, until now they have mostly been considered for modal logics that are closed under all the Boolean operators, and they have usually not been employed for reasoning in the logic. A notable exception is [Kurtonina and de Rijke, 1997], where bisimulation characterizations are given for sub-Boolean modal logics and DLs. However, these characterizations are used to give a formal account of the expressive power of these logics; they are not employed for reasoning purposes.
In [Baader et al., 1999], subsumption between EL-concept descriptions was characterized through the existence of homomorphisms between the description trees (basically the syntax trees) associated with the descriptions. If one looks at the polynomial time algorithm for deciding the existence of such a homomorphism, then it is easy to see that it actually computes the greatest simulation between the trees. For trees, the existence of a homomorphism mapping the root to the root coincides with the existence of a simulation containing the tuple of the roots. For graphs, a similar connection does not hold. In fact, for graphs the existence of a homomorphism is an NP-complete problem. For simple conceptual graphs (or equivalently, conjunctive queries) the implication (containment) problem can be characterized via the existence of certain homomorphisms between graphs [Chein et al., 1998], and is thus NP-complete.
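To make the connection between simulations and the tree case concrete, the following Python sketch computes the greatest simulation between two finite labeled graphs by naive refinement. It assumes the usual definition of a simulation: a related pair (v1, v2) requires the label set of v1 to be contained in that of v2, and every r-edge leaving v1 to be matched by an r-edge leaving v2 whose targets are again related. The graph encoding, the function name, and the two example description trees are assumptions of this sketch. The loop is a simple polynomial procedure, not the optimized algorithm of [Henzinger et al., 1995].

```python
def greatest_simulation(graph1, graph2):
    """Greatest simulation from graph1 to graph2.

    Each graph is a pair (labels, edges): `labels` maps every node to a set
    of primitive concepts, `edges` is a set of (source, role, target) triples.
    """
    labels1, edges1 = graph1
    labels2, edges2 = graph2

    # Start with all pairs that satisfy the label condition.
    sim = {(v1, v2) for v1 in labels1 for v2 in labels2
           if labels1[v1] <= labels2[v2]}

    changed = True
    while changed:
        changed = False
        for v1, v2 in list(sim):
            for source, role, target1 in edges1:
                if source != v1:
                    continue
                matched = any(s2 == v2 and r2 == role and (target1, t2) in sim
                              for s2, r2, t2 in edges2)
                if not matched:
                    # The edge of v1 cannot be mimicked from v2: drop the pair.
                    sim.discard((v1, v2))
                    changed = True
                    break
    return sim


# Hypothetical description trees for C = P AND EXISTS r.P and
# D = P AND EXISTS r.(P AND Q), with roots c0 and d0.
tree_c = ({"c0": {"P"}, "c1": {"P"}}, {("c0", "r", "c1")})
tree_d = ({"d0": {"P"}, "d1": {"P", "Q"}}, {("d0", "r", "d1")})

c_to_d = greatest_simulation(tree_c, tree_d)
d_to_c = greatest_simulation(tree_d, tree_c)
print(("c0", "d0") in c_to_d)  # roots related: D is subsumed by C
print(("d0", "c0") in d_to_c)  # roots not related: C is not subsumed by D
```

Since pairs are only ever removed and each pass over the candidate set is polynomial, the loop terminates after polynomially many steps. The same procedure works unchanged on cyclic graphs, which is what makes simulations a natural replacement for homomorphisms in the cyclic case.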