Model-based Most Specific Concepts in Description Logics with Value Restrictions

Non-standard inferences are particularly useful in the bottom-up construction of ontologies in description logics. One of the more common non-standard reasoning tasks is computing the most specific concept (msc) for an ABox-individual. In this paper we present a similar non-standard reasoning task: most specific concepts for models (model-mscs). We show that, although they look similar to ABox-mscs, their computational behaviour can be different. We present constructions for model-mscs in FL0 and FLE with cyclic TBoxes and for ALC with acyclic TBoxes. Since subsumption in FLE with cyclic TBoxes has not been examined previously, we present a characterization of subsumption and give a construction for the least common subsumer in this setting.


Introduction
Description Logics (DL) are a logic-based knowledge representation formalism, which can be used to represent terminological knowledge (in so-called TBoxes) and knowledge about individuals (in so-called ABoxes) in a logically well-founded way [6]. In recent years, various non-standard inference services for Description Logics have been developed. Among them are the least common subsumer (lcs) and the most specific concept (msc). The major application of non-standard reasoning services is supporting the development of ontologies, in particular when using the bottom-up approach [9]. The least common subsumer, for example, is used to automatically generalize two given concept descriptions into a third description that subsumes them both. The most specific concept, on the other hand, can be used to obtain a generalized description from an individual in an ABox.
Unfortunately, in most common logics most specific concepts need not exist, at least if one only allows for acyclic TBoxes. For the case of cyclic TBoxes research has shown that they do exist in smaller logics such as EL [1] and ALN [2], and their computational properties have been examined. Their main application lies in the construction of ontologies using the bottom-up approach.
Here we focus on a similar, but still different non-standard reasoning service: Instead of constructing the most specific concept describing an individual in an ABox we look at the most specific concept describing an element of the domain of the model (model-msc).
In [8] an approach for extracting knowledge from a model has been proposed. Given a model one tries to find axioms in the form of GCIs that hold in the model. The idea is to find a minimal set of GCIs such that all GCIs holding in that model follow logically. The approach presented in [8] for the description logic EL makes heavy use of most specific concepts for models. The reason why they are useful is that if a GCI A ⊑ B holds in a model then it follows from A ⊑ msc(A). Here msc(A) is the model-based most specific concept of A taken in the respective model. Thus if the model-based most specific concept exists, all GCIs with a fixed left-hand side can be represented by a single GCI.
Another application of model-based most specific concepts is the construction of ontologies. When defining new concept descriptions using most specific concepts for ABoxes it is necessary that an ABox with a certain number of individuals is already present. ABoxes, like DLs in general, have an open-world semantics, which means that one does not expect them to be a complete representation of the knowledge about the domain of the knowledge base. At the beginning of the knowledge engineering process it may, however, be the case that no ABox is present, but that there is data in some closed-world representation. In the DL world closed-world data corresponds to models. Thus extending the notion of most specific concepts to models is a step towards the application of the bottom-up approach in this setting.
Although most specific concepts in models and most specific concepts in ABoxes seem to be closely related, they can behave quite differently for certain logics, in particular for logics allowing for value restrictions. This is pointed out in Section 2.2 where we illustrate that model-mscs do not always exist in FL0 without cyclic TBoxes while ABox-mscs do exist and are easy to compute. Further on we point out that they do exist if we allow for cyclic TBoxes. There may, however, be an exponential blow-up in size that cannot be avoided.
We then look at model-mscs in ALC ∪ *, which is ALC with reflexive-transitive closure and union of roles. We provide an approach to construct them, which proves that they exist. Due to the great expressivity of ALC they are probably overfitted, i.e. they are so specific to the model that they describe that they probably do not provide a useful generalization for the bottom-up approach. This is why we go back to the less expressive logic FLE, the simplest logic that provides for both existential restrictions and value restrictions. We prove the existence of model-mscs in the presence of cyclic FLE-TBoxes. Since FLE is not a very common logic, there has not been much research concerning non-standard inferences in FLE. Previous work has been done by Baader et al. [3]. They introduce a characterization of subsumption in FLE with acyclic TBoxes that uses so-called FLE-description graphs. Using this characterization they obtain a construction for the least common subsumer. This has been extended to FLE+ by Brandt et al. [10].
Little previous work has been done on cyclic concept descriptions in logics with both existential and value restrictions. Quite a while ago the K-Rep system, which allows for cyclic concept descriptions and offers a concept constructor that combines existential and value restrictions, was presented by Mays et al. [13]. K-Rep does not use DL-semantics, but another semantics based on so-called concept algebras. In a later paper they show that their semantics are equivalent to gfp-semantics in a DL-sense [11]. Baader et al. [2] have examined non-standard inferences in ALN which, however, does not provide for full existential restrictions.

Table 1: Syntax and semantics of concept descriptions
In this paper we extend the characterization from [3] to the more general case of FLE with cyclic TBoxes and gfp-semantics. We provide a characterization of subsumption and a construction for the least common subsumer.
1.1 Cyclic Terminologies with greatest fixpoint semantics in FL0, FLE and ALC
Concept descriptions in FL0 are defined using a set $N_C$ of concept names and a set $N_r$ of role names and certain concept constructors. In the case of FL0 these constructors are the top-concept ($\top$), conjunction ($C \sqcap D$) and value restrictions ($\forall r.C$). FLE uses these constructors and additionally provides for existential restrictions ($\exists r.C$). ALC provides for all the constructors in Table 1.1, i.e. conjunction, disjunction ($\sqcup$), negation ($\neg$), existential and value restrictions. The semantics of these concept descriptions are defined using interpretations $i = (\Delta^i, \cdot^i)$, where $\Delta^i$ is a non-empty set, the domain of the interpretation, and $\cdot^i$ is an interpretation function mapping every role name to some binary relation over $\Delta^i$ and every concept name to some subset of $\Delta^i$. The extension of $\cdot^i$ to concept descriptions is defined recursively according to Table 1.1.
A TBox is a finite set of statements of the form $A_k \equiv D_k$ where the $A_k$ are concept names and the $D_k$ are concept descriptions. We also require that the $A_k$ be distinct. Concept names that occur on the left-hand side of one of these statements are called defined concept names; all other concept names are called primitive concept names. Note that we do not require TBoxes to be acyclic, i.e. we allow a defined concept name to be used in the right-hand side of its definition explicitly or implicitly. It is not immediately clear how to define the semantics of cyclic TBoxes. Indeed, Nebel [14] has shown that there are several possibilities to define it. In this paper we use greatest fixpoint semantics, which are today the most common type of semantics for cyclic TBoxes in DL.
A primitive interpretation $i$ is a mapping that assigns a binary relation over $\Delta^i$ to every role name and a subset of $\Delta^i$ to every primitive concept name. An interpretation $j$ is based on the primitive interpretation $i$ if it coincides with $i$ on all role names and primitive concept names. Note that $j$ is uniquely defined by the tuple of interpretations of the defined concept names $(A_1^j, \ldots, A_n^j)$. We denote the set of all interpretations that are based on $i$ by $Int(i)$. The interpretations from $Int(i)$ can be compared by the following ordering:
\[ j_1 \preceq j_2 \iff A_k^{j_1} \subseteq A_k^{j_2} \text{ for all } 1 \le k \le n. \]
For every subset of $Int(i)$ both least upper bounds and greatest lower bounds exist and coincide with pointwise union and pointwise intersection, respectively. Hence $(Int(i), \preceq)$ is a complete lattice. The Knaster-Tarski Fixpoint Theorem states that in every complete lattice $(L, \le)$ every monotone function $f$ has a greatest fixpoint, which is
\[ \bigvee \{ x \in L \mid x \le f(x) \}. \qquad (1) \]
In the lattice $(Int(i), \preceq)$ we define a function $f$ as follows. Let $j \in Int(i)$ be an interpretation that maps the defined concept names $A_1, \ldots, A_n$ to $A_1^j, \ldots, A_n^j$, respectively. Assume that the TBox $T$ contains statements $A_1 \equiv D_1, \ldots, A_n \equiv D_n$. Then define $f(j)$ to be the interpretation that maps the defined concepts $A_1, \ldots, A_n$ to $D_1^j, \ldots, D_n^j$, respectively. For logics that do not allow for negation $f$ is monotone. By the Knaster-Tarski Fixpoint Theorem $f$ has a greatest fixpoint. This greatest fixpoint is the interpretation that maps every defined concept $A_k$ to
\[ \bigcup \{ A_k^j \mid j \in Int(i),\ j \preceq f(j) \}. \]
We call this fixpoint the gfp-model of $T$ corresponding to $i$. Note that both for cyclic and acyclic TBoxes there is exactly one model for a given primitive interpretation. Therefore we will use the terms model and primitive interpretation interchangeably. For simplicity we will denote both the primitive interpretation and the corresponding gfp-model by the letter $i$.
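On a finite domain the greatest fixpoint can be computed by starting at the top of the lattice (every defined concept name interpreted as the whole domain) and applying $f$ until nothing changes. The following is a minimal sketch of that idea, not the construction used in this paper; the tuple encoding of concept descriptions and the names `extension` and `gfp_model` are assumptions made for illustration, and negation is deliberately not supported so that $f$ stays monotone.

```python
def extension(concept, domain, roles, names):
    """Extension of a concept description, given extensions for all concept names.

    Hypothetical encoding: ("top",), ("name", A), ("and", C, D),
    ("all", r, C), ("some", r, C).
    """
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "name":
        return set(names[concept[1]])
    if kind == "and":
        return (extension(concept[1], domain, roles, names)
                & extension(concept[2], domain, roles, names))
    if kind in ("all", "some"):
        _, role, filler = concept
        ext = extension(filler, domain, roles, names)
        succ = lambda x: {y for (u, y) in roles.get(role, set()) if u == x}
        if kind == "all":
            return {x for x in domain if succ(x) <= ext}
        return {x for x in domain if succ(x) & ext}
    raise ValueError(f"unknown constructor: {kind}")


def gfp_model(tbox, domain, roles, primitive):
    """Iterate f downwards from the top element of (Int(i), <=) until it is stable."""
    current = {a: set(domain) for a in tbox}
    while True:
        names = {**primitive, **current}
        nxt = {a: extension(d, domain, roles, names) for a, d in tbox.items()}
        if nxt == current:
            return current
        current = nxt


# Cyclic TBox  A = P and (all r. A)  over a three-element primitive interpretation.
domain = {"a", "b", "c"}
roles = {"r": {("a", "a"), ("b", "c")}}
primitive = {"P": {"a", "b"}}
tbox = {"A": ("and", ("name", "P"), ("all", "r", ("name", "A")))}
print(gfp_model(tbox, domain, roles, primitive))
```

Because the chain of iterates is decreasing and the lattice is finite, the loop terminates, and every fixpoint lies below each iterate, so the result is the greatest one.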
Let $T$ be some TBox and let $A$ and $B$ be two defined concepts from $T$. We say that $A$ is subsumed by $B$ and write $A \sqsubseteq_T B$ iff $A^i \subseteq B^i$ for all gfp-models $i$ of $T$. Several ways to define a conservative extension of a TBox exist in the literature. In this work we use two different definitions. A conservative extension of a TBox $T$ is a TBox $T'$ such that $T \subseteq T'$, and if $A$ and $B$ are concept names used in $T$ then $A \sqsubseteq_{T'} B$ iff $A \sqsubseteq_T B$. So $T'$ introduces new concept definitions but leaves the subsumption relations among concepts occurring in $T$ invariant. A strictly conservative extension of a TBox $T$ is a TBox $T'$ such that
• $T'$ uses the same primitive concept names and role names as $T$, and
• if $A$ is a defined concept in $T$, $i$ a primitive interpretation of $N_p$ and $N_r$, and $i_T$ and $i_{T'}$ the corresponding models of $T$ and $T'$, then $A^{i_T} = A^{i_{T'}}$.
So $T'$ introduces new concept definitions but leaves the extensions of concept names occurring in $T$ invariant.
Definition 1 Let T be a TBox containing the defined concepts A, B and let T 0 be a conservative extension of T .Let T 0 contain a defined concept E. E is called the least common subsumer of A and B iff the following conditions hold.
An ABox A is a finite set of assertions about individuals, i. e. statements of the form A(a) and r(a, b), where A is a concept name, r a role name and a and b are individual names from a finite set of individual names N i .We call i a gfp-model of an ABox A and a TBox T if i is a gfp-model of T and satisfies a i ∈ A i for all statements A(a) in A, and (a i , b i ) ∈ r i for all statements r(a, b) in A.
Definition 2 (most specific concept) Let $T$ be a TBox and $A$ an ABox. Let $a$ be some individual name from $A$. Let $T_0$ be a conservative extension of $T$, and $E$ a defined concept in $T_0$. Then $E$ is called the most specific concept of $a$ iff the following conditions hold.
The above definition is the standard way of defining most specific concepts for ABox-individuals. In this paper we also look at a different type of most specific concepts: most specific concepts for models.
Definition 3 (model-based most specific concepts) Let $T$ be a TBox and $i$ a model of $T$. Let $x \in \Delta^i$ be some object from the domain of $i$ and $E$ a defined concept in $T$. Then $E$ is called the model-based most specific concept (model-msc) of $x$ iff the following conditions hold.

Figure 1: A simple cyclic model (node a, concept label P, role edge r)

Let $T_1$ and $T_2$ be two TBoxes in some logic that uses gfp-semantics. Assume that $T_1$ and $T_2$ use the same set of role names $N_r$ and the same set of primitive concept names $N_p$. To simplify notation we denote by $T_1 \cup T_2$ the TBox that is obtained as follows. First make the sets of defined concept names in $T_1$ and $T_2$ disjoint, for example by renaming. Let $T_1 \cup T_2$ be the union of these TBoxes with disjoint sets of defined concept names. It is easy to see that $T_1 \cup T_2$ is a strictly conservative extension of both $T_1$ and $T_2$.
We can say that model-based most specific concepts are unique up to equivalence in the following sense.Let T 1 and T 2 be TBoxes, E be a defined concept in T 1 and a most specific concept for some x ∈ ∆ i , and F be a defined concept in T 2 and a most specific concept for x.Then E ≡ T 1 ∪T 2 F .
To avoid confusion we use the expression model-msc when speaking of most specific concepts for models and ABox-msc when speaking of most specific concepts for ABox-individuals.
2 Most specific concepts in FL0

Most specific concepts in FL0 with acyclic TBoxes
Even though we will concentrate on cyclic terminologies for the rest of the paper, this short section takes a look at the situation in FL0 with acyclic TBoxes. It serves to illustrate that most specific concepts for models and most specific concepts for ABoxes are not as closely related as it may seem from the definitions.
Look at the very simple example of a set of role names $N_r = \{r\}$, a set of primitive concept names $N_p = \{P\}$ and an empty TBox $T$. Let $i$ be the model defined by the following primitive interpretation. . . . This is a sequence of concept descriptions of increasing role depth such that $a \in E_k^i$ for every natural number $k$. Thus, there is no model-msc for $a$ in $i$, since every concept description in an unfoldable TBox can only have finite role depth. Basically, one would have to express that all $r^*$-successors of $a$ are in $P^i$, where $r^*$ is the reflexive-transitive closure of $r$. However, this cannot be done using just the expressivity of FL0 without allowing for terminological cycles.
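The argument can be checked mechanically on small models. The sketch below assumes that the model in question is the one-element loop (a single element a with a in the extension of P and an r-edge from a to itself) and takes $E_k = \forall r. \cdots \forall r.P$ with $k$ nested value restrictions as an illustrative sequence; both are assumptions, as are the helper names `forall_r`, `e_k` and `chain`. It also builds a finite chain on which $E_k$ fails while all shallower $E_j$ hold, which illustrates why no concept of bounded role depth can be most specific.

```python
def forall_r(ext, domain, r):
    """Extension of (all r. C), given the extension ext of C."""
    return {x for x in domain if all(y in ext for (u, y) in r if u == x)}


def e_k(k, domain, r, p):
    """Extension of E_k = (all r.)^k P, the illustrative sequence assumed here."""
    ext = set(p)
    for _ in range(k):
        ext = forall_r(ext, domain, r)
    return ext


# Assumed one-element cyclic model: P(a) and an r-loop on a.
loop = ({"a"}, {("a", "a")}, {"a"})


def chain(k):
    """Chain 0 -> 1 -> ... -> k in which only the last element is outside P."""
    domain = set(range(k + 1))
    r = {(n, n + 1) for n in range(k)}
    p = set(range(k))
    return domain, r, p


# a satisfies every E_k in the loop model ...
print(all("a" in e_k(k, *loop) for k in range(50)))
# ... while in chain(3) the element 0 satisfies E_0, E_1, E_2 but not E_3.
print([0 in e_k(j, *chain(3)) for j in range(4)])
```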
Things look completely different if we look at the graph in Figure 2.1 as representing an ABox. Let $A$ be the ABox that contains a single individual $a$ and the statements $P(a)$ and $r(a, a)$. Then $a$ is not an instance of $E \equiv \forall r.P$, since we have the open-world semantics of the ABox, in contrast to the closed-world assumption of the models. We simply do not know whether all $r$-successors of $a$ are instances of $P$ or not; we only have information about one particular $r$-successor of $a$. Thus $P$ is the most specific concept for $a$ in $A$. It is possible to compute ABox-mscs in FL0 for any ABox, simply by propagating $\forall r$-statements along $r$-edges in the ABox. For example, consider the most specific concept for the individual $a$ in the ABox $A_1$ together with the TBox $T_1$ shown in Figure 2.1. The term $\forall r_2.\forall r_1.Q$ is directly stated in the ABox, the term $P$ is obtained by propagating $\forall r_1.P$ along the $r_1$-edge starting from $b$, and $Q$ is obtained by propagating $\forall r_2.\forall r_1.Q$ along the $r_2$- and $r_1$-edges starting from $a$. Computing the ABox-msc in the presence of acyclic TBoxes in FL0 can even be done in polynomial time.

Most specific concepts for models in FL0
We have seen that in the case of acyclic FL0-TBoxes ABox-mscs always exist, while there are examples where model-mscs do not exist. To guarantee existence of model-mscs for finite models one possibility is to allow for cyclic TBoxes with gfp-semantics.
To prove that model-mscs always exist for FL0 with cyclic TBoxes, we make use of characterizations of gfp-semantics that have been found by Baader [4]. These characterizations are based on semi-automata with word transitions. A semi-automaton with word transitions is a triple $\mathcal{A} = (\Sigma, Q, E)$, where $\Sigma$ is a finite alphabet, $Q$ is a set of states and $E \subseteq Q \times \Sigma^* \times Q$ is a set of labeled edges. We associate with every FL0-TBox $T$ a semi-automaton $\mathcal{A}_T$. We first normalize the TBox $T$ such that every concept description in $T$ is of the form
\[ \forall W_1.A_1 \sqcap \ldots \sqcap \forall W_k.A_k, \]
where the $W_j$ are words over $N_r$ and the $A_j$ are concept names. Then $\mathcal{A}_T$ is defined to be a semi-automaton over the alphabet $N_r$ whose states are the concepts in $T$ and where every concept definition $A \equiv \forall W_1.A_1 \sqcap \ldots \sqcap \forall W_k.A_k$ gives rise to the edges $(A, W_j, A_j)$ for $1 \le j \le k$. By $L(A, B)$ we denote the set of all words labelling paths from $A$ to $B$ in $\mathcal{A}_T$. Now let $i$ be a finite gfp-model, i.e. $i$ is based on a primitive interpretation such that $\Delta^i$ is finite. We define an automaton $\mathcal{A}_i$ over the alphabet $N_r$ which has the elements of $\Delta^i$ as states and the edges $(a, r, b)$ for all $r \in N_r$ and $(a, b) \in r^i$. We denote by $L(a, b)$ the set of all words labelling paths from $a$ to $b$ in $\mathcal{A}_i$.
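The automaton of a finite model is simply the model read as a labelled graph, so membership of a word in $L(a, b)$ can be decided by following the word role by role. A minimal sketch, in which the dictionary encoding of role extensions and the function names `reached` and `in_language` are assumptions made for illustration:

```python
def reached(roles, start, word):
    """Set of elements reachable from `start` along a path labelled with `word`."""
    current = {start}
    for role in word:
        current = {y for (x, y) in roles.get(role, set()) if x in current}
    return current


def in_language(roles, a, b, word):
    """True iff `word` labels some path from a to b, i.e. word is in L(a, b)."""
    return b in reached(roles, a, word)


# Example model with two role names r and s.
roles = {"r": {("a", "b"), ("b", "a")}, "s": {("a", "a")}}
print(in_language(roles, "a", "b", ["s", "r"]))   # path a -s-> a -r-> b
print(in_language(roles, "a", "b", ["r", "r"]))   # r r leads back to a
```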
The following two propositions from [4] help proving the existence of model-based most specific concepts.
Proposition 1 Let $T$ be a TBox and let $\mathcal{A}_T$ be the corresponding semi-automaton. Let $i$ be a gfp-model of $T$ and let $A$ be a concept name occurring in $T$. For any $d \in \Delta^i$ we have $d \in A^i$ iff for all primitive concepts $P$, all words $W \in L(A, P)$ and all individuals $e \in \Delta^i$, $W \in L(d, e)$ implies $e \in P^i$.
Proposition 2 Let $T$ be a terminology and let $\mathcal{A}_T$ be the corresponding semi-automaton. Let $A$, $B$ be concept names occurring in $T$. Subsumption in $T$ can be reduced to inclusion of regular languages defined by $\mathcal{A}_T$. More precisely, $A \sqsubseteq_T B$ iff $L(B, P) \subseteq L(A, P)$ for all primitive concepts $P$.
Let $i$ be a fixed model. For every $x \in \Delta^i$ and every primitive concept $P$ define
\[ L(x, P) = \{ W \in N_r^* \mid \text{for all } e \in \Delta^i:\ W \in L(x, e) \text{ implies } e \in P^i \}. \]
Together with the previous propositions this definition allows us to characterize model-mscs as follows.
Lemma 1 Let $x \in \Delta^i$. If for some FL0-TBox $T$ and some defined concept $E$ in $T$ we have $L(E, P) = L(x, P)$ for all primitive concepts $P$, then $E$ is the model-msc for $x$.
Proof: From Proposition 1 it follows that $x \in E^i$ if and only if $L(E, P) \subseteq L(x, P)$ for all primitive concepts $P$. This must be the case since we even have $L(E, P) = L(x, P)$. Thus $x \in E^i$. We need to show that $E$ is the most specific concept description with this property. Assume that $F$ is another concept description with $x \in F^i$. Then by Proposition 1 we get $L(F, P) \subseteq L(x, P) = L(E, P)$ for all primitive concepts $P$. By Proposition 2 this implies $E \sqsubseteq F$. So $E$ is the least concept description with $x \in E^i$.
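From Lemma 1 we know that if for every $P \in N_p$ we can find an automaton that accepts $L(x, P)$ then a most specific concept for $x$ can be constructed. However, it is not immediately clear if such automata exist, which is why we need the following proposition.
Proposition 3 Let $i$ be a finite model, $x \in \Delta^i$ and $P$ a primitive concept. Then $L(x, P)$ is regular.
Proof: By the definition of $L(x, P)$ the equation
\[ L(x, P) = N_r^* \setminus \bigcup_{c \in \Delta^i \setminus P^i} L(x, c) \]
holds. Since the $L(x, c)$ are regular this implies that $L(x, P)$ is also regular.
Since the $L(x, P)$ are regular, it is possible to construct an automaton for them. This is exactly what we need in order to prove existence of model-mscs for FL0.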
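One way to see the regularity concretely is the subset construction on the automaton of the model. The sketch below assumes that $L(x, P)$ is read as the set of words $W$ such that every element reachable from $x$ via $W$ lies in $P^i$, which is the reading consistent with Proposition 1; the function names and the encoding of the model are assumptions made for illustration. A subset state is accepting exactly when it is contained in $P^i$, so the complement never has to be formed explicitly.

```python
def step(roles, states, role):
    """All role-successors of a set of elements."""
    return frozenset(y for (x, y) in roles.get(role, set()) if x in states)


def in_L_x_P(roles, p_ext, x, word):
    """True iff every element reached from x via `word` lies in p_ext."""
    current = frozenset({x})
    for role in word:
        current = step(roles, current, role)
    return current <= p_ext


def determinize(roles, role_names, p_ext, x):
    """Subset construction: a deterministic automaton accepting L(x, P)."""
    start = frozenset({x})
    states, delta, todo = {start}, {}, [start]
    while todo:
        s = todo.pop()
        for role in role_names:
            t = step(roles, s, role)
            delta[(s, role)] = t
            if t not in states:
                states.add(t)
                todo.append(t)
    accepting = {s for s in states if s <= p_ext}
    return start, states, delta, accepting


# Example: a -r-> b with b outside P.
roles = {"r": {("a", "b")}}
p_ext = {"a"}
start, states, delta, accepting = determinize(roles, ["r"], p_ext, "a")
print(len(states), len(accepting))
```

The number of subset states can be exponential in the size of the model, which is the blow-up discussed in the remainder of this section.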
Theorem 1 In FL0 model-mscs exist for every gfp-model $i$ and every $x \in \Delta^i$. Moreover, if $F$ is a model-msc for $x$ then $L(F, P) = L(x, P)$ for all primitive concepts $P$.
Proof: Since by Proposition 3 $L(x, P)$ is regular for all primitive concepts $P$, we can construct a semi-automaton $\mathcal{A}$ with the following properties:
• Among the states of $\mathcal{A}$ there is a state labelled $x$ and there are states labelled $P_i$ for every primitive concept $P_i$.
• If $x$ is chosen as initial state and $P_i$ is chosen as terminal state then $\mathcal{A}$ accepts $L(x, P_i)$.
If we translate this automaton back into a TBox T we get a cyclic concept description E A that satisfies the conditions of Lemma 1. From the lemma it follows that E A is the most specific concept of x.
Let T ′ be some other TBox such that T ′ contains a defined concept F that is a most specific concept of x.Then F ≡ T ∪T ′ E. From Proposition 2 we get that L(F, P ) = L(E A , P ) = L(x, P ) for all primitive concept names P .
Theorem 1 provides an automata-theoretic approach for constructing the model-based most specific concept for a given model. In one step of this approach the complement of a non-deterministic automaton is constructed. The standard way to do this is to first construct the corresponding deterministic automaton and then turn accepting states into non-accepting states and vice versa. However, there may be an exponential blow-up in size when constructing the deterministic automaton.
The question arises whether this blow-up can be avoided by using a different construction. The following example proves that this is not the case. Given a natural number $n$, consider a set of role names $N_r = \{ r_{(k,a)} \mid k \in \{1, \ldots, n\},\ a \in \{0, 1\} \}$ and a single primitive concept name $P$. Define a model $i$ as follows: $\{\ldots, w_n, y_1, \ldots, y_n\}$, $\ldots$ Informally, the model $i$ can be thought of as being obtained from $n$ models $i_k$ like the one depicted in Figure 2.2 by merging the $x_k$. In the automaton $\mathcal{A}_{i_k}$ the language $L(x_k, y_k)$ (and thus also $L(x, y_k)$ in $\mathcal{A}_i$) contains all words in which both $r_{(k,0)}$ and $r_{(k,1)}$ occur. By Proposition 3, $L(x, P)$ is the language of all words in $N_r^*$ that do not contain both $r_{(k,0)}$ and $r_{(k,1)}$ for any $k$.
One can show that any non-deterministic automaton accepting $L(x, P)$ must have at least $2^n$ states. Let $a_1, a_2, \ldots, a_n$ be an arbitrary sequence of 0s and 1s. There are $2^n$ words of the form $w = r_{(1,a_1)} r_{(2,a_2)} \ldots r_{(n,a_n)}$. Let $w$ and $w'$ be two distinct words of this form. Then both $ww$ and $w'w'$ are in $L(x, P)$, but $ww'$ and $w'w$ are not. Thus in any automaton $\mathcal{A}$ accepting $L(x, P)$ there must be accepting runs for both $ww$ and $w'w'$. Let $q_w$ and $q_{w'}$ be the states reached after $n$ steps in these accepting runs. Then $q_w$ and $q_{w'}$ must be distinct, for otherwise $ww'$ and $w'w$ would also be accepted by $\mathcal{A}$. Since there are $2^n$ different words of this form, $\mathcal{A}$ must contain at least $2^n$ mutually distinct states $q_w$. Together with Lemma 1 this proves that any concept description which is a model-msc for $x$ in $i$ must be of size exponential in $n$, while the size of $i$ is linear in $n$.
So we have shown that model-mscs for FL0 exist if we allow for cyclic terminologies. However, in comparison to ABox-mscs for FL0 with unfoldable TBoxes there may be an exponential blow-up in size, which cannot be avoided completely.

Model-mscs in ALC
When it comes to ALC it is, again, quite easy to see that if we do not allow for cycles in TBoxes then neither ABox-mscs nor model-mscs exist. Basically any cyclic ABox or cyclic model provides a valid counter-example. Contrary to most less expressive logics it is not necessary to allow for cyclic TBoxes. Simpler extensions such as union and reflexive-transitive closure of roles (ALC ∪ *) also do the job. In the following we do the proof for ALC ∪ * with acyclic TBoxes.
Let $i$ be some primitive interpretation over a set of role names $N_r$ and a set of primitive concept names $N_p$. Two states $v \in \Delta^i$ and $w \in \Delta^i$ are called modally equivalent or indistinguishable with respect to ALC if there is no ALC concept description $C$ such that $v \in C^i$ and $w \notin C^i$. First of all we assume that $i$ is a model that does not contain any indistinguishable states. Then for every $x \in \Delta^i$ we can find an ALC-concept description $C_x$ such that $y \in C_x^i$ implies $y = x$. Define
\[ D_x = \bigsqcap_{P \in N_p,\ x \in P^i} P \;\sqcap \bigsqcap_{P \in N_p,\ x \notin P^i} \neg P \;\sqcap \bigsqcap_{r \in N_r} \Bigl( \bigsqcap_{(x,z) \in r^i} \exists r.C_z \;\sqcap\; \forall r. \bigsqcup_{(x,z) \in r^i} C_z \Bigr), \]
where we define the empty disjunction to be $\bot$ and the empty conjunction to be $\top$. It is then not hard to check that $x \in D_x^i$ for every $x \in \Delta^i$.
Theorem 2 Let $i$ be a model and $x \in \Delta^i$ a state. Define
\[ M_x = C_x \sqcap \forall \Bigl( \bigcup_{r \in N_r} r \Bigr)^{*} . \bigsqcap_{y \in \Delta^i} (\neg C_y \sqcup D_y). \]
Then $x \in M_x^i$. Furthermore $M_x$ has the property that for every model $j$ and every $x' \in M_x^j$ the states $x$ and $x'$ are bisimilar. It is well known from modal logic that bisimilar states cannot be distinguished by either ALC or ALC ∪ * concept descriptions. This implies that $M_x$ is a model-msc for $x$.
Proof: We start by proving $x \in M_x^i$. Obviously $x \in C_x^i$ is true. From the assumption that for every $y \in \Delta^i$ the only element of $C_y^i$ is $y$ itself it follows that $z \in (\neg C_y \sqcup D_y)^i$ for every $z \in \Delta^i$, $z \ne y$. On the other hand $D_y$ has been constructed in such a way that $y \in D_y^i$ and therefore $y \in (\neg C_y \sqcup D_y)^i$. So $z \in (\neg C_y \sqcup D_y)^i$ for all states $y, z \in \Delta^i$, and therefore this also holds true for all $(\bigcup_{r \in N_r} r)^*$-successors of $x$. Hence $x \in M_x^i$.
Let $j$ be a model and $x' \in M_x^j$ a state. We define a relation
\[ Z = \Bigl\{ (y, y') \in \Delta^i \times \Delta^j \;\Big|\; y' \in C_y^j \text{ and } (x', y') \in \Bigl( \bigl( \bigcup_{r \in N_r} r \bigr)^{*} \Bigr)^j \Bigr\} \]
and show that it is a bisimulation. So the definition of $Z$ requires that for every pair $(y, y') \in Z$ the state $y'$ is a $(\bigcup_{r \in N_r} r)^*$-successor of $x'$ such that $y' \notin (\neg C_y)^j$. Since $x' \in M_x^j$ this implies $y' \in D_y^j$. To prove that $Z$ is a bisimulation we need to check three properties.
1.) Since $y' \in D_y^j$ it follows from the definition of $D_y$ that $y' \in P^j$ iff $y \in P^i$. This shows the first property of bisimulations, namely that $y$ and $y'$ satisfy the same primitive concept descriptions.
2.) Suppose that $(y, z) \in r^i$ and $(y, y') \in Z$ for some $y, z \in \Delta^i$ and some $r \in N_r$. It follows from $y' \in D_y^j$ that $y' \in (\exists r.C_z)^j$, i.e. there is some $z' \in C_z^j$ such that $(y', z') \in r^j$. Since $(x', y') \in ((\bigcup_{r \in N_r} r)^*)^j$ this proves that $(x', z') \in ((\bigcup_{r \in N_r} r)^*)^j$. Thus $(z, z') \in Z$ by definition of $Z$. So we have shown that $Z$ satisfies the so-called forth condition for bisimulations.
3.) Assume that $(y', z') \in r^j$ and $(y, y') \in Z$ for some $y \in \Delta^i$, $y', z' \in \Delta^j$ and some $r \in N_r$. We know that $y' \in (\forall r. \bigsqcup_{z \in \Delta^i,\ (y,z) \in r^i} C_z)^j$ since $y' \in D_y^j$. Therefore there must be some $z \in \Delta^i$ such that $(y, z) \in r^i$ and $z' \in C_z^j$. As above it follows from $(y', z') \in r^j$ and $(x', y') \in ((\bigcup_{r \in N_r} r)^*)^j$ that $(x', z') \in ((\bigcup_{r \in N_r} r)^*)^j$. So $(z, z') \in Z$ by definition of $Z$. This proves the so-called back condition for bisimulations.
We have thus shown that $Z$ is a bisimulation. In a last step we check whether the pair $(x, x')$ is in $Z$. Actually, this is easy to see, because $x' \in C_x^j$ follows immediately from $x' \in M_x^j$ and the definition of $M_x$, and $(x', x') \in ((\bigcup_{r \in N_r} r)^*)^j$ is trivial. We have hence shown that $x$ and $x'$ are bisimilar and thus cannot be distinguished by any ALC ∪ *-concept description.
Now assume that $i$ is a finite model in which some states cannot be distinguished by ALC concept descriptions. Then we can define an equivalence relation $\leftrightarrow$ on $\Delta^i$ by defining $x \leftrightarrow y$ if there is no ALC-concept description $C$ such that $x \in C^i$ and $y \notin C^i$. We denote the equivalence class of some $x \in \Delta^i$ by $[x]$. Now we consider the model $(\Delta^i/{\leftrightarrow}, \cdot^{i/\leftrightarrow})$ where we define $\Delta^i/{\leftrightarrow}$ to be the set of all $\leftrightarrow$-equivalence classes. For $r \in N_r$ define $r^{i/\leftrightarrow} = \{ ([x], [y]) \mid x, y \in \Delta^i,\ (x, y) \in r^i \}$ and for every $P \in N_p$ define
\[ P^{i/\leftrightarrow} = \{ [x] \mid x \in P^i \}. \]
Furthermore define the relation
\[ Z = \{ (x, [x]) \mid x \in \Delta^i \}. \]
It is purely technical to check that $Z$ is well-defined and a bisimulation. But $x$ and $[x]$ being bisimilar implies that every PDL-concept description that is satisfied by $x$ must also be satisfied by $[x]$ and vice versa. Hence a most specific concept for $[x]$ is also a most specific concept for $x$. Such a most specific concept for $[x]$ can be constructed using Theorem 2, since $(\Delta^i/{\leftrightarrow}, \cdot^{i/\leftrightarrow})$ by its definition does not contain any indistinguishable states.
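On a finite model, indistinguishability with respect to ALC coincides with bisimilarity (the Hennessy-Milner property for image-finite structures), so the equivalence classes can be computed by partition refinement. The following is a minimal sketch of that standard technique rather than a procedure taken from this paper; the function names and the dictionary encoding of the model are assumptions made for illustration.

```python
def bisimulation_classes(domain, roles, concepts):
    """Partition a finite model into classes of bisimilar elements."""
    # Initial blocks: elements with the same primitive concept memberships.
    ids, block = {}, {}
    for x in domain:
        sig = frozenset(p for p, ext in concepts.items() if x in ext)
        block[x] = ids.setdefault(sig, len(ids))
    while True:
        ids, new_block = {}, {}
        for x in domain:
            # Signature: own block plus the blocks reachable via each role.
            sig = (block[x], frozenset(
                (r, block[y]) for r, edges in roles.items()
                for (u, y) in edges if u == x))
            new_block[x] = ids.setdefault(sig, len(ids))
        if len(set(new_block.values())) == len(set(block.values())):
            break  # no block was split, the partition is stable
        block = new_block
    classes = {}
    for x, b in block.items():
        classes.setdefault(b, set()).add(x)
    return {frozenset(c) for c in classes.values()}


def quotient(domain, roles, concepts):
    """Quotient model whose elements are the bisimulation classes."""
    classes = bisimulation_classes(domain, roles, concepts)
    cls = {x: c for c in classes for x in c}
    q_roles = {r: {(cls[x], cls[y]) for (x, y) in edges}
               for r, edges in roles.items()}
    q_concepts = {p: {cls[x] for x in ext} for p, ext in concepts.items()}
    return classes, q_roles, q_concepts


# A two-cycle and a loop, all in P: every element is bisimilar to every other.
domain = {1, 2, 3}
roles = {"r": {(1, 2), (2, 1), (3, 3)}}
concepts = {"P": {1, 2, 3}}
print(quotient(domain, roles, concepts))
```

In the example the quotient collapses to a single element with an r-loop, which is the kind of model to which Theorem 2 can then be applied.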
Thus model-mscs exist in any finite ALC ∪ * -model i.But Theorem 2 goes even further.For x ∈ ∆ i the model-msc M x describes precisely those states that are bisimilar to x, i. e. cannot be distinguished by ALC nor by ALC ∪ * , not even if we add fixpoint operators.Since ALC is quite expressive this indicates that M x is probably overfitted for practical purposes.In particular, when constructing an ontology using the bottom-up approach, one would probably like to have some sort of generalization, when creating concepts out of closed-world data.Therefore less expressive logics might be worth a closer look.
When it comes to ABox-mscs we do not yet know whether they exist in ALC with universal role. However, we strongly conjecture that they do not exist, even for simple cases like the ABox from Figure 2.1. In this case one would somehow need to express that there is an infinite chain of r-successors. Therefore one would probably need some term of the form ∀r*.F; however, any such term would already be too specific in an open-world context. Yet, this is a mere conjecture; the actual proof still needs to be done.
4 Model-msc and least common subsumers in FLE with greatest fixpoint semantics
It has been shown that model-mscs exist in EL, the simplest DL providing for existential restrictions, if one allows cyclic TBoxes [8]. In this paper, so far we have shown that they exist in FL0, the simplest DL providing for value restrictions, if we allow cyclic TBoxes, and in ALC ∪ *. Yet, presumably in the case of ALC ∪ * they are overfitted and thus not useful. Thus it is only natural to look at logics whose expressivity lies between the two very basic DLs and full ALC. The simplest logic allowing for both full existential and value restrictions is FLE.
FLE is admittedly not the most common DL-language, and has never really been in the focus of research. Previous work mainly focuses on the case of acyclic TBoxes [3]. However, it is clear that in order to create model-mscs for cyclic models one has to somehow extend FLE. The most natural extension one can think of is, again, cyclic TBoxes. Since the gfp-semantics of cyclic TBoxes is somewhat unintuitive we would like to have a characterization similar to the EL-description graphs from [3], or the automata-theoretic approach for FL0 that we have presented above.

Characterizing the semantics
We characterize gfp-semantics via a generalization of the FLE-description graphs that were introduced by Baader et al. [3]. Let $T$ be an FLE-TBox. Without loss of generality assume that $T$ is in normal form, i.e. every statement in $T$ is of the form
\[ A \equiv P_1 \sqcap \ldots \sqcap P_l \sqcap \exists r_1.D_1 \sqcap \ldots \sqcap \exists r_m.D_m \sqcap \forall s_1.E_1 \sqcap \ldots \sqcap \forall s_n.E_n, \qquad (2) \]
where the $P_k$ are primitive concept names and the $D_k$ and $E_k$ are defined concept names. By introducing new defined concept names one can easily transform any TBox into normal form in polynomial time.
With every TBox $T$ we can associate an FLE-description graph $G_T = (V_T, E_T, L_T)$, which is a directed graph with labelled nodes and edges where
• the set of vertices $V_T$ is the set of defined concepts in $T$,
• for every concept $A$ defined as in (2) the set $E_T$ contains an edge $(A, \exists r_k, D_k)$ for all $1 \le k \le m$ and an edge $(A, \forall s_k, E_k)$ for all $1 \le k \le n$, and
• the labelling function $L_T$ maps $A$ to the set $\{P_1, \ldots, P_l\}$.
In the case of EL-description graphs gfp-semantics can be characterized via graph-simulations. To a certain extent, this is also possible for FLE, but obviously one has to take special care of value restrictions. This is why we define two types of simulations. The first type of simulation maps one FLE-description graph to another.
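Definition 4 (Graph-Simulation) Let $G_{T_1}$ and $G_{T_2}$ be description graphs of some FLE-TBoxes $T_1$ and $T_2$. A binary relation $\varphi \subseteq V_{T_1} \times V_{T_2}$ is called a graph-simulation iff for all pairs $(A, B) \in \varphi$ the following statements are true.
(S1) $L_{T_1}(A) \subseteq L_{T_2}(B)$
(S2) For all edges $(A, \exists r, A') \in E_{T_1}$ there is an edge $(B, \exists r, B') \in E_{T_2}$ such that $(A', B') \in \varphi$.
(S3) For all edges $(A, \forall r, A') \in E_{T_1}$ there is an edge $(B, \forall r, B') \in E_{T_2}$ such that $(A', B') \in \varphi$.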
Instead of graph-simulation we simply write simulation if it is clear from the context, that we are dealing with two F LE-description graphs.The second type of simulation that we define maps F LE-description graphs to models.
Definition 5 (Model-Simulation) Let i = (∆ i , • i ) be a model and G T the description graph of some F LE-TBox T .A binary relation ϕ ⊆ V T × ∆ i is called model-simulation iff for all pairs (A, x) ∈ ϕ the following statements are true.
(MS1) $P \in L_T(A) \Rightarrow x \in P^i$
(MS2) For all edges $(A, \exists r, B) \in E_T$ there is some $y \in \Delta^i$ such that $(x, y) \in r^i$ and $(B, y) \in \varphi$.
(MS3) For all edges $(A, \forall r, B) \in E_T$ and all $y \in \Delta^i$ with $(x, y) \in r^i$ it holds that $(B, y) \in \varphi$.
This notion of model-simulation does indeed yield a characterization of gfp-semantics, just as in the case of EL.
Lemma 2 For $x \in \Delta^i$ and some defined concept $A$ it holds that $x \in A^i$ iff there is a model-simulation $\varphi$ from $G_T = (V_T, E_T, L_T)$ to $i$ such that $(A, x) \in \varphi$.
Proof: Only if: Define $\varphi = \{ (B, y) \in V_T \times \Delta^i \mid y \in B^i \}$. We prove that $\varphi$ is a model-simulation by checking properties (MS1) to (MS3). Let $(B, x) \in \varphi$. (MS1) follows directly from the semantics of FLE. (MS2): Let $(B, \exists r, F) \in E_T$ be some edge. Then $\exists r.F$ is a conjunct in the definition of $B$. By definition of $\varphi$ it holds that $x \in B^i$. Hence, there is some $z \in \Delta^i$ such that $(x, z) \in r^i$ and $z \in F^i$, i.e. $(F, z) \in \varphi$. (MS3) can be shown analogously. $(A, x) \in \varphi$ follows immediately from the definition.
If: We prove this using (1) from the Knaster-Tarski-Theorem.As usual, we denote both the gfp-model i as well as the primitive interpretation that it is based upon by i. Define an interpretation j as follows.Let j coincide with i on all role names and primitive concept names.For all defined concept names B define B j = {y ∈ ∆ i | (B, y) ∈ ϕ}.Notice that j is not necessarily a gfp-model, just some interpretation based on i.We prove that for every statement A ≡ D in T where A is a defined concept name and D a concept description it holds that A j ⊆ D j .Since T is normalized D is of the form Let y ∈ A j .From the way we have defined j it follows that (A, y) ∈ ϕ.Since ϕ is a simulation this implies that y ∈ P i k = P j k for all 1 ≤ k ≤ l.Now consider r k , E k for some k, 1 ≤ k ≤ m.Because of (A, y) ∈ ϕ and (MS2) there must be some From (MS3) and (A, y) ∈ ϕ we get that (y, z) ∈ s k implies (F k , z) ∈ ϕ for all z ∈ ∆ i .Thus y ∈ (∀s k .F k ) j .We have thus shown that These results prove that A j ⊆ D j for all statements A ≡ D in T .By (1) this implies A j ⊆ A i , where i is the gfp-model based on the primitive interpretation i.Furthermore, (A, x) ∈ ϕ implies x ∈ A j which in turn implies x ∈ A i , which is what we wanted.
Lemma 2 provides a useful characterization of the gfp-semantics in FLE. Given a model i, an object x ∈ ∆^i, and a TBox T containing a defined concept A, it is possible to check whether x ∈ A^i in polynomial time, because checking whether there is a model-simulation from G_T to i containing (A, x) can be done in polynomial time. The proof is only outlined here. First one has to show that the union of two model-simulations is also a model-simulation. This implies that there is a greatest model-simulation from G_T to i. This greatest model-simulation can be obtained by starting with the whole set V_T × ∆^i and repeatedly removing pairs that violate one of the conditions (MS1) to (MS3), until no such pair remains.
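The refinement procedure outlined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (dictionaries and sets), the function name greatest_model_simulation, and the tags "E" and "A" for ∃- and ∀-edges are hypothetical and not taken from the paper. Conditions (MS1) and (MS2) are read off from the way they are used in the proof of Lemma 2: (MS1) requires x ∈ P^i for every P in the label of A, and (MS2) requires a witnessing r-successor for every ∃r-edge.

```python
def greatest_model_simulation(nodes, labels, edges, domain, prim, roles):
    """Greatest relation between defined concepts and domain elements
    satisfying (MS1)-(MS3), computed by iterated removal of bad pairs.

    nodes:  defined concept names (vertices of the description graph)
    labels: dict, concept name -> set of primitive concept names
    edges:  set of (A, (q, r), B) with q = "E" for an ∃r-edge, "A" for ∀r
    domain: set of objects of the model
    prim:   dict, primitive concept name -> set of objects
    roles:  dict, role name -> set of (x, y) pairs
    """
    # Index role successors once so that each lookup is cheap.
    succ = {}
    for r, pairs in roles.items():
        for x, y in pairs:
            succ.setdefault((r, x), set()).add(y)

    # (MS1) does not depend on other pairs, so it is applied up front.
    sim = {(a, x) for a in nodes for x in domain
           if all(x in prim.get(p, set()) for p in labels.get(a, set()))}

    def satisfies_ms2_ms3(a, x):
        for src, (q, r), tgt in edges:
            if src != a:
                continue
            ys = succ.get((r, x), set())
            if q == "E" and not any((tgt, y) in sim for y in ys):
                return False  # (MS2) violated: no suitable r-successor
            if q == "A" and not all((tgt, y) in sim for y in ys):
                return False  # (MS3) violated: some r-successor fails
        return True

    # Remove violating pairs until a fixpoint is reached.
    while True:
        bad = {(a, x) for (a, x) in sim if not satisfies_ms2_ms3(a, x)}
        if not bad:
            return sim
        sim -= bad
```

By Lemma 2, x ∈ A^i holds exactly if the pair (A, x) survives in the returned relation. Each round removes at least one of at most |V_T| · |∆^i| pairs, so the loop runs polynomially often.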

Canonical terminologies and model-msc
Another way to characterize gfp-semantics is via graph-simulations and what we shall call canonical terminologies.
Definition 6 (canonical terminology) For a given primitive interpretation i define a TBox T_i as follows:
• The defined concept names in T_i are the subsets U ⊆ ∆^i.
• Denote by S_{r,U} = {y ∈ ∆^i | ∃x ∈ U : (x, y) ∈ r^i} the set of r-successors of U.
• Let T_i contain all statements of the form
We call T_i the canonical terminology of i.
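As a small illustration of the two ingredients of Definition 6 that are fully specified above, the following Python sketch computes the successor set S_{r,U} and enumerates the candidate defined concept names U ⊆ ∆^i. The function names and the representation of a role as a set of pairs are assumptions made for this example; the statements of T_i themselves are not constructed here.

```python
from itertools import chain, combinations

def r_successors(role_pairs, U):
    """S_{r,U}: all y such that (x, y) is in r^i for some x in U."""
    return frozenset(y for (x, y) in role_pairs if x in U)

def all_subsets(domain):
    """Every U ⊆ ∆^i; each one is a defined concept name of T_i."""
    items = sorted(domain)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]
```

Since all_subsets returns 2^n sets for a domain of n objects, the sketch also makes the exponential size of the canonical terminology visible.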
Lemma 3 Let T be some FLE-TBox and A a defined concept in T. Let i be a gfp-model of T, T_i its canonical terminology and x ∈ ∆^i. Then x ∈ A^i iff there is a graph-simulation ϕ from G_T to G_{T_i} with (A, {x}) ∈ ϕ.
Proof: If: Let ϕ be a graph-simulation from G_T to G_{T_i}. Define a binary relation ψ as follows. For every defined concept E in T and every x ∈ ∆^i, let (E, x) ∈ ψ iff there is some U ⊆ ∆^i with x ∈ U and (E, U) ∈ ϕ. We check the properties of a model-simulation.
(MS1) Let (E, x) ∈ ψ and P ∈ L_T(E). Choose some set U ⊆ ∆^i such that (E, U) ∈ ϕ and x ∈ U. By (S1), P ∈ L_{T_i}(U). By definition of T_i this implies U ⊆ P^i and thus x ∈ P^i. (MS2) Let (E, ∃r, F) ∈ E_T be some ∃-edge and (E, x) ∈ ψ. Choose some set U ⊆ ∆^i such that (E, U) ∈ ϕ and x ∈ U. By (S2) there must be some set V ⊆ ∆^i with (U, ∃r, V) ∈ E_{T_i} and (F, V) ∈ ϕ. Hence, there is some y ∈ V such that (x, y) ∈ r^i. Since (F, V) ∈ ϕ it follows that (F, y) ∈ ψ. (MS3) Let (E, ∀r, F) ∈ E_T be some ∀-edge and (E, x) ∈ ψ. We can find U ⊆ ∆^i such that (E, U) ∈ ϕ and x ∈ U. By (S3) there must be some set V ⊆ ∆^i with (U, ∀r, V) ∈ E_{T_i} and (F, V) ∈ ϕ. By definition of T_i this implies that V = S_{r,U}. For every y ∈ ∆^i with (x, y) ∈ r^i it follows that y ∈ V, and since (F, V) ∈ ϕ we get (F, y) ∈ ψ. This proves that ψ is a model-simulation. Furthermore (A, x) ∈ ψ since (A, {x}) ∈ ϕ. From Lemma 2 we obtain that x ∈ A^i.
Only if: Lemma 2 shows that there is some model-simulation ψ from G_T to i with (A, x) ∈ ψ. Define a binary relation ϕ as follows. For every defined concept A in T and every set U ⊆ ∆^i, let (A, U) ∈ ϕ iff (A, x) ∈ ψ for all x ∈ U.
We prove that ϕ is a graph-simulation. (S1) Let (A, U) ∈ ϕ. From (MS1) we get that x ∈ P^i for all P ∈ L_T(A) and all x ∈ U. Thus U ⊆ P^i for all P ∈ L_T(A). By definition of T_i this yields L_T(A) ⊆ L_{T_i}(U). (S2) Let (A, U) ∈ ϕ and (A, ∃r, B) ∈ E_T, and write U = {x_1, …, x_k}. For every x_l, (MS2) implies that there is some y_l ∈ ∆^i such that (x_l, y_l) ∈ r^i and (B, y_l) ∈ ψ. Define V = {y_1, …, y_k}. Then V ⊆ S_{r,U} and (B, V) ∈ ϕ. (S3) Let (A, U) ∈ ϕ and (A, ∀r, B) ∈ E_T. Then (U, ∀r, S_{r,U}) ∈ E_{T_i}. (MS3) implies (B, y) ∈ ψ for all y ∈ S_{r,U} and thus (B, S_{r,U}) ∈ ϕ.
Lemma 3 yields yet another characterization of gfp-semantics. However, it is not useful for model checking, since the size of the canonical terminology of a model can be exponential in the size of the original model, whereas we have seen that model checking can be done in polynomial time. The real purpose of the canonical terminology is that it provides us with a most specific concept. This can be proved using the following sufficient condition for subsumption.
Lemma 4 Let T be an FLE-TBox and T′ a strictly conservative extension of T. Let A and B be defined concepts in T and T′, respectively. If there is a simulation ϕ from G_{T′} to G_T such that (B, A) ∈ ϕ, then A ⊑_{T′} B.
Proof: Let i be some primitive interpretation of N_r and N_p. We also denote the corresponding gfp-models by i. Let T_i be the canonical terminology for i. Let x ∈ A^i. From Lemma 3 it follows that there is a simulation ψ from G_T to G_{T_i} such that (A, {x}) ∈ ψ. Since the composition of graph-simulations is again a graph-simulation, ψ ∘ ϕ is a simulation from G_{T′} to G_{T_i} such that (B, {x}) ∈ ψ ∘ ϕ. Then x ∈ B^i follows from Lemma 3.
Corollary 1 Let i be a primitive interpretation for a given set of primitive concept names N_p and a set of role names N_r. Let T_i be the canonical terminology for i. Then {x} in T_i is a model-msc for i.
Proof: Follows directly from Lemma 3 and Lemma 4.
Since we are using a subset construction to produce the canonical terminology, the size of the most specific concept may become exponentially large in the size of the model. As in the case of FL0 this cannot always be avoided. The same example as in the case of FL0 can be used to illustrate this (cf. Figure 2.2).

Subsumption and least common subsumers for FLE with cyclic TBoxes
As in the case of model checking, it is desirable to find some characterization of subsumption that is a bit more intuitive than the fixpoints themselves. Lemma 4 already provides a sufficient condition for subsumption. Unfortunately the converse of Lemma 4 is not true, not even in the case of acyclic TBoxes.
For the case of acyclic TBoxes, Küsters and Molitor have presented a characterization of subsumption [3]. They introduce a normalization step which consists of propagating ∀r statements along ∃r-edges in the description graph and of merging ∀r-edges that share a common origin. What we do in the case of cyclic TBoxes is very similar. The normalization is obtained by a subset construction.
Definition 7 Let T be some FLE-TBox and N_d the set of defined concept names in T. We define a TBox T_N as follows.
• The defined concept names in T_N are the subsets U ⊆ N_d.
• ∆^{i_N} = P(N_d)
Once again it is only technical to check that ϕ is a model-simulation from G_{T_N} to i_N. (MS1) follows directly from the definitions of T_N and i_N. (MS2) Let (V, W) ∈ ϕ and (V, ∃r, X) ∈ E_{T_N}. Then X = {B} ∪ S_{r,V} for some B ∈ N_d and some A ∈ V with (A, ∃r, B) ∈ E_T.
One can think of i_N as being obtained from G_{T_N} by removing all ∀-edges and then transforming the description graph into a model.
Theorem 3 Let T be an FLE-TBox and T′ a strictly conservative extension of T. Let T_N be the normalization of T. Let A and B be defined concepts in T and T′, respectively. If A ⊑_{T′} B then there is a simulation ϕ from G_{T′} to G_{T_N} such that (B, {A}) ∈ ϕ.
Proof: First assume that T′ is acyclic. Let U ⊆ N_d be a subset of the defined concepts in T such that U^i ⊆ B^i holds for all primitive interpretations i. We prove that there is a simulation ϕ from G_{T′} to G_{T_N} with (B, U) ∈ ϕ. Since T′ is acyclic, B can be expanded into a finite tree, which means we can use induction over the structure of B to prove our claim.
For the base case consider B ≡ P for some primitive concept name P. Define a canonical model i_N as in Lemma 8. Since U ∈ U^{i_N} and U^{i_N} ⊆ B^{i_N}, also U ∈ B^{i_N} and thus U ∈ P^{i_N} holds. Therefore ϕ = {(B, U)} is a simulation with the desired properties. Obviously this simulation also works for the case that B ≡ ⊤, and we are done with the base case.
We divide the step case into three subcases.
Step case 1: First let B ≡ E_1 ⊓ … ⊓ E_n, where we already know from the induction hypothesis that there are simulations ϕ_1, …, ϕ_n from G_{T′} to G_{T_N} with (E_k, U) ∈ ϕ_k for 1 ≤ k ≤ n. Then ϕ_1 ∪ … ∪ ϕ_n ∪ {(B, U)} is a simulation with the desired properties.
Step case 2: Consider B ≡ ∀r.E for some r ∈ N_r and some defined concept E for which the induction hypothesis holds. By definition there is exactly one ∀r-edge in G_{T_N} starting from U, say (U, ∀r, V) ∈ E_{T_N}. Suppose that there is no simulation ϕ from G_{T′} to G_{T_N} with (B, U) ∈ ϕ. Then there is also no simulation ψ from G_{T′} to G_{T_N} with (E, V) ∈ ψ, because ψ could be extended by (B, U) and would still be a simulation. The induction hypothesis implies that V ⋢ E. Thus there must be some model i_0 and some x ∈ ∆^{i_0} such that x ∈ V^{i_0} but x ∉ E^{i_0}. We can extend the model i_N by i_0 to a new model i in the following way. W.l.o.g. ∆^{i_0} and ∆^{i_N} are disjoint.
for all primitive concept names P. One can easily build a model-simulation from G_{T_N} to i as the union of the two model-simulations from G_{T_N} to i_0 and to i_N, respectively. On the other hand there cannot be a model-simulation containing (B, U) from G_{T′} to i, since such a simulation would yield a model-simulation containing (E, x). Therefore U ∈ U^i but U ∉ B^i. This contradicts U^i ⊆ B^i. Hence our assumption that there is no simulation from G_{T′} to G_{T_N} containing (B, U) must be false.
Step case 3: The last case, where B ≡ ∃r.E for some r ∈ N_r, is treated in a similar way. Assume that the induction hypothesis holds for E but not for B, i.e. there is no simulation from G_{T′} to G_{T_N} containing (B, U). Let V_1, …, V_k be the ∃r-successors of U in G_{T_N}. Then for none of the V_l can there be a simulation from G_{T′} to G_{T_N} containing (E, V_l). Therefore V_l ⋢ E for all 1 ≤ l ≤ k. We choose models i_l and x_l ∈ ∆^{i_l} such that x_l ∈ V_l^{i_l} but x_l ∉ E^{i_l} for all 1 ≤ l ≤ k. As in the previous case this can be used to construct a model. Without loss of generality, we can assume that ∆^{i_N} and ∆^{i_l}, 1 ≤ l ≤ k, are mutually disjoint.
Again it can be shown that in this model i it holds that X ∈ U^i but X ∉ B^i, which contradicts U^i ⊆ B^i. Therefore there must be some simulation from G_{T′} to G_{T_N} containing (B, U).
Therefore we have proved via structural induction that in the case where T′ is an acyclic TBox, B some defined concept in T′, and U some defined concept from T_N, there is a simulation ϕ from G_{T′} to G_{T_N} with (B, U) ∈ ϕ if U^i ⊆ B^i for all primitive interpretations i. Since we know that A ⊑_{T′} B holds and thus A^i ⊆ B^i for all primitive interpretations i, it follows from Lemma 5 that {A}^i ⊆ B^i. Thus we have shown that there must be a simulation ϕ from G_{T′} to G_{T_N} with (B, {A}) ∈ ϕ in the acyclic case.
Now let T′ be a cyclic TBox. Consider T′_{d,B}, the unraveling of T′ starting from B for some natural number d. Let (B) be the defined concept in T′_{d,B} that corresponds to B. Then A^i ⊆ B^i ⊆ (B)^i for all models i. Since T′_{d,B} is acyclic, it follows from the previous case that there is a simulation from G_{T′_{d,B}} to G_{T_N} containing ((B), {A}). From Lemma 6 we obtain that there is a simulation from G_{T′} to G_{T_N} containing (B, {A}).
Thus we have found a necessary condition for subsumption in FLE with cyclic TBoxes. We have not yet proved that a concept description and its normalization are equivalent. This follows quickly from the theorem. Obviously for every TBox T and for every defined concept A from T the identity relation id_{T_N} is a simulation from G_{T_N} to G_{T_N} containing ({A}, {A}). Therefore by Theorem 3 A ⊑_{T ∪ T_N} {A}.
Corollary 2 Let T be some FLE-TBox and A a defined concept in T. Then A^i = {A}^i for all models i.
Theorem 3 can be viewed as some sort of converse to Lemma 4. But notice that there is a normalization step involved that cannot be avoided.
Computing the normalized TBox T_N corresponding to T can be done in time exponential in the size of T. Checking whether there is a simulation from G_{T_2} to G_{T_N} can be done in time polynomial in the size of T_N and T_2. Therefore, using the above characterization, subsumption can be checked in ExpTime. It is unclear whether one can develop a PSpace-algorithm that computes T_N on the fly, using only polynomial space. To the best of the author's knowledge, it is not known whether subsumption in FLE with cyclic TBoxes can be checked in PSpace. So far only PSpace-hardness [4] and containment in ExpTime [12] are known.
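The polynomial simulation check mentioned above can be sketched as a fixpoint computation on pairs of vertices. The sketch below is an illustration under assumptions, not the paper's algorithm: conditions (S1) to (S3) are taken in the form in which they are used in the proof of Lemma 3 (label inclusion, and matching ∃r- and ∀r-edges with related targets), and the graph encoding as a triple (nodes, edges, labels) as well as the function name are hypothetical.

```python
def greatest_graph_simulation(g1, g2):
    """Greatest simulation from description graph g1 to description graph g2.

    A graph is a triple (nodes, edges, labels) where edges is a set of
    (v, (q, r), w) with q in {"E", "A"} and labels maps nodes to sets of
    primitive concept names.
    """
    nodes1, edges1, labels1 = g1
    nodes2, edges2, labels2 = g2

    # (S1): the label of v must be contained in the label of w.
    sim = {(v, w) for v in nodes1 for w in nodes2
           if labels1.get(v, set()) <= labels2.get(w, set())}

    def edges_matched(v, w):
        # (S2)/(S3): every edge leaving v needs an equally labelled edge
        # leaving w whose target is related to the target of the first edge.
        for src, lab, tgt in edges1:
            if src != v:
                continue
            if not any(s2 == w and lab2 == lab and (tgt, t2) in sim
                       for (s2, lab2, t2) in edges2):
                return False
        return True

    # Discard violating pairs until nothing changes.
    while True:
        bad = {p for p in sim if not edges_matched(*p)}
        if not bad:
            return sim
        sim -= bad
```

With g1 = G_{T_2} and g2 = G_{T_N}, testing whether a given pair such as (B, {A}) is in the result corresponds to the simulation test described above. The cost is polynomial in the sizes of the two graphs, but G_{T_N} itself may already be exponential in the size of T.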
Except for the normalization step, our characterization of subsumption in FLE with terminological cycles uses essentially the same techniques as the characterization provided by Baader et al. [5]. This gives us a strong indication that least common subsumers in FLE can be computed in a similar way as in EL, namely by forming the product of the corresponding description graphs.
Definition 8 Let G_1 = (V_1, E_1, L_1) and G_2 = (V_2, E_2, L_2) be two FLE-description graphs. Their product is the description graph G_1 × G_2 = (V, E, L) where
Given some FLE-TBox T and T_N its normalization, let A and B be two defined concepts from T. The description graph G_{T_N} × G_{T_N} yields a TBox T_0 such that G_{T_0} = G_{T_N} × G_{T_N}. T_1 = T ∪ T_0 is a strictly conservative extension of T, since the defined concepts in T_0 and T are disjoint.
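To make the product construction concrete, here is a minimal Python sketch. It assumes the usual product of description graphs known from the EL case: vertices are pairs, two edges are combined only if they carry the same label (here the same quantifier and the same role), and labels are intersected. Whether Definition 8 agrees with this in every detail cannot be read off from the text above, so the code should be taken as an assumption-laden illustration; the graph encoding and the function name are hypothetical.

```python
def product(g1, g2):
    """Product of two description graphs given as (nodes, edges, labels).

    edges are triples (v, (q, r), w) with q in {"E", "A"};
    labels maps a node to its set of primitive concept names.
    """
    nodes1, edges1, labels1 = g1
    nodes2, edges2, labels2 = g2

    # Vertices of the product are all pairs of vertices.
    nodes = {(v, w) for v in nodes1 for w in nodes2}

    # Combine edges that agree on quantifier and role name.
    edges = {((v1, v2), lab1, (w1, w2))
             for (v1, lab1, w1) in edges1
             for (v2, lab2, w2) in edges2
             if lab1 == lab2}

    # A pair keeps only the primitive concepts common to both components.
    labels = {(v, w): labels1.get(v, set()) & labels2.get(w, set())
              for (v, w) in nodes}
    return nodes, edges, labels
```

Applied to two copies of G_{T_N}, the vertex ({A}, {B}) of the result is the candidate for the least common subsumer of A and B considered in Lemma 9.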
Lemma 9 ({A}, {B}) in T_1 is the least common subsumer of A and B in T.
Proof: (1) We first prove A ⊑_{T_1} ({A}, {B}). By Corollary 2 and Remark ?? it suffices to show that {A} ⊑_{T_N ∪ T_0} ({A}, {B}). We construct a simulation ϕ from G_{T_0} to G_{T_N} containing (({A}, {B}), {A}). Given such a simulation, the rest follows from Lemma 4. Define this ϕ to be the projection of all elements of G_{T_0} to the first component.
The proof that ϕ really is a simulation that contains (({A}, {B}), {A}) is done in analogy to the case of EL and can be found in [1] on page 15. This proves A ⊑_{T_1} ({A}, {B}). B ⊑_{T_1} ({A}, {B}) can be shown analogously.
(2) Now assume that there is some FLE-TBox T_2 and some defined concept F in T_2 such that A ⊑_{T_1 ∪ T_2} F and B ⊑_{T_1 ∪ T_2} F. Then by Remark ?? A ⊑_{T ∪ T_2} F and B ⊑_{T ∪ T_2} F must also hold. By Theorem 3 there must be simulations ϕ_A and ϕ_B from G_{T_2} to G_{T_N} such that (F, {A}) ∈ ϕ_A and (F, {B}) ∈ ϕ_B. In order to prove ({A}, {B}) ⊑_{T_1 ∪ T_2} F it is sufficient to construct a simulation ψ from G_{T_2} to G_{T_1} containing (F, ({A}, {B})). Define ψ = {(G, (U, V)) | (G, U) ∈ ϕ_A and (G, V) ∈ ϕ_B}. Again it can be checked in analogy to the case of EL that ψ is a simulation and contains (F, ({A}, {B})). The proof can also be found in [1] on page 15.

Conclusion
Computing the model-msc is a useful tool to extract knowledge from a model. We have seen that in many standard logics model-mscs need not exist. In [] it was shown that in the case of EL one can overcome this problem by allowing for cyclic TBoxes with gfp-semantics. In the present report we have presented extensions of FL0, FLE, and ALC in which model-mscs do exist. We have shown that in the case of ALC it suffices to add union of roles and reflexive-transitive closure of roles. Furthermore we have proved that model-mscs exist in FL0 and FLE with cyclic TBoxes and gfp-semantics.
There has not been any previous work characterizing subsumption in FLE with cyclic TBoxes. We have generalized Küsters' and Molitor's approach [3] to form a characterization of subsumption in FLE in the presence of cyclic TBoxes. This characterization requires a normalization step which may lead to the terminologies becoming exponentially large. The characterization also leads to a construction of the least common subsumer in the presence of cyclic FLE-TBoxes.

Figure 2: Sample FLE ABox and TBox

Figure 3: Partial illustration of a model in which FLE-model-mscs must be exponentially large

Figure 4: A model and its canonical terminology

Figure 4.1 shows an FLE-model and the FLE-description graph of its canonical terminology. They are based on a set of role names N_r = {r}, a set of primitive concept names N_p = {P} and a model i: