Optimal Repairs in the Description Logic EL Revisited

Ontologies based on Description Logics may contain errors, which are usually detected when reasoning produces consequences that follow from the ontology, but do not hold in the modelled application domain. In previous work, we have introduced repair approaches for EL ontologies that are optimal in the sense that they preserve a maximal amount of consequences. In this paper, we will, on the one hand, review these approaches, but with an emphasis on motivation rather than on technical details. On the other hand, we will describe new results that address the problems that optimal repairs may become very large or need not even exist unless strong restrictions on the terminological part of the ontology apply. We will show how one can deal with these problems by introducing concise representations of optimal repairs.


Introduction
Description Logics (DLs) [4,5] are a prominent family of logic-based knowledge representation formalisms, which offer a good compromise between expressiveness and the complexity of reasoning and are the formal basis for the Web ontology language OWL. In a DL ontology, the important notions of the application domain are introduced as background knowledge in the terminology (TBox), and then these notions are used to represent a specific application situation in the ABox. The DLs of the EL family have drawn considerable attention since their reasoning problems are tractable [3], but they are nevertheless expressive enough to represent ontologies in many application domains, such as biology and medicine. For instance, the medical ontology SNOMED CT employs EL and contains the following concept inclusion (CI) in its TBox: Common cold ⊑ Disease ⊓ ∃ causative agent.Virus ⊓ ∃ finding site.Upper respiratory tract structure ⊓ ∃ pathological process.Infectious process, which says that a common cold is a disease that is caused by a virus, can be found in the upper respiratory tract, and has as pathological process an infectious process. A GP can then employ this concept to store in the ABox that patient Alice is diagnosed with common cold using the concept assertion (∃ has diagnosis.Common cold)(alice). The GP's ABox may also contain the information that Charles is Alice's father, expressed as role assertion has father(alice, charles), which might be of interest in the context of hereditary diseases.
Like all large human-made digital artefacts, the ontologies employed in such applications may contain errors, and this problem gets even worse if parts of the ontology (usually the ABox) are automatically generated by inexact methods based on information retrieval or machine learning. Errors in ontologies are often detected when the reasoner generates a consequence that formally follows from the knowledge base, but is incorrect in the sense that it does not hold in the application domain that is supposed to be modelled. For example, in a previous version of SNOMED CT, the concept "Amputation of finger" was classified as a subconcept of "Amputation of hand," which is fortunately wrong in the real world. To correct such errors in large ontologies, the knowledge engineer (KE) should be supported by an appropriate repair tool. Such a tool receives as input one or more consequences of the given ontology that are unwanted, and it should return one or more repaired ontologies that no longer have these consequences (called repairs). The KE can then choose one of the computed repairs and either use it as the new ontology, or continue the repair process from it if other unwanted consequences are detected. Of course, it makes no sense to use as a repair an arbitrary ontology that does not have the unwanted consequences. The repaired ontology should (a) not introduce new information and (b) be as close as possible to the original ontology. There are different possibilities for how to formalize these conditions.
The classical approaches for ontology repair return maximal subsets of the ontology that do not have the unwanted consequence, and employ methods inspired by model-based diagnosis [33] to compute these sets [17,32,34]. Thus, these approaches interpret the above conditions in a syntactic way: (a) is read as "no new axioms" and (b) is realized by the maximality condition. In [15] we called classical repairs that satisfy this maximality condition optimal classical repairs. While these approaches preserve as many of the axioms in the ontology as possible, they need not preserve a maximal amount of consequences, and they are syntax-dependent. For example, consider the ABoxes A := {(A ⊓ B)(a)} and B := {A(a), B(a)}, which both say that individual a belongs to the concepts A and B, and are thus equivalent. However, with respect to the unwanted consequence A(a), the ABox A has the empty ABox as only optimal classical repair, whereas B has the optimal classical repair {B(a)}. Thus, the latter repair retains the consequence B(a), whereas the former does not. To overcome this problem, more gentle repair approaches have been introduced, e.g., in [15,21,23,26,35]. The basic idea underlying these approaches is to replace some axioms of the ontology by weaker ones, rather than just removing them, as in the classical approach. In our example, one can replace the axiom (A ⊓ B)(a) in the ABox A with the weaker axiom B(a), and thus retain the consequence B(a) even if one starts with A rather than B. However, these gentle repairs are still dependent on the syntactic structure of the axioms in the ontology, and how well they realize condition (b) depends on the employed weakening relation between axioms and the strategy used to apply it.
Providing the KE with syntax-dependent repair tools is not in line with the functional approach to knowledge representation [18,27] adopted by DLs. In this approach, the syntactic structure of the axioms in the ontology is supposed to be irrelevant. What counts is what queries are entailed by the ontology, which in DLs are usually instance queries (IQ) or conjunctive queries (CQ). In this functional setting, (a) should be read as "no new consequences" (expressed in the adopted query formalism) and (b) as preserving a maximal set of such consequences. This leads us to the definition of an optimal repair [7,15], which is an ontology that does not have the unwanted consequences, is entailed by the original ontology (thus realizing property (a)), and preserves a maximal amount of consequences in the sense that there is no repair (i.e., no ontology satisfying the first two properties) that strictly entails it (property (b)). Entailment can be IQ-entailment or CQ-entailment, depending on whether we are interested only in instance queries, or also in conjunctive queries [28]. Maximizing the retained consequences is also motivated by the following observation. All the repair tool knows is the original ontology and the consequences that should be removed, which are specified in what we call a repair request. If it were to remove more consequences than are strictly needed to satisfy the repair request, then the decision which additional consequences to remove would be a random choice by the tool, not based on any application knowledge, which is held by the KE. In case the optimal repair retains consequences that should be removed, the KE needs to specify this in a subsequent repair request.
If a repair problem consisting of an ontology and a repair request does not have a repair, then it cannot have an optimal one. In general, however, optimal repairs of repair problems that have a repair need not exist either, even in the simple setting of EL ABoxes without a TBox. This is illustrated in the following example, where the ABox A = {V(n), ℓ(n, n)} says that Narcissus is a vain individual that loves itself, and the repair request R = {V(n)} wants us to remove the consequence that Narcissus is vain. Intuitively, to obtain a repair, we must remove V(n). However, since all assertions of the form ∃ℓ.(V ⊓ (∃ℓ.)^k ⊤)(n), saying that Narcissus loves a vain individual that is the starting point of a loves-chain of length k, are consequences of A and can be added to {ℓ(n, n)} without entailing V(n), it is easy to see that there is no finite EL ABox that is an optimal repair. In fact, since Narcissus is no longer vain, the retained cycle ℓ(n, n) cannot be used to generate the loves-chains of arbitrary length starting from a vain individual. Even if a given repair problem has optimal repairs, they may not cover all repairs in the sense that every repair is entailed by an optimal one. To see this, we can look at a modified version of the above example. Consider the ABox B = {k(t, n), V(n), ℓ(n, n)}, which contains the additional information that Tiresias knows Narcissus, and the repair request Q = {(∃k.V)(t)}. Removing k(t, n) from B yields an optimal repair. However, there are also repairs that retain this assertion, but there is no optimal one among them for the same reason as in the previous example. Thus, if the KE is only offered the optimal repair {V(n), ℓ(n, n)} by the repair tool, the repair options that retain the assertion k(t, n) are missed. This illustrates that the use of optimal repairs in a repair tool requires a setting where the optimal repairs always cover all repairs.
This can be achieved by using a more general notion of ABoxes, called quantified ABoxes (qABoxes) [16], where in addition to the usual named individuals we also have anonymous objects, which are represented as (existentially quantified) variables. In our Narcissus example, an optimal repair of A for R is obtained by removing V(n) and introducing an anonymous vain and self-loving lover of Narcissus, which yields the qABox ∃{x}.{ℓ(n, n), ℓ(n, x), ℓ(x, n), ℓ(x, x), V(x)}. Note that we could not have used a named individual b instead of the variable x since then the resulting ABox would have entailed instance relationships for b, such as V(b), that are not entailed by A. One might think that retaining a consequence like (∃ℓ.V)(n) is not justified since one of the reasons for this being a consequence of A, namely V(n), has been removed. However, with this argument, we would be back at the classical repair approach. As argued above, since the repair request only specifies that V(n) should no longer be a consequence, other consequences like (∃ℓ.V)(n) should not be lost unless this is needed to remove V(n).
In [16] we consider a setting where ontologies are qABoxes and the repair requests consist of entailed EL instance relationships. Given such a repair problem, we show how to construct a finite set of repairs, called the canonical repairs, which cover all repairs. The canonical repairs are of exponential size, and there may be exponentially many of them. Not every canonical repair is optimal, but due to the covering property, the set of them contains all optimal repairs up to equivalence. The set of optimal repairs can thus be obtained by removing non-optimal canonical repairs, i.e., ones that are strictly entailed by another canonical repair, and this set covers all repairs. The construction of the canonical repairs is actually the same for the CQ and the IQ case. The only difference is that, when removing non-optimal canonical repairs, the respective entailment relation must be used. Since CQ-entailment implies IQ-entailment, but not vice versa, more canonical repairs may be removed as non-optimal in the IQ setting. In addition, since CQ-entailment is NP-complete and IQ-entailment is tractable, the complexity of removing non-optimal repairs is higher in the CQ case.
The differences between the CQ and the IQ case get more pronounced if we add an EL TBox. In [7], we assume that this TBox is correct, and thus should not be changed in the repair process. In order to adapt the approach and the results of [16] to this setting, the first step is to saturate the given qABox w.r.t. the TBox, to reduce entailment with TBox to entailment without TBox. For the IQ case, such a saturation always exists and can be computed in polynomial time. For the CQ case, a finite saturation need not exist in general. However, for cycle-restricted TBoxes [2], it always exists, but may be of exponential size. Continuing the repair process with the saturated qABox, we still need to take the TBox into account when defining canonical repairs, to ensure that consequences that have been removed from the qABox cannot be reintroduced by the TBox. With this adapted notion of canonical repairs, we obtain the same results as for the case without TBox. The canonical repairs cover all repairs and can be computed in exponential time. From them the set of all optimal repairs can be obtained by removing non-optimal ones using entailment tests [7]. This works both for the IQ and the CQ case, but in the latter only if we can compute a finite saturation, which is always the case if the TBox is cycle-restricted. For TBoxes that are not cycle-restricted, optimal repairs need not exist in the CQ case. For example, with respect to the TBox {V ⊑ ∃ℓ.V, ∃ℓ.V ⊑ V}, which says that vain individuals are exactly the ones that love a vain individual, the qABox {V(n)} does not have an optimal repair for the repair request R = {V(n)}. Intuitively, the reason is that the qABox together with the TBox implies the existence of arbitrarily long loves-chains starting from n, which are no longer entailed by the TBox if V(n) is removed (see Example 9 in [11] for a more detailed argument). One might think that the first GCI V ⊑ ∃ℓ.V is enough to destroy existence of an optimal repair. This is, however, not the case. Without the second GCI one can introduce an anonymous vain individual x that is loved by n and loves itself to obtain an optimal repair.
In the first part of the paper (Section 2 and Section 3), we will describe the repair approaches developed in our previous work [7,16], but with an emphasis on motivation rather than on technical details. The second part of the paper (Section 4 and Section 5) describes new results. We will consider more concise representations of optimal repairs, to deal both with the exponential size of canonical repairs in the IQ case and the non-existence problem w.r.t. cyclic TBoxes in the CQ case.
The former problem is due to the fact that the canonical repairs employed in our approach are by construction of exponential size. To alleviate this problem, we have, on the one hand, developed in [7] an optimized algorithm for computing repairs, which yields optimized repairs that are equivalent to the canonical ones, but in most cases considerably smaller, though in the worst case they may still be exponential. On the other hand, each canonical repair is induced by a so-called repair seed, whose size is polynomial in the size of the TBox and the repair request. We have seen in [13] that, for the IQ case, one can compute consequences of canonical repairs and check IQ-entailment between them by working only with the seed functions inducing them. This way, the exponential blow-up due to the construction of the canonical repair can be avoided. In Section 4, we report on experimental results that compare the performance on answering instance queries between the optimized repairs and the canonical ones represented by seed functions.
In Section 5, we show that, also in the CQ case, optimal repairs always exist and cover all repairs if we allow for certain infinite, but finitely represented qABoxes. To be more precise, we introduce the notion of a shell unfolding of a given qABox, which basically unravels parts of the qABox into (possibly infinite) trees. The shell unfoldings of IQ-saturations turn out to be CQ-saturations, and this also works for cyclic TBoxes. If we then consider the canonical IQ-repairs for a given repair problem, then we can prove that their shell unfoldings yield a set of (possibly infinite) CQ-repairs that cover all CQ-repairs. In addition, consequences from such shell unfolded repairs and entailment between them can be decided based on their finite representation without an increase in complexity. Thus, one can work with them as if they were finite.

Preliminaries
We recall the definition of the DL EL and then introduce quantified ABoxes as well as the two entailment relations we employ for them.
The Description Logic EL. As usual in DL, knowledge about an application domain is represented in EL using classes (called concepts), relationships (called roles), and objects (called individuals), which are collected in the signature Σ, consisting of pairwise disjoint sets of concept names Σ_C, role names Σ_R, and individual names Σ_I. Concept descriptions C of EL are then constructed using the grammar rule C ::= ⊤ | A | C ⊓ C | ∃r.C, where A ranges over concept names and r over role names. An atom is a concept name A or an existential restriction ∃r.C. Each concept description C is a conjunction of atoms, with ⊤ corresponding to the empty conjunction. We denote the set of these atoms as Conj(C).
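To make the syntax concrete, the following minimal Python sketch encodes EL concept descriptions as conjunctions of atoms. The encoding (strings for concept names, (role, filler) pairs for existential restrictions, frozensets for conjunctions) and all helper names are illustrative assumptions of this sketch, not notation from the cited work.

```python
# Illustrative encoding (an assumption of this sketch):
#   concept name            -> str
#   existential restriction -> (role, filler), with filler a concept description
#   concept description     -> frozenset of atoms; the empty set encodes ⊤
TOP = frozenset()

def conj(*atoms):
    """Conjunction of the given atoms; conj() is ⊤."""
    return frozenset(atoms)

def exists(role, filler=TOP):
    """Existential restriction ∃role.filler."""
    return (role, filler)

def atoms_of(concept):
    """Conj(C): the top-level atoms of the concept description C."""
    return set(concept)

def show(concept):
    """Render a concept description in the usual DL syntax."""
    if not concept:
        return "⊤"
    parts = [a if isinstance(a, str) else f"∃{a[0]}.({show(a[1])})"
             for a in concept]
    return " ⊓ ".join(sorted(parts))

# Right-hand side of the SNOMED CT inclusion for "Common cold" from the introduction.
common_cold_rhs = conj(
    "Disease",
    exists("causative_agent", conj("Virus")),
    exists("finding_site", conj("Upper_respiratory_tract_structure")),
    exists("pathological_process", conj("Infectious_process")),
)
print(show(common_cold_rhs))
```

Using sets for conjunctions builds commutativity, associativity, and idempotence of ⊓ into the representation, which is convenient because Conj(C) is then simply the set itself.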
An EL TBox can be used to state subconcept-superconcept relationships between such concept descriptions, i.e., it is a finite set of concept inclusions (CIs) C ⊑ D, where C, D are EL concept descriptions. In the ABox one can then relate individuals with concepts and with other individuals, i.e., it is a finite set of concept assertions C(a) and role assertions r(a, b), where a, b are individual names, r is a role name, and C is an EL concept description. An EL ontology is a pair consisting of an EL ABox and an EL TBox.
The semantics of EL is defined as usual [5] based on the notion of an interpretation I = (Dom(I), ·^I), which assigns subsets A^I of the non-empty set Dom(I) to concept names A, binary relations r^I on Dom(I) to role names r, and elements a^I of Dom(I) to individual names a. This mapping is extended to concept descriptions according to the semantics of the constructors. The interpretation I is a model of the TBox T if it satisfies all its CIs, i.e., C^I ⊆ D^I holds for all CIs C ⊑ D in T. Similarly, I is a model of the ABox A if it satisfies its assertions, i.e., a^I ∈ C^I and (a^I, b^I) ∈ r^I hold for all concept assertions C(a) and role assertions r(a, b) in A. It is a model of the ontology (T, A) if it is a model of both T and A.
Reasoning makes implicit consequences of an ontology explicit. For instance, we say that a concept assertion C(a) is entailed by an ABox A w.r.t. a TBox T if C(a) is satisfied in all models of A and T; this is abbreviated as A |=^T C(a) and we also say that a is an instance of C w.r.t. A and T. Similarly, a CI C ⊑ D is entailed by T if C ⊑ D is satisfied in every model of T; we then write C ⊑^T D and also say that C is subsumed by D w.r.t. T. In case T = ∅, we may omit the superscript ∅ and just write |= instead of |=^∅. Both the instance and the subsumption problem are decidable in polynomial time in EL [3].
Quantified ABoxes. Quantified ABoxes were first introduced in [16], but they were also considered, as relational datasets with labelled nulls, in [20], and their existentially quantified variables correspond to the "anonymous individuals" in the OWL 2 standard [31]. Also, as explained in [16], quantified ABoxes are basically the same as Boolean conjunctive queries. Informally, a quantified ABox is an EL ABox where concept assertions are restricted to concept names and in addition to individuals one can use variables in assertions. To indicate that the names of these variables are irrelevant, we quantify them existentially.
More formally, a quantified ABox (qABox) ∃X.A consists of a finite set X of variables, which is disjoint from the signature Σ, and of a matrix A, which is a finite set of assertions A(u) and r(u, v), where A is a concept name, r a role name, and u, v individual names or variables. We call the individual names and variables occurring in ∃X.A its objects, and denote the set of them by Obj(∃X.A). Regarding the semantics of a qABox ∃X.A, we can translate it in an obvious way into a first-order formula by taking the conjunction of the assertions in A (viewed as atomic formulas) and prefacing it with an existential quantifier prefix containing exactly the variables in X. The models of ∃X.A are then the first-order models of this formula.
Based on this semantics, we can now define when a qABox entails another qABox or a concept assertion in the usual way. If α is an EL concept assertion or a qABox, then ∃X.A entails α w.r.t. the EL TBox T (written ∃X.A |=^T α) if every model of ∃X.A and T is a model of α. Again, we may omit the superscript ∅ if T is empty. If α is a concept assertion, then entailment |=^T can be decided in polynomial time whereas it is NP-complete if α is a qABox [7,16]. NP-hardness already holds without a TBox.
From a syntactic point of view, EL ABoxes that use compound concept descriptions in concept assertions are not qABoxes, but it is easy to see that every EL ABox can be transformed into an equivalent qABox (i.e., one having the same models) [16]. Conversely, not every qABox has an equivalent EL ABox, the simplest example being ∃{y}.{r(y, y)}, which enforces an r-loop in every model, but without naming the element that has this loop. In contrast, EL ABoxes can only enforce loops for named individuals, i.e., elements of Σ_I. Also note that a qABox cannot entail C(x) for a variable x since this is not a well-formed concept assertion. We can, however, view the matrix A as a normal ABox (where the variables are treated as individuals), and then one can derive concept assertions for elements of X from A. The following lemma, which gives a recursive characterization of the instance relationship for the case of an empty TBox, is relevant for our construction of canonical repairs.

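Lemma 1 ([16]). Let ∃X.A be a qABox, D an EL concept description, and u ∈ Obj(∃X.A). Then A |= D(u) iff the following statements are satisfied for every C ∈ Conj(D):
1. if C is a concept name, then A contains the assertion C(u);
2. if C = ∃r.E is an existential restriction, then A contains a role assertion r(u, v) such that A |= E(v).
Two entailment relations between qABoxes. As motivated in the introduction, it makes sense to compare qABoxes w.r.t. the queries they entail. We say that ∃X.A IQ-entails ∃Y.B w.r.t. T (written ∃X.A |=^T_IQ ∃Y.B) if every EL concept assertion entailed by ∃Y.B w.r.t. T is also entailed by ∃X.A w.r.t. T, and that it CQ-entails ∃Y.B w.r.t. T (written ∃X.A |=^T_CQ ∃Y.B) if the same holds for all Boolean conjunctive queries. Since every concept assertion can be translated into an equivalent qABox, CQ-entailment is a stronger requirement than IQ-entailment.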
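The recursive characterization of Lemma 1 translates directly into a procedure. The sketch below assumes the usual reading of the two cases (a concept name must be asserted for u; an existential restriction ∃r.E needs an r-successor of u that is an instance of E). The tuple encoding of assertions and the function names are illustrative assumptions.

```python
# Assumed encoding (illustrative): a matrix is a set of tuples, (A, u) for a
# concept assertion A(u) and (r, u, v) for a role assertion r(u, v). A concept
# description is a frozenset of atoms; an atom is a concept name (str) or a
# pair (role, filler) standing for ∃role.filler.

def instance_of(matrix, concept, u):
    """Decide matrix |= concept(u) w.r.t. the empty TBox, following Lemma 1."""
    for atom in concept:
        if isinstance(atom, str):
            if (atom, u) not in matrix:
                return False
        else:
            role, filler = atom
            successors = [a[2] for a in matrix
                          if len(a) == 3 and a[0] == role and a[1] == u]
            if not any(instance_of(matrix, filler, v) for v in successors):
                return False
    return True

def loves_chain(k):
    """The concept (∃loves.)^k ⊤."""
    c = frozenset()
    for _ in range(k):
        c = frozenset({("loves", c)})
    return c

# Narcissus example: ∃loves.(Vain ⊓ (∃loves.)^3 ⊤) holds for n only while Vain(n) is present.
narcissus = {("Vain", "n"), ("loves", "n", "n")}
vain_lover = frozenset({("loves", frozenset({"Vain"}) | loves_chain(3))})
print(instance_of(narcissus, vain_lover, "n"))
print(instance_of(narcissus - {("Vain", "n")}, vain_lover, "n"))
```

The recursion descends along the structure of the concept description, so it terminates even when the matrix contains cycles such as loves(n, n).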
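With respect to the empty TBox, these query-based entailment relations have structural characterizations by means of simulations and homomorphisms [16]. In the IQ case, ∃X.A |=_IQ ∃Y.B iff there is a simulation from ∃Y.B to ∃X.A, which is a relation S ⊆ Obj(∃Y.B) × Obj(∃X.A) satisfying the following:
1. (a, a) ∈ S for each individual name a occurring in B;
2. if (u, v) ∈ S and B contains a concept assertion E(u), then A contains E(v);
3. if (u, v) ∈ S and B contains a role assertion r(u, u′), then there is an object v′ such that A contains r(v, v′) and (u′, v′) ∈ S.
In the CQ case, ∃X.A |=_CQ ∃Y.B iff there is a homomorphism from ∃Y.B to ∃X.A, i.e., a function h from Obj(∃Y.B) to Obj(∃X.A) that maps each individual name to itself and each assertion of B to an assertion of A.
To extend these characterizations of the entailment relations to the case of non-empty TBoxes, we must first saturate the qABox on the left-hand side. We defer describing saturation to the second part of the next section, where we extend our repair approach from the setting without TBox to the one with a TBox.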
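The simulation-based test for IQ-entailment w.r.t. the empty TBox can be sketched as a fixpoint computation. The code assumes the standard simulation conditions (individual names are related to themselves, concept assertions are preserved, role successors can be matched); the encoding of matrices as sets of tuples and the function names are illustrative assumptions.

```python
def objects(matrix):
    """All individual names and variables occurring in the matrix."""
    return {o for a in matrix for o in a[1:]}

def iq_entails(matrix_a, vars_b, matrix_b):
    """Check ∃X.A |=_IQ ∃Y.B (empty TBox) via the greatest simulation from B to A.

    Matrices are sets of tuples: (A, u) for A(u) and (r, u, v) for r(u, v).
    """
    names = lambda m, o: {a[0] for a in m if len(a) == 2 and a[1] == o}
    succs = lambda m, o: {(a[0], a[2]) for a in m if len(a) == 3 and a[1] == o}
    # Start with all pairs that respect concept assertions, then refine.
    sim = {(u, v) for u in objects(matrix_b) for v in objects(matrix_a)
           if names(matrix_b, u) <= names(matrix_a, v)}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            matched = all(
                any(r == s and (u2, v2) in sim for (s, v2) in succs(matrix_a, v))
                for (r, u2) in succs(matrix_b, u))
            if not matched:
                sim.discard((u, v))
                changed = True
    # A simulation exists iff the greatest one relates every individual to itself.
    individuals_b = objects(matrix_b) - set(vars_b)
    return all((a, a) in sim for a in individuals_b)

original = {("Vain", "n"), ("loves", "n", "n")}
repair = {("loves", "n", "n"), ("loves", "n", "x"), ("loves", "x", "n"),
          ("loves", "x", "x"), ("Vain", "x")}
print(iq_entails(original, {"x"}, repair))   # the input entails its repair
print(iq_entails(repair, set(), original))   # but not the other way round
```

Computing the greatest simulation by refinement takes polynomial time, which matches the tractability of IQ-entailment; a homomorphism test for the CQ case would instead require search.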

Canonical and Optimal Repairs
We start with introducing (optimal) repairs in the general setting, but then concentrate first on the CQ case without a TBox for didactic reasons, before considering the IQ case and explaining how non-empty TBoxes can be tackled.
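As unwanted consequences we consider EL concept assertions. Whereas it would be useful to be able to specify unwanted consequences via CQs, this may cause non-existence of optimal repairs unless one considers a strongly restricted class of CQs [11]. For this reason, a repair request will in the following be a finite set of concept assertions, both in the IQ and in the CQ case.
Definition 2. Let T be an EL TBox, ∃X.A a qABox, R a repair request, and QL ∈ {IQ, CQ}. A QL-repair of ∃X.A for R w.r.t. T is a qABox ∃Y.B that is QL-entailed by ∃X.A w.r.t. T and does not entail any of the concept assertions in R w.r.t. T. Such a repair is optimal if there is no QL-repair of ∃X.A for R w.r.t. T that strictly QL-entails it w.r.t. T.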
Since CQ-entailment implies IQ-entailment, every CQ-repair is also an IQ-repair, but the converse need not hold. The latter can be illustrated by the second version of our Narcissus example from the introduction. Consider the TBox In fact, this qABox is not CQ-entailed w.r.t. T by ∃∅.{V(n)} since there are models of ∃∅.{V(n)} and T that do not contain an individual with a loop. It is IQ-entailed, basically since all EL concept assertions of the form ((∃ℓ.)^k ⊤)(n) are entailed by ∃∅.{V(n)} w.r.t. T.
The question is now how one can actually compute all optimal repairs of a given repair problem, consisting of an EL TBox, a qABox, a repair request, and a query language QL ∈ {IQ, CQ}. We start with the case where the TBox is empty and QL = CQ.
Blind search. A first idea could be to start with the input qABox and then generate a chain of qABoxes with entailment relationships between them, until a qABox that does not entail any element of R has been found. Such a chain can be generated by applying the following rules successively to the current qABox ∃X.A:
Copy Rule. Choose an object u of ∃X.A as well as a fresh variable y ∉ Obj(∃X.A), and return the qABox obtained by adding y to X and extending A such that y becomes a copy of u, i.e., y occurs in the same concept and role assertions as u.
Delete Rule. Choose an assertion α in A and return the qABox ∃X.(A \ {α}), or choose a variable x ∈ X that does not occur in A and return the qABox ∃(X \ {x}).A.
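The two rules can be sketched in a few lines of Python. The exact formulation of the Copy Rule is an assumption of this sketch: every assertion that mentions u is duplicated with each occurrence of u optionally replaced by the fresh variable y. The tuple encoding and the function names are likewise illustrative.

```python
from itertools import product

def copy_rule(variables, matrix, u, y):
    """Add the fresh variable y as a copy of the object u (assumed formulation)."""
    assert y not in variables and all(y not in a[1:] for a in matrix)
    new_matrix = set()
    for a in matrix:
        # Each occurrence of u may independently stay u or become y.
        choices = [(o, y) if o == u else (o,) for o in a[1:]]
        new_matrix |= {(a[0], *args) for args in product(*choices)}
    return set(variables) | {y}, new_matrix

def delete_assertion(variables, matrix, assertion):
    """Delete Rule, first variant: remove one assertion from the matrix."""
    return set(variables), set(matrix) - {assertion}

def delete_variable(variables, matrix, x):
    """Delete Rule, second variant: drop a variable that no longer occurs."""
    assert all(x not in a[1:] for a in matrix)
    return set(variables) - {x}, set(matrix)

# Narcissus example: copy n as x, then delete Vain(n).
narcissus = {("Vain", "n"), ("loves", "n", "n")}
step1 = copy_rule(set(), narcissus, "n", "x")
step2 = delete_assertion(*step1, ("Vain", "n"))
print(step2)
```

On the Narcissus example, one copy step followed by one deletion already yields the qABox ∃{x}.{ℓ(n, n), ℓ(n, x), ℓ(x, n), ℓ(x, x), V(x)} from the introduction.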
It is easy to see that the qABox obtained from ∃X.A by application of one of these rules is CQ-entailed by ∃X.A. The following proposition shows that these rules indeed cover the whole search space of entailed qABoxes. to ∃X.A. After that, we can remove assertions that are in the image, but not in the pre-image. Finally, we can rename variables and remove variables that do not have a pre-image (see [6] for a more detailed proof).

□
If one starts with the input qABox ∃X.A and generates a search tree by applying the above rules, this process need not terminate since one can generate an arbitrary number of copies of objects. But now Proposition 11 in [11] comes to the rescue: if ∃X.A contains m objects and R contains n atoms, then any repair of ∃X.A for R is CQ-entailed by a repair that has at most m · 2^n objects. Thus, we can restrict the search to qABoxes that have at most this many objects, which makes the search tree finite. We can be sure that the repairs found this way cover all repairs. The optimal repairs can be obtained from this covering set by removing non-optimal elements, i.e., elements that are strictly entailed by another element.
Canonical repairs. Obviously, the blind search approach for computing optimal repairs sketched above is very inefficient. However, it provides us with several interesting ideas for how to construct, in a more direct way, a set of repairs that covers all repairs. First, we notice that we must generate copies of objects, and then may need to remove assertions for these copies. Second, the cited result from [11] tells us that creating at most exponentially many copies of each object is sufficient.
In our canonical repairs, each object u of the input qABox ∃X.A receives copies of the form ⟨⟨u, K⟩⟩, where the second component specifies which assertions C(u) that are entailed by A must not hold for this copy. More formally, K is a repair type for u, i.e., a subset of the set of atoms occurring in R that satisfies the following two properties:
(RT1) A |= C(u) for each atom C ∈ K;
(RT2) if C and D are distinct atoms in K, then C ⋢^∅ D.
The first condition is due to the fact that we only need to remove instance relationships that hold in A. The second reduces the number of different repair types. It is justified by the fact that requiring to remove D(u) ensures that also C(u) is removed if C ⊑^∅ D.
The canonical repairs have the same set of objects and the same matrix. They have all tuples ⟨⟨u, K⟩⟩ as their objects, where u ∈ Obj(∃X.A) and K is a repair type for u. Using these objects, the matrix B of the canonical repairs consists of the following assertions:
(i) A(⟨⟨u, K⟩⟩) if A(u) ∈ A and A ∉ K;
(ii) r(⟨⟨u, K⟩⟩, ⟨⟨v, L⟩⟩) if r(u, v) ∈ A and, for each ∃r.C ∈ K with A |= C(v), there is an atom D ∈ L such that C ⊑^∅ D.
To understand this definition, one needs to consider Lemma 1. Regarding concept names A ∈ K, not adding the concept assertion A(⟨⟨u, K⟩⟩) to B ensures that this assertion is not entailed by B. For existential restrictions ∃r.C ∈ K, we can only have the role assertion r(⟨⟨u, K⟩⟩, ⟨⟨v, L⟩⟩) in B if B does not entail C(⟨⟨v, L⟩⟩). This non-entailment is ensured by having an atom D ∈ L that satisfies C ⊑^∅ D. In fact, B |= C(⟨⟨v, L⟩⟩) would otherwise imply B |= D(⟨⟨v, L⟩⟩), which is forbidden due to D ∈ L.
To determine a concrete canonical repair, we choose, for each individual a of ∃X.A, one of its copies as representative of a in B. Of course, this choice must be made such that the obtained qABox really is a repair, i.e., does not entail any of the unwanted consequences in R. Formally, this is realized by fixing a repair seed S, which maps each individual name a to a repair type S_a for a such that the following condition is satisfied: if C(a) ∈ R and A |= C(a), then there is an atom D ∈ S_a such that C ⊑^∅ D. Given such a repair seed S, the canonical repair rep(∃X.A, S) induced by S is the qABox ∃Y.B, where individual names a and their copies ⟨⟨a, S_a⟩⟩ are used as synonyms, and Y consists of the other objects of B. This construction works both in the CQ and in the IQ case, and yields a set of repairs that covers all repairs.
Proposition 4 ([16]). Consider a qABox ∃X.A, an EL repair request R, and a query language QL ∈ {IQ, CQ}. For each repair seed S, the induced canonical repair rep(∃X.A, S) is a QL-repair of ∃X.A for R. Conversely, if ∃Z.C is a QL-repair of ∃X.A for R, then there is a repair seed S such that rep(∃X.A, S) |=_QL ∃Z.C.
The set of all canonical repairs can obviously be computed in exponential time. To obtain the optimal repairs, one needs to employ entailment tests to remove the non-optimal ones from it. Since IQ-entailment is in P and CQ-entailment is NP-complete, this yields the complexity results stated in the following theorem. Obviously, after removing redundant elements, the obtained set still covers all repairs.
Theorem 5 ([16]).The set of optimal QL-repairs of ∃ X.A for R covers all QL-repairs.There is a (deterministic) algorithm that computes this set and runs in exponential time.If QL = CQ, then this algorithm needs access to an NP oracle, whereas no such oracle is required for QL = IQ.
Let us come back to the first variant of the Narcissus example from the introduction, where the input qABox is ∃∅.A for A = {V(n), ℓ(n, n)} and the repair request is R = {V(n)}. The only atom in R is V, and both ∅ and {V} are repair types for n. The only repair seed is S with S_n = {V}. If we denote ⟨⟨n, S_n⟩⟩ with n and ⟨⟨n, ∅⟩⟩ with x, then the qABox ∃{x}.{ℓ(n, n), ℓ(n, x), ℓ(x, n), ℓ(x, x), V(x)} is the only canonical repair, which thus is an optimal repair both in the CQ and in the IQ case.
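The construction can be replayed on this example with a small Python sketch for the case of an empty TBox. It rests on assumptions that should be read as such: repair types are taken to be sets of pairwise subsumption-incomparable atoms of R that hold for the object, a role assertion between copies is kept only if every entailed filler of an existential restriction in the source type is subsumed by an atom of the target type, and a seed must provide a subsuming atom for every entailed unwanted assertion. The encoding and all function names are illustrative.

```python
from itertools import combinations, product

# Atoms: concept name (str) or (role, filler); concept descriptions: frozensets of atoms.
def atom_subsumed(c, d):
    """C ⊑∅ D for atoms C, D."""
    if isinstance(c, str) or isinstance(d, str):
        return c == d
    return c[0] == d[0] and subsumed(c[1], d[1])

def subsumed(c, d):
    """C ⊑∅ D for concept descriptions."""
    return all(any(atom_subsumed(a, b) for a in c) for b in d)

def instance_of(matrix, concept, u):
    """matrix |= concept(u) w.r.t. the empty TBox (Lemma 1)."""
    return all(
        (atom, u) in matrix if isinstance(atom, str)
        else any(len(a) == 3 and a[:2] == (atom[0], u)
                 and instance_of(matrix, atom[1], a[2]) for a in matrix)
        for atom in concept)

def all_atoms(request):
    """Atoms occurring in the repair request, including nested ones."""
    found, todo = set(), [c for c, _ in request]
    while todo:
        for atom in todo.pop():
            found.add(atom)
            if not isinstance(atom, str):
                todo.append(atom[1])
    return found

def repair_types(matrix, atoms, u):
    entailed = [a for a in atoms if instance_of(matrix, frozenset({a}), u)]
    subsets = (frozenset(s) for k in range(len(entailed) + 1)
               for s in combinations(entailed, k))
    return [k for k in subsets
            if not any(a != b and atom_subsumed(a, b) for a in k for b in k)]

def covers(repair_type, concept):
    """Some atom of the repair type subsumes the concept."""
    return any(subsumed(concept, frozenset({d})) for d in repair_type)

def is_repair_seed(matrix, request, seed):
    return all(covers(seed[a], c) for c, a in request if instance_of(matrix, c, a))

def canonical_repair(matrix, request, seed):
    atoms = all_atoms(request)
    objs = {o for a in matrix for o in a[1:]}
    copies = [(u, k) for u in objs for k in repair_types(matrix, atoms, u)]
    # An individual a is identified with its copy (a, seed[a]).
    name = lambda c: c[0] if seed.get(c[0]) == c[1] else c
    result = set()
    for (u, k) in copies:
        result |= {(a[0], name((u, k))) for a in matrix
                   if len(a) == 2 and a[1] == u and a[0] not in k}
    for (u, k), (v, l) in product(copies, repeat=2):
        for a in matrix:
            if len(a) == 3 and a[1:] == (u, v) and all(
                    covers(l, e[1]) for e in k
                    if not isinstance(e, str) and e[0] == a[0]
                    and instance_of(matrix, e[1], v)):
                result.add((a[0], name((u, k)), name((v, l))))
    return {name(c) for c in copies} - set(seed), result

narcissus = {("V", "n"), ("loves", "n", "n")}
request = [(frozenset({"V"}), "n")]
seed = {"n": frozenset({"V"})}
variables, repaired = canonical_repair(narcissus, request, seed)
print(variables, repaired)
```

In the output, the single variable is the copy of n with the empty repair type, which plays the role of x above. The sketch enumerates all subsets of entailed atoms and is therefore exponential, in line with the size of canonical repairs.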
Adding a static TBox. As mentioned before, we restrict the attention to the case where the TBox is assumed to be correct, and thus is static in the sense that it must not be changed in the repair process. Our main idea for dealing with an EL TBox T is to extend the given qABox ∃X.A with consequences entailed by the CIs in T. We call this extension process saturation [7].
Intuitively, if C ⊑ D ∈ T, then saturation adds the assertion D(u) to the matrix A if A |= C(u), but A ⊭ D(u). However, if D is a compound concept description, then this does not generate a well-formed new qABox. For this reason, one must express D(u) by atomic assertions. Obviously, for each concept name A ∈ Conj(D), we must add the assertion A(u) to A. For each existential restriction ∃r.E ∈ Conj(D), we add a new variable x to X and the assertions r(u, x) and E(x) to A. In case E is still compound, we apply the process of expressing such an assertion by atomic ones recursively. To be more precise, the treatment of existential restrictions differs depending on whether we are in the CQ or the IQ case. In the former, we always need to use a new variable x. In the IQ case, for each concept description E occurring in an existential restriction ∃r.E in T, we introduce the variable x_E, and reuse this variable whenever we encounter an existential restriction with E in the second position. Let us call this process of expressing a concept assertion D(u) for a compound concept description D the QL-unfolding of D(u), for QL ∈ {IQ, CQ}. QL-saturation is the process of applying the following saturation rule exhaustively:
Saturation Rule. Choose a CI C ⊑ D in T and an object u such that A |= C(u), but A ⊭ D(u), and add the QL-unfolding of D(u) to the qABox.
Example 6. Consider again the TBox T = {V ⊑ ∃ℓ.V, ∃ℓ.V ⊑ V} and the qABox ∃∅.{V(n)}. The first application of the IQ-saturation rule to n adds the assertion (∃ℓ.V)(n) to the qABox. The IQ-unfolding of this assertion introduces one new variable x_V, adds the assertions ℓ(n, x_V) and V(x_V), and removes the compound assertion. The IQ-saturation rule now applies to x_V, adding (∃ℓ.V)(x_V). The IQ-unfolding of this assertion reuses the variable x_V, and adds the assertion ℓ(x_V, x_V). This completes the IQ-saturation process with the qABox ∃{x_V}.{V(n), ℓ(n, x_V), V(x_V), ℓ(x_V, x_V)}. This qABox is IQ-entailed by ∃∅.{V(n)} w.r.t. T, but it is not CQ-entailed. The reason for the latter non-entailment is that there are models of ∃∅.{V(n)} and T where no element has a loop. To avoid introducing a loop or a cycle, we must use a new variable in each CQ-unfolding of an assertion of the form (∃ℓ.V)(x). But this clearly leads to non-termination of the CQ-saturation process.
To ensure termination for the CQ case, we restrict the attention in [7] to cycle-restricted TBoxes, where an EL TBox T is cycle-restricted if there are no role names r_1, ..., r_n (n ≥ 1) and no EL concept description C such that C ⊑^T ∃r_1.···∃r_n.C.
The IQ-saturation sat T IQ (∃ X. A) can be computed in polynomial time, whereas the computation of sat T CQ (∃ X. A) may require exponential time in the worst case.
The idea is now to apply the repair process described above to the saturated qABox rather than the original one. This ensures that, in RT1, the entailment is then w.r.t. the TBox. However, without additional changes to our construction of canonical repairs, the obtained qABox would not be a repair. In our example, a canonical repair of ∃ {x V }. {V (n), ℓ(n, x V ), V (x V ), ℓ(x V, x V )} for R = {V (n)} could choose as synonym for n the copy ⟨⟨n, {V }⟩⟩ that does not belong to V, but still has an ℓ-successor that belongs to V. Together with the CI ∃ ℓ.V ⊑ V, this qABox would then still entail V (n).
To avoid this problem, we amend the definition of repair types as follows. First, we now consider subsets of the atoms occurring in R or T as possible repair types. Second, we add an additional condition RT3 to the definition, whose conclusion requires that there is an atom D in K such that E ⊑ ∅ D. In our example, K = {V } does not satisfy RT3 since the saturated qABox entails (∃ ℓ.V )(n), there is a CI that has ∃ ℓ.V as left-hand side and V ∈ K as right-hand side, but K does not contain an atom that subsumes ∃ ℓ.V. In fact, with the additional condition RT3, any repair type for n that contains V must also contain ∃ ℓ.V. The copy ⟨⟨n, {V, ∃ ℓ.V }⟩⟩ of n does not belong to V in the canonical repair, and also does not have an ℓ-successor that belongs to V. Overall, for T = {V ⊑ ∃ ℓ.V, ∃ ℓ.V ⊑ V }, the qABox ∃ X.A = ∃ ∅. {V (n)}, and the repair request R = {V (n)}, we obtain the canonical IQ-repair induced by the (unique) repair seed S with S n = {V, ∃ ℓ.V }, whose variables are y 1, y 2, and y 3, where y 1 stands for ⟨⟨n, ∅⟩⟩, y 2 for ⟨⟨x V, {V, ∃ ℓ.V }⟩⟩, and y 3 for ⟨⟨x V, ∅⟩⟩.
In general, let rep T QL (∃ X. A, S) be the canonical repair obtained by first QL-saturating ∃ X.A w.r.t. T and then applying the amended repair approach that takes RT3 into account. Then Proposition 4 and Theorem 5 hold accordingly in the presence of a static TBox T if we replace rep(∃ X. A, S) with rep T QL (∃ X. A, S) and in the CQ case add the assumption that T is cycle-restricted (see [7]).

Concise Representations of Canonical IQ-Repairs
Canonical IQ-repairs are of exponential size, not only in the worst case, but also in the best case. In this section, we consider two approaches for alleviating this problem. One approach produces considerably smaller repairs in practice, which may, however, still be exponential in the worst case. The second approach uses the polynomial-sized repair seeds as representations for the exponentially large canonical repairs.
Optimized IQ-repairs To avoid generating exponential-sized repairs also in the best case, we have developed in [7] an optimized algorithm for computing repairs induced by repair seeds. Intuitively, these optimized repairs do not contain all the objects occurring in the canonical repair, but only those that are really needed. We have shown that the optimized IQ-repair induced by a repair seed S is IQ-equivalent to the canonical one induced by S, and thus the set of optimized IQ-repairs can be used in place of the set of canonical ones when computing the optimal repairs. The experiments described in [7] show that the optimized repairs are in most cases considerably smaller than the canonical ones. For example, in the canonical IQ-repair we have just computed for our Narcissus example, the objects y 1 and y 3 are not needed since they are not reachable from n. IQ-equivalence of the optimized repair ∃ {y 2 }. {ℓ(n, y 2 ), ℓ(y 2, y 2 )} with the canonical one can be seen by using the identity on the objects n and y 2 as simulation in both directions.
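The reachability observation in the Narcissus example can be checked mechanically. The Python sketch below is not the optimized algorithm of [7]; it merely discards objects that are not reachable from an individual name. The function name prune_unreachable is hypothetical, and the assertions involving y1 and y3 in the sample input are illustrative assumptions about the canonical repair, chosen so that these two objects are unreachable from n as stated in the text.

```python
from collections import deque

def prune_unreachable(individuals, cas, ras):
    """Keep only objects reachable from an individual via role assertions."""
    reachable = set(individuals)
    queue = deque(individuals)
    while queue:
        u = queue.popleft()
        for (_, source, target) in ras:
            if source == u and target not in reachable:
                reachable.add(target)
                queue.append(target)
    kept_cas = {(a, u) for (a, u) in cas if u in reachable}
    kept_ras = {(r, s, t) for (r, s, t) in ras
                if s in reachable and t in reachable}
    return reachable, kept_cas, kept_ras

# Assumed matrix of the canonical IQ-repair ("l" stands for the role ℓ);
# the entries mentioning y1 and y3 are illustrative assumptions.
canonical_cas = {("V", "y1"), ("V", "y3")}
canonical_ras = {("l", "n", "y2"), ("l", "y2", "y2"),
                 ("l", "y1", "y2"), ("l", "y1", "y3"),
                 ("l", "y3", "y2"), ("l", "y3", "y3")}
objects, opt_cas, opt_ras = prune_unreachable({"n"}, canonical_cas, canonical_ras)
```

On this input, only n and y2 survive, and the remaining matrix is {ℓ(n, y2), ℓ(y2, y2)}, which agrees with the optimized repair given above.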
Note, however, that in general an exponential blow-up cannot be avoided, as already shown in [12] for a restricted class of qABoxes without a TBox. This blow-up is not only a problem when computing the repair, but also when using it later on to answer queries. While answering IQs is polynomial for the original (unrepaired) qABox, it may become exponential after the repair if we measure the complexity in the size of the repair problem, consisting of the original qABox, the TBox, and the repair request.
Representing canonical IQ-repairs by repair seeds The size of a repair seed S is polynomial in the size of the repair problem, and it uniquely determines the induced canonical repair rep T IQ (∃ X. A, S). To take advantage of this more concise representation of canonical repairs, we must be able to work directly with this representation when comparing the repairs w.r.t. IQ-entailment and when answering IQs w.r.t. them. The following proposition shows how this can be realized.
Proposition 8 ([10,13]). Let T be an EL TBox, ∃ X.A a qABox, R a repair request, S, S ′ repair seeds, and E(b) an EL concept assertion. Then, A, S ′ ) iff for each individual name a and for each atom C ∈ S a, there is an atom
The conditions formulated in this proposition are clearly decidable in time polynomial in the size of the repair problem. Thus, from a theoretical point of view, representing canonical repairs using repair seeds is preferable to using optimized repairs since the worst-case complexity of the relevant inference problems is polynomial for the former, whereas it is exponential for the latter. However, comparing the worst-case complexity of two algorithms does not always tell us which algorithm will perform better in practice. To investigate the advantages and disadvantages of our two concise representations of canonical IQ-repairs in practice, we performed experiments on real-world ontologies.
Experimental evaluation The goal of the experiments was to evaluate the performance of the two representations with respect to the time needed for answering instance queries. To this end, we created a benchmark consisting of EL ontologies, instance queries, and repair requests. As in the experiments in [7], which mainly compared the sizes of the optimized repairs with those of the canonical ones, we took the ontologies from the OWL EL Materialization track of the OWL Ontology Reasoner Evaluation 2015 [30], filtering out axioms that cannot be expressed in EL. To test the limits of both approaches, we this time included all 109 ontologies from this corpus, instead of considering only ontologies of up to 100,000 axioms as in [7]. Table 1 provides information on how large the employed ontologies were.
For each ontology, we randomly generated 100 IQs.To generate repair requests, we used the approach employed in [7], which generates requests where the concept assertions involve only concept names.In addition, we this time also generated repair requests containing assertions with compound concept descriptions.The repair requests generated in these two ways are respectively denoted RR1 and RR2 in the following.We attempted to compute 10 repair seeds per ontology based on the generated repair requests, which was, however, not always possible within a timeout of 10 minutes.For each tuple of ontology, repair request, and repair seed, we first computed the induced optimized IQ-repair, which was possible in most, but not all, cases within a timeout of 1 hour.Then we compared the performance of answering IQs from the optimized repairs and from the repair seeds.Any required EL reasoning was performed using Elk [24].More information on the experimental setup can be found in [6].
Figure 1 shows the results of this comparison, where each point corresponds to a tuple of ontology, repair request, and seed function. The x-axis shows the runtime of evaluating all 100 IQs using the repair seed, and the y-axis the runtime of evaluating all IQs using the optimized repair, where the red color denotes that we also count the computation time of the optimized repair, and the blue color denotes that we do not. For RR1 with the simple repair requests, using the repair seed instead of the precomputed repair was faster in 98.7% of cases if we also count the time for computing the repair, and otherwise in 17.9% of cases. As Figure 1 shows, however, using the optimized repair was almost never significantly faster, and there were many cases in which using the repair seed instead of the repair was significantly faster even if we do not count the time for computing the repair. For RR2 with the complex repair requests, using the repair seed was faster in 64.6% of cases if we count the time for computing the repair, and otherwise almost never (0.13% of cases). The reason for this was that after obtaining the query answers from Elk, we still have to do a subsumption check for each individual in the answer when using the repair seed only (see the condition in Proposition 8). In RR2, each of these tests was more expensive, since we were comparing complex EL concepts. When using the precomputed optimized repair, no additional subsumption tests are necessary.
The results show that computing the optimized repair explicitly rather than using the repair seed is only advisable if this repair is considered to be the final one, which is then used for many instance tests.This is not the case for intermediate repairs in a setting where the KE iteratively repairs the ontology by (a) choosing a repair seed, then (b) checking out the induced canonical repair by looking at some of its consequences, and based on this inspection deciding whether (c) to choose a different repair seed or (d) to use this repair seed, but maybe repair the obtained ontology further by formulating a new repair request.It then makes sense to compute the optimized repair only after the iterative repair process is finished.
If the repair is assumed to be the final one, a good indicator for when computing the optimized repair does not pay off is the size of the original ontology. If we consider RR1 and do not count the time for computing the repair, for ontologies with at most 404,509 axioms (85% of the corpus), using the repair seed was faster in only 6.8% of the cases, while for the larger ontologies, it was faster in 80.5% of the cases. The numbers are similar if we look at the size increase of the repair: if the repair contained at most 132,622 axioms more than the original ontology (85% of the corpus), then using the repair seed was faster in 5.5% of the cases, and otherwise in 87.5% of the cases.

Finite Representations of Optimal CQ-Repairs
The results concerning optimal CQ-repairs of [7] recalled in Section 3 assume that the TBox is cycle-restricted. We have seen an example (the version of our Narcissus example with a TBox) showing that for TBoxes not satisfying this restriction, optimal repairs need not exist. To overcome this problem, we allow for infinite qABoxes as repairs, but require that they have an appropriate finite representation. In our construction of optimal CQ-repairs, cycle-restrictedness of the TBox is needed to ensure that CQ-saturation terminates. For IQ-saturation, cycles in the TBox do not lead to non-termination since the saturation process can reuse variables. This is not possible for CQ-saturation since it may generate cycles in the saturated qABox that are not CQ-entailed by the original qABox. Whereas IQs cannot distinguish such cycles from their unfoldings, CQs obviously can. The idea is now to use appropriate unfoldings of IQ-saturations and canonical IQ-repairs in the CQ case.
Infinite qABoxes An infinite qABox is still of the form ∃ X. A, but now both the variable set X and the matrix A may be infinite. The model-based semantics can straightforwardly be extended from finite qABoxes to infinite ones, and the correspondence between (model-based) entailment and the existence of a homomorphism is still easy to show. However, the equivalence between entailment and CQ-entailment no longer holds. While the existence of a homomorphism is still sufficient for CQ-entailment, it is no longer necessary, as illustrated by Example 9, where ∃ X. A |= CQ ∃ Y. B holds. However, there is no homomorphism from ∃ Y. B to ∃ X. A. In fact, no mapping from R (the objects of ∃ Y. B) to N (the objects of ∃ X. A) can be injective. Thus, if h were a homomorphism, then it would send two real numbers x < y to the same natural number n, which would be a contradiction since B contains the role assertion r(x, y), whereas A does not contain its image r(n, n). A slightly more complicated example can be used to show that this problem persists even if we consider only countable qABoxes [6]. The intuitive reason for the difference between entailment and CQ-entailment is that CQs (which are finite) cannot capture differences of infinite qABoxes that manifest themselves only "in the infinite." Fortunately, the problem goes away if we restrict the attention to shell unfoldings of finite ABoxes. Shell unfoldings are similar to what is called unraveling in the DL literature [5], but they are applied to ABoxes rather than to interpretations.
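The positive half of Example 9 rests on the fact that every finite set of real numbers can be mapped into the natural numbers in an order-preserving way, so that any finite qABox with a homomorphism into the reals also has one into the naturals. The following Python sketch illustrates this with a rank mapping; the function name rank_embedding and the sample points are hypothetical and chosen only for illustration.

```python
def rank_embedding(reals):
    """Map each number of a finite set to its rank, a natural number."""
    ordered = sorted(set(reals))
    return {x: i for i, x in enumerate(ordered)}

# A finite image of some homomorphism into the reals (illustrative values).
points = [3.14, -2.5, 0.0, 1.0e6, 0.5]
h = rank_embedding(points)

# Every role assertion r(x, y) with x < y is mapped to r(h(x), h(y))
# with h(x) < h(y), i.e., to an assertion of the matrix over N.
order_preserved = all(h[x] < h[y] for x in points for y in points if x < y)
```

No such mapping exists for all real numbers at once, which is why the homomorphism characterization fails for the infinite qABoxes of Example 9.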
Shell unfoldings and homomorphisms Consider a (finite) quantified ABox ∃ X. A, the objects of which are divided into kernel objects and shell objects, such that each individual name is a kernel object, each shell object is reachable from some kernel object, but no kernel object is reachable from any shell object.Later on, we will apply the shell unfolding operation to the IQ-saturation ∃ X.A of a given finite qABox ∃ Y. B. In this setting, the kernel objects of ∃ X.A are the objects of ∃ Y. B, and the shell objects are the additional objects introduced during the saturation process.It is easy to see that this division into kernel and shell objects satisfies the requirements we have just formulated.
A shell path is a sequence u 0 −r 1→ u 1 −r 2→ · · · −r n→ u n that starts with a kernel object u 0 but otherwise only contains shell objects u 1, . . ., u n such that A contains r i (u i−1, u i ) for all i ∈ {1, . . ., n}. We call n ≥ 0 its length, u 0 its source, and u n its target. Note that kernel objects, and thus also individuals, can be seen as shell paths of length 0. The target of such a shell path representing a kernel object is this object itself.
Definition 10. The shell unfolding of ∃ X.A is defined as the qABox ∃ X ′.A ′ with the following components:
X ′ := { p | p is a shell path where p ∉ Σ I },
A ′ := { A(p) | p is a shell path with target u and A(u) ∈ A } ∪ { r(u, v) | u, v are kernel objects and r(u, v) ∈ A } ∪ { r(p, q) | p, q are shell paths such that q = p −r→ u for a shell object u }.
Note that a finite qABox can be seen as the shell unfolding of itself where all objects are assumed to be kernel objects. If the matrix A contains cycles among shell objects, then the shell unfolding ∃ X ′.A ′ of ∃ X.A is infinite. However, since ∃ X ′.A ′ is uniquely determined by the finite qABox ∃ X.A and the division of its objects into kernel and shell objects, we can use this as a finite representation of the infinite qABox ∃ X ′.A ′. We can show [6] that, for shell unfoldings, CQ-entailment can again be characterized by the existence of a homomorphism, and thus coincides with (model-based) entailment.
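To make the shell unfolding more tangible, the following Python sketch enumerates shell paths up to a given length and builds the corresponding finite part of the unfolded matrix. It is only an illustration under stated assumptions: shell paths are encoded as tuples (u0, r1, u1, ..., rn, un), the function name shell_unfolding_prefix and the bound max_len are hypothetical, and the truncation is needed because the full unfolding may be infinite.

```python
def shell_unfolding_prefix(kernel, cas, ras, max_len):
    """Finite part of the shell unfolding: shell paths of length <= max_len."""
    paths = [(u,) for u in sorted(kernel)]          # shell paths of length 0
    frontier = list(paths)
    path_ras = set()
    for _ in range(max_len):
        next_frontier = []
        for p in frontier:
            for (r, source, target) in sorted(ras):
                # Extend p by a role assertion leading to a shell object.
                if source == p[-1] and target not in kernel:
                    q = p + (r, target)
                    next_frontier.append(q)
                    path_ras.add((r, p, q))
        paths.extend(next_frontier)
        frontier = next_frontier
    # Concept assertions are inherited from the target of each path.
    new_cas = {(a, p) for p in paths for (a, u) in cas if u == p[-1]}
    # Role assertions between kernel objects are kept unchanged.
    kernel_ras = {(r, (u,), (v,)) for (r, u, v) in ras
                  if u in kernel and v in kernel}
    return paths, new_cas, kernel_ras | path_ras

# IQ-saturation of Example 6 with kernel object n and shell object x
# ("l" stands for the role ℓ, "x" for the variable x V).
paths, unf_cas, unf_ras = shell_unfolding_prefix(
    kernel={"n"},
    cas={("V", "n"), ("V", "x")},
    ras={("l", "n", "x"), ("l", "x", "x")},
    max_len=3,
)
```

For the saturated qABox of Example 6, the loop at the shell object is unfolded into a chain: the sketch produces four shell paths, each belonging to V, connected by three ℓ-assertions, and no path is related to itself.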
If we want to work with (finitely represented) shell unfoldings in the context of CQ-repairs, we must be able to decide CQ-entailment, and thus the existence of a homomorphism between shell unfoldings.This is possible in non-deterministic polynomial time in the size of the finite representation.
Theorem 12 ([6]).Let ∃ X.A and ∃ Y. B be two finite qABoxes whose object sets are partitioned into kernel objects and shell objects as introduced above, and let ∃ X ′ .A ′ and ∃ Y ′ .B ′ be their shell unfoldings.Then the problem of deciding whether there is a homomorphism from ∃ X ′ .A ′ to ∃ Y ′ .B ′ is NP-complete in the size of the input ∃ X.A and ∃ Y. B.
Since a finite qABox can be seen as the shell unfolding of itself (with empty set of shell objects), this theorem also shows that answering CQs for shell unfoldings is NP-complete in the size of their finite representations.
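For the special case of finite qABoxes with an empty set of shell objects, the guess-and-check nature of the homomorphism problem can be illustrated by a brute-force search. The Python sketch below is not the procedure of [6] for shell unfoldings; it simply tries all mappings of the source variables, which takes exponential time in the number of variables. The function name find_homomorphism and the encoding of assertions as tuples are assumptions made for this illustration.

```python
from itertools import product

def find_homomorphism(src_vars, src_cas, src_ras, tgt_objects, tgt_cas, tgt_ras):
    """Search a mapping that fixes individuals and preserves all assertions."""
    src_vars = sorted(src_vars)
    src_objects = ({u for _, u in src_cas}
                   | {u for _, a, b in src_ras for u in (a, b)})
    individuals = src_objects - set(src_vars)
    for images in product(sorted(tgt_objects), repeat=len(src_vars)):
        h = {a: a for a in individuals}   # individuals are mapped to themselves
        h.update(zip(src_vars, images))   # candidate images for the variables
        if (all((c, h[u]) in tgt_cas for c, u in src_cas)
                and all((r, h[u], h[v]) in tgt_ras for r, u, v in src_ras)):
            return h
    return None

# Boolean CQ asking for a loop: ∃{y}.{l(y, y)}, with "l" standing for ℓ.
loop_vars, loop_cas, loop_ras = {"y"}, set(), {("l", "y", "y")}

# IQ-saturation of Example 6 (variable "x" plays the role of x V).
h_saturated = find_homomorphism(
    loop_vars, loop_cas, loop_ras,
    {"n", "x"}, {("V", "n"), ("V", "x")}, {("l", "n", "x"), ("l", "x", "x")})

# Original qABox ∃∅.{V(n)}.
h_original = find_homomorphism(
    loop_vars, loop_cas, loop_ras, {"n"}, {("V", "n")}, set())
```

The loop query has a homomorphism into the IQ-saturation of Example 6 but none into the original qABox, which matches the earlier observation that the IQ-saturation is not CQ-entailed.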
Infinite CQ-saturation and CQ-repair The idea is now to extend the notion of a CQ-repair to a setting where qABoxes need not be finite, but must be finitely representable as the shell unfoldings of finite qABoxes.We call such qABoxes rational qABoxes since they consist of a finite part (the kernel) out of which grow (possibly) infinite trees, which are however rational [19].We start with showing that, in this setting, finite qABoxes always have a CQ-saturation, even if the TBox is not cycle-restricted.
Given a finite qABox ∃ X.A and an EL TBox T, we consider the shell unfolding of the IQ-saturation sat T IQ (∃ X. A), where all objects of the sub-qABox ∃ X.A are kernel objects and all other objects (added by applications of the IQ-Saturation Rule) are shell objects. We can show that this rational qABox CQ-entails exactly those rational qABoxes that are CQ-entailed by ∃ X.A and T. It can thus replace the finite CQ-saturation from [7], but is not limited to cycle-restricted TBoxes. For this reason, we denote this shell unfolding by sat T CQ (∃ X. A) and call it the CQ-saturation of ∃ X.A w.r.t. T.
Proposition 13 ([6]). Let ∃ X.A be a finite qABox and T an EL TBox. Then ∃ X.A |= T CQ ∃ Z. C iff sat T CQ (∃ X. A) |= CQ ∃ Z. C for each rational qABox ∃ Z. C.
Coming back to Example 6, where we constructed the IQ-saturation with kernel object n and shell object x V, we now obtain as shell unfolding the CQ-saturation sat T CQ (∃ X. A) = ∃ {x 1, x 2, . . .}. {V (n), ℓ(n, x 1 ), V (x 1 ), ℓ(x 1, x 2 ), V (x 2 ), . . .}, where x i stands for the shell path of length i that starts in n.
Regarding repairs, we now allow them to be rational qABoxes, i.e., in Definition 2 the qABoxes ∃ Y. B and ∃ Z. C are allowed to be rational qABoxes rather than just finite ones. We call such repairs rational CQ-repairs. But note that the input qABox is still assumed to be finite.
In this setting, the rôle of canonical CQ-repairs is now taken on by shell unfoldings of canonical IQ-repairs. In such an IQ-repair rep T IQ (∃ X. A, S), an object ⟨⟨u, K⟩⟩ is a kernel object if u is a kernel object in the underlying IQ-saturation, and otherwise it is a shell object. We denote the shell unfolding of rep T IQ (∃ X. A, S) as rep T CQ (∃ X. A, S), and call it again the canonical CQ-repair induced by S. The following proposition shows that using this notation is justified.
Proposition 14 ([6]). Consider a finite qABox ∃ X. A, an EL TBox T, and an EL repair request R. For each repair seed S, the induced canonical repair rep T CQ (∃ X. A, S) is a rational CQ-repair of ∃ X.A for R. Conversely, if ∃ Z. C is a rational CQ-repair of ∃ X.A for R, then there is a repair seed S such that rep T CQ (∃ X. A, S) |= CQ ∃ Z. C.
Note that the canonical CQ-repair must be constructed as shell unfolding of the full canonical IQ-repair, not from the optimized IQ-repair or any other qABox that is IQ-equivalent to it. In our Narcissus example with TBox, the canonical IQ-repair contains objects belonging to V, which are, however, not reachable from n. The optimized IQ-repair no longer contains such objects. Thus, the shell unfolding of the optimized repair does not entail ∃ {x}. {V (x)}, but there are CQ-repairs that do, such as the shell unfolding of the canonical IQ-repair.
As an immediate consequence of the previous proposition, we obtain the main result of this section.
Theorem 15 ([6]). Let ∃ X.A be a finite qABox, T an EL TBox, and R an EL repair request. Then we can compute, in (deterministic) exponential time using an NP-oracle, a finite set of repair seeds {S 1, . . ., S m } such that the set {rep T CQ (∃ X. A, S 1 ), . . ., rep T CQ (∃ X. A, S m )} consists of all optimal rational CQ-repairs of ∃ X.A for R w.r.t. T (up to CQ-equivalence). This set covers all rational CQ-repairs of ∃ X.A for R w.r.t. T.
Also note that the optimal repairs rep T CQ (∃ X. A, S i ) are saturated w.r.t. T in the sense that they CQ-entail a rational qABox w.r.t. T iff they already CQ-entail it without T. By Theorem 12, this implies that conjunctive queries can be answered for rep T CQ (∃ X. A, S i ) in non-deterministic polynomial time in the size of rep T IQ (∃ X. A, S i ) and the query.

Conclusion
In the first part of this paper we have mainly recalled the approaches and results from [7,16].In other work, we have extended these results in several directions.
The paper [11] extends the expressivity of the underlying DL considerably, by adding nominals, inverse roles, regular role inclusions and the bottom concept to EL, which yields a fragment of the well-known DL Horn-SROIQ [29].In [9], we investigate whether and how one can obtain optimal repairs if one restricts the output of the repair process to being ABoxes rather than qABoxes.In general, such optimal ABox repairs need not exist.The main contribution of the paper is an approach that can decide the existence of optimal ABox repairs in exponential time, and can compute all such repairs in case they exist.The papers [13,14] consider error-tolerant reasoning based on optimal repairs and [1] compares optimal repairs with contractions from the area of belief change.Moreover, an approach to computing optimal repairs of EL TBoxes is developed in [25].
In the second part of this paper we have presented new results on how to represent exponentially large repairs in a polynomial way and infinite repairs in a finite way. It would be interesting to see whether such approaches can also be extended to other settings. We conjecture that non-cycle-restricted TBoxes can still be tackled by using shell unfoldings for the DLs considered in [11]. However, in [11] we also show that optimal repairs need not exist if the role inclusions are not regular. It is unclear whether this problem can be overcome by an appropriate finite representation of infinite repairs. Another interesting topic for future research is to investigate whether finitely represented rational repairs can be used in practice.
for every EL concept assertion C(a). The definition of CQ-entailment considers all qABoxes ∃ Z. C in place of concept assertions C(a). It is easy to see that the CQ-entailment relation |= T CQ actually coincides with the model-based entailment relation |= T introduced above [7,16].
QL-Saturation Rule. Choose an object u of ∃ X.A as well as a CI C ⊑ D in T with A |= C(u), but A ̸|= D(u), and add D(u) to A. Then apply QL-unfolding to D(u).

Fig. 1. Run times of evaluating 100 instance queries on repairs using the seed function (x-axis) vs. using the optimized repairs (y-axis). Color intensity corresponds to the size of the input ontology. Orange-red crosses include times for computing the repair, whereas cyan-blue circles do not. Results for RR1 on the left, and for RR2 on the right.

Example 9. As left-hand side of the entailment, we consider the qABox representing the natural numbers with their usual order relation: ∃ X.A with variables X := N and matrix A := { r(m, n) | m < n }. As right-hand side, we take the real numbers: ∃ Y. B with variables Y := R and matrix B := { r(x, y) | x < y }. Each finite qABox entailed by ∃ Y. B is also entailed by ∃ X. A, i.e., ∃ X.A |= CQ ∃ Y. B.

X ′ := { p | p is a shell path where p ∉ Σ I }, A ′ := { A(p) | p is a shell path with target u and A(u) ∈ A } ∪ { r(u, v) | u, v are kernel objects and r(u, v) ∈ A } ∪ { r(p, q) | p, q are shell paths such that q = p −r→ u for a shell object u }.
t. the queries they entail rather than w.r.t. the models they have. Instance queries (IQ) are just concept assertions whereas (Boolean) conjunctive queries (CQ) are just qABoxes. The qABox ∃ X.A IQ-entails the qABox ∃ Y. B w.r.t. T (written ∃ X.A |= T IQ
Proposition 3. If ∃ X.A |= CQ ∃ Y. B, then there is a finite chain of applications of the Copy and Delete Rules that starts with ∃ X.A and ends with ∃ Y. B.
Proof sketch. If ∃ X.A |= CQ ∃ Y. B, then there is a homomorphism from ∃ Y. B to ∃ X. A. If this homomorphism is not injective, then we can make it injective by adding copies of individuals that are images of several elements of Obj(∃ Y. B)