Two Ways of Explaining Negative Entailments in Description Logics Using Abduction (Extended Version)

We discuss two ways of using abduction to explain missing entailments from description logic knowledge bases, one more common and one more unusual, and then take a closer look at how current results and implementations on abduction could be used towards generating such explanations, and what still needs to be done.


Introduction
The importance of services that explain inferences performed by description logic (DL) reasoners has long been understood. In fact, understanding and debugging large DL ontologies such as SNOMED CT (350,000 axioms) would be much more challenging than it is now without the service of justifications [5,23,12] explaining reasoner results, and their direct support through the standard ontology editor Protégé [13]. Recently, explaining positive entailments (things that logically follow from the ontology) has been further improved by the possibility of showing proofs created by the EL reasoner ELK [14]. How to explain ontology entailments using proofs, and how to select good proofs, has since been the subject of further research [1,2]. But what is the situation for explaining negative entailments, that is, logical consequences that do not follow from an ontology? Here, usually two solutions are suggested: showing a counterexample in the form of a DL interpretation [6], or using abduction [8,9]. However, the literature on abduction in DLs often just takes this motivation as given, and then goes on to provide complexity results or methods, without further elaborating how this can help.
In particular, many approaches to abduction look at first sight better suited for the task of diagnosis: providing a plausible explanation for some observation that does not logically follow from the known facts [9,20], which is indeed the traditional motivation for abduction [21]. In this abstract, we want to clarify this by elaborating, based on a simple example, on two possible ways in which abduction can help, possibly in combination with proof services, to explain missing entailments from a DL knowledge base. In particular, we also argue for the use of abduction for the generation of counterexamples. We then review some recent results in the light of those explanation services, highlighting both possible extensions and open challenges in this direction. Proofs for the claims made here can be found in the appendix.

Description Logics
For detailed information on DLs and their semantics, we refer to [3]; here we only give the syntax and central notions of the classical DL ALC. ALC concepts are built from countably infinite sets N_C and N_R of concept and role names, respectively, according to the following syntax rule, where A ∈ N_C and r ∈ N_R:

C, D ::= A | ¬C | C ⊓ D | C ⊔ D | ∃r.C | ∀r.C

A TBox is a set of axioms of the form C ⊑ D and C ≡ D, where C and D are concepts, and an ABox is a set of axioms of the form C(a) and r(a, b), where C is a concept, r ∈ N_R, and a and b are picked from a countably infinite set N_I of individual names. We call an ABox A flat if for every C(a) ∈ A, C is a concept name. A knowledge base (KB) is a union of a TBox and an ABox, and thus also generalises both notions. We say that a KB K entails an axiom α, in symbols K |= α, if every model I of K is also a model of α.

Fixpoint Operators
We also briefly recall the less classical DL ALCµ, which extends ALC with least fixpoint concepts of the form µX.C[X], where C[X] is a concept in which X occurs like a concept name, but only under an even number of negation symbols [16].
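For intuition, the following least fixpoint concept (a standard illustration, not taken from [16]) describes those elements from which an A-element can be reached along a chain of r-successors:

```latex
\mu X.\,(A \sqcup \exists r.X)
```

Informally, it behaves like the infinite disjunction A ⊔ ∃r.A ⊔ ∃r.∃r.A ⊔ ⋯, which no single ALC concept can express.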

Abduction
We call a subset Σ ⊆ N_C ∪ N_R a signature, and denote by sig(K) the signature that consists of all concept and role names that occur in the KB K. We focus on the following notion of abduction.

Definition 1. Let L be a DL. A signature-based L abduction problem is given by a triple A = ⟨K, Φ, Σ⟩, with an L KB K of background knowledge, an L KB Φ as observation, and a signature Σ ⊆ N_C ∪ N_R of abducibles; it asks whether there exists a hypothesis for A, i.e. an L KB H satisfying

A1 K ∪ H ⊭ ⊥ (i.e., K ∪ H is consistent),
A2 K ∪ H |= Φ, and
A3 sig(H) ⊆ Σ.
If Φ and H are additionally required to be TBoxes/ABoxes/flat ABoxes, we speak of a TBox/ABox/flat ABox abduction problem.
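As a minimal worked example of Definition 1 (ours, not from the cited works), consider the TBox abduction problem

```latex
\mathcal{A} = \langle\, \{A \sqsubseteq B\},\; \{A \sqsubseteq C\},\; \{B, C\} \,\rangle .
```

Here H = {B ⊑ C} is a hypothesis: K ∪ H is consistent (A1), K ∪ H |= A ⊑ C (A2), and sig(H) = {B, C} ⊆ Σ (A3). The trivial answer H = Φ is ruled out by A3, since A ∉ Σ.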
Because of A3, this kind of abduction is called signature-based abduction [15,16,22,8]. Other variants of abduction usually use a different condition instead to avoid trivial answers: for instance, that H contains axioms picked from a predefined set of abducible axioms [9,22], has a specified shape [10], or satisfies a relevance criterion [11].
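To make the search character of Definition 1 concrete, here is a small propositional Horn analogue (a sketch of ours, not one of the DL algorithms discussed below; the function names and the atom `OnlyVegToppings` are hypothetical): hypotheses are sets of abducible atoms, entailment is forward chaining, and we return a smallest hypothesis entailing the observation.

```python
from itertools import combinations

def closure(facts, rules):
    """Forward-chain Horn rules (body, head) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if set(body) <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def abduce(rules, facts, observation, abducibles):
    """Return a smallest H subset of abducibles with facts + H + rules |= observation.

    A1 (consistency) is trivial here, since positive Horn KBs are always
    consistent; A3 holds by construction, since H is drawn from the abducibles.
    """
    for k in range(len(abducibles) + 1):
        for hypothesis in combinations(sorted(abducibles), k):
            if observation in closure(set(facts) | set(hypothesis), rules):
                return set(hypothesis)  # A2 holds for this hypothesis
    return None  # no hypothesis exists over these abducibles

# Toy version of the pizza example below.
rules = [(("Marinara",), "Pizza"),
         (("Pizza", "OnlyVegToppings"), "VegetarianPizza")]
print(abduce(rules, {"Marinara"}, "VegetarianPizza",
             {"OnlyVegToppings", "Spicy"}))  # {'OnlyVegToppings'}
```

Note that even this toy version enumerates exponentially many candidate hypotheses, which foreshadows the complexity results discussed later.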

Motivating and Introducing Type 1 and Type 2 Explanations
We illustrate our idea with a simplified toy example based on the pizza ontology, where the aim is to explain a missing entailed TBox axiom. Our notions of explanation generalise to other settings, such as explaining missing ABox inferences or query answers [8]. The pizza ontology is a toy ontology that is commonly used in tutorials on the OWL ontology language; it provides definitions of different pizza types (Margherita, SalamiPizza) and categories of pizzas (VegetarianPizza, SpicyPizza), together with various types of pizza ingredients and their properties.
Specifically, it provides the following definition of VegetarianPizza:

VegetarianPizza ≡ Pizza ⊓ ¬∃hasTopping.FishTopping ⊓ ¬∃hasTopping.MeatTopping

Assume an ontology engineer wants to extend the pizza ontology with a definition of the Pizza Marinara, for which they would use axioms stating that a Marinara is a pizza with tomato topping and garlic topping. They might then expect Marinara ⊑ VegetarianPizza to be entailed, which, however, is not the case: nothing in their axioms excludes that a Marinara has further, possibly non-vegetarian, toppings. An abduced hypothesis such as Marinara ⊑ ∀hasTopping.(TomatoTopping ⊔ GarlicTopping) points the engineer to the so-called closure axiom they forgot to add to its definition [24]. In the following, we call this pragmatic type of explanation a type 1 explanation.
Note that this notion of explanation is equivalent to that of a hypothesis in signature-based abduction.
While this helps the engineer to fix their definition, it might be insufficient for helping them understand why the axiom is necessary. We could additionally provide a proof tree as in [1], showing how the desired entailment follows once the hypothesis is added. Alternatively, we can explain the missing entailment by a counterexample: an ABox describing part of a model of the KB in which Marinara ⊑ VegetarianPizza is violated, for instance a Marinara that has a topping which is neither tomato nor garlic. Such counterexamples can again be computed using abduction, this time for an observation stating that some individual is a Marinara but not a VegetarianPizza. We call explanations of this kind type 2 explanations.

State-of-the-Art and Challenges
For both types of explanations, restricting abductive solutions using a signature is key to controlling the output to something useful. Most existing work on abduction either does not consider such a restriction, or fixes the set of axioms to be used, rather than the signature. In those cases, abduction boils down to finding a minimal subset of this set of abducibles that is sufficient for entailing the observation, something that could for instance be solved using axiom pinpointing [12]. The problem with this approach is that we need to know beforehand which axioms we are looking for: as we show below, the solution is otherwise simply too large. In the following, we focus on the case of TBox entailments to be explained, as in the example.

Computing Explanations of Type 1
For explanations of Type 1, we need to perform signature-based TBox abduction, for which we presented a method for ALC in [16]. However, in the general case, this method does not compute a single ALC hypothesis for the given abduction problem, but a hypothesis in the form of a Boolean ALCOIµ-KB that is general enough to cover every possible ALC hypothesis within the signature.
If the observation consists only of TBox axioms, the result is a Boolean combination Φ of ALCµ GCIs that are combined using disjunction and conjunction, but without use of negation. From this, one can always extract an ALC hypothesis by dropping disjuncts and unfolding fixpoint expressions up to a certain depth. However, we have not yet investigated how the depth of the unfolding can be bounded.
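For illustration (our example, not one computed by the method in [16]), a GCI with a fixpoint on the right-hand side can be turned into an ALC axiom by unfolding up to a fixed depth and cutting off with ⊥:

```latex
\top \sqsubseteq \mu X.(A \sqcup \exists r.X)
\quad\rightsquigarrow\quad
\top \sqsubseteq A \sqcup \exists r.(A \sqcup \exists r.\bot)
\;\equiv\;
\top \sqsubseteq A \sqcup \exists r.A
```

Every model of the extracted ALC axiom is also a model of the fixpoint axiom, so the extracted axiom is a stronger, and hence still sufficient, hypothesis, provided it remains consistent with the background KB.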
The method for computing hypotheses as ALCOIµ-KBs is based on uniform interpolation [17], and may thus produce solutions that are triple exponential in the size of the abduction problem [19]. While this may sound like bad news, such a blow-up is in general unavoidable: it can be shown by a simple modification of a proof in [15] that there exists a family of ALC TBox abduction problems for which every hypothesis is of size triple exponential in the size of the abduction problem.
On the positive side, the evaluation in [16] indicates that fixpoint operators are not often introduced into the solution. A more realistic investigation of this is future work.

Computing Explanations of Type 2
For explanations of type 2, we need to perform ABox abduction for an observation of the form C(a). Moreover, as the previous example illustrates, hypotheses become easier to visualise if they use fresh individual names, instead of encoding all the information into a single complex concept. Indeed, explanations that are flat ABoxes, or that use complex concepts only where necessary, seem to be more convenient. The method from [16] is therefore not immediately useful, as it never computes a solution with fresh individual names. Signature-based ABox abduction for flat ABox hypotheses can be performed by the AAA ABox Abduction Solver [22]; however, this tool does not introduce fresh individual names either.
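For instance, in the pizza example, a flat ABox hypothesis using a fresh individual name t (a hypothetical example of ours) could be

```latex
\mathcal{H} = \{\, \mathsf{Marinara}(a),\; \mathsf{hasTopping}(a, t),\; \mathsf{MeatTopping}(t) \,\}
```

Together with a definition of VegetarianPizza that excludes meat toppings, H entails the observation (Marinara ⊓ ¬VegetarianPizza)(a), and thus describes a counterexample to Marinara ⊑ VegetarianPizza without using any complex concept.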
An approach could be to fix a sufficiently high number of individuals before running the solver. We have shown in [15] that the number of required individual names for flat ABox hypotheses can become exponential in the worst case, and if we allow for complex concepts, hypotheses may require axioms of triple exponential size. Furthermore, we have shown that deciding whether there exists a solution of size at most n, where n is given in binary, is NExpTime-complete for EL⊥ and NExpTime^NP-complete for ALCI. Fixing the number of abducibles beforehand is thus not feasible in general, and a more directed way of generating fresh individual names is needed. We are currently investigating a method for ABox abduction in EL that adapts ideas from a method for ABox repairs presented in [4], where erroneous entailments are repaired by introducing anonymous individuals.

Appendix

We first argue for the claim that there exists a family of ALC TBox abduction problems for which every hypothesis is of triple exponential size. The claim follows by a modification of the proof of the corresponding lower bound for ABox abduction in [15], Theorem 5. There, a family of signature-based abduction problems of the form A_i = ⟨T_i, Goal(a), Σ⟩ is constructed such that every hypothesis is of the form C(a), where C is an ALC concept that must be at least triple exponentially large. We obtain the desired family of abduction problems by setting A'_i = ⟨T_i, ⊤ ⊑ ∃r*.Goal, Σ ∪ {r*}⟩, where r* is a fresh role name. Clearly, there exists a hypothesis for A'_i iff there exists a hypothesis for A_i, and every hypothesis for A'_i is of the form ⊤ ⊑ ∃r*.C and can be translated to a hypothesis of the form C(a) for A_i, and vice versa. It follows that every hypothesis for A'_i is of size triple exponential in the size of A'_i.
The following lemma can be shown by a straightforward but tedious inspection of the procedure for signature-based abduction presented in [16], which would be hard to follow without a good understanding of that method; we therefore omit the details in the context of this abstract.

Lemma 1. For TBox abduction problems, the method presented in [16] always computes a hypothesis that is a Boolean combination of ALCµ GCIs in which no negation is used except on the level of concepts, and least fixpoint operators occur only positively.
We describe the procedure sketched in the main text more precisely. Let H_0 be a hypothesis that is a Boolean combination of ALCµ GCIs as in Lemma 1. From H_0, a hypothesis in ALC can be extracted by non-deterministically applying the following steps until we end up with a conjunction of ALC GCIs that is consistent with T:

Step 1 Pick a disjunction φ ∨ ψ in the current hypothesis and replace it by φ (or by ψ).

Step 2 Pick a concept of the form µX.C[X], and replace it by its unfolding C[µX.C[X]], i.e., the concept obtained from C[X] by replacing X with µX.C[X].

Step 3 Pick a concept of the form µX.C[X], and replace it by C[⊥], i.e., the concept obtained from C[X] by replacing X with ⊥.
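As a small illustration (our example), starting from a hypothesis with one disjunction and one fixpoint, the steps can be applied as follows:

```latex
(\top \sqsubseteq \mu X.(A \sqcup \exists r.X)) \vee (\top \sqsubseteq B)
\;\xrightarrow{\text{Step 1}}\;
\top \sqsubseteq \mu X.(A \sqcup \exists r.X)
\;\xrightarrow{\text{Step 2}}\;
\top \sqsubseteq A \sqcup \exists r.\,\mu X.(A \sqcup \exists r.X)
\;\xrightarrow{\text{Step 3}}\;
\top \sqsubseteq A \sqcup \exists r.(A \sqcup \bot)
```

The final axiom is in ALC and entails each of its predecessors, matching the argument used in the proof of Theorem 2.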
Theorem 2. The procedure always terminates with an ALC hypothesis.
Proof. Let ⟨T, Φ, Σ⟩ be a TBox abduction problem, and let H_0 be a hypothesis in ALCµ of the form described in Lemma 1. We need to show that (1) each step of the procedure preserves Properties A1–A3, and (2) the procedure terminates. Property A3 is preserved, since no step introduces any names outside the signature. For Property A2, let H_i and H_{i+1} be such that H_{i+1} is obtained from H_i through one of Step 1 to Step 3, and note that H_{i+1} |= H_i (specifically, because every fixpoint expression affected by Step 2 and Step 3 occurs only positively in H_i). By induction, we obtain T ∪ H_{i+1} |= H_0, and since T ∪ H_0 |= Φ by Condition A2, also T ∪ H_{i+1} |= Φ.
It remains to show Property A1. For Step 1, we note that every model of φ ∨ ψ is a model of φ or of ψ. Consequently, one of the non-deterministic choices here must lead to a consistent KB, provided that the previous KB was consistent.
Step 2 only replaces concepts by equivalent concepts, and thus clearly preserves consistency as well. For Step 3, we have to do a bit more. If the current KB is consistent, it has a finite model I, say with n domain elements. In I, every least fixpoint concept is reached after at most n iterations, which means that I remains a model if we apply Step 2 n times to a fixpoint concept and then eliminate the introduced fixpoint expressions using Step 3. This shows not only that those steps preserve A1 on some of the non-deterministic choices, it also gives a bound on the necessary applications of those steps, and thus shows termination of the procedure.