LTCS-Report: Approximate Unification in the Description Logic FL0

Abstract. Unification in description logics (DLs) has been introduced as a novel inference service that can be used to detect redundancies in ontologies, by finding different concepts that may potentially stand for the same intuitive notion. It was first investigated in detail for the DL FL0, where unification can be reduced to solving certain language equations. In order to increase the recall of this method for finding redundancies, we introduce and investigate the notion of approximate unification, which basically finds pairs of concepts that "almost" unify. The meaning of "almost" is formalized using distance measures between concepts. We show that approximate unification in FL0 can be reduced to approximately solving language equations, and devise algorithms for solving the latter problem for two particular distance measures.


Introduction
Description logics [1] are a well-investigated family of logic-based knowledge representation formalisms. They can be used to represent the relevant concepts of an application domain using concept descriptions, which are built from concept names and role names using certain concept constructors. In this paper, we concentrate on the DL FL0, which offers the constructors conjunction (C ⊓ D), value restriction (∀r.C), and the top concept (⊤).
Unification in DLs has been introduced as a novel inference service that can be used to detect redundancies in ontologies, and was first investigated in detail for FL0 [4]. For example, assume that one developer of a medical ontology defines the concept of a patient with severe head injury as

Patient ⊓ ∀finding.(Head_injury ⊓ ∀severity.Severe),    (1)

whereas another one represents it as

Patient ⊓ ∀finding.(Severe_finding ⊓ Injury ⊓ ∀finding_site.Head).    (2)
Formally, these two concept descriptions are not equivalent, but they are nevertheless meant to represent the same concept. They can obviously be made equivalent by treating the concept names Head_injury and Severe_finding as variables, and substituting the first one by Injury ⊓ ∀finding_site.Head and the second one by ∀severity.Severe. In this case, we say that the descriptions are unifiable, and call the substitution that makes them equivalent a unifier. Intuitively, such a unifier proposes definitions for the concept names that are used as variables: in our example, we know that, if we define Head_injury as Injury ⊓ ∀finding_site.Head and Severe_finding as ∀severity.Severe, then the two concept descriptions (1) and (2) are equivalent w.r.t. these definitions.
Of course, this example was constructed such that a unifier providing sensible definitions for the concept names used as variables actually exists. It is based on the assumption that both knowledge engineers had the same definition of the concept patient with severe head injury in mind, but have modelled certain subconcepts on different levels of granularity. Whereas the first knowledge engineer used Head_injury as a primitive (i.e., not further defined) concept, the other one provided a more detailed definition for head injury; and the other way round for severe finding. But what if there are more differences between the two concepts, maybe due to small modelling errors? For example, assume that a third knowledge engineer has left out the concept name Severe_finding from (2), based on the assumption that all injuries with finding site head are severe:

Patient ⊓ ∀finding.(Injury ⊓ ∀finding_site.Head).    (3)
The concept descriptions (1) and (3) cannot be unified if only Head_injury is used as a variable. Nevertheless, the substitution that replaces Head_injury by Injury ⊓ ∀finding_site.Head makes these two descriptions quite similar, though not equivalent. We call such a substitution an approximate unifier.
The purpose of this paper is to introduce and investigate the notion of approximate unification for the DL FL0. Basically, to formalize approximate unification, we first need to fix the notion of a distance between FL0 concept descriptions. An approximate unifier is then supposed to make this distance as small as possible. Of course, there are different ways of defining the distance between concept descriptions, which then also lead to different instances of approximate unification. In this paper, we consider two such distance functions, which are based on the idea that differences at larger role depth (i.e., further down in the nesting of value restrictions) are less important than ones at smaller role depth. The first distance considers only the smallest role depth ℓ at which a difference occurs (and then uses 2^{−ℓ} as distance), whereas the second one "counts" all differences, but the ones at larger role depth with a smaller weight. This idea is in line with work on nonstandard inferences in DLs that approximate least common subsumers and most specific concepts by fixing a bound on the role depth [8].
Exact unification in FL 0 was reduced in [4] to solving certain language equations, which in turn was reduced to testing certain tree automata for emptiness. We show that this approach can be extended to approximate unification. In fact, by linking distance functions on concept descriptions with distance functions on languages, we can reduce approximate unification in FL 0 to approximately solving language equations. In order to reduce this problem to a problem for tree automata, we do not employ the original construction of [4], but the more sophisticated one of [5]. Using this approach, both the decision variant (is there a substitution that makes the distance smaller than a threshold) and the computation variant (compute the infimum of the achievable distances) of approximate unification can be solved in exponential time, and are thus of the same complexity as exact unification in FL 0 .
Unification in FL0

We will first recall syntax and semantics of FL0 and describe the normal form of FL0 concept descriptions that is based on representing value restrictions as finite languages over the alphabet of role names. Then, we introduce unification in FL0 and recall how it can be reduced to solving language equations.

Syntax and semantics
The concept descriptions C of the DL FL0 are built recursively over a finite set of concept names N_c and a finite set of role names N_r using the following syntax rules:

C ::= ⊤ | A | C ⊓ C | ∀r.C    (4)

where A ∈ N_c and r ∈ N_r. In the following, we assume that N_c = {A_1, ..., A_k} and N_r = {r_1, ..., r_n}.
The semantics of FL0 is defined in the usual way, using the notion of an interpretation I = (∆^I, ·^I), which consists of a nonempty domain ∆^I and an interpretation function ·^I that assigns binary relations on ∆^I to role names and subsets of ∆^I to concept names. The interpretation function ·^I is extended to FL0 concept descriptions as follows: ⊤^I := ∆^I, (C ⊓ D)^I := C^I ∩ D^I, and (∀r.C)^I := {d ∈ ∆^I | for all e ∈ ∆^I: if (d, e) ∈ r^I, then e ∈ C^I}.

Equivalence and normal form
Two FL 0 concept descriptions C, D are equivalent (written C ≡ D) if C I = D I holds for all interpretations I.
As an easy consequence of the semantics of FL0, we obtain that value restrictions distribute over conjunction, i.e., ∀s.(C ⊓ D) ≡ ∀s.C ⊓ ∀s.D holds for all FL0 concept descriptions C, D. Using this equivalence from left to right, we can rewrite every FL0 concept description into a finite conjunction of descriptions ∀s_1.···∀s_m.A, where m ≥ 0, s_1, ..., s_m ∈ N_r, and A ∈ N_c. We further abbreviate ∀s_1.···∀s_m.A as ∀(s_1...s_m).A, where s_1...s_m is viewed as a word over the alphabet of all role names N_r, i.e., an element of N_r^*. For m = 0, this is the empty word ε. Finally, grouping together value restrictions that end with the same concept name, we abbreviate conjunctions ∀w_1.A ⊓ ... ⊓ ∀w_ℓ.A as ∀{w_1, ..., w_ℓ}.A, where {w_1, ..., w_ℓ} ⊆ N_r^* is viewed as a (finite) language over N_r. Additionally, we use the convention that ∀∅.A is equivalent to ⊤. Then, any FL0 concept description C (over N_c = {A_1, ..., A_k} and N_r = {r_1, ..., r_n}) can be rewritten into the normal form ∀L_1.A_1 ⊓ ... ⊓ ∀L_k.A_k, where L_1, ..., L_k are finite languages over the alphabet N_r. For example, if k = 3, then the concept description A_1 ⊓ ∀r_1.(A_1 ⊓ ∀r_1.A_2 ⊓ ∀r_2.A_1) has the normal form ∀{ε, r_1, r_1r_2}.A_1 ⊓ ∀{r_1r_1}.A_2 ⊓ ∀∅.A_3. Using this normal form, equivalence of FL0 concept descriptions can be characterized as follows (see [4] for a proof): ∀L_1.A_1 ⊓ ... ⊓ ∀L_k.A_k ≡ ∀M_1.A_1 ⊓ ... ⊓ ∀M_k.A_k iff L_i = M_i for all i = 1, ..., k.
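The normalization step described above is easy to implement. The following is a minimal sketch (our own illustration, not code from this report), representing concepts as nested tuples and words over N_r as tuples of role names:

```python
# Concepts: ("top",), ("name", A), ("and", C, ...), ("all", r, C)

def normal_form(concept, concept_names):
    """Return {A: set of words over N_r} such that
    concept ≡ ⊓_A ∀L_A.A (words are tuples of role names)."""
    langs = {A: set() for A in concept_names}

    def walk(c, prefix):
        tag = c[0]
        if tag == "top":
            return                        # ∀w.⊤ contributes nothing
        if tag == "name":
            langs[c[1]].add(prefix)       # contributes ∀prefix.A
        elif tag == "and":
            for d in c[1:]:
                walk(d, prefix)           # ∀w.(C ⊓ D) ≡ ∀w.C ⊓ ∀w.D
        elif tag == "all":
            walk(c[2], prefix + (c[1],))  # descend under ∀r

    walk(concept, ())
    return langs

# Example from the text: A1 ⊓ ∀r1.(A1 ⊓ ∀r1.A2 ⊓ ∀r2.A1)
C = ("and", ("name", "A1"),
     ("all", "r1", ("and", ("name", "A1"),
                    ("all", "r1", ("name", "A2")),
                    ("all", "r2", ("name", "A1")))))
nf = normal_form(C, ["A1", "A2", "A3"])
```

Two concept descriptions are then equivalent iff this procedure yields the same language for every concept name.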
Consider the head injury example from the introduction, where for brevity we replace the concept and role names by single letters: (1) thus becomes A ⊓ ∀r.(X ⊓ ∀s.B) and (2) becomes A ⊓ ∀r.(Y ⊓ D ⊓ ∀t.E).    (5)

Unification
In order to define unification in FL 0 , we need to introduce an additional set of concept names N v , whose elements we call concept variables. Intuitively, N v contains the concept names that have possibly been given another name or been specified in more detail in another concept description describing the same notion. From a syntactic point of view, concept variables are treated like concept names when building concepts. We call expressions built using the syntax rules (4), but with A ∈ N c ∪N v , concept patterns, to distinguish them from concept descriptions, where only A ∈ N c is allowed. The difference between elements of N c and N v is that concept variables can be replaced by substitutions.
A substitution σ is a function that maps every variable X ∈ N_v to a concept description σ(X). This function can be extended to concept patterns by setting σ(A) := A for all A ∈ N_c, σ(⊤) := ⊤, σ(C ⊓ D) := σ(C) ⊓ σ(D), and σ(∀r.C) := ∀r.σ(C). We denote the set of all substitutions as Sub.
Definition 1 (Unification). The substitution σ is a unifier of the two FL 0 concept patterns C, D if σ(C) ≡ σ(D). If C, D have a unifier, then we call them unifiable. The FL 0 unification problem asks whether two given FL 0 concept patterns are unifiable or not.
In [4] it is shown that the FL0 unification problem is ExpTime-complete. The ExpTime upper bound is proved by a reduction to language equations, which in turn are solved using tree automata. Here we sketch the reduction to language equations; the reduction to tree automata will be explained in Section 4. Without loss of generality, we can assume that the input patterns are in normal form (where variables are treated like concept names), i.e.,

C = ∀S_{0,1}.A_1 ⊓ ... ⊓ ∀S_{0,k}.A_k ⊓ ∀S_1.X_1 ⊓ ... ⊓ ∀S_m.X_m,
D = ∀T_{0,1}.A_1 ⊓ ... ⊓ ∀T_{0,k}.A_k ⊓ ∀T_1.X_1 ⊓ ... ⊓ ∀T_m.X_m,    (6)

where S_{0,i}, T_{0,i}, S_j, T_j are finite languages over N_r. The unification problem for C, D can be reduced to (independently) solving the language equations

S_{0,i} ∪ S_1·X_{1,i} ∪ ... ∪ S_m·X_{m,i} = T_{0,i} ∪ T_1·X_{1,i} ∪ ... ∪ T_m·X_{m,i}    (7)

for i = 1, ..., k, where "·" stands for concatenation of languages. A solution σ_i of such an equation is an assignment of languages (over N_r) to the variables X_{j,i} such that replacing each X_{j,i} by σ_i(X_{j,i}) makes the two sides of (7) equal as languages. This assignment is called finite if all the languages σ_i(X_{j,i}) are finite. We denote the set of all assignments as Ass and the set of all finite assignments as finAss.
As shown in [4], C, D are unifiable iff the language equations of the form (7) have finite solutions for all i = 1, ..., k. In fact, given finite solutions σ_1, ..., σ_k of these equations, a unifier σ of C, D can be obtained by setting

σ(X_j) := ∀σ_1(X_{j,1}).A_1 ⊓ ... ⊓ ∀σ_k(X_{j,k}).A_k    for j = 1, ..., m,    (8)

and every unifier of C, D can be obtained in this way. Of course, this construction of a substitution from a k-tuple of finite assignments can be applied to arbitrary finite assignments (and not just to finite solutions of the equations (7)), and it yields a bijection ρ between k-tuples of finite assignments and substitutions.
Coming back to our example (5), where we now view X, Y as variables, the language equations for the concept names A and B are

{ε} ∪ {r}·X_A ∪ ∅·Y_A = {ε} ∪ ∅·X_A ∪ {r}·Y_A,
{rs} ∪ {r}·X_B ∪ ∅·Y_B = ∅ ∪ ∅·X_B ∪ {r}·Y_B.
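For finite assignments, solutions of equations of the form (7) can be checked mechanically. The following sketch (our own illustration; the two equations encoded below are our reconstruction of the equations for the concept names A and B in the example, with variables X and Y) verifies a candidate assignment:

```python
# Words are plain strings; ε is the empty string "".
def concat(K, L):
    """Elementwise concatenation K·L of two finite languages."""
    return {u + v for u in K for v in L}

def solves(S0, S, T0, T, sigma):
    """Does S0 ∪ ⋃_j S[X_j]·σ(X_j) equal T0 ∪ ⋃_j T[X_j]·σ(X_j)?"""
    lhs = set(S0).union(*(concat(S[X], sigma[X]) for X in sigma))
    rhs = set(T0).union(*(concat(T[X], sigma[X]) for X in sigma))
    return lhs == rhs

# Equation for A: {ε} ∪ {r}·X_A ∪ ∅·Y_A = {ε} ∪ ∅·X_A ∪ {r}·Y_A,
# solved by X_A = Y_A = ∅.
ok_A = solves({""}, {"X": {"r"}, "Y": set()},
              {""}, {"X": set(), "Y": {"r"}},
              {"X": set(), "Y": set()})

# Equation for B: {rs} ∪ {r}·X_B ∪ ∅·Y_B = ∅ ∪ ∅·X_B ∪ {r}·Y_B,
# solved by X_B = ∅, Y_B = {s}.
ok_B = solves({"rs"}, {"X": {"r"}, "Y": set()},
              set(), {"X": set(), "Y": {"r"}},
              {"X": set(), "Y": {"s"}})
```

The second check corresponds to the unifier from the introduction, which replaces Severe_finding (here Y) by ∀severity.Severe (here ∀s.B).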

Approximate unifiers and solutions
As motivated in the introduction, it makes sense to look for substitutions σ that are actually not unifiers, but come close to being unifiers, in the sense that the distance between σ(C) and σ(D) is small. We call such substitutions approximate unifiers. In the following, we first recall some definitions regarding distances from metric topology [14,11]. Subsequently, we introduce approximate unification based on distances between concept descriptions, and then approximate solvability of language equations based on distances between languages. Finally, we show how distances between languages can be used to define distances between concept descriptions, and that approximate unification for distances obtained in this way can be reduced to approximately solving language equations.

Metric topology
Given a set X, a metric (or distance) on X is a mapping d : X × X → [0, ∞) that satisfies the following properties for all x, y, z ∈ X:

(M1) d(x, y) = 0 iff x = y,
(M2) d(x, y) = d(y, x),
(M3) d(x, z) ≤ d(x, y) + d(y, z).

In this case, (X, d) is called a metric space. A useful metric on R^k (that will be used later) is the Chebyshev distance d_∞, which for x = (x_1, ..., x_k), y = (y_1, ..., y_k) ∈ R^k is defined as d_∞(x, y) := max_{1≤i≤k} |x_i − y_i|. Given a metric space (X, d), a sequence (a_n) of elements of X is said to converge to a ∈ X (written a_n →_d a) if for every ε > 0 there is an n_0 ∈ N s.t. d(a_n, a) < ε for every n ≥ n_0.
For a sequence ((a_1^n, ..., a_k^n)) of elements of R^k, we have that (a_1^n, ..., a_k^n) →_{d_∞} (a_1, ..., a_k) iff a_i^n → a_i for every i = 1, ..., k. A sequence (a_n) is called a Cauchy sequence if for every ε > 0 there exists an n_0 ∈ N s.t. for every m, n ≥ n_0 it holds that d(a_n, a_m) < ε. A metric space (X, d) is called complete if every Cauchy sequence converges to a point in X. It is well known that the metric space (R^k, d_∞) is complete [11].
Theorem 1 (Banach's Fixed Point Theorem [7,11]). Let (X, d) be a complete metric space and f : X → X a contraction, i.e., a function such that there is a constant 0 ≤ c < 1 with d(f(x), f(y)) ≤ c·d(x, y) for all x, y ∈ X. Then there exists a unique p ∈ X such that f(p) = p.
Finally, we provide the formal definition of the infimum of a set of real numbers, which will be needed in the proofs.
Definition 3. Given a set of real numbers S, we say that p is the infimum of S, and denote this by p = inf S, if the following two conditions hold: (a) p ≤ s for all s ∈ S, i.e., p is a lower bound of S; (b) for all ε > 0, there is an s ∈ S such that s < p + ε.
Note that this means that if p = inf S, there exists a sequence (s_n) of elements of S s.t. s_n → p.

Approximate unification
In order to define how close σ(C) and σ(D) are, we need to use a function that measures the distance between these two concept descriptions. We say that a function m that takes as input a pair of FL0 concept descriptions and yields as output an element of [0, ∞) is a concept distance for FL0 if it satisfies the following three properties: equivalence closedness (m(C, D) = 0 iff C ≡ D), symmetry (m(C, D) = m(D, C)), and equivalence invariance (if C ≡ C′, then m(C, D) = m(C′, D)). Note that equivalence closedness corresponds to (M1) and symmetry to (M2) in the definition of a metric. Equivalence invariance ensures that m can be viewed as operating on equivalence classes of concept descriptions.
Definition 4 (Approximate unification). Given a concept distance m, FL0 concept patterns C, D, and a substitution σ, the degree of violation of σ is defined as v_m(σ, C, D) := m(σ(C), σ(D)). For p ∈ Q, we say that σ is a p-approximate unifier of C, D if 2^{−p} > v_m(σ, C, D).
Equivalence closedness of m yields that v_m(σ, C, D) = 0 iff σ is a unifier of C, D.
The decision problem for approximate unification asks, for a given threshold p ∈ Q, whether C, D have a p-approximate unifier or not. In addition, we consider the following computation problem: compute inf_{σ∈Sub} v_m(σ, C, D). The following lemma, which is immediate from the definitions, shows that a solution of the computation problem also yields a solution of the decision problem.
Lemma 2. Let m be a concept distance and C, D FL0 concept patterns. Then C, D have a p-approximate unifier iff 2^{−p} > inf_{σ∈Sub} v_m(σ, C, D).

Proof. By definition, if C, D have a p-approximate unifier, then there exists a substitution σ ∈ Sub s.t. 2^{−p} > v_m(σ, C, D), and thus 2^{−p} > inf_{σ∈Sub} v_m(σ, C, D). Conversely, if 2^{−p} > inf_{σ∈Sub} v_m(σ, C, D), then by the definition of the infimum there is a σ ∈ Sub with v_m(σ, C, D) < 2^{−p}, i.e., a p-approximate unifier.
The reduction of the decision problem to the computation problem obtained from this lemma is actually polynomial. In fact, though the size of a representation of the number 2^{−p} may be exponential in the size of a representation of p, the number 2^{−p} need not be computed. Instead, we can compare p with −log_2 inf_{σ∈Sub} v_m(σ, C, D), where for the comparison we only need to compute as many digits of the logarithm as p has.

Approximately solving language equations
Following [5], we consider a more general form of language equations than the one given in (7). Here, all Boolean operators (and not just union) are available. Language expressions are built recursively over a finite alphabet Σ using union, intersection, complement, and concatenation of regular languages from the left, as formalized by the following syntax rules:

φ ::= L | X | φ ∪ φ | φ ∩ φ | ∼φ | L·φ

where L can be instantiated with any regular language over Σ and X with any variable. We assume that all the regular languages occurring in an expression are given by finite automata. Obviously, the left- and the right-hand sides of (7) are such language expressions. As before, an assignment σ ∈ Ass maps variables to languages over Σ. It is extended to expressions in the obvious way (where ∼ is interpreted as set complement). The assignment σ solves the language equation φ = ψ if σ(φ) = σ(ψ). For finite solvability we require the languages σ(X) to be finite, i.e., σ should be an element of finAss.
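For intuition, the evaluation of an assignment on such an expression can be carried out mechanically. The sketch below (our own illustration, not the algorithm of [5]) restricts attention to words of length at most ℓ, so that complement becomes an ordinary set difference; note that this truncation only approximates the real semantics, since concatenation can produce longer words:

```python
from itertools import product

def universe(alphabet, ell):
    """All words over the alphabet of length at most ell."""
    return {"".join(w) for n in range(ell + 1)
            for w in product(alphabet, repeat=n)}

def evaluate(expr, assignment, U):
    """expr: ("const", L) | ("var", X) | ("union"|"inter", e1, e2)
       | ("compl", e) | ("lconcat", L, e) — concatenation from the left.
       All results are truncated to the finite universe U."""
    tag = expr[0]
    if tag == "const":
        return expr[1] & U
    if tag == "var":
        return assignment[expr[1]] & U
    if tag == "union":
        return evaluate(expr[1], assignment, U) | evaluate(expr[2], assignment, U)
    if tag == "inter":
        return evaluate(expr[1], assignment, U) & evaluate(expr[2], assignment, U)
    if tag == "compl":
        return U - evaluate(expr[1], assignment, U)   # ∼ relative to U
    if tag == "lconcat":
        return {u + v for u in expr[1]
                for v in evaluate(expr[2], assignment, U)} & U

U = universe("rs", 2)
# {rs} ∪ {r}·X evaluated under X = ∅
phi = ("union", ("const", {"rs"}), ("lconcat", {"r"}, ("var", "X")))
val = evaluate(phi, {"X": set()}, U)
```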
In order to define approximate solutions, we need the notion of distances between languages. A function d : 2^{Σ*} × 2^{Σ*} → [0, ∞) is called a language distance if it is a metric on the set of all languages over Σ, i.e., if it satisfies (M1)–(M3).

Definition 5 (Approximate solutions). Given a language distance d, language expressions φ, ψ, and an assignment σ, the degree of violation of σ is defined as v_d(σ, φ, ψ) := d(σ(φ), σ(ψ)).

The decision and the computation problem for approximately solving language equations are defined analogously to the case of unification. In addition, the analog of Lemma 2 also holds in this case, and thus the decision problem can be reduced to the computation problem.
Recall that unification in FL0 is reduced to finite solvability of language equations. The above definition of approximate solutions and of the decision and the computation problem can also be restricted to finite assignments, in which case we talk about finite approximate solvability. However, we will show that finite approximate solvability can actually be reduced to approximate solvability. For this to be the case, we need the language distance to satisfy an additional property (M4). Given a natural number ℓ, we call two languages K, L ⊆ Σ* equal up to length ℓ (and write K ≡_ℓ L) if K and L coincide on all words of length at most ℓ.
(M4) Let L be a language and (L_n) a sequence of languages over Σ. If L_n ≡_n L for every n ∈ N, then lim_{n→∞} d(L_n, L) = 0.
If (M4) is satisfied by d, then the computation problem for finite assignments has the same solution as for arbitrary assignments.
Lemma 3. Let d be a language distance satisfying (M4) and φ, ψ language expressions. Then inf_{σ∈finAss} v_d(σ, φ, ψ) = inf_{σ∈Ass} v_d(σ, φ, ψ).

Proof. Since finAss ⊆ Ass, it suffices to show inf_{σ∈finAss} v_d(σ, φ, ψ) ≤ inf_{σ∈Ass} v_d(σ, φ, ψ). Set p = inf_{σ∈Ass} v_d(σ, φ, ψ). This means that there exists a sequence of assignments σ_1, σ_2, ... s.t. v_d(σ_i, φ, ψ) → p. For each σ_i, there exists a sequence of finite assignments σ_i^1, σ_i^2, ... s.t. v_d(σ_i^j, φ, ψ) → v_d(σ_i, φ, ψ) for j → ∞; such a sequence can be obtained using (M4), e.g., by restricting each σ_i(X) to words of length at most j. We will construct a sequence of finite assignments (τ_n) s.t. v_d(τ_n, φ, ψ) → p. This implies that inf_{σ∈finAss} v_d(σ, φ, ψ) ≤ p, and the proof is complete.

By definition of convergence, we have that for every n ∈ N there are indices i_n, j_n s.t. |v_d(σ_{i_n}, φ, ψ) − p| < 1/(2n) and |v_d(σ_{i_n}^{j_n}, φ, ψ) − v_d(σ_{i_n}, φ, ψ)| < 1/(2n). Thus, by the triangle inequality, we get that |v_d(σ_{i_n}^{j_n}, φ, ψ) − p| < 1/n. Set τ_n = σ_{i_n}^{j_n} and we have the required sequence.
Before showing that language distances can be used to construct concept distances, we give two concrete examples of language distances satisfying (M 4).

Two language distances satisfying (M 4)
The following two mappings from 2^{Σ*} × 2^{Σ*} to [0, ∞) are defined by looking at the words in the symmetric difference K △ L := (K \ L) ∪ (L \ K) of the languages K and L:

d_1(K, L) := 2^{−min{|w| : w ∈ K △ L}}  (with d_1(K, L) := 0 if K △ L = ∅),
d_2(K, L) := μ(K △ L),  where μ(M) := Σ_{w∈M} (2·|Σ|)^{−|w|}.

The intuition underlying both functions is that differences between the two languages are less important if they occur for longer words. The first function considers only the length ℓ of the shortest word for which such a difference occurs and yields 2^{−ℓ} as distance, which becomes smaller if ℓ gets larger. The second function also takes into account how many such differences there are, but differences for longer words count less than differences for shorter ones. More precisely, a difference for the word u counts as much as the sum of all differences for words uv properly extending u, since Σ_{v∈Σ^+} (2·|Σ|)^{−|uv|} = (2·|Σ|)^{−|u|}. Now we show that these functions satisfy the required properties.

Lemma 4. The functions d_1 and d_2 are language distances satisfying (M4).

Proof. In order to show that they are language distances, we have to show that they satisfy (M1)–(M3). Properties (M1) and (M2) are immediate from the definitions; for (M3), note that K △ M ⊆ (K △ L) ∪ (L △ M) holds for all languages K, L, M.

For requirement (M4): the assumption that L_n ≡_n L implies that the symmetric difference L_n △ L contains only words of length greater than n, and hence d_1(L_n, L) ≤ 2^{−(n+1)} and d_2(L_n, L) ≤ Σ_{m>n} |Σ|^m·(2·|Σ|)^{−m} = 2^{−n}. Both bounds tend to 0 for n → ∞.

From language distances to concept distances

Given a language distance d and a combining function f : [0, ∞)^k → [0, ∞) (satisfying f(x_1, ..., x_k) = 0 iff x_1 = ... = x_k = 0; examples are max and the average avg), we can define a distance between FL0 concept descriptions as follows: if C, D have the normal forms C ≡ ∀L_1.A_1 ⊓ ... ⊓ ∀L_k.A_k and D ≡ ∀M_1.A_1 ⊓ ... ⊓ ∀M_k.A_k, then the concept distance induced by f, d is m_{d,f}(C, D) := f(d(L_1, M_1), ..., d(L_k, M_k)). Using one of the language distances d_1, d_2 introduced above in this setting means that differences between the concepts C, D at larger role depth count less than differences at smaller role depth.
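Both distances are straightforward to compute for finite languages. The following sketch (our own illustration) uses the weighting μ(M) = Σ_{w∈M} (2·|Σ|)^{−|w|} for d_2; the choice of normalizing constant does not affect the comparisons made here:

```python
def sym_diff(K, L):
    """Symmetric difference K △ L of two finite languages."""
    return (K - L) | (L - K)

def d1(K, L):
    """2 raised to the negative length of the shortest differing word."""
    D = sym_diff(K, L)
    return 0.0 if not D else 2.0 ** (-min(len(w) for w in D))

def d2(K, L, alphabet_size):
    """Weighted count of all differing words: weight (2|Σ|)^(-|w|)."""
    return sum((2.0 * alphabet_size) ** (-len(w)) for w in sym_diff(K, L))

# K △ L = {"rs"}: shortest differing word has length 2, so d1 = 2^{-2}.
v1 = d1({"rs", "r"}, {"r"})
# With |Σ| = 2, the single word "rs" has weight (2·2)^{-2} = 1/16.
v2 = d2({"rs"}, set(), 2)
```

Under this weighting, a word u indeed weighs as much as all its proper extensions together: Σ_{v∈Σ^+} (2|Σ|)^{−|u|−|v|} = (2|Σ|)^{−|u|}.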
Lemma 5. Let d be a language distance and f a combining function. Then the concept distance m_{d,f} induced by f, d is indeed a concept distance, i.e., it is equivalence closed, symmetric, and equivalence invariant.

Reducing approximate unification to approximately solving language equations

In the following, we assume that d is a language distance, f a combining function, and m_{d,f} the concept distance induced by f, d. Let C, D be FL0 concept patterns in normal form, as shown in (6), and (7) the corresponding language equations for i = 1, ..., k. We denote the left- and right-hand sides of the equations (7) with φ_i and ψ_i, respectively. The following lemma shows that the degree of violation transfers from finite assignments σ_1, ..., σ_k to the induced substitution ρ(σ_1, ..., σ_k) as defined in (8).
Lemma 7. Assume that d satisfies (M4). Then

inf_{σ∈Sub} v_{m_{d,f}}(σ, C, D) = f(inf_{σ_1∈Ass} v_d(σ_1, φ_1, ψ_1), ..., inf_{σ_k∈Ass} v_d(σ_k, φ_k, ψ_k)).

In case f is computable (in polynomial time), this lemma yields a (polynomial time) reduction of the computation problem for approximate FL0 unification to the computation problem for approximately solving language equations. In addition, we know that the decision problem can be reduced to the computation problem. Thus, it is sufficient to devise a procedure for the computation problem for approximately solving language equations.
In our example, the normal forms of the abbreviated concept descriptions (1) and (3) are

(1): ∀{ε}.A ⊓ ∀{rs}.B ⊓ ∀∅.D ⊓ ∀∅.E ⊓ ∀{r}.X,
(3): ∀{ε}.A ⊓ ∀∅.B ⊓ ∀{r}.D ⊓ ∀{rt}.E ⊓ ∀∅.X.

It is easy to see that the language equations for the concept names A, D, E are solvable, and thus these solutions contribute distance 0 to the overall concept distance. The language equation for the concept name B is {rs} ∪ {r}·X_B = ∅ ∪ ∅·X_B, and the assignment X_B = ∅ leads to the smallest possible symmetric difference {rs}, which w.r.t. d_1 yields the value 2^{−2} = 1/4. It is easy to see that this is actually the infimum for this equation. If we use the combining function avg, then this gives us the infimum 1/16 for our approximate unification problem.
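The claim that 1/4 is the infimum for the equation of B can be confirmed by brute force over small finite assignments. The following sketch (our own illustration) enumerates every value of X_B built from words of length at most 2 over {r, s, t}; since the right-hand side of the equation is always ∅ and the word rs always remains in the left-hand side, no choice does better than 1/4:

```python
from itertools import combinations, product

def d1(K, L):
    """Distance d1: 2^(-length of shortest word in K △ L), 0 if K = L."""
    D = (K - L) | (L - K)
    return 0.0 if not D else 2.0 ** (-min(len(w) for w in D))

# All words of length ≤ 2 over {r, s, t} (including ε).
words = ["".join(w) for n in range(3) for w in product("rst", repeat=n)]

# Enumerate all subsets of these words as candidate values for X_B and
# measure d1 between {rs} ∪ {r}·X_B and ∅.
subsets = (set(c) for k in range(len(words) + 1)
           for c in combinations(words, k))
best = min(d1({"rs"} | {"r" + v for v in X}, set()) for X in subsets)
```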

Approximately solving language equations
In the following, we show how to solve the computation problem for the language distances d_1 and d_2 introduced above. Our solution uses the automata-based approach for solving language equations introduced in [5].
The first step in this approach is to transform the given system of language equations into a single equation of the form φ = ∅ such that the language expression φ is normalized in the sense that all constant languages L occurring in φ are singleton languages {a} for a ∈ Σ ∪ {ε}. This normalization step can easily be adapted to approximate equations, but in addition to a normalized approximate equation φ_a ≈ ∅ it also generates a normalized strict equation φ_s = ∅.
Lemma 8. Let φ, ψ be language expressions. Then we can compute in polynomial time normalized language expressions φ_a and φ_s such that the following holds for d ∈ {d_1, d_2}: inf_{σ∈Ass} v_d(σ, φ, ψ) = inf{d(θ(φ_a), ∅) | θ ∈ Ass with θ(φ_s) = ∅}.

Proof. In [5] (Lemma 1) it is shown how a given system of language equations can be transformed into a single normalized language equation such that there is a one-to-one correspondence between the solutions of the original system and the solutions of the normal form. Given an approximate equation φ ≈ ψ, we first abstract the left- and right-hand side of this equation with new variables X, Y, and add strict equations that say that X must be equal to φ and Y must be equal to ψ, i.e., we consider the approximate equation X ≈ Y together with the strict equations X = φ and Y = ψ. We then apply the normalization approach of [5] to the two strict equations. Basically, this approach introduces new variables and equations that express the regular languages occurring in φ and ψ, using the fact that regular languages can be expressed as unique solutions of language equations in solved form (see [5] for details). The resulting system of strict equations can then be expressed by a single strict equation φ_s = ∅, using the facts that an equation χ = χ′ is equivalent to (χ ∩ ∼χ′) ∪ (χ′ ∩ ∼χ) = ∅ and that a system of equations χ_1 = ∅, ..., χ_ℓ = ∅ is equivalent to the single equation χ_1 ∪ ... ∪ χ_ℓ = ∅. Though it is not explicitly stated in [5], it is easy to see that this transformation is such that, for any assignment σ, there is a solution θ of φ_s = ∅ such that θ(X) = σ(φ) and θ(Y) = σ(ψ).
Conversely, any solution θ of φ_s = ∅ satisfies θ(X) = θ(φ) and θ(Y) = θ(ψ). Consequently, we have {(σ(φ), σ(ψ)) | σ ∈ Ass} = {(θ(X), θ(Y)) | θ ∈ Ass with θ(φ_s) = ∅}. If we now define φ_a := (X ∩ ∼Y) ∪ (Y ∩ ∼X), then the lemma is an easy consequence of the above identity and the fact that both d_1 and d_2 consider only the symmetric difference of the input languages.
This lemma shows that, to solve the computation problem for φ ≈ ψ, we must solve the computation problem for φ a ≈ ∅, but restrict the infimum to assignments that solve the strict equation φ s = ∅.
In a second step, [5] shows how a normalized language equation can be translated into a tree automaton working on the infinite, unlabeled n-ary tree (where n = |Σ|). The nodes of this tree can obviously be identified with Σ * . The automata considered in [5] are such that the state in each successor of a node is determined independently of the choice of the states in its siblings. These automata are called looping tree automata with independent transitions (ILTA).

Definition 6.
An ILTA is of the form A = (Σ, Q, Q_0, δ), where Σ is a finite alphabet, Q is a finite set of states with initial states Q_0 ⊆ Q, and δ : Q × Σ → 2^Q is a transition function that defines the possible successors of a state for each a ∈ Σ. A run of this ILTA is any function r : Σ* → Q with r(ε) ∈ Q_0 and r(wa) ∈ δ(r(w), a) for all w ∈ Σ* and a ∈ Σ.
According to this definition, ILTAs do not have a fixed set of final states. However, by choosing any set of states F ⊆ Q, we can use runs r of A to define languages over Σ as follows: L_r(A, F) := {w ∈ Σ* | r(w) ∈ F}. Given a normalized language equation φ = ∅ with variables {X_1, ..., X_m}, it is shown in [5] how to construct an ILTA A_φ = (Σ, Q_φ, Q_0^φ, δ_φ) and subsets F, F_1, ..., F_m ⊆ Q_φ such that the following holds:

Proposition 1. If r is a run of A_φ, then the induced assignment σ_r with σ_r(X_i) := L_r(A_φ, F_i), for i = 1, ..., m, satisfies σ_r(φ) = L_r(A_φ, F). In addition, every assignment is induced by some run of A_φ.
The size of this ILTA is exponential in the size of φ. In order to decide whether the language equation φ = ∅ has a solution, one thus needs to decide whether A_φ has a run in which no state of F occurs. This can easily be done by removing all states of F from A_φ, and then checking the resulting automaton A_φ − F for emptiness. In fact, as an easy consequence of the above proposition we obtain that there is a 1-1-correspondence between the runs of A_φ − F and the solutions of φ = ∅ (Proposition 2 in [5]). This approach can easily be adapted to the situation where we have an approximate equation φ_a ≈ ∅ and a strict equation φ_s = ∅. Basically, we apply the construction of [5] to φ_a ∪ φ_s, but instead of one set of states F we construct two sets F_a and F_s such that σ_r(φ_a) = L_r(A_{φ_a∪φ_s}, F_a) and σ_r(φ_s) = L_r(A_{φ_a∪φ_s}, F_s) holds for all runs r of A_{φ_a∪φ_s}. By removing all states of F_s from A_{φ_a∪φ_s}, we obtain an automaton whose runs are in 1-1-correspondence with the assignments that solve φ_s = ∅. In addition, we can make this automaton trim (i.e., such that every state occurs in some run) using the polytime construction in the proof of Lemma 2 in [5].
Theorem 2. Given an approximate equation φ_a ≈ ∅ and a strict equation φ_s = ∅, we can construct in exponential time a trim ILTA A = (Σ, Q, Q_0, δ) and sets of states F_a, F_1, ..., F_m ⊆ Q such that every run r of A satisfies σ_r(φ_a) = L_r(A, F_a) and σ_r(φ_s) = ∅. In addition, every assignment σ with σ(φ_s) = ∅ is induced by some run of A.
The measure d_1

Using Lemma 8, Theorem 2, and the definition of d_1, it is easy to see that the computation problem for an approximate language equation φ ≈ ψ can be reduced to solving the following problem for the trim ILTA A = (Σ, Q, Q_0, δ) of Theorem 2: compute sup_{r run of A} min{|w| : r(w) ∈ F_a}. In fact, the infimum of the achievable distances w.r.t. d_1 is then 2^{−s}, where s is this supremum (with 2^{−∞} := 0).
In order to compute this supremum, it is sufficient to compute, for every state q ∈ Q, the length lpr(q) of the longest partial run of A starting with q that does not have states of F_a at non-leaf nodes. More formally, we define:

Definition 7. Let Σ^{≤ℓ} denote the set of all words over Σ of length at most ℓ. Given a trim ILTA A = (Σ, Q, Q_0, δ), a partial run of A of length ℓ from a state q ∈ Q is a mapping p : Σ^{≤ℓ} → Q such that p(ε) = q and p(wa) ∈ δ(p(w), a) for all w ∈ Σ^{≤ℓ−1} and a ∈ Σ. The leaves of p are the words of length ℓ. Finally, for every q ∈ Q we have that lpr(q) := sup{ℓ | there is a partial run p of A of length ℓ from q s.t. p(w) ∉ F_a for all non-leaf nodes w of p}.

Lemma 10. The function lpr : Q → N ∪ {∞} can be computed in time polynomial in the size of A.

Proof. In order to compute lpr, we use an iteration similar to the emptiness test for looping tree automata [6].
If q ∈ F_a, then clearly lpr(q) = 0, and otherwise q has an appropriate partial run of length > 0 (recall that A is trim). For this reason, we start the iteration with Q^(0) := F_a. Next, for i > 0, we define Q^(i) := Q^(i−1) ∪ {q ∈ Q | there is an a ∈ Σ with δ(q, a) ⊆ Q^(i−1)}. We have Q^(0) ⊆ Q^(1) ⊆ Q^(2) ⊆ ... ⊆ Q. Since Q is finite, there is an index j ≤ |Q| such that Q^(j) = Q^(j+1), and thus the iteration becomes stable.
It is easy to show that lpr(q) = min{i | q ∈ Q^(i)} for every q ∈ Q^(j), and lpr(q) = ∞ for every q ∉ Q^(j). To prove the above, the following claim is enough.
Claim. It holds that q ∉ Q^(i) iff there is a partial run of length i + 1 of A starting with q that does not have states of F_a at non-leaf nodes.
Proof (Claim). By induction on i. For i = 0: q ∉ Q^(0) = F_a iff there is a partial run of length 1 of A starting with q that does not have states of F_a at non-leaf nodes (i.e., at the root); such a run exists since A is trim. For i ≥ 1, if q ∉ Q^(i), then for every a ∈ Σ it holds that δ(q, a) ⊄ Q^(i−1), i.e., for every a ∈ Σ there exists a q_a ∈ δ(q, a) \ Q^(i−1). By the induction hypothesis, for every such q_a there is a partial run of length i of A starting with q_a that does not have states of F_a at non-leaf nodes; since additionally q ∉ F_a (as F_a = Q^(0) ⊆ Q^(i)), we can combine these runs into such a run of length i + 1 for q. If q ∈ Q^(i), then either q ∈ F_a (in which case no such run exists) or there exists an a ∈ Σ s.t. δ(q, a) ⊆ Q^(i−1). In the latter case, by the induction hypothesis, there is no partial run of length i starting with p for any p ∈ δ(q, a) that does not have states of F_a at non-leaf nodes. Thus, there is no such run of length i + 1 starting with q, and this completes the proof of the claim.
If q ∉ Q^(j), note that the claim implies that there are such runs of every length n ∈ N, and thus lpr(q) = ∞.
Since the number of iterations is linear in |Q| and every iteration step can obviously be performed in polynomial time, this completes the proof.
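The iteration from the proof can be sketched as follows (on a hand-made toy automaton of our own, not one produced by the construction of [5]):

```python
def lpr(states, alphabet, delta, F_a):
    """Compute lpr(q) for all q. delta[q][a] is the set of allowed
    a-successors of q; the automaton is assumed to be trim.
    Returns a dict mapping q to an int or float('inf')."""
    bad = set(F_a)                 # Q^(0) = F_a
    level = {q: 0 for q in F_a}    # lpr(q) = least i with q ∈ Q^(i)
    i = 0
    while True:
        i += 1
        # Q^(i) adds states with some letter whose successors all lie in Q^(i-1).
        new = {q for q in states if q not in bad
               and any(delta[q][a] <= bad for a in alphabet)}
        if not new:
            break                  # iteration became stable
        for q in new:
            level[q] = i
        bad |= new
    return {q: level.get(q, float("inf")) for q in states}

delta = {
    "p": {"a": {"p"}, "b": {"q", "p"}},
    "q": {"a": {"f"}, "b": {"q"}},
    "f": {"a": {"f"}, "b": {"f"}},
}
res = lpr({"p", "q", "f"}, "ab", delta, {"f"})
```

In this toy automaton, f ∈ F_a gives lpr(f) = 0; every a-successor of q lies in F_a, so lpr(q) = 1; and p can avoid F_a forever via its a-loop, so lpr(p) = ∞.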
The function lpr can now be used to solve the computation problem as follows: the supremum from above equals max_{q∈Q_0} lpr(q). If this maximum is ∞, then the measure d_1 yields value 0 and the approximate equation was actually solvable as a strict one.
Theorem 3. For the distance d_1 and a polytime computable combining function, the computation problem (for approximate FL0 unification and for approximately solving language equations) can be solved in exponential time, and the decision problem is ExpTime-complete.
Proof. The ExpTime upper bounds follow from our reductions and the fact that the automaton A can be computed in exponential time and is thus of at most exponential size. Hardness can be shown by a reduction of the strict problems, which are known to be ExpTime-complete [4,5]. In fact, the proof of Lemma 10 shows that d_1 either yields the value 0 = 2^{−∞} (in which case the strict equation is solvable) or a value larger than 2^{−(|Q|+1)} (in which case the strict equation is not solvable). In other words, for a threshold smaller than 2^{−(|Q|+1)} the decision problem is equivalent to the classical solvability problem.

The measure d_2
Recall that the value of d 2 is obtained by applying the function µ to the symmetric difference of the input languages. In case one of the two languages is empty, its value is thus obtained by applying µ to the other language. It is easy to show that the following lemma holds.
Lemma. For every language L ⊆ Σ^*, µ(L) = (1/2)·χ_L(ε) + (1/(2|Σ|)) Σ_{a∈Σ} µ(a^{−1}L), where a^{−1}L := {w ∈ Σ^* | aw ∈ L} and χ_L is the characteristic function of the language L.
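For intuition, this recurrence (reconstructed above as µ(L) = (1/2)·χ_L(ε) + (1/(2|Σ|)) Σ_a µ(a^{−1}L)) can be evaluated directly on a finite language, where the recursion terminates because every left quotient strictly shortens the remaining words. The sketch below is our own illustration of that computation in exact arithmetic.

```python
# Evaluating mu(L) = (1/2)*chi_L(eps) + (1/(2|Sigma|)) * sum_a mu(a^{-1}L)
# on a *finite* language; the recursion terminates because left quotients
# strictly shorten the words. The recurrence is reconstructed from the
# lemma above (case nu = lambda = 1/2, uniform letter weights).
from fractions import Fraction

def mu(L, Sigma):
    """mu of a finite language L over alphabet Sigma (exact arithmetic)."""
    if not L:
        return Fraction(0)
    value = Fraction(1, 2) if "" in L else Fraction(0)
    for a in Sigma:
        quotient = {w[1:] for w in L if w.startswith(a)}  # a^{-1}L
        value += Fraction(1, 2 * len(Sigma)) * mu(quotient, Sigma)
    return value

# unfolding the recurrence, each word w contributes (1/2)*(2|Sigma|)^(-|w|):
val = mu({"", "ab"}, ["a", "b"])
# "" contributes 1/2 and "ab" contributes (1/2)*(1/4)^2 = 1/32, so val = 17/32
```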
Using Lemma 8, Theorem 2, and the definition of d_2, it is easy to see that the computation problem for an approximate language equation φ ≈ ψ w.r.t. d_2 can be reduced to solving the following problem for the trim ILTA A = (Σ, Q, Q_0, δ) of Theorem 2: compute inf_{r run of A} µ(L_r(A, F_a)). In fact, this is exactly the value that answers the computation problem, as can be seen from the following lemma. In the following, we are only interested in languages defined by runs of A with set of final states F_a; thus, for ease of notation, we write L_r instead of L_r(A, F_a).
Using (10), we now show that this infimum can be computed by solving a system of recursive equations induced by the transitions of A. Given an arbitrary (not necessarily initial) state q ∈ Q, we say that r : Σ^* → Q is a q-run of A if r(ε) = q and r(wa) ∈ δ(r(w), a) for all w ∈ Σ^* and a ∈ Σ. We denote the set of all q-runs of A by R_A(q). Since each run of A is a q_0-run for some q_0 ∈ Q_0, we have inf_{r run of A} µ(L_r) = min_{q_0∈Q_0} inf_{r∈R_A(q_0)} µ(L_r). For all q ∈ Q, we define µ(q) := inf_{r∈R_A(q)} µ(L_r). The identity above shows that we can solve the computation problem for approximate language equations w.r.t. d_2 if we can devise a procedure for computing the values µ(q) ∈ R for all q ∈ Q. The identity (10) can now be used to show the following lemma.
Lemma 13. For all states q ∈ Q we have µ(q) = (1/2)·χ_{F_a}(q) + (1/(2|Σ|)) Σ_{a∈Σ} min_{p∈δ(q,a)} µ(p), where χ_{F_a} denotes the characteristic function of the set F_a.
Proof. Given a run r ∈ R_A(q), for every a ∈ Σ a unique run r_a ∈ R_A(r(a)) is defined by r_a(w) := r(aw). Obviously, r(a) ∈ δ(q, a). Conversely, given runs r_a ∈ R_A(q_a) for every a ∈ Σ, such that q_a ∈ δ(q, a), a unique run r ∈ R_A(q) can be derived by setting r(ε) := q and r(aw) := r_a(w). Hence, for Σ = {a_1, ..., a_n}, there is a bijection between the set of q-runs R_A(q) and the set of "successor" runs SR(q) := (⋃_{p_1∈δ(q,a_1)} R_A(p_1)) × ··· × (⋃_{p_n∈δ(q,a_n)} R_A(p_n)).
Thus, given a run r ∈ R_A(q), we have L_r = L_ε ∪ ⋃_{a∈Σ} a·L_{r_a}, where L_ε = {ε} if q ∈ F_a and L_ε = ∅ otherwise, and hence µ(L_r) = (1/2)·χ_{F_a}(q) + Σ_{a∈Σ} µ(a·L_{r_a}) = (1/2)·χ_{F_a}(q) + (1/(2|Σ|)) Σ_{a∈Σ} µ(L_{r_a}). Taking the infimum over R_A(q), i.e., over the bijection with SR(q), it can be inferred that µ(q) = (1/2)·χ_{F_a}(q) + (1/(2|Σ|)) Σ_{a∈Σ} min_{p∈δ(q,a)} µ(p). By introducing variables x_q (for q ∈ Q) that range over R, we can rephrase this lemma by saying that the values µ(q) yield a solution to the system of equations

x_q = (1/2)·χ_{F_a}(q) + (1/(2|Σ|)) Σ_{a∈Σ} min_{p∈δ(q,a)} x_p for all q ∈ Q. (11)

Thus, to compute the values µ(q) for q ∈ Q, it is sufficient to compute a solution of (11).
Next, we use Banach's fixed point theorem to show that the system has a unique solution. In particular, we transform the system of equations into a contraction on R^k that has the solution of (11) as its fixed point.
For every equation of (11) corresponding to a state q_i ∈ Q, we represent the right-hand side as a function f_i : R^k → R (with k = |Q|), defined by f_i(x_1, ..., x_k) := (1/2)·χ_{F_a}(q_i) + (1/(2|Σ|)) Σ_{a∈Σ} min_{q_j∈δ(q_i,a)} x_j, and we let f := (f_1, ..., f_k) : R^k → R^k. Before proving that f is a contraction, we provide a technical lemma that will be useful in the proof.
Lemma 14. Given a finite set of indices I and a set {J(i) ⊆ I | i ∈ I}, it holds that max_{i∈I} |min_{j∈J(i)} x_j − min_{j∈J(i)} y_j| ≤ max_{i∈I} |x_i − y_i|.

Proof. For every i ∈ I, choose k_i ∈ J(i) such that x_{k_i} = min_{j∈J(i)} x_j, and l_i ∈ J(i) such that y_{l_i} = min_{j∈J(i)} y_j. Suppose without loss of generality that x_{k_i} ≤ y_{l_i}; otherwise the exact symmetric argument can be used. Then |min_{j∈J(i)} x_j − min_{j∈J(i)} y_j| = y_{l_i} − x_{k_i} ≤ y_{k_i} − x_{k_i} ≤ max_{j∈I} |x_j − y_j|, where the first inequality holds since y_{l_i} ≤ y_{k_i}. Taking the maximum over all i ∈ I yields the claim.
The following lemma provides the last condition required for Theorem 1.
Lemma 15. The function f defined above is a contraction in (R^k, d_∞).
Finally, since (R^k, d_∞) is complete, from Theorem 1 and Proposition 2 we get the following.
Lemma 16. The system of equations (11) has a unique solution.
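Because f is a contraction with factor 1/2, simply iterating f from any starting vector converges to this unique solution, halving the error in each step. The sketch below illustrates this; the dict-based automaton encoding, iteration count, and toy automaton are our own illustrative choices.

```python
# Approximating the unique solution of (11) by fixed-point iteration:
#   x_q <- 1/2*chi_Fa(q) + 1/(2|Sigma|) * sum_a min_{p in delta(q,a)} x_p.
# f is a contraction with factor 1/2, so the error halves per iteration.
# The automaton encoding is our own illustrative choice.

def solve_system(Q, Sigma, delta, Fa, steps=100):
    x = {q: 0.0 for q in Q}
    for _ in range(steps):
        x = {q: 0.5 * (q in Fa)
                + sum(min(x[p] for p in delta[q, a]) for a in Sigma)
                  / (2 * len(Sigma))
             for q in Q}
    return x

# toy automaton: state 1 (in F_a) only reaches itself,
# state 0 can always avoid F_a via its b-successor
delta = {(0, 'a'): [0, 1], (0, 'b'): [0], (1, 'a'): [1], (1, 'b'): [1]}
mu = solve_system([0, 1], ['a', 'b'], delta, Fa={1})
# converges to mu(0) = 0 and mu(1) = 1
```

Iteration only approximates the solution, however; the linear-programming encoding discussed next computes it exactly.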
In order to actually compute this unique solution, we can use linear programming [15].
A Linear Programming problem, or LP problem, is a set of linear restrictions along with a linear objective function. In its most general form, an LP problem looks like this: maximize Σ_{j=1}^n c_j x_j subject to Σ_{j=1}^n a_{ij} x_j ≤ b_i for i = 1, ..., m. The feasible region of the LP problem consists of all the tuples (x_1, ..., x_n) that satisfy the restrictions. The answer to an LP problem is a tuple in the feasible region that maximizes the objective function, or "no" if the feasible region is empty.
It is well known that LP problems are solvable in polynomial time in the size of the problem [15]. From the system of equations (11) we can derive an LP problem. The only non-trivial step in this translation is to express the minimum operator. For this, we introduce additional variables y_{q,a}, which intuitively stand for min_{p∈δ(q,a)} x_p. Then (11) is transformed into the equations

x_q = (1/2)·χ_{F_a}(q) + (1/(2|Σ|)) Σ_{a∈Σ} y_{q,a} for all q ∈ Q. (12)

To express the intuitive meaning of the variables y_{q,a}, we add the inequalities

y_{q,a} ≤ x_p for all q ∈ Q, a ∈ Σ, and p ∈ δ(q, a), (13)

as well as the objective to maximize the values of these variables:

maximize Σ_{q∈Q, a∈Σ} y_{q,a}. (14)

Lemma 17. The LP problem consisting of the equations (12), the inequations (13), and the objective (14) has the unique solution given by x_q = µ(q) and y_{q,a} = min_{p∈δ(q,a)} µ(p) for all q ∈ Q and a ∈ Σ.

Proof. First, observe that this vector is in the feasible region, since it satisfies the restrictions (12) and (13). Next, we proceed to show that it is indeed the only point that maximizes the objective function. We need the following claim.
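Concretely, the LP derived from (12), (13), and (14) can be handed to an off-the-shelf solver. The sketch below uses scipy.optimize.linprog; the solver choice, variable layout, and dict-based toy automaton are illustrative assumptions of ours, not part of the report.

```python
# Solving the LP (12)-(14) with scipy.optimize.linprog (which minimizes,
# so we negate the objective). Variable layout: first the x_q, then the
# y_{q,a}. Automaton encoding and toy example are our own choices.
from scipy.optimize import linprog

def mu_values(Q, Sigma, delta, Fa):
    """Return {q: mu(q)} by solving the LP derived from (11)."""
    n, m = len(Q), len(Sigma)
    xi = {q: i for i, q in enumerate(Q)}
    yi = {(q, a): n + i * m + j
          for i, q in enumerate(Q) for j, a in enumerate(Sigma)}
    nvar = n + n * m
    # equalities (12): x_q - (1/(2m)) * sum_a y_{q,a} = (1/2) * chi_Fa(q)
    A_eq, b_eq = [], []
    for q in Q:
        row = [0.0] * nvar
        row[xi[q]] = 1.0
        for a in Sigma:
            row[yi[q, a]] = -1.0 / (2 * m)
        A_eq.append(row)
        b_eq.append(0.5 if q in Fa else 0.0)
    # inequalities (13): y_{q,a} <= x_p for every p in delta(q, a)
    A_ub, b_ub = [], []
    for q in Q:
        for a in Sigma:
            for p in delta[q, a]:
                row = [0.0] * nvar
                row[yi[q, a]] = 1.0
                row[xi[p]] = -1.0
                A_ub.append(row)
                b_ub.append(0.0)
    # objective (14): maximize the sum of the y's
    c = [0.0] * nvar
    for v in yi.values():
        c[v] = -1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1))  # all values lie in [0, 1]
    return {q: res.x[xi[q]] for q in Q}

# toy automaton: state 1 (in F_a) only reaches itself,
# state 0 can always avoid F_a via its b-successor
delta = {(0, 'a'): [0, 1], (0, 'b'): [0], (1, 'a'): [1], (1, 'b'): [1]}
vals = mu_values([0, 1], ['a', 'b'], delta, Fa={1})
```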
Claim. If x is a solution that maximizes the objective function then, for every q ∈ Q and every a ∈ Σ, at least one of the inequalities (13) holds as an equality.
Proof (Claim). Suppose to the contrary that x is a solution that maximizes z, but for some q ∈ Q and a ∈ Σ the inequalities y_{q,a} < x_p are strict for all p ∈ δ(q, a). Then the value of y_{q,a} can be increased until it actually becomes equal to min_{p∈δ(q,a)} x_p, and all inequalities (13) still hold. The only restriction that would be violated is the equality of the form (12) for the state q; this can easily be mended by setting x_q equal to its (increased) right-hand side, a change that does not affect any of the other restrictions, since raising x_q only relaxes the inequalities in which it occurs. Thus, a new point has been produced that satisfies all the restrictions of the LP problem and additionally gives a larger value for the objective function. This contradicts our initial assumption about x and completes the proof of the claim.
As a result, any point that is a solution to the LP problem satisfies the condition y_{q,a} = min_{p∈δ(q,a)} x_p for all q and a. Given that it also satisfies the equality constraints (12) (since it is in the feasible region), it corresponds to a solution of the system of equations (11).
Finally, since there is a unique such solution, the solution of the LP problem is this unique solution.
Since LP problems can be solved in polynomial time and the size of the LP problem in the above lemma is polynomial in the size of A, we obtain an ExpTime-upper bound for the computation problem and the decision problem. ExpTime-hardness can again be shown by a reduction of the strict problem.
Even though the main idea is the same, the formal proof is considerably more technical in this case. First note that solving (11) induces a "best" q-run of the automaton for every q: in every step, pick a state with minimal µ-value among all possible successors. Given such a best run of the automaton, we say that p is a descendant of q at depth d if there are states q_0 := q, q_1, ..., q_{d−1}, q_d := p and a word a_1 ··· a_d such that q_i = arg min_{q'∈δ(q_{i−1},a_i)} µ(q') for i = 1, ..., d. A bad descendant of a state q is a state p ∈ F_a that is a descendant of q. Note that, if q ∈ F_a, then µ(q) ≥ 1/2, and if q ∉ F_a, then µ(q) ≤ 1/2.
Lemma 18. For a state q it holds that µ(q) > 0 if and only if q has a bad descendant.
Proof. If q has a bad descendant p, say at depth d, then there is a branch of the best run with nodes labeled q, q_1, ..., p, and thus µ(q) ≥ (1/(2|Σ|))^d · µ(p) ≥ (1/(2|Σ|))^d · 1/2 > 0. Conversely, suppose that q has no bad descendant. Then q ∉ F_a, and the same holds for all of its descendants. Hence µ(q) = (1/(2|Σ|)) Σ_{a∈Σ} µ(p_a) ≤ (1/2)·µ(p) for the child p of q with maximal µ-value. Iterating this for d steps, we get µ(q) ≤ (1/2)^d · µ(p') for some descendant p' of q. But since p' ∉ F_a, we have µ(p') ≤ 1/2, and thus µ(q) ≤ (1/2)^(d+1). Since this holds for every d, it can be concluded that µ(q) = 0.
Lemma 19. If q has a bad descendant, then q has a bad descendant at depth at most |Q|.
Proof. Set k = |Q|. Suppose that there exists a q_0 ∈ Q with no bad descendants up to depth k. We prove that there is a q_0-run with no states from F_a, i.e., that q_0 has no bad descendants at all.
Having no bad descendants up to depth k implies that there is a partial run of length k of A starting with q_0 that contains no states of F_a. Consider a branch of length k, with nodes labeled by the states q_0, q_1, ..., q_k. Since there are only k states, there are indices i < j ≤ k such that q_i = q_j. The subtree rooted at the node labeled q_i is taller than the one rooted at the node labeled q_j; replace the latter subtree with the former. Then all branches passing through the node labeled q_j have length at least k + 1. Iterating this procedure for all branches yields a partial run of length at least k + 1, and every repetition of this step yields a longer partial run. We conclude that a partial run of infinite length, i.e., a q_0-run of A, can be derived that has no states from F_a. Thus q_0 has no bad descendants.
Lemma 20. For the distance d 2 , the decision problem for approximately solving language equations is ExpTime-hard.
Proof. If µ(q) > 0, then q has a bad descendant at depth at most |Q|. Thus µ(q) ≥ (1/(2|Σ|))^|Q| · 1/2 =: t. We conclude that the decision problem with threshold t has a positive answer for the equation φ ≈ ∅ iff the equation φ = ∅ has a solution. Since the latter problem is ExpTime-complete, we get an ExpTime-hardness result for our problem as well.
Theorem 4. For the distance d 2 and a polytime computable combining function, the computation problem (for approximate FL 0 unification and for approximately solving language equations) can be solved in exponential time, and the decision problem is ExpTime-complete.
For this theorem to hold, the exact definition of the distance d 2 is actually not important. Our approach works as long as the distance induces a system of equations similar to (11) such that Banach's fixed point theorem ensures the existence of a unique solution, which can be found using linear programming.
The condition that a difference for the word u counts as much as the sum of all differences for the words uv properly extending u holds if we set λ = 1/2. Furthermore, note that for ν = λ = 1/2 and wt(a) = 1/|Σ| for every a ∈ Σ, we obtain the function µ defined for d_2.
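As a small sanity check of this parameterization, the sketch below evaluates a parameterized recurrence of the shape µ(L) = ν·χ_L(ε) + λ·Σ_a wt(a)·µ(a^{−1}L) on a finite language; the general shape of this recurrence, and the names nu, lam, and wt, are our own reconstruction from the parameters mentioned above.

```python
# Parameterized recurrence, evaluated on a finite language:
#   mu(L) = nu * chi_L(eps) + lam * sum_a wt(a) * mu(a^{-1}L).
# ASSUMPTION: this general shape is reconstructed from the parameters
# nu, lam, wt discussed above; it is not quoted from the report.
from fractions import Fraction

def mu_param(L, wt, nu, lam):
    if not L:
        return Fraction(0)
    value = nu if "" in L else Fraction(0)
    for a, w in wt.items():
        quotient = {v[1:] for v in L if v.startswith(a)}  # a^{-1}L
        value += lam * w * mu_param(quotient, wt, nu, lam)
    return value

half = Fraction(1, 2)
weights = {'a': half, 'b': half}   # uniform weights wt(a) = 1/|Sigma|
val = mu_param({"", "ab"}, weights, nu=half, lam=half)
# with nu = lam = 1/2 and uniform weights this instantiates the d_2
# measure: "" contributes 1/2 and "ab" contributes 1/2*(1/4)^2 = 1/32
```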

Conclusion
We have extended unification in DLs to approximate unification in order to enhance the recall of this method of finding redundancies in DL-based ontologies. For the DL FL 0 , unification can be reduced to solving certain language equations [4]. We have shown that, w.r.t. two particular distance measures, this reduction can be extended to the approximate case. Interesting topics for future research are considering approximate unification for other DLs such as EL [3]; different distance measures for FL 0 and other DLs, possibly based on similarity measures between concepts [13,16]; and approximately solving other kinds of language equations [12].
Approximate unification has been considered in the context of similarity-based Logic Programming [9], based on a formal definition of proximity between terms. The definition of proximity used in [9] is quite different from our distances, but the major difference to our work is that [9] extends syntactic unification to the approximate case, whereas unification in FL 0 corresponds to unification w.r.t. the equational theory ACUIh (see [4]). Another topic for future research is to consider unification w.r.t. other equational theories. First, rather simple, results for the theory ACUI , which extend the results for strict ACUI -unification [10], can be found in [2].