What are the methods of inference in First order Logic

inference in first order logic in artificial intelligence, what is inference in first order logic, and inference in first order logic in artificial intelligence pdf free download
HalfoedGibbs Profile Pic
HalfoedGibbs,United Kingdom,Professional
Published Date:02-08-2017
Your Website URL(Optional)
INFERENCE IN 9 FIRST-ORDER LOGIC In which we dejine eflective procedures for answering questions posed in jirst- order logic. Chapter 7 defined the notion of inference and showed how sound and complete inference can be achieved for propositional logic. In this chapter, we extend those results to obtain algo- rithms that can answer any answerable question stated in first-order logic. This is significant, because more or less anything can be stated in first-order logic if you work hard enough at it. Section 9.1 introduces inference rules for quantifiers and shows how to reduce first- order inference to propositional inference, albeit at great expense. Section 9.2 describes the idea of unification, showing how it can be used to construct inference rules that work di- rectly with first-order sentences. We then discuss three major families of first-order inference algorithms: forward chaining and its applications to deductive databases and production systems are covered in Section 9.3; backward chaining and logic programming systems are developed in Section 9.4; and resolution-based theorem-proving systems are described in Section 9.5. In general, one tries to use the most efficient method that can accommodate the facts and axioms that need to be expressed. Reasoning with fully general first-order sentences using resolution is usually less efficient than reasoning with definite clauses using forward or backward chaining. This section and the next introduce the ideas underlying modern logical inference systems. We begin with some simple inference rules that can be applied to sentences with quantifiers to obtain sentences without quantifiers. These rules lead naturally to the idea thatrst-order inference can be done by converting the knowledge base to propositional logic and using propositional inference, which we already know how to do. The next section points out an obvious shortcut, leading to inference methods that manipulate first-order sentences directly. Section 9.1. Propositional vs. First-Order Inference 273 Inference rules for quantifiers Let us begin with universal quantifiers. Suppose our knowledge base contains the standard folkloric axiom stating that all greedy kings are evil: tJ x King(x) A Greedy (x) + Euil(x) . Then it seems quite permissible to infer any of the following sentences: King(John) A Greedy(John) + Evil (John) . King(Richard) A Greedy(Richard) + Evil(Richard) . King(Father(John)) A Greedy (Father(John)) + Euil(Father( John)) . UNIVERSAL The rule of Universal Instantiation (UI for short) salys that we can infer any sentence ob- INsTANTlATloN tained by substituting a ground term (a term without varialbles) for the variable. ' To write out the inference rule formally, we use the notion of substitutions introduced in Section 8.3. Let SUBST(, a) denote the result of applying the substitution 8 to the sentence a. Then the rule is written Vv a SUBST(V/), 4 for any variable v and ground term g. For example, the three sentences given earlier are obtained with the substitutions 1x1 John), x/Richard), arid x/Father(Jolzn)). EXISTENTIAL The corresponding Existential Instantiation rule: for the existential quantifier is slightly INSTANTIATION more complicated. For any sentence a, variable v, and constant symbol k that does not appear elsewhere in the knowledge base, For example, from the sentence 3 2 Crown(x) A OnHead (x, John) we can infer the sentence Crown(C1) A OnHead(Cl, John) as long as Cl does not appear elsewhere in the knowledge base. Basically, the existential sentence says there is some object satisfying a conditioln, and the instantiation process is just giving a name to that object. Naturally, that name musl. not already belong to another object. Mathematics provides a nice example: suppose we discover that there is a number that is a little bigger than 2.71828 and that satisfies the equation d(xy)/dy = xy for x. We can give this number a name, such as e, but it would be a mistake to give it the name of an existing object, SKOLEMCONSTANT such as T. In logic, the new name is called a Skolem constant. Existential Instantiation is a special case of a more general process called skolemization, which we cover in Section 9.5. Do not confuse these substitutions with the extended interpretations used to define the semantics of quantifiers. The substitution replaces a variable with a term (a piece of syntax) to produce a new sentence, ,whereas an interpretation maps a variable to an object in the domain. 274 Chapter 9. Inference in First-Order Logic As well as being more complicated than Universal Instantiation, Existential Instanti- ation plays a slightly different role in inference. Whereas Universal Instantiation can be applied many times to produce many different consequences, Existential Instantiation can be applied once, and then the existentially quantified sentence can be discarded. For example, once we have added the sentence Kill(Murderer, Victim), we no longer need the sentence 3 x Kill(x, Victim). Strictly speaking, the new knowledge base is not logically equivalent INFERENTIAL to the old, but it can be shown to be inferentially equivalent in the sense that it is satisfiable EQUIVALENCE exactly when the original knowledge base is satisfiable. Reduction to propositional inference Once we have rules for inferring nonquantified sentences from quantified sentences, it be- comes possible to reduce first-order inference to propositional inference. In this section we will give the main ideas; the details are given in Section 9.5. The first idea is that, just as an existentially quantified sentence can be replaced by one instantiation, a universally quantified sentence can be replaced by the set of all possible instantiations. For example, suppose our knowledge base contains just the sentences 'd x Kzng (x) A Greedy(x) + Evil (x) King (John) Greedy (John) Brother(Richard, John) . Then we apply UI to the first sentence using all possible ground term substitutions from the vocabulary of the knowledge base-in this case, xl John) and x/Richard). We obtain King( John) A Greedy( John) + Evil (John) , King (Richard) A Greedy (Richard) + Evil (Richard) , and we discard the universally quantified sentence. Now, the knowledge base is essentially propositional if we view the ground atomic sentences-Kzng (John), Greedy (John), and so on- as proposition symbols. Therefore, we can apply any of the complete propositional algorithms in Chapter 7 to obtain conclusions such as Evil (John). PROPOSITIONALIZATION This technique of propositionalization can be made completely general, as we show in Section 9.5; that is, every first-order knowledge base and query can be propositionalized in such a way that entailment is preserved. Thus, we have a complete decision procedure for entailment . . . or perhaps not. There is a problem: When the knowledge base includes a function symbol, the set of possible ground term substitutions is infinite For example, if the knowledge base mentions the Father symbol, then infinitely many nested terms such as Father(Father(Father(J0hn))) can be constructed. Our propositional algorithms will have difficulty with an infinitely large set of sentences. Fortunately, there is a famous theorem due to Jacques Herbrand (1930) to the effect that if a sentence is entailed by the original, first-order knowledge base, then there is a proof involving just anite subset of the propositionalized knowledge base. Since any such subset has a maximum depth of nesting among its ground terms, we can find the subset by first generating all the instantiations with constant symbols (Richard and John), then all terms of Section 9.2. Unification and Lifting 275 depth 1 (Father(Richard) and Father(John)), then all tenns of depth 2, and so on, until we are able to construct a propositional proof of the entailed sentence. We have sketched an approach to first-order inference via propositionalization that is complete-that is, any entailed sentence can be proved. This is a major achievement, given that the space of possible models is infinite. On the other hand, we do not know until the proof is done that the sentence is entailed What happens when the sentence is not entailed? Can we tell? Well, for first-order logic, it turns out that we cannot. Our proof procedure can go on and on, generating more and more deeply nested terms, but we will not know whether it is stuck in a hopeless loop or whether the proof is just about to pop out. This is very much like the halting problem for Turing machines. Alan firing (1936) and Alonzo Church (1936) both proved, in rather different ways, the inevitability of this state of affairs. The question of entailment for first-order logic is semidecidable-that is, algorithms exist that say yes to every entailed sentence, but no algorithm exists that also says no IO every nonentailed sentence. 9.2 UNIFICATION AND LIFTING The preceding section described the understanding of first-order inference that existed up to the early 1960s. The sharp-eyed reader (and certainly the computational logicians of the early 1960s) will have noticed that the propositionalization approach is rather inefficient. For example, given the query Evil(x) and the knowledge: base in Equation (9.1), it seems per- verse to generate sentences such as King(Richard) A Greecly(Richard) + Evil (Richard). Indeed, the inference of Evil (John) from the sentences 'd x King (x) A Greedy (x) + Evil (x) King (John) Greedy( John) seems completely obvious to a human being. We now show how to make it completely obvious to a computer. A first-order inference rule The inference that John is evil works like this: find some :c such that x is a king and x is greedy, and then infer that this x is evil. More generally, if there is some substitution 8 that makes the premise of the implication identical to sentences already in the knowledge base, then we can assert the conclusion of the implication, after applying 8. In this case, the substitution xi John) achieves that aim. We can actually make the inference step do even more work. Suppose that instead of knowing Greedy(John), we know that everyone is greedy: V y Greedy (y) . Then we would still like to be able to conclude that Evil(John), because we know that John is a king (given) and John is greedy (because everyone is greedy). What we need for this to work is find a substitution both for the variables in the implication sentence 276 Chapter 9. Inference in First-Order Logic and for the variables in the sentences to be matched. In this case, applying the substitution XI John, y / John) to the implication premises King (x) and Greedy (x) and the knowledge base sentences King(John) and Greedy(y) will make them identical. Thus, we can infer the coilclusion of the implication. This inference process can be captured as a single inference rule that we call General- GENERA PONENS LIZED ized Modus Ponens: For atomic sentences p,, p,', and q, where there is a substitution 8 such that Suesr(Q,pil) = Suesr(B,p,), for all i, There are n + 1 premises to this rule: the n atomic sentences pi' and the one implication. The conclusion is the result of applying the substitution 0 to the consequent q. For our example: pll is King(John) pl is Kzng(x) p2' is Greedy (y) p2 is Greedy(x) 8 is x/ John, y/ John) q is Evil (x) SUBST(, q) is Evil (John) . It is easy to show that Generalized Modus Ponens is a sound inference rule. First, we observe that, for any sentence p (whose variables are assumed to be universally quantified) and for any substitution 8, p I= Suss(8,p) . This holds for the same reasons that the Universal Instantiation rule holds. It holds in partic- ular for a 8 that satisfies the conditions of the Generalized Modus Ponens rule. Thus, from pll,. . . ,pnl we can infer SUB ST(,') A .. . A SUB ST(,,') and from the implication pl A . . . A p, j q we can infer SUI3sT(B,pl) A .. . A Suss(B,p,) +- SUBST(, q) . Now, 8 in Generalized Modus Ponens is defined so that SUBST(, pi') = SUBST(, pi), for all i; therefore the first of these two sentences matches the premise of the second exactly. Hence, Suss(19, q) follows by Modus Ponens. LIFTING Generalized Modus Ponens is a lifted version of Modus Ponens-it raises Modus Po- nens from propositional to first-order logic. We will see in the rest of the chapter that we can develop lifted versions of the forward chaining, backward chaining, and resolution algorithms introduced in Chapter 7. The key advantage of lifted inference rules over propositionalization is that they make only those substitutions which are required to allow particular inferences to proceed. One potentially confusing point is that in one sense Generalized Modus Ponens is less general than Modus Ponens (page 21 1): Modus Ponens allows any single a on the left-hand side of the implication, while Generalized Modus Ponens requires a special format for this sentence. It is generalized in the sense that it allows any number of Pi'. Unification Lifted inference rules require finding substitutions that make different logical expressions UNIFICATION look identical. This process is called unification and is a key component of all first-order Section 9.2. Unification and Lifting 277 UNIFIER inference algorithms. The U NIFY algorithm takes tcvo sentences and returns a unifier for them if one exists: UNIFY(, q) = 0 where SUBST(O,) = SUBST(, q) . Let us look at some examples of how UNIFY should behave. Suppose we have a query Knows (John, x) : whom does John know? Some answers to this query can be found by find- ing all sentences in the knowledge base that unify with Knows (John, x). Here are the results of unification with four different sentences that might be in the lcnowledge base. UN(Knows(John, x), Knows(John, Jane)) = XI Jane) UlFY(Knows(John, x), Knows(y, Bill)) = x/Bi11, y/ John) UNIFY(KOWS (John, x), Knows(y, other())) = y/ John, x/Mother(John) UNIFY (Knows( John, x), Knows (x, Elizabeth)) = Jail . The last unification fails because x cannot take on the values John and Elizabeth at the same time. Now, remember that Knows(x, Elizabeth) meails "Everyone knows Elizabeth," so we should be able to infer that John knows Elizabeth. The problem arises only because the two sentences happen to use the same variable name, 1:. The problem can be avoided by standardizing apart one of the two sentences beiing unified, which means renaming its APART variables to avoid name clashes. For example, we can rename x in Knows(x, Elizabeth) to 217 (a new variable name) without changing its meaning. Now the unification will work: UIF(Knows(John, x), Knows(z17, Elizabeth)) = z/Elizabeth, z17/ John) . Exercise 9.7 delves further into the need for standardizing apart. There is one more complication: we said that U NIFY should return a substitution that makes the two arguments look the same. But there coilld be more than one such uni- fier. For example, UY(Knows(John, x), Knows(y, z)) could return y/ John, x/z) or y/ John, z/ John, z/ John). The first unifier gives Knows(John, z) as the result of unifi- cation, whereas the second gives Knows (John, John), The second result could be obtained from the first by an additional substitution z/ John); we say that the first unifier is more general than the second, because it places fewer restrictions on the values of the variables. It U MOSTGENERAL NIFIER turns out that, for every unifiable pair of expressions, there is a single most general unifier (or MGU) that is unique up to renaming of variables. In this case it is y/ John, x/z). An algorithm for computing most general unifiers is shown in Figure 9.1. The process is very simple: recursively explore the two expressions simultaneously "side by side," building up a unifier along the way, but failing if two corresponding points in the structures do not match. There is one expensive step: when matching a variable against a complex term, one must check whether the variable itself occurs inside the term; if it does, the match fails OCCURCHECK because no consistent unifier can be constructed. This so-called occur check makes the complexity of the entire algorithm quadratic in the size of the expressions being unified. Some systems, including all logic programming systems, sirnply omit the occur check and sometimes make unsound inferences as a result; other system:s use more complex algorithms with linear-time complexity. 278 Chapter 9. Inference in First-Order Logic function UNIFY(Z, y, 0) returns a substitution to make x and y identical inputs: x, a variable, constant, list, or compound y, a variable, constant, list, or compound 0, the substitution built up so far (optional, defaults to empty) if 0 = failure then return failure else if x = y then return 0 else if VARIABLE?(%) then return UNIFY-VAR(X, y, 0) else if VARIABLE?() then return UNIFY-VAR(, x, 0) else if COMPOUND?(X) and COMPOUND?() then return UNIFY(ARGSX, ARGSy, UNIFY(OPx, OPy, 6)) else if LIST?(X) and LIST?() then return UNIFY(RESTz, REST, UNIFY(FIRSTX, FIRST, 0)) else return failure function UNIFY-V(var, x, 0) returns a substitution inputs: var, a variable x, any expression 0, the substitution built up so far if var/val E 6' then return UNIFY(VU, a, 0) else if x/val E 6' then return UNIFY(U, val, 8) else if OCCUR-CHECK?(, z) then return failure else return add varlx) to 6' Figure 9.1 The unification algorithm. The algorithm works by comparing the structures of the inputs, element by element. The substitution 0 that is the argument to UNIFY is built up along the way and is used to make sure that later comparisons are consistent with bindings that were established earlier. In a compound expression, such as F(A, B), the function OP picks out the function symbol F and the filnction ARCS picks out the argument list (A, B). Storage and retrieval Underlying the TELL and ASK functions used to inform and interrogate a knowledge base are the more primitive STORE and FETCH functions. STORE(S) stores a sentence s into the knowledge base and FETCH() returns all unifiers such that the query q unifies with some sentence in the knowledge base. The problem we used to illustrate unification-finding all facts that unify with Knows(John, x)-is an instance of FETCH. The simplest way to implement STORE and FETCH is to keep all the facts in the knowl- edge base in one long list; then, given a query q, call UNIFY(, s) for every sentence s in the list. Such a process is inefficient, but it works, and it's all you need to understand the rest of the chapter. The remainder of this section outlines ways to make retrieval more efficient, and can be skipped on first reading. We can make FETCH more efficient by ensuring that unifications are attempted only with sentences that have some chance of unifying. For example, there is no point in trying Section 9.2. Unification and Lifting 279 to unify Knows(John, x) with Brother(Richard, John). We can avoid such unifications by INDEXING indexing the facts in the knowledge base. A simple scheme called predicate indexing puts PREDICATE all the Knows facts in one bucket and all the Brother facts in another. The buckets can be INDEXING 2 stored in a hash table for efficient access. Predicate indexing is useful when there are many predicate symbols but only a few clauses for each symbol. In some applications, however, there are many clauses for a given predicate symbol. For example, suppose that the tax authorities want to keep track of who employs whom, using a predicate Employs(z, y). This would be a very large bucket with perhaps millions of employers and tens of millions of employees. Answering a query such as Employs(x, Richard) with predicate indexing would require scanning the entire bucket. For this particular query, it would help if facts were indexed both by predicate and by second argument, perhaps using a combined hash table key. Then we could simply construct the key from the query and retrieve exactly those facts that unify with the query. For other queries, such as Employs (AIMA. org , y), we would need to have indexed the facts by com- bining the predicate with the first argument. Therefiore, facts can be stored under multiple index keys, rendering them instantly accessible to various queries that they might unify with. Given a sentence to be stored, it is possible to construct indices for all possible queries that unify with it. For the fact Employs(AIMA.org, Richand), the queries are Employs (A IMA. org, Richard) Does AIMA.org ernploy Richard? Employs (x, Richard) Who employs Richard? Employs(AIMA.org, y) Whom does AIMA..org employ? Ernlos(x, Y) Who employs whom? LA sUBsUMPTloN TTICE These queries form a subsumption lattice, as shown in Figure 9.2(a). The lattice has some interesting properties. For example, the child of any node in the lattice is obtained from its parent by a single substitution; and the "highest" common descendant of any two nodes is the result of applying their most general unifier. The portion of the lattice above any ground fact can be constructed systematically (Exercise 9.5). A sentence with repeated constants has a slightly different lattice, as shown in Figure 9.2(b). Function symbols and variables in the sentences to be stored introduce still more interesting lattice structures. The scheme we have described works very well whenever the lattice contains a small number of nodes. For a predicate with n arguments, the lattice contains 0(2n) nodes. If function symbols are allowed, the number of nodes is also exponential in the size of the terms in the sentence to be stored. This can lead to a huge number of indices. At some point, the benefits of indexing are outweighed by the costs of storing and maintaining all the indices. We can respond by adopting a fixed policy, such as maintaining indices only on keys composed of a predicate plus each argument, or by using an adaptive policy that creates indices to meet the demands of the kinds of queries being asked. For most A1 systenas, the number of facts to be stored is small enough that efficient indexing is considered a solved problem. For industrial and commercial databases, the problem has received substantral technology development. A hash table is a data structure for storing and retrieving information indexed by fixed keys. For practical purposes, a hash table can be considered to have constant storage and retrieval times, even when the table contains a very large number of items. 280 Chapter 9. Inference in First-Order Logic Employs(John, John) I (a) (b) Figure 9.2 (a) The subsumption lattice whose lowest node is the sentence Employs(AIMA.org, Richard). (b) The subsumption lattice for the sentence Employs(John, John). A forward-chaining algorithm for propositional definite clauses was given in Section 7.5. The idea is simple: start with the atomic sentences in the knowledge base and apply Modus Ponens in the forward direction, adding new atomic sentences, until no further inferences can be made. Here, we explain how the algorithm is applied to first-order definite clauses and how it can be implemented efficiently. Definite clauses such as Situation + Response are especially useful for systems that make inferences in response to newly arrived information. Many systems can be defined this way, and reasoning with forward chaining can be much more efficient than resolution theorem proving. Therefore it is often worthwhile to try to build a knowledge base using only definite clauses so that the cost of resolution can be avoided. First-order definite clauses First-order definite clauses closely resemble propositional definite clauses (page 217): they are disjunctions of literals of which exactly one is positive. A definite clause either is atomic or is an implication whose antecedent is a conjunction of positive literals and whose conse- quent is a single positive literal. The following are first-order definite clauses: King (x) A Greedy (x) + Evil (x) . Kzng( John) . Greedy(y) . Unlike propositional literals, first-order literals can include variables, in which case those variables are assumed to be universally quantified. (Typically, we omit universal quantifiers when writing definite clauses.) Definite clauses are a suitable normal form for use with Generalized Modus Ponens. Not every knowledge base can be converted into a set of definite clauses, because of the single-positive-literal restriction, but many can. Consider the following problem: The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American. Section 9.3. Forward Chaining 28 1 We will prove that West is a criminal. First, we will represent these facts as first-order definite clauses. The next section shows how the forward-chaining algorithm solves the prloblem. ". . . it is a crime for an American to sell weapons to hostile nations": "Nono . . . has some missiles." The sentence 3 x Owns (Nono, .rc) A Missile (x) is transformed into two definite clauses by Existential Elimination, introducing a new constant MI : Owns (Nono, MI) Mzssile(Ml) "All of its missiles were sold to it by Colonel West": fissile (x) A Owns (Nono, x) + Sells( West, z, Nono) . (9.6) We will also need to know that missiles are weapons: Missile (x) =+ Weapon (x) (9.7) and we must know that an enemy of America counts as "hostile": Enemy(x, America) + Hostile(x) . (9.8) "West, who is American . . .": American( West) . (9.9) "The country Nono, an enemy of America . . .": Enemy(Nono, Anzerica) . (9.10) This knowledge base contains no function symbols and is therefore an instance of' the class DATALOG of Datalog knowledge blases-that is, sets of first-order definite clauses with no function symbols. We will see that the absence of function symbols makes inference much easier. A simple forward-chaining algorithm The first foiward chaining algorithm we will consider is a very simple one, as shown in Figure 9.3. Starting from the known facts, it triggers all the rules whose premises are satisfied, adding their conclusions lo the known facts. The process repeats until the query is answered (assuming that just one answer is required) or no new facts are added. Notice that a fact is RENAMING not "new" if it is just a renaming of a known fact. One sentence is a renaming of another if they are identical except for the names of the variables. For example, Likes(x. Icecream) and Likes(y, Icecream) are renamings of each other because they differ only in the choice of x or y; their meanings are identical: everyone likes ice cream. We will use our crime problem to illustrate how FOL-FC-ASK works. The implication sentences are (9.3), (9.6), (9.7), and (9.8). Two iterations are required: On the first iteration, rule (9.3) has unsatisfied premises. Rule (9.6) is satisfied with x/Ml), and Sells( West, MI, Nono) is added. Rule (9.7) is satisfied with x/hf), and Weapon(lVll) is added. Rule (9.8) is satisfied with x/Nono, and Hostile(Nono) is added. 282 Chapter 9. Inference in First-Order Logic function FOL-FC-AsK(KB, a) returns a substitution or false inputs: KB, the knowledge base, a set of first-order definite clauses a, the query, an atomic sentence local variables: new, the new sentences inferred on each iteration repeat until new is empty new + ) for each sentence r in KB do (pl A .. . A p, + q) t STANDARDIZE-APART(T) for each B such that SUBST(B, pl A . . . A p,) = SUBST(B, pi A . . . A p;) for some pi, . . . , pk in KB q' t Suss(0, q) if q' is not a renaming of some sentence already in KB or new then do add q' to new h t UNIFY(', a) if q5 is not fail then return 4 add new to KB return false A conceptually straightforward, but very inefficient, forward-chaining algo- Figure 9.3 rithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one step from the implication sentences and the atomic sentences already in KB. Crinzinal( West) 'i The proof tree generated by forward chaining on the crime example. The initial Figure 9.4 facts appear at the bottom level, facts inferred on the first iteration in the middle level, and facts inferred on the second iteration at the top level. On the second iteration, rule (9.3) is satisfied with x/ West, Y/MI, z/Nono), and Criminal ( West) is added. Figure 9.4 shows the proof tree that is generated. Notice that no new inferences are possible at this point because every sentence that could be concluded by forward chaining is already contained explicitly in the KB. Such a knowledge base is called a fixed point of the inference process. Fixed points reached by forward chaining with first-order definite clauses are similar Section 9.3. Forward Chaining 283 to those for propositional forward chaining (page 219); the principal difference is that a first- order fixed point can include universally quantified atomic sentences. FOL-FC-ASK is easy to analyze. First, it is sound, because every inference is just an application of Generalized Modus Ponens, which is sound. Second, it is complete for definite clause knowledge bases; that is, it answers every query whose answers are entailed by any knowledge base of definite clauses. For Datalog knowledge bases, which contain no function symbols, the proof of completeness is fairly easy. We begin by couriting the number of possible facts that can be added, which determines the maximilm number of iterations. Let k be the maximum arity (number of arguments) of any predicate, p be the number of predicates, and n be the number of constant symbols. Clearly, there can be no more than pszk distinct ground facts, so after this many iterations the algorithm must have reached a fixed point. Then we can make an argument very similar to the proof of completeness for propositional forward chaining. (See page 219.) The details of how to make the transition from propositional to first-order completeness are given for the resolution algorithm in Section 9.5. For general definite clauses with function symbols, FOL-FC-ASK can generate in- finitely many new facts, so we need to be more careful. For the case in which an answer to the query sentence q is entailed, we must appeal to Herbrand's theorem to establish that the algorithm will find a proof. (See Section 9.5 for the resolution case.) If the query has no answer, the algorithm could fail to terminate in some cases. For example, if the knowledge base includes the Peano axioms NatNum(0) 'v' n NatNum(n) + NatNum(S(n)) then forward chaining a'dds NatNum(S(O)), NatNum,(S(S(O))), NatNum(S(S(S(O)))), and so on. This problem is unavoidable in general. As with general first-order logic, entail- ment with definite clauses is semidecidable. Efficient forward chaining The forward chaining algorithm in Figure 9.3 is designed for ease of understanding rather than for efficiency of operation. There are three possible sources of complexity. First, the "inner loop" of the algorithm involves finding all possible unifiers such that the piremise of a rule unifies with a suitable set of facts in the knowledge base. This is often called pattern PATERNMATCHING matching and can be very expensive. Second, the algorithm rechecks every rule on every iteration to see whether its premises are satisfied, even if very few additions are made to the knowledge base on each iteration. Finally, the algorithm might generate many facts that are irrelevant to the goal. We will address each of these sources in turn. Matching rules against Icnown facts The problem of matching the premise of a rule against the facts in the knowledge base might seem simple enough. For example, suppose we want to apply the rule 284 Chapter 9. Inference in First-Order Logic DzSf(wa, nt) A Dz(wa, sa) A Dzfl(nt, q) A Difl(nt, sa) A Dzfl(q, nsw) A DiSf(q, sa) A Difl(nsw, v) A Dz(nsw, sa) A Dzfl(v, sa) + Colorable() Dig (Red, Blue) Difl (Red, Green) Dzff (Green, Red) Di( Green, Blue) Dzjf (Blue, Red) Dzff(Blue, Green) (a) (b) Figure 9.5 (a) Constraint graph for coloring the map of Australia (from Figure5.1). (b) The map-coloring CSP expressed as a single definite clause. Note that the domains of the variables are defined implicitly by the constants given in the ground facts for Dzff. Then we need to find all the facts that unify with Mzsszle(x); in a suitably indexed knowledge base, this can be done in constant time per fact. Now consider a rule such as Mzssile(x) A Owns(Nono, x) + Sells( West, x, Nono) . Again, we can find all the objects owned by Nono in constant time per object; then, for each object, we could check whether it is a missile. If the knowledge base contains many objects owned by Nono and very few missiles, however, it would be better to find all the missiles first CONJUNCT and then check whether they are owned by Nono. This is the conjunct ordering problem: ORDERING find an ordering to solve the conjuncts of the rule premise so that the total cost is minimized. It turns out that finding the optimal ordering is NP-hard, but good heuristics are available. For example, the most constrained variable heuristic used for CSPs in Chapter 5 would suggest ordering the conjuncts to look for missiles first if there are fewer missiles than objects that are owned by Nono. The connection between pattern matching and constraint satisfaction is actually very close. We can view each conjunct as a constraint on the variables that it contains-for ex- ample, Mzsszle(x) is a unary constraint on x. Extending this idea, we can express every finite-domain CSP as a single dejnite clause together with some associated ground facts. Consider the map-coloring problem from Figure 5.1, shown again in Figure 9.5(a). An equiv- alent formulation as a single definite clause is given in Figure 9.5(b). Clearly, the conclusion Colorable() can be inferred only if the CSP has a solution. Because CSPs in general include 3SAT problems as special cases, we can conclude that matching a dejnite clause against a set of facts is NP-hard. It might seem rather depressing that forward chaining has an NP-hard matching problem in its inner loop. There are three ways to cheer ourselves up: Section 9.3. Forward Chaining 285 We can remind ourselves that most rules m real-world knowledge bases are small and simple (like the rules in our crime example) rather than large and complex (like the CSP formulation in Figure 9.5). It is common in the database world to assume that both the sizes of rules and the arities of predicates are bounded by a constant and to worry DATA COMPLEXITY only about data complexity-that is, the complexity of inference as a function of the number of ground facts in the database. Ilt is easy to show that the data complexity of forward chaining is polynomial. We can consider subclasses of rules for which matching is efficient. Essentially every Datalog clause can be viewed as defining a CSP, so matching will be tractable just when the corresponding CSP is tractable. Chapter 5 describes several tractable families of CSPs. For exainple, if the constraint graph (the graph whose nodes are variables and whose links are constraints) forms a tree, then the CSP can be solved in linear time. Exactly the same result holds for rule matching. For instance, if we remove South Australia from the map in Figure 9.5, the resulting clause is Di;Tf(wa, nt) A Di(nt, q) A Dzfl(q, nsw) A Dzfl((nszo, v) + Colorable() which corresponds to the reduced CSP shown in Figure 5.11. Algorithms for solving tree-structured CSPs can be applied directly to the problem of rule matching. We can work hard to eliminate redundant rule matching attempts in the forward chain- ing algorithm, which is the subject of the next section. Incremental forward chaining When we showed how forward chaining works on the crime example, we cheated; in partic- ular, we omitted some of the rule matching done by the algorithm shown in Figure 9.3. For example, on the second iteration, the rule matches against Mzssile(Ml) (again), and of course the conclusion Weapon(n/Il) is already known so nothing happens. Such redundant rule matching can be avoided if we make the following observation: Every new fact inferred on iteration t must be derived from at least one new fact inferred on iteration t - 1. This is true because any inference that does not require a new fact from iteration t - 1 could have been done at iteration t - 1 already. This observation leads naturally to an incremental forward chaining algorithm where, at iteration t, we check a rule only if its premise includes a conjunct p, that unifies with a fact p: newly inferred at iteraton t - 1. The rule matching step then fixes p, to match with p:, but allows the other conjuncts of the rule to match with facts from any previous iteration. This algorithm generates exactly the same facts at each iteration as the algorithm in Figure 9.3, but is much more efficient. With suitable indexing, it is easy to identify all the rules that can be triggered by any given fact, and indeed marly real systems operate in an "update" mode wherein forward chain- Ing occurs in response to each new fact that is TELL to the system. Inferences cascade through the set of rules until the fixed point is reached, and then the process begins again for the next new fact. 286 Chapter 9. Inference in First-Order Logic Typically, only a small fraction of the rules in the knowledge base are actually triggered by the addition of a given fact. This means that a great deal of redundant work is done in con- structing partial matches repeatedly that have some unsatisfied premises. Our crime example is rather too small to show this effectively, but notice that a partial match is constructed on the first iteration between the rule American(x) A Weapon(y) A Sells (x, y, z) A Hostile(z) + Criminal(x) and the fact American( West). This partial match is then discarded and rebuilt on the second iteration (when the rule succeeds). It would be better to retain and gradually complete the partial matches as new facts arrive, rather than discarding them. 3 g RETE The rete al orithm was the first to address this problem seriously. The algorithm preprocesses the set of rules in the knowledge base to construct a sort of dataflow network in which each node is a literal from a rule premise. Variable bindings flow through the network and are filtered out when they fail to match a literal. If two literals in a rule share a variable- for example, Sells (x, y, z) A Hostile(z) in the crime example-then the bindings from each literal are filtered through an equality node. A variable binding reaching a node for an n- ary literal such as Sells(x, y, z) might have to wait for bindings for the other variables to be established before the process can continue. At any given point, the state of a rete network captures all the partial matches of the rules, avoiding a great deal of recomputation. Rete networks, and various improvements thereon, have been a key component of so- PRODUCTION called production systems, which were among the earliest forward chaining systems in SYSTEMS widespread use.4 The XCON system (originally called R1, McDermott, 1982) was built us- ing a production system architecture. XCON contained several thousand rules for designing configurations of computer components for customers of the Digital Equipment Corporation. It was one of the first clear commercial successes in the emerging field of expert systems. Many other similar systems have been built using the same underlying technology, which has been implemented in the general-purpose language OPS-5. COGNITIVE Production systems are also popular in cognitive architectures-that is, models of hu- ARCHITECTURES man reasoning-such as ACT (Anderson, 1983) and SOAR (Laird et al., 1987). In such sys- tems, the "working memory" of the system models human short-term memory, and the pro- ductions are part of long-term memory. On each cycle of operation, productions are matched against the working memory of facts. A production whose conditions are satisfied can add or delete facts in worlung memory. In contrast to the typical situation in databases, production systems often have many rules and relatively few facts. With suitably optimized matching technology, some modern systems can operate in real time with over a million rules. Irrelevant facts The final source of inefficiency in forward chaining appears to be intrinsic to the approach and also arises in the propositional context. (See Section 7.5.) Forward chaining makes all allowable inferences based on the known facts, even if they are irrelevant to the goal at hand. In our crime example, there were no rules capable of drawing irrelevant conclusions, Rete is Latin for net. The English pronunciation rhymes with treaty. The word production in production systems denotes a condition-action rule. Section 9.4. Backward Chaining so the lack of directedness was not a problem. In other cases (e.g., if we have several rules describing the eating habits of Americans and the prices of missiles), FOL-FC-ASK will generate many irrelevant conclusions. One way to avoid drawing irrelevant conclusions is to use backward chaining, as de- scribed in Section 9.4. .Another solution is to restrict forward chaining to a selected subset of rules; this approach was discussed in the propositional context. A third approach has emerged in the deductivie database community, where forward chaining is the standard tool. the rule set, using information from the goal, so that only relevant The idea is to rewrite MAGIC SET variable bindings-those belonging to a so-called magic set-are considered during forward inference. For example, if the goal is Criminal ( West), the rule that concludes Criminal (x) will be rewritten to include an extra conjunct that constrains the value of x: Magic(x) A Amerzean(z) A Weapon(y) A Sells(x, y, z) A Hostile(z) + Crimz;lal(x) . The fact Magic( West) is also added to the KB. In this way, even if the knowledge base contains facts about millions of Americans, only Colonel West will be considered during the forward inference process. The complete process for defining magic sets and rewriting the knowledge base is too complex to go into here, but the basic idea is to perform a sort of "generic" backward inference from the goal in order to work out which variable bindings need to be constrained. The magic sets approach can therefore be thought of as a kind of hybrid between forward inference and backward preprocessing. The second major family of logical inference algorithms uses the backward chaiining ap- proach introduced in Section 7.5. These algorithms work backward from the goal, chaining through rules to find known facts that support the proof. We describe the basic algonthm, and logic programming, which is the most widely useld form of then we describe how it is used in automated reasoning. We will also see that backward chaining has some disadvantages com- pared with forward chaining, and we look at ways to overcome them. Finally, we will look at the close connection between logic programming and constraint satisfaction problems. A backward chaining algorithm Figure 9.6 shows a simple backward-chaining algorithm, FOL-BC-ASK. It is called with a list of goals containing a single element, the original query, and returns the set of all substi- tutions satisfying the queiry. The list of goals can be thought of as a "stack" waiting to be .worked on; if all of them can be satisfied, then the current branch of the proof succeleds. The algorithm takes the first goal in the list and finds every clause in the knowledge base whose positive literal, or head, unifies with the goal. Each such clause creates a new recursive call in which the premise, or body, of the clause is added to the goal stack. Remember that facts are clauses with a head but no body, so when a goal unifies with a known fact, no rzew sub- goals are added to the stack and the goal is solved. Figure 9.7 is the proof tree for deriving t7rzminal( West) from sentences (9.3) through (9.10). 288 Chapter 9. Inference in First-Order Logic function FOL-BC-AsK(KB, goals, 0) returns a set of substitutions inputs: KB, a knowledge base goals, a list of conjuncts forming a query (0 already applied) 0, the current substitution, initially the empty substitution local variables: answers, a set of substitutions, initially empty if goals is empty then return (0) q' +- Suss(0, Frs(goa1s)) for each sentence r in KB where STANDARDIZE- APART() = (pl A . . . A p, + q) and 0' +- UNIFY(, q') succeeds new-goals +- pl, . . . , p,IRST(goals) answers + FOL-BC-AsK(KB, new-goals, COMPOSE(O', 0)) U answers return answers - Figure 9.6 A simple backward-chaining algorithm. American(West) Weaponb) Hostile(Nono) 1 1 Missiieb) ( ( Missiie(MI) I / Owns(Nono, MI) 1 I Enemy (Nono,America) I Figure 9.7 Proof tree constructed by backward chaining to prove that West is a criminal. The tree should be read depth first, left to right. To prove Criminal( West), we have to prove the four conjuncts below it. Some of these are in the knowledge base, and others require further backward chaining. Bindings for each successful unification are shown next to the corresponding subgoal. Note that once one subgoal in a conjunction succeeds, its substitution is applied to subsequent subgoals. Thus, by the time FOL-BC-ASK gets to the last conjunct, originally Hostile(z), z is already bound to Nono. The algorithm uses composition of substitutions. COMPOSE(Q, Q2) is the substitution COMPOSITION whose effect is identical to the effect of applying each substitution in turn. That is, In the algorithm, the current variable bindings, which are stored in 8, are composed with the bindings resulting from unifying the goal with the clause head, giving a new set of current bindings for the recursive call. Section 9.4. Backward Chaining 289 Backward chaining, as we have written it, is clearly a depth-first search algorithm. This means that its space requirements are linear in the size of the proof (neglecting, for now, the space required to accumulate the solutions). It also means that backward chaining (unlike forward chaining) suffers from problems with repeated states and incompleteness. We will discuss these problems and some potential solutions, but first we will see how backward chaining is used in logic programming systems. Logic programming Logic programming is a technology that comes fairly close to embodying the declarative ideal described in Chapter 7: that systems should be constructed by expressing knowledge in a formal language and that problems should be solved by running inference processes on that knowledge. The ideal is summed up in Robert Kowalski's equation, Algorithm = Logic 4 Control . PROLOG Prolog is by far the most widely used logic programming language. Its users numiber in the hundreds of thousands. It is used primarily as a rapid-prototyping language and for symbol- manipulation tasks such as writing compilers (Van Roy, 1990) and parsing natural language (Pereira and Warren, 1980). Many expert systems have been written in Prolog for legal, medical, financial, and other domains. Prolog programs are sets of definite clauses written in a notation somewhai different from standard first-order. logic. Prolog uses uppercase letters for variables and lowercase for constants. Clauses are written with the head preceding the body; " : -" is used for left- implication, commas sep,arate literals in the body, and a period marks the end of a sentence: Prolog includes "syntactic sugar" for list notation and arithmetic. As an example, here is a Prolog program for append (X, Y, Z ) , which succeeds if list Z is the result of appending lists x and Y: In English, we can read these clauses as (1) appending an empty list with a list Y produces the same list Y and (2) A I Z I is the result of appending A 1 XI onto Y, provided that z is the result of appending X onto Y. This definition of append appears fairly similar to the corresponding definition in Lisp, but is actually much more powerful. For example, we can ask the query append (A, B , 1,2 I ) : what two lists can be appended to give 1,2 1 ? We get back the solutions The execution of Prolog programs is done via depth-first backward chaining, where clauses are tried in the order in which they are written in the knowledge base. Some: aspects of Prolog fall outside standard logical inference: 290 Chapter 9. Inference in First-Order Logic There is a set of built-in functions for arithmetic. Literals using these function symbols are "proved by executing code rather than doing further inference. For example, the goal "X is 4 +3" succeeds with x bound to 7. On the other hand, the goal "5 is X+Y" fails, because the built-in functions do not do arbitrary equation solving.5 There are built-in predicates that have side effects when executed. These include input- output predicates and the as ser tlre t rac t predicates for modifying the knowledge base. Such predicates have no counterpart in logic and can produce some confusing effects-for example, if facts are asserted in a branch of the proof tree that eventually fails. Prolog allows a form of negation called negation as failure. A negated goal not P is considered proved if the system fails to prove p. Thus, the sentence alive (X) : - not dead(X) . can be read as "Everyone is alive if not provably dead." Prolog has an equality operator, =, but it lacks the full power of logical equality. An equality goal succeeds if the two terms are unzable and fails otherwise. So X+Y=2+3 succeeds with x bound to 2 and Y bound to 3, but morningstar=eveningstar fails. (In classical logic, the latter equality might or might not be true.) No facts or rules about equality can be asserted. The occur check is omitted from Prolog's unification algorithm. This means that some unsound inferences can be made; these are seldom a problem except when using Prolog for mathematical theorem proving. The decisions made in the design of Prolog represent a compromise between declarativeness and execution efficiency-inasmuch as efficiency was understood at the time Prolog was designed. We will return to this subject after looking at how Prolog is implemented. Efficient implementation of logic programs The execution of a Prolog program can happen in two modes: interpreted and compiled. Interpretation essentially amounts to running the FOL-BC-ASK algorithm from Figure 9.6, with the program as the knowledge base. We say "essentially," because Prolog interpreters contain a variety of improvements designed to maximize speed. Here we consider only two. First, instead of constructing the list of all possible answers for each subgoal before continuing to the next, Prolog interpreters generate one answer and a "promise" to generate the rest when the current answer has been fully explored. This promise is called a choice CHOICE POINT point. When the depth-first search completes its exploration of the possible solutions arising from the current answer and backs up to the choice point, the choice point is expanded to yield a new answer for the subgoal and a new choice point. This approach saves both time and space. It also provides a very simple interface for debugging because at all times there is only a single solution path under consideration. Second, our simple implementation of FOL-BC-ASK spends a good deal of time gen- erating and composing substitutions. Prolog implements substitutions using logic variables Note that if the Peano axioms are provided, such goals can be solved by inference within a Prolog program Section 9.4. Backward Chaining 29 1 I procedure APPEND(, y, ar, continuation) I tra.il t GLOBAL-TRAIL-POINTER() if ax = I and UNIFY(Y, az) then CAL(continuation) RESET-TRAIL(T) a t NEW-VARIABLE(); x t NEW-VARIABLE(); z t NEW-VA,RIABLE() if UNIFY(X, a I x) and UNIFY(, a 1 z) then APPEND(X, y, z, continuation) Figure 9.8 Pseudocode representing the result of compiling the Append predicate. The function NEW-VARIABLE returns a new variable, distinct from all other variables so far used. The procedure CAL(continuati0n) continues execution with the specified continuatlion. that can remember their current binding. At any point in time, every variable in the program either is unbound or is bound to some value. Together, these variables and values implicitly define the substitution for the current branch of the proof. Extending the path can only add new variable bindings, because an attempt to add a different binding for an already bound variable results in a failure of unification. When a path in the search fails, Prolog will back up to a previous choice point, and then it might have to unbind some variables. This is done TRAIL by keeping track of all the variables that have been bound in a stack called the trail. As each new variable is bound by UNIFY-VAR, the variable is pushed onto the trail. When a goal fails and it is time to back up to a previous choice point, each of the variables is unbound as it is removed from the trail. Even the most efficient Prolog interpreters require several thousand machine instruc- tions per inference step because of the cost of index lookup, unification, and building the recursive call stack. In effect, the interpreter always behaves as if it has never seen1 the pro- gram before; for example, it has to find clauses that match the goal. A compiled Prolog program, on the other hand, is an inference procedure for a specific set of clauses, so it knows what clauses match the goal. Prolog basically generates a miniature theorem prover for each different predicate, thereby eliminating much of the overhead of interpretation. It is also pos- OPENCODE sible to open-code the unification routine for each different call, thereby avoiding explicit analysis of term structure. (For details of open-coded unification, see Warren et al. (1977).) The instruction sets of today's computers give a poor match with Prolog's semantics, so most Prolog compilers compile into an intermediate language rather than directly into ma- chine language. The most popular intermediate language is the Warren Abstract Machine, or 'WAM, named after David H. D. Warren, one of the implemenrors of the first Prolog com- piler. The WAM is an abstract instruction set that is suitable for Prolog and can be either interpreted or translated into machine language. Other compilers translate Prolog into a high- level language such as Lisp or C and then use that language's compiler to translate to imachine language. For example, the definition of the Append predicate can be compiled into the code shown in Figure 9.8. There are several points worth mentioning: Rather than having to search the knowledge base for Append clauses, the claiuses be- come a procedure and the inferences are carried out simply by calling the procedure.

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.