Hierarchical task network planning in Artificial Intelligence

hierarchical task network planning in artificial intelligence, adversarial hierarchical-task network planning for complex real-time games pdf free download
HalfoedGibbs Profile Pic
HalfoedGibbs,United Kingdom,Professional
Published Date:02-08-2017
Your Website URL(Optional)
PLANNING AND ACTING 1 2 IN THE REAL WORLD In which we see how more expressive representations and more interactive agent architectures lead to planners that are useful in the real world. The previous chapter introduced the most basic concepts, representations, and algo- rithms for planning. Planners that are are used in the real world for tasks such as scheduling Hubble Space Telescope observations, operating factories, and handling the logistics for mil- itary campaigns are more complex; they extend the ba.sics in terms both of the representation language and of the way the planner interacts with the e:nviroiiment. This chapter shows how. Section 12.1 describes planning and scheduling with time and resource constraints, and Sec- tion 12.2 describes planning with predefined subplans. Sections 12.3 to 12.6 present a series of agent architectures designed to deal with uncertain environments. Section 12.7 shows how to plan when the environment contains other agents. 12.1 TIME, SCHEDULES, AND RESOURCES The STRIPS representation talks about what actions do, but, because the representation is based on situation calculus, it cannot talk about how long an action takes or even about when an action occurs, except to say that it is before or after another action. For some domains, we would like to talk about when actions begin and end. For example, in the cargo delivery domain, we might like to know when the plane carrying some cargo will arrive, not just that it will arrive when it is done flying. JOB SHOP Time is of the essence in the general family of appllications called job shop scheduling. SCHEDULING Such tasks require completing a set of jobs, each of whi.ch consists of a sequence of actions, where each action has a given duration and might require some resources. The problem is to determine a schedule that minimizes the total time required to complete all the jobs, while respecting the resource constraints. An example of a job shop scheduling problem is given in Figure 12.1. This is a highly simplified automobile assembly problem. There are two jobs: assembling cars C1 and C2. Each job consists of three actions: adding the engine, adding the wheels, and inspecting the 418 Chapter 12. Planning and Acting in the Real World Init ( Chassis (C1) A Chassis (Cz) 1 A Engine (El, Cl ,30) A Engine (E2, C2,60) A Wheels(Wl, Cl, 30) A Wheels(W2, C2, 15)) I Goal (Done (C1) A Done (C2)) Action(AddEngine (e, c), PRECOND: Engine(e, c, d) A Chassis (c) A 1 EngineIn(c) , EFFECT: EngineIn(c) A Duration(d)) Action(Add Wheels (w , c) , PRECOND: Wheels (w , c, d) A Chassis (c) A EngineIn (c) , EFFECT: WheelsOn(c) A Duration(d)) Action(Inspect (c), PRECOND: EngineIn(c) A WheelsOn(c) A Chassis (c) : EFFECT: Done (c) A Duration (lo)) Figure 12.1 A job shop scheduling problem for assembling two cars. The notation Duration(d) means that an action takes d minutes to execute. Engine(E1, CI, 60) means that El is an engine that fits into chassis C1 and takes 60 minutes to install. results. The engine must be put in first (because having the front wheels on would inhibit access to the engine compartment) and of course the inspection must be done last. The problem in Figure 12.1 can be solved by any of the planners we have already seen. Figure 12.2 (if you ignore the numbers) shows the solution that the partial-order planner POP would come up with. To make this a scheduling problem rather than a planning problem, we must now determine when each action should begin and end, based on the durations of actions as well as their ordering. The notation Duration(d) in the effect of an action (where d must be bound to a number) means that the action takes d minutes to complete. Given a partial ordering of actions with durations, as in Figure 12.2, we can apply the M CR'TrCALPATH ETHOD critical path method (CPM) to determine the possible start and end times of each action. A path through a partial-order plan is a linearly ordered sequence of actions beginning with Start and ending with Finish. (For example, there are two paths in the partial-order plan in Figure 12.2.) The critical path is that path whose total duration is longest; the path is "critical" CRITICAL PATH because it determines the duration of the entire plan-shortening other paths doesn't shorten the plan as a whole, but delaying the start of any action on the critical path slows down the whole plan. In the figure, the critical path is shown with bold lines. To complete the whole plan in the minimal total time, the actions on the critical path must be executed with no delay between them. Actions that are off the critical path have some leeway-a window of time in which they can be executed. The window is specified in terms of an earliest possible start time, ES, and a latest possible start time, LS. The quantity LS - ES is known as the slack of SLACK an action. We can see in Figure 12.2 that the whole plan will take 85 minutes, that each action on the critical path has 0 slack (this will always be the case) and that each of the actions in the assembly of C1 have a 15-minute window in which they can be started. Together the ES and LS times for all the actions constitute a schedule for the problem. SCHEDULE Section 12.1. Time. Schedules. and Resources 419 - 30 30 O,Ol Start Finish Addwheels2 Inspect2 - - Figure 12.2 A solution to the job shop scheduling problem from Figure 12.1. At the top, the solution is given as a partial-order plan. The duration of each action is given at the bottom of each rectangle, with the earliest and latest start time listed as ES, LS in the upper left. The difference between these two numbers is the slack of an action; actions with zero slack are on the critical path, shown with bold arrows. At the bottom of the figure, the same solution is shown as a timeline. Grey rectangles represent time ilntervals during which an action may be executed, provided that the ordering constraints are respected. The unoccupied portion of a gray rectangle indicates the slack. The following formulas serve as a definition for ES and LS and also as the outline of a dynamic programming algorithm to compute them: The idea is that we start by assigning ES(Start) to be 0. Then as soon as we get an action B such that all the actions that come immediately before B have ES values assigned, we set ES(B) to be the maximum of the earliest finish times of those immediately preceding actions, where the earliest finish time of an action is defined as the earliest start time plus the duration. This process repeats until every action has been assigned an ES value. The LS values are computed in a similar manner, working backwards from the Finish action. The details are left as an exercise. The complexity of the critical path algorithm is just O(Nb), where N is the number of actions and b is the maximum branching factor into or out of an action. (To see this, 420 Chapter 12. Planning and Acting in the Real World note that the LS and ES computations are done once for each action, and each computation iterates over at most b other actions.) Therefore, the problem of finding a minimum-duration schedule, given a partial ordering on the actions, is quite easy to solve. Scheduling with resource constraints RESOURCES Real scheduling problems are complicated by the presence of constraints on resources. For example, adding an engine to a car requires an engine hoist. If there is only one hoist, then we cannot simultaneously add engine El to car C1 and engine E2 to car Cz; hence, the schedule shown in Figure 12.2 would be infeasible. The engine hoist is an example of a reusable REUSABLE resource-a resource that is "occupied" during the action but that becomes available again RESOURCE when the action is finished. Notice that reusable resources cannot be handled in our standard description of actions in terms of preconditions and effects, because the amount of resource available is unchanged after the action is completed.' For this reason, we augment our repre- sentation to include a field of the form RESOURCE: R(k), which means that k units of resource R are required by the action. The resource requirement is both a prerequisite-the action can- not be performed if the resource is unavailable-and a temporary effect, in the sense that the availability of resource r is reduced by k for the duration of the action. Figure 12.3 shows how to extend the engine assembly problem to include three resources: an engine hoist for installing engines, a wheel station for putting on the wheels, and two inspectors. Figure 12.4 shows the solution with the fastest completion time, 115 minutes. This is longer than the 85 minutes required for a schedule without resource constraints. Notice that there is no time at which both inspectors are required, so we can immediately move one of our two inspectors to a more productive position. Inspectors (2), rather The representation of resources as numerical quantities, such as than as named entities, such as Inspector(Il) and Inspector(12), is an example of a very AGGREGATION general technique called aggregation. The central idea of aggregation is to group individual objects into quantities when the objects are all indistinguishable with respect to the purpose at hand. In our assembly problem, it does not matter which inspector inspects the car, so there is no need to make the distinction. (The same idea works in the missionaries-and-cannibals problem in Exercise 3.9.) Aggregation is essential for reducing complexity. Consider what happens when a schedule is proposed that has 10 concurrent Inspect actions but only 9 in- spectors are available. With inspectors represented as quantities, a failure is detected imme- diately and the algorithm backtracks to try another schedule. With inspectors represented as individuals, the algorithm backtracks to try all lo ways of assigning inspectors to Inspect actions, to no avail. Despite their advantages, resource constraints make scheduling problems more compli- cated by introducing additional interactions among actions. Whereas unconstrained schedul- ing using the critical-path method is easy, finding a resource-constrained schedule with the earliest possible completion time is NP-hard. This complexity is often seen in practice as well as in theory. A challenge problem posed in 1963- to find the optimal schedule for a In contrast, consumable resources, such as screws for assembling the engine, can be handled within the standard framework; see Exercise 12.2. Section 12.1. Time, Schedules, and Resources 42 1 Init (Chassis (C1) A Chassis(C2) A Engine(E, C1, 30) A Engine(E2, C2,60 A Wheels(W1, Cl, 30) A Wheels(W2, Cz, 15) A EngineHoists(1) A WheelStations(1) A Jnspectors(2)) Goal(Done(C1) A Done(C2)) Action(AddEngine(e, c), PR0D:Engine(e, c, d) A Chassis(c) A 1 EngineIn(c), EFFECT:E?LI(C) A Duration(d) , RS0:EngineHoisCs (1)) Aetion(Add Wheels(w, c), PRECOND: Wheels(w, c, d) A Chassis(c) A Engin,eIn(c), EFFECT: WheelsOn(c) A Duration(d), RESOURCE: WheelStations(1)) Action(Inspect (c), PRC0:EngzneIn(c) A WheelsOn(c), El?EcT:Done(c) A Duration(l0) , RS01C:Inspectors (1)) Job shop scheduling problem for assembling two cars, with resources. The Figure 12.3 available resources are one engine assembly station, one wheel assembly station, and two inspectors. The notation RESOURCE:?- means that the resource r is used during execution of an action, but becomes free again when the action is complete. Figure 12.4 A solution to the job shop scheduling problem with resources from Fig- The left-hand margin lists the three resources, and actions are shown aligned ure 12.3. horizontally with the resources they consume. There are two possible schedules, depend- ing on which assembly uses the engine station first; we've shown the optimal solution, which takes 1 15 minutes. problem involving just 10 machines and 10 jobs of 100 actiolns each-went unsolved for 23 years (Lawler et al., 1993). Many approaches have been tried, including branch-and-bound, simulated annealing, tabu search, constraint satisfaction, and other techniques from Part 11. MINIMUM SLACK One simple but popular heuristic is the minimum slack algolrithm. It schedules actions in a greedy fashion. On each iteration, it considers the unscheduled actions that have had all their predecessors scheduled and schedules the one with the least slack for the earliest possible start. It then updates the ES and LS times for each affected action and repeats. The heuristic 422 Chapter 12. Planning and Acting in the Real World is based on the same principle as the most-constrained-variable heuristic in constraint satis- faction. It often works well in practice, but for our assembly problem it yields a 130-minute solution, not the 115-minute solution of Figure 12.4. The approach we have taken in this section is "plan first, schedule later": that is, we divided the overall problem into a planning phase in which actions are selected and partially ordered to meet the goals of the problem, and a later scheduling phase, in which temporal in- formation is added to the plan to ensure that it meets resource and deadline constraints. This approach is common in real-world manufacturing and logistical settings, where the planning phase is often performed by human experts. When there are severe resource constraints, how- ever, it could be that some legal plans will lead to much better schedules than others. In that case, it makes sense to integrate planning and scheduling by taking into account durations and overlaps during the construction of a partial-order plan. Several of the planning algorithms in Chapter 11 can be augmented to handle this information. For example, partial-order planners can detect resource constraint violations in much the same way that they detect conflicts with causal links. Heuristics can be modified to estimate the total completion time of a plan, rather than just the total cost of the actions. This is currently an active area of research. One of the most pervasive ideas for dealing with complexity is hierarchical decomposition. fN Complex software is created from a hierarchy of subroutines or object classes, armies operate as a hierarchy of units, governments and corporations have hierarchies of departments, sub- sidiaries, and branch offices. The key benefit of hierarchical structure is that, at each level of the hierarchy, a computational task, military mission, or administrative function is reduced to a small number of activities at the next lower level, so that the computational cost of finding the correct way to arrange those activities for the current problem is small. Nonhierarchical methods, on the other hand, reduce a task to a large number of individual actions; for large- scale problems, this is completely impractical. In the best case-when high-level solutions always turn out to have satisfactory low-level implementations-hierarchical methods can result in linear-time instead of exponential-time planning algorithms. HIERARCHICAL TASK This section describes a planning method based on hierarchical task networks or NETWORK HTNs. The approach we take combines ideas from both partial-order planning (Section 1 1.3) and the area known as "HTN planning." In HTN planning, the initial plan, which describes the problem, is viewed as a very high-level description of what is to be done-for exam- ple, building a house. Plans are refined by applying action decompositions. Each action A DECOMPOSITION CTION decomposition reduces a high-level action to a partially ordered set of lower-level actions. Action decompositions, therefore, embody knowledge about how to implement actions. For example, building a house might be reduced to obtaining a permit, hiring a contractor, doing the construction, and paying the contractor. (Figure 12.5 shows such a decomposition.) The PRIMITIVE ACTION process continues until only primitive actions remain in the plan. Typically, the primitive actions will be actions that the agent can execute automatically. For a general contractor, Section 12.2. Hierarchical Task Network Planning 423 7 "install landscaping ' might be primitive because it simply involves calling the landscaping contractor. For the landscaping contractor, actions such as '"plant rhododendrons here" might be considered primitive. In "pure" HTN planning, plans are generated only by successive action decomposi- tions. The HTN therefore views planning as a process of making ari activity description more concrete, rather than (as in the case of state-space and partial-order planning) a process of constructing an activity description, starting from the empty activity. It turns out that every STRIPS action description can be turned into an action decomposition (see Exercise 12.6), and that partial-order planning can be viewed as a special case of pure HTN planning. For certain tasks, howeverespecially "novel" conjunctive goals-the pure HTN viewpoint is rather unnatural, and we prefer to take a hybrid approach in which action decompositions are -order planning, in addition to the standard operations of used as plan refinements in partial establishing an open condition and resolving conflicts by adding ordering constraints. (View- ing HTN planning as an extension of partial-order plainning has the additional advantage that we can use the same notational conventions instead of introducing a whole new set.) We be- gin by describing action decomposition in more detail. Then we explain how the partial-order planning algorithm must be modified to handle decornpositions, and finally we discuss issues of completeness, complexity, and practicality. Representing action decompositions PLAN LIBRARY General descriptions of action decomposition methods are stored in a plan library, from which they are extracted and instantiated to fit the needs of the plan being constructed. Each method is an expression of the form Decompose(a, d). This says that an action a can be decomposed into the plan d, which is represented as a partial-order plan, as described in Section 1 1.3. Building a house is a nice, concrete example, so we will use it to illustrate the concept of action decomposition. Figure 12.5 depicts one possible decomposition of the BwildHouse action into four lower-level actions. Figure 12.6 shows some of the action descriptions for the domain, as well as the decomposition for BuildHouse as it would appear in the plan library. There might be other possible decompositions in the lilbrary. The Start action of the decomposition supplies all those preconditions of actions in the EXTERNAL plan that are not supplied by other actions. We call these i.he external preconditions. In PRECoNolTloNs our example, the external preconditions of the deconnposition are Land and Mon,ey. Sim- EXTERNALEFFECT ilarly, the external effects, which are the preconditions of Finish, are all those effects of actions in the plan that are not negated by other actions. In our example, the external effects of BuildHouse are House and money. Some HTN planners also distinguish between pri- PRIMARY EFFECT mary effects, such as House, and secondary effects, such as money. Only primary effects SECONDARY EFFECT may be used to achieve goals, whereas both kinds of effects might cause conflicts with other actions; this can greatly reduce the search space.2 It could also prevent the discovery of unexpected plans. For example, a person facing bankruptcy proceedings can eliminate all liquid assets (i.e., achieve money) by buying or building a house. This plan is useful because current law precludes the seizure of a primary residence by creditors. 424 Chapter 12. Planning and Acting in the Real World land House House 1 decomposes to Get Permit Finish Hire Builder Figure 12.5 One possible decomposition for the BuildHouse action. Action(BuyLand, PRECOND:MO, Ewc:Land A 1 Money) Action(GetLoan, PRECOND: GoodCredit, EFFECT:M A Mortgage) Action(BuildHouse, PRECOND:LU, EFFECT:HOUS) Action( GetPermit, PRECOND: Land, EFFECT: Permit) Action(HireBuilder, EFFECT: Contract) Action(Construction, PREcoND:Pz A Contract, EEC:HouseBuzlt A 7 Permit) Action(PayBuilder, PEc0D:Money A HouseBuilt, EFFECT: 7 Money A House A 7 Contract) Decompose(BuildHouse, STEPS: S1 : GetPermit, S2 : HireBuilder, S3 : Construction, S4 : PayBuilder Start 3 S2 -: 5'31, ORDERINGS: Start 3 SI 3 S3 + S4 3 Finish, L INKS: Start La"d S1, Start S4, s Permit s3, s2 GO-t s3, s3 HouEuilt s 1 t 4, 7 Money Sq HFinish, Sq - Finish)) Figure 12.6 Action descriptions for the house-building problem and a detailed decompo- sition for the BuildHouse action. The descriptions adopt a simplified view of money and an optimistic view of builders. A decomposition should be a correct implementation of the action. A plan d imple- ments an action a correctly if d is a complete and consistent partial-order plan for the problem of achieving the effects of a given the preconditions of a. Obviously, a decomposition will be correct if it is the result of running a sound partial-order planner. A plan library could contain several decompositions for any given high-level action; for example, there might be another decomposition for BuildHouse that describes a process whereby the agent builds a house from rocks and turf with its own bare hands. Each de- (1 Section 12.2. Hierarchical Task Network Planning 425 composition should be a correct plan, but it could have addlitional preconditions and effects beyond those stated in the high-level action description. For example, the decomposition for BuzldHouse in Figure 12.5 requires Money in addition to Land and has the effect money. The self-build option, on the other hand, doesn't require money, but does require a ready supply of Rocks and Turf, and could result in a BadBack. Given that a high-level action, such as BuildHouse, may have several possible decom- positions, it is inevitable that its STRIPS action description will hide some of the preconditions and effects of those decompositions. The preconditions of the high-level action should be the intersection of the external preconditions of its decompositions, and the effects should be the intersection of the external effects of the decompositions. Put another way, the high-level preconditions and effects are guaranteed to be a subset of the true preconditions and effects of every primitive implementation. Two other forms of information hiding should be noted. First, the high-level description INTERNALEFFECT completely ignores all internal effects of the decompositions. For example, our BuzldHouse decomposition has temporary internal effects Permit and Ctmty-at. Second, the high-level description does not specify the intervals "inside" the activity during which the high-level preconditions and effects must hold. For example, thle Land precondition needs to be true (in our very approximate model) only until Getpermit is performed, and House is true only after PayBuilder is performed. Information hiding of this kind is essential if hierarchical planning is to reduce com- plexity; we need to be able to reason about high-level actions without worrying about myriad details of the implementations. There is, however, a price to pay. For example, conflicts might exist between internal conditions of one high-level action and internal actions of another, but these is no way to detect this from the high-level descriptions. This issue has significant implications for HTN planning algorithms. In a nutshell, whereas primitive actions can be treated as point events by the planning algorithm, high-level actions have temporal extent within which all sorts of things can be going on. Modifying the planner for decompositions We now show how to modify POP to incorporate HTN planning. We do that by modifying the POP successor function (page 390) to allow decomposition methods to be applied to the current partial plan P. The new successor plans are formed by first selecting some nonprimi- tive action a' in P and then, for any Decompose(a, d) method from the plan library such that a and a' unify with substitution 8, replacing a' with d' = SUB ST(, d). Figure 12.7 shows an example. At the top, there is a plan P for getting a house. The high-level action, a' = BuildHouse, is selected for decomposition. The decomposition d is selected from Figure 12.5, and BuildHouse is replaced by this decomposition. An additional step, GetLoan, is then introduced to resolve the new open condition, Money, that is created by the decomposition step. Replacing an action with its deconnposition is a bit like transplant surgery: we have to take the new subplan out of its packaging (the Start and Fin,islz steps), Constrction negates the Permit, otherwise the same permit could be used to build many houses. Unfortu- nately, Construction does not terminate the Contract because we have to PayBuilder first. 426 Chapter 12. Planning and Acting in the Real World Ions? Buy land I Land Build House House I 1 Start Figure 12.7 Decomposition of a high-level action within an existing plan. The BuildHouse action is replaced by the decomposition from Figure 12.5. The external precon- dition Land is supplied by the existing causal link from BuyLand. The external precondition Money remains open after the decomposition step, so we add a new action, GetLoan. insert it, and hook everything up properly. There might be several ways to do this. To be more precise, we have for each possible decomposition d': 1. First, the action a' is removed from P. Then, for each step s in the decomposition d', we need to choose an action to fill the role of s and add it to the plan. It can be either a new instantiation of s or an existing step sf from P that unifies with s. For example, the decomposition of a Make Wine action might suggest that we BuyLand; possibly, we can use the same BuyLand action that we already have in the plan. We call this SUBTASK SHARING subtask sharing. In Figure 12.7, there are no sharing opportunities, so new instances of the actions are created. Once the actions have been chosen, all the internal constraints from d' are copied over-for example, that Getpermit is ordered before Construction and that there is a causal link between these two steps supplying the Permit precondition of Construction. This completes the task of replacing a' with the instantiation of dB. 2. The next step is to hook up the ordering constraints for a' in the original plan to the steps in d'. First, consider an ordering constraint in P of the form B -i a'. How should B be ordered with respect to the steps in d'? The most obvious solution is that B should come before every step in d', and that can be achieved by replacing every constraint of the form Start + s in d' with a constraint B + s. On the other hand, this approach might be too strict For example, BuyLand has to come before BuildHouse, but there is no need for BuyLand to come before HireBuilder in the expanded plan. Imposing an overly strict ordering might prevent some solutions from being found. Therefore, the reason for the constraint; then, best solution is for each ordering constraint to record the when a high-level action is expanded, the new ordering constraints can be as relaxed as possible, consistent with the reason for the original constraint. Exactly the same considerations apply when we are replacing constraints of the form a' 4 C. (1 Section 12.2. Hierarchical Task Network Planning 427 3. The final step is to hook up causal links. If B -5 a' was a causal link in the original plan, replace it by a set of causal links from B to all the steps in dl with preconditions p that were supplied by the Start step in the decomposition d (i.e., to all the steps in d' for which p is an external precondition). In the example, the causal link BuyLand Ld BuildHouse is replaced by the link BuyLand Permit. (The hloney precondition for PayBuilder in the decomposition becomes an open condition, because no action in the original plan supplies Money to BuildHouse.) Similarly, for each causal link a' 5 C in the plan, replace it with a set of causal links to C from whichever step in dl supplies p to the Finish step in the decomposition d (i.e., from the step in dl that has p as an external effect). In the example, we replaice the: link BuildHouse H3e Finish with the link PayBuilder H3 Finish. This completes the additions required for generating deconnpositions in the context of the POP planner.4 Additional modifications to the POP algorithm are required because of the fact that high-level actions hide information about their final primitive implementations. In larticular, the original POP algorithm backtracks with failure if the current plan contains an irresolvable conflict-that is, if an action conflicts with a causal lintk but cannot be ordered either before or after the link. (Figure 1 1.9 shows an example.) With high-level actions, on the other hand, apparently irresolvable conflicts can sometimes be resolved by decomposing the conflicting actions and interleaving their steps. An example is given in Figure 12.8. Thus, it may be decornposition the case that a complete and consistent primitive plan can be obtained by even when no complete and consistent high-level plan exists. This possibility means that a complete HTN planner must forgo many pruning opporiunities that are available to a standard POP planner. Alternatively, we can prune anyway and hope that no solution is missed. Discussion Let's begin with the bad news: pure HTN planning (where the only allowable plan refinement is decomposition) is undecidable, even though the underlying state space isfinite This might seem very depressing, since the point of HTN planning is to gain efficiency. The difficulty arises because action decompositions can be recursive-for example, going for a walk can be implemented by taking a step and then going for a walk- so HTN plans can be arbitrarily long. In particular, the shortest HTN solution could be arbitrarily long, so that there is no way to terminate the search after any fixed time. There are, however, at least three ways to look on the bright side: 1. We can rule out recursion, which very few domains require. In that case, all HTN plans are of finite length and can be enumerated. 2. We can bound the length of solutions we care about. B'ecause the state space is finite, a plan that has more steps than there are states in the state space must include a loop that visits the same state twice. We would lose littl'e by ruling out HTN solutions of this kind, and we would control the search. a There are some additional minor modifications required for handling conflict resolution with high-level ac- tions; the interested reader can consult the papers cited at the end of the chapter. 428 Chapter 12. Planning and Acting in the Real World - - Watch Hair Wafch HappHHe) Finish Hair HappHSh (a) Initial Problem (b) Abstract Inconsistent Plan Watch i comb Comb Wafch li 7 Watch Owe( Watch) 1 Owe(Watch) re i t (On C d ) Happy(She) Watch Give Chain Deliver i Watch (On Credit) Happy(He) a: Hair , o(HM (c) Decomposition of (b) into a Consistent Solution 1 I - - - - - - - - - Figure 12.8 The Gift of the Magi problem, taken from the 0. Henry story, shows an inconsistent abstract plan that nevertheless can be deconlposed into a consistent solution. Part (a) shows the problem: A poor couple has only two prized possessions-he a gold watch and she her beautiful long hair. Each plans to buy a present to make the other happy. He decides to trade his watch to buy a silver comb for her hair, and she decides to sell her hair to get a gold chain for his watch. In (b) the partial plan is inconsistent, because there is no way to order the "Give Comb" and "Give Chain" abstract steps without a conflict. (We assume that the "Give Comb" action has the precondition Hair, because if the wife doesn't have her long hair, the action won't have the intended effect of making her happy, and similarly for the "Give Chain" action.) In (c) we decompose the "Give Comb" step with an "installment plan" method. In the first step of the decomposition, the husband takes possession of the comb and gives it to his wife, while agreeing to deliver the watch in payment at a later date. In the second step, the watch is handed over and the obligation is fulfilled. A similar method decomposes the "Give Chain" step. As long as both giving steps are ordered before the delivery steps, this decomposition solves the problem. (Note that it relies on the problem being defined so that the happiness of using the chain with the watch or the comb with the hair persists even after the possessions are surrendered.) 3. We can adopt the hybrid approach that combines POP and HTN planning. Partial-order planning by itself suffices to decide whether a plan exists, so the hybrid problem is clearly decidable. We need to be a little bit careful with the third answer. POP can string together primitive actions in arbitrary ways, so we might find ourselves with solutions that are very hard to un- derstand and do not have the nice, hierarchical organization of HTN plans. An appropriate compromise is to control the hybrid search so that action decompositions are preferred over adding new actions, although not to such an extent that arbitrarily long HTN plans are gener- ated before any primitive actions can be added. One way to do this is to use a cost function Section 12.2. Hierarchical Task Network Planning 429 that gives a discount for actions introduced by decomposition; the larger the discount, the more the search will resemble pure HTN planning and the more hierarchical the solution will be. Hierarchical plans are usually much easier to execute in realistic settings, and are easier to fix when things go wrong. Another important characteristic of HTN plans is the possibility of subtask sharing. Re- call that subtask sharing means using the same action to1 implement two different steps in plan decompositions. If we disallow subtask sharing, then each instantiation of a deconposition d' can be done in only one way, rather than many, thereby greatly pruning the search space. Usually, this pruning saves some time and at worst leads to a solution that is slightly longer than optimal. In some cases, however, it can be more problematic. For example, consider the goal "enjoy a honeymoon and raise a family." The plan library might come up with "get mar- ried and go to Hawaii" for the first subgoal and "get married and have two children" for the second. Without subtask sharing, the plan will include two distinct marriage actions, often considered highly undesirable. An interesting example of the costs and benefits of subtask sharing occurs in optimiz- ing compilers. Consider the problem of compiling the expression tan(x) - sin(z). Most compilers accomplish this by merging two separate sllbroutine calls in a trivial way: all the steps of tan come before any of the steps of sin. But consider the following Taylor series approximations for sin and tan: 23 2 17 7 tan=z+++; sinr=x-+- 6 120 5040 ' An HTN planner with subtask sharing could generate a more efficient solution, because it could choose to implement many steps of the sin computation with existing steps from tan. Most compilers do not do this kind of interprocedural sharing because it would take too much time to consider all the possible shared plans. Instead, most compilers generate each subplan independently, and then perhaps modify the result with a peephole optimizer. Given all the additional complications caused by the introduction of action decomposi- tions, why do we believe that HTN planning can be efficient? 'The actual sources of complex- ity are hard to analyze in practice, so we consider an idealized case. Suppose, for example, that we want to construct a plan with n actions. For a nonhierarchical, forward state-space n planner with b allowable actions at each state, the cost is O(b ). For an HTN planner, let us suppose a very regular decomposition structure: each nonlprimitive action has d possible decompositions, each into k actions at the next lower level. We want to know how many different decomposition trees there are with this structure. Now, af there are n actions at the primitive level, then the number of levels below the root is logl, n, so the number of internal 2 decomposition nodes is 1 + k + k + . . . f klogk = (n - l)/(k - 1). Each internal node has d possible decompositions, so there are d(n-l)l(k-l) possible regular decomposition trees that could be constructed. Examining this formula, we see that keeping d small and k large can result in huge savings: essentially we are taking the kth root of the nonhierarchical cost, if b and d are comparable. On the other hand, constructing ii plan library that has a small number of long decompositions, but nonetheless allows us to solve any problem, is not al- ways possible. Another way of saying this is that long macro:; that are usable across a wide range of problems are extremely precious. 430 Chapter 12. Planning and Acting in the Real World Another, and perhaps better, reason for believing that HTN planning is efficient is that it works in practice. Almost all planners for large-scale applications are HTN planners, be- cause HTN planning allows the human expert to provide the crucial knowledge about how to perform complex tasks so that large plans can be constructed with little computational effort. For example, 0-PLAN (Bell and Tate, 1985), which combines HTN planning with schedul- ing, has been used to develop production plans for Hitachi. A typical problem involves a product line of 350 different products, 35 assembly machines, and over 2000 different op- erations. The planner generates a 30-day schedule with three 8-hour shifts a day, involving millions of steps. The key to HTN planning, then, is the construction of a plan library containing known methods for implementing complex, high-level actions. One method of constructing the li- brary is to learn the methods from problem-solving experience. After the excruciating ex- perience of constructing a plan from scratch, the agent can save the plan in the library as a method for implementing the high-level action defined by the task. In this way, the agent can become more and more competent over time as new methods are built on top of old methods. One important aspect of this learning process is the ability to generalize the methods that are constructed, eliminating detail that is specific to the problem instance (e.g., the name of the builder or the address of the plot of land) and keeping just the key elements of the plan. Methods for achieving this kind of generalization are described in Chapter 19. It seems to us inconceivable that humans could be as competent as they are without some such mechanism. 12.3 PLANNING AND ACTING IN NONDETERMINISTIC DOMAINS So far we have considered only classical planning domains that are fully observable, static, and deterministic. Furthermore, we have assumed that the action descriptions are correct and complete. In these circumstances, an agent can plan first and then execute the plan "with its eyes closed." In an uncertain environment, on the other hand, an agent must use its percepts to discover what is happening while the plan is being executed and possibly modify or replace the plan if something unexpected happens. Agents have to deal with both incomplete and incorrect information. Incompleteness arises because the world is partially observable, nondeterministic, or both. For example, the door to the office supply cabinet might or might not be locked; one of my keys might or might not open the door if it is locked; and I might or might not be aware of these kinds of incompleteness in my knowledge. Thus, my model of the world is weak, but correct. On the other hand, incorrectness arises because the world does not necessarily match my model of the world; for example, I might believe that my key opens the supply cabinet, but I could be wrong if the locks have been changed. Without the ability to handle incorrect information, an agent can end up being as unintelligent as the dung beetle (page 37), which attempts to plug up its nest with a ball of dung even after the ball has been removed from its grasp. The possibility of having complete or correct knowledge depends on how much indeter- BOUNDED minacy there is in the world. With bounded indeterminacy, actions can have unpredictable Section 12.3. Planning and Acting in Nondeterministic Domains 43 1 effects, but the possible effects can be listed in the action description axioms. For exam- ple, when we flip a coin, it is reasonable to say that the oiltcome will be Heads or Tails. An agent can cope with bounded indeterminacy by making plans that work in all possible UNBOUNDED circumstances. With unbounded indeterminacy, on the other hand, the set of possible pre- conditions or effects either is unknown or is too large to be enumerated completely. This would be the case in very complex or dynamic domains such as driving, economic planning, and military strategy. An agent can cope with unbounded incleternlinacy only if it is prepared to revise its plans andfor its knowledge base. Unbounded indeterminacy is closely related to the qualification problem discussed in Chapter 10-tht: impossibility of listing all the preconditions required for a real-world action to have its intended effect. There are four planning methods for handling indeterminacy. The first two are suitable for bounded indeterminacy, and the second two for unbounded indeterminacy: SENSORLESS 0 Sensorless planning: Also called conformant pdanning, this method constructs stan- PLANNING dard, sequential plans that are to be executed without perception. The sensorless plan- ning algorithm must ensure that the plan achieves the goal in all possible circumstances, regardless of the true initial state and the actual action outcomes. Sensorless planning relies on coercion-the idea that the world can be forced into a given state even when the agent has only partial information about the current state. Coercion is not always possible, so sensorless planning is often inapplicable. Sensorless problem solving, in- volving search in belief state space, was described in Chapter 3. CONDITIONAL 0 Conditional planning: Also known as contingency planning, this approach deals with PLANNING bounded indeterminacy by constructing a conditional plan with different branches for the different contingencies that could arise. Just as in c1;issical planning, the agent plans first and then executes the plan that was produced. The agent finds out which part of SENSING ACTIONS the plan to execute by including sensing actions in the plan to test for the appropriate conditions. In the air transport domain, for example, we could have plans that say "check whether SF0 airport is operational. If so, fly there; otherwise, fly to Oakland." Conditional planning is covered in Section 12.4. EXECUTION MONITORING AND 0 Execution monitoring and replanning: In this approach, the agent can use any of the REPLANNING preceding planning techniques (classical, sensorless, or conditional) to construct a plan, but it also uses execution monitoring to judge whether the plan has a provision for the actual current situation or need to be revised. Replanning occurs when something goes wrong. In this way, the agent can handle unbounded indeterminacy. For example, even if a replanning agent did not envision the possibility of SFO's being closed, it can recognize that situation when it occurs and call the planner again to find a new path to the goal. Replanning agents are covered in Section 12.5. CONTINUOUS 0 Continuous planning: All the planners we have seen so far are designed to achieve PLANNING a goal and then stop. A continuous planner is designed to persist over a lifetime. It can handle unexpected circumstances in the environment, even if these occur while the agent is in the middle of constructing a plan. It can also handle the abandonment of goals and the creation of additional goals by goal formulation. Continuous planning is described in Section 12.6. 432 Chapter 12. Planning and Acting in the Real World Let's consider an example to clarify the differences among the various kinds of agents. The problem is this: given an initial state with a chair, a table, and some cans of paint, with everything of unknown color, achieve the state where the chair and table have the same color. A classical planning agent could not handle this problem, because the initial state is not fully specified-we don't know what color the furniture is. A sensorless planning agent must find a plan that works without requiring any sensors during plan execution. The solution is to open any can of paint and apply it to both chair and table, thus coercing them to be the same color (even though the agent doesn't know what the color is). Coercion is most appropriate when propositions are expensive or impossible to perceive. For example, doctors often prescribe a broad-spectrum antibiotic rather than using the conditional plan of doing a blood test, then waiting for the results to come back, and then prescribing a more specific antibiotic. They do this because the delays and costs involved in performing the blood tests are usually too great. A conditional planning agent can generate a better plan: first sense the color of the table and chair; if they are already the same then the plan is done. If not, sense the labels on the paint cans; if there is a can that is the same color as one piece of furniture, then apply the paint to the other piece. Otherwise paint both pieces with any color. A replanning agent could generate the same plan as the conditional planner, or it could generate fewer branches at first and fill in the others at execution time as needed. It could also deal with incorrectness of its action descriptions. For example, suppose that the Paint ( o bj , color) action is believed to have the deterministic effect Color (obj , color). A conditional planner would just assume that the effect has occurred once the action has been executed, but a replanning agent would check for the effect, and if it were not true (perhaps because the agent was careless and missed a spot), it could then replan to repaint the spot. We will return to this example on page 441. A continuous planning agent, in addition to handling unexpected events, can revise its plans appropriately if, say, we add the goal of having dinner on the table, so that the painting plan must be postponed. In the real world, agents use a combination of approaches. Car manufacturers sell spare tires and air bags, which are physical embodiments of conditional plan branches designed to handle punctures or crashes; on the other hand, most car drivers never consider these possibilities, so they respond to punctures and crashes as replanning agents. In general, agents create conditional plans only for those contingencies that have important consequences and a nonnegligible chance of going wrong. Thus, a car driver contemplating a trip across the Sahara desert might do well to consider explicitly the possibility of breakdowns, whereas a trip to the supermarket requires less advance planning. The agents we describe in this chapter are designed to handle indeterminacy, but are not capable of making tradeoffs between the probability of success and the cost of plan construc- tion. Chapter 16 provides additional tools for dealing with these issues. 434 Chapter 12. Planning and Acting in the Real World test is a Boolean function of the state variables. For example, a conditional step for the vacuum world might be, "if AtL A CleanL then Right else Suck." The execution of such a step proceeds in the obvious way. By nesting conditional steps, plans become trees. We want conditional plans that work regardless of which actiorz outcomes actually oc- cur. We have seen this problem before, in a different guise. In two-player games (Chapter 6), we want moves that will win regardless of which moves the opponent makes. For this reason, GAMES AGAINST nondeterministic planning problems are often called games against nature. NATURE Let us consider a specific example in the vacuum world. The initial state has the robot in the right square of a clean world; because the environment is fully observable, the agent knows the full state description, AtR A CleanL A CleanR. The goal state has the robot in the left square of a clean world. This would be quite trivial, were it not for the "double Murphy" vacuum cleaner that sometimes deposits dirt when it moves to a clean destination square and sometimes deposits dirt if Suck is applied to a clean square. A "game tree" for this environment is shown in Figure 12.9. Actions are taken by the robot in the "state" nodes of the tree, and nature decides what the outcome will be at the "chance" nodes, shown as circles. A solution is a subtree that (1) has a goal node at every leaf, (2) specifies one action at each of its "state" nodes, and (3) includes every outcome branch at each of its "chance" nodes. The solution is shown in bold lines in the figure; it corresponds to the plan Left, if AtL A CleanL A CleanR then I else Suck. (For now, because we are using a state-space planner, the tests in conditional steps will be complete state descriptions.) For exact solutions of games, we use the minimax algorithm (Figure 6.3). For condi- tional planning, there are typically two modifications. First, MAX and MIN nodes can become GOAL n.-r, /\ c..r. LOOP r ax+ / \ V.,-I, \ /\ LOOP \ GOAL /\ Figure 12.9 The first two levels of the search tree for the "double Murphy" vacuum world. State nodes are OR nodes where some action must be chosen. Chance nodes, shown as circles, are AND nodes where every outcome must be handled, as indicated by the arc linking the outgoing branches. The solution is shown in bold lines. Section 12.4. Conditional Planning 435 function AND-OR-GRAPH-SEARCH(O) returns a contlitional plan, or failure OR-SEARCH(INITIAL-STATE, problem, I) - function OR-sEARCH(state, problem, path) returns a conditional plan, or failure if GOAL-TEsprobem(state) then return the empty plan if state is on path then return failure for each action, state-set in SCCSRSprobem(state) do plan c AN-sERc(state-set, problem, state I path) if plan failure then return action / plan return failure function A-s(state_set, problem, path) returns a conditional plan, or failure for each si in state-set do plani c OR-SEARCH(, problem, path) if plan =failure then return failure return if sl then plan, else if sz then plan, else . . . iif s,-1 then plan,-, else plan, An algorithm for searching AND-OR graphs generated by nondeterministic Figure 12.10 SUCCESSORS returns a list of actions, each associated with a environments. We assume that set of possible outcomes. The aim is to find a conditional plain that reaches a goal state in all circumstances. OR and AND nodes. Intuitively, the plan needs to take some action at every state it reaches, but must handle every outcome for the action it takes. Second, the algorithm needs to return a conditional plan rather than just a single move. At an OR node, the plan is just the action selected, followed by whatever comes next. At an AND node, the plan is a nested series of if- then-else steps specifying subplans for each outcome; tlhe tests in these steps are the complete state decritions. Formally speaking, the search space we have defined is an AND-OR graph. In Chap- ter 7, AND-OR graphs showed up in propositional Horn clause inference. Here, the branches are actions rather than logical inference steps, but the algorjthm is the same. Figure 12.10 gives a recursive, depth-first algorithm for AND-OR graph search. One key aspect of the algorithm is the way in which it deals with cycles, which often arise in nondeterministic planning problems (e.g., if an action sometimes has no effect, or if an unintended effect can be corrected). If the current state is identical to a state on the path from the root, then it returns with failure. This doesn't mean that there is no solution from the current state; it simply means that if there is a noncyclic solution, it must be reachable from the earlier incarnation of the current state, so the new incarnation can be discarded. With this check, we ensure that the algorithm terminates in every finite state space, because every path must reach a goal, a dead end, or a repeated state. Notice that the algorithm does not check whether the current state is a repetition of a state on some other path from the root. Exercise 12.15 investigates this issue. Such plans could also be written using a case construct. 436 Chapter 12. Planning and Acting in the Real World Figure 12.11 The first level of the search graph for the "triple Murphy" vacuum world, where we have shown cycles explicitly. All solutions for this problem are cyclic plans. The plans returned by AND-OR-GRAPH-SEARCH contain conditional steps that test the entire state description to decide on a branch. In many cases, we can get away with less exhaustive tests. For example, the solution plan in Figure 12.9 could be written simply as Left, if CleanL then I else Suck. This is because the single test, CleanL, suffices to divide the states at the AND-node into two singleton sets, so that after the test the agent knows exactly what state it is in. In fact, a series of if-then-else tests of single variables always suffices to divide a set of states into singletons, provided that the state is fully observable. We could, therefore, restrict the tests to be of single variables without loss of generality. There is one final complication that often arises in nondeterministic domains: things don't always work the first time, and one has to try again. For example, consider the "triple Murphy" vacuum cleaner, which (in addition to its previously stated habits) sometimes fails to move when commanded-for example, Left can have the disjunctive effect AtL V AtR, as in Equation (12.1). Now the plan Left, if CleanL then I else Suck is no longer guaranteed to work. Figure 12.11 shows part of the the search graph; clearly, there are no longer any acyclic solutions, and AND-OR-GRAPH-SEARCH would return with failure. There is, however, a CYCLICSOLUTION cyclic solution, which is to keep trying Left until it works. We can express this solution by adding a label to denote some portion of the plan and using that label later instead of LABEL repeating the plan itself. Thus, our cyclic solution is L1 : Left, if AtR then L1 else if CleanL then I else Suck . (A better syntax for the looping part of this plan would be "while AtR do Left.") The modi- fications needed to AND-OR-GRAPH-SEARCH are covered in Exercise 12.16. The key real- ization is that a loop in the state space back to a state L translates to a loop in the plan back to the point where the subplan for state L is executed. Now we have the ability to synthesize complex plans that look more like programs, with conditionals and loops. Unfortunately, these loops are, potentially, inJinite loops. For example, nothing in the action representation for the triple Murphy world says that Left will eventually succeed. Cyclic plans are therefore less desirable than acyclic plans, but they may be considered solutions, provided that every leaf is a goal state and a leaf is reachable from every point in the plan.

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.