4 INFORMED SEARCH AND EXPLORATION
In which we see how information about the state space can prevent algorithms
from blundering about in the dark.
Chapter 3 showed that uninformed search strategies can find solutions to problems by systematically generating new states and testing them against the goal. Unfortunately, these strategies are incredibly inefficient in most cases. This chapter shows how an informed search strategy, one that uses problem-specific knowledge, can find solutions more efficiently. Section 4.1 describes informed versions of the algorithms in Chapter 3, and Section 4.2 explains how the necessary problem-specific information can be obtained. Sections 4.3 and 4.4 cover algorithms that perform purely local search in the state space, evaluating and modifying one or more current states rather than systematically exploring paths from an initial state. These algorithms are suitable for problems in which the path cost is irrelevant and all that matters is the solution state itself. The family of local-search algorithms includes methods inspired by statistical physics (simulated annealing) and evolutionary biology (genetic algorithms). Finally, Section 4.5 investigates online search, in which the agent is faced with a state space that is completely unknown.
4.1 INFORMED (HEURISTIC) SEARCH STRATEGIES

This section shows how an informed search strategy, one that uses problem-specific knowledge beyond the definition of the problem itself, can find solutions more efficiently than an uninformed strategy.

The general approach we will consider is called best-first search. Best-first search is an instance of the general TREE-SEARCH or GRAPH-SEARCH algorithm in which a node is selected for expansion based on an evaluation function, f(n). Traditionally, the node with the lowest evaluation is selected for expansion, because the evaluation measures distance to the goal. Best-first search can be implemented within our general search framework via a priority queue, a data structure that will maintain the fringe in ascending order of f-values.
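The priority-queue implementation is easy to sketch in code. The following is a minimal Python illustration, not code from the text: the names start, successors, is_goal, and f are an assumed problem interface, and Python's heapq module plays the role of the priority queue.

    import heapq
    import itertools

    def best_first_search(start, successors, is_goal, f):
        """Generic best-first tree search: always expand the fringe node
        with the lowest f-value. The counter breaks ties so that states
        themselves never need to be comparable."""
        counter = itertools.count()
        fringe = [(f(start), next(counter), start)]      # priority queue on f
        while fringe:
            _, _, node = heapq.heappop(fringe)           # lowest f-value first
            if is_goal(node):
                return node
            for child in successors(node):
                heapq.heappush(fringe, (f(child), next(counter), child))
        return None                                      # fringe exhausted: failure

Different choices of f yield the different members of the best-first family described below.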
The name "best-first search ' is a venerable but inaccurate one. After all, if we could
really expand the best node first, it would not be a search at all; it would be a straight march to Section 4.1. Informed (Heuristic) Search Strategies 95
the goal. All we can do is choose the node that appears to be best according to the evaluation
function. If the evaluation function is exactly accurate, then this will indeed ble the best
node; in reality, the evaluation function will sometimes be off, and can lead the search astray.
Nevertheless, we will stick with the name "best-first search," because "seemingly-best-first
search" is a little awkward.
There is a whole family of BEST-FIRST-SEARCH algorithms with different evaluation functions.1 A key component of these algorithms is a heuristic function,2 denoted h(n):

    h(n) = estimated cost of the cheapest path from node n to a goal node.

For example, in Romania, one might estimate the cost of the cheapest path from Arad to Bucharest via the straight-line distance from Arad to Bucharest.

Heuristic functions are the most common form in which additional knowledge of the problem is imparted to the search algorithm. We will study heuristics in more depth in Section 4.2. For now, we will consider them to be arbitrary problem-specific functions, with one constraint: if n is a goal node, then h(n) = 0. The remainder of this section covers two ways to use heuristic information to guide search.
Greedy best-first search
Greedy best-first search3 tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly. Thus, it evaluates nodes by using just the heuristic function: f(n) = h(n).
Let us see how this works for route-finding problems in Romania, using the straight-line-distance heuristic, which we will call h_SLD. If the goal is Bucharest, we will need to know the straight-line distances to Bucharest, which are shown in Figure 4.1. For example, h_SLD(In(Arad)) = 366. Notice that the values of h_SLD cannot be computed from the problem description itself. Moreover, it takes a certain amount of experience to know that h_SLD is correlated with actual road distances and is, therefore, a useful heuristic.
    Arad        366     Mehadia          241
    Bucharest     0     Neamt            234
    Craiova     160     Oradea           380
    Drobeta     242     Pitesti          100
    Eforie      161     Rimnicu Vilcea   193
    Fagaras     176     Sibiu            253
    Giurgiu      77     Timisoara        329
    Hirsova     151     Urziceni          80
    Iasi        226     Vaslui           199
    Lugoj       244     Zerind           374

Figure 4.1 Values of h_SLD: straight-line distances to Bucharest.
1 Exercise 4.3 asks you to show that this family includes several familiar uninformed algorithms.
2 A heuristic function h(n) takes a node as input, but it depends only on the state at that node.
3 Our first edition called this greedy search; other authors have called it best-first search. Our more general usage of the latter term follows Pearl (1984).
Figure 4.2 Stages in a greedy best-first search for Bucharest using the straight-line-distance heuristic h_SLD. Nodes are labeled with their h-values. (Panel titles: the initial state; after expanding Sibiu; after expanding Fagaras.)
Figure 4.2 shows the progress of a greedy best-first search using h_SLD to find a path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras, because it is closest. Fagaras in turn generates Bucharest, which is the goal. For this particular problem, greedy best-first search using h_SLD finds a solution without ever expanding a node that is not on the solution path; hence, its search cost is minimal. It is not optimal, however: the path via Sibiu and Fagaras to Bucharest is 32 kilometers longer than the path through Rimnicu Vilcea and Pitesti. This shows why the algorithm is called "greedy": at each step it tries to get as close to the goal as it can.
Minimizing h(n) is susceptible to false starts. Consider the problem of getting from Iasi to Fagaras. The heuristic suggests that Neamt be expanded first, because it is closest to Fagaras, but it is a dead end. The solution is to go first to Vaslui, a step that is actually farther from the goal according to the heuristic, and then to continue to Urziceni, Bucharest, and Fagaras. In this case, then, the heuristic causes unnecessary nodes to be expanded. Furthermore, if we are not careful to detect repeated states, the solution will never be found; the search will oscillate between Neamt and Iasi.
Greedy best-first search resembles depth-first search in the way it prefers to follow a single path all the way to the goal, but will back up when it hits a dead end. It suffers from the same defects as depth-first search: it is not optimal, and it is incomplete (because it can start down an infinite path and never return to try other possibilities). The worst-case time and space complexity is O(b^m), where m is the maximum depth of the search space. With a good heuristic function, however, the complexity can be reduced substantially. The amount of the reduction depends on the particular problem and on the quality of the heuristic.
A* search: Minimizing the total estimated solution cost

The most widely known form of best-first search is called A* search (pronounced "A-star search"). It evaluates nodes by combining g(n), the cost to reach the node, and h(n), the cost to get from the node to the goal:

    f(n) = g(n) + h(n).

Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the cheapest path from n to the goal, we have

    f(n) = estimated cost of the cheapest solution through n.

Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the node with the lowest value of g(n) + h(n). It turns out that this strategy is more than just reasonable: provided that the heuristic function h(n) satisfies certain conditions, A* search is both complete and optimal.
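Turning greedy search into A* is only a matter of ordering the fringe by g + h instead of h. Below is a minimal sketch (ours, not the book's), written against the same hypothetical ROADS and H_SLD tables used in the greedy example.

    import heapq

    def a_star(start, goal, roads, h):
        """A* tree search: order the fringe by f(n) = g(n) + h(n)."""
        fringe = [(h[start], 0, [start])]         # entries are (f, g, path)
        while fringe:
            f, g, path = heapq.heappop(fringe)
            city = path[-1]
            if city == goal:
                return path, g                    # first goal popped is cheapest
            for neighbor, cost in roads.get(city, []):
                g2 = g + cost
                heapq.heappush(fringe,
                               (g2 + h[neighbor], g2, path + [neighbor]))
        return None

    # a_star('Arad', 'Bucharest', ROADS, H_SLD) returns
    # (['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'], 418),
    # the optimal route, where greedy search settled for cost 450.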
The optimality of A* is straightforward to analyze if it is used with TREE-SEARCH. In this case, A* is optimal if h(n) is an admissible heuristic, that is, provided that h(n) never overestimates the cost to reach the goal. Admissible heuristics are by nature optimistic, because they think the cost of solving the problem is less than it actually is. Since g(n) is the exact cost to reach n, we have as an immediate consequence that f(n) never overestimates the true cost of a solution through n.
An obvious example of an admissible heuristic is the straight-line distance h_SLD that we used in getting to Bucharest. Straight-line distance is admissible because the shortest path between any two points is a straight line, so the straight line cannot be an overestimate. In Figure 4.3, we show the progress of an A* tree search for Bucharest. The values of g are computed from the step costs in Figure 3.2, and the values of h_SLD are given in Figure 4.1. Notice in particular that Bucharest first appears on the fringe at step (e), but it is not selected for expansion, because its f-cost (450) is higher than that of Pitesti (417). Another way to say this is that there might be a solution through Pitesti whose cost is as low as 417, so the algorithm will not settle for a solution that costs 450. From this example, we can extract a general proof that A* using TREE-SEARCH is optimal if h(n) is admissible. Suppose a
Figure 4.3 Stages in an A* search for Bucharest. Nodes are labeled with f = g + h; the h values are the straight-line distances to Bucharest taken from Figure 4.1. (a) The initial state: Arad, 366 = 0 + 366. (b) After expanding Arad: Sibiu, 393 = 140 + 253. (c) After expanding Sibiu: Arad, 646 = 280 + 366; Fagaras, 415 = 239 + 176; Oradea, 671 = 291 + 380; Rimnicu Vilcea, 413 = 220 + 193. (d) After expanding Rimnicu Vilcea: Craiova, 526 = 366 + 160; Pitesti, 417 = 317 + 100; Sibiu, 553 = 300 + 253. (e) After expanding Fagaras: Sibiu, 591 = 338 + 253; Bucharest, 450 = 450 + 0; Craiova, 526 = 366 + 160; Pitesti, 417 = 317 + 100; Sibiu, 553 = 300 + 253. (f) After expanding Pitesti: Bucharest, 418 = 418 + 0; Craiova, 615 = 455 + 160; Rimnicu Vilcea, 607 = 414 + 193.
suboptimal goal node G2 appears on the fringe, and let the cost of the optimal solution be C*. Then, because G2 is suboptimal and because h(G2) = 0 (true for any goal node), we know

    f(G2) = g(G2) + h(G2) = g(G2) > C*.

Now consider a fringe node n that is on an optimal solution path, for example, Pitesti in the example of the preceding paragraph. (There must always be such a node if a solution exists.) If h(n) does not overestimate the cost of completing the solution path, then we know that

    f(n) = g(n) + h(n) ≤ C*.

Now we have shown that f(n) ≤ C* < f(G2), so G2 will not be expanded and A* must return an optimal solution.
If we use the GRAPH-SEARCH algorithm of Figure 3.19 instead of TREE-SEARCH, then this proof breaks down. Suboptimal solutions can be returned because GRAPH-SEARCH can discard the optimal path to a repeated state if it is not the first one generated. (See Exercise 4.4.)

There are two ways to fix this problem. The first solution is to extend GRAPH-SEARCH so that it discards the more expensive of any two paths found to the same node. (See the discussion in Section 3.5.) The extra bookkeeping is messy, but it does guarantee optimality. The second solution is to ensure that the optimal path to any repeated state is always the first one followed, as is the case with uniform-cost search. This property holds if we impose an extra requirement on h(n), namely the requirement of consistency (also called monotonicity). A heuristic h(n) is consistent if, for every node n and every successor n' of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n':

    h(n) ≤ c(n, a, n') + h(n').
This is a form of the general triangle inequality, which stipulates that each side of a triangle cannot be longer than the sum of the other two sides. Here, the triangle is formed by n, n', and the goal closest to n. It is fairly easy to show (Exercise 4.7) that every consistent heuristic is also admissible. The most important consequence of consistency is the following: A* using GRAPH-SEARCH is optimal if h(n) is consistent.

Although consistency is a stricter requirement than admissibility, one has to work quite hard to concoct heuristics that are admissible but not consistent. All the admissible heuristics we discuss in this chapter are also consistent. Consider, for example, h_SLD. We know that the general triangle inequality is satisfied when each side is measured by the straight-line distance, and that the straight-line distance between n and n' is no greater than c(n, a, n'). Hence, h_SLD is a consistent heuristic.
Another important consequence of consistency is the following: if h(n) is consistent, then the values of f(n) along any path are nondecreasing. The proof follows directly from the definition of consistency. Suppose n' is a successor of n; then g(n') = g(n) + c(n, a, n') for some action a, and we have

    f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') ≥ g(n) + h(n) = f(n).

It follows that the sequence of nodes expanded by A* using GRAPH-SEARCH is in nondecreasing order of f(n). Hence, the first goal node selected for expansion must be an optimal solution, since all later nodes will be at least as expensive.
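On a small explicit graph, consistency can be verified mechanically. The helper below is our own illustration of the definition, assuming the heuristic and graph are given as a plain dictionary and edge triples:

    def is_consistent(h, edges, goals):
        """h: node -> heuristic value; edges: (n, n2, step_cost) triples;
        goals: goal nodes, which must have h == 0. Checks
        h(n) <= c(n, a, n') + h(n') on every edge."""
        return (all(h[g] == 0 for g in goals) and
                all(h[n] <= cost + h[n2] for n, n2, cost in edges))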
Figure 4.4 Map of Romania showing contours at f = 380, f = 400, and f = 420, with Arad as the start state. Nodes inside a given contour have f-costs less than or equal to the contour value.
The fact that f-costs are nondecreasing along any path also means that we can draw contours in the state space, just like the contours in a topographic map. Figure 4.4 shows an example. Inside the contour labeled 400, all nodes have f(n) less than or equal to 400, and so on. Then, because A* expands the fringe node of lowest f-cost, we can see that an A* search fans out from the start node, adding nodes in concentric bands of increasing f-cost.
With uniform-cost search (A* search using h(n) = 0), the bands will be "circular" around the start state. With more accurate heuristics, the bands will stretch toward the goal state and become more narrowly focused around the optimal path. If C* is the cost of the optimal solution path, then we can say the following:

- A* expands all nodes with f(n) < C*.
- A* might then expand some of the nodes right on the "goal contour" (where f(n) = C*) before selecting a goal node.

Intuitively, it is obvious that the first solution found must be an optimal one, because goal nodes in all subsequent contours will have higher f-cost, and thus higher g-cost (because all goal nodes have h(n) = 0). Intuitively, it is also obvious that A* search is complete. As we add bands of increasing f, we must eventually reach a band where f is equal to the cost of the path to a goal state.4
Notice that A* expands no nodes with f(n) > C*: for example, Timisoara is not expanded in Figure 4.3 even though it is a child of the root. We say that the subtree below Timisoara is pruned; because h_SLD is admissible, the algorithm can safely ignore this subtree while still guaranteeing optimality. The concept of pruning, eliminating possibilities from consideration without having to examine them, is important for many areas of AI.

4 Completeness requires that there be only finitely many nodes with cost less than or equal to C*, a condition that is true if all step costs exceed some finite ε and if b is finite.
One final observation is that among optimal algorithms of this type (algorithms that extend search paths from the root), A* is optimally efficient for any given heuristic function. That is, no other optimal algorithm is guaranteed to expand fewer nodes than A* (except possibly through tie-breaking among nodes with f(n) = C*). This is because any algorithm that does not expand all nodes with f(n) < C* runs the risk of missing the optimal solution.
That A* search is complete, optimal, and optimally efficient among all such algorithms is rather satisfying. Unfortunately, it does not mean that A* is the answer to all our searching needs. The catch is that, for most problems, the number of nodes within the goal contour search space is still exponential in the length of the solution. Although the proof of the result is beyond the scope of this book, it has been shown that exponential growth will occur unless the error in the heuristic function grows no faster than the logarithm of the actual path cost. In mathematical notation, the condition for subexponential growth is that

    |h(n) - h*(n)| ≤ O(log h*(n)),

where h*(n) is the true cost of getting from n to the goal. For almost all heuristics in practical use, the error is at least proportional to the path cost, and the resulting exponential growth eventually overtakes any computer. For this reason, it is often impractical to insist on finding an optimal solution. One can use variants of A* that find suboptimal solutions quickly, or one can sometimes design heuristics that are more accurate but not strictly admissible. In any case, the use of a good heuristic still provides enormous savings compared to the use of an uninformed search. In Section 4.2, we will look at the question of designing good heuristics.
Computation time is not, however, A*'s main drawback. Because it keeps all generated nodes in memory (as do all GRAPH-SEARCH algorithms), A* usually runs out of space long before it runs out of time. For this reason, A* is not practical for many large-scale problems. Recently developed algorithms have overcome the space problem without sacrificing optimality or completeness, at a small cost in execution time. These are discussed next.
Memory-bounded heuristic search
The simplest way to reduce memory requirements for A* is to adapt the idea of iterative deepening to the heuristic search context, resulting in the iterative-deepening A* (IDA*) algorithm. The main difference between IDA* and standard iterative deepening is that the cutoff used is the f-cost (g + h) rather than the depth; at each iteration, the cutoff value is the smallest f-cost of any node that exceeded the cutoff on the previous iteration. IDA* is practical for many problems with unit step costs and avoids the substantial overhead associated with keeping a sorted queue of nodes. Unfortunately, it suffers from the same difficulties with real-valued costs as does the iterative version of uniform-cost search described in Exercise 3.11.
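The contour-by-contour behavior of IDA* is compact enough to sketch directly. This is an illustrative Python version under an assumed interface in which successors(node) yields (child, step_cost) pairs; it is not the book's code.

    def ida_star(start, successors, is_goal, h):
        """Iterative-deepening A*: depth-first search bounded by an f-cost
        cutoff; each iteration raises the cutoff to the smallest f-value
        that exceeded it on the previous iteration."""
        def bounded_dfs(node, g, bound):
            f = g + h(node)
            if f > bound:
                return None, f                    # report the overflowing f
            if is_goal(node):
                return node, f
            smallest = float('inf')
            for child, cost in successors(node):
                found, t = bounded_dfs(child, g + cost, bound)
                if found is not None:
                    return found, t
                smallest = min(smallest, t)
            return None, smallest

        bound = h(start)
        while True:
            found, bound = bounded_dfs(start, 0, bound)
            if found is not None:
                return found
            if bound == float('inf'):
                return None                       # no solution exists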
This section briefly examines two more recent memory-bounded algorithms, called RBFS and MA*.
Recursive best-first search (RBFS) is a simple recursive algorithm that attempts to mimic the operation of standard best-first search, but using only linear space. The algorithm is shown in Figure 4.5. Its structure is similar to that of a recursive depth-first search, but rather than continuing indefinitely down the current path, it keeps track of the f-value of the best alternative path available from any ancestor of the current node. If the current node exceeds this limit, the recursion unwinds back to the alternative path. As the recursion unwinds, RBFS replaces the f-value of each node along the path with the best f-value of its children. In this way, RBFS remembers the f-value of the best leaf in the forgotten subtree and can therefore decide whether it's worth reexpanding the subtree at some later time. Figure 4.6 shows how RBFS reaches Bucharest.
function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure
    return RBFS(problem, MAKE-NODE(INITIAL-STATE[problem]), ∞)

function RBFS(problem, node, f_limit) returns a solution, or failure and a new f-cost limit
    if GOAL-TEST[problem](STATE[node]) then return node
    successors ← EXPAND(node, problem)
    if successors is empty then return failure, ∞
    for each s in successors do
        f[s] ← max(g(s) + h(s), f[node])
    repeat
        best ← the lowest f-value node in successors
        if f[best] > f_limit then return failure, f[best]
        alternative ← the second-lowest f-value among successors
        result, f[best] ← RBFS(problem, best, min(f_limit, alternative))
        if result ≠ failure then return result

Figure 4.5 The algorithm for recursive best-first search.
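For readers who want something executable, here is one rendering of Figure 4.5 in Python. It is a sketch under an assumed problem interface (initial_state, is_goal, and successors yielding (child, step_cost) pairs), not the book's own code.

    INF = float('inf')

    def recursive_best_first_search(problem, h):
        def rbfs(state, g, f_stored, f_limit):
            if problem.is_goal(state):
                return [state], f_stored
            # Each record is [f, g, state]; f inherits the parent's
            # backed-up value when that is larger, as in Figure 4.5.
            succs = [[max(g + cost + h(child), f_stored), g + cost, child]
                     for child, cost in problem.successors(state)]
            if not succs:
                return None, INF
            while True:
                succs.sort(key=lambda rec: rec[0])
                best = succs[0]
                if best[0] > f_limit:
                    return None, best[0]          # unwind, reporting best f
                alternative = succs[1][0] if len(succs) > 1 else INF
                result, best[0] = rbfs(best[2], best[1], best[0],
                                       min(f_limit, alternative))
                if result is not None:
                    return [state] + result, best[0]

        start = problem.initial_state
        solution, _ = rbfs(start, 0, h(start), INF)
        return solution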
RBFS is somewhat more efficient than IDA*, but still suffers from excessive node regeneration. In the example in Figure 4.6, RBFS first follows the path via Rimnicu Vilcea, then "changes its mind" and tries Fagaras, and then changes its mind back again. These mind changes occur because every time the current best path is extended, there is a good chance that its f-value will increase: h is usually less optimistic for nodes closer to the goal. When this happens, particularly in large search spaces, the second-best path might become the best path, so the search has to backtrack to follow it. Each mind change corresponds to an iteration of IDA*, and could require many reexpansions of forgotten nodes to recreate the best path and extend it one more node.
Like A*, RBFS is an optimal algorithm if the heuristic function h(n) is admissible. Its space complexity is linear in the depth of the deepest optimal solution, but its time complexity is rather difficult to characterize: it depends both on the accuracy of the heuristic function and on how often the best path changes as nodes are expanded. Both IDA* and RBFS are subject to the potentially exponential increase in complexity associated with searching on graphs (see Section 3.5), because they cannot check for repeated states other than those on the current path. Thus, they may explore the same state many times.

IDA* and RBFS suffer from using too little memory. Between iterations, IDA* retains only a single number: the current f-cost limit. RBFS retains more information in memory, but it uses only linear space: even if more memory were available, RBFS has no way to make use of it.
Figure 4.6 Stages in an RBFS search for the shortest route to Bucharest. The f-limit value for each recursive call is shown on top of each current node. (a) The path via Rimnicu Vilcea is followed until the current best leaf (Pitesti) has a value that is worse than the best alternative path (Fagaras). (b) The recursion unwinds and the best leaf value of the forgotten subtree (417) is backed up to Rimnicu Vilcea; then Fagaras is expanded, revealing a best leaf value of 450. (c) The recursion unwinds and the best leaf value of the forgotten subtree (450) is backed up to Fagaras; then Rimnicu Vilcea is expanded. This time, because the best alternative path (through Timisoara) costs at least 447, the expansion continues to Bucharest.
It seems sensible, therefore, to use all available memory. Two algorithms that do this are MA* (memory-bounded A*) and SMA* (simplified MA*). We will describe SMA*, which is, well, simpler. SMA* proceeds just like A*, expanding the best leaf until memory is full. At this point, it cannot add a new node to the search tree without dropping an old one. SMA* always drops the worst leaf node, the one with the highest f-value. Like RBFS, SMA* then backs up the value of the forgotten node to its parent. In this way, the ancestor of a forgotten subtree knows the quality of the best path in that subtree. With this information, SMA* regenerates the subtree only when all other paths have been shown to look worse than the path it has forgotten. Another way of saying this is that, if all the descendants of a node n are forgotten, then we will not know which way to go from n, but we will still have an idea of how worthwhile it is to go anywhere from n.
The complete algorithm is too complicated to reproduce here,5 but there is one subtlety worth mentioning. We said that SMA* expands the best leaf and deletes the worst leaf. What if all the leaf nodes have the same f-value? Then the algorithm might select the same node for deletion and expansion. SMA* solves this problem by expanding the newest best leaf and deleting the oldest worst leaf. These can be the same node only if there is only one leaf; in that case, the current search tree must be a single path from root to leaf that fills all of memory. If the leaf is not a goal node, then even if it is on an optimal solution path, that solution is not reachable with the available memory. Therefore, the node can be discarded exactly as if it had no successors.

5 A rough sketch appeared in the first edition of this book.
SMA* is complete if there is any reachable solution, that is, if d, the depth of the shallowest goal node, is less than the memory size (expressed in nodes). It is optimal if any optimal solution is reachable; otherwise it returns the best reachable solution. In practical terms, SMA* might well be the best general-purpose algorithm for finding optimal solutions, particularly when the state space is a graph, step costs are not uniform, and node generation is expensive compared to the additional overhead of maintaining the open and closed lists.

On very hard problems, however, it will often be the case that SMA* is forced to switch back and forth continually between a set of candidate solution paths, only a small subset of which can fit in memory. (This resembles the problem of thrashing in disk paging systems.) Then the extra time required for repeated regeneration of the same nodes means that problems that would be practically solvable by A*, given unlimited memory, become intractable for SMA*. That is to say, memory limitations can make a problem intractable from the point of view of computation time. Although there is no theory to explain the tradeoff between time and memory, it seems that this is an inescapable problem. The only way out is to drop the optimality requirement.
Learning to search better
We have presented several fixed strategies (breadth-first, greedy best-first, and so on) that have been designed by computer scientists. Could an agent learn how to search better? The answer is yes, and the method rests on an important concept called the metalevel state space. Each state in a metalevel state space captures the internal (computational) state of a program that is searching in an object-level state space such as Romania. For example, the internal state of the A* algorithm consists of the current search tree. Each action in the metalevel state space is a computation step that alters the internal state; for example, each computation step in A* expands a leaf node and adds its successors to the tree. Thus, Figure 4.3, which shows a sequence of larger and larger search trees, can be seen as depicting a path in the metalevel state space where each state on the path is an object-level search tree.

Now, the path in Figure 4.3 has five steps, including one step, the expansion of Fagaras, that is not especially helpful. For harder problems, there will be many such missteps, and a metalevel learning algorithm can learn from these experiences to avoid exploring unpromising subtrees. The techniques used for this kind of learning are described in Chapter 21. The goal of learning is to minimize the total cost of problem solving, trading off computational expense and path cost.
4.2 HEURISTIC FUNCTIONS

In this section, we will look at heuristics for the 8-puzzle, in order to shed light on the nature of heuristics in general.

The 8-puzzle was one of the earliest heuristic search problems. As mentioned in Section 3.2, the object of the puzzle is to slide the tiles horizontally or vertically into the empty space until the configuration matches the goal configuration (Figure 4.7).
Figure 4.7 A typical instance of the 8-puzzle (start state and goal state). The solution is 26 steps long.
The average solution cost for a randomly generated 8-puzzle instance is about 22 steps. The branching factor is about 3. (When the empty tile is in the middle, there are four possible moves; when it is in a corner there are two; and when it is along an edge there are three.) This means that an exhaustive search to depth 22 would look at about 3^22 ≈ 3.1 × 10^10 states. By keeping track of repeated states, we could cut this down by a factor of about 170,000, because there are only 9!/2 = 181,440 distinct states that are reachable. (See Exercise 3.4.) This is a manageable number, but the corresponding number for the 15-puzzle is roughly 10^13, so the next order of business is to find a good heuristic function. If we want to find the shortest solutions by using A*, we need a heuristic function that never overestimates the number of steps to the goal. There is a long history of such heuristics for the 15-puzzle; here are two commonly used candidates:
- h1 = the number of misplaced tiles. For Figure 4.7, all of the eight tiles are out of position, so the start state would have h1 = 8. h1 is an admissible heuristic, because it is clear that any tile that is out of place must be moved at least once.

- h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot move along diagonals, the distance we will count is the sum of the horizontal and vertical distances. This is sometimes called the city block distance or Manhattan distance. h2 is also admissible, because all any move can do is move one tile one step closer to the goal. Tiles 1 to 8 in the start state give a Manhattan distance of

    h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.

As we would hope, neither of these overestimates the true solution cost, which is 26.
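Both heuristics take only a few lines to implement. In the sketch below, a state is a 9-tuple read left to right, top to bottom, with 0 standing for the blank; the start and goal tuples encode the Figure 4.7 configurations as they appear in the book's edition, so treat them as an assumption if your copy of the figure differs.

    def misplaced_tiles(state, goal):
        """h1: tiles out of place (the blank is not counted)."""
        return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

    def manhattan_distance(state, goal):
        """h2: sum of horizontal plus vertical distances of each tile
        from its goal square on the 3x3 board."""
        total = 0
        for tile in range(1, 9):
            i, j = state.index(tile), goal.index(tile)
            total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
        return total

    start = (7, 2, 4, 5, 0, 6, 8, 3, 1)       # Figure 4.7 start state
    goal  = (0, 1, 2, 3, 4, 5, 6, 7, 8)       # Figure 4.7 goal state
    print(misplaced_tiles(start, goal))        # 8
    print(manhattan_distance(start, goal))     # 18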
The effect of heuristic accuracy on performance
One way to characterize the quality of a heuristic is the effective branching factor b*. If the total number of nodes generated by A* for a particular problem is N, and the solution depth is d, then b* is the branching factor that a uniform tree of depth d would have to have in order to contain N + 1 nodes. Thus,

    N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d.

For example, if A* finds a solution at depth 5 using 52 nodes, then the effective branching factor is 1.92. The effective branching factor can vary across problem instances, but usually it is fairly constant for sufficiently hard problems. Therefore, experimental measurements of b* on a small set of problems can provide a good guide to the heuristic's overall usefulness. A well-designed heuristic would have a value of b* close to 1, allowing fairly large problems to be solved.
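Because the defining equation has no closed form for b*, it is normally solved numerically. A small sketch of such a solver (ours, not the book's), using bisection:

    def effective_branching_factor(n_generated, depth, tol=1e-6):
        """Solve N + 1 = 1 + b + b^2 + ... + b^d for b by bisection."""
        target = n_generated + 1
        def total(b):
            return sum(b ** i for i in range(depth + 1))
        lo, hi = 1.0, float(target)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if total(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    print(round(effective_branching_factor(52, 5), 2))   # 1.92, as in the text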
To test the heuristic functions h1 and h2, we generated 1200 random problems with solution lengths from 2 to 24 (100 for each even number) and solved them with iterative deepening search and with A* tree search using both h1 and h2. Figure 4.8 gives the average number of nodes generated by each strategy and the effective branching factor. The results suggest that h2 is better than h1, and is far better than using iterative deepening search. On our solutions with length 14, A* with h2 is 30,000 times more efficient than uninformed iterative deepening search.
One might ask whether h2 is always better than h1. The answer is yes. It is easy to see from the definitions of the two heuristics that, for any node n, h2(n) ≥ h1(n). We thus say that h2 dominates h1. Domination translates directly into efficiency: A* using h2 will never expand more nodes than A* using h1 (except possibly for some nodes with f(n) = C*). The argument is simple. Recall the observation on page 100 that every node with f(n) < C* will surely be expanded. This is the same as saying that every node with h(n) < C* - g(n) will surely be expanded. But because h2 is at least as big as h1 for all nodes, every node that is surely expanded by A* search with h2 will also surely be expanded with h1, and h1 might also cause other nodes to be expanded as well. Hence, it is always better to use a heuristic function with higher values, provided it does not overestimate and that the computation time for the heuristic is not too large.
              Search Cost                     Effective Branching Factor
     d       IDS     A*(h1)    A*(h2)        IDS      A*(h1)    A*(h2)
     2        10          6         6        2.45       1.79      1.79
     4       112         13        12        2.87       1.48      1.45
     6       680         20        18        2.73       1.34      1.30
     8      6384         39        25        2.80       1.33      1.24
    10     47127         93        39        2.79       1.38      1.22
    12   3644035        227        73        2.78       1.42      1.24
    14         -        539       113           -       1.44      1.23
    16         -       1301       211           -       1.45      1.25
    18         -       3056       363           -       1.46      1.26
    20         -       7276       676           -       1.47      1.27
    22         -      18094      1219           -       1.48      1.28
    24         -      39135      1641           -       1.48      1.26

Figure 4.8 Comparison of the search costs and effective branching factors for the ITERATIVE-DEEPENING-SEARCH and A* algorithms with h1, h2. Data are averaged over 100 instances of the 8-puzzle, for various solution lengths.
Inventing admissible heuristic functions
We have seen that both h1 (misplaced tiles) and h2 (Manhattan distance) are fairly good heuristics for the 8-puzzle and that h2 is better. How might one have come up with h2? Is it possible for a computer to invent such a heuristic mechanically?

h1 and h2 are estimates of the remaining path length for the 8-puzzle, but they are also perfectly accurate path lengths for simplified versions of the puzzle. If the rules of the puzzle were changed so that a tile could move anywhere, instead of just to the adjacent empty square, then h1 would give the exact number of steps in the shortest solution. Similarly, if a tile could move one square in any direction, even onto an occupied square, then h2 would give the exact number of steps in the shortest solution. A problem with fewer restrictions on the actions is called a relaxed problem. The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem. The heuristic is admissible because the optimal solution in the original problem is, by definition, also a solution in the relaxed problem and therefore must be at least as expensive as the optimal solution in the relaxed problem. Because the derived heuristic is an exact cost for the relaxed problem, it must obey the triangle inequality and is therefore consistent (see page 99).
If a problem definition is written down in a formal language, it is possible to construct relaxed problems automatically.6 For example, if the 8-puzzle actions are described as

    A tile can move from square A to square B if
    A is horizontally or vertically adjacent to B and B is blank,

we can generate three relaxed problems by removing one or both of the conditions:

(a) A tile can move from square A to square B if A is adjacent to B.
(b) A tile can move from square A to square B if B is blank.
(c) A tile can move from square A to square B.

6 In Chapters 8 and 11, we will describe formal languages suitable for this task; with formal descriptions that can be manipulated, the construction of relaxed problems can be automated. For now, we will use English.
From (a), we can derive h2 (Manhattan distance). The reasoning is that h2 would be the proper score if we moved each tile in turn to its destination. The heuristic derived from (b) is discussed in Exercise 4.9. From (c), we can derive h1 (misplaced tiles), because it would be the proper score if tiles could move to their intended destination in one step. Notice that it is crucial that the relaxed problems generated by this technique can be solved essentially without search, because the relaxed rules allow the problem to be decomposed into eight independent subproblems. If the relaxed problem is hard to solve, then the values of the corresponding heuristic will be expensive to obtain.
A program called ABSOLVER can generate heuristics automatically from problem definitions, using the "relaxed problem" method and various other techniques (Prieditis, 1993). ABSOLVER generated a new heuristic for the 8-puzzle better than any preexisting heuristic and found the first useful heuristic for the famous Rubik's cube puzzle.
One problem with generating new heuristic functions is that one often fails to get one "clearly best" heuristic. If a collection of admissible heuristics h1 ... hm is available for a problem, and none of them dominates any of the others, which should we choose? As it turns out, we need not make a choice. We can have the best of all worlds, by defining

    h(n) = max{h1(n), ..., hm(n)}.

This composite heuristic uses whichever function is most accurate on the node in question. Because the component heuristics are admissible, h is admissible; it is also easy to prove that h is consistent. Furthermore, h dominates all of its component heuristics.
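In code, the composite heuristic is one line; the component names below are placeholders for whatever admissible heuristics are available:

    def h_composite(*heuristics):
        """Pointwise maximum of admissible heuristics: the result is
        admissible, consistent if the components are, and dominates
        every component."""
        return lambda n: max(h(n) for h in heuristics)

    # e.g. h = h_composite(h1, h2)   # hypothetical component functions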
Admissible heuristics can also be derived from the solution cost of a subproblem of a given problem. For example, Figure 4.9 shows a subproblem of the 8-puzzle instance in Figure 4.7. The subproblem involves getting tiles 1, 2, 3, 4 into their correct positions. Clearly, the cost of the optimal solution of this subproblem is a lower bound on the cost of the complete problem. It turns out to be substantially more accurate than Manhattan distance in some cases.
The idea behind pattern databases is to store these exact solution costs for every possible subproblem instance: in our example, every possible configuration of the four tiles and the blank. (Notice that the locations of the other four tiles are irrelevant for the purposes of solving the subproblem, but moves of those tiles do count toward the cost.) Then, we compute an admissible heuristic h_DB for each complete state encountered during a search simply by looking up the corresponding subproblem configuration in the database. The database itself is constructed by searching backwards from the goal state and recording the cost of each new pattern encountered; the expense of this search is amortized over many subsequent problem instances.7

7 Note that a perfect heuristic can be obtained simply by allowing h to run a full breadth-first search "on the sly." Thus, there is a tradeoff between accuracy and computation time for heuristic functions.
Figure 4.9 A subproblem of the 8-puzzle instance given in Figure 4.7 (start state and goal state shown). The task is to get tiles 1, 2, 3, and 4 into their correct positions, without worrying about what happens to the other tiles.
The choice of 1-2-3-4 is fairly arbitrary; we could also construct databases for 5-6-7-8, for 2-4-6-8, and so on. Each database yields an admissible heuristic, and these heuristics can be combined, as explained earlier, by taking the maximum value. A combined heuristic of this kind is much more accurate than the Manhattan distance; the number of nodes generated when solving random 15-puzzles can be reduced by a factor of 1000.

One might wonder whether the heuristics obtained from the 1-2-3-4 database and the 5-6-7-8 database could be added, since the two subproblems seem not to overlap. Would this still give an admissible heuristic? The answer is no, because the solutions of the 1-2-3-4 subproblem and the 5-6-7-8 subproblem for a given state will almost certainly share some moves: it is unlikely that 1-2-3-4 can be moved into place without touching 5-6-7-8, and vice versa. But what if we don't count those moves? That is, we record not the total cost of solving the 1-2-3-4 subproblem, but just the number of moves involving 1-2-3-4. Then it is easy to see that the sum of the two costs is still a lower bound on the cost of solving the entire problem. This is the idea behind disjoint pattern databases. Using such databases, it is possible to solve random 15-puzzles in a few milliseconds; the number of nodes generated is reduced by a factor of 10,000 compared with using Manhattan distance. For 24-puzzles, a speedup of roughly a million can be obtained.
Disjoint pattern databases work for sliding-tile puzzles because the problem can be divided up in such a way that each move affects only one subproblem, because only one tile is moved at a time. For a problem such as Rubik's cube, this kind of subdivision cannot be done, because each move affects 8 or 9 of the 26 cubies. Currently, it is not clear how to define disjoint databases for such problems.
Learning heuristics from experience
A heuristic function h(n) is supposed to estimate the cost of a solution beginning from the state at node n. How could an agent construct such a function? One solution was given in the preceding section, namely, to devise relaxed problems for which an optimal solution can be found easily. Another solution is to learn from experience. "Experience" here means solving lots of 8-puzzles, for instance. Each optimal solution to an 8-puzzle problem provides examples from which h(n) can be learned. Each example consists of a state from the solution path and the actual cost of the solution from that point. From these examples, an inductive learning algorithm can be used to construct a function h(n) that can (with luck) predict solution costs for other states that arise during search. Techniques for doing just this using neural nets, decision trees, and other methods are demonstrated in Chapter 18. (The reinforcement learning methods described in Chapter 21 are also applicable.)
Inductive learning methods work best when supplied with features of a state that are relevant to its evaluation, rather than with just the raw state description. For example, the feature "number of misplaced tiles" might be helpful in predicting the actual distance of a state from the goal. Let's call this feature x1(n). We could take 100 randomly generated 8-puzzle configurations and gather statistics on their actual solution costs. We might find that when x1(n) is 5, the average solution cost is around 14, and so on. Given these data, the value of x1 can be used to predict h(n). Of course, we can use several features. A second feature x2(n) might be "number of pairs of adjacent tiles that are also adjacent in the goal state." How should x1(n) and x2(n) be combined to predict h(n)? A common approach is to use a linear combination:

    h(n) = c1 x1(n) + c2 x2(n).

The constants c1 and c2 are adjusted to give the best fit to the actual data on solution costs. Presumably, c1 should be positive and c2 should be negative.
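Fitting the constants is an ordinary least-squares problem. A sketch with invented feature/cost numbers, purely for illustration:

    import numpy as np

    # Rows are (x1, x2) feature values for solved instances; costs are
    # the corresponding true solution costs. All numbers are made up.
    features = np.array([[5.0, 2.0], [8.0, 1.0], [3.0, 4.0], [6.0, 2.0]])
    costs = np.array([14.0, 22.0, 9.0, 16.0])

    coeffs, *_ = np.linalg.lstsq(features, costs, rcond=None)
    c1, c2 = coeffs
    h = lambda x1, x2: c1 * x1 + c2 * x2     # learned heuristic estimate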
4.3 LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS
The search algorithms that we have seen so far are designed to explore search spaces systematically. This systematicity is achieved by keeping one or more paths in memory and by recording which alternatives have been explored at each point along the path and which have not. When a goal is found, the path to that goal also constitutes a solution to the problem. In many problems, however, the path to the goal is irrelevant. For example, in the 8-queens problem (see page 66), what matters is the final configuration of queens, not the order in which they are added. This class of problems includes many important applications such as integrated-circuit design, factory-floor layout, job-shop scheduling, automatic programming, telecommunications network optimization, vehicle routing, and portfolio management.
If the path to the goal does not matter, we might consider a different class of algorithms, ones that do not worry about paths at all. Local search algorithms operate using a single current state (rather than multiple paths) and generally move only to neighbors of that state. Typically, the paths followed by the search are not retained. Although local search algorithms are not systematic, they have two key advantages: (1) they use very little memory, usually a constant amount; and (2) they can often find reasonable solutions in large or infinite (continuous) state spaces for which systematic algorithms are unsuitable.
In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function. Many optimization problems do not fit the "standard" search model introduced in Chapter 3. For example, nature provides an objective function, reproductive fitness, that Darwinian evolution could be seen as attempting to optimize, but there is no "goal test" and no "path cost" for this problem.
To understand local search, we will find it very useful to consider the state space landscape (as in Figure 4.10). A landscape has both "location" (defined by the state) and "elevation" (defined by the value of the heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find the lowest valley, a global minimum; if elevation corresponds to an objective function, then the aim is to find the highest peak, a global maximum. (You can convert from one to the other just by inserting a minus sign.) Local search algorithms explore this landscape. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum.
Figure 4.10 A one-dimensional state space landscape in which elevation corresponds to the objective function. The aim is to find the global maximum. Hill-climbing search modifies the current state to try to improve it, as shown by the arrow. The various topographic features are defined in the text.
Hill-climbing search
The hill-climbing search algorithm is shown in Figure 4.11. It is simply a loop that continually moves in the direction of increasing value, that is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher value. The algorithm does not maintain a search tree, so the current node data structure need only record the state and its objective function value. Hill-climbing does not look ahead beyond the immediate neighbors of the current state. This resembles trying to find the top of Mount Everest in a thick fog while suffering from amnesia.

To illustrate hill-climbing, we will use the 8-queens problem introduced on page 66. Local-search algorithms typically use a complete-state formulation, where each state has 8 queens on the board, one per column. The successor function returns all possible states generated by moving a single queen to another square in the same column (so each state has 8 × 7 = 56 successors). The heuristic cost function h is the number of pairs of queens that are attacking each other, either directly or indirectly. The global minimum of this function is zero, which occurs only at perfect solutions. Figure 4.12(a) shows a state with h = 17. The figure also shows the values of all its successors, with the best successors having h = 12. Hill-climbing algorithms typically choose randomly among the set of best successors if there is more than one.
function HILL-CLIMBING(problem) returns a state that is a local maximum
    inputs: problem, a problem
    local variables: current, a node
                     neighbor, a node

    current ← MAKE-NODE(INITIAL-STATE[problem])
    loop do
        neighbor ← a highest-valued successor of current
        if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
        current ← neighbor

Figure 4.11 The hill-climbing search algorithm (steepest-ascent version), which is the most basic local search technique. At each step the current node is replaced by the best neighbor; in this version, that means the neighbor with the highest VALUE, but if a heuristic cost estimate h is used, we would find the neighbor with the lowest h.
Figure 4.12 (a) An 8-queens state with heuristic cost estimate h = 17, showing the value of h for each possible successor obtained by moving a queen within its column. The best moves are marked. (b) A local minimum in the 8-queens state space; the state has h = 1 but every successor has a higher cost.
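A steepest-ascent version for 8-queens fits in a few lines. The sketch below is ours: it minimizes the heuristic cost h (attacking pairs), which is equivalent to maximizing value, and it can of course get stuck, as discussed next.

    import random

    def conflicts(state):
        """h: number of pairs of queens attacking each other;
        state[c] is the row of the queen in column c."""
        n = len(state)
        return sum(1 for a in range(n) for b in range(a + 1, n)
                   if state[a] == state[b] or abs(state[a] - state[b]) == b - a)

    def hill_climb(n=8):
        """Steepest-ascent hill climbing with the complete-state
        formulation: each step moves one queen within its column."""
        current = [random.randrange(n) for _ in range(n)]
        while True:
            best_state, best_h = current, conflicts(current)
            for col in range(n):
                for row in range(n):
                    if row == current[col]:
                        continue
                    neighbor = current[:col] + [row] + current[col + 1:]
                    if conflicts(neighbor) < best_h:
                        best_state, best_h = neighbor, conflicts(neighbor)
            if best_state is current:
                return current, best_h            # stuck if best_h > 0
            current = best_state

    state, h = hill_climb()
    print(state, h)    # h == 0 means a perfect solution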
Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next. Although greed is considered one of the seven deadly sins, it turns out that greedy algorithms often perform quite well. Hill climbing often makes very rapid progress towards a solution, because it is usually quite easy to improve a bad state. For example, from the state in Figure 4.12(a), it takes just five steps to reach the state in Figure 4.12(b), which has h = 1 and is very nearly a solution. Unfortunately, hill climbing often gets stuck for the following reasons:

- Local maxima: a local maximum is a peak that is higher than each of its neighboring states, but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upwards towards the peak, but will then be stuck with nowhere else to go. Figure 4.10 illustrates the problem schematically. More concretely, the state in Figure 4.12(b) is in fact a local maximum (i.e., a local minimum for the cost h); every move of a single queen makes the situation worse.
- Ridges: a ridge is shown in Figure 4.13. Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
- Plateaux: a plateau is an area of the state space landscape where the evaluation function is flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which it is possible to make progress. (See Figure 4.10.) A hill-climbing search might be unable to find its way off the plateau.
In each case, the algorithm reaches a point at which no progress is being made. Starting from a randomly generated 8-queens state, steepest-ascent hill climbing gets stuck 86% of the time, solving only 14% of problem instances. It works quickly, taking just 4 steps on average when it succeeds and 3 when it gets stuck; not bad for a state space with 8^8 ≈ 17 million states.
The algorithm in Figure 4.11 halts if it reaches a plateau where the best successor has the same value as the current state. Might it not be a good idea to keep going, to allow a sideways move in the hope that the plateau is really a shoulder, as shown in Figure 4.10? The answer is usually yes, but we must take care. If we always allow sideways moves when there are no uphill moves, an infinite loop will occur whenever the algorithm reaches a flat local maximum that is not a shoulder. One common solution is to put a limit on the number of consecutive sideways moves allowed. For example, we could allow up to, say, 100 consecutive sideways moves in the 8-queens problem. This raises the percentage of problem instances solved by hill climbing from 14% to 94%. Success comes at a cost: the algorithm averages roughly 21 steps for each successful instance and 64 for each failure.
Many variants of hill climbing have been invented. Stochastic hill climbing chooses at random from among the uphill moves; the probability of selection can vary with the steepness of the uphill move. This usually converges more slowly than steepest ascent, but in some state landscapes it finds better solutions. First-choice hill climbing implements stochastic hill climbing by generating successors randomly until one is generated that is better than the current state. This is a good strategy when a state has many (e.g., thousands) of successors. Exercise 4.16 asks you to investigate.

The hill-climbing algorithms described so far are incomplete: they often fail to find a goal when one exists because they can get stuck on local maxima. Random-restart hill