Adversarial search in Artificial intelligence

adversarial search by evolutionary computation and adversarial inquisitions rethinking the search for the truth pdf free download
HalfoedGibbs Profile Pic
HalfoedGibbs,United Kingdom,Professional
Published Date:02-08-2017
Your Website URL(Optional)
In which we examine the problems that arise when we try to plan ahead in a world where other agents are planning against us. Chapter 2 introduced multiagent environments, in which any given agent will need to con- sider the actions of other agents and how they affect its own welfare. The unpredictability of these other agents can introduce many possible colntingencies into the agent's problem- solving process, as discussed in Chapter 3. The distinction between cooperative and compet- itive multiagent environments was also introduced in Chapter 2. Competitive environments, in which the agents' goals are in conflict, give rise to adversarial search problems-often GAMES known as games. Mathematical game theory, a branch of economics, views any multiagent environment as a game provided that the impact of each agent on the others is "significant," regardless of n whether the agents are cooperative or competitive.' In AI, .'games are usually of a rather specialized kind-what game theorists call deterministic, turn-taking, two-player, zero-sum ZERO-SUM GAMES games of perfect information. In our terminology, this means deterministic, fully observable PERFECT environments in which there are two agents whose actions must alternate and in which the INFORMATION utility values at the end of the game are always equal and opposite. For example, if one player wins a game of chess (+I), the other player necessarily loses (-1). It is this opposition between the agents' utility functions that makes the situation adversarial. We will consider multiplayer games, non-zero-sum games, and stochastic games briefly in this chapter, but will delay discussion of game theory proper until Chapter 17. Games have engaged the intellectual faculties of humans-sometimes to an alarming degree-for as long as civilization has existed. For A1 researchers, the abstract nature of games makes them an appealing subject for study. The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by Environments with very many agents are best viewed as econonies rather than games. 162 Chapter 6. Adversarial Search precise rules. Physical games, such as croquet and ice hockey, have much more complicated descriptions, a much larger range of possible actions, and rather imprecise rules defining the legality of actions. With the exception of robot soccer, these physical games have not attracted much interest in the A1 community. Game playing was one of the first tasks undertaken in AI. By 1950, almost as soon as computers became programmable, chess had been tackled by Konrad Zuse (the inventor of the first programmable computer and the first programming language), by Claude Shannon (the inventor of information theory), by Norbert Wiener (the creator of modern control theory), and by Alan Turing. Since then, there has been steady progress in the standard of play, to the point that machines have surpassed humans in checkers and Othello, have defeated human champions (although not every time) in chess and backgammon, and are competitive in many other games. The main exception is Go, in which computers perform at the amateur level. Games, unlike most of the toy problems studied in Chapter 3, are interesting because they are too hard to solve. For example, chess has an average branching factor of about 35, and games often go to 50 moves by each player, so the search tree has about 35 loo or 10 154 nodes (although the search graph has "only" about 10 40 distinct nodes). Games, like the real world, therefore require the ability to make some decision even when calculating the optimal decision is infeasible. Games also penalize inefficiency severely. Whereas an implementation of A search that is half as efficient will simply cost twice as much to run to completion, a chess program that is half as efficient in using its available time probably will be beaten into the ground, other things being equal. Game-playing research has therefore spawned a number of interesting ideas on how to make the best possible use of time. We begin with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when time is limited. Pruning allows us to ignore portions of the search tree that make no difference to the final choice, and heuristic evaluation functions allow us to approximate the true utility of a state without doing a com- plete search. Section 6.5 discusses games such as backgammon that include an element of IMPERFECT chance; we also discuss bridge, which includes elements of imperfect information because INFORMATION not all cards are visible to each player. Finally, we look at how state-of-the-art game-playing programs fare against human opposition and at directions for future developments. We will consider games with two players, whom we will call MAX and MIN for reasons that will soon become obvious. MAX moves first, and then they take turns moving until the game is over. At the end of the game, points are awarded to the winning player and penalties are given to the loser. A game can be formally defined as a kind of search problem with the following components: The initial state, which includes the board position and identifies the player to move. A successor function, which returns a list of (move, state) pairs, each indicating a legal move and the resulting state. Section 6.2. Optimal Decisions in Games 163 TERMINALTEST a A terminal test, which determines when the galme is over. States where the game has ended are called terminal states. a A utility function (also called an objective function or payoff function), which gives a numeric value for the terminal states. In chess, the outcome is a win, loss, or draw, with values +1, -1, or 0. Some games have a wider ,variety of possible outcomes; the payoffs in backgammon range from +I92 to -192. This chapter deals mainly with zero-sum games, although we will briefly mention non-zero-sum games. GAME TREE The initial state and the legal moves for each side define the game tree for the game. Fig- ure 6.1 shows part of the game tree for tic-tac-toe (noughts and crosses). From the initial state, MAX has nine possible moves. Play alternates between MAX'S placing an x and MIN'S placing an o until we reach leaf nodes corresponding to ter.mina1 states such that one player has three in a row or all the squares are filled. The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX; high values are assumed to be good for MAX and bad for MIN (which is how the players get their names). It is MAX'S job to use the search tree (particularly the utility of terminal states) to determine the best move. Optimal strategies In a normal search problem, the optimal solution would be a sequence of moves leading to a goal state-a terminal state that is a win. In a game, on the other hand, MIN has something STRATEGY to say about it. MAX therefore must find a contingent strategy, which specifies MAX'S move in the initial state, then MAX'S moves in the states reslting from every possible response by MIN, then MAX'S moves in the states resulting from every possible response by MIN to those moves, and so on. Roughly speaking, an optimal strategy leads to outcomes at least as good as any other strategy when one is playing an infallible oppoinent. We will begin by showing how to find this optimal strategy, even though it should be infeasible for MAX to compute it for games more complex than tic-tac-toe. Even a simple game like tic-tac-toe is too complex for us to draw the entire game tree, so we will switch to the trivial game in Figure 6.2. The possible moves for MAX at the root node are labeled al, a2, and as. The possible replies to a1 for MIN are bl, ba, bs, and so on. This particular game ends after one move each by MAX and MIN. (In game parlance, we say PLY that this tree is one move deep, consisting of two half-moves, each of which is called a ply.) The utilities of the terminal states in this game range from 2 lo 14. Given a game tree, the optimal strategy can be determined by examining the minimax MINIMAXVALUE value of each node, which we write as MINIMAX- VALUE(). The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX will prefer to move to a state of maximum value, whereas MIN prefers a state of minirnum value. So we have the following: MINIMAX- VALUE() = UTILITY() if n is a terminal state maxsEsuccessors(n) MINIMAX-VALUE(S) if n is a MAX node rninsESuccessors(n) MINIMAX-VALUE(S) if n is a MIN node. 164 Chapter 6. Adversarial Search - - TERMINAL Utility - 1 0 +1 Figure 6.1 A (partial) search tree for the game of tic-tac-toe. The top node is the initial ( state, and MAX moves first, placing an X in an empty square. We show part of the search tree, giving alternating moves by MIN (0) and MAX, until we eventually reach terminal states, which can be assigned utilities according to the rules of the game. I MAX MLN 3 12 8 246 14 5 2 Figure 6.2 A two-ply game tree. The A nodes are "MAX nodes," in which it is MAX'S turn to move, and the D nodes are "MIN nodes." The terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values. MAX'S best move at the root is al, because it leads to the successor with the highest minimax value, and MIN'S best reply is bl, because it leads to the successor with the lowest minimax value. Let us apply these definitions to the game tree in Figure 6.2. The terminal nodes on the bottom level are already labeled with their utility values. The first MIN node, labeled B, has three successors with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successors have minimax Section 6.2. Optimal Decisions in Games MINIMAX DECISION values 3, 2, and 2; so it has a minimax value of 3. We can also identify the minimax decision at the root: action a1 is the optimal choice for MAX because it leads to the successor with the highest minimax value. This definition of optimal play for MAX assumes that MIN also plays optimally-it maximizes the worst-case outcome for MAX. What if lH1N does not play optimally? Then it is easy to show (Exercise 6.2) that MAX will do even better. There may be other strategies against suboptimal opponents that do better than the minimax strategy; but these strategies necessarily do worse against optimal opponents. The minimax algorithm MINIMAX ALGORITHM The minimax algorithm (Figure 6.3) computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceecls all the way down to the leaves BACKED UP of the tree, and then the minimax values are backed up through the tree as the recursion unwinds. For example, in Figure 6.2, the algorithm first recurses down to the three bottom- left nodes, and uses the UTILITY function on them to discover that their values are 3, 12, and 8 respectively. Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B. A similar process gives the backed up values of 2 for C and 2 for D. Finally, we take the maximum of 3,2, and 2 to get the backed-up value of 3 for the root node. The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m, and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b m). The space complexity is O(bm) for an algorithm that generates all successors at once, or O(m) for an algorithm that generates successors one at a time (see page 76). For real games, of course, the time cost is totally impractical, but this algorithm serves as the basis for the mathematical analysis of games and for more practical algorithms. Optimal decisions in multiplayer games Many popular games allow more than two players. Let us examine how to extend the minimax idea to multiplayer games. This is straightforward from the technical viewpoint, but raises some interesting new conceptual issues. First, we need to replace the single value for each nocle with a vector of values. For example, in a three-player game with players A, B, and C, a vector (vA, v, vc) is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. (In two-player, zero-sum games, the two-element kector can be reduced to a single value because the values are always opposite.) The simplest way to implement this is to have the UTILITY function return a vector of utilities. Now we have to consider nonterminal states. Consider the node marked X in the game tree shown in Figure 6.4. In that state, player C chooses what to do. The two choices lead to terminal states with utility vectors (21A = 1, vg = 2, v = 6) and (vA = 4, va = 2, vc = 3). Since 6 is bigger than 3, C should choose the first move. This rneans that if state X is reached, subsequent play will lead to a terminal state with utilities (vll = 1, v = 2, vc = 6). Hence, 166 Chapter 6. Adversarial Search function MINIMAX-DECISION() returns an action inputs: state, current state in game v +- MAX-VALUE(SU) return the action in SUCCESSORS() with value v function MAx-VLlJE(state) returns a utility value if TERMINAL-TEST() then return UTILITY(SU) v + -00 for a, s in Succssos(state) do v t MAX(V, MIN-VALUE(S)) return v function MIN-ALUE(S) returns a utility value if TERMINAL-TEST() then return UTILITY(SU) V+OO for a, s in Succssos(state) do v t MIN(V, MAX-VALUE(S)) return v An algorithm for calculating minimax decisions. It returns the action corre- Figure 6.3 sponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state. to move A (1,2,6) (4,2,3) (6, 1,2) (7,4, 1) 5 1 1 (1,5,2) (7,7, 1) (5,4,5) The first three ply of a game tree with three players (A, B, C). Each node is Figure 6.4 labeled with values from the viewpoint of each player. The best move is marked at the root. the backed-up value of X is this vector. In general, the backed-up value of a node n is the utility vector of whichever successor has the highest value for the player choosing at n. Anyone who plays multiplayer games, such as ilomac, quickly becomes aware that there is a lot more going on than in two-player games. Multiplayer games usually involve ALLIANCES alliances, whether formal or informal, among the players. Alliances are made and broken Section 6.3. Alpha-Beta Pruning 167 as the game proceeds. How are we to understand such behavior? Are alliances a natural consequence of optimal strategies for each player in a multiplayer game? It turns out that they can be. For example suppose A and B are in weak positions and C is in a stronger position. Then it is often optimal for both A and B to attack C rather than each other, lest C destroy each of them individually. In this way, colKaboration emerges from purely selfish behavior. Of course, as soon as C weakens under the joint onslaught, the alliance loses its value, and either A or B could violate the agreement. In some cases, explicit alliances merely make concrete what would have happened anyway. In other cases there is a social stigma to breaking an alliance, so players must balance the immediate advantage of breaking an alliance against the long-term disadvantage of being perceived as untrustworthy. See Section 17.6 for more on these complications. If the game is not zero-sum, then collaboration can also occur with just two players. Suppose, for example, that there is a terminal state with utilities (vA = 1000, v = 1000), and that 1000 is the highest possible utility for each player. Then the optimal strategy is for both players to do everything possible to reach this statethat is, the players will automatically cooperate to achieve a mutually desirable goal. The problem with minimax search is that the numbeir of game states it has to examine is exponential in the number of moves. Unfortunately we can't eliminate the exponent, but we can effectively cut it in half. The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. That is, we can borrow the idea of pruning from Chapter 4 in order to eliminate large parts of the tree from consideration. ALPHA-BETA The particular technique we will examine is called alpha-beta pruning. When applied to a PRUNING standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision. Consider again the two-ply game tree from Figure 6.2. Let's go through the calculation of the optimal decision once more, this time paying careful attention to what we know at each point in the process. The steps are explained in Figure 6.5. The outcome is that we can identify the minimax decision without ever evaluating two of the leaf nodes. Another way to look at this is as a simplification of the formula for MINIMAX-VALUE. Let the two unevaluated successors of node C in Figure 6.5 have values x and y and let z be the minimum of x and y. The value of the root node is given by MINIMAX-VALUE(OO) = max(min(3,12,8), min(2, s, y), min(l4,5,2)) = max(3, min(2, x, y), 2) = max(3, z, 2) where ;s 5 2 - - 3. In other words, the value of the root and hence the minimax decision are independent of the values of the pruned leaves x and y. 168 Chapter 6. Adversarial Search - - Figure 6.5 Stages in the calculation of the optimal decision for the game tree in Figure 6.2. At each point, we show the range of possible values for each node. (a) The first leaf below B has the value 3. Hence, B, which is a MIN node, has a value of at most 3. (b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3. (c) The third leaf below B has a value of 8; we have seen all B's successors, so the value of B is exactly 3. Now, we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at the root. (d) The first leaf below C has the value 2. Hence, C, which is a MIN node, has a value of at most 2. But we know that B is worth 3, so MAX would never choose C. Therefore, there is no point in loohng at the other successors of C. This is an example of alpha-beta pruning. (e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX'S best alternative (i.e., 3), so we need to keep exploring D's successors. Notice also that we now have bounds on all of the successors of the root, so the root's value is also at most 14. (f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is worth exactly 2. MAX'S decision at the root is to move to B, giving a value of 3. Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves. The general principle is this: consider a node n somewhere in the tree (see Figure 6.6), such that Player has a choice of moving to that node. If Player has a better choice m either at the parent node of n or at any choice point further up, n will never be reached in actual play. So once we have found out enough about n (by then examining some of its descendants) to reach this conclusion, we can prune it. Remember that minimax search is depth-first, so at any one time we just have to con- sider the nodes along a single path in the tree. Alpha-beta pruning gets its name from the Section 6.3. Alpha-Beta Pruning 169 Figure 6.6 Alpha-beta pruning: the general case. If m is lbetter than n for Player, we will never get to n in play. following two parameters that describe bounds on the backed. -up values that appear anywhere along the path: a = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX. p = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN. Alpha-beta search updates the values of a and /3 as it goes along and prunes the remaining branches at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current a or p value for MAX or MIN, respectively. The complete algorithm is given in Figure 6.7. We encourage the reader to trace its behavior when applied to the tree in Figure 6.5. The effectiveness of alpha-beta pruning is highly dependent on the order in which the successors are examined. For example, in Figure 6.5(e) and (0, we could not prune any successors of D at all because the worst successors (from the point of view of MIN) were generated first. If the third successor had been generated first, we would have been able to prune the other two. This suggests that it might be worthwhile to try to examine first the successors that are likely to be best. If we assume that this can be done,2 then it turns out that alpha-beta needs to examine only 0(bmi2) nodes to pick the best move, instead of O(bm) for minimax. This means that the effective branching factor becomes & instead of bfor chess, 6 instead of 35. Put another way, alpha-beta can look ahead roughly twice as far as minimax in the same amount of time. If successors are examined in random order rather than best-first, the total number of nodes examined will be roughly (b/) for moderate b. For chess, a fairly simple ordering function (such as trying captures first, then threats, then forward moves, and then backward moves) gets you to within about a factor of 2 of the 0(bml2) result. Adding dynamic Obviously, it cannot be done perfectly; otherwise the ordering function could be used to play a perfect game 170 Chapter 6. Adversarial Search function ALPHA-BETA-SEARCH() returns an action inputs: state, current state in game ' u +- MAX-ALUE(S, -a, f oo) return the action in SUCCESSORS(state) with value v function MAX-VALUE(, a, /3) returns a utility value inputs: state, current state in game a, the value of the best alternative for MA X along the path to state p, the value of the best alternative for MIN along the path to state if TERMINAL-TEST() then return UTIlT(state) vt-00 for a, s in SUCCESSORS() do u +- MAX(U, MIN-VALUE(S, a, P)) if u 2 /3 then return v a t MAx(a, v) return u function MIN-ALUE(S, a, p) returns a utility value inputs: state, current state in game a, the value of the best alternative for MAX along the path to state p, the value of the best alternative for MIN along the path to state if TERMINAL-TEST() then return uILI(state) v+-+cc for a, s in SUCCESSOR() do v t MIN(U, MAX-VALUE(% a, P)) if v 5 a: then return v P +- MIN(P, u) return v The alpha-beta search algorithm. Notice that these routines are the same as Figure 6.7 the MINIMAX routines in Figure 6.3, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain a and /3 (and the bookkeeping to pass these parameters along). -ordering schemes, such as trying first the moves that were found to be best last time, move brings us quite close to the theoretical limit. In Chapter 3, we noted that repeated states in the search tree can cause an exponential increase in search cost. In games, repeated states occur frequently because of transposi- TRANSPOSITIONS tions-different permutations of the move sequence that end up in the same position. For example, if White has one move a1 that can be answered by Black with bl and an unre- lated move a2 on the other side of the board that can be answered by b2, then the sequences al, bl , a2 , b2 and al , b2, az, bl both end up in the same position (as do the permutations beginning with a2). It is worthwhile to store the evaluation of this position in a hash table the first time it is encountered, so that we don't have to recompute it on subsequent occurrences. Section 6.4. Imperfect, Real-Time Decisions 17 1 T TRANsPoslTloN ABLE The hash table of previously seen positions is traditionally (called a transposition table; it is essentially identical to the closed list in GRAPH-SEARCH (page 83). Using a transposition table can have a dramatic effect, sometimes as much as doubling the reachable search depth in chess. On the other hand, if we are evaluating a million nodes per second, it is not practical to keep all of them in the transposition table. Various strategies have been used to choose the most valuable ones. -beta algo- The minimax algorithm generates the entire game search space, whereas the alpha rithm allows us to prune large parts of it. However, alpha-beta still has to search all the way to terminal states for at least a portion of the search space. This depth is usually not practical, because moves must be made in a reasonable amount of tiime-typically a few minutes at most. Shannon's 1950 paper, Programming a computer for playing chess, proposed instead that programs should cut off the search earlier and apply a heuristic evaluation function to states in the search, effectively turning nonterminal nodes into terminal leaves. In other words, the suggestion is to alter minimax or alpha-beta in two ways: the utility function is replaced by a heuristic evaluation function EVAL, which gives an estimate of the position's CUTOFF TEST utility, and the terminal test is replaced by a cutoff test that decides when to apply EVAL. Evaluation functions An evaluation function returns an estimate of the expected utility of the game from a given position, just as the heuristic functions of Chapter 4 return an estimate of the distance to Shannon proposed it. For centuries, the goal. The idea of an estimator was not new when chess players (and aficionados of other games) have developed ways of judging the value of a position, because humans are even more limited in the amount of search they can do than are computer programs. It should be clear that the performance of a game-playing program is dependent on the quality of its evaluation function. An inaccurate evaluation function will guide an agent toward positions that turn out to be Lost. How exactly do we design good evaluation functions? First, the evaluation function should order the terminal states in the same way as the true utility function; otherwise, an agent using it miglht select suboptimal moves even if it can see ahead all the way to the end of the game. Second, the computation must not take too long (The evaluation function could call MINIMAX-DECISION as a subroutine and calculate the exact value of the position, but that would defeat the whole purpose: to save time.) Third, for nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning. One might well wonder about the phrase "chances of winning." After all, chess is not a game of chance: we know the current state with certainty, and there are no dice involved. But if the search must be cut off at nonterminal states, then the algorithm will necessarily be uncertain about the final outcomes of those states. This type of uncertainty is induced by 172 Chapter 6. Adversarial Search computational, rather than informational, limitations. Given the limited amount of computa- tion that the evaluation function is allowed to do for a given state, the best it can do is make a guess about the final outcome. Let us make this idea more concrete. Most evaluation functions work by calculating FEATURES various features of the state-for example, the number of pawns possessed by each side in a game of chess. The features, taken together, define various categories or equivalence classes of states: the states in each category have the same values for all the features. Any given category, generally speaking, will contain some states that lead to wins, some that lead to draws, and some that lead to losses. The evaluation function cannot know which states are which, but it can return a single value that reflects the proportion of states with each outcome. For example, suppose our experience suggests that 72% of the states encountered in the category lead to a win (utility +I); 20% to a loss (-I), and 8% to a draw (0). Then a EXPECTEDVALUE reasonable evaluation for states in the category is the weighted average or expected value: (0.72 x +1) + (0.20 x -1) + (0.08 x 0) = 0.52. In principle, the expected value can be determined for each category, resulting in an evaluation function that works for any state. As with terminal states, the evaluation function need not return actual expected values, as long as the ordering of the states is the same. In practice, this kind of analysis requires too many categories and hence too much experience to estimate all the probabilities of winning. Instead, most evaluation functions compute separate numerical contributions from each feature and then combine them to find MATERIALVALUE the total value. For example, introductory chess books give an approximate material value for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9. Other features such as "good pawn structure" and "king safety" might be worth half a pawn, say. These feature values are then simply added up to obtain the evaluation of the position. A secure advantage equivalent to a pawn gives a substantial likelihood of winning, and a secure advantage equivalent to three pawns should give almost certain victory, as illustrated in Figure 6.8(a). Mathematically, this kind of evaluation function is called a weighted linear function, because it can be expressed as FUNCTION n EVAL(S) = WII(S) + WZZ(S) + . . . + wnfn(s) = Ew2f,(s) , 2=1 where each w, is a weight and each f, is a feature of the position. For chess, the f, could be the numbers of each kind of piece on the board, and the w, could be the values of the pieces (1 for pawn, 3 for bishop, etc.). Adding up the values of features seems like a reasonable thing to do, but in fact it involves a very strong assumption: that the contribution of each feature is independent of the values of the other features. For example, assigning the value 3 to a bishop ignores the fact that bishops are more powerful in the endgame, when they have a lot of space to maneuver. For this reason, current programs for chess and other games also use nonlinear combinations of features. For example, a pair of bishops might be worth slightly more than twice the value of a single bishop, and a bishop is worth more in the endgame than at the beginning. The astute reader will have noticed that the features and weights are not part of the rules of chess They come from centuries of human chess -playing experience. Given the Section 6.4. Imperfect, Real-Time Decisions 173 (a) White to move (b) White to move Two slightly different chess positions. In (a), black has an advantage of a Figure 6.8 knight and two pawns and will win the game. In (b), black will lose after white captures the linear form of the evaluation, the features and weights result in the best approximation to the true ordering of states by value. In particular, experience suggests that a secure material advantage of more than one point will probably win the game, all other things being equal; a three-point advantage is sufficient for near-certain victory. In games where this kind of experience is not available, the weights of the evaluation function can be estimated by the machine learning techniques of Chapter 18. Reassuringly, applying these techniques to chess has confirmed that a bishop is indeed worth about three pawns. Cutting off search The next step is to modify ALPHA-BETA-SEARCH so that it will call the heuristic EVAL function when it is appropriate to cut off the search. In terms of implementation, we replace the two lines in Figure 6.7 that mention TERMINAL-TIEST with the following line: if cu0FF-TEST(state, depth) then return EVA,L(state) We also must arrange for some bookkeeping so that the current depth is incremented on each recursive call. The most straightforward approach to controlling the amount of search is to set a fixed depth limit, so that CuolT-Ts(state, depth) returns true for all depth greater than some fixed depth d. (It must also return true for all terminal states, just as TERMINAL-TEST did.) The depth d is chosen so that the amount of time used will not exceed what the rules of the game allow. A more robust approach is to apply iterative deepening, as defined in Chapter 3. When time runs out, the program returns the move selected by the deepest completed search. How- ever, these approaches can lead to errors due to the approximate nature of the evaluation function. Consider again the simple evaluation function for chess based on material advan- tage. Suppose the program searches to the depth limit, reaching the position in Figure 6.8(b), 174 Chapter 6. Adversarial Search where Black is ahead by a knight and two pawns. It would report this as the heuristic value of the state, thereby declaring that the state will likely lead to a win by Black. But White's next move captures Black's queen with no compensation. Hence, the position is really won for White, but this can be seen only by looking ahead one more ply. Obviously, a more sophisticated cutoff test is needed. The evaluation function should be QUIESCENCE applied only to positions that are quiescent-that is, unlikely to exhibit wild swings in value in the near future. In chess, for example, positions in which favorable captures can be made are not quiescent for an evaluation function that just counts material. Nonquiescent positions can be expanded further until quiescent positions are reached. This extra search is called a QUIESCENCE quiescence search; sometimes it is restricted to consider only certain types of moves, such SEARCH as capture moves, that will quickly resolve the uncertainties in the position. HORIZON EFFECT The horizon effect is more difficult to eliminate. It arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable. Consider the chess game in Figure 6.9. Black is ahead in material, but if White can advance its pawn from the seventh row to the eighth, the pawn will become a queen and create an easy win for White. Black can forestall this outcome for 14 ply by checking White with the rook, but inevitably the pawn will become a queen. The problem with fixed-depth search is that it believes that these stalling moves have avoided the queening move-we say that the stalling moves push the inevitable queening move "over the search horizon" to a place where it cannot be detected. As hardware improvements lead to deeper searches, one expects that the horizon effect will occur less frequently-very long delaying sequences are quite rare. The use of singular SINGULAR extensions has also been quite effective in avoiding the horizon effect without adding too EXTENSIONS much search cost. A singular extension is a move that is "clearly better" than all other moves in a given position. A singular-extension search can go beyond the normal depth limit without incurring much cost because its branching factor is 1. (Quiescence search can be thought of as a variant of singular extensions.) In Figure 6.9, a singular extension search will find the checking moves and white's king moves can eventual queening move, provided that black's be identified as "clearly better" than the alternatives. So far we have talked about cutting off search at a certain level and about doing alpha- beta pruning that provably has no effect on the result. It is also possible to do forward pruning, meaning that some moves at a given node are pruned immediately without further FORWARD PRUNING consideration. Clearly, most humans playing chess only consider a few moves from each position (at least consciously). Unfortunately, the approach is rather dangerous because there is no guarantee that the best move will not be pruned away. This can be disastrous if applied near the root, because every so often the program will miss some "obvious" moves. Forward pruning can be used safely in special situations-for example, when two moves are symmetric or otherwise equivalent, only one of them need be considered-or for nodes that are deep in the search tree. Combining all the techniques described here results in a program that can play cred- itable chess (or other games). Let us assume we have implemented an evaluation function for chess, a reasonable cutoff test with a quiescence search, and a large transposition table. Let us also assume that, after months of tedious bit-bashing, we can generate and evaluate Section 6.5. Games That Include an Element of Chance 175 Black to move Figure 6.9 The horizon effect. A series of checks by the black rook forces the inevitable queening move by white "over the horizon" and makes this position look like a win for black, when it is really a win for white. around a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes per move under standard time controls (three minutes per move). The branching factor for chess is about 35, on average, and 355 is about 50 imillion, so if we used minimax search we could look ahead only about five plies. Though not incompetent, such a program can be fooled easily by an average human chess player, who can occasionally plan six or eight plies ahead. With alpha-beta search we get to about 10 ply, which results in an expert level of play. Section 6.7 describes additional pruning techniques that can extend the effective search depth to roughly 14 plies. To reach grandmaster status we would need an extensively tuned evaluation function and a large database of optimal opening and endgame moves. It wouldn't hurt to have a supercomputer to run the program on. In real life, there are many unpredictable external events that put us into unforeseen situations. Many games mirror this unpredictability by including a random element, such as the throwing of dice. In this way, they take us a step nearer reality, and it is worthwhile to see how this affects the decision-making process. Backgammon is a typical game that combines luck and skill. Dice are rolled at the beginning of a player's turn to determine the legal moves. In the backgammon position of Figure 6.10, for example, white has rolled a 6-5, and has four possible moves. Although White knows what his or her own legal moves are, White does not know what Black is going to roll and thus does not know what Black's legal moves will be. That means White cannot construct a standard game tree of the sort we saw in chess and tic-tac-toe. A Chapter 6. Adversarial Search 176 0 25 A typical backgammon position. The goal of the game is to move all one's Figure 6.10 pieces off the board. White moves clockwise toward 25, and black moves counterclockwise toward 0. A piece can move to any position unless there are multiple opponent pieces there; if there is one opponent, it is captured and must start over. In the position shown, White has rolled 6- 5 and must choose among four legal moves: (5-10,5-1 I), (5-11,19-24), (5-10,1- 16), and (5-11,ll-16). TERMINAL 2 -I 1 -1 1 Schematic game tree for a backgammon position. Figure 6.11 Section 6.5. Games That Include an Element of Chance 177 CHANCENODES game tree in backgammon must include chance nodes in addition to MAX and MIN nodes. Chance nodes are shown as circles in Figure 6.11. The branches leading from each chance node denote the possible dice rolls, and each is labeled with the roll and the chance that it will occur. There are 36 ways to roll two dice, each equally likely; but because a 6-5 is the same as a 5-6, there are only 21 distinct rolls. The six doubles (1-1 through 6-6) have a 1/36 chance of coming up, the other 15 distinct rolls a 1/18 chance each. The next step is to understand how to make correct decisions. Obviously, we still want to pick the move that leads to the best position. However, the resulting positions do not have definite minimax values. Instead, we can only calculate the expected value, where the expectation is taken over all the possible dice rolls that could occur. This leads us to generalize V EXPECTIMINIMAX ALUE the minimax value for deterministic games to an expectiminimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before; chance nodes are evaluated by taking the weighted average of the values resulting from all possible dice rolls, that is, EXPECTIMINIMAX() = UTILITY() if n is a terminal state if n is a MAX node ma() EXPECTIMINIMAX(S) if n is a MIN node minsuccessors(n) EXPECTIMINIMAX(S) sESuccessors(n) P(s) . EXPECTIMINIMA.I:(S) if n is a chance node where the successor function for a chance node n simply augments the state of n with each possible dice roll to produce each successor s and P(s) is the probability that that dice roll occurs. These equations can be backed up recursively all the way to the root of the tree, just as in minimax. We the details of the algorithm as an exercise. Position evaluation in games with chance nodes As with minimax, the obvious approximation to make with expectiminimax is to cut the search off at some point and apply an evaluation function to each leaf. One might think that evaluation functions for games such as backgammon slhould be just like evaluation functions for chess-they just need to give higher scores to better positions. But in fact, the presence of chance nodes means that one has to be more careful about what the evaluation values mean. Figure 6.12 shows what happens: with an evaluation functilon that assigns values I, 2, 3, 41 to the leaves, move A1 is best; with values I, 210, 30, 4001, move A2 is best. Hence, the program behaves totally differently if we make a change in the scale of some evaluation values It turns out that, to avoid this sensitivity, the evaluaiion function must be a positive linear transformation of the probability of winning from a polsition (or, more generally, of the expected utility of the position). This is an important and general property of situations in which uncertainty is involved, and we discuss it further in Chapter 16. Complexity of expectiminimax If the program knew in advance all the dice rolls that would occur for the rest of the game, solving a game with dice would be just like solving a game without dice, which minimax 178 Chapter 6. Adversarial Search MAX CHANCE MIN Figure 6.12 An order-preserving transformation on leaf values changes the best move. 1 m does in O(b ) time. Because expectiminimax is also considering all the possible dice-roll m m sequences, it will take O(b n ), where n is the number of distinct rolls. Even if the search depth is limited to some small depth d, the extra cost compared with that of minimax makes it unrealistic to consider looking ahead very far in most games of chance. In backgammon n is 21 and b is usually around 20, but in some situations can be as high as 4000 for dice rolls that are doubles. Three plies is probably all we could manage. -beta is that Another way to think about the problem is this: the advantage of alpha it ignores future developments that just are not going to happen, given best play. Thus, it concentrates on likely occurrences. In games with dice, there are no likely sequences of moves, because for those moves to take place, the dice would first have to come out the right way to make them legal. This is a general problem whenever uncertainty enters the picture: the possibilities are multiplied enormously, and forming detailed plans of action becomes pointless, because the world probably will not play along. No doubt it will have occurred to the reader that perhaps something like alpha-beta pruning could be applied to game trees with chance nodes. It turns out that it can. The analysis for MIN and MAX nodes is unchanged, but we can also prune chance nodes, using a bit of ingenuity. Consider the chance node C in Figure 6.1 1 and what happens to its value as we examine and evaluate its children. Is it possible to find an upper bound on the value of C before we have looked at all its children? (Recall that this is what alpha-beta needs to prune a node and its subtree.) At first sight, it might seem impossible, because the value of C is the average of its children's values. Until we have looked at all the dice rolls, this average could be anything, because the unexamined children might have any value at all. But if we put bounds on the possible values of the utility function, then we can arrive at bounds for the average. For example, if we say that all utility values are between +3 and -3, then the value of leaf nodes is bounded, and in turn we can place an upper bound on the value of a chance node without looking at all its children. Section 6.5. Games That lnclude an Element of Chance 179 Card games Card games are interesting for many reasons besides tlheir connection with gambling. Among the huge variety of games, we will focus on those in which cards are dealt randomly at the beginning of the game, with each player receiving a hand of cards that is not visible to the other players. Such games include bridge, whist, heats, and some forms of poker. At first sight, it might seem that card games are just like dice games: the cards are dealt randomly and determine the moves available to each player, but all the dice are rolled at the beginning We will pursue this observation further. It will turn out to be quite useful in practice. It is also quite wrong, for interesting reasons. Imagine two players, MAX and MIN, playing some practice hands of four-card two handed bridge with all the cards showing. The hands are as follows, with MAX to play first: MAX: 06 06 498 MIN: v4 42 4,105. Suppose that MAX leads the 4 9. MIN must now follow suit, playing either the 4 10 or the 4 5. MIN plays the 4 10 and wins the trick. MIN goes nexl. and leads the ) 2. MAX has no spades (and so cannot win the trick) and therefore must throw away some card. The obvious choice is the 0 6 because the other two remaining cards are winners. Now, whichever card MIN leads for the next trick, MAX will win both remaining tricks and the game will be tied at two tricks each. It is easy to show, using a suitable vaviant of minimax (Exercise 6.12), that MAX'S lead of the 4 9 is in fact an optimal choice. Now let's modify MIN'S hand, replacing the V 4 with the 0 4: MAX: 0606498 MIN: 04424105, The two cases are entirely symmetric: play will be identical, except that on the second trick MAX will throw away the V 6. Again, the game will be tied at two tricks each and the lead of the 4 9 is an optimal choice. So far, so good. Now let's hide one of MIN's cards: MAX knows that MIN has either the first hand (with the V 4) or the second hand (with the 0 4), but has no idea which. MAX reasons as follows: The ) 9 is an optimal choice against MIN'S first hand and against MIN'S second hand, so it must be optimal now because I know that MIN has one of the two hands. More generally, MAX is using what we might call "averaging over clairvoyancy." The idea is to evaluate a given course of action when there are unseen cards by first computing the minimax value of that action for each possible deal of the cards, and then computing the expected value over all deals using the probability of each deal. If you think ths is reasonable (or if you have no idea because you don't understand bridge), consider the following story: Day 1: Road A leads to a heap of gold pieces; Road B leads to a fork. Take the left fork and you'll find a mound of jewels, but take the right fork and you'll be run over by a bus. Day 2: Road A leads to a heap of gold pieces; Road B leads to a fork. Take the right fork and you'll find a mound of jewels, but take the left fork and you'll be run over by a bus. Day 3: Road A leads to a heap of gold pieces; Road B leads to a fork. Guess correctly and you'll find a mound of jewels, but guess incorrectly and you'll be run over by a bus. 180 Chapter 6. Adversarial Search Obviously, it's not unreasonable to take Road B on the first two days. No sane person, though, would take Road B on Day 3. Yet this is exactly what averaging over clairvoyancy suggests: Road B is optimal in the situations of Day 1 and Day 2; therefore it is optimal on Day 3, because one of the two previous situations must hold. Let us return to the card game: after MAX leads the ) 9, MIN wins with the 4 10. As before, MIN leads the 4 2, and now MAX is at the fork in the road without any instructions. If MAX throws away the V 6 and MIN still has the V 4, the 4 becomes a winner and MAX loses the game. Similarly, If MAX throws away the 0 6 and MIN still has the 0 4, MAX also loses. Therefore, playing the 4 9 first leads to a situation where MAX has a 50% chance of losing. (It would be much better to play the V 6 and the 0 6 first, guaranteeing a tied game.) The lesson to be drawn from all this is that when information is missing, one must consider what informatiorz one will have at each point in the game. The problem with MAX'S algorithm is that it assumes that in each possible deal, play will proceed as if all the cards are visible. As our example shows, this leads MAX to act as if all future uncertainty will be resolved when the time comes. MAX'S algorithm will also never decide to gather information (or provide information to a partner), because within each deal there's no need to do so; yet in games such as bridge, it is often a good idea to play a card that will help one discover things about one's opponent's cards or that will tell one's partner about one's own cards. These kinds of behaviors are generated automatically by an optimal algorithm for games of imperfect information. Such an algorithm searches not in the space of world states (hands of cards), but in the space of belief states (beliefs about who has which cards, with what probabilities). We will be able to explain the algorithm properly in Chapter 17, once we have developed the necessary probabilistic machinery. In that chapter, we will also expand on one final and very important point: in games of imperfect information, it's best to give away as little information to the opponent as possible, and often the best way to do this is to act unpredictably. This is why restaurant hygiene inspectors do random inspection visits. One might say that game playing is to A1 as Grand Prix motor racing is to the car indus- try: state-of-the-art game programs are blindingly fast, incredibly well-tuned machines that incorporate very advanced engineering techniques, but they aren't much use for doing the shopping. Although some researchers believe that game playing is somewhat irrelevant to mainstream AI, it continues to generate both excitement and a steady stream of innovations that have been adopted by the wider community. Chess: In 1957, Herbert Simon predicted that within 10 years computers would beat the CHESS human world champion. Forty years later, the Deep Blue program defeated Garry Kasparov in a six-game exhibition match. Simon was wrong, but only by a factor of 4. Kasparov wrote: The decisive game of the match was Game 2, which left a scar in my memory . . . we saw something that went well beyond our wildest expectations of how well a computer would be able to foresee the long-term positional consequences of its decisions. The machine

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.