Decision theory lecture notes

HalfoedGibbs, United Kingdom, Professional
Published: 02-08-2017
16 MAKING SIMPLE DECISIONS

In which we see how an agent should make decisions so that it gets what it wants - on average, at least.

In this chapter, we return to the idea of utility theory that was introduced in Chapter 13 and show how it is combined with probability theory to yield a decision-theoretic agent: an agent that can make rational decisions based on what it believes and what it wants. Such an agent can make decisions in contexts where uncertainty and conflicting goals leave a logical agent with no way to decide. In effect, a goal-based agent has a binary distinction between good (goal) and bad (non-goal) states, while a decision-theoretic agent has a continuous measure of state quality.

Section 16.1 introduces the basic principle of decision theory: the maximization of expected utility. Section 16.2 shows that the behavior of any rational agent can be captured by supposing a utility function that is being maximized. Section 16.3 discusses the nature of utility functions in more detail, and in particular their relation to individual quantities such as money. Section 16.4 shows how to handle utility functions that depend on several quantities. In Section 16.5, we describe the implementation of decision-making systems. In particular, we introduce a formalism called decision networks (also known as influence diagrams) that extends Bayesian networks by incorporating actions and utilities. The remainder of the chapter discusses issues that arise in applications of decision theory to expert systems.

16.1 COMBINING BELIEFS AND DESIRES UNDER UNCERTAINTY

In the Port-Royal Logic, written in 1662, the French philosopher Arnauld stated:

To judge what one must do to obtain a good or avoid an evil, it is necessary to consider not only the good and the evil in itself, but also the probability that it happens or does not happen; and to view geometrically the proportion that all these things have together.
Modern texts talk of utility rather than good and evil, but the principle is exactly the same. An agent's preferences between world states are captured by a utility function, which assigns a single number to express the desirability of a state. Utilities are combined with the outcome probabilities of actions to give an expected utility for each action.

We will use the notation U(S) to denote the utility of state S according to the agent that is making the decisions. For now, we will consider states as complete snapshots of the world, similar to the situations of Chapter 10. This will simplify our initial discussions, but it can become rather cumbersome to specify the utility of each possible state separately. In Section 16.4, we will see how states can be decomposed under some circumstances for the purpose of assigning utilities.

A nondeterministic action A will have possible outcome states Result_i(A), where the index i ranges over the different outcomes. Prior to the execution of A, the agent assigns probability P(Result_i(A) | Do(A), E) to each outcome, where E summarizes the agent's available evidence about the world and Do(A) is the proposition that action A is executed in the current state. Then we can calculate the expected utility of the action given the evidence, EU(A | E), using the following formula:

    EU(A | E) = Σ_i P(Result_i(A) | Do(A), E) U(Result_i(A)) .    (16.1)

The principle of maximum expected utility (MEU) says that a rational agent should choose an action that maximizes the agent's expected utility. If we wanted to choose the best sequence of actions using this equation, we would have to enumerate all action sequences and choose the best; this is clearly infeasible for long sequences. Therefore, this chapter will focus on simple decisions (usually a single action), and the next chapter will introduce new techniques for efficiently dealing with action sequences.
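As a concrete illustration, the expected-utility formula and the MEU rule can be sketched in a few lines of Python. The action names, outcome probabilities, and utilities below are invented for illustration, not taken from the text.

```python
# A minimal sketch of Equation (16.1) and the MEU rule. The action names,
# outcome probabilities, and utilities below are invented for illustration.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs, one per Result_i(A)."""
    return sum(p * u for p, u in outcomes)

def meu_action(actions):
    """Choose the action that maximizes expected utility (the MEU principle)."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Hypothetical two-action decision problem:
actions = {
    "go_left":  [(0.8, 10.0), (0.2, -5.0)],   # EU = 0.8*10 - 0.2*5 = 7.0
    "go_right": [(0.5, 20.0), (0.5, -10.0)],  # EU = 0.5*20 - 0.5*10 = 5.0
}
print(meu_action(actions))  # go_left
```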
In a sense, the MEU principle could be seen as defining all of AI. All an intelligent agent has to do is calculate the various quantities, maximize utility over its actions, and away it goes. But this does not mean that the AI problem is solved by the definition!

Although the MEU principle defines the right action to take in any decision problem, the computations involved can be prohibitive, and it is sometimes difficult even to formulate the problem completely. Knowing the initial state of the world requires perception, learning, knowledge representation, and inference. Computing P(Result_i(A) | Do(A), E) requires a complete causal model of the world and, as we saw in Chapter 14, NP-hard inference in Bayesian networks. Computing the utility of each state, U(Result_i(A)), often requires searching or planning, because an agent does not know how good a state is until it knows where it can get to from that state. So, decision theory is not a panacea that solves the AI problem. On the other hand, it does provide a framework in which we can see where all the components of an AI system fit.

The MEU principle has a clear relation to the idea of performance measures introduced in Chapter 2. The basic idea is very simple. Consider the environments that could lead to an agent having a given percept history, and consider the different agents that we could design. If an agent maximizes a utility function that correctly reflects the performance measure by which its behavior is being judged, then it will achieve the highest possible performance score if we average over the environments in which the agent could be placed. This is the central justification for the MEU principle itself. While the claim may seem tautological, it does in fact embody a very important transition from a global, external criterion of rationality - the
performance measure over environment histories - to a local, internal criterion involving the maximization of a utility function applied to the next state.

In this chapter, we will be concerned only with single or one-shot decisions, whereas Chapter 2 defined performance measures over environment histories, which usually involve many decisions. In the next chapter, which covers the case of sequential decisions, we will show how these two views can be reconciled.

16.2 THE BASIS OF UTILITY THEORY

Intuitively, the principle of Maximum Expected Utility (MEU) seems like a reasonable way to make decisions, but it is by no means obvious that it is the only rational way. After all, why should maximizing the average utility be so special? Why not try to maximize the sum of the cubes of the possible utilities, or try to minimize the worst possible loss? Also, couldn't an agent act rationally just by expressing preferences between states, without giving them numeric values? Finally, why should a utility function with the required properties exist at all? Perhaps a rational agent can have a preference structure that is too complex to be captured by something as simple as a single real number for each state.

Constraints on rational preferences

These questions can be answered by writing down some constraints on the preferences that a rational agent should have and then showing that the MEU principle can be derived from the constraints. We use the following notation to describe an agent's preferences:

    A ≻ B    A is preferred to B.
    A ∼ B    the agent is indifferent between A and B.
    A ≿ B    the agent prefers A to B or is indifferent between them.

Now the obvious question is, what sorts of things are A and B? If the agent's actions are deterministic, then A and B will typically be the concrete, fully specified outcome states of those actions. In the more general, nondeterministic case, A and B will be lotteries.
A lottery is essentially a probability distribution over a set of actual outcomes (the "prizes" of the lottery). A lottery L with possible outcomes C1, ..., Cn that can occur with probabilities p1, ..., pn is written

    L = [p1, C1; p2, C2; ... ; pn, Cn] .

(A lottery with only one outcome can be written either as A or as [1, A].) In general, each outcome of a lottery can be either an atomic state or another lottery. The primary issue for utility theory is to understand how preferences between complex lotteries are related to preferences between the underlying states in those lotteries.

To address this issue, we impose reasonable constraints on the preference relation, much as we imposed rationality constraints on degrees of belief in order to obtain the axioms of probability in Chapter 13. One reasonable constraint is that preference should be transitive: that is, if A ≻ B and B ≻ C, then we would expect that A ≻ C. We argue for transitivity by showing that an agent whose preferences do not respect transitivity would behave irrationally. Suppose, for example, that an agent has the nontransitive preferences A ≻ B ≻ C ≻ A, where A, B, and C are goods that can be freely exchanged. If the agent currently has A, then we could offer to trade C for A and some cash. The agent prefers C, and so would be willing to give up some amount of cash to make this trade. We could then offer to trade B for C, extracting more cash, and finally trade A for B. This brings us back where we started from, except that the agent has less money (Figure 16.1(a)). We can keep going around the cycle until the agent has no money at all. Clearly, in this case the agent has not acted rationally.

Figure 16.1 (a) A cycle of exchanges showing that the nontransitive preferences A ≻ B ≻ C ≻ A result in irrational behavior. (b) The decomposability axiom.

The following six constraints are known as the axioms of utility theory.
They specify the most obvious semantic constraints on preferences and lotteries.

Orderability: Given any two states, a rational agent must either prefer one to the other or else rate the two as equally preferable. That is, the agent cannot avoid deciding. As we said on page 474, refusing to bet is like refusing to allow time to pass.

    (A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B) .

Transitivity: Given any three states, if an agent prefers A to B and prefers B to C, then the agent must prefer A to C.

    (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C) .

Continuity: If some state B is between A and C in preference, then there is some probability p for which the rational agent will be indifferent between getting B for sure and the lottery that yields A with probability p and C with probability 1 − p.

    (A ≻ B ≻ C) ⇒ ∃p [p, A; 1−p, C] ∼ B .

Substitutability: If an agent is indifferent between two lotteries, A and B, then the agent is indifferent between two more complex lotteries that are the same except that B is substituted for A in one of them. This holds regardless of the probabilities and the other outcome(s) in the lotteries.

    (A ∼ B) ⇒ [p, A; 1−p, C] ∼ [p, B; 1−p, C] .

Monotonicity: Suppose there are two lotteries that have the same two outcomes, A and B. If an agent prefers A to B, then the agent must prefer the lottery that has a higher probability for A (and vice versa).

    (A ≻ B) ⇒ (p ≥ q ⇔ [p, A; 1−p, B] ≿ [q, A; 1−q, B]) .

Decomposability: Compound lotteries can be reduced to simpler ones using the laws of probability. This has been called the "no fun in gambling" rule because it says that two consecutive lotteries can be compressed into a single equivalent lottery, as shown in Figure 16.1(b).

    [p, A; 1−p, [q, B; 1−q, C]] ∼ [p, A; (1−p)q, B; (1−p)(1−q), C] .

And then there was utility

Notice that the axioms of utility theory do not say anything about utility: They talk only about preferences.
Preference is assumed to be a basic property of rational agents. The existence of a utility function follows from the axioms of utility:

1. Utility principle: If an agent's preferences obey the axioms of utility, then there exists a real-valued function U that operates on states such that U(A) > U(B) if and only if A is preferred to B, and U(A) = U(B) if and only if the agent is indifferent between A and B.

    U(A) > U(B) ⇔ A ≻ B ;    U(A) = U(B) ⇔ A ∼ B .

2. Maximum Expected Utility principle: The utility of a lottery is the sum of the probability of each outcome times the utility of that outcome.

    U([p1, S1; ... ; pn, Sn]) = Σ_i p_i U(S_i) .

In other words, once the probabilities and utilities of the possible outcome states are specified, the utility of a compound lottery involving those states is completely determined. Because the outcome of a nondeterministic action is a lottery, this gives us the MEU decision rule from Equation (16.1).

It is important to remember that the existence of a utility function that describes an agent's preference behavior does not necessarily mean that the agent is explicitly maximizing that utility function in its own deliberations. As we showed in Chapter 2, rational behavior can be generated in any number of ways, some of which are more efficient than explicit utility maximization. By observing a rational agent's preferences, however, it is possible to construct the utility function that represents what it is that the agent's actions are actually trying to achieve. (We can account for the enjoyment of gambling by encoding gambling events into the state description; for example, "Have $10 and gambled" could be preferred to "Have $10 and didn't gamble.")

16.3 UTILITY FUNCTIONS

Utility is a function that maps from states to real numbers. Is that all we can say about utility functions? Strictly speaking, that is it. Beyond the constraints listed earlier, an agent can have any preferences it likes.
For example, an agent might prefer to have a prime number of dollars in its bank account; in which case, if it had $16 it would give away $3. It might prefer a dented 1973 Ford Pinto to a shiny new Mercedes. Preferences can also interact: for example, it might only prefer prime numbers of dollars when it owns the Pinto, but when it owns the Mercedes, it might prefer more dollars to less. If all utility functions were as arbitrary as this, however, then utility theory would not be of much help, because we would have to observe the agent's preferences in every possible combination of circumstances before being able to make any predictions about its behavior. Fortunately, the preferences of real agents are usually more systematic. Conversely, there are systematic ways of designing utility functions that, when installed in an artificial agent, cause it to generate the kinds of behavior we want.

The utility of money

Utility theory has its roots in economics, and economics provides one obvious candidate for a utility measure: money (or more specifically, an agent's total net assets). The almost universal exchangeability of money for all kinds of goods and services suggests that money plays a significant role in human utility functions. (In fact, most people think of economics as the study of money, when actually the root of the word economy refers to management, and its current emphasis is on the management of choice.)

If we restrict our attention to actions that affect only the amount of money that an agent has, then it will usually be the case that the agent prefers more money to less, all other things being equal. We say that the agent exhibits a monotonic preference for definite amounts of money. This is not, however, sufficient to guarantee that money behaves as a utility function, because it says nothing about preferences between lotteries involving money.

Suppose you have triumphed over the other competitors in a television game show.
The host now offers you a choice: either you can take the $1,000,000 prize or you can gamble it on the flip of a coin. If the coin comes up heads, you end up with nothing, but if it comes up tails, you get $3,000,000. If you're like most people, you would decline the gamble and pocket the million. Are you being irrational?

Assuming you believe that the coin is fair, the expected monetary value (EMV) of the gamble is ½($0) + ½($3,000,000) = $1,500,000, and the EMV of taking the original prize is of course $1,000,000, which is less. But that does not necessarily mean that accepting the gamble is a better decision. Suppose we use S_n to denote the state of possessing total wealth $n, and that your current wealth is $k. Then the expected utilities of the two actions of accepting and declining the gamble are

    EU(Accept) = ½ U(S_k) + ½ U(S_{k+3,000,000}) ,
    EU(Decline) = U(S_{k+1,000,000}) .

To determine what to do, we need to assign utilities to the outcome states. Utility is not directly proportional to monetary value, because the utility - the positive change in lifestyle - for your first million is very high (or so we are told), whereas the utility for an additional million is much smaller. Suppose you assign a utility of 5 to your current financial status (S_k), a 10 to the state S_{k+3,000,000}, and an 8 to the state S_{k+1,000,000}. Then the rational action would be to decline, because the expected utility of accepting is only 7.5 (less than the 8 for declining). On the other hand, suppose that you happen to have $500,000,000 in the bank already (and appear on game shows just for fun, one assumes). In this case, the gamble is probably acceptable, because the additional benefit of the 503rd million is probably about the same as that of the 501st million.

In a pioneering study of actual utility functions, Grayson (1960) found that the utility of money was almost exactly proportional to the logarithm of the amount.
(This idea was first suggested by Bernoulli (1738); see Exercise 16.3.) One particular curve, for a certain Mr. Beard, is shown in Figure 16.2(a). The data obtained for Mr. Beard's preferences are consistent with a utility function

    U(S_{k+n}) = −263.31 + 22.09 log(n + 150,000)

for the range between n = −$150,000 and n = $800,000. We should not assume that this is the definitive utility function for monetary value, but it is likely that most people have a utility function that is concave for positive wealth. Going into debt is usually considered disastrous, but preferences between different levels of debt can display a reversal of the concavity associated with positive wealth. For example, someone already $10,000,000 in debt might well accept a gamble on a fair coin with a gain of $10,000,000 for heads and a loss of $20,000,000 for tails. This yields the S-shaped curve shown in Figure 16.2(b).

Figure 16.2 The utility of money. (a) Empirical data for Mr. Beard over a limited range. (b) A typical curve for the full range.

If we restrict our attention to the positive part of the curves, where the slope is decreasing, then for any lottery L, the utility of being faced with that lottery is less than the utility of being handed the expected monetary value of the lottery as a sure thing:

    U(L) < U(S_{EMV(L)}) .

That is, agents with curves of this shape are risk-averse: they prefer a sure thing with a payoff that is less than the expected monetary value of a gamble. On the other hand, in the "desperate" region at large negative wealth in Figure 16.2(b), the behavior is risk-seeking. The value an agent will accept in lieu of a lottery is called the certainty equivalent of the lottery. Studies have shown that most people will accept about $400 in lieu of a gamble that gives $1000 half the time and $0 the other half - that is, the certainty equivalent of the lottery is $400.
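This risk-averse behavior can be reproduced with a toy concave utility. The logarithmic form below is an assumption in the spirit of the Bernoulli and Grayson curves, not Mr. Beard's fitted function; notice that the certainty equivalent it produces for a 50/50 gamble over $0 and $1000 lands close to the $400 figure from the studies mentioned above.

```python
import math

# Toy concave utility of wealth; the +1000 offset keeps log defined at $0.
# This specific form is an illustrative assumption, not empirical data.
def u(wealth):
    return math.log(wealth + 1_000)

# 50/50 lottery over $0 and $1000: its EMV is $500.
eu_lottery = 0.5 * u(0) + 0.5 * u(1_000)

# Certainty equivalent: the sure amount with the same utility as the lottery.
certainty_equivalent = math.exp(eu_lottery) - 1_000

print(round(certainty_equivalent, 2))  # about 414.21, well below the $500 EMV
```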
The difference between the expected monetary value of a lottery and its certainty equivalent is called the insurance premium. Risk aversion is the basis for the insurance industry, because it means that insurance premiums are positive. People would rather pay a small insurance premium than gamble the price of their house against the chance of a fire. From the insurance company's point of view, the price of the house is very small compared with the firm's total reserves. This means that the insurer's utility curve is approximately linear over such a small region, and the gamble costs the company almost nothing.

Notice that for small changes in wealth relative to the current wealth, almost any curve will be approximately linear. An agent that has a linear curve is said to be risk-neutral. For gambles with small sums, therefore, we expect risk neutrality. In a sense, this justifies the simplified procedure that proposed small gambles to assess probabilities and to justify the axioms of probability in Chapter 13.

Utility scales and utility assessment

The axioms of utility do not specify a unique utility function for an agent, given its preference behavior. For example, we can transform a utility function U(S) into

    U′(S) = k1 + k2 U(S) ,

where k1 is a constant and k2 is any positive constant. Clearly, this linear transformation leaves the agent's behavior unchanged.

In deterministic contexts, where there are states but no lotteries, behavior is unchanged by any monotonic transformation. For example, we can take the cube root of all the utilities without affecting the preference ordering on actions. An agent in a deterministic environment is said to have a value function or ordinal utility function; the function really provides just rankings of states rather than meaningful numerical values.
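The invariance under U′(S) = k1 + k2 U(S) can be checked directly: ranking actions by expected utility gives the same order before and after a positive affine transform, because expectation is linear. The actions and constants below are illustrative.

```python
# A sketch of why U'(S) = k1 + k2*U(S) with k2 > 0 leaves behavior unchanged:
# the ranking of actions by expected utility is identical under the transform.
# Action names, probabilities, utilities, and constants are illustrative.

actions = {
    "a1": [(0.7, 4.0), (0.3, 1.0)],   # EU = 3.1
    "a2": [(0.5, 6.0), (0.5, 0.0)],   # EU = 3.0
}

def ranking(transform):
    """Rank actions best-first by expected transformed utility."""
    eu = {a: sum(p * transform(u) for p, u in outs)
          for a, outs in actions.items()}
    return sorted(eu, key=eu.get, reverse=True)

identity = ranking(lambda u: u)
affine = ranking(lambda u: 17.0 + 3.5 * u)   # k1 = 17, k2 = 3.5
print(identity == affine)  # True: the preference order is unchanged
```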
We saw this distinction in Chapter 6 for games: evaluation functions in deterministic games such as chess are value functions, whereas evaluation functions in nondeterministic games like backgammon are true utility functions. (Accepting a gamble when already deep in debt, as in the earlier example, might be called desperate behavior, but it is rational if one is already in a desperate situation.)

HUMAN JUDGMENT AND FALLIBILITY

Decision theory is a normative theory: it describes how a rational agent should act. The application of economic theory would be greatly enhanced if it were also a descriptive theory of actual human decision making. However, there is experimental evidence indicating that people systematically violate the axioms of utility theory. An example is given by the psychologists Tversky and Kahneman (1982), based on an example by the economist Allais (1953). Subjects in this experiment are given a choice between lotteries A and B and then between C and D:

    A : 80% chance of $4000        C : 20% chance of $4000
    B : 100% chance of $3000       D : 25% chance of $3000

The majority of subjects choose B over A and C over D. But if we assign U($0) = 0, then the first of these choices implies that U($3000) > 0.8 U($4000), whereas the second choice implies exactly the reverse. In other words, there seems to be no utility function that is consistent with these choices. One possible conclusion is that humans are simply irrational by the standards of our utility axioms. An alternative view is that the analysis does not take into account regret - the feeling that humans know they would experience if they gave up a certain reward (B) for an 80% chance at a higher reward and then lost. In other words, if A is chosen, there is a 20% chance of getting no money and feeling like a complete idiot. Kahneman and Tversky go on to develop a descriptive theory that explains how people are risk-averse with high-probability events, but are willing to take more risks with unlikely payoffs.
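The inconsistency in the Allais-style choices can be verified mechanically. With U($0) = 0, choosing B over A requires U($3000) > 0.8 U($4000), while choosing C over D requires 0.2 U($4000) > 0.25 U($3000), which is exactly the reverse; a brute-force scan over an (arbitrary, illustrative) grid of integer utilities finds no assignment satisfying both.

```python
# A sketch of the Allais inconsistency, assuming U($0) = 0. Choosing B over A
# requires U($3000) > 0.8*U($4000); choosing C over D requires
# 0.2*U($4000) > 0.25*U($3000), i.e. the reverse. The integer search grid is
# an arbitrary illustration.

def consistent(u3000, u4000):
    picks_b_over_a = u3000 > 0.8 * u4000
    picks_c_over_d = 0.2 * u4000 > 0.25 * u3000
    return picks_b_over_a and picks_c_over_d

found = any(consistent(u3, u4)
            for u3 in range(1, 101)
            for u4 in range(1, 101))
print(found)  # False: no utility assignment rationalizes both choices
```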
The connection between this finding and AI is that the choices our agents can make are only as good as the preferences they are based on. If our human informants insist on contradictory preference judgments, there is nothing our agent can do to be consistent with them. Fortunately, preference judgments made by humans are often open to revision in the light of further consideration. In early work at Harvard Business School on assessing the utility of money, Keeney and Raiffa (1976, p. 210) found the following:

    A great deal of empirical investigation has shown that there is a serious deficiency in the assessment protocol. Subjects tend to be too risk-averse in the small and therefore . . . the fitted utility functions exhibit unacceptably large risk premiums for lotteries with a large spread. . . . Most of the subjects, however, can reconcile their inconsistencies and feel that they have learned an important lesson about how they want to behave. As a consequence, some subjects cancel their automobile collision insurance and take out more term insurance on their lives.

Even today, human (ir)rationality is the subject of intensive investigation.

One procedure for assessing utilities is to establish a scale with a "best possible prize" at U(S) = u_max and a "worst possible catastrophe" at U(S) = u_min. Normalized utilities use a scale with u_min = 0 and u_max = 1. Utilities of intermediate outcomes are assessed by asking the agent to indicate a preference between the given outcome state S and a standard lottery [p, u_max; (1 − p), u_min]. The probability p is adjusted until the agent is indifferent between S and the standard lottery. Assuming normalized utilities, the utility of S is given by p.

In medical, transportation, and environmental decision problems, among others, people's lives are at stake. In such cases, u_min is the value assigned to immediate death (or perhaps many deaths).
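The standard-lottery procedure amounts to a binary search on p. In the sketch below, the "agent" being interviewed is simulated by a hidden utility value, which is purely an assumption for the demonstration; a real assessment would query a person at each step.

```python
# A sketch of utility assessment with the standard lottery
# [p, best outcome; 1-p, worst outcome] under normalized utilities
# (worst = 0, best = 1): adjust p until the agent is indifferent; that p
# is U(S). The "agent" is simulated by a hidden utility value.

def agent_prefers_lottery(p, hidden_utility_of_s):
    # Expected utility of the standard lottery is p*1 + (1-p)*0 = p.
    return p > hidden_utility_of_s

def assess_utility(hidden_utility_of_s, iterations=40):
    lo, hi = 0.0, 1.0          # bisect on the lottery probability p
    for _ in range(iterations):
        p = (lo + hi) / 2
        if agent_prefers_lottery(p, hidden_utility_of_s):
            hi = p             # lottery too attractive; lower p
        else:
            lo = p             # S preferred; raise p
    return (lo + hi) / 2

print(round(assess_utility(0.63), 3))  # recovers 0.63
```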
Although nobody feels comfortable with putting a value on human life, it is a fact that tradeoffs are made all the time. Aircraft are given a complete overhaul at intervals determined by trips and miles flown, rather than after every trip. Car bodies are made with relatively thin sheet metal to reduce costs, despite the decrease in accident survival rates. Leaded fuel is still widely used even though it has known health hazards. Paradoxically, a refusal to "put a monetary value on life" means that life is often undervalued. Ross Shachter relates an experience with a government agency that commissioned a study on removing asbestos from schools. The study assumed a particular dollar value for the life of a school-age child, and argued that the rational choice under that assumption was to remove the asbestos. The government agency, morally outraged, rejected the report out of hand. It then decided against asbestos removal.

Some attempts have been made to find out the value that people place on their own lives. Two common "currencies" used in medical and safety analysis are the micromort (a one in a million chance of death) and the QALY, or quality-adjusted life year (equivalent to a year in good health with no infirmities). A number of studies across a wide range of individuals have shown that a micromort is worth about $20 (1980 dollars). We have already seen that utility functions need not be linear, so this does not imply that a decision maker would kill himself for $20 million. Again, the local linearity of any utility curve means that micromort and QALY values are most appropriate for small incremental risks and rewards.

16.4 MULTIATTRIBUTE UTILITY FUNCTIONS

Decision making in the field of public policy involves both millions of dollars and life and death.
For example, in deciding what levels of a carcinogenic substance to allow into the environment, policy makers must weigh the prevention of deaths against the economic hardship that might result from the elimination of certain products and processes. Siting a new airport requires consideration of the disruption caused by construction; the cost of land; the distance from centers of population; the noise of flight operations; safety issues arising from local topography and weather conditions; and so on. Problems like these, in which outcomes are characterized by two or more attributes, are handled by multiattribute utility theory.

We will call the attributes X = X1, ..., Xn; a complete vector of assignments will be x = ⟨x1, ..., xn⟩. Each attribute is generally assumed to have discrete or continuous scalar values. For simplicity, we will assume that each attribute is defined in such a way that, all other things being equal, higher values of the attribute correspond to higher utilities. For example, if we choose AbsenceOfNoise as an attribute in the airport problem, then the greater its value, the better the solution. In some cases, it may be necessary to subdivide the range of values so that utility varies monotonically within each range. We begin by examining cases in which decisions can be made without combining the attribute values into a single utility value. Then we look at cases where the utilities of attribute combinations can be specified very concisely.

Dominance

Suppose that airport site S1 costs less, generates less noise pollution, and is safer than site S2. One would not hesitate to reject S2. We then say that there is strict dominance of S1 over S2. In general, if an option is of lower value on all attributes than some other option, it need not be considered further.
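A strict-dominance filter is simple to sketch. The candidate sites and attribute values below are invented for illustration, with each attribute oriented so that higher is better (hence cost appears negated).

```python
# A sketch of strict dominance: an option whose attribute values are all
# lower than another's can be discarded. Sites and attribute values are
# illustrative; every attribute is oriented so that higher is better.

def strictly_dominates(a, b):
    """True if option a beats option b on every attribute."""
    return all(a[k] > b[k] for k in a)

S1 = {"neg_cost": -3.7e9, "quiet": 0.8, "safety": 0.90}
S2 = {"neg_cost": -4.0e9, "quiet": 0.5, "safety": 0.70}
S3 = {"neg_cost": -3.5e9, "quiet": 0.4, "safety": 0.95}

print(strictly_dominates(S1, S2))  # True: S2 can be rejected outright
print(strictly_dominates(S1, S3))  # False: S3 remains a contender
```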
Strict dominance is often very useful in narrowing down the field of choices to the real contenders, although it seldom yields a unique choice. Figure 16.3(a) shows a schematic diagram for the two-attribute case.

Figure 16.3 Strict dominance. (a) Deterministic: Option A is strictly dominated by B but not by C or D. (b) Uncertain: A is strictly dominated by B but not by C.

That is fine for the deterministic case, in which the attribute values are known for sure. What about the general case, where the action outcomes are uncertain? A direct analog of strict dominance can be constructed, where, despite the uncertainty, all possible concrete outcomes for S1 strictly dominate all possible outcomes for S2. (See Figure 16.3(b).) Of course, this will probably occur even less often than in the deterministic case.

Fortunately, there is a more useful generalization called stochastic dominance, which occurs very frequently in real problems. Stochastic dominance is easiest to understand in the context of a single attribute. Suppose we believe that the cost of siting the airport at S1 is uniformly distributed between $2.8 billion and $4.8 billion and that the cost at S2 is uniformly distributed between $3 billion and $5.2 billion. Figure 16.4(a) shows these distributions, with cost plotted as a negative value. Then, given only the information that utility decreases with cost, we can say that S1 stochastically dominates S2 (i.e., S2 can be discarded).

Figure 16.4 Stochastic dominance. (a) S1 stochastically dominates S2 on cost. (b) Cumulative distributions for the negative cost of S1 and S2.

It is important to note that this does not follow from comparing the expected costs.
For example, if we knew the cost of S1 to be exactly $3.8 billion, then we would be unable to make a decision without additional information on the utility of money. (It might seem odd that more information on the cost of S1 could make the agent less able to decide. The paradox is resolved by noting that the decision reached in the absence of exact cost information is less likely to have the highest utility payoff.)

The exact relationship between the attribute distributions needed to establish stochastic dominance is best seen by examining the cumulative distributions, shown in Figure 16.4(b). The cumulative distribution measures the probability that the cost is less than or equal to any given amount - that is, it integrates the original distribution. If the cumulative distribution for S1 is always to the right of the cumulative distribution for S2, then, stochastically speaking, S1 is cheaper than S2. Formally, if two actions A1 and A2 lead to probability distributions p1(x) and p2(x) on attribute X, then A1 stochastically dominates A2 on X if

    for all x:  ∫_{−∞}^{x} p1(x′) dx′  ≤  ∫_{−∞}^{x} p2(x′) dx′ .

The relevance of this definition to the selection of optimal decisions comes from the following property: if A1 stochastically dominates A2, then for any monotonically nondecreasing utility function U(x), the expected utility of A1 is at least as high as the expected utility of A2. Hence, if an action is stochastically dominated by another action on all attributes, then it can be discarded.

The stochastic dominance condition might seem rather technical and perhaps not so easy to evaluate without extensive probability calculations. In fact, it can be decided very easily in many cases. Suppose, for example, that the construction cost depends on the distance to centers of population. The cost itself is uncertain, but the greater the distance, the greater the cost. If S1 is less remote than S2, then S1 will dominate S2 on cost. Although we will not
Making Simple Decisions present them here, there exist algorithms for propagating this kind of qualitative information QUALITATIVE PROBABILISTIC among uncertain variables in qualitative probabilistic networks, enabling a system to make NETWORKS rational decisions based on stochastic dominance, without using any numeric values. Preference structure and multiattribute utility Suppose we have n attributes, each of which has d distinct possible values. To specify the n complete utility function U(xl, . . . , x,), we need d values in the worst case. Now, the worst case corresponds to a situation in which the agent's preferences have no regularity at all. Mul- tiattribute utility theory is based on the supposition that the preferences of typical agents have much more structure than that. The basic approach is to identify regularities in the preference REPRESENTATION behavior we would expect to see and to use what are called representation theorems to show THEOREMS that an agent with a certain lund of preference structure has a utility function U(x1, .. . ?xn) = ffi(xl),... ?fn(xn) ,I where f is, we hope, a simple function such as addition. Notice the similarity to the use of Bayesian networks to decompose the joint probability of several random variables. Preferences without uncertainty Let us begin with the deterministic case. Remember that for deterministic environments the agent has a value function V(xl,. . . , x,); the aim is to represent this function concisely. The basic regularity that arises in deterministic preference structures is called preference PREFERENCE independence. Two attributes X1 and X2 are preferentially independent of a third attribute X3 if the preference between outcomes (zl, 22, 23) and (xi, 4, 23) does not depend on the particular value xs for attribute X3. 
Going back to the airport example, where we have (among other attributes) Noise, Cost, and Deaths to consider, one may propose that Noise and Cost are preferentially independent of Deaths. For example, if we prefer a state with 20,000 people residing in the flight path and a construction cost of 4 billion to a state with 70,000 people residing in the flight path and a cost of 3.7 billion when the safety level is 0.06 deaths per million passenger miles in both cases, then we would have the same preference when the safety level is 0.13 or 0.01; and the same independence would hold for preferences between any other pair of values for Noise and Cost. It is also apparent that Cost and Deaths are preferentially independent of Noise and that Noise and Deaths are preferentially independent of Cost. We say that the set of attributes {Noise, Cost, Deaths} exhibits mutual preferential independence (MPI). MPI says that, whereas each attribute may be important, it does not affect the way in which one trades off the other attributes against each other.

Mutual preferential independence is something of a mouthful, but thanks to a remarkable theorem due to the economist Debreu (1960), we can derive from it a very simple form for the agent's value function: If attributes X1, ..., Xn are mutually preferentially independent, then the agent's preference behavior can be described as maximizing the function

    V(x1, ..., xn) = Σi Vi(xi) ,

where each Vi is a value function referring only to the attribute Xi. For example, it might well be the case that the airport decision can be made using a value function

    V(noise, cost, deaths) = -noise × 10^4 - cost - deaths × 10^12 .

A value function of this type is called an additive value function. Additive functions are an extremely natural way to describe an agent's value function and are valid in many real-world situations.
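As a concrete sketch, an additive value function can be evaluated directly as a sum of single-attribute value functions. The weights below are the ones from the example formula in the text; the specific site attribute values are invented for illustration.

```python
def additive_value(attributes, value_fns):
    """Additive value function: the sum of per-attribute value functions."""
    return sum(f(x) for f, x in zip(value_fns, attributes))

# V(noise, cost, deaths) = -noise x 10^4 - cost - deaths x 10^12
value_fns = [
    lambda noise: -noise * 1e4,
    lambda cost: -cost,
    lambda deaths: -deaths * 1e12,
]

# Two hypothetical sites: (people in flight path, construction cost in dollars,
# deaths per passenger mile) -- numbers are illustrative only
site_a = (20_000, 4.0e9, 1e-7)
site_b = (70_000, 3.7e9, 1e-7)
print(additive_value(site_a, value_fns) > additive_value(site_b, value_fns))  # True
```

With these weights, the 50,000 extra residents under site B's flight path outweigh its 300 million dollars of cost savings, matching the preference stated in the text's example.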
Even when MPI does not strictly hold, as might be the case at extreme values of the attributes, an additive value function might still provide a good approximation to the agent's preferences. This is especially true when the violations of MPI occur in portions of the attribute ranges that are unlikely to occur in practice.

Preferences with uncertainty

When uncertainty is present in the domain, we will also need to consider the structure of preferences between lotteries and to understand the resulting properties of utility functions, rather than just value functions. The mathematics of this problem can become quite complicated, so we will present just one of the main results to give a flavor of what can be done. The reader is referred to Keeney and Raiffa (1976) for a thorough survey of the field.

The basic notion of utility independence extends preference independence to cover lotteries: a set of attributes X is utility-independent of a set of attributes Y if preferences between lotteries on the attributes in X are independent of the particular values of the attributes in Y. A set of attributes is mutually utility-independent (MUI) if each of its subsets is utility-independent of the remaining attributes. Again, it seems reasonable to propose that the airport attributes are MUI.

MUI implies that the agent's behavior can be described using a multiplicative utility function (Keeney, 1974). The general form of a multiplicative utility function is best seen by looking at the case for three attributes. For conciseness, we will use Ui to mean Ui(xi):

    U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k3k1U3U1 + k1k2k3U1U2U3 .

Although this does not look very simple, it contains just three single-attribute utility functions and three constants. In general, an n-attribute problem exhibiting MUI can be modeled using n single-attribute utilities and n constants.
Each of the single-attribute utility functions can be developed independently of the other attributes, and this combination will be guaranteed to generate the correct overall preferences. Additional assumptions are required to obtain a purely additive utility function.

16.5 DECISION NETWORKS

In this section, we will look at a general mechanism for making rational decisions. The notation is often called an influence diagram (Howard and Matheson, 1984), but we will use the more descriptive term decision network. Decision networks combine Bayesian networks with additional node types for actions and utilities. We will use airport siting as an example.

Figure 16.5 A simple decision network for the airport-siting problem.

Representing a decision problem with a decision network

In its most general form, a decision network represents information about the agent's current state, its possible actions, the state that will result from the agent's action, and the utility of that state. It therefore provides a substrate for implementing utility-based agents of the type first introduced in Section 2.4. Figure 16.5 shows a decision network for the airport siting problem. It illustrates the three types of nodes used:

• Chance nodes (ovals) represent random variables, just as they do in Bayesian networks. The agent could be uncertain about the construction cost, the level of air traffic and the potential for litigation, and the Deaths, Noise, and total Cost variables, each of which also depends on the site chosen. Each chance node has associated with it a conditional distribution that is indexed by the state of the parent nodes. In decision networks, the parent nodes can include decision nodes as well as chance nodes. Note that each of the current-state chance nodes could be part of a large Bayesian network for assessing construction costs, air traffic levels, or litigation potentials.
• Decision nodes (rectangles) represent points where the decision maker has a choice of actions. In this case, the AirportSite action can take on a different value for each site under consideration. The choice influences the cost, safety, and noise that will result. In this chapter, we will assume that we are dealing with a single decision node. Chapter 17 deals with cases where more than one decision must be made.

• Utility nodes (diamonds) represent the agent's utility function. The utility node has as parents all variables describing the outcome that directly affect utility. Associated with the utility node is a description of the agent's utility as a function of the parent attributes. The description could be just a tabulation of the function, or it might be a parameterized additive or multilinear function. These nodes are often called value nodes in the literature. We prefer to maintain the distinction between utility and value functions, as discussed earlier, because the outcome state may represent a lottery.

A simplified form is also used in many cases. The notation remains identical, but the chance nodes describing the outcome state are omitted. Instead, the utility node is connected directly to the current-state nodes and the decision node. In this case, rather than representing a utility function on outcome states, the utility node represents the expected utility associated with each action, as defined in Equation (16.1). We therefore call such tables action-utility tables. Figure 16.6 shows the action-utility representation of the airport problem.

Figure 16.6 A simplified representation of the airport-siting problem. Chance nodes corresponding to outcome states have been factored out.

Notice that, because the Noise, Deaths, and Cost chance nodes in Figure 16.5 refer to future states, they can never have their values set as evidence variables.
Thus, the simplified version that omits these nodes can be used whenever the more general form can be used. Although the simplified form contains fewer nodes, the omission of an explicit description of the outcome of the siting decision means that it is less flexible with respect to changes in circumstances. For example, in Figure 16.5, a change in aircraft noise levels can be reflected by a change in the conditional probability table associated with the Noise node, whereas a change in the weight accorded to noise pollution in the utility function can be reflected by a change in the utility table. In the action-utility diagram, Figure 16.6, on the other hand, all such changes have to be reflected by changes to the action-utility table. Essentially, the action-utility formulation is a compiled version of the original formulation.

Evaluating decision networks

Actions are selected by evaluating the decision network for each possible setting of the decision node. Once the decision node is set, it behaves exactly like a chance node that has been set as an evidence variable. The algorithm for evaluating decision networks is the following:

1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
   (a) Set the decision node to that value.
   (b) Calculate the posterior probabilities for the parent nodes of the utility node, using a standard probabilistic inference algorithm.
   (c) Calculate the resulting utility for the action.
3. Return the action with the highest utility.

This is a straightforward extension of the Bayesian network algorithm and can be incorporated directly into the agent design given in Figure 13.1. We will see in Chapter 17 that the possibility of executing several actions in sequence makes the problem much more interesting.
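The evaluation loop above can be sketched as follows, with the inference step (2b) abstracted into a supplied posterior function. All node names, probabilities, and utilities here are invented for illustration; a real implementation would call a Bayesian network inference routine.

```python
def evaluate_decision(actions, posterior, utility):
    """Return the action with the highest expected utility.

    posterior(action) -> dict mapping outcome -> probability (step 2b)
    utility(outcome)  -> real-valued utility of that outcome (step 2c)
    """
    def expected_utility(action):
        return sum(p * utility(o) for o, p in posterior(action).items())
    return max(actions, key=expected_utility)

# Toy airport example: two candidate sites; outcomes are (noise, cost) levels
posteriors = {
    "site_1": {("low", "high"): 0.7, ("high", "high"): 0.3},
    "site_2": {("low", "low"): 0.4, ("high", "low"): 0.6},
}
utilities = {("low", "high"): 50, ("high", "high"): 10,
             ("low", "low"): 80, ("high", "low"): 40}

best = evaluate_decision(["site_1", "site_2"],
                         lambda a: posteriors[a],
                         lambda o: utilities[o])
print(best)  # site_2: expected utility 56 versus 38 for site_1
```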
16.6 THE VALUE OF INFORMATION

In the preceding analysis, we have assumed that all relevant information, or at least all available information, is provided to the agent before it makes its decision. In practice, this is hardly ever the case. One of the most important parts of decision making is knowing what questions to ask. For example, a doctor cannot expect to be provided with the results of all possible diagnostic tests and questions at the time a patient first enters the consulting room.⁵ Tests are often expensive and sometimes hazardous (both directly and because of associated delays). Their importance depends on two factors: whether the test results would lead to a significantly better treatment plan, and how likely the various test results are.

⁵ In the United States, the only question that is always asked beforehand is whether the patient has insurance.

This section describes information value theory, which enables an agent to choose what information to acquire. The acquisition of information is achieved by sensing actions, as described in Chapter 12. Because the agent's utility function seldom refers to the contents of the agent's internal state, whereas the whole purpose of sensing actions is to affect the internal state, we must evaluate sensing actions by their effect on the agent's subsequent "real" actions. Thus, information value theory involves a form of sequential decision making.

A simple example

Suppose an oil company is hoping to buy one of n indistinguishable blocks of ocean drilling rights. Let us assume further that exactly one of the blocks contains oil worth C dollars and that the price of each block is C/n dollars. If the company is risk-neutral, then it will be indifferent between buying a block and not buying one. Now suppose that a seismologist offers the company the results of a survey of block number 3, which indicates definitively whether the block contains oil. How much should the company be willing to pay for the information?

The way to answer this question is to examine what the company would do if it had the information:

• With probability 1/n, the survey will indicate oil in block 3. In this case, the company will buy block 3 for C/n dollars and make a profit of C - C/n = (n - 1)C/n dollars.

• With probability (n - 1)/n, the survey will show that the block contains no oil, in which case the company will buy a different block. Now the probability of finding oil in one of the other blocks changes from 1/n to 1/(n - 1), so the company makes an expected profit of C/(n - 1) - C/n = C/(n(n - 1)) dollars.

Now we can calculate the expected profit, given the survey information:

    (1/n) × (n - 1)C/n + ((n - 1)/n) × C/(n(n - 1)) = C/n .

Therefore, the company should be willing to pay the seismologist up to C/n dollars for the information: the information is worth as much as the block itself.

The value of information derives from the fact that with the information, one's course of action can be changed to suit the actual situation. One can discriminate according to the situation, whereas without the information, one has to do what's best on average over the possible situations. In general, the value of a given piece of information is defined to be the difference in expected value between best actions before and after the information is obtained.

A general formula

It is simple to derive a general mathematical formula for the value of information. Usually, we assume that exact evidence is obtained about the value of some random variable Ej, so the phrase value of perfect information (VPI) is used.⁶ Let the agent's current knowledge be E.
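As a quick check of the oil-lease arithmetic above, here is a sketch using exact rational arithmetic; the particular values of C and n are arbitrary.

```python
from fractions import Fraction

def expected_profit_with_survey(C, n):
    """Expected profit given the survey of block 3, for oil worth C dollars
    spread over n blocks priced at C/n each."""
    C = Fraction(C)
    p_oil = Fraction(1, n)                # survey indicates oil in block 3
    profit_oil = C - C / n                # buy block 3, pocket C - C/n
    p_dry = Fraction(n - 1, n)            # survey shows block 3 is dry
    profit_dry = C / (n - 1) - C / n      # expected profit on another block
    return p_oil * profit_oil + p_dry * profit_dry

print(expected_profit_with_survey(1000, 10))  # 100, i.e., exactly C/n
```

Because the arithmetic is exact, the result comes out to precisely C/n for any C and n, confirming that the information is worth as much as the block itself.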
Then the value of the current best action α is defined by

    EU(α | E) = max_A Σi U(Result_i(A)) P(Result_i(A) | Do(A), E) ,

and the value of the new best action (after the new evidence Ej is obtained) will be

    EU(α_Ej | E, Ej) = max_A Σi U(Result_i(A)) P(Result_i(A) | Do(A), E, Ej) .

But Ej is a random variable whose value is currently unknown, so we must average over all possible values e_jk that we might discover for Ej, using our current beliefs about its value. The value of discovering Ej, given current information E, is then defined as

    VPI_E(Ej) = ( Σk P(Ej = e_jk | E) EU(α_e_jk | E, Ej = e_jk) ) - EU(α | E) .

⁶ Imperfect information about a variable X can be modeled as perfect information about a variable Y that is probabilistically related to X. See Exercise 16.11 for an example of this.

In order to get some intuition for this formula, consider the simple case where there are only two actions, A1 and A2, from which to choose. Their current expected utilities are U1 and U2. The information Ej will yield some new expected utilities U1' and U2' for the actions, but before we obtain Ej, we will have some probability distributions over the possible values of U1' and U2' (which we will assume are independent).

Suppose that A1 and A2 represent two different routes through a mountain range in winter. A1 is a nice, straight highway through a low pass, and A2 is a winding dirt road over the top. Just given this information, A1 is clearly preferable, because it is quite likely that the second route is blocked by avalanches, whereas it is quite unlikely that the first route is blocked by traffic. U1 is therefore clearly higher than U2. It is possible to obtain satellite reports Ej on the actual state of each road that would give new expectations, U1' and U2', for the two crossings. The distributions for these expectations are shown in Figure 16.7(a). Obviously, in this case, it is not worth the expense of obtaining satellite reports, because it is unlikely that the information derived from them will change the plan. With no change, information has no value.

Figure 16.7 Three generic cases for the value of information. In (a), A1 will almost certainly remain superior to A2, so the information is not needed. In (b), the choice is unclear and the information is crucial. In (c), the choice is unclear, but because it makes little difference, the information is less valuable.

Now suppose that we are choosing between two different winding dirt roads of slightly different lengths and we are carrying a seriously injured passenger. Then, even when U1 and U2 are quite close, the distributions of U1' and U2' are very broad. There is a significant possibility that the second route will turn out to be clear while the first is blocked, and in this case the difference in utilities will be very high. The VPI formula indicates that it might be worthwhile getting the satellite reports. Such a situation is shown in Figure 16.7(b).

Now suppose that we are choosing between the two dirt roads in summertime, when blockage by avalanches is unlikely. In this case, satellite reports might show one route to be more scenic than the other because of flowering alpine meadows, or perhaps wetter because of errant streams. It is therefore quite likely that we would change our plan if we had the information. But in this case, the difference in value between the two routes is still likely to be very small, so we will not bother to obtain the reports. This situation is shown in Figure 16.7(c).

In sum, information has value to the extent that it is likely to cause a change of plan and to the extent that the new plan will be significantly better than the old plan.

Properties of the value of information

One might ask whether it is possible for information to be deleterious: can it actually have negative expected value? Intuitively, one should expect this to be impossible. After all, one could in the worst case just ignore the information and pretend that one has never received it.
This is confirmed by the following theorem, which applies to any decision-theoretic agent: The value of information is nonnegative:

    for all j, E:  VPI_E(Ej) ≥ 0 .

The theorem follows directly from the definition of VPI, and we leave the proof as an exercise (Exercise 16.12). It is important to remember that VPI depends on the current state of information, which is why it is subscripted. It can change as more information is acquired. In the extreme case, it will become zero if the variable in question already has a known value. Thus, VPI is not additive. That is, in general,

    VPI_E(Ej, Ek) ≠ VPI_E(Ej) + VPI_E(Ek) .

VPI is, however, order-independent, which should be intuitively obvious. That is,

    VPI_E(Ej, Ek) = VPI_E(Ej) + VPI_{E,Ej}(Ek) = VPI_E(Ek) + VPI_{E,Ek}(Ej) .

Order independence distinguishes sensing actions from ordinary actions and simplifies the problem of calculating the value of a sequence of sensing actions.

Implementing an information-gathering agent

A sensible agent should ask questions of the user in a reasonable order, should avoid asking questions that are irrelevant, should take into account the importance of each piece of information in relation to its cost, and should stop asking questions when that is appropriate. All of these capabilities can be achieved by using the value of information as a guide.

Figure 16.8 shows the overall design of an agent that can gather information intelligently before acting. For now, we will assume that with each observable evidence variable Ej, there is an associated cost, Cost(Ej), which reflects the cost of obtaining the evidence through tests, consultants, questions, or whatever. The agent requests what appears to be the most valuable piece of information, compared with its cost. We assume that the result of the action Request(Ej) is that the next percept provides the value of Ej. If no observation is worth its cost, the agent selects a "real" action.
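To make the VPI definition and the nonnegativity property concrete, here is a tiny two-action example in which the evidence reveals the state exactly (perfect information). All numbers are invented for the example.

```python
def expected_utility(action, p_state, utility):
    """Expected utility of one action under a distribution over states."""
    return sum(p * utility[action][s] for s, p in p_state.items())

def best_eu(actions, p_state, utility):
    """Expected utility of the best action under the given beliefs."""
    return max(expected_utility(a, p_state, utility) for a in actions)

actions = ["a1", "a2"]
utility = {"a1": {"s": 10, "t": 0}, "a2": {"s": 4, "t": 6}}
prior = {"s": 0.5, "t": 0.5}

# Evidence E_j reveals the state exactly, so P(E_j = e) follows the prior
# and the posterior after observing e puts all mass on that state
p_ej = prior
posterior = {"s": {"s": 1.0, "t": 0.0}, "t": {"s": 0.0, "t": 1.0}}

eu_now = best_eu(actions, prior, utility)          # best action given E only
eu_with_info = sum(p_ej[e] * best_eu(actions, posterior[e], utility)
                   for e in p_ej)                  # averaged over outcomes
vpi = eu_with_info - eu_now
print(vpi)  # 3.0: nonnegative, as the theorem requires
```

Here both actions are worth 5 in expectation beforehand, but knowing the state lets the agent pick the better action every time (utility 10 or 6), so the information is worth 8 − 5 = 3.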
The agent algorithm we have described implements a form of information gathering that is called myopic. This is because it uses the VPI formula shortsightedly, calculating

function INFORMATION-GATHERING-AGENT(percept) returns an action
    static: D, a decision network
    integrate percept into D
    j ← the value that maximizes VPI(Ej) - Cost(Ej)
    if VPI(Ej) > Cost(Ej)
        then return Request(Ej)
        else return the best action from D

Figure 16.8 Design of a simple information-gathering agent. The agent works by repeatedly selecting the observation with the highest information value, until the cost of the next observation is greater than its expected benefit.
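One step of the loop in Figure 16.8 can be sketched as follows. The VPI and cost numbers below are stand-ins; a real agent would compute VPI from its decision network rather than look it up in a table.

```python
def information_gathering_step(observable, vpi, cost, best_action):
    """Request the observation maximizing VPI(E_j) - Cost(E_j) if it is
    worth its cost; otherwise return a "real" action."""
    j = max(observable, key=lambda e: vpi(e) - cost(e))
    if vpi(j) > cost(j):
        return ("request", j)
    return ("act", best_action())

# Hypothetical evidence variables for a diagnosis setting
vpi_table = {"blood_test": 12.0, "xray": 3.0}
cost_table = {"blood_test": 5.0, "xray": 4.0}

step = information_gathering_step(["blood_test", "xray"],
                                  vpi_table.get, cost_table.get,
                                  lambda: "treat")
print(step)  # ('request', 'blood_test')
```

On a full run, the agent would repeat this step, updating its network with each new percept, until no remaining observation is worth its cost, at which point the "act" branch fires.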
