Intelligent Agents in Artificial Intelligence

2 INTELLIGENT AGENTS

In which we discuss the nature of agents, perfect or otherwise, the diversity of environments, and the resulting menagerie of agent types.

Chapter 1 identified the concept of rational agents as central to our approach to artificial intelligence. In this chapter, we make this notion more concrete. We will see that the concept of rationality can be applied to a wide variety of agents operating in any imaginable environment. Our plan in this book is to use this concept to develop a small set of design principles for building successful agents - systems that can reasonably be called intelligent.

We begin by examining agents, environments, and the coupling between them. The observation that some agents behave better than others leads naturally to the idea of a rational agent - one that behaves as well as possible. How well an agent can behave depends on the nature of the environment; some environments are more difficult than others. We give a crude categorization of environments and show how properties of an environment influence the design of suitable agents for that environment. We also describe a number of basic "skeleton" agent designs, which will be fleshed out in the rest of the book.

2.1 AGENTS AND ENVIRONMENTS

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This simple idea is illustrated in Figure 2.1. A human agent has eyes, ears, and other organs for sensors and hands, legs, mouth, and other body parts for actuators. A robotic agent might have cameras and infrared range finders for sensors and various motors for actuators. A software agent receives keystrokes, file contents, and network packets as sensory inputs and acts on the environment by displaying on the screen, writing files, and sending network packets. We will make the general assumption that every agent can perceive its own actions (but not always the effects).

We use the term percept to refer to the agent's perceptual inputs at any given instant. An agent's percept sequence is the complete history of everything the agent has ever perceived. In general, an agent's choice of action at any given instant can depend on the entire percept sequence observed to date. If we can specify the agent's choice of action for every possible percept sequence, then we have said more or less everything there is to say about the agent.

Figure 2.1 Agents interact with environments through sensors and actuators.

Mathematically speaking, we say that an agent's behavior is described by the agent function, which maps any given percept sequence to an action. We can imagine tabulating the agent function that describes any given agent; for most agents, this would be a very large table - infinite, in fact, unless we place a bound on the length of the percept sequences we want to consider. Given an agent to experiment with, we can, in principle, construct this table by trying out all possible percept sequences and recording which actions the agent does in response. (If the agent uses some randomization to choose its actions, then we would have to try each sequence many times to identify the probability of each action. One might imagine that acting randomly is rather silly, but we will see later in this chapter that it can be very intelligent.)

The table is, of course, an external characterization of the agent. Internally, the agent function for an artificial agent will be implemented by an agent program. It is important to keep these two ideas distinct. The agent function is an abstract mathematical description; the agent program is a concrete implementation, running on the agent architecture.
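To make the distinction concrete, here is a minimal Python sketch (not from the book's code repository; the names are illustrative): the agent function is an abstract mapping over whole percept sequences, while the agent program is the piece of code, run on the architecture, that realizes it one percept at a time.

    # Illustrative sketch: agent function vs. agent program.
    def agent_function(percept_sequence):
        # Abstract, external characterization: defined for every possible
        # percept sequence (in general an enormous, even infinite, table).
        return "NoOp" if not percept_sequence else "Act"

    class AgentProgram:
        # Concrete, internal implementation: sees only the current percept
        # and must remember the history itself if the behavior needs it.
        def __init__(self):
            self.history = []

        def __call__(self, percept):
            self.history.append(percept)
            return agent_function(tuple(self.history))

    program = AgentProgram()
    print(program("some percept"))   # the two views agree on every sequence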
To illustrate these ideas, we use a very simple example: the vacuum-cleaner world shown in Figure 2.2. This world is so simple that we can describe everything that happens; it is also a made-up world, so we can invent many variations. This particular world has just two locations: squares A and B. The vacuum agent perceives which square it is in and whether there is dirt in the square. It can choose to move left, move right, suck up the dirt, or do nothing. One very simple agent function is the following: if the current square is dirty, then suck; otherwise, move to the other square. A partial tabulation of this agent function is shown in Figure 2.3. A simple agent program for this agent function is given later in the chapter, in Figure 2.8.

Figure 2.2 A vacuum-cleaner world with just two locations.

Figure 2.3 Partial tabulation of a simple agent function for the vacuum-cleaner world shown in Figure 2.2.

    Percept sequence                                Action
    [A, Clean]                                      Right
    [A, Dirty]                                      Suck
    [B, Clean]                                      Left
    [B, Dirty]                                      Suck
    [A, Clean], [A, Clean]                          Right
    [A, Clean], [A, Dirty]                          Suck
    [A, Clean], [A, Clean], [A, Clean]              Right
    [A, Clean], [A, Clean], [A, Dirty]              Suck

Looking at Figure 2.3, we see that various vacuum-world agents can be defined simply by filling in the right-hand column in various ways. The obvious question, then, is this: What is the right way to fill out the table? In other words, what makes an agent good or bad, intelligent or stupid? We answer these questions in the next section.
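As a concrete rendering of the function tabulated in Figure 2.3, here is a minimal Python sketch (illustrative only; percepts are assumed to be (location, status) pairs):

    def vacuum_agent_function(percept_sequence):
        # For this particular agent function, only the latest percept matters.
        location, status = percept_sequence[-1]
        if status == "Dirty":
            return "Suck"
        return "Right" if location == "A" else "Left"

    # Reproducing the first few rows of Figure 2.3:
    for seq in [(("A", "Clean"),), (("A", "Dirty"),),
                (("B", "Clean"),), (("B", "Dirty"),),
                (("A", "Clean"), ("A", "Dirty"))]:
        print(seq, "->", vacuum_agent_function(seq))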
Before closing this section, we should remark that the notion of an agent is meant to be a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents. One could view a hand-held calculator as an agent that chooses the action of displaying "4" when given the percept sequence "2 + 2 =," but such an analysis would hardly aid our understanding of the calculator.

2.2 GOOD BEHAVIOR: THE CONCEPT OF RATIONALITY

A rational agent is one that does the right thing; conceptually speaking, every entry in the table for the agent function is filled out correctly. Obviously, doing the right thing is better than doing the wrong thing, but what does it mean to do the right thing? As a first approximation, we will say that the right action is the one that will cause the agent to be most successful. Therefore, we need some way to measure success. Together with the description of the environment and of the agent's sensors and actuators, this will provide a complete specification of the task facing the agent. Given this, we can define more precisely what it means to be rational.

Performance measures

A performance measure embodies the criterion for success of an agent's behavior. When an agent is plunked down in an environment, it generates a sequence of actions according to the percepts it receives. This sequence of actions causes the environment to go through a sequence of states. If the sequence is desirable, then the agent has performed well. Obviously, there is no one fixed measure suitable for all agents. We could ask the agent for a subjective opinion of how happy it is with its own performance, but some agents would be unable to answer, and others would delude themselves. (Human agents in particular are notorious for "sour grapes": believing they did not really want something after not getting it, as in, "Oh well, never mind, I didn't want that stupid Nobel prize anyway.") Therefore, we will insist on an objective performance measure, typically one imposed by the designer who is constructing the agent.

Consider the vacuum-cleaner agent from the preceding section. We might propose to measure performance by the amount of dirt cleaned up in a single eight-hour shift. With a rational agent, of course, what you ask for is what you get. A rational agent can maximize this performance measure by cleaning up the dirt, then dumping it all on the floor, then cleaning it up again, and so on. A more suitable performance measure would reward the agent for having a clean floor. For example, one point could be awarded for each clean square at each time step (perhaps with a penalty for electricity consumed and noise generated). As a general rule, it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave.

The selection of a performance measure is not always easy. For example, the notion of "clean floor" in the preceding paragraph is based on average cleanliness over time. Yet the same average cleanliness can be achieved by two different agents, one of which does a mediocre job all the time, while the other cleans energetically but takes long breaks. Which is preferable might seem to be a fine point of janitorial science, but in fact it is a deep philosophical question with far-reaching implications. Which is better: a reckless life of highs and lows, or a safe but humdrum existence? Which is better: an economy where everyone lives in moderate poverty, or one in which some live in plenty while others are very poor? We leave these questions as an exercise for the diligent reader.
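As a small illustration of the "one point per clean square per time step" measure proposed above, here is a Python sketch (the state representation is an assumption made for the example, not part of the text):

    def performance(state_history):
        # state_history: list of world states, each a dict mapping
        # square name -> "Clean" or "Dirty".
        return sum(1 for state in state_history
                     for status in state.values() if status == "Clean")

    history = [{"A": "Dirty", "B": "Clean"},
               {"A": "Clean", "B": "Clean"},
               {"A": "Clean", "B": "Clean"}]
    print(performance(history))   # 5 points over three time steps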
Rationality

What is rational at any given time depends on four things:

- The performance measure that defines the criterion of success.
- The agent's prior knowledge of the environment.
- The actions that the agent can perform.
- The agent's percept sequence to date.

This leads to a definition of a rational agent: For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.

Consider the simple vacuum-cleaner agent that cleans a square if it is dirty and moves to the other square if not; this is the agent function tabulated in Figure 2.3. Is this a rational agent? That depends! First, we need to say what the performance measure is, what is known about the environment, and what sensors and actuators the agent has. Let us assume the following:

- The performance measure awards one point for each clean square at each time step, over a "lifetime" of 1000 time steps.
- The "geography" of the environment is known a priori (Figure 2.2), but the dirt distribution and the initial location of the agent are not. Clean squares stay clean, and sucking cleans the current square. The Left and Right actions move the agent left and right except when this would take the agent outside the environment, in which case the agent remains where it is.
- The only available actions are Left, Right, Suck, and NoOp (do nothing).
- The agent correctly perceives its location and whether that location contains dirt.

We claim that under these circumstances the agent is indeed rational; its expected performance is at least as high as any other agent's. Exercise 2.4 asks you to prove this.

One can see easily that the same agent would be irrational under different circumstances. For example, once all the dirt is cleaned up, it will oscillate needlessly back and forth; if the performance measure includes a penalty of one point for each movement left or right, the agent will fare poorly. A better agent for this case would do nothing once it is sure that all the squares are clean. If clean squares can become dirty again, the agent should occasionally check and re-clean them if needed. If the geography of the environment is unknown, the agent will need to explore it rather than stick to squares A and B. Exercise 2.4 asks you to design agents for these cases.
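The claim can also be checked empirically. Below is a minimal simulation sketch of the environment just described (two squares, 1000 steps, one point per clean square per step); the function and variable names are illustrative, not from the book's code:

    import random

    def reflex_vacuum_agent(location, status):
        if status == "Dirty":
            return "Suck"
        return "Right" if location == "A" else "Left"

    def simulate(agent, steps=1000):
        world = {sq: random.choice(["Clean", "Dirty"]) for sq in ("A", "B")}
        location, score = random.choice(["A", "B"]), 0
        for _ in range(steps):
            action = agent(location, world[location])
            if action == "Suck":
                world[location] = "Clean"
            elif action == "Right":
                location = "B"
            elif action == "Left":
                location = "A"
            score += sum(1 for s in world.values() if s == "Clean")
        return score

    print(simulate(reflex_vacuum_agent))   # close to the 2000-point maximum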
Omniscience, learning, and autonomy

We need to be careful to distinguish between rationality and omniscience. An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality. Consider the following example: I am walking along the Champs Elysées one day and I see an old friend across the street. There is no traffic nearby and I am not otherwise engaged, so, being rational, I start to cross the street. Meanwhile, at 33,000 feet, a cargo door falls off a passing airliner, and before I make it to the other side of the street I am flattened. Was I irrational to cross the street? It is unlikely that my obituary would read "Idiot attempts to cross street." (See N. Henderson, "New door latches urged for Boeing 747 jumbo jets," Washington Post, August 24, 1989.)

This example shows that rationality is not the same as perfection. Rationality maximizes expected performance, while perfection maximizes actual performance. Retreating from a requirement of perfection is not just a question of being fair to agents. The point is that if we expect an agent to do what turns out to be the best action after the fact, it will be impossible to design an agent to fulfill this specification - unless we improve the performance of crystal balls or time machines.

Our definition of rationality does not require omniscience, then, because the rational choice depends only on the percept sequence to date. We must also ensure that we haven't inadvertently allowed the agent to engage in decidedly underintelligent activities. For example, if an agent does not look both ways before crossing a busy road, then its percept sequence will not tell it that there is a large truck approaching at high speed. Does our definition of rationality say that it is now OK to cross the road? Far from it! First, it would not be rational to cross the road given this uninformative percept sequence: the risk of accident from crossing without looking is too great. Second, a rational agent should choose the "looking" action before stepping into the street, because looking helps maximize the expected performance. Doing actions in order to modify future percepts - sometimes called information gathering - is an important part of rationality and is covered in depth in Chapter 16. A second example of information gathering is provided by the exploration that must be undertaken by a vacuum-cleaning agent in an initially unknown environment.

Our definition requires a rational agent not only to gather information but also to learn as much as possible from what it perceives. The agent's initial configuration could reflect some prior knowledge of the environment, but as the agent gains experience this may be modified and augmented. There are extreme cases in which the environment is completely known a priori. In such cases, the agent need not perceive or learn; it simply acts correctly. Of course, such agents are very fragile. Consider the lowly dung beetle. After digging its nest and laying its eggs, it fetches a ball of dung from a nearby heap to plug the entrance. If the ball of dung is removed from its grasp en route, the beetle continues on and pantomimes plugging the nest with the nonexistent dung ball, never noticing that it is missing. Evolution has built an assumption into the beetle's behavior, and when it is violated, unsuccessful behavior results. Slightly more intelligent is the sphex wasp. The female sphex will dig a burrow, go out and sting a caterpillar and drag it to the burrow, enter the burrow again to check that all is well, drag the caterpillar inside, and lay its eggs. The caterpillar serves as a food source when the eggs hatch. But if an entomologist moves the caterpillar a few inches away while the sphex is doing its check, it will revert to the "drag" step of its plan and will continue the plan without modification, even after dozens of caterpillar-moving interventions. The sphex is unable to learn that its innate plan is failing, and thus will not change it.

Successful agents split the task of computing the agent function into three different periods: when the agent is being designed, some of the computation is done by its designers; when it is deliberating on its next action, the agent does more computation; and as it learns from experience, it does even more computation to decide how to modify its behavior.

To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts, we say that the agent lacks autonomy. A rational agent should be autonomous: it should learn what it can to compensate for partial or incorrect prior knowledge. For example, a vacuum-cleaning agent that learns to foresee where and when additional dirt will appear will do better than one that does not. As a practical matter, one seldom requires complete autonomy from the start: when the agent has had little or no experience, it would have to act randomly unless the designer gave some assistance. So, just as evolution provides animals with enough built-in reflexes to survive long enough to learn for themselves, it would be reasonable to provide an artificial intelligent agent with some initial knowledge as well as an ability to learn. After sufficient experience of its environment, the behavior of a rational agent can become effectively independent of its prior knowledge. Hence, the incorporation of learning allows one to design a single rational agent that will succeed in a vast variety of environments.
2.3 THE NATURE OF ENVIRONMENTS

Now that we have a definition of rationality, we are almost ready to think about building rational agents. First, however, we must think about task environments, which are essentially the "problems" to which rational agents are the "solutions." We begin by showing how to specify a task environment, illustrating the process with a number of examples. We then show that task environments come in a variety of flavors. The flavor of the task environment directly affects the appropriate design for the agent program.

Specifying the task environment

In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify the performance measure, the environment, and the agent's actuators and sensors. We group all these together under the heading of the task environment. For the acronymically minded, we call this the PEAS (Performance, Environment, Actuators, Sensors) description. In designing an agent, the first step must always be to specify the task environment as fully as possible.

The vacuum world was a simple example; let us consider a more complex problem: an automated taxi driver. We will use this example throughout the rest of the chapter. We should point out, before the reader becomes alarmed, that a fully automated taxi is currently somewhat beyond the capabilities of existing technology. (See page 27 for a description of an existing driving robot, or look at recent proceedings of the conferences on Intelligent Transportation Systems.) The full driving task is extremely open-ended. There is no limit to the novel combinations of circumstances that can arise - another reason we chose it as a focus for discussion. Figure 2.4 summarizes the PEAS description for the taxi's task environment. We discuss each element in more detail in the following paragraphs.

Figure 2.4 PEAS description of the task environment for an automated taxi.

    Agent Type:           Taxi driver
    Performance Measure:  Safe, fast, legal, comfortable trip, maximize profits
    Environment:          Roads, other traffic, pedestrians, customers
    Actuators:            Steering, accelerator, brake, signal, horn, display
    Sensors:              Cameras, sonar, speedometer, GPS, odometer, accelerometer,
                          engine sensors, keyboard

First, what is the performance measure to which we would like our automated driver to aspire? Desirable qualities include getting to the correct destination; minimizing fuel consumption and wear and tear; minimizing the trip time and/or cost; minimizing violations of traffic laws and disturbances to other drivers; maximizing safety and passenger comfort; maximizing profits. Obviously, some of these goals conflict, so there will be tradeoffs involved.

Next, what is the driving environment that the taxi will face? Any taxi driver must deal with a variety of roads, ranging from rural lanes and urban alleys to 12-lane freeways. The roads contain other traffic, pedestrians, stray animals, road works, police cars, puddles, and potholes. The taxi must also interact with potential and actual passengers. There are also some optional choices. The taxi might need to operate in Southern California, where snow is seldom a problem, or in Alaska, where it seldom is not. It could always be driving on the right, or we might want it to be flexible enough to drive on the left when in Britain or Japan. Obviously, the more restricted the environment, the easier the design problem.
The actuators available to an automated taxi will be more or less the same as those available to a human driver: control over the engine through the accelerator and control over steering and braking. In addition, it will need output to a display screen or voice synthesizer to talk back to the passengers, and perhaps some way to communicate with other vehicles, politely or otherwise.

To achieve its goals in the driving environment, the taxi will need to know where it is, what else is on the road, and how fast it is going. Its basic sensors should therefore include one or more controllable TV cameras, the speedometer, and the odometer. To control the vehicle properly, especially on curves, it should have an accelerometer; it will also need to know the mechanical state of the vehicle, so it will need the usual array of engine and electrical system sensors. It might have instruments that are not available to the average human driver: a satellite global positioning system (GPS) to give it accurate position information with respect to an electronic map, and infrared or sonar sensors to detect distances to other cars and obstacles. Finally, it will need a keyboard or microphone for the passenger to request a destination.
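A PEAS description is easy to carry around as data. The following sketch mirrors Figure 2.4 (the class and field names are illustrative, not from the book's repository):

    from dataclasses import dataclass

    @dataclass
    class PEAS:
        performance: list
        environment: list
        actuators: list
        sensors: list

    taxi = PEAS(
        performance=["safe", "fast", "legal", "comfortable trip", "maximize profits"],
        environment=["roads", "other traffic", "pedestrians", "customers"],
        actuators=["steering", "accelerator", "brake", "signal", "horn", "display"],
        sensors=["cameras", "sonar", "speedometer", "GPS", "odometer",
                 "accelerometer", "engine sensors", "keyboard"],
    )
    print(taxi.sensors)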
In Figure 2.5, we have sketched the basic PEAS elements for a number of additional agent types. Further examples appear in Exercise 2.5. It may come as a surprise to some readers that we include in our list of agent types some programs that operate in the entirely artificial environment defined by keyboard input and character output on a screen. "Surely," one might say, "this is not a real environment, is it?" In fact, what matters is not the distinction between "real" and "artificial" environments, but the complexity of the relationship among the behavior of the agent, the percept sequence generated by the environment, and the performance measure. Some "real" environments are actually quite simple. For example, a robot designed to inspect parts as they come by on a conveyor belt can make use of a number of simplifying assumptions: that the lighting is always just so, that the only thing on the conveyor belt will be parts of a kind that it knows about, and that there are only two actions (accept or reject).

Figure 2.5 Examples of agent types and their PEAS descriptions.

    Medical diagnosis system
        Performance: healthy patient, minimize costs, lawsuits
        Environment: patient, hospital, staff
        Actuators: display of questions, tests, diagnoses, treatments, referrals
        Sensors: keyboard entry of symptoms, findings, patient's answers

    Satellite image analysis system
        Performance: correct image categorization
        Environment: downlink from orbiting satellite
        Actuators: display of scene categorization
        Sensors: color pixel arrays

    Part-picking robot
        Performance: percentage of parts in correct bins
        Environment: conveyor belt with parts; bins
        Actuators: jointed arm and hand
        Sensors: camera, joint angle sensors

    Refinery controller
        Performance: purity, yield, safety
        Environment: refinery, operators
        Actuators: valves, pumps, heaters, displays
        Sensors: temperature, pressure, chemical sensors

    Interactive English tutor
        Performance: student's score on test
        Environment: set of students, testing agency
        Actuators: display of exercises, suggestions, corrections
        Sensors: keyboard entry

In contrast, some software agents (or software robots, or softbots) exist in rich, unlimited domains. Imagine a softbot designed to fly a flight simulator for a large commercial airplane. The simulator is a very detailed, complex environment including other aircraft and ground operations, and the software agent must choose from a wide variety of actions in real time. Or imagine a softbot designed to scan Internet news sources and show the interesting items to its customers. To do well, it will need some natural language processing abilities, it will need to learn what each customer is interested in, and it will need to change its plans dynamically - for example, when the connection for one news source goes down or when a new one comes online. The Internet is an environment whose complexity rivals that of the physical world and whose inhabitants include many artificial agents.

Properties of task environments

The range of task environments that might arise in AI is obviously vast. We can, however, identify a fairly small number of dimensions along which task environments can be categorized. These dimensions determine, to a large extent, the appropriate agent design and the applicability of each of the principal families of techniques for agent implementation. First, we list the dimensions; then we analyze several task environments to illustrate the ideas. The definitions here are informal; later chapters provide more precise statements and examples of each kind of environment.

Fully observable vs. partially observable. If an agent's sensors give it access to the complete state of the environment at each point in time, then we say that the task environment is fully observable. A task environment is effectively fully observable if the sensors detect all aspects that are relevant to the choice of action; relevance, in turn, depends on the performance measure. Fully observable environments are convenient because the agent need not maintain any internal state to keep track of the world. An environment might be partially observable because of noisy and inaccurate sensors or because parts of the state are simply missing from the sensor data - for example, a vacuum agent with only a local dirt sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see what other drivers are thinking.

Deterministic vs. stochastic. If the next state of the environment is completely determined by the current state and the action executed by the agent, then we say the environment is deterministic; otherwise, it is stochastic. In principle, an agent need not worry about uncertainty in a fully observable, deterministic environment. If the environment is partially observable, however, then it could appear to be stochastic. This is particularly true if the environment is complex, making it hard to keep track of all the unobserved aspects. Thus, it is often better to think of an environment as deterministic or stochastic from the point of view of the agent. Taxi driving is clearly stochastic in this sense, because one can never predict the behavior of traffic exactly; moreover, one's tires blow out and one's engine seizes up without warning. The vacuum world as we described it is deterministic, but variations can include stochastic elements such as randomly appearing dirt and an unreliable suction mechanism (Exercise 2.12). If the environment is deterministic except for the actions of other agents, we say that the environment is strategic.

Episodic vs. sequential. In an episodic task environment, the agent's experience is divided into atomic episodes. Each episode consists of the agent perceiving and then performing a single action. Crucially, the next episode does not depend on the actions taken in previous episodes. In episodic environments, the choice of action in each episode depends only on the episode itself.
Many classification tasks are episodic. For example, an agent that has to spot defective parts on an assembly line bases each decision on the current part, regardless of previous decisions; moreover, the current decision does not affect whether the next part is defective. In sequential environments, on the other hand, the current decision could affect all future decisions. Chess and taxi driving are sequential: in both cases, short-term actions can have long-term consequences. Episodic environments are much simpler than sequential environments because the agent does not need to think ahead. (A note on terminology: the first edition of this book used the terms accessible and inaccessible instead of fully and partially observable, nondeterministic instead of stochastic, and nonepisodic instead of sequential. The new terminology is more consistent with established usage. Also, the word "sequential" is used in computer science as the antonym of "parallel"; the two meanings are largely unrelated.)

Static vs. dynamic. If the environment can change while an agent is deliberating, then we say the environment is dynamic for that agent; otherwise, it is static. Static environments are easy to deal with because the agent need not keep looking at the world while it is deciding on an action, nor need it worry about the passage of time. Dynamic environments, on the other hand, are continuously asking the agent what it wants to do; if it hasn't decided yet, that counts as deciding to do nothing. If the environment itself does not change with the passage of time but the agent's performance score does, then we say the environment is semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving while the driving algorithm dithers about what to do next. Chess, when played with a clock, is semidynamic. Crossword puzzles are static.

Discrete vs. continuous. The discrete/continuous distinction can be applied to the state of the environment, to the way time is handled, and to the percepts and actions of the agent. For example, a discrete-state environment such as a chess game has a finite number of distinct states. Chess also has a discrete set of percepts and actions. Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time. Taxi-driving actions are also continuous (steering angles, etc.). Input from digital cameras is discrete, strictly speaking, but is typically treated as representing continuously varying intensities and locations.

Single agent vs. multiagent. The distinction between single-agent and multiagent environments may seem simple enough. For example, an agent solving a crossword puzzle by itself is clearly in a single-agent environment, whereas an agent playing chess is in a two-agent environment. There are, however, some subtle issues. First, we have described how an entity may be viewed as an agent, but we have not explained which entities must be viewed as agents. Does an agent A (the taxi driver, for example) have to treat an object B (another vehicle) as an agent, or can it be treated merely as a stochastically behaving object, analogous to waves at the beach or leaves blowing in the wind? The key distinction is whether B's behavior is best described as maximizing a performance measure whose value depends on agent A's behavior.
For example, in chess, the opponent entity B is trying to maximize its performance measure, which, by the rules of chess, minimizes agent A's performance measure. Thus, chess is a competitive multiagent environment. In the taxi-driving environment, on the other hand, avoiding collisions maximizes the performance measure of all agents, so it is a partially cooperative multiagent environment. It is also partially competitive because, for example, only one car can occupy a parking space. The agent-design problems arising in multiagent environments are often quite different from those in single-agent environments; for example, communication often emerges as a rational behavior in multiagent environments, and in some partially observable competitive environments, stochastic behavior is rational because it avoids the pitfalls of predictability.

As one might expect, the hardest case is partially observable, stochastic, sequential, dynamic, continuous, and multiagent. It also turns out that most real situations are so complex that whether they are really deterministic is a moot point. For practical purposes, they must be treated as stochastic. Taxi driving is hard in all these senses.

Figure 2.6 Examples of task environments and their characteristics.

    Task Environment           Observable  Deterministic   Episodic    Static    Discrete    Agents
    Crossword puzzle           Fully       Deterministic   Sequential  Static    Discrete    Single
    Chess with a clock         Fully       Strategic       Sequential  Semi      Discrete    Multi
    Poker                      Partially   Stochastic      Sequential  Static    Discrete    Multi
    Backgammon                 Fully       Stochastic      Sequential  Static    Discrete    Multi
    Taxi driving               Partially   Stochastic      Sequential  Dynamic   Continuous  Multi
    Medical diagnosis          Partially   Stochastic      Sequential  Dynamic   Continuous  Single
    Image analysis             Fully       Deterministic   Episodic    Semi      Continuous  Single
    Part-picking robot         Partially   Stochastic      Episodic    Dynamic   Continuous  Single
    Refinery controller        Partially   Stochastic      Sequential  Dynamic   Continuous  Single
    Interactive English tutor  Partially   Stochastic      Sequential  Dynamic   Discrete    Multi

Figure 2.6 lists the properties of a number of familiar environments. Note that the answers are not always cut and dried. For example, we have listed chess as fully observable; strictly speaking, this is false because certain rules about castling, en passant capture, and draws by repetition require remembering some facts about the game history that are not observable as part of the board state. These exceptions to observability are of course minor compared to those faced by the taxi driver, the English tutor, or the medical diagnosis system.

Some other answers in the table depend on how the task environment is defined. We have listed the medical-diagnosis task as single-agent because the disease process in a patient is not profitably modeled as an agent; but a medical-diagnosis system might also have to deal with recalcitrant patients and skeptical staff, so the environment could have a multiagent aspect. Furthermore, medical diagnosis is episodic if one conceives of the task as selecting a diagnosis given a list of symptoms; the problem is sequential if the task can include proposing a series of tests, evaluating progress over the course of treatment, and so on. Also, many environments are episodic at higher levels than the agent's individual actions. For example, a chess tournament consists of a sequence of games; each game is an episode because (by and large) the contribution of the moves in one game to the agent's overall performance is not affected by the moves in its previous game. On the other hand, decision making within a single game is certainly sequential.
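The six dimensions are easy to encode as data. The following sketch mirrors two rows of Figure 2.6 (the class and field names are illustrative):

    from dataclasses import dataclass

    @dataclass
    class TaskEnvironmentProperties:
        observable: str      # "fully" or "partially"
        deterministic: str   # "deterministic", "strategic", or "stochastic"
        episodic: str        # "episodic" or "sequential"
        static: str          # "static", "semidynamic", or "dynamic"
        discrete: str        # "discrete" or "continuous"
        agents: str          # "single" or "multi"

    crossword = TaskEnvironmentProperties(
        "fully", "deterministic", "sequential", "static", "discrete", "single")
    taxi_driving = TaskEnvironmentProperties(
        "partially", "stochastic", "sequential", "dynamic", "continuous", "multi")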
The code repository associated with this book (aima.cs.berkeley.edu) includes implementations of a number of environments, together with a general-purpose environment simulator that places one or more agents in a simulated environment, observes their behavior over time, and evaluates them according to a given performance measure. Such experiments are often carried out not for a single environment but for many environments drawn from an environment class. For example, to evaluate a taxi driver in simulated traffic, we would want to run many simulations with different traffic, lighting, and weather conditions. If we designed the agent for a single scenario, we might be able to take advantage of specific properties of the particular case but might not identify a good design for driving in general. For this reason, the code repository also includes an environment generator for each environment class that selects particular environments (with certain likelihoods) in which to run the agent. For example, the vacuum environment generator initializes the dirt pattern and agent location randomly. We are then interested in the agent's average performance over the environment class. A rational agent for a given environment class maximizes this average performance. Exercises 2.7 to 2.12 take you through the process of developing an environment class and evaluating various agents therein.
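In outline, such a simulator is a short loop. The sketch below shows the general shape only; the interfaces (percept, execute, performance) are assumptions made for illustration, and the actual implementations live in the repository mentioned above:

    def run(agent_program, environment, steps):
        # Run one agent in one environment and return its score.
        score = 0
        for _ in range(steps):
            percept = environment.percept()
            action = agent_program(percept)
            environment.execute(action)
            score += environment.performance()
        return score

    def average_performance(make_agent, environment_generator, trials=100, steps=1000):
        # Average over many environments drawn from an environment class.
        total = sum(run(make_agent(), environment_generator(), steps)
                    for _ in range(trials))
        return total / trials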
2.4 THE STRUCTURE OF AGENTS

So far we have talked about agents by describing behavior - the action that is performed after any given sequence of percepts. Now we will have to bite the bullet and talk about how the insides work. The job of AI is to design the agent program that implements the agent function mapping percepts to actions. We assume this program will run on some sort of computing device with physical sensors and actuators, which we call the architecture:

    agent = architecture + program

Obviously, the program we choose has to be one that is appropriate for the architecture. If the program is going to recommend actions like Walk, the architecture had better have legs. The architecture might be just an ordinary PC, or it might be a robotic car with several onboard computers, cameras, and other sensors. In general, the architecture makes the percepts from the sensors available to the program, runs the program, and feeds the program's action choices to the actuators as they are generated. Most of this book is about designing agent programs, although Chapters 24 and 25 deal directly with the sensors and actuators.

Agent programs

The agent programs that we design in this book all have the same skeleton: they take the current percept as input from the sensors and return an action to the actuators. (There are other choices for the agent program skeleton; for example, we could have the agent programs be coroutines that run asynchronously with the environment. Each such coroutine has an input and an output port and consists of a loop that reads the input port for percepts and writes actions to the output port.) Notice the difference between the agent program, which takes the current percept as input, and the agent function, which takes the entire percept history. The agent program takes just the current percept as input because nothing more is available from the environment; if the agent's actions depend on the entire percept sequence, the agent will have to remember the percepts.

We describe the agent programs via the simple pseudocode language defined in Appendix B. (The online code repository contains implementations in real programming languages.) For example, Figure 2.7 shows a rather trivial agent program that keeps track of the percept sequence and then uses it to index into a table of actions to decide what to do. The table represents explicitly the agent function that the agent program embodies. To build a rational agent in this way, we as designers must construct a table that contains the appropriate action for every possible percept sequence.

    function TABLE-DRIVEN-AGENT(percept) returns an action
        static: percepts, a sequence, initially empty
                table, a table of actions, indexed by percept sequences, initially fully specified
        append percept to the end of percepts
        action <- LOOKUP(percepts, table)
        return action

Figure 2.7 The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an action each time. It keeps track of the percept sequence using its own private data structure.

It is instructive to consider why the table-driven approach to agent construction is doomed to failure. Let P be the set of possible percepts and let T be the lifetime of the agent (the total number of percepts it will receive). The lookup table will contain Σ_{t=1}^{T} |P|^t entries. Consider the automated taxi: the visual input from a single camera comes in at the rate of roughly 27 megabytes per second (30 frames per second, 640 x 480 pixels with 24 bits of color information). This gives a lookup table with over 10^250,000,000,000 entries for an hour's driving. Even the lookup table for chess - a tiny, well-behaved fragment of the real world - would have at least 10^150 entries. The daunting size of these tables (the number of atoms in the observable universe is less than 10^80) means that (a) no physical agent in this universe will have the space to store the table, (b) the designer would not have time to create the table, (c) no agent could ever learn all the right table entries from its experience, and (d) even if the environment is simple enough to yield a feasible table size, the designer still has no guidance about how to fill in the table entries.

Despite all this, TABLE-DRIVEN-AGENT does do what we want: it implements the desired agent function. The key challenge for AI is to find out how to write programs that, to the extent possible, produce rational behavior from a small amount of code rather than from a large number of table entries. We have many examples showing that this can be done successfully in other areas: for example, the huge tables of square roots used by engineers and schoolchildren prior to the 1970s have now been replaced by a five-line program for Newton's method running on electronic calculators. The question is, can AI do for general intelligent behavior what Newton did for square roots? We believe the answer is yes.
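For concreteness, here is a direct Python rendering of the Figure 2.7 skeleton. This is a sketch only; the closure-based style and names are illustrative rather than the repository's code:

    def make_table_driven_agent(table):
        percepts = []                          # private percept-sequence record
        def program(percept):
            percepts.append(percept)
            return table.get(tuple(percepts))  # LOOKUP over the whole history
        return program

    # Usage with the partial vacuum-world table of Figure 2.3:
    table = {(("A", "Clean"),): "Right",
             (("A", "Dirty"),): "Suck",
             (("B", "Clean"),): "Left",
             (("B", "Dirty"),): "Suck"}
    agent = make_table_driven_agent(table)
    print(agent(("A", "Dirty")))   # Suck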
In the remainder of this section, we outline four basic kinds of agent program that embody the principles underlying almost all intelligent systems:

- Simple reflex agents;
- Model-based reflex agents;
- Goal-based agents; and
- Utility-based agents.

We then explain in general terms how to convert all of these into learning agents.

Simple reflex agents

The simplest kind of agent is the simple reflex agent. These agents select actions on the basis of the current percept, ignoring the rest of the percept history. For example, the vacuum agent whose agent function is tabulated in Figure 2.3 is a simple reflex agent, because its decision is based only on the current location and on whether that location contains dirt. An agent program for this agent is shown in Figure 2.8.

    function REFLEX-VACUUM-AGENT([location, status]) returns an action
        if status = Dirty then return Suck
        else if location = A then return Right
        else if location = B then return Left

Figure 2.8 The agent program for a simple reflex agent in the two-state vacuum environment. This program implements the agent function tabulated in Figure 2.3.

Notice that the vacuum agent program is very small indeed compared to the corresponding table. The most obvious reduction comes from ignoring the percept history, which cuts down the number of possibilities from 4^T to just 4. A further, small reduction comes from the fact that, when the current square is dirty, the action does not depend on the location.

Imagine yourself as the driver of the automated taxi. If the car in front brakes and its brake lights come on, then you should notice this and initiate braking. In other words, some processing is done on the visual input to establish the condition we call "The car in front is braking." Then this triggers some established connection in the agent program to the action "initiate braking." We call such a connection a condition-action rule, written as

    if car-in-front-is-braking then initiate-braking.

(Condition-action rules are also called situation-action rules, productions, or if-then rules.) Humans also have many such connections, some of which are learned responses (as for driving) and some of which are innate reflexes (such as blinking when something approaches the eye). In the course of the book, we will see several different ways in which such connections can be learned and implemented.

The program in Figure 2.8 is specific to one particular vacuum environment. A more general and flexible approach is first to build a general-purpose interpreter for condition-action rules and then to create rule sets for specific task environments. Figure 2.9 gives the structure of this general program in schematic form, showing how the condition-action rules allow the agent to make the connection from percept to action. (Do not worry if this seems trivial; it gets more interesting shortly.) In such diagrams, we use rectangles to denote the current internal state of the agent's decision process and ovals to represent the background information used in the process. The agent program, which is also very simple, is shown in Figure 2.10.

Figure 2.9 Schematic diagram of a simple reflex agent.

    function SIMPLE-REFLEX-AGENT(percept) returns an action
        static: rules, a set of condition-action rules
        state <- INTERPRET-INPUT(percept)
        rule <- RULE-MATCH(state, rules)
        action <- RULE-ACTION[rule]
        return action

Figure 2.10 A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept.
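The same skeleton is easy to express in Python. The sketch below is illustrative only: condition-action rules are represented as (condition, action) pairs, and the percept structure for the braking example is an assumption made for the example.

    def make_simple_reflex_agent(rules, interpret_input):
        def program(percept):
            state = interpret_input(percept)
            for condition, action in rules:      # RULE-MATCH: first rule that fires
                if condition(state):
                    return action
            return "NoOp"
        return program

    # The braking rule from the text, with an assumed percept structure:
    rules = [(lambda s: s.get("car_in_front_is_braking", False), "initiate-braking")]
    driver = make_simple_reflex_agent(rules, interpret_input=lambda p: p)
    print(driver({"car_in_front_is_braking": True}))   # initiate-braking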
The INTERPRET-INPUT function generates an abstracted description of the current state from the percept, and the RULE-MATCH function returns the first rule in the set of rules that matches the given state description. Note that the description in terms of "rules" and "matching" is purely conceptual; actual implementations can be as simple as a collection of logic gates implementing a Boolean circuit.

Simple reflex agents have the admirable property of being simple, but they turn out to be of very limited intelligence. The agent in Figure 2.10 will work only if the correct decision can be made on the basis of the current percept alone - that is, only if the environment is fully observable. Even a little bit of unobservability can cause serious trouble. For example, the braking rule given earlier assumes that the condition car-in-front-is-braking can be determined from the current percept - the current video image - if the car in front has a centrally mounted brake light. Unfortunately, older models have different configurations of taillights, brake lights, and turn-signal lights, and it is not always possible to tell from a single image whether the car is braking. A simple reflex agent driving behind such a car would either brake continuously and unnecessarily or, worse, never brake at all.

We can see a similar problem arising in the vacuum world. Suppose that a simple reflex vacuum agent is deprived of its location sensor and has only a dirt sensor. Such an agent has just two possible percepts: Dirty and Clean. It can Suck in response to Dirty; what should it do in response to Clean? Moving Left fails (forever) if it happens to start in square A, and moving Right fails (forever) if it happens to start in square B. Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments.

Escape from infinite loops is possible if the agent can randomize its actions. For example, if the vacuum agent perceives Clean, it might flip a coin to choose between Left and Right. It is easy to show that the agent will reach the other square in an average of two steps. Then, if that square is dirty, it will clean it and the cleaning task will be complete. Hence, a randomized simple reflex agent might outperform a deterministic simple reflex agent. We mentioned in Section 2.3 that randomized behavior of the right kind can be rational in some multiagent environments. In single-agent environments, randomization is usually not rational. It is a useful trick that helps a simple reflex agent in some situations, but in most cases we can do much better with more sophisticated deterministic agents.
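As a sketch of the coin-flipping escape just described (illustrative only, for the location-blind vacuum agent):

    import random

    def randomized_reflex_vacuum_agent(status):
        # No location sensor: the only percepts are "Dirty" and "Clean".
        if status == "Dirty":
            return "Suck"
        # Flip a coin; the other square is reached in two steps on average.
        return random.choice(["Left", "Right"])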
Model-based reflex agents

The most effective way to handle partial observability is for the agent to keep track of the part of the world it cannot see now. That is, the agent should maintain some sort of internal state that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current state. For the braking problem, the internal state is not too extensive - just the previous frame from the camera, allowing the agent to detect when two red lights at the edge of the vehicle go on or off simultaneously. For other driving tasks such as changing lanes, the agent needs to keep track of where the other cars are if it cannot see them all at once.

Updating this internal state information as time goes by requires two kinds of knowledge to be encoded in the agent program. First, we need some information about how the world evolves independently of the agent - for example, that an overtaking car generally will be closer behind than it was a moment ago. Second, we need some information about how the agent's own actions affect the world - for example, that when the agent turns the steering wheel clockwise, the car turns to the right, or that after driving for five minutes northbound on the freeway one is usually about five miles north of where one was five minutes ago. This knowledge about "how the world works" - whether implemented in simple Boolean circuits or in complete scientific theories - is called a model of the world. An agent that uses such a model is called a model-based agent.

Figure 2.11 gives the structure of the reflex agent with internal state, showing how the current percept is combined with the old internal state to generate the updated description of the current state. The agent program is shown in Figure 2.12.

Figure 2.11 A model-based reflex agent.

    function REFLEX-AGENT-WITH-STATE(percept) returns an action
        static: state, a description of the current world state
                rules, a set of condition-action rules
                action, the most recent action, initially none
        state <- UPDATE-STATE(state, action, percept)
        rule <- RULE-MATCH(state, rules)
        action <- RULE-ACTION[rule]
        return action

Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world using an internal model. It then chooses an action in the same way as the reflex agent.

The interesting part is the function UPDATE-STATE, which is responsible for creating the new internal state description. As well as interpreting the new percept in the light of existing knowledge about the state, it uses information about how the world evolves to keep track of the unseen parts of the world, and it must also know about what the agent's actions do to the state of the world. Detailed examples appear in Chapters 10 and 17.
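A Python sketch of the Figure 2.12 skeleton (illustrative; update_state stands in for the agent's model of how the world evolves and of what its own actions do):

    def make_model_based_reflex_agent(rules, update_state, initial_state):
        state = initial_state
        last_action = None
        def program(percept):
            nonlocal state, last_action
            # Fold the new percept and the most recent action into the model.
            state = update_state(state, last_action, percept)
            for condition, action in rules:
                if condition(state):
                    last_action = action
                    return action
            last_action = "NoOp"
            return last_action
        return program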
Goal-based agents

Knowing about the current state of the environment is not always enough to decide what to do. For example, at a road junction, the taxi can turn left, turn right, or go straight on. The correct decision depends on where the taxi is trying to get to. In other words, as well as a current state description, the agent needs some sort of goal information that describes situations that are desirable - for example, being at the passenger's destination. The agent program can combine this with information about the results of possible actions (the same information as was used to update internal state in the reflex agent) in order to choose actions that achieve the goal. Figure 2.13 shows the goal-based agent's structure.

Figure 2.13 A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its goals.

Sometimes goal-based action selection is straightforward, when goal satisfaction results immediately from a single action. Sometimes it will be more tricky, when the agent has to consider long sequences of twists and turns to find a way to achieve the goal. Search (Chapters 3 to 6) and planning (Chapters 11 and 12) are the subfields of AI devoted to finding action sequences that achieve the agent's goals.

Notice that decision making of this kind is fundamentally different from the condition-action rules described earlier, in that it involves consideration of the future - both "What will happen if I do such-and-such?" and "Will that make me happy?" In the reflex agent designs, this information is not explicitly represented, because the built-in rules map directly from percepts to actions. The reflex agent brakes when it sees brake lights. A goal-based agent, in principle, could reason that if the car in front has its brake lights on, it will slow down; given the way the world usually evolves, the only action that will achieve the goal of not hitting other cars is to brake.

Although the goal-based agent appears less efficient, it is more flexible because the knowledge that supports its decisions is represented explicitly and can be modified. If it starts to rain, the agent can update its knowledge of how effectively its brakes will operate; this will automatically cause all of the relevant behaviors to be altered to suit the new conditions. For the reflex agent, on the other hand, we would have to rewrite many condition-action rules. The goal-based agent's behavior can also easily be changed to go to a different destination; the reflex agent's rules for when to turn and when to go straight will work only for a single destination, and they must all be replaced to go somewhere new.
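In its simplest one-step form, goal-based selection looks like the following sketch (the names are illustrative; in general the agent must search or plan over whole action sequences, as discussed above):

    def goal_based_choice(state, actions, predict, goal_test):
        # Consider each action's predicted outcome and pick one that reaches the goal.
        for action in actions:
            if goal_test(predict(state, action)):
                return action
        return None   # no single action suffices; search or planning is needed

    # Toy usage: the state is a position on a line, the goal is position 5.
    predict = lambda state, action: state + action
    goal_test = lambda state: state == 5
    print(goal_based_choice(3, [-1, 1, 2], predict, goal_test))   # 2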
Utility-based agents

Goals alone are not really enough to generate high-quality behavior in most environments. For example, there are many action sequences that will get the taxi to its destination (thereby achieving the goal), but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a crude binary distinction between "happy" and "unhappy" states, whereas a more general performance measure should allow a comparison of different world states according to exactly how happy they would make the agent if they could be achieved. Because "happy" does not sound very scientific, the customary terminology is to say that if one world state is preferred to another, then it has higher utility for the agent. (The word "utility" here refers to "the quality of being useful," not to the electric company or the water works.)

A utility function maps a state (or a sequence of states) onto a real number, which describes the associated degree of happiness. A complete specification of the utility function allows rational decisions in two kinds of cases where goals are inadequate. First, when there are conflicting goals, only some of which can be achieved (for example, speed and safety), the utility function specifies the appropriate tradeoff. Second, when there are several goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed against the importance of the goals.

In Chapter 16, we will show that any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize. An agent that possesses an explicit utility function therefore can make rational decisions, and it can do so via a general-purpose algorithm that does not depend on the specific utility function being maximized. In this way, the "global" definition of rationality - designating as rational those agent functions that have the highest performance - is turned into a "local" constraint on rational-agent designs that can be expressed in a simple program. The utility-based agent structure appears in Figure 2.14. Utility-based agent programs appear in Part V, where we design decision-making agents that must handle the uncertainty inherent in partially observable environments.

Learning agents

We have described agent programs with various methods for selecting actions. We have not, so far, explained how the agent programs come into being. In his famous early paper, Turing (1950) considers the idea of actually programming his intelligent machines by hand. He estimates how much work this might take and concludes, "Some more expeditious method seems desirable." The method he proposes is to build learning machines and then to teach them. In many areas of AI, this is now the preferred method for creating state-of-the-art systems. Learning has another advantage, as we noted earlier: it allows the agent to operate in initially unknown environments and to become more competent than its initial knowledge alone might allow. In this section, we briefly introduce the main ideas of learning agents. In almost every chapter of the book, we will comment on opportunities and methods for learning in particular kinds of agents. Part VI goes into much more depth on the various learning algorithms themselves.
