Optimization with equality constraints

optimization problems with nonlinear equality constraints and the equality constraints and inequality constraints
libbyConway Profile Pic
Published Date:14-07-2017
Your Website URL(Optional)
i Lecture Notes on Optimization Pravin VaraiyaiiContents 1 INTRODUCTION 1 2 OPTIMIZATION OVER AN OPEN SET 7 3 Optimization with equality constraints 15 4 Linear Programming 27 5 Nonlinear Programming 49 6 Discrete-time optimal control 75 7 Continuous-time linear optimal control 83 8 Coninuous-time optimal control 95 9 Dynamic programing 121 iiiiv CONTENTSPREFACE to this edition Notes on Optimization was published in 1971 as part of the Van Nostrand Reinhold Notes on Sys- tem Sciences, edited by George L. Turin. Our aim was to publish short, accessible treatments of graduate-level material in inexpensive books (the price of a book in the series was about five dol- lars). The effort was successful for several years. Van Nostrand Reinhold was then purchased by a conglomerate which cancelled Notes on System Sciences because it was not sufficiently profitable. Books have since become expensive. However, the World Wide Web has again made it possible to publish cheaply. Notes on Optimization has been out of print for 20 years. However, several people have been using it as a text or as a reference in a course. They have urged me to re-publish it. The idea of making it freely available over the Web was attractive because it reaffirmed the original aim. The only obstacle was to retype the manuscript in LaTex. I thank Kate Klohe for doing just that. I would appreciate knowing if you find any mistakes in the book, or if you have suggestions for (small) changes that would improve it. Berkeley, California P.P. Varaiya September, 1998 vvi CONTENTSPREFACE These Notes were developed for a ten-week course I have taught for the past three years to first-year graduate students of the University of California at Berkeley. My objective has been to present, in a compact and unified manner, the main concepts and techniques of mathematical programming and optimal control to students having diverse technical backgrounds. A reasonable knowledge of advanced calculus (up to the Implicit Function Theorem), linear algebra (linear independence, basis, matrix inverse), and linear differential equations (transition matrix, adjoint solution) is sufficient for the reader to follow the Notes. The treatment of the topics presented here is deep. Although the coverage is not encyclopedic, an understanding of this material should enable the reader to follow much of the recent technical literature on nonlinear programming, (deterministic) optimal control, and mathematical economics. The examples and exercises given in the text form an integral part of the Notes and most readers will need to attend to them before continuing further. To facilitate the use of these Notes as a textbook, I have incurred the cost of some repetition in order to make almost all chapters self-contained. However, Chapter V must be read before Chapter VI, and Chapter VII before Chapter VIII. The selection of topics, as well as their presentation, has been influenced by many of my students and colleagues, who have read and criticized earlier drafts. I would especially like to acknowledge the help of Professors M. Athans, A. Cohen, C.A. Desoer, J-P. Jacob, E. Polak, and Mr. M. Ripper. I also want to thank Mrs. Billie Vrtiak for her marvelous typing in spite of starting from a not terribly legible handwritten manuscript. Finally, I want to thank Professor G.L. Turin for his encouraging and patient editorship. Berkeley, California P.P. Varaiya November, 1971 viiviii CONTENTSChapter 1 INTRODUCTION In this chapter, we present our model of the optimal decision-making problem, illustrate decision- making situations by a few examples, and briefly introduce two more general models which we cannot discuss further in these Notes. 1.1 The Optimal Decision Problem These Notes show how to arrive at an optimal decision assuming that complete information is given. The phrase complete information is given means that the following requirements are met: 1. The set of all permissible decisions is known, and 2. The cost of each decision is known. When these conditions are satisfied, the decisions can be ranked according to whether they incur greater or lesser cost. An optimal decision is then any decision which incurs the least cost among the set of permissible decisions. In order to model a decision-making situation in mathematical terms, certain further requirements must be satisfied, namely, 1. The set of all decisions can be adequately represented as a subset of a vector space with each vector representing a decision, and 2. The cost corresponding to these decisions is given by a real-valued function. Some illustrations will help. Example 1: The Pot Company (Potco) manufacturers a smoking blend called Acapulco Gold. The blend is made up of tobacco and mary-john leaves. For legal reasons the fraction of mary- 1 john in the mixture must satisfy 0 . From extensive market research Potco has determined 2 their expected volume of sales as a function of and the selling pricep. Furthermore, tobacco can be purchased at a fixed price, whereas the cost of mary-john is a function of the amount purchased. If Potco wants to maximize its profits, how much mary-john and tobacco should it purchase, and what pricep should it set? Example 2: Tough University provides “quality” education to undergraduate and graduate stu- dents. In an agreement signed with Tough’s undergraduates and graduates (TUGs), “quality” is 12 CHAPTER 1. INTRODUCTION defined as follows: every year, eachu (undergraduate) must take eight courses, one of which is a seminar and the rest of which are lecture courses, whereas eachg (graduate) must take two seminars and five lecture courses. A seminar cannot have more than 20 students and a lecture course cannot have more than 40 students. The University has a faculty of 1000. The Weary Old Radicals (WORs) have a contract with the University which stipulates that every junior faculty member (there are 750 of these) shall be required to teach six lecture courses and two seminars each year, whereas every senior faculty member (there are 250 of these) shall teach three lecture courses and three seminars each year. The Regents of Touch rate Tough’s President at points peru and points perg “pro- cessed” by the University. Subject to the agreements with the TUGs and WORs how manyu’s and g’s should the President admit to maximize his rating? Example 3: (See Figure 1.1.) An engineer is asked to construct a road (broken line) connection point a to pointb. The current profile of the ground is given by the solid line. The only requirement is that the final road should not have a slope exceeding 0.001. If it costs c per cubic foot to excavate or fill the ground, how should he design the road to meet the specifications at minimum cost? Example 4: Mr. Shell is the manager of an economy which produces one output, wine. There are two factors of production, capital and labor. IfK(t) andL(t) respectively are the capital stock used and the labor employed at timet, then the rate of output of wineW(t) at time is given by the production function W(t)= F(K(t);L(t)) As Manager, Mr. Shell allocates some of the output rateW(t) to the consumption rateC(t),and the remainderI(t) to investment in capital goods. (Obviously,W ,C,I,andK are being measured in a common currency.) Thus,W(t)= C(t)+I(t)= (1−s(t))W(t) wheres(t)= I(t)=W(t) . b . a Figure 1.1: Admissable set of example. 2 0;1 is the fraction of output which is saved and invested. Suppose that the capital stock decays exponentially with time at a rate 0, so that the net rate of growth of capital is given by the following equation: d _ K(t)= K(t) (1.1) dt = −K(t)+s(t)W(t) = −K(t)+s(t)F(K(t);L(t)): The labor force is growing at a constant birth rate of 0. Hence,1.1. THE OPTIMAL DECISION PROBLEM 3 _ L(t)= L(t): (1.2) Suppose that the production function F exhibits constant returns to scale, i.e., F(K;L)= F(K;L) for all 0. If we define the relevant variable in terms of per capita of labor, w = W=L;c =C=L;k =K=l, and if we letf(k)=F(k;l), then we see thatF(K;L)−LF(K=L;1) = Lf(k), whence the consumption per capita of labor becomesc(t)=(l−s(t))f(k(t)). Using these definitions and equations (1.1) and (1.2) it is easy to see thatK(t) satisfies the differential equation (1.3). _ k(t)= s(t)f(k(t))−k(t) (1.3) where=( + ). The first term of the right-hand side in (3) is the increase in the capital-to-labor ratio due to investment whereas the second terms is the decrease due to depreciation and increase in the labor force. Suppose there is a planning horizon timeT , and at time 0 Mr. Shell starts with capital-to-labor R T ratiok . If “welfare” over the planning period 0;T is identified with total consumption c(t)dt, o 0 what should Mr. Shell’s savings policy s(t), 0  t  T, besoastomaximizewelfare? What savings policy maximizes welfare subject to the additional restriction that the capital-to-labor ratio at timeT should be at leastk ? If future consumption is discounted at rate 0 and if time horizon T R 1 − is1, the welfare function becomes e tc(t)dt. What is the optimum policy corresponding to 0 this criterion? These examples illustrate the kinds of decision-making problems which can be formulated math- ematically so as to be amenable to solutions by the theory presented in these Notes.Wemustalways remember that a mathematical formulation is inevitably an abstraction and the gain in precision may have occurred at a great loss of realism. For instance, Example 2 is caricature (see also a faintly re- lated but more more elaborate formulation in Bruno 1970), whereas Example 4 is light-years away from reality. In the latter case, the value of the mathematical exercise is greater the more insensitive are the optimum savings policies with respect to the simplifying assumptions of the mathematical model. (In connection with this example and related models see the critique by Koopmans 1967.) In the examples above, the set of permissible decisions is represented by the set of all points in some vector space which satisfy certain constraints. Thus, in the first example, a permissible 1 decision is any two-dimensional vector ( ;p) satisfying the constraints 0 and 0 2 p. In the second example, any vector (u;g) with u  0;g  0; constrained by the number of faculty and the agreements with the TUGs and WORs is a permissible decision. In the last example, a permissible decision is any real-valued function s(t); 0  t  T , constrained by 0s(t) 1. (It is of mathematical but not conceptual interest to note that in this case a decision is represented by a vector in a function space which is infinite-dimensional.) More concisely then, these Notes are concerned with optimizing (i.e. maximizing or minimizing) a real-valued function over a vector space subject to constraints. The constraints themselves are presented in terms of functional inequalities or equalities.4 CHAPTER 1. INTRODUCTION At this point, it is important to realize that the distinction between the function which is to be optimized and the functions which describe the constraints, although convenient for presenting the mathematical theory, may be quite artificial in practice. For instance, suppose we have to choose the durations of various traffic lights in a section of a city so as to achieve optimum traffic flow. Let us suppose that we know the transportation needs of all the people in this section. Before we can begin to suggest a design, we need a criterion to determine what is meant by “optimum traffic flow.” More abstractly, we need a criterion by which we can compare different decisions, which in this case are different patterns of traffic-light durations. One way of doing this is to assign as cost to each decision the total amount of time taken to make all the trips within this section. An alternative and equally plausible goal may be to minimize the maximum waiting time (that is the total time spent at stop lights) in each trip. Now it may happen that these two objective functions may be inconsistent in the sense that they may give rise to different orderings of the permissible decisions. Indeed, it may be the case that the optimum decision according to the first criterion may be lead to very long waiting times for a few trips, so that this decision is far from optimum according to the second criterion. We can then redefine the problem as minimizing the first cost function (total time for trips) subject to the constraint that the waiting time for any trip is less than some reasonable bound (say one minute). In this way, the second goal (minimum waiting time) has been modified and reintroduced as a constraint. This interchangeability of goal and constraints also appears at a deeper level in much of the mathematical theory. We will see that in most of the results the objective function and the functions describing the constraints are treated in the same manner. 1.2 Some Other Models of Decision Problems Our model of a single decision-maker with complete information can be generalized along two very important directions. In the first place, the hypothesis of complete information can be relaxed by allowing that decision-making occurs in an uncertain environment. In the second place, we can replace the single decision-maker by a group of two or more agents whose collective decision determines the outcome. Since we cannot study these more general models in these Notes,we merely point out here some situations where such models arise naturally and give some references. 1.2.1 Optimization under uncertainty. A person wants to invest 1,000 in the stock market. He wants to maximize his capital gains, and at the same time minimize the risk of losing his money. The two objectives are incompatible, since the stock which is likely to have higher gains is also likely to involve greater risk. The situation is different from our previous examples in that the outcome (future stock prices) is uncertain. It is customary to model this uncertainty stochastically. Thus, the investor may assign probability 0.5 to the event that the price of shares in Glamor company increases by 100, probability 0.25 that the price is unchanged, and probability 0.25 that it drops by 100. A similar model is made for all the other stocks that the investor is willing to consider, and a decision problem can be formulated as follows. How should 1,000 be invested so as to maximize the expected value of the capital gains subject to the constraint that the probability of losing more than 100 is less than 0.1? As another example, consider the design of a controller for a chemical process where the decision variable are temperature, input rates of various chemicals, etc. Usually there are impurities in the chemicals and disturbances in the heating process which may be regarded as additional inputs of a1.2. SOME OTHER MODELS OF DECISION PROBLEMS 5 random nature and modeled as stochastic processes. After this, just as in the case of the portfolio- selection problem, we can formulate a decision problem in such a way as to take into account these random disturbances. If the uncertainties are modelled stochastically as in the example above, then in many cases the techniques presented in these Notes can be usefully applied to the resulting optimal decision problem. To do justice to these decision-making situations, however, it is necessary to give great attention to the various ways in which the uncertainties can be modelled mathematically. We also need to worry about finding equivalent but simpler formulations. For instance, it is of great signif- icance to know that, given appropriate conditions, an optimal decision problem under uncertainty is equivalent to another optimal decision problem under complete information. (This result, known as the Certainty-Equivalence principle in economics has been extended and baptized the Separation Theorem in the control literature. See Wonham 1968.) Unfortunately, to be able to deal with these models, we need a good background in Statistics and Probability Theory besides the material presented in these Notes. We can only refer the reader to the extensive literature on Statistical De- cision Theory (Savage 1954, Blackwell and Girshick 1954) and on Stochastic Optimal Control (Meditch 1969, Kushner 1971). 1.2.2 The case of more than one decision-maker. Agent Alpha is chasing agent Beta. The place is a large circular field. Alpha is driving a fast, heavy car which does not maneuver easily, whereas Beta is riding a motor scooter, slow but with good maneuverability. What should Alpha do to get as close to Beta as possible? What should Beta do to stay out of Alpha’s reach? This situation is fundamentally different from those discussed so far. Here there are two decision-makers with opposing objectives. Each agent does not know what the other is planning to do, yet the effectiveness of his decision depends crucially upon the other’s decision, so that optimality cannot be defined as we did earlier. We need a new concept of rational (optimal) decision-making. Situations such as these have been studied extensively and an elaborate structure, known as the Theory of Games, exists which describes and prescribes behavior in these situations. Although the practical impact of this theory is not great, it has proved to be among the most fruitful sources of unifying analytical concepts in the social sciences, notably economics and political science. The best single source for Game Theory is still Luce and Raiffa 1957, whereas the mathematical content of the theory is concisely displayed in Owen 1968. The control theorist will probably be most interested in Isaacs 1965, and Blaquiere, et al., 1969. The difficulty caused by the lack of knowledge of the actions of the other decision-making agents arises even if all the agents have the same objective, since a particular decision taken by our agent may be better or worse than another decision depending upon the (unknown) decisions taken by the other agents. It is of crucial importance to invent schemes to coordinate the actions of the individual decision-makers in a consistent manner. Although problems involving many decision-makers are present in any system of large size, the number of results available is pitifully small. (See Mesarovic, et al., 1970 and Marschak and Radner 1971.) In the author’s opinion, these problems represent one of the most important and challenging areas of research in decision theory.6 CHAPTER 1. INTRODUCTIONChapter 2 OPTIMIZATION OVER AN OPEN SET In this chapter we study in detail the first example of Chapter 1. We first establish some notation which will be in force throughout these Notes. Then we study our example. This will generalize to a canonical problem, the properties of whose solution are stated as a theorem. Some additional properties are mentioned in the last section. 2.1 Notation 2.1.1 All vectors are column vectors, with two consistent exceptions mentioned in 2.1.3 and 2.1.5 below and some other minor and convenient exceptions in the text. Prime denotes transpose so that if n 0 0 0 x 2 R thenx is the row vectorx = (x ;:::;x ),andx =(x ;:::;x ) . Vectors are normally 1 n 1 n n denoted by lower case letters, the ith component of a vector x 2 R is denoted x , and different i j k vectors denoted by the same symbol are distinguished by superscripts as inx and x . 0 denotes both the zero vector and the real number zero, but no confusion will result. 0 0 0 Thus ifx =(x ;:::;x ) andy =(y ;:::;y ) thenxy = x y +::: +x y as in ordinary 1 n 1 n 1 1 n n p n 0 matrix multiplication. Ifx2R we definejxj=+ xx. 2.1.2 0 0 Ifx=(x ;:::;x ) andy =(y ;:::;y ) thenxy meansx y ,i=1;:::;n: In particular if 1 n 1 n i i n x2R ,thenx 0,ifx  0;i=1;:::;n: i 2.1.3 j Matrices are normally denoted by capital letters. IfA is anmn matrix, thenA denotes the jth j column of A,andA denotes the ith row of A. Note that A is a row vector. A denotes the entry i i i ofA in theith row andjth column; this entry is sometimes also denoted by the lower case letter a , and then we also writeA =fa g. I denotes the identity matrix; its size will be clear from the ij ij context. If confusion is likely, we writeI to denote thenn identity matrix. n 78 CHAPTER 2. OPTIMIZATION OVER AN OPEN SET 2.1.4 n m n Iff :R R is a function, its ith component is writtenf;i=1;:::;m. Note thatf :R R. i i Sometimes we describe a function by specifying a rule to calculate f(x) for everyx. In this case we writef :x7f(x). For example, ifA is anmn matrix, we can writeF :x7Ax to denote n m n the functionf :R R whose value at a pointx2R isAx. 2.1.5 n Iff :R 7R is a differentiable function, the derivative off atx is the row vector ((f=x )(x );:::;(f=x )(x )). 1 n This derivative is denoted by (f=x)(x ) orf (x) orf=xj orf j , and if the argumentx x x=x x x=x 0 is clear from the context it may be dropped. The column vector (f (x)) is also denotedr f(x), x x and is called the gradient of f at x .If f :(x;y) 7 f(x;y) is a differentiable function from n m R R intoR, the partial derivative off with respect tox at the point (x;y ) is then-dimensional row vector f (x;y )=(f=x)(x; y )=((f=x )(x; y );:::;(f=x )(x; y )), and similarly x 1 n n m f (x;y )=(f=y)(x; y )= ((f=y )(x; y );:::;(f=y )(x; y )). Finally, iff : R R is y 1 m a differentiable function with componentsf ;:::;f , then its derivative atx is themn matrix 1 m 2 3 f (x) 1x f 6 7 . . (x)=f x = x 4 5 . x f (x) mx 2 3 f f 1 1 ::: (x) (x) x x 1 n 6 7 . . 6 7 . . = . . 4 5 ::: f f m m (x) (x) x x 1 n 2.1.6 n 2 Iff :R R is twice differentiable, its second derivative atx is thenn matrix ( f=xx)(x )= j 2 f (x) where (f (x)) =( f=x x )(x ). Thus, in terms of the notation in Section 2.1.5 above, xx xx j i i 0 f (x)=(=x)(f ) (x). xx x 2.2 Example We consider in detail the first example of Chapter 1. Define the following variables and functions: = fraction of mary-john in proposed mixture, p = sale price per pound of mixture, v = total amount of mixture produced, f( ;p)= expected sales volume (as determined by market research) of mixture as a function of( ;p):2.2. EXAMPLE 9 Since it is not profitable to produce more than can be sold we must have: v = f( ;p); m = amount (in pounds) of mary-john purchased, and t = amount (in pounds) of tobacco purchased. Evidently, m = v; and t =(l− )v: Let P (m)= purchase price ofm pounds of mary-john, and 1 P = purchase price per pound of tobacco. 2 Then the total cost as a function of ;p is C( ;p)=P ( f( ;p)) +P (1− )f( ;p): 1 2 The revenue is R( ;p)=pf( ;p); so that the net profit is N( ;p)=R( ;p)−C( ;p): 1 The set of admissible decisions is Ω,where Ω =f( ;p)j0 ;0p1g. Formally, we 2 have the the following decision problem: Maximize N( ;p); subject to ( ;p)2 Ω:  Suppose that ( ;p) is an optimal decision, i.e.,   ( ;p )2 Ω and (2.1)   N( ;p )N( ;p) for all ( ;p)2 Ω:   We are going to establish some properties of (a ;p ). First of all we note that Ω is an open subset 2 ofR . Hence there exits" 0 such that   ( ;p)2 Ω whenever j( ;p)− ( ;p )j" (2.2) 0 2 In turn (2.2) implies that for every vector h =(h ;h ) in R there exists 0 ( of course 1 2 depends onh) such that   (( ;p )+(h ;h ))2 Ω for 0 (2.3) 1 210 CHAPTER 2. OPTIMIZATION OVER AN OPEN SET 1 2   ( ;p )+(h ;h ) 1 2  .   (a ;p ) Ω h h p Figure 2.1: Admissable set of example. Combining (2.3) with (2.1) we obtain (2.4):     N( ;p )N( +h ;p +h ) for 0 (2.4) 1 2 Now we assume that the functionN is differentiable so that by Taylor’s theorem   N( ;p ) N N       N( +h ;p +h )= + ( ;p )h + ( ;p )h (2.5) 1 2 1 2 p +o(); where o 0 as  0: (2.6)  Substitution of (2.5) into (2.4) yields N   N   0 ( ;p )h + ( ;p )h +o(): 1 2 p Dividing by 0 gives o() N   N   (2.7) 0 ( ;p )h + ( ;p )h + : 1 2 p  Letting approach zero in (2.7), and using (2.6) we get N N     0 ( ;p )h + ( ;p )h : (2.8) 1 2 p   Thus, using the facts thatN is differentiable, ( ;p ) is optimal, and is open, we have concluded 2 that the inequality (2.9) holds for every vectorh2R . Clearly this is possible only if N N     ( ;p )=0: ( ;p )=0; (2.9) p Before evaluating the usefulness of property (2.8), let us prove a direct generalization.2.3. THE MAIN RESULT AND ITS CONSEQUENCES 11 2.3 The Main Result and its Consequences 2.3.1 Theorem . n n  Let Ω be an open subset ofR .Letf:R R be a differentiable function. Letx be an optimal solution of the following decision-making problem: Maximize f(x) (2.10) subject to x2 Ω: Then f  (x )=0: (2.11) x  Proof: Sincex 2 Ω and Ω is open, there exists" 0 such that  x2 Ω whenever jx−x j": (2.12) n In turn, (2.12) implies that for every vectorh2R there exits 0 ( depending onh) such that  (x +h)2 Ω whenever 0: (2.13)  Sincex is optimal, we must then have   f(x )f(x +h) whenever 0: (2.14) Sincef is differentiable, by Taylor’s theorem we have f    (2.15) f(x +h)=f(x )+ (x )h +o(); x where o() as  0 (2.16) 0  Substitution of (2.15) into (2.14) yields f  0 (x )h +o() x and dividing by 0 gives f o()  (2.17) 0 (x )h + x  Letting approach zero in (2.17) and taking (2.16) into account, we see that f  0 (x )h; (2.18) x n Since the inequality (2.18) must hold for everyh2R ,wemusthave f  0= (x ); x and the theorem is proved. 12 CHAPTER 2. OPTIMIZATION OVER AN OPEN SET Table 2.1 Does there exist At how many points an optimal deci- in Ω is 2.2.2 Further Case sion for 2.2.1? satisfied? Consequences  1 Yes Exactly one point, x is the  sayx unique optimal 2 Yes More than one point 3 No None 4 No Exactly one point 5 No More than one point 2.3.2 Consequences. Let us evaluate the usefulness of (2.11) and its special case (2.18). Equation (2.11) gives us n    0 equations which must be satisfied at any optimal decisionx =(x ;:::;x ) . 1 n These are f f   f  (x )=0; (x )=0;:::; (x )=0 (2.19) x x x 1 2 n Thus, every optimal decision must be a solution of thesen simultaneous equations ofn variables, so that the search for an optimal decision from Ω is reduced to searching among the solutions of (2.19). In practice this may be a very difficult problem since these may be nonlinear equations and it may be necessary to use a digital computer. However, in these Notes we shall not be overly concerned with numerical solution techniques (but see 2.4.6 below). The theorem may also have conceptual significance. We return to the example and recall the N = R−C. Suppose thatR andC are differentiable, in which case (2.18) implies that at every   optimal decision ( ;p ) R C R   C       ( ;p )= ( ;p ); ( ;p )= ( ;p ); p p or, in the language of economic analysis, marginal revenue = marginal cost. We have obtained an important economic insight. 2.4 Remarks and Extensions 2.4.1 A warning.  Equation (2.11) is only a necessary condition forx to be optimal. There may exist decisionsx 2 Ω such thatf (x)=0 butx is not optimal. More generally, any one of the five cases in Table 2.1 may x occur. The diagrams in Figure 2.1 illustrate these cases. In each caseΩ=(−1;1). Note that in the last three figures there is no optimal decision since the limit points -1 and +1 are not in the set of permissible decisionsΩ=(−1;1). In summary, the theorem does not give us any clues concerning the existence of an optimal decision, and it does not give us sufficient conditions either.

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.