Lecture notes on Optimization Theory

FUNDAMENTALS OF OPTIMIZATION

LECTURE NOTES 2007

R. T. Rockafellar
Dept. of Mathematics, University of Washington, Seattle

CONTENTS

1. What is Optimization?
2. Problem Formulation
3. Unconstrained Minimization
4. Constrained Minimization
5. Lagrange Multipliers
6. Games and Duality
X. Exercises

1. WHAT IS OPTIMIZATION?

Optimization problem: Maximizing or minimizing some function relative to some set, often representing a range of choices available in a certain situation. The function allows comparison of the different choices for determining which might be "best."

Common applications: Minimal cost, maximal profit, best approximation, optimal design, optimal management or control, variational principles.

Goals of the subject: Understanding the practical and theoretical aspects of:

Modeling issues: What to look for in setting up an optimization problem? What features are advantageous or disadvantageous? What aids to formulation are available? How can problems usefully be categorized?

Analysis of solutions: What is meant by a "solution"? When do solutions exist, and when are they unique? How can solutions be recognized and characterized? What happens to solutions under perturbations?

Numerical methods: How can solutions be determined by iterative schemes of computation? What modes of local simplification of a problem are appropriate? How can different solution techniques be compared and evaluated?

Distinguishing features of optimization as a mathematical discipline:

Descriptive math → prescriptive math: Much of mathematics in the past has been devoted to describing how systems behave, e.g. in the laws of physics. The advent of computers, however, has brought an emphasis on using mathematics to make systems behave in chosen ways, and choices lead to questions of optimization.
Equations → inequalities: Optimization typically deals with variables that have to lie within certain ranges, dictated by the limitations of what choices are allowable, and this leads to a dominance of relationships that are expressed by inequalities rather than equations.

Linear/nonlinear → convex/nonconvex: The familiar division between linearity and nonlinearity is less important in optimization than the one between convexity and nonconvexity, which for its appreciation requires a new investment in concepts.

Differential calculus → subdifferential calculus: The prevalence of inequalities, along with the special properties of "max" and "min" as operations, raises the need for a methodology that doesn't rely as much as classical mathematics on supposing surfaces to be smooth and functions to be differentiable everywhere.

Finite-dimensional optimization: The case where a choice corresponds to selecting the values of a finite number of real variables, called decision variables. For purposes of general discussion, such as now, the decision variables may be denoted by x_1,...,x_n and each allowable choice therefore identified with a point x = (x_1,...,x_n) ∈ IR^n.

Feasibility: The points x representing allowable choices are said to be feasible. They form the subset of IR^n over which the maximization or minimization takes place.

Constraints: Conditions on the decision variables that are used to specify the set of feasible points x in IR^n.

Equality and inequality constraints: Conditions of the form f_i(x) = c_i, f_i(x) ≤ c_i or f_i(x) ≥ c_i for certain functions f_i on IR^n and constants c_i in IR. (Note: Strict inequalities are avoided in constraints, for reasons which will be clear later.)

Range constraints: Conditions restricting the values of some decision variables to lie within certain closed intervals of IR. Very important in many situations, for instance, are nonnegativity constraints: some variables x_j may only be allowed to take values ≥ 0; the interval then is [0,∞).
Range constraints can also arise from the desire to keep a variable between certain upper and lower bounds.

Linear constraints: This covers range constraints and conditions f_i(x) = c_i, f_i(x) ≤ c_i, or f_i(x) ≥ c_i, in which the function f_i is linear, in the standard sense of being expressible as a sum of constant coefficients times the variables x_1,...,x_n.

Data parameters: Problem statements usually involve not only decision variables but symbols designating "given" constants and coefficients. Conditions on such elements, such as the nonnegativity of a particular coefficient, are not among the "constraints" in a problem of optimization, however crucial they may be from some other perspective.

Knowns and unknowns: A good way to think about the important distinction between decision variables and data parameters is as follows. The decision variables in an optimization problem are unknowns that are open to manipulation in the process of maximization or minimization, whereas the data parameters aren't open to manipulation when it comes to solving a particular problem, but instead would be furnished with specific numerical values as inputs to any solution procedure. In practice, of course, all kinds of symbols besides x get used, and it can get hard to tell things apart. This distinction may help you through confusing situations where certain "parameters" might actually be decision variables and certain "variables" actually data parameters.

Mathematical programming: This is a synonym for finite-dimensional optimization. Its usage predates "computer programming," which actually arose from attempts at solving optimization problems on early computers. "Programming," in the sense of optimization, survives in problem classifications such as linear programming, quadratic programming, convex programming, integer programming, and so forth.
EXAMPLE 1: Engineering Design

General description: In the design of some object, system or structure, the values of certain parameters can be chosen subject to some conditions expressing their ranges and interrelationships. The choice determines the values of a number of other variables on which the desirability of the end product depends, such as cost, weight, speed, bandwidth, reliability, .... Among the choices of the design parameters that meet certain performance specifications, what's "best" by some overall criterion?

An illustration, optimal proportions of a can: The following example, although toy-sized, brings out many features of optimization modeling. A cylindrical can of a given volume V_0 is to be proportioned in such a way as to minimize the total cost of the material in a box of 12 cans, arranged in a 3×4 pattern. The cost expression takes the form c_1 S_1 + c_2 S_2, where S_1 is the surface area of the 12 cans and S_2 is the surface area of the box. The coefficients c_1 and c_2 are nonnegative. A side requirement is that no dimension of the box can exceed a given size D_0.

design parameters: r = radius of can, h = height of can
volume constraint: πr^2 h = V_0 (or ≥ V_0, see below)
surface area of cans: S_1 = 12(2πr^2 + 2πrh) = 24πr(r + h)
box dimensions: 8r × 6r × h
surface area of box: S_2 = 2(48r^2 + 8rh + 6rh) = 4r(24r + 7h)
size constraints: 8r ≤ D_0, 6r ≤ D_0, h ≤ D_0
nonnegativity constraints: r ≥ 0, h ≥ 0

Summary: The decision variables are r and h, and the choices that are available can be identified with the set C ⊂ IR^2 consisting of all (r,h) satisfying the conditions

    r ≥ 0, h ≥ 0, 8r ≤ D_0, 6r ≤ D_0, h ≤ D_0, πr^2 h = V_0.

The first five conditions can be thought of as range constraints and the sixth as an equality constraint on f_1(r,h) = πr^2 h. Over the set C we wish to minimize

    f_0(r,h) = c_1[24πr(r + h)] + c_2[4r(24r + 7h)] = d_1 r^2 + d_2 rh,

where d_1 = 24πc_1 + 96c_2 and d_2 = 24πc_1 + 28c_2. The symbols V_0, D_0, c_1 and c_2, and ultimately d_1 and d_2, are data parameters.
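As a numerical sketch (not part of the original notes), the can problem can be handed directly to a solver once sample values are chosen for the data parameters; the values of c_1, c_2, V_0 and D_0 below are illustrative assumptions, and scipy's general-purpose SLSQP method stands in for whatever algorithm one might actually prefer.

```python
# Sketch of the can problem with made-up data values (c1, c2, V0, D0
# are illustrative assumptions, not values from the notes).
import numpy as np
from scipy.optimize import minimize

c1, c2, V0, D0 = 1.0, 0.5, 300.0, 20.0
d1 = 24 * np.pi * c1 + 96 * c2        # coefficient of r^2 in the cost
d2 = 24 * np.pi * c1 + 28 * c2        # coefficient of r*h in the cost

def cost(z):
    r, h = z
    return d1 * r**2 + d2 * r * h     # f0(r,h) = d1 r^2 + d2 r h

# equality constraint pi r^2 h = V0; the bound r <= D0/8 already
# implies the redundant condition 6r <= D0
constraints = [{"type": "eq", "fun": lambda z: np.pi * z[0]**2 * z[1] - V0}]
bounds = [(0.0, D0 / 8.0), (0.0, D0)]

res = minimize(cost, x0=[2.0, 10.0], bounds=bounds,
               constraints=constraints, method="SLSQP")
r_opt, h_opt = res.x
print(r_opt, h_opt, res.fun)
```

With these sample numbers the size constraint 8r ≤ D_0 turns out to be active at the solution, illustrating the remark below about active and inactive constraints.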
Although c_1 ≥ 0 and c_2 ≥ 0, these aren't "constraints" in the problem. As for S_1 and S_2, they were only introduced as temporary symbols and didn't end up as decision variables.

Redundant constraints: It is obvious that the condition 6r ≤ D_0 is implied by the other constraints and therefore could be dropped without affecting the problem. But in problems with many variables and constraints such redundancy may be hard to recognize. The elimination of redundant constraints could pose a practical challenge as serious as that of solving the optimization problem itself.

Inactive constraints: It could well be true that the optimal pair (r,h) (unique??) is such that either the condition 8r ≤ D_0 or the condition h ≤ D_0 is satisfied as a strict inequality, or both. In that case the constraints in question are inactive in the local characterization of the optimal point, although they do affect the shape of the set C. Again, however, there is little hope, in a problem with many variables and constraints, of determining by some preliminary procedure just which constraints will be active and which will not. This is the crux of the difficulty in many numerical approaches.

Redundant variables: It would be possible to solve the equation πr^2 h = V_0 for h in terms of r and thereby reduce the given problem to one in terms of just r, rather than (r,h). Fine, but besides being a technique that is usable only in special circumstances, the elimination of variables from (generally nonlinear) systems of equations is not necessarily helpful. There may be a trade-off between the lower dimensionality achieved in this way and other properties.

Inequalities versus equations: The constraint πr^2 h = V_0 could be written in the form πr^2 h ≥ V_0 without affecting anything about the solution. This is because of the nature of the cost function; no pair (r,h) in the larger set C′, obtained by substituting this weaker condition for the equation, can minimize f_0 unless actually (r,h) ∈ C.
While it may seem instinctive to prefer the equation to the inequality in the formulation, the inequality turns out to be superior in the present case because the set C′ happens to be "convex," whereas C isn't.

Convexity: Even with the reformulation just suggested, the problem wouldn't be fully of convex type because the function of r and h being minimized isn't itself "convex"; further maneuvers might get around that. The lesson is that the formulation of a problem can be subtle, when it comes to bringing out features of importance in optimization. Knowledge and experience play a valuable role.

EXAMPLE 2: Utilization of Resources

General description: Actions are to be taken that hopefully will result in profit, but these actions, which correspond to selecting the values of a number of variables, draw on some limited resources. What is the best way to allocate the resources so as to maximize the profit, or for that matter to optimize the actions with respect to some other criterion? There are many variants, such as using resources to meet given needs while minimizing cost or pollution. In this class of problems inequality constraints are especially prominent.

An illustration, tourist sales: We keep to toy reality for the purpose of concentrating better on key aspects of optimization modeling. The summer Olympic Games are coming to town, and a teenage entrepreneur is planning to make money off tourists by selling them souvenir sun hats and umbrellas. He can get plain sun hats and umbrellas at costs of c_1 and c_2 dollars each, respectively, and then use a kit he has in order to imprint them with a souvenir logo. With the logo he can sell them for p_1 and p_2 dollars each, in unlimited quantity. He does have limitations from two directions, however. First, he has only k dollars to invest in the venture. Second, he has to store his items between the time he can get them and the time of the Games.
He has a total of b cubic meters of space that he can use for free in his own basement, and there's an extra e cubic meters of space available in his neighbor's house, but to the extent that he uses that he'll have to pay d dollars per cubic meter over the storage period. Sun hats and umbrellas take up s_1 and s_2 cubic meters of storage space each, respectively. How many sun hats and umbrellas should he stock up, and how much space should he rent from the neighbor, if any? The "resource" limits here are in the capital k, the free space b, and the rental space e.

actions: x_1 = hats ordered, x_2 = umbrellas ordered, x_3 = space rented
range constraints: x_1 ≥ 0, x_2 ≥ 0, 0 ≤ x_3 ≤ e
storage constraint: s_1 x_1 + s_2 x_2 ≤ b + x_3
investment constraint: c_1 x_1 + c_2 x_2 + d x_3 ≤ k
profit expression: (p_1 − c_1)x_1 + (p_2 − c_2)x_2 − d x_3

Summary: The decision variables are x_1, x_2 and x_3. Besides the range constraints there are two inequality constraints in terms of functions of these variables; this is better seen by writing the storage requirement in the form s_1 x_1 + s_2 x_2 − x_3 ≤ b. All these constraints together determine a feasible set of points (x_1, x_2, x_3) ∈ IR^3. Over this set the function given by the profit expression is to be maximized.

Linearity: Note that in this formulation a linear function of the decision variables is being maximized subject to a system of linear constraints.

Continuous versus integer variables: It would make sense to restrict the variables x_1 and x_2 to have whole values only, since there's no meaning to selling a fraction of a hat or umbrella. That would have major effects on the ease with which the problem could be solved, however. It's customary therefore to represent discrete quantities by continuous variables, as a practical simplification as long as the magnitudes are large relative to the "quantum of discreteness." In some situations we couldn't take that view.
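The tourist-sales model above is a linear program, so it can be sketched with scipy's linprog once sample data are supplied; all the numbers below (costs, prices, storage volumes, k, b, e, d) are illustrative assumptions, and linprog minimizes, so the profit is negated.

```python
# Tourist-sales LP with made-up data values (all numbers here are
# illustrative assumptions, not values from the notes).
from scipy.optimize import linprog

c1, c2 = 3.0, 8.0        # unit costs of hats, umbrellas
p1, p2 = 9.0, 20.0       # selling prices with the logo
s1, s2 = 0.05, 0.2       # storage volume per item (m^3)
k, b, e, d = 3000.0, 40.0, 20.0, 10.0

# maximize (p1-c1)x1 + (p2-c2)x2 - d x3  <=>  minimize its negative
obj = [-(p1 - c1), -(p2 - c2), d]
A_ub = [[s1, s2, -1.0],   # s1 x1 + s2 x2 - x3 <= b   (storage)
        [c1, c2,  d  ]]   # c1 x1 + c2 x2 + d x3 <= k (investment)
b_ub = [b, k]
bounds = [(0, None), (0, None), (0, e)]   # range constraints

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x, -res.fun)
```

With these particular numbers the optimal x_1 comes out fractional, which is exactly the continuous-relaxation point made above: the answer is accepted as-is because the magnitudes dwarf the "quantum of discreteness."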
If the optimization problem involved a decision about whether to build a shopping center or not, and we modeled that as a zero-one variable (no = 0, yes = 1), we surely would want to steer away from fractions.

Integer programming: Problems with integer variables constitute a special, more difficult area of optimization which we won't be treating.

Breaks in cost structure: The way storage costs have been treated in this example deserves close attention. We could have set the problem up instead with x_1, x_2 and a decision variable y standing for the amount of storage space to be used; the relation between these variables would then be handled as an equation constraint, s_1 x_1 + s_2 x_2 − y = 0. Then y would be restricted to the interval [0, b + e], and the profit expression to be maximized would be (p_1 − c_1)x_1 + (p_2 − c_2)x_2 − g(y), where g(y) denotes the storage cost for y cubic meters. What would the function g look like? We would have g(y) = 0 for 0 ≤ y ≤ b but g(y) = d(y − b) for b ≤ y ≤ b + e. It wouldn't be linear, just "piecewise linear" with a breakpoint at y = b, where it would have a kink and not be differentiable.

In contrast, the formulation above achieved linearity. How? That was done by widening the range of possible actions. The entrepreneur was permitted (in the variable x_3) to rent space from his neighbor even if, on the basis of the quantities x_1 and x_2 he had chosen, he didn't need all of it. No constraint was added to prevent squandering money on useless space. Allowing such squandering was harmless because it would be eliminated in the course of optimizing.

Extension to uncertain prices, a worst-case model: Often in planning for the future there are uncertainties about some of the data parameters. That's a major issue which we can't get into here, but a simple tactic is worth discussing now for its mathematical features, which often come up in other circumstances as well. The tourist sales example is a good vehicle for explaining this.
The prices p_1 and p_2 for the sun hats and umbrellas could depend on how the weather turns out at the time of the Games. Let's imagine p_1 and p_2 correspond to normal weather, but there are two other possibilities: very sunny or very rainy. In the very sunny case the two prices would instead be p′_1 and p′_2 (higher for hats, lower for umbrellas), whereas in the very rainy case they would be p″_1 and p″_2 (lower for hats, higher for umbrellas). Then the entrepreneur would in fact face three possible sales outcomes, given by the expressions p_1 x_1 + p_2 x_2, p′_1 x_1 + p′_2 x_2 and p″_1 x_1 + p″_2 x_2. There might be a statistical approach to coping with this, but an alternative is to maximize, subject to the given constraints, the function

    f(x_1, x_2, x_3) = min{ p_1 x_1 + p_2 x_2, p′_1 x_1 + p′_2 x_2, p″_1 x_1 + p″_2 x_2 } − c_1 x_1 − c_2 x_2 − d x_3.

This function gives the worst of all the possible profit outcomes that could result from a particular choice of x_1, x_2 and x_3. (Note that "min" refers merely to the numerical operation of taking the lowest value from a collection of numbers, here a collection of three numbers which happen to depend on x_1, x_2 and x_3.)

Nonlinearity: Even though the function f is built up from linear expressions, it's not itself linear. Moreover it has kinks at which its differentiability fails.

Linearizing device for worst-case models: This example can be reformulated to get rid of the nonlinearity. The trick is to introduce another variable, u say, and think of the problem as one in x_1, x_2, x_3 and u in which the old constraints are kept, but there are new constraints p_1 x_1 + p_2 x_2 − u ≥ 0, p′_1 x_1 + p′_2 x_2 − u ≥ 0 and p″_1 x_1 + p″_2 x_2 − u ≥ 0, and the expression to maximize is u − c_1 x_1 − c_2 x_2 − d x_3.

EXAMPLE 3: Management of Systems

General description: A sequence of decisions must be made in discrete time which will affect the operation of some kind of "system," often of an economic nature.
As in Example 2, the decisions, each in terms of choosing the values of a number of variables, have to respect limitations in resources, and the aim is to minimize cost or maximize profit or efficiency, say, but over a particular time horizon.

An illustration, inventory management: A warehouse with total capacity a (in units of volume) is to be operated over time periods t = 1,...,T as the sole facility for the supply of a number of different commodities (or medicines, or equipment parts, etc.), indexed by j = 1,...,n. The demand for commodity j during period t is the known amount d_{tj} ≥ 0 (in volume units); this is a deterministic approach to modeling the situation. In each period t it is possible not only to fill demands but to acquire additional supplies up to certain limits, so as to maintain stocks. The problem is to plan the pattern of acquiring supplies in such a way as to maximize the net profit over the T periods, relative to the original inventory amounts and the desired terminal inventory amounts.

inventory variables: x_{tj} units of commodity j at the end of period t
inventory constraints: x_{tj} ≥ 0, Σ_{j=1}^n x_{tj} ≤ a for t = 1,...,T
initial inventory: x_{0j} units of j given at the beginning
terminal constraints: x_{Tj} = b_j (given amounts) for j = 1,...,n
inventory costs: s_{tj} dollars per unit of j held from t to t+1
supply variables: u_{tj} units of j acquired during period t
supply constraints: 0 ≤ u_{tj} ≤ a_{tj} (given availabilities)
supply costs: c_{tj} dollars per unit of j acquired during t
dynamical constraints: x_{tj} = max{0, x_{t−1,j} + u_{tj} − d_{tj}}
rewards: p_{tj} dollars per unit of filled demand
filled demand: min{d_{tj}, x_{t−1,j} + u_{tj}} units of j during period t
net profit: Σ_{t=1}^T Σ_{j=1}^n [ p_{tj} min{d_{tj}, x_{t−1,j} + u_{tj}} − s_{tj} x_{tj} − c_{tj} u_{tj} ]

Summary: The latter expression, as a function of x_{tj} and u_{tj} for t = 1,...,T and j = 1,...,n (these being the decision variables), is to be maximized subject to the inventory constraints, terminal constraints, supply constraints and the dynamical constraints,
which are regarded as determining a feasible set of points in IR^{2Tn}. The symbols d_{tj}, a, x_{0j}, b_j, s_{tj}, a_{tj}, c_{tj} and p_{tj} stand for data parameters.

Large-scale context: The number of variables and constraints that can be involved in a problem may well be very large, and the interrelationships may be too complex to appreciate in any direct manner. This calls for new ways of thinking and for more reliance on guidelines provided by theory.

Uncertainty: Clearly, the assumption that the demands d_{tj} are known precisely in advance is unrealistic for many applications, although by solving the problem in this case one might nonetheless learn a lot. To pass from deterministic modeling to stochastic modeling, where each d_{tj} is a random variable (and the same perhaps for other data elements like a_{tj}), it is necessary to expand the conceptual horizons considerably. The decision vector (u_{t1},...,u_{tn}) at time t must be viewed as an unknown function of the "information" available to the decision maker at that time, rather than just at the initial time, but this type of optimization is beyond us here. A worst-case approach could be taken, although this likewise would seriously raise the level of complication.

Dependent variables: The values of the variables x_{tj} are determined by the values of the variables u_{tj} for t = 1,...,T and j = 1,...,n through the dynamical equations and the initial values. In principle, therefore, a specific expression in the latter variables could be substituted for each x_{tj}, and the dimensionality of the problem could thereby be cut sharply. But this trick, because it hides basic aspects of structure, could actually make the problem harder to analyze and solve.

Constraints versus penalties: The requirements that Σ_{j=1}^n x_{tj} ≤ a for t = 1,...,T and that x_{Tj} = b_j, although innocent-looking, are troublesome. Better modeling would involve some recourse in the eventuality of these conditions not being satisfied.
For instance, instead of a constraint involving the capacity one could incorporate into the function being minimized a penalty term, which kicks in when the total inventory being stored rises above a (perhaps with the interpretation that extra storage space has to be rented).

Max and min operations: The "max" operation in the dynamics and the "min" operation in the expression of the net profit force the consideration of functions that aren't differentiable everywhere and thus don't submit to ordinary calculus. Sometimes this is unavoidable and points to the need for fresh developments in analysis. Other times it can be circumvented by reformulation. The present example fits with the latter. Really, it would be better to introduce more variables, namely v_{tj} as the amount of good j used to meet demands at time t. In terms of these additional variables, constrained by 0 ≤ v_{tj} ≤ d_{tj}, the dynamics turn out to be linear,

    x_{tj} = x_{t−1,j} + u_{tj} − v_{tj},

and so too does the profit expression, which is now

    Σ_{t=1}^T Σ_{j=1}^n [ p_{tj} v_{tj} − s_{tj} x_{tj} − c_{tj} u_{tj} ].

Hidden assumptions: The alternative model just described with variables v_{tj} is better in other ways too. The original model had the hidden assumption that demands in any period should always be met as far as possible from the stocks on hand. But this might be disadvantageous if rewards will soon be higher but inventory can only be built up slowly due to the constraints on availabilities. The alternative model allows sales to be held off in such circumstances.

EXAMPLE 4: Identification of Parameter Values

General description: It's common in science and engineering to model behavior by a mathematical law, like an equation, which however can't be implemented without specifying the values of various parameters ("constants") that appear in it. A basic task is to determine the parameter values that provide the best fit to the available data, known through experiment or observation.
This is central to statistics (regression, maximum likelihood), econometrics, error analysis, etc. In speaking of "best" fit, reference is evidently being made to some criterion for optimization, but there isn't always just one. Note also a linguistic pitfall: "the" best fit suggests uniqueness of the answer being sought, but even relative to a single criterion there might be a tie, with different answers equally good.

This kind of optimization is entirely technical: the introduction of something to be optimized is just a mathematical construct. Still, in analyzing and computing solutions the challenges are the same as in other areas of optimization.

An illustration, "least squares" estimates: Starting out very simply, suppose that two variables x and y are being modeled as related by a linear law y = ax + b, either for inherent theoretical reasons or as a first-level approximation. The values of a and b are not known a priori but must be determined from the data, consisting of a large collection of pairs (x_k, y_k) ∈ IR^2 for k = 1,...,N. These pairs have been gleaned from experiments (where random errors of measurement could arise along with other discrepancies due to oversimplifications in the model). The error expression

    E(a,b) = Σ_{k=1}^N [ y_k − (a x_k + b) ]^2

is often taken as representing the goodness of the fit of the parameter pair (a,b). The problem is to minimize this over all (a,b) ∈ IR^2. Note that, from the optimization perspective, a and b are decision variables, whereas the symbols x_k and y_k stand for data parameters. Words can be slippery!

Multidimensional extension: More generally, instead of a real variable x and a real variable y one could be dealing with a vector x ∈ IR^n and a vector y ∈ IR^m, which are supposed to be related by a formula y = Ax + b for a matrix A ∈ IR^{m×n} and a vector b ∈ IR^m. Then the error expression E(A,b) would depend on the m×(n+1) components of A and b.
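The least-squares estimate above has a closed form: minimizing E(a,b) amounts to solving the normal equations of the design matrix whose rows are (x_k, 1). A sketch on synthetic data (the "true" law y = 2x + 1 and the noise level are illustrative assumptions):

```python
# Least-squares fit of y = a x + b, sketched on synthetic data
# (the true slope/intercept and noise scale are made-up values).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# minimize E(a,b) = sum_k (y_k - (a x_k + b))^2:
# least-squares solution of the linear system [x_k, 1] @ (a,b) = y_k
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)
```

Here a and b play the role of decision variables while the arrays x and y hold the data parameters, mirroring the distinction drawn above.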
Incorporation of constraints: The problem, as stated so far, concerns the unconstrained minimization of a certain quadratic function of a and b, but it's easy to find situations where the parameters might be subject to side conditions. For instance, it may be known on the basis of theory for the model in question that 1/2 ≤ a ≤ 3/2, while b ≥ −1. In the multidimensional case of y = Ax + b, there could be conditions on A such as positive semidefiniteness. (In the applications that make use of least squares estimation, such conditions are sometimes neglected by practitioners, and the numerical answer obtained is simply "fixed up" if it doesn't have the right form. But that's clearly not good methodology.)

Nonlinear version: A so-called problem of linear least squares has been presented, but the same ideas can be used when the underlying relation between x and y is supposed to be nonlinear. For instance, a law of the form y = e^{ax} − e^{bx} would lead to an error expression

    E(a,b) = Σ_{k=1}^N [ y_k − (e^{a x_k} − e^{b x_k}) ]^2.

In minimizing this with respect to (a,b) ∈ IR^2, we would not be dealing with a quadratic function, but something much more complicated. The graph of E in a problem of nonlinear least squares could have lots of "bumps" and "dips," which could make it hard to find the minimum computationally.

Beyond squares: Many other expressions for error could be considered instead of a sum of squares. Back in the elementary case of y = ax + b, one could look at

    E(a,b) = Σ_{k=1}^N | y_k − (a x_k + b) |.

A different (a,b) would then be "best." The optimization problem would have a technically different character as well, because E would lack differentiability at points (a,b) where y_k − (a x_k + b) = 0 for some k. Attractive instead, as a worst-case approach, would be the error expression

    E(a,b) = max_{k=1,...,N} | y_k − (a x_k + b) |.

The formula in this case means that the value assigned by E to the pair (a,b) is the largest value occurring among the errors |y_k − (a x_k + b)|, k = 1,...,N.
It's this maximum deviation that we wish to make as small as possible. Once more, E isn't a differentiable function on IR^2.

Tricks of reformulation: When E(a,b) is given by the max, the problem of minimizing it can be simplified by introducing another variable u and trying to minimize its value over the combinations (a,b,u) ∈ IR^3 such that u ≥ |y_k − (a x_k + b)| for k = 1,...,N. Each inequality u ≥ |y_k − (a x_k + b)| can moreover be replaced by u − y_k + a x_k + b ≥ 0 and u + y_k − a x_k − b ≥ 0. Similarly, when E(a,b) = Σ_{k=1}^N |y_k − (a x_k + b)| one can minimize u_1 + ··· + u_N over all (a,b,u_1,...,u_N) satisfying u_k ≥ |y_k − (a x_k + b)| for k = 1,...,N.

Inverse problems: This term is often used for parameter identification problems that involve differential equations. There is a model of behavior in which a differential equation dictates what should happen, i.e., what "outputs" should occur in response to given "inputs," but the coefficients in the equation aren't fully known and have to be inferred from experimental data.

Geological investigations: An example is that of exploring subterranean strata by sending signals (as inputs) which pass through the earth and, after being modified by that process, are picked up (as outputs) by sensors. The "coefficients" in this case are physical parameters, like densities, that describe what the underlying rocks are like and where the layers begin and end.

Tomography: A similar application is found in the medical technology of trying to reconstruct the internal geometry of a patient's body from data that has been collected by passing x-rays through the body from different angles.

Image reconstruction or enhancement: Data received through radar or a satellite camera may be fuzzy, distorted or marred by signal interference. How can one use it to best advantage in approximating the true image that gave rise to it? This is a kind of parameter identification problem in which the parameters measure the shades of grayness of the pixels in an image.
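The reformulation trick for the max criterion, minimizing u subject to the pairs of linear inequalities above, turns the nondifferentiable fitting problem into a linear program that any LP solver handles directly. A sketch with scipy's linprog on a small made-up data set:

```python
# Minimax (worst-case) line fit as an LP, on made-up data values.
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
N = len(x)

# variables z = (a, b, u); minimize u subject to
#   u >= y_k - (a x_k + b)   and   u >= (a x_k + b) - y_k
obj = [0.0, 0.0, 1.0]
A_ub = np.vstack([
    np.column_stack([-x, -np.ones(N), -np.ones(N)]),  # y_k - a x_k - b - u <= 0
    np.column_stack([ x,  np.ones(N), -np.ones(N)]),  # a x_k + b - y_k - u <= 0
])
b_ub = np.concatenate([-y, y])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0, None)])
a, b, u = res.x
print(a, b, u)   # at the optimum u equals the maximum deviation
```

At the solution u is pressed down until it coincides with max_k |y_k − (a x_k + b)|, exactly as the reformulation argument predicts.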
EXAMPLE 5: Variational Principles

General description: The kinds of equations that are the focus of much of numerical analysis are often associated in hidden ways with problems of optimization. A variational principle for an equation F(x) = 0, involving a mapping F : IR^n → IR^n, is an expression of F as the gradient mapping ∇f associated with a function f : IR^n → IR. It leads to the interpretation of the desired x as satisfying a first-order optimality condition with respect to f. Sometimes there are reasons to conclude that the equation's solution actually minimizes f, at least "locally." Then a way of solving F(x) = 0 by optimization is opened up.

Quite similar in concept are numerous examples where one wishes to solve an equation A(u) = 0 where u is some unknown function and A is a differential operator, so that an ordinary or partial differential equation is at issue. A variational principle characterizes the desired u as providing the minimum, say, of some expression. Many equations of physics have such an interpretation.

On a different front, conditions of price equilibrium in economics can sometimes be characterized as stemming from the actions of a multitude of "economic agents," like producers and consumers, all optimizing from their own perspectives. Yet again, the equilibrium state following the reactions which take place in a complicated chemical brew may be characterized through a variational principle as the configuration of substances that minimizes a certain energy function.

Status in optimization: In the study of variational principles, optimization theory can provide interesting insights quite independently of whether a numerical solution to a particular case is sought or not.

EXAMPLE 6: Optimal Control

General description: The evolution of a system in continuous time t can often be characterized by an ordinary differential equation ẋ(t) = f(t, x(t)) with x(0) = x_0 (initial condition), where ẋ(t) = (dx/dt)(t).
(Equations in higher derivatives are typically reducible to ones of first order.) Here x(t), called the state of the system, is a point in IR^n. This is descriptive mathematics. We get prescriptive mathematics when the ODE involves parameters for which the values can be chosen as a function of time: ẋ(t) = f(t, x(t), u(t)), where u(t) ∈ U ⊂ IR^m.

Without going into the details necessary to provide a rigorous foundation, the idea can be appreciated that under certain assumptions there will be a mapping which assigns to each choice of a control function u(·) over a time interval [0,T] a corresponding state trajectory x(·). Then, subject to whatever restrictions may be necessary or desirable on these functions, one can seek the choice of u(·) which is optimal according to some criterion. Such a problem would be infinite-dimensional, but a finite-dimensional version would arise as soon as the differential equation is approximated by a difference equation in discrete time.

Stochastic version: The system may be subject to random disturbances which the controller must react to. Further, there may be difficulty in knowing exactly what the state is at any time t, due to measurement errors and shortcomings of the sensors. Control must be framed in terms of feedback mappings, giving the response at time t to the information available right then about x(t).

Adaptive version: Also intriguing as a mathematical challenge, but largely out of reach of current concepts and techniques, is adaptive control, where the controller has not only to react to events but learn the basics of the system being controlled as time goes on. A major difficulty in this area is deciding what to optimize and for that matter what can or can't be assumed about the imperfectly known system.
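The remark above that a finite-dimensional problem arises once the ODE is replaced by a difference equation can be sketched concretely; everything here (the scalar dynamics x_{t+1} = x_t + h·u_t, the energy criterion, the target) is an illustrative toy, not a model from the notes:

```python
# Toy discretized control problem (all quantities are illustrative):
# dynamics x_{t+1} = x_t + h*u_t over T steps; steer x from 0 to 1
# while minimizing the control energy sum_t u_t^2.
import numpy as np
from scipy.optimize import minimize

T, h, x_init, target = 10, 0.1, 0.0, 1.0

def energy(u):
    return np.sum(u**2)            # criterion on the control sequence

def terminal_gap(u):
    x = x_init
    for ut in u:                   # roll the difference equation forward
        x = x + h * ut
    return x - target              # require x_T = target

res = minimize(energy, x0=np.zeros(T), method="SLSQP",
               constraints=[{"type": "eq", "fun": terminal_gap}])
print(res.x)
```

By symmetry the minimum-energy control spreads the effort evenly, so every u_t comes out equal; the decision variables are now just the T numbers u_1,...,u_T rather than a whole function u(·).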
Control of PDE’s: The state of a system may be given by an element of a function n space rather than a point in IR , as for instance when the problem revolves around the temperature distribution at time t over a solid body represented 3 by a closed, bounded region Ω⊂IR . The temperature can be influenced by heating or cooling elements arrayed on the surface of the body. How should these elements be operated in order to bring the temperature of the body uniformly within a certain range—in the shortest possible time, or with the least expenditure of energy? EXAMPLE 7: Combinatorial optimization General description: Many problems involve the optimal ordering or arrangements of discrete objects or actions, or either-or choices which might be represented by zero-one variables. Examples are found in the scheduling of processing jobs in manufacturing, or the scheduling of airline flights and crews, but also in shortest- path problems and the like. Modelsinvolvingnetworks(directedgraphs)areoftenusefulinthisregardand can produce enormous simplifications, but in general such optimization problems may be hard to solve and even intractable. Work on them goes on nonetheless because of their practical importance. Success is often measured in heuristic schemes that at least are able to generate improvements over the status quo. Overview of where we are now headed: Because we’ll be concentrating on finite- dimensional optimization (as in Examples 1, 2, 3, and 4), no infinite-dimensional applications to variational principles (as in Example 5) or optimal control (as in Ex- ample6)willbecovereddirectly,norwillspecialtoolsforcombinatorialoptimization (as in Example 7) be developed. The ideas we’ll develop are highly relevant, though, asbackgroundforsuchotherareasofoptimization. Inparticular,infinite-dimensional problems are often approximated by finite-dimensional ones through some kind of discretization of time, space, or probability. 142. 
2. PROBLEM FORMULATION

To set the stage for solving a problem of optimization, it's necessary first to formulate it in a manner not only reflecting the situation being modeled, but so as to be amenable to computational techniques and theoretical analysis. This raises a number of fundamental issues, which range from the problem format to be adopted to criteria for when a problem is "well posed."

Basic problem: Minimize a function f_0 : IR^n → IR, the objective function, over a specified set C ⊂ IR^n, the feasible set.

Max versus min: Maximizing a function g is equivalent to minimizing −g, so there's no loss of generality in concentrating on minimization. This is the convention in much of optimization theory.

Solution concepts. Different things can be sought in a problem of optimization. The following terms identify the main concepts.

Feasible solution: Any point x that belongs to C, regardless of the value it gives to f_0. Just finding such a point could be difficult numerically in cases where the constraints are complicated or numerous, and indeed, the very existence of a feasible solution may sometimes be an open question. This is just a first-level solution concept, but important nevertheless.

Optimal solution: A point x̄ furnishing the minimum value of f_0 over C, i.e., a feasible solution such that f_0(x̄) ≤ f_0(x) for all other feasible solutions x. This is more specifically what is called a globally optimal solution, when contrast must be made with the next concept.

Locally optimal solution: A point x̄ ∈ C such that, for some neighborhood U of x̄, one has f_0(x̄) ≤ f_0(x) for all x ∈ C ∩ U. Optimality in this case isn't asserted relative to C as a whole, but only relative to a sufficiently small ball U around x̄. In practice it may be very hard to distinguish whether a numerical method has produced a globally optimal solution or just a locally optimal one, if that much.

Optimal set: The set of all (globally) optimal solutions (if any).
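The global/local distinction can be seen numerically on a one-dimensional example (the particular function, with two "valleys" of different depth, is an assumption made for this sketch):

```python
# Sketch contrasting local and global minimizers on C = [-2, 2].
# The function f0 below is an illustrative choice with two valleys:
# a shallow local minimum near x = 0.96 and the global one near x = -1.04.

def f0(x):
    return x**4 - 2.0 * x**2 + 0.3 * x

# crude grid scan over C to locate the global minimizer
grid = [-2.0 + 4.0 * k / 4000 for k in range(4001)]
x_global = min(grid, key=f0)

# restricting attention to a neighborhood of the shallow valley finds only
# the locally optimal point, whose value is worse than the global one
x_local = min([x for x in grid if 0.5 <= x <= 1.5], key=f0)
```

A descent method started near x = 1 would stop at the local minimizer, illustrating why distinguishing local from global optimality is hard in practice.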
Optimal value: The greatest lower bound to the values of f_0(x) as x ranges over C. There may or may not be a point x̄ ∈ C at which f_0 actually attains this value. Furthermore, although the optimal value is always well defined, it could fail to be finite. It is −∞ when f_0 is not bounded below on C, and on the other hand, it is ∞ by convention if C = ∅.

Constraint manipulation: Constraints can be expressed in more than one way, and some forms of expression may be more convenient in one context than another.

Function constraints: In conditions like f_i(x) = c_i, or f_i(x) ≤ c_i, or f_i(x) ≥ c_i, f_i is called a constraint function.

(1) An equality constraint f_i(x) = c_i can be expressed equivalently, if desired, as a pair of inequality constraints: f_i(x) ≤ c_i and f_i(x) ≥ c_i.

(2) An inequality constraint f_i(x) ≥ c_i can be expressed also as −f_i(x) ≤ −c_i.

(3) An inequality constraint f_i(x) ≤ c_i can be expressed also as −f_i(x) ≥ −c_i.

(4) An inequality constraint f_i(x) ≤ c_i can be expressed as an equality constraint f_i(x) + s_i = c_i involving an additional decision variable s_i, itself constrained to be nonnegative. Such a variable is called a slack variable.

(5) Any constraint f_i(x) = c_i, or f_i(x) ≤ c_i, or f_i(x) ≥ c_i, can be expressed in terms of g_i(x) = f_i(x) − c_i as g_i(x) = 0, or g_i(x) ≤ 0, or g_i(x) ≥ 0.

Geometric, or abstract constraints: For methodological purposes it's often convenient to represent only some of the constraints in a problem in terms of constraint functions f_i and to lump the rest together in the abstract form x ∈ X. This is especially handy for treating range constraints.

Example: For instance, a requirement on x = (x_1, ..., x_n) that 0 ≤ x_1 ≤ 1 could be represented by two function constraints g_1(x) ≥ 0 and g_2(x) ≤ 1 with g_1(x) = g_2(x) = 1·x_1 + 0·x_2 + ··· + 0·x_n, but it could also be incorporated into the description of a set X to which x must belong.
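Manipulation (4), the slack-variable trick, can be checked mechanically. In this sketch the particular f_i and c_i are made-up illustrations; the point is that the inequality form and the equality-with-slack form accept exactly the same points:

```python
# Sketch of constraint manipulation (4): f_i(x) <= c_i is equivalent to
# f_i(x) + s_i = c_i with a nonnegative slack variable s_i.
# The function f_i and bound c_i are illustrative choices.

def f_i(x):
    return x[0] + 2.0 * x[1]

c_i = 10.0

def feasible_inequality(x):
    return f_i(x) <= c_i

def feasible_with_slack(x):
    # the only candidate slack is s_i = c_i - f_i(x); the equality holds
    # with s_i >= 0 exactly when the original inequality holds
    s_i = c_i - f_i(x)
    return s_i >= 0.0 and abs(f_i(x) + s_i - c_i) < 1e-12

for x in [(1.0, 2.0), (4.0, 4.0), (0.0, 5.0)]:
    assert feasible_inequality(x) == feasible_with_slack(x)
```

The reformulation matters in practice because some algorithms (notably the simplex method) are stated for equality constraints plus nonnegativity.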
Boxes: A set X ⊂ IR^n is a box if it is a product I_1 × ··· × I_n of closed intervals I_j ⊂ IR. To require x ∈ X is to require x_j ∈ I_j for j = 1, ..., n. This is just what it means to have range constraints on x. Here I_j could be bounded or even consist of just one point, or it could be unbounded or even (−∞, ∞).

Nonnegative orthant: the box IR^n_+ = [0, ∞) × ··· × [0, ∞). For X = IR^n_+, the constraint x ∈ X means that the components x_j of x have to be nonnegative.

Whole space: the box IR^n = (−∞, ∞) × ··· × (−∞, ∞). For X = IR^n, the constraint x ∈ X trivializes; each component x_j of x is "free."

Linear and affine functions: A function g on IR^n is called affine if it can be expressed in the form g(x_1, ..., x_n) = d_0 + d_1 x_1 + ··· + d_n x_n for some choice of constants d_0, d_1, ..., d_n. Many people simply refer to such a function as linear, and in this they are following a long tradition, but in higher mathematics the term linear is reserved for the special case of such a function where the constant term vanishes: d_0 = 0. Thus, g is linear when there's a vector d = (d_1, ..., d_n) ∈ IR^n such that

g(x) = d·x (the inner product, or dot product, of two vectors)

Linear constraints: Conditions f_i(x) = c_i, f_i(x) ≤ c_i or f_i(x) ≥ c_i in which the function f_i is linear—or affine. Or, conditions x ∈ X in which X is a box.
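The affine-versus-linear distinction is easy to exhibit in code (the coefficients below are made up for illustration): an affine g has the form d_0 + d·x, and is linear in the strict sense only when the constant term d_0 vanishes.

```python
# Sketch of the affine/linear distinction; coefficients are illustrative.

def dot(d, x):
    # inner (dot) product of two vectors
    return sum(di * xi for di, xi in zip(d, x))

def make_affine(d0, d):
    # g(x) = d0 + d.x
    return lambda x: d0 + dot(d, x)

g_affine = make_affine(5.0, [1.0, -2.0])   # g(x) = 5 + x1 - 2*x2
g_linear = make_affine(0.0, [1.0, -2.0])   # constant term d0 = 0: linear

# the strictly linear function vanishes at the origin; the affine one does not
assert g_linear([0.0, 0.0]) == 0.0
assert g_affine([0.0, 0.0]) == 5.0
```

This is why the notes allow "linear—or affine" in constraint conditions: adding the constant d_0 merely shifts the bound c_i.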
Standard problem format in finite-dimensional optimization:

(P)   minimize f_0(x) over all x = (x_1, ..., x_n) ∈ X ⊂ IR^n satisfying
          f_i(x) ≤ 0 for i = 1, ..., s,
          f_i(x) = 0 for i = s+1, ..., m.

The feasible set C for (P) consists of all the points x ∈ X that satisfy all the constraints f_i(x) ≤ 0 or f_i(x) = 0. Here in particular X could be all of IR^n, in which case the condition x ∈ X would impose no restriction whatever. More typically, however, X might be chosen to be some box other than IR^n, in which case the requirement that x ∈ X is a way of representing range constraints on x that underlie (P).

Unconstrained minimization: the case where X = IR^n and "m = 0," i.e., no equality or inequality constraints are present, so that C = IR^n.

Linear programming: the case where a linear (or affine) function f_0 is minimized subject to linear constraints: the functions f_1, ..., f_m are affine and the set X is a box (e.g. X = IR^n or X = IR^n_+).

Quadratic programming: like linear programming, but the objective function f_0 is allowed to have quadratic terms, as long as it remains convex, as defined later. (Note: in quadratic programming the constraints are still only linear.)

Nonlinear programming: this term is used in contrast to linear programming, but a much more important watershed will eventually be seen in the distinction between convex programming and nonconvex programming.

Geometric considerations: In problems with a few, simple constraints, the feasible set C might be decomposable into a collection of "pieces," each of which could be inspected separately in an attempt to locate the minimum of the objective function. For instance, if C were a (solid) cube in IR^3, one could look at what happens at the 8 corners, along the 12 edges, on the 6 faces, and in the cube's interior.

For most problems of interest in modern applications, however, there is little hope in such an approach. The number of "pieces" would be astronomical, or there would be no easy organization or listing of them. A further difficulty would lie in identifying which of the constraints might be redundant. Then too, there could be problems of degeneracy, where the constraints line up in odd ways and spoil the possibility of a good description of the "pieces" of C. As if this weren't enough trouble, there is the real prospect that C might be disconnected. These considerations force a different perspective on the analyst, who must look instead for a new kind of geometric framework on which to base computational schemes.
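A tiny instance shows how the pieces of the standard format (P) fit together. The problem data here (objective, one inequality, one equality, a box X) are assumptions invented for the sketch; the grid search stands in for a real algorithm:

```python
# Sketch of the standard format (P) on a made-up instance:
#   minimize f0 over x in the box X = [0,2] x [0,2] subject to
#     f1(x) = x1 + x2 - 2 <= 0    (inequality constraints, i = 1,...,s, s = 1)
#     f2(x) = x1 - x2     =  0    (equality constraints, i = s+1,...,m, m = 2)

def f0(x): return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2
def f1(x): return x[0] + x[1] - 2.0
def f2(x): return x[0] - x[1]

def in_X(x):
    # the abstract constraint x in X, here a box (range constraints)
    return 0.0 <= x[0] <= 2.0 and 0.0 <= x[1] <= 2.0

def feasible(x, tol=1e-9):
    # membership in the feasible set C of (P)
    return in_X(x) and f1(x) <= tol and abs(f2(x)) <= tol

# crude grid search for the best feasible point
grid = [(i / 10.0, j / 10.0) for i in range(21) for j in range(21)]
x_best = min((x for x in grid if feasible(x)), key=f0)
```

The feasible set C here is the segment from (0, 0) to (1, 1), and the grid search correctly returns its endpoint (1, 1), where f0 vanishes.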
Geometry of linear constraints: Initial insight into a kind of geometry that does provide important support in optimization can be gained through the following ideas, which will later be subsumed under the broader heading of "convexity."

Half-spaces and hyperplanes: Subsets of IR^n of the form {x | d·x = c} for a vector d = (d_1, ..., d_n) ≠ (0, ..., 0) and some constant c ∈ IR are called hyperplanes, while those of the form {x | d·x ≤ c} or {x | d·x ≥ c} are called closed half-spaces. (With strict inequality, the latter would be open half-spaces.) A linear equality or inequality constraint on x thus corresponds to making x belong to a certain hyperplane or closed half-space (unless the linear function is ≡ 0, in which case the set isn't a hyperplane or half-space but just ∅ or IR^n, depending on c).

Polyhedral sets: A set C ⊂ IR^n is called polyhedral if it can be represented as the intersection of a collection of finitely many hyperplanes or closed half-spaces, or in other words, specified by a finite system of linear constraints. (The whole space IR^n is regarded as fitting this description by virtue of being the intersection of the "empty collection" of hyperplanes. The empty set fits because it can be viewed as the intersection of two parallel hyperplanes with no point in common.)

Argument: When a set is specified by a collection of constraints, it is the intersection of the sets specified by each of these constraints individually.

Inequalities alone: In the definition of "polyhedral" it would be enough to speak just of closed half-spaces, inasmuch as any hyperplane is itself the intersection of the two closed half-spaces associated with it.

Boxes as a special case: Any box is in particular a polyhedral set, since it's determined by upper or lower bounds on coordinates x_j of x = (x_1, ..., x_n), each of which could be expressed in the form d·x ≤ c or d·x ≥ c for a vector d lining up with some coordinate axis. A box is thus an intersection of certain closed half-spaces.
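A polyhedral set is easy to test membership in once it is written as an intersection of closed half-spaces. In this sketch the half-spaces describe the unit square in IR^2 (a box, hence polyhedral, matching "Boxes as a special case"); the data are an illustrative choice:

```python
# Sketch: a polyhedral set as an intersection of closed half-spaces
# {x | d.x <= c}.  The four half-spaces below cut out the unit square in IR^2.

def dot(d, x):
    return sum(di * xi for di, xi in zip(d, x))

# each pair (d, c) represents the closed half-space {x | d.x <= c}
half_spaces = [
    (( 1.0,  0.0), 1.0),   #  x1 <= 1
    ((-1.0,  0.0), 0.0),   # -x1 <= 0, i.e. x1 >= 0
    (( 0.0,  1.0), 1.0),   #  x2 <= 1
    (( 0.0, -1.0), 0.0),   # -x2 <= 0, i.e. x2 >= 0
]

def in_polyhedron(x):
    # x lies in the set iff it satisfies every linear constraint,
    # i.e. lies in the intersection of all the half-spaces
    return all(dot(d, x) <= c for d, c in half_spaces)
```

Note that only "≤" half-spaces are needed, in line with the remark "Inequalities alone": a "≥" constraint is encoded by negating d, and an equality by a pair of opposite inequalities.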
Linear subspaces as a special case: Polyhedral sets don't have to have "corners" or "edges." For instance, any subspace of IR^n is polyhedral, since by linear algebra it can be specified by finitely many homogeneous linear equations.

Geometric interpretation of linear programming: The feasible set C in any linear programming problem is a certain polyhedral set. The function f_0 being minimized over C is a linear function, so (unless f_0 ≡ 0) its "isosurfaces" {x | f_0(x) = α}, as α ranges over IR, form a family of parallel hyperplanes H_α. The gradient of f_0, which is the same at all points x, is a certain vector that points in a direction perpendicular to all these hyperplanes, toward increases in α.

In this picture, f_0 takes on a value α somewhere in C if and only if the hyperplane H_α meets C, i.e., has H_α ∩ C ≠ ∅. In minimizing f_0 over C, one is seeking the "lowest" of these hyperplanes that still meets C.

Penalties and the choice of objectives: The standard problem format suggests that a modeler should approach a situation looking for a family of functions f_i of certain decision variables x_j, one of these functions being the objective function, and the rest, constraint functions. But reality can be murkier. The distinction between what should be set up as a constraint and what should be incorporated into the expression to be minimized may be quite subtle and even in some cases just a matter of the notation being adopted.

Hard versus soft constraints: Some kinds of constraints are "hard" in the sense of representing intrinsic conditions that can't be violated. For instance, a vector (x_1, ..., x_n) may give a system of probabilities or weights through the stipulation that x_1 ≥ 0, ..., x_n ≥ 0 and x_1 + ··· + x_n = 1. It wouldn't make sense to consider the alternative where x_1 ≥ −.0001 and x_1 + ··· + x_n = .9999. Constraints of such type are often built into the specification of the set X in problems in the standard format. Other constraints may have quite a different, "soft" character.
For instance, in asking that a mechanism under design have a strength coefficient of at least .78, the modeler may be expressing a general desire that could be changed a bit once the costs and trade-offs are better known. A coefficient value of .76 may be quite acceptable, once it is realized that the difference could cost a fortune.

Penalty expressions: In dealing with soft constraints f_i(x) ≤ 0 or f_i(x) = 0, it may be better in many cases to introduce a penalty expression instead. Thus, instead of enforcing an exact constraint, a term ϕ_i ∘ f_i could be added to the objective, where (for the inequality constraint) ϕ_i(t) = 0 when t ≤ 0 but ϕ_i(t) > 0 when t > 0.
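The effect of such a penalty term can be sketched numerically. Here the quadratic penalty ϕ(t) = r·max(0, t)^2 and the one-dimensional problem data are assumptions chosen for illustration (the notes only require ϕ(t) = 0 for t ≤ 0 and ϕ(t) > 0 for t > 0):

```python
# Sketch of a penalty expression for a soft constraint f1(x) <= 0: instead of
# enforcing the constraint exactly, add phi(f1(x)) to the objective.
# phi(t) = r * max(0, t)^2 and the problem data are illustrative choices.

def phi(t, r=100.0):
    # zero while the constraint holds, positive and growing once it is violated
    return r * max(0.0, t) ** 2

def f0(x):        # objective: prefer x as large as possible
    return -x

def f1(x):        # soft constraint f1(x) <= 0, i.e. x <= 3
    return x - 3.0

def penalized(x):
    return f0(x) + phi(f1(x))

# crude scan: the penalized minimum sits slightly past x = 3, since a small
# constraint violation is traded off against improvement in the objective
grid = [k / 1000.0 for k in range(0, 5001)]
x_star = min(grid, key=penalized)
```

This captures the "soft" character described above: the penalized solution violates x ≤ 3 by a small amount, and increasing the penalty parameter r pushes it back toward exact feasibility.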
