Lagrangian Optimization Methods for Nonlinear Programming

1 Lagrangian Methods

1.1 Lagrangian methods

Let P(b) denote the optimization problem

  minimize f(x), subject to h(x) = b, x ∈ X.

Let x ∈ X(b) = {x ∈ X : h(x) = b}. We say that x is feasible if x ∈ X(b). Define the Lagrangian as

  L(x, λ) = f(x) − λ^⊤(h(x) − b).

Typically X ⊆ ℝ^n and h : ℝ^n → ℝ^m, with b, λ ∈ ℝ^m. Here λ is called a Lagrangian multiplier.

Theorem 1.1 (Lagrangian Sufficiency Theorem) If x̄ is feasible for P(b) and there exists λ̄ such that

  inf_{x∈X} L(x, λ̄) = L(x̄, λ̄),

then x̄ is optimal for P(b).

Proof of the LST. For all x ∈ X(b) and all λ we have

  f(x) = f(x) − λ^⊤(h(x) − b) = L(x, λ).

Now x̄ ∈ X(b) ⊆ X and so, by assumption,

  f(x̄) = L(x̄, λ̄) ≤ L(x, λ̄) = f(x), for all x ∈ X(b).

Thus x̄ is optimal for the optimization problem P(b).

Example 1.1 Minimize x_1² + x_2² subject to a_1x_1 + a_2x_2 = b, x_1, x_2 ≥ 0. Here L = x_1² + x_2² − λ(a_1x_1 + a_2x_2 − b). We consider the problem

  minimize_{x_1, x_2 ≥ 0} x_1² + x_2² − λ(a_1x_1 + a_2x_2 − b).

This has a stationary point where (x_1, x_2) = (λa_1/2, λa_2/2). Now we choose λ such that a_1x_1 + a_2x_2 = b. This happens for λ = 2b/(a_1² + a_2²). We have a minimum since ∂²L/∂x_i² > 0 and ∂²L/∂x_1∂x_2 = 0. Thus with this value of λ the conditions of the LST are satisfied and the optimal value is b²/(a_1² + a_2²) at (x_1, x_2) = (a_1b, a_2b)/(a_1² + a_2²).

1.2 The Dual Problem

Let us define

  φ(b) = inf_{x∈X(b)} f(x) and g(λ) = inf_{x∈X} L(x, λ).

Then for all λ

  φ(b) = inf_{x∈X(b)} L(x, λ) ≥ inf_{x∈X} L(x, λ) = g(λ).   (1.1)

Thus g(λ) is a lower bound on φ(b), i.e., a lower bound on the solution value of P(b). As this holds for all λ it is interesting to make this lower bound as large as possible. Of course we can restrict ourselves to λ for which g(λ) > −∞. This motivates the dual problem, defined as

  maximize g(λ), subject to λ ∈ Y,

where Y = {λ : g(λ) > −∞}. In (1.1) we have a proof of the so-called weak duality theorem that

  inf_{x∈X(b)} f(x) ≥ max_{λ∈Y} g(λ).   (1.2)

The left hand side of (1.2) poses the primal problem.

1.3 Strong Lagrangian

We say that P(b) is Strong Lagrangian if there exists λ such that

  φ(b) = inf_{x∈X} L(x, λ).   (1.3)

In other words, P(b) is Strong Lagrangian if it can be solved by the Lagrangian method. But when does this happen? Usually we just try the method and see. If we are lucky, as in Example 1.1, then we will be able to establish that there exists a λ that lets us solve P(b) this way. However, there are important classes of problems for which we can guarantee that Lagrangian methods always work.

Note that (1.3) says that there is a λ such that φ(b) = g(λ). Combining this with (1.2), we see that if the problem is Strong Lagrangian then min of primal = max of dual.

1.4 Hyperplanes

Let the hyperplane (c, α) be given by

  α = β − λ^⊤(b − c).

It has intercept at β on the vertical axis through b, and has slope(s) λ. Consider the following approach to finding φ(b):

1. For each λ, find β_λ ≡ max β such that the hyperplane lies completely below the graph of φ.
2. Now choose λ to maximize β_λ.

[Figure: two plots of φ(c) against c. In Case 1 the hyperplane is tangent at b and β_λ = φ(b); in Case 2 it is not, and β_λ < φ(b).]

Lagrangian methods work in Case 1 because of the existence of a tangent to φ at b.
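As a concrete check of Example 1.1 and the bounds (1.1)–(1.2), here is a minimal numeric sketch (assuming NumPy is available; the data a = (1, 2), b = 5 are illustrative, not from the notes):

```python
import numpy as np

a = np.array([1.0, 2.0])   # illustrative data, not from the notes
b = 5.0
Q = a @ a                  # a1^2 + a2^2

# Lagrangian method of Example 1.1: stationary point x_i = lam * a_i / 2,
# with lam chosen so that a.x = b.
lam = 2 * b / Q
x = lam * a / 2
phi = x @ x                # primal value b^2 / (a1^2 + a2^2)

# For lam >= 0 the inner infimum is attained at the stationary point, so
# g(lam) = inf_{x >= 0} L(x, lam) = lam*b - lam^2 * Q / 4.
g = lambda l: l * b - l**2 * Q / 4
grid = np.linspace(0, 2 * lam, 1001)
assert g(grid).max() <= phi + 1e-9   # weak duality (1.2)
print(x, phi, g(lam))                # g(lam*) = phi(b): the problem is Strong Lagrangian
```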
Define a supporting hyperplane (c, α) at b as

  α = φ(b) − λ^⊤(b − c), where φ(c) ≥ φ(b) − λ^⊤(b − c) for all c ∈ ℝ^m.

In fact, β_λ = g(λ) = min_{x∈X} L(x, λ). To see this, we argue

  g(λ) = inf_{x∈X} L(x, λ)
       = inf_{c∈ℝ^m} inf_{x∈X(c)} [ f(x) − λ^⊤(h(x) − b) ]
       = inf_{c∈ℝ^m} [ φ(c) − λ^⊤(c − b) ]
       = sup{ β : β − λ^⊤(b − c) ≤ φ(c) for all c ∈ ℝ^m }
       = β_λ.

Hence, the dual problem is maximize_λ β_λ. Again, we see the weak duality result of max_λ β_λ ≤ φ(b), with equality if the problem is Strong Lagrangian.

Theorem 1.2 The following are equivalent:
(a) there exists a (non-vertical) supporting hyperplane to φ at b;
(b) the problem is Strong Lagrangian.

This is important since a (non-vertical) supporting hyperplane exists if φ(b) is a convex function of b. We can find conditions that make φ convex.

Proof. Suppose there exists a (non-vertical) supporting hyperplane to φ at b. This means that there exists λ such that

  φ(b) − λ^⊤(b − c) ≤ φ(c) for all c ∈ ℝ^m.

This implies

  φ(b) ≤ inf_{c∈ℝ^m} [ φ(c) − λ^⊤(c − b) ]
       = inf_{c∈ℝ^m} inf_{x∈X(c)} [ f(x) − λ^⊤(h(x) − b) ]
       = inf_{x∈X} L(x, λ)
       = g(λ).

However, we have the opposite inequality in (1.1). Hence φ(b) = g(λ). This means that P(b) is Strong Lagrangian, i.e., can be solved by minimizing L(x, λ) with respect to x.

Conversely, if the problem is Strong Lagrangian then there exists λ such that for all x ∈ X

  φ(b) ≤ f(x) − λ^⊤(h(x) − b).

Imagine minimizing the right hand side over x ∈ X(c), where h(x) = c. This gives

  φ(b) ≤ φ(c) − λ^⊤(c − b).

This is true for all c, and hence

  φ(b) − λ^⊤(b − c) ≤ φ(c) for all c ∈ ℝ^m.

Hence, φ has a (non-vertical) supporting hyperplane at b.

2 Linear Programming

2.1 Convexity and Lagrangian methods

1. A set S is a convex set if for all 0 ≤ δ ≤ 1

  x, y ∈ S ⟹ δx + (1 − δ)y ∈ S.

2. A real-valued f is a convex function if for all x, y ∈ S and 0 ≤ δ ≤ 1

  δf(x) + (1 − δ)f(y) ≥ f(δx + (1 − δ)y).

3. A point x is an extreme point of S if whenever

  x = δy + (1 − δ)z

for some y, z ∈ S and 0 < δ < 1, then x = y = z.

Theorem 2.1 (Supporting Hyperplane Theorem) Suppose φ is convex and b lies in the interior of the set of points where φ is finite. Then there exists a (non-vertical) supporting hyperplane to φ at b.

So, we are interested in conditions on the problem that make φ convex.

Theorem 2.2 Consider the problem P(b), defined as

  minimize_{x∈X} f(x) subject to h(x) ≤ b.

If X is a convex set and f and h are convex then φ is convex.

Proof. Take b_1, b_2 and b = δb_1 + (1 − δ)b_2 for 0 < δ < 1, with b_1, b_2 such that φ is defined. Take x_i feasible for P(b_i) for i = 1, 2 and consider x = δx_1 + (1 − δ)x_2. Then X convex, x_1, x_2 ∈ X implies that x ∈ X. Also, h convex gives

  h(x) = h(δx_1 + (1 − δ)x_2) ≤ δh(x_1) + (1 − δ)h(x_2) ≤ δb_1 + (1 − δ)b_2 = b.

So x is feasible for P(b). So, if f is convex,

  φ(b) ≤ f(x) = f(δx_1 + (1 − δ)x_2) ≤ δf(x_1) + (1 − δ)f(x_2).

This holds for all x_1 ∈ X(b_1) and x_2 ∈ X(b_2), so taking infimums gives

  φ(b) ≤ δφ(b_1) + (1 − δ)φ(b_2),

so that φ is convex.

Remark. Consider the constraint h(x) = b. This is the same as h(x) ≤ b and −h(x) ≤ −b. So φ is convex under these constraints if X is a convex set and f, h and −h are all convex. Thus h should be linear in x.

2.2 Linear programs

We will study problems of the form

  minimize { c^⊤x : Ax ≤ b, x ≥ 0 },

where x and c are n-vectors, b is an m-vector and A is an m×n matrix. Such problems are also written out in the longer form

  minimize c^⊤x, subject to Ax ≤ b, x ≥ 0.

(For a thorough introduction to the topic of linear programming see Richard Weber's course on Optimization, available at http://www.statslab.cam.ac.uk/~rrw1/opt/.)

Example

  minimize −(x_1 + x_2)
  subject to
    x_1 + 2x_2 ≤ 6
    x_1 − x_2 ≤ 3
    x_1, x_2 ≥ 0.

[Figure: the feasible region bounded by the lines x_1 = 0, x_2 = 0, x_1 + 2x_2 = 6 and x_1 − x_2 = 3, with the basic solutions labelled A–F and level lines x_1 + x_2 = const.]
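The example above can be checked with an off-the-shelf solver. A minimal sketch, assuming SciPy's linprog is available:

```python
import numpy as np
from scipy.optimize import linprog

# The example LP of Section 2.2: minimize -(x1 + x2)
# subject to x1 + 2*x2 <= 6, x1 - x2 <= 3, x1, x2 >= 0.
c = np.array([-1.0, -1.0])
A_ub = np.array([[1.0, 2.0], [1.0, -1.0]])
b_ub = np.array([6.0, 3.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, res.fun)   # expect x = (4, 1) and objective -5: vertex C
```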
2.3 Duality of linear programs

The primal LP optimization problems

  (LP =): minimize { c^⊤x : Ax = b, x ≥ 0 }
  (LP ≥): minimize { c^⊤x : Ax ≥ b, x ≥ 0 }

have corresponding dual problems

  Dual of (LP =): maximize { b^⊤λ : A^⊤λ ≤ c }
  Dual of (LP ≥): maximize { b^⊤λ : A^⊤λ ≤ c, λ ≥ 0 }.

2.4 Derivation of the dual LP problem

Consider (LP ≥), and introduce slack variables z to form the problem

  minimize c^⊤x, subject to Ax − z = b, x ≥ 0, z ≥ 0.

So the set X ⊂ ℝ^{n+m} is given by X = { (x, z) : x ≥ 0, z ≥ 0 }. We use a Lagrangian approach. The Lagrangian is

  L((x, z); λ) = c^⊤x − λ^⊤(Ax − z − b) = (c^⊤ − λ^⊤A)x + λ^⊤z + λ^⊤b,

with finite minimum over (x, z) ∈ X if and only if

  λ ∈ Y = { λ : λ ≥ 0, c^⊤ − λ^⊤A ≥ 0 }.

The minimum of L((x, z); λ) for λ ∈ Y is attained where both (c^⊤ − λ^⊤A)x = 0 and λ^⊤z = 0, so that

  g(λ) ≡ inf_{(x,z)∈X} L((x, z); λ) = λ^⊤b.

Hence the form of the dual problem given above.

2.5 Shadow prices

The Lagrange multipliers play the role of prices since we have that

  ∂φ/∂b_i = ∂g(λ)/∂b_i = λ_i.

The variables λ_i are also known as shadow prices.

2.6 Conditions for optimality

For the (LP ≥) problem, x and λ are primal and dual optimal respectively if and only if x is primal feasible, λ is dual feasible and, in addition, for any i = 1, ..., n and j = 1, ..., m,

  (c^⊤ − λ^⊤A)_i x_i = 0 = λ_j (Ax − b)_j.

These are known as the complementary slackness conditions.

2.7 Basic insight

If an LP has a finite optimum then it has an optimum at an extreme point of the feasible set. There are a finite number of extreme points, so an algorithm for solving the LP is:

• Find all the vertices of the feasible set.
• Pick the best one.

Our example has an optimum at C. However, there are (n+m choose m) vertices, so this algorithm could take a long time.

2.8 Basic solutions

A basic solution to Ax = b is a solution with at least n − m zero variables. The solution is non-degenerate if exactly n − m variables are zero. The choice of the m non-zero variables is called the basis. Variables in the basis are called basic; the others are called non-basic. If a basic solution satisfies x ≥ 0 then it is called a basic feasible solution (bfs). The following is a theorem:

  The basic feasible solutions are the extreme points of the feasible set.

In our example, the vertices A–F are basic solutions (and non-degenerate) and A–D are basic feasible solutions.
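A quick NumPy check of the complementary slackness conditions and the shadow prices for the example LP (a sketch; the multipliers λ = (2/3, 1/3) are read off the final simplex tableau computed in Section 3.6 below):

```python
import numpy as np

# Example LP in minimization form: c^T x with c = (-1, -1), A x <= b.
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 2.0], [1.0, -1.0]])
b = np.array([6.0, 3.0])

x_opt = np.array([4.0, 1.0])            # vertex C
lam = np.array([2.0/3.0, 1.0/3.0])      # shadow prices from the final tableau

print(c + A.T @ lam)         # reduced costs (0, 0): dual feasibility holds with equality
print(b - A @ x_opt)         # slacks (0, 0): complementary slackness is trivial here
print(c @ x_opt, -lam @ b)   # primal value -5 equals the dual value
```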
3 The Simplex Algorithm

1. Start with a bfs.
2. Test whether this bfs is optimal.
3. If YES then stop.
4. If NO then move to an 'adjacent' bfs which is better. Return to step 2.

3.1 Algebraic viewpoint

A basis, B, is a choice of m non-zero variables. For any x satisfying the constraints Ax = b, we can write

  A_B x_B + A_N x_N = b,

where A_B is an m×m matrix, A_N is an m×(n−m) matrix, x_B and b are m-vectors and x_N is an (n−m)-vector. A basic solution has x_N = 0 and A_B x_B = b, and a basic feasible solution has x_N = 0, A_B x_B = b and x_B ≥ 0. As we have seen, if there exists a finite optimum then there exists a bfs that is optimal.

Nondegeneracy assumptions

We will assume that the following assumptions hold. (If they do not, then they will do so for a small perturbation of the data.)

1. The matrix A has linearly independent rows, i.e., rank(A) = m.
2. Any m×m matrix formed from m columns of A is non-singular.
3. All basic solutions A_B x_B = b, x_N = 0 have exactly m non-zero variables, i.e., x_i ≠ 0 for i ∈ B.

3.2 Simplex tableau

Now for any x with Ax = b, we have x_B = A_B^{−1}(b − A_N x_N). Hence

  f(x) = c^⊤x = c_B^⊤x_B + c_N^⊤x_N
       = c_B^⊤A_B^{−1}(b − A_N x_N) + c_N^⊤x_N
       = c_B^⊤A_B^{−1}b + (c_N^⊤ − c_B^⊤A_B^{−1}A_N)x_N.

We can assemble this information in a tableau:

        basic   non-basic
        I       A_B^{−1}A_N                   | A_B^{−1}b
        0       c_N^⊤ − c_B^⊤A_B^{−1}A_N      | −c_B^⊤A_B^{−1}b

3.3 Test for optimality

Suppose we want to maximize c^⊤x and we find

  (c_N^⊤ − c_B^⊤A_B^{−1}A_N) ≤ 0 and A_B^{−1}b ≥ 0.

Then for all feasible x (since x ≥ 0 ⟹ x_N ≥ 0)

  f(x) = c_B^⊤A_B^{−1}b + (c_N^⊤ − c_B^⊤A_B^{−1}A_N)x_N ≤ c_B^⊤A_B^{−1}b.

But for the bfs x̂ with x̂_B = A_B^{−1}b and x̂_N = 0 we have f(x̂) = c_B^⊤A_B^{−1}b. So x̂ is optimal. This gives us an easy way to check if a given bfs is optimal.

3.4 Choice of new bfs

Alternatively, if some (c_N^⊤ − c_B^⊤A_B^{−1}A_N)_i is positive we can increase the value of the objective function by increasing from zero the value of (x_N)_i. We would like to increase (x_N)_i by as much as possible. However, we need to keep the constraints satisfied. So as we alter (x_N)_i the other variables alter, and we must stop increasing (x_N)_i if one becomes zero. The net effect is that we interchange one basic variable with one non-basic variable.

3.5 Simplex algorithm

1. Find an initial bfs with basis B.
2. Check the sign of (c_N^⊤ − c_B^⊤A_B^{−1}A_N)_i for i ∈ N. Are all components non-positive?
3. If YES then we are at an optimum. Stop.
4. If NO, so that (c_N^⊤ − c_B^⊤A_B^{−1}A_N)_i > 0, say, with i ∈ N, increase (x_N)_i as much as possible. Either we can do this indefinitely, which means the maximum is unbounded: stop. Or one of the x_B variables becomes zero, giving a new bfs: repeat from step 2.

3.6 Simplex algorithm: tableau form

0. Find an initial basic feasible solution. The tableau takes the form

  (a_ij)  | (a_i0)
  (a_0j)  | (a_00)

This is easy when the constraints are of the form Ax ≤ b, b ≥ 0. We can write this as

  Ax + z = b, z ≥ 0,

and take the initial basic feasible solution x = 0, z = b ≥ 0. It is best to think of this as extending x to (x, z) and then setting (x_B, x_N) = (z, x) = (b, 0).

1. Choose a variable to enter the basis. Look for a j such that a_0j > 0. Column j is called the pivot column and the variable corresponding to column j will enter the basis. If a_0j ≤ 0 for all j ≥ 1 then the current solution is optimal. If there is more than one j such that a_0j > 0, choose any one. A common rule-of-thumb is to choose the j for which a_0j is most positive. Alternatively, we could choose the least j ≥ 1 for which a_0j > 0.

2. Find the variable to leave the basis. Choose i to minimize a_i0/a_ij from the set { i : a_ij > 0 }. Row i is called the pivot row and a_ij is called the pivot. If a_ij ≤ 0 for all i then the problem is unbounded and the objective function can be increased without limit. If there is more than one i minimizing a_i0/a_ij, the problem has a degenerate basic feasible solution.

3. Pivot on the element a_ij. The purpose of this step is to get the equations into the appropriate form for the new basic feasible solution.

• Multiply row i by 1/a_ij.
• Add −(a_kj/a_ij) × (row i) to each row k ≠ i, including the objective function row.

The new tableau form (after re-arranging rows and columns) is as at the start of this section. Now return to Step 1.

In our example we have at this point

              x_1   x_2   z_1   z_2 | a_i0
  z_1 basic    1     2     1     0  |  6
  z_2 basic    1    −1     0     1  |  3
  a_0j         1     1     0     0  |  0

Pivoting on a_21 (x_1 enters, z_2 leaves the basis), we reach

              x_1   x_2   z_1   z_2 | a_i0
  z_1 basic    0     3     1    −1  |  3
  x_1 basic    1    −1     0     1  |  3
  a_0j         0     2     0    −1  | −3

One further iteration (pivoting on a_12) brings us to the optimum:

              x_1   x_2   z_1    z_2  | a_i0
  x_2 basic    0     1    1/3   −1/3  |  1
  x_1 basic    1     0    1/3    2/3  |  4
  a_0j         0     0   −2/3   −1/3  | −5

This corresponds to the bfs x_1 = 4, x_2 = 1, z_1 = z_2 = 0, i.e., vertex C.
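The pivoting arithmetic of Section 3.6 is easy to mechanise. A minimal sketch (the function pivot is our own helper, not a library routine) that reproduces the tableaus above:

```python
import numpy as np

def pivot(T, i, j):
    """Pivot the tableau T (objective in the last row) on element T[i, j]."""
    T = T.astype(float).copy()
    T[i] /= T[i, j]                      # multiply pivot row by 1/a_ij
    for k in range(T.shape[0]):
        if k != i:
            T[k] -= T[k, j] * T[i]       # clear the pivot column in row k
    return T

# Tableau for the example: columns x1, x2, z1, z2, a_i0.
T = np.array([[1.0,  2.0, 1.0, 0.0, 6.0],
              [1.0, -1.0, 0.0, 1.0, 3.0],
              [1.0,  1.0, 0.0, 0.0, 0.0]])

T = pivot(T, 1, 0)   # x1 enters, z2 leaves
T = pivot(T, 0, 1)   # x2 enters, z1 leaves
print(T)             # objective row ends ..., -2/3, -1/3, -5: optimum at x = (4, 1)
```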
4 Advanced Simplex Procedures

4.1 Two phase simplex method

Suppose we do not have the obvious basic feasible solution. Consider

  maximize −6x_1 − 3x_2                 maximize −6x_1 − 3x_2
  subject to                            subject to
    x_1 + x_2 ≥ 1              ≡          x_1 + x_2 − z_1 = 1
    2x_1 − x_2 ≥ 1                        2x_1 − x_2 − z_2 = 1
    3x_2 ≤ 2                              3x_2 + z_3 = 2
    x_1, x_2 ≥ 0                          x_1, x_2, z_1, z_2, z_3 ≥ 0.

Unfortunately, the basic solution

  x_1 = 0, x_2 = 0, z_1 = −1, z_2 = −1, z_3 = 2

is not feasible. The trick is to add artificial variables, y_1, y_2, to the constraints and then minimize y_1 + y_2 subject to

  x_1 + x_2 − z_1 + y_1 = 1
  2x_1 − x_2 − z_2 + y_2 = 1
  3x_2 + z_3 = 2
  x_1, x_2, z_1, z_2, z_3, y_1, y_2 ≥ 0.

We can take the 'easy' initial bfs of y_1 = 1, y_2 = 1, z_3 = 2, x_1 = 0, x_2 = 0.

In Phase I we minimize y_1 + y_2, starting with y_1 = 1, y_2 = 1 and z_3 = 2. (Notice we did not need an artificial variable in the third equation.) Provided the original problem is feasible we should be able to obtain a minimum of 0 with y_1 = y_2 = 0 (since y_1 and y_2 are not needed to satisfy the constraints if the original problem is feasible). At the end of Phase I the simplex algorithm will have found a bfs for the original problem. Phase II proceeds with the solution of the original problem, starting from this bfs.

Note: the original objective function doesn't enter into Phase I, but it is useful to carry it along as an extra row in the tableau, since the algorithm will then arrange for it to be in the appropriate form to start Phase II. We start with

              x_1   x_2   z_1   z_2   z_3   y_1   y_2 |
  y_1          1     1    −1     0     0     1     0  |  1
  y_2          2    −1     0    −1     0     0     1  |  1
  z_3          0     3     0     0     1     0     0  |  2
  Phase II    −6    −3     0     0     0     0     0  |  0
  Phase I      0     0     0     0     0    −1    −1  |  0

Preliminary step. The Phase I objective must be written in terms of the non-basic variables. This is accomplished by adding rows 1 and 2 to the bottom row, to give

  Phase I      3     0    −1    −1     0     0     0  |  2

Begin Phase I. Pivot on a_21 to get

              x_1   x_2    z_1   z_2   z_3   y_1   y_2  |
  y_1          0    3/2    −1    1/2    0     1   −1/2  | 1/2
  x_1          1   −1/2     0   −1/2    0     0    1/2  | 1/2
  z_3          0     3      0     0     1     0     0   |  2
  Phase II     0    −6      0    −3     0     0     3   |  3
  Phase I      0    3/2    −1    1/2    0     0   −3/2  | 1/2

Pivot on a_14 to get

              x_1   x_2   z_1   z_2   z_3   y_1   y_2 |
  z_2          0     3    −2     1     0     2    −1  |  1
  x_1          1     1    −1     0     0     1     0  |  1
  z_3          0     3     0     0     1     0     0  |  2
  Phase II     0     3    −6     0     0     6     0  |  6
  Phase I      0     0     0     0     0    −1    −1  |  0

End of Phase I. y_1 = y_2 = 0 and we no longer need these variables, so we drop the last two columns and the Phase I objective row. We have a bfs with which to start Phase II, with x_1 = 1, z_2 = 1, z_3 = 2. The rest of the tableau is already in appropriate form, so we rewrite the preceding tableau without the y_1, y_2 columns.

Begin Phase II.

              x_1   x_2   z_1   z_2   z_3 |
  z_2          0     3    −2     1     0  |  1
  x_1          1     1    −1     0     0  |  1
  z_3          0     3     0     0     1  |  2
               0     3    −6     0     0  |  6

In one more step we reach the optimum, by pivoting on a_12:

              x_1   x_2   z_1    z_2   z_3 |
  x_2          0     1   −2/3   1/3    0   | 1/3
  x_1          1     0   −1/3  −1/3    0   | 2/3
  z_3          0     0     2    −1     1   |  1
               0     0    −4    −1     0   |  5

In general, artificial variables are needed when there are constraints like ≤ −1, or ≥ 1, or = 1, unless they happen to be of a special form for which it is easy to spot a bfs. If the Phase I objective cannot be minimized to zero then the original problem is infeasible.

The problem we have solved is the dual of the problem P that we considered in Chapters 2–3, augmented by the constraint 3x_2 ≤ 2. It is interesting to compare the final tableau above with the tableau obtained in solving the primal: they are essentially transposes of one another.
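Phase I of the example above can be checked with a solver. A minimal sketch assuming SciPy is available (HiGHS may return a different, equally valid bfs):

```python
import numpy as np
from scipy.optimize import linprog

# Phase I: minimize y1 + y2 subject to
#   x1 +  x2 - z1           + y1 = 1
#  2x1 -  x2      - z2      + y2 = 1
#        3x2           + z3      = 2,  all variables >= 0.
# Variable order: x1, x2, z1, z2, z3, y1, y2.
c = np.array([0, 0, 0, 0, 0, 1, 1], dtype=float)
A_eq = np.array([[1,  1, -1,  0, 0, 1, 0],
                 [2, -1,  0, -1, 0, 0, 1],
                 [0,  3,  0,  0, 1, 0, 0]], dtype=float)
b_eq = np.array([1, 1, 2], dtype=float)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
print(res.fun)   # 0.0: the original problem is feasible
print(res.x)     # a bfs of the original problem with y1 = y2 = 0
```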
4.2 Primal and dual algorithms

Consider the problem (LP =), defined as minimize { c^⊤x : Ax = b, x ≥ 0 }. This has dual maximize { λ^⊤b : c^⊤ − λ^⊤A ≥ 0 }. At each stage of the primal simplex algorithm, we have a tableau

        basic, x_B ≥ 0                    non-basic, x_N = 0
        I                                 A_B^{−1}A_N                       | A_B^{−1}b ≥ 0
        c_B^⊤ − c_B^⊤A_B^{−1}A_B = 0      c_N^⊤ − c_B^⊤A_B^{−1}A_N, free    | −c_B^⊤A_B^{−1}b

Here we have a basic feasible solution for the primal, x_B = A_B^{−1}b, and a basic (though not necessarily feasible) solution for the dual, λ_B^⊤ = c_B^⊤A_B^{−1}. We always have primal feasibility and complementary slackness. Recall:

  primal feasibility:       Ax = b and x ≥ 0
  dual feasibility:         (c^⊤ − λ^⊤A) ≥ 0
  complementary slackness:  (c^⊤ − λ^⊤A)x = 0

and primal feasibility + dual feasibility + complementary slackness ⟹ optimality.

Primal algorithms maintain primal feasibility and complementary slackness and seek dual feasibility. Dual algorithms maintain dual feasibility and complementary slackness and seek primal feasibility.

4.3 Dual simplex algorithm

The dual simplex algorithm starts with and maintains a primal/dual basic solution that is dual feasible and satisfies complementary slackness while seeking primal feasibility. This can be useful, as it may be easier to spot a dual feasible solution. Consider

  minimize 2x_1 + 3x_2 + 4x_3 s.t.
    x_1 + 2x_2 + x_3 ≥ 3
    2x_1 − x_2 − 3x_3 ≥ 4
    x_1, x_2, x_3 ≥ 0.

Note c_i ≥ 0 for all i. Let us add slack variables, z_i ≥ 0, to obtain

  x_1 + 2x_2 + x_3 − z_1 = 3
  2x_1 − x_2 − 3x_3 − z_2 = 4.

The primal algorithm must use two phases, since z_1 = −3, z_2 = −4 is not primal feasible. However, the tableau contains a dual feasible solution, λ_1 = λ_2 = 0, and c^⊤ − λ^⊤A = (2, 3, 4, 0, 0) ≥ 0:

   −1   −2   −1    1    0  |  −3
   −2    1    3    0    1  |  −4
    2    3    4    0    0  |   0

Rule: for rows i with a_i0 < 0, pick a column j with a_ij < 0 to minimize a_0j/−a_ij.

Pivoting on a_21 gives

    0  −5/2  −5/2   1  −1/2 |  −1
    1  −1/2  −3/2   0  −1/2 |   2
    0    4     7    0    1  |  −4

and then pivoting on a_12 gives

    0    1     1   −2/5   1/5 |   2/5
    1    0    −1   −1/5  −2/5 |  11/5
    0    0     3    8/5   1/5 | −28/5

So the optimum is 28/5, with x_1 = 11/5, x_2 = 2/5, x_3 = 0.

Notice that for problems of the form Ax ≥ b we can write

  ( A  −I ) (x, z)^⊤ = b, z ≥ 0.

Hence

  (c^⊤ − λ^⊤(A −I)) = (2 3 4 0 0) − λ^⊤(A −I) = ( · · · λ_1 λ_2 ).

So the dual variables, λ, can be found in the objective function row under the slack variables in the optimal tableau. E.g., λ = (8/5, 1/5).

We may wish to add constraints to optimal solutions

Suppose we have solved an LP and have the final tableau (an identity block for the basic variables and a non-negative right hand column). Now we wish to add a new constraint

  a_1 x_1 + a_2 x_2 + ··· + a_n x_n ≤ b.

If the optimal solution satisfies this constraint, the solution remains optimal for the new problem. Otherwise, we write the constraint into standard form with a new slack variable and add it to the tableau as an extra row and column; its right hand side entry may be negative. Notice that we still have a dual feasible solution. The problem solution may not be primal feasible; however, we can apply the dual simplex algorithm to find the new optimum under this additional constraint.
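The dual simplex pivots above can be traced with the same kind of pivot helper used in Section 3.6 (a sketch, not a full dual simplex implementation; the pivot choices are hard-coded from the rule above):

```python
import numpy as np

def pivot(T, i, j):
    """Pivot tableau T on element T[i, j] (same routine as in Section 3.6)."""
    T = T.astype(float).copy()
    T[i] /= T[i, j]
    for k in range(T.shape[0]):
        if k != i:
            T[k] -= T[k, j] * T[i]
    return T

# Dual simplex example: columns x1, x2, x3, z1, z2, rhs; objective in last row.
T = np.array([[-1, -2, -1, 1, 0, -3],
              [-2,  1,  3, 0, 1, -4],
              [ 2,  3,  4, 0, 0,  0]], dtype=float)

T = pivot(T, 1, 0)   # row 2 has a_i0 < 0; column x1 minimises a_0j / -a_ij
T = pivot(T, 0, 1)   # row 1 has a_i0 < 0; column x2 minimises a_0j / -a_ij
print(T[-1])         # [0, 0, 3, 8/5, 1/5, -28/5]: optimum 28/5 at x = (11/5, 2/5, 0)
```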
Gomory's cutting plane method

The addition of new constraints is useful in Gomory's cutting plane method for integer linear programming. Consider the final tableau of the dual simplex example above. Pick a row in which the far right hand element is not an integer, say row 1. This says

  x_2 + x_3 − (2/5)z_1 + (1/5)z_2 = 2/5.

Suppose x_1, x_2, x_3, z_1, z_2 are restricted to be non-negative integers. Then we must have

  x_2 + x_3 − 1·z_1 + 0·z_2 ≤ x_2 + x_3 − (2/5)z_1 + (1/5)z_2 = 2/5.

This is because z_1, z_2 ≥ 0 and we have replaced −2/5 and 1/5 by the integers that lie just below them. Since the left hand side is an integer, it can be no more than the integer just below 2/5. So a solution in integers must satisfy

  x_2 + x_3 − z_1 ≤ 0.

However, the present solution does not satisfy this, since x_2 = 2/5, x_3 = z_1 = z_2 = 0. Thus we can add this new constraint (or cutting plane), with a slack variable s ≥ 0, to give

    0    1    1   −2/5   1/5   0 |   2/5
    1    0   −1   −1/5  −2/5   0 |  11/5
    0    0    0   −3/5  −1/5   1 |  −2/5
    0    0    3    8/5   1/5   0 | −28/5

Applying the dual simplex algorithm to this, and repeating the process, we will eventually arrive at the optimal integer solution. In this example we reach the optimum of x_1 = 3, x_2 = x_3 = 0 in just one more iteration: pivoting on the z_2 entry of the cut row gives

    0    1    1   −1    0    1 |   0
    1    0   −1    1    0   −2 |   3
    0    0    0    3    1   −5 |   2
    0    0    3    1    0    1 |  −6

5 Complexity of Algorithms

5.1 Theory of algorithmic complexity

An instance of an optimization problem is defined by its input data. E.g., an instance of linear programming with n variables and m constraints is described by the inputs c, A and b. There are mn + m + n numbers, and if all of them can be expressed in no more than k bits, the instance can be described in a string of (mn + m + n)k bits. This is the instance size.

An optimization problem is solved by a computational algorithm whose running time depends on how it is programmed and the speed of the hardware. A large instance can be easy to solve, such as LP with A = I. However, in general, we expect an algorithm's running time to increase with the size of the instance. Ignoring details of the implementation, the running time depends on the number of arithmetic operations involved. For example, the linear system Ax = b, with A being n×n, can be solved by the algorithm of Gaussian elimination, using O(n³) operations of addition, subtraction, multiplication and division. We define:

• f(n) = O(g(n)) if there exists a c such that f(n) ≤ cg(n) for all n.
• f(n) = Ω(g(n)) if there exists a c such that f(n) ≥ cg(n) for all n.
• f(n) = Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)).

Of course multiplication is more difficult than addition, and so in computing running time we might count operations according to more elementary computer instructions.

5.2 The Travelling Salesman Problem

Given a finite set of points S in the plane, the TSP asks for the shortest tour of S. More formally, given S = {s_1, s_2, ..., s_n}, a shortest tour that visits all points of S is specified by a permutation σ that minimizes the sum of distances

  d(s_σ(1), s_σ(2)) + d(s_σ(2), s_σ(3)) + ··· + d(s_σ(n−1), s_σ(n)) + d(s_σ(n), s_σ(1)),

where d(s_i, s_j) is the distance between points i and j.

In general, it is difficult to prove that a problem does not have a polynomial time algorithm. No polynomial time algorithm is known for TSP. But there is also no proof that a polynomial time algorithm for TSP does not exist. We see in Lecture 6 that the simplex algorithm for LP is an exponential time algorithm. It was not until the 1970s that a polynomial time algorithm was discovered for LP.

5.3 Decision Problems

A decision problem (or recognition problem) is one that takes the form of a question with a yes/no answer. For example, decision-TSP is:

  Given the points S, and L, is there a tour of length ≤ L?   (5.1)

This differs from optimization-TSP (find the shortest tour) and evaluation-TSP (find the length of the shortest tour). Of course the three types of problem are closely related. We focus on decision problems. This avoids problems in which the size of the input or output causes non-polynomial time behaviour.
5.4 P and NP problems

In what follows we make use of Turing's famous proof that the class of things that can be computed is the class of things that can be computed by a Deterministic Turing Machine (DTM). A DTM is essentially a finite-state machine that can read and write to an external storage medium.

When a DTM is given an input x it runs for some number of steps and then outputs an answer, f(x). This number of steps is its running time. There are many Turing machines. Let T_M(n) be the worst-case running time of some Turing machine, say M, over inputs x of size |x| = n. We say that a function f(x) is computable in polynomial time if there exists some Turing machine that can compute f(x) within |x|^k steps (for some fixed k). The definition is robust, since different Turing machines can simulate one another, and more efficient models of computation change the computation time by at most squaring or cubing it. In contrast, if T_M(n) = Ω(2^{cn}) for all M, then f(x) is said to be computable in exponential time.

A decision problem is in P if its answer is computable in polynomial time, i.e., there exists a deterministic Turing machine which, given any instance (with input data x), can compute the answer within a number of steps bounded by |x|^k (for some fixed k).

A decision problem belongs to NP if there exists a checking function r(x, y) such that the answer to the decision problem is yes iff there exists a y (called a certificate) such that r(x, y) = 1, and r(x, y) is computable in polynomial time. For example, if the answer to (5.1) is yes then y could be the order in which the points should be visited. It takes only time O(n) to add up the length of this tour and check it is less than L (this being the computation r(x, y)).

'NP' stands for nondeterministic polynomial. An equivalent definition of NP is that it is the class of decision problems whose answers can be computed in polynomial time on a 'nondeterministic Turing machine' (NDTM). A NDTM consists of many DTMs working in parallel, any one of which can answer yes in polynomial time without consulting the others. Essentially, these computers are carrying out parallel calculations of r(x, y) for all possible y. Either one of them produces the answer yes within time n^k, or the answer is no. A NDTM for (5.1) could consist of (n−1)! DTMs, each of them checking one possible tour to see if its length is less than L.

Clearly, P ⊆ NP. It is believed that P ⊂ NP: that is, there are problems in NP which are not in P. However, this is a major unsolved problem.

[Figure: nested classes P ⊆ NP, with the NP-complete problems forming the hardest part of NP and the NP-hard problems extending beyond NP.]
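The checking function r(x, y) for decision-TSP is easy to write down explicitly. A minimal sketch (the instance data here is illustrative):

```python
import math

def tour_length(points, sigma):
    """Length of the closed tour visiting the points in the order sigma."""
    n = len(sigma)
    return sum(math.dist(points[sigma[k]], points[sigma[(k + 1) % n]])
               for k in range(n))

def r(points, L, sigma):
    """Checking function for decision-TSP: accept the certificate sigma iff
    it is a permutation of the points whose tour length is at most L.
    This runs in polynomial time."""
    return (sorted(sigma) == list(range(len(points)))
            and tour_length(points, sigma) <= L)

points = [(0, 0), (0, 1), (1, 1), (1, 0)]   # illustrative instance
print(r(points, 4.0, [0, 1, 2, 3]))         # True: the unit-square tour has length 4
```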
5.5 Polynomial reduction

When is problem Π_1 no harder than another problem Π_2? We say that Π_1 reduces to Π_2 if we can construct an algorithm for Π_1 as follows:

1. Make a (polynomial time) transformation of the instance of Π_1 into an instance of Π_2.
2. Apply some algorithm to solve this instance of Π_2.
3. Make a (polynomial time) transformation of this solution of Π_2 back into a solution of Π_1.

The idea is that we can use an algorithm that solves Π_2 to solve Π_1, with additional work in steps 1 and 3 that requires at most polynomial time. Thus Π_1 is really no harder than Π_2.

5.6 NP-completeness

Now we can talk about the hardest problems in NP. A problem Π is said to be NP-hard if every problem in NP can be reduced to it. It is said to be NP-complete if moreover Π ∈ NP. Thus all NP-complete problems can be reduced to one another and are as difficult as all problems in NP.

There are many NP-complete problems. LP in which all variables are restricted to be 0 or 1 is NP-complete. TSP is NP-complete. So all problems in NP are no harder than either of these problems. If you can find a polynomial time algorithm for TSP then you have found a polynomial time algorithm for all problems in NP, and it would be true that P = NP. As we said, since no one has ever found a polynomial time algorithm for any NP-complete problem, it is believed that P ≠ NP. To show that a new problem, Π, is NP-complete we must (i) show that Π ∈ NP, and (ii) show that a known NP-complete problem reduces to Π.

5.7 Examples of NP-complete problems

Satisfiability (Cook, 1971) Given a propositional formula with AND's, NOT's, OR's and Boolean (T or F) variables X_1, X_2, ..., X_r, for example

  (X_1 OR NOT X_2) AND (X_3 AND X_4),

is there an assignment of truth values to the variables that makes the formula true? (E.g., X_1 = X_2 = X_3 = X_4 = T in the above example.)

Hamiltonian circuit Given a graph G, is there a set of edges forming a tour of all the vertices? To see that an instance of this is no harder than a TSP, think of a TSP instance with d(s_i, s_j) = 1 if there is an edge from i to j and d(s_i, s_j) = 2 if there is not. Ask, 'is there a tour of length ≤ n?' (A sketch of this reduction follows this list of examples.)

Subgraph isomorphism Given two graphs G, G′, does G contain a subgraph isomorphic to G′? Interestingly, Graph isomorphism (i.e., 'Are graphs G and G′ isomorphic?') is known to be in NP, but it is suspected to be neither in P nor NP-complete.

Clique decision problem Given a graph G and a number k, does G contain a clique of size k (i.e., k vertices all pairs of which are connected together)? E.g., below left: k = 3, yes; k = 4, no.

[Figure: two example graphs, left and right.]

Vertex cover decision problem Given a graph G and a number k, is there a set of k vertices such that every edge of G starts or finishes at one of them? Such a set of vertices is called a vertex cover. E.g., above right: k = 2, no; k = 3, yes.
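Here is the Hamiltonian-circuit-to-TSP reduction described above, as a short sketch:

```python
def hamiltonian_to_tsp(n, edges):
    """Reduction of Section 5.7: build a TSP distance matrix with
    d(i, j) = 1 if {i, j} is an edge of G and 2 otherwise.  G has a
    Hamiltonian circuit iff the TSP instance has a tour of length <= n."""
    E = {frozenset(e) for e in edges}
    d = [[0 if i == j else (1 if frozenset((i, j)) in E else 2)
          for j in range(n)] for i in range(n)]
    return d, n   # (distance matrix, length bound L = n)

d, L = hamiltonian_to_tsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)])   # a 4-cycle
print(d, L)
```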
6 Computational Complexity of LP

6.1 Running time of the simplex algorithm

Worst-case running time

The simplex algorithm moves from one basic feasible solution to an adjacent one, each time improving the value of the objective function. However, it can take an exponentially large number of steps to terminate.

Suppose the feasible region is the cube in ℝ^d defined by the constraints

  0 ≤ x_i ≤ 1, i = 1, ..., d,

and we seek to maximize x_d. There are 2^d vertices, and there are paths along the edges which visit all the vertices before terminating at (0, 0, ..., 1).

[Figure: such paths along the edges of the cube for d = 2 and d = 3.]

Given 0 < ε < 1/2, consider the perturbed unit cube given by the constraints

  ε ≤ x_1 ≤ 1, εx_{i−1} ≤ x_i ≤ 1 − εx_{i−1}, i = 2, ..., d.

It can be verified that the cost function increases strictly with each move along the path.

[Figure: the perturbed cube for d = 2, with vertices (ε, ε²), (1, ε), (1, 1−ε), (ε, 1−ε²) visited in the order ABCD; x_2 increases along the route ABCD.]

So if our pivoting rule is always to move to the adjacent bfs for which the entering variable has the least index (so-called Bland's rule), then the simplex algorithm will require 2^d − 1 pivoting steps before terminating. With this pivoting rule the simplex algorithm has exponential worst-case time complexity. Observe that the initial and final vertices are adjacent, and a different pivoting rule could reach the optimum in only one step. However, for all common pivoting rules that have been studied there is some instance for which the number of steps until termination is exponential. It is unknown whether there is a pivoting rule that might make the simplex algorithm more efficient. This is related to the Hirsch conjecture: that for a polytope in ℝ^n defined by m inequality constraints, the number of pivots required to move from one bfs to another bfs is never more than m − n.

Average-case running time

Not all algorithms with exponential time complexity are bad in practice. Often, they can have very good performance. The simplex algorithm appears to perform well on average. The difficulty is defining what is meant by 'on average'.

6.2 The complexity of LP

We shall investigate alternative algorithms for solving LP. In particular, we seek a polynomial time algorithm. If we can show that LP ∈ P then this is likely to tell us something about good algorithms in practice. There are two classes of methods for solving LP: boundary value methods and interior point methods.

The size of an LP instance

Any non-negative integer r (r ≤ U) can be written in binary form

  r = a_k 2^k + a_{k−1} 2^{k−1} + ··· + a_1 2^1 + a_0 2^0,

where a_0, a_1, ..., a_k are 0 or 1. The number k is at most ⌊log_2 U⌋. Thus, using an extra bit for the sign, we can represent any integer r with |r| ≤ U by at most (⌊log_2 U⌋ + 2) bits.

An instance of an LP problem is given by an m×n matrix A, an m-vector b and an n-vector c. So, assuming that the largest magnitude of any of the components is U, an LP instance has a size in bits of

  (mn + m + n)(⌊log_2 U⌋ + 2).
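As a small illustration, the instance-size formula can be evaluated directly (a trivial sketch; the dimensions are made up):

```python
import math

def lp_instance_size_bits(m, n, U):
    """Size in bits of an LP instance (A, b, c) with an m x n matrix A and
    entries of magnitude at most U, as in Section 6.2."""
    return (m * n + m + n) * (math.floor(math.log2(U)) + 2)

print(lp_instance_size_bits(10, 20, 1000))   # (200 + 30) * (9 + 2) = 2530
```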
6.3 Feasibility problem

Consider the primal/dual pair:

  P: minimize { c^⊤x : Ax ≥ b }
  D: maximize { b^⊤λ : A^⊤λ = c, λ ≥ 0 }.

By the strong duality of linear programming, each problem has an optimal solution if and only if there is a feasible solution to

  b^⊤λ = c^⊤x, Ax ≥ b, A^⊤λ = c, λ ≥ 0.

Thus we can solve LP if we can solve a feasibility problem like this. We shall therefore focus on feasibility and the decision problem

  Is the polyhedron P = { x ∈ ℝ^n : Ax ≥ b } non-empty?

The algorithm that we shall use is known as the ellipsoid method.

6.4 Preliminaries for ellipsoid method

Definitions 6.1

1. Let D be an n×n positive definite symmetric matrix. The set

  E = E(z, D) = { x ∈ ℝ^n : (x − z)^⊤D^{−1}(x − z) ≤ 1 }

is called an ellipsoid with centre at z ∈ ℝ^n.

2. Let D be an n×n non-singular matrix and t ∈ ℝ^n. The mapping S : ℝ^n → ℝ^n defined by S(x) = Dx + t is called an affine transformation.

3. The volume of a set L ⊂ ℝ^n, denoted by Vol(L), is defined by

  Vol(L) = ∫_{x∈L} dx.

We shall use the result that if S is given by the affine transformation S(x) = Dx + t then

  Vol(S(L)) = |det(D)| Vol(L).

6.5 Intuitive description of the ellipsoid method

We generate a sequence of ellipsoids E_t. E_t has centre x_t, such that

• If x_t ∈ P, then P is non-empty and the method stops.
• If x_t ∉ P, then there is a violated constraint with A_i x_t < b_i, where A_i is some row of A and b_i is the matching element of b.

Thus P is contained in the half-space { x ∈ ℝ^n : A_i x ≥ A_i x_t }. Call the intersection of this half-space with E_t a half-ellipsoid. We construct the ellipsoid E_{t+1} in such a way that it covers this half-ellipsoid and has volume only a fraction of that of E_t. We repeat this procedure until either we find a point of P or we conclude that the volume of P is very small and therefore can be taken as empty.

[Figure: ellipsoid E_t with centre x_t, the cut A_i^⊤x ≥ A_i^⊤x_t through the centre, and the new ellipsoid E_{t+1} covering the half-ellipsoid that contains P.]

The key result is as follows.

Theorem 6.1 Let E = E(z, D) be an ellipsoid in ℝ^n, and a be a non-zero n-vector. Consider the half-space H = { x ∈ ℝ^n : a^⊤x ≥ a^⊤z } and let

  z′ = z + (1/(n+1)) · Da/√(a^⊤Da),
  D′ = (n²/(n²−1)) ( D − (2/(n+1)) · Daa^⊤D/(a^⊤Da) ).

Then the matrix D′ is symmetric and positive definite, and thus E′ = E(z′, D′) is an ellipsoid. Moreover,
(a) E ∩ H ⊂ E′,
(b) Vol(E′) < e^{−1/(2(n+1))} Vol(E).

7 The Ellipsoid Method

7.1 Ellipsoid algorithm

Khachiyan's ellipsoid method (1979):

Input
(a) A matrix A and vector b defining the polyhedron P = { x ∈ ℝ^n : A_i^⊤x ≥ b_i, i = 1, ..., m }.
(b) A number v, such that either P is empty or Vol(P) > v.
(c) An ellipsoid (in fact, a ball) E_0 = E(x_0, r²I) with volume at most V, such that P ⊂ E_0.

Output A feasible point x* ∈ P if P is non-empty, or a statement that P is empty.

We will show subsequently that we may take

  v = n^{−n}(nU)^{−n²(n+1)} and V = (2n)^n(nU)^{n²}.

[Figure: the shrinking ellipsoids E_0 ⊃ E_1 ⊃ ··· with centres x_0, x_1, ..., closing in on P.]

Initialize step. Let

  t* = ⌈2(n+1) log(V/v)⌉, E_0 = E(x_0, r²I), D_0 = r²I, t = 0.

Main iteration
(a) If t = t*, stop; P is empty.
(b) If x_t ∈ P, stop; P is non-empty.
(c) If x_t ∉ P, find a violated constraint, i, such that A_i^⊤x_t < b_i.
(d) Let H_t = { x ∈ ℝ^n : A_i^⊤x ≥ A_i^⊤x_t }. Construct an ellipsoid E_{t+1} = E(x_{t+1}, D_{t+1}) containing E_t ∩ H_t, with

  x_{t+1} = x_t + (1/(n+1)) · D_tA_i/√(A_i^⊤D_tA_i),
  D_{t+1} = (n²/(n²−1)) ( D_t − (2/(n+1)) · D_tA_iA_i^⊤D_t/(A_i^⊤D_tA_i) ).

(e) t := t + 1; return to (a).

7.2 Proof that E ∩ H ⊂ E′

First, consider the case z = 0, D = I and a = e_1 = (1, 0, ..., 0)^⊤. So E_0 = { x ∈ ℝ^n : x^⊤x ≤ 1 } and H_0 = { x ∈ ℝ^n : x_1 ≥ 0 }. Hence

  E′_0 = E( e_1/(n+1), (n²/(n²−1)) ( I − (2/(n+1)) e_1e_1^⊤ ) ).

Re-writing this, we have

  E′_0 = { x ∈ ℝ^n : ((n+1)²/n²)(x_1 − 1/(n+1))² + ((n²−1)/n²) Σ_{i=2}^n x_i² ≤ 1 }
       = { x ∈ ℝ^n : ((n²−1)/n²) Σ_{i=1}^n x_i² + (2(n+1)/n²)x_1² − (2(n+1)/n²)x_1 + 1/n² ≤ 1 },

so that

  E′_0 = { x ∈ ℝ^n : ((n²−1)/n²) Σ_{i=1}^n x_i² + 1/n² + (2(n+1)/n²) x_1(x_1 − 1) ≤ 1 }.

Now suppose x ∈ E_0 ∩ H_0. Then 0 ≤ x_1 ≤ 1, and so x_1(x_1 − 1) ≤ 0. Also, Σ_{i=1}^n x_i² ≤ 1. Hence

  ((n²−1)/n²) Σ_{i=1}^n x_i² + 1/n² + (2(n+1)/n²) x_1(x_1 − 1) ≤ (n²−1)/n² + 1/n² = 1,

which verifies that x ∈ E′_0, proving that E_0 ∩ H_0 ⊂ E′_0.

Now consider the general case and construct an affine transformation T(·) such that T(E) = E_0, T(H) = H_0 and T(E′) = E′_0. The result then follows since affine transformations preserve set inclusion, i.e., if A ⊂ B ⊂ ℝ^n and T(·) is an affine transformation, then T(A) ⊂ T(B).

Given E = E(z, D), introduce the affine transformation

  T(x) = R D^{−1/2}(x − z),

where R is a rotation matrix which rotates the unit ball so that D^{1/2}a is aligned with the unit vector e_1 = (1, 0, ..., 0)^⊤, i.e.,

  R^⊤R = I and R D^{1/2}a = ‖D^{1/2}a‖ e_1.

So now T(E) = E_0, since

  x ∈ E ⟺ (x − z)^⊤D^{−1}(x − z) ≤ 1
       ⟺ (x − z)^⊤D^{−1/2}R^⊤RD^{−1/2}(x − z) ≤ 1
       ⟺ RD^{−1/2}(x − z) ∈ E_0
       ⟺ T(x) ∈ E_0.

Similarly, T(H) = H_0, since

  x ∈ H ⟺ a^⊤(x − z) ≥ 0
       ⟺ a^⊤D^{1/2}R^⊤RD^{−1/2}(x − z) ≥ 0
       ⟺ e_1^⊤T(x) ≥ 0
       ⟺ T(x) ∈ H_0.

Similarly, one can show T(E′) = E′_0. Above, we proved that E_0 ∩ H_0 ⊂ E′_0, which is equivalent to T(E) ∩ T(H) ⊂ T(E′), which implies E ∩ H ⊂ E′.

7.3 Proof that Vol(E′) < e^{−1/(2(n+1))} Vol(E)

We have that

  Vol(E′)/Vol(E) = Vol(T(E′))/Vol(T(E)) = Vol(E′_0)/Vol(E_0).

Now,

  E′_0 = E( e_1/(n+1), (n²/(n²−1)) ( I − (2/(n+1)) e_1e_1^⊤ ) ).

So, introduce the affine transformation

  F(x) = e_1/(n+1) + ( (n²/(n²−1)) ( I − (2/(n+1)) e_1e_1^⊤ ) )^{1/2} x.

One can easily check that E′_0 = F(E_0). So

  Vol(E′_0) = √( det[ (n²/(n²−1)) ( I − (2/(n+1)) e_1e_1^⊤ ) ] ) · Vol(E_0).

Hence,

  Vol(E′_0)/Vol(E_0) = ( n²/(n²−1) )^{n/2} ( 1 − 2/(n+1) )^{1/2}
                     = ( n/(n+1) ) ( n²/(n²−1) )^{(n−1)/2}
                     = ( 1 − 1/(n+1) ) ( 1 + 1/(n²−1) )^{(n−1)/2}
                     < e^{−1/(n+1)} ( e^{1/(n²−1)} )^{(n−1)/2}
                     = e^{−1/(n+1)} e^{1/(2(n+1))}
                     = e^{−1/(2(n+1))},

using (twice) the inequality 1 + a < e^a (a ≠ 0). Therefore,

  Vol(E′)/Vol(E) < e^{−1/(2(n+1))}.
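The update step (d) of the algorithm is a few lines of linear algebra. A minimal sketch, assuming NumPy, checked against the special case z = 0, D = I, a = e_1 of Section 7.2:

```python
import numpy as np

def ellipsoid_step(z, D, a):
    """One update of the ellipsoid method (Theorem 6.1 / Section 7.1): given
    E(z, D) and the half-space a^T x >= a^T z, return the new ellipsoid
    E(z', D') covering the half-ellipsoid."""
    n = len(z)
    Da = D @ a
    s = np.sqrt(a @ Da)
    z_new = z + Da / ((n + 1) * s)
    D_new = (n**2 / (n**2 - 1.0)) * (D - (2.0 / (n + 1)) * np.outer(Da, Da) / (a @ Da))
    return z_new, D_new

# One step from the unit ball in the plane with cutting vector e1:
z, D = np.zeros(2), np.eye(2)
print(ellipsoid_step(z, D, np.array([1.0, 0.0])))
# centre (1/3, 0); D' = (4/3) * diag(1/3, 1) = diag(4/9, 4/3)
```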
8 Complexity of the Ellipsoid Algorithm

8.1 The bound V

Lemma 8.1 Let A be an m×n integer matrix and let b be a vector in ℝ^m. Let U be the largest absolute value of the entries of A and b. Then,

(a) Every extreme point of the polyhedron P = { x ∈ ℝ^n : Ax ≥ b } satisfies

  −(nU)^n ≤ x_j ≤ (nU)^n, j = 1, ..., n.

(b) Every extreme point of the (standardized) polyhedron P = { x ∈ ℝ^n : Ax = b } satisfies

  −(mU)^m ≤ x_j ≤ (mU)^m, j = 1, ..., n.

Proof. Consider first (a). We are assuming here that m > n. Let x be an extreme point of P. Choose n linearly independent active constraints and write Âx = b̂, where Â is an n×n invertible submatrix of A and b̂ is the matching n-dimensional subvector of b. So we have x = Â^{−1}b̂. By Cramer's rule, we can write the solution as

  x_j = det(Â^j)/det(Â),

where Â^j is the same as Â except that the jth column is replaced by b̂. Now

  |det(Â^j)| = | Σ_σ (−1)^{|σ|} Π_{i=1}^n â_{i,σ(i)} | ≤ n! U^n ≤ (nU)^n, j = 1, ..., n,

where σ ranges over the permutations of 1, ..., n, and |σ| gives the number of inversions (i.e., pairs with i < j and σ(i) > σ(j)). Finally, since Â is invertible, det(Â) ≠ 0, and as all entries in Â are integer, |det(Â)| ≥ 1. Therefore the extreme point x satisfies

  |x_j| ≤ (nU)^n, for all j.

Exactly the same argument may be used for (b), except that we use a basis matrix A_B; A_B is m×m and we can replace n by m throughout.

By part (a) of the previous lemma, all the extreme points of P = { x ∈ ℝ^n : Ax ≥ b } are contained in the bounded polyhedron

  P_B = { x ∈ P : |x_j| ≤ (nU)^n, j = 1, ..., n }.

P is nonempty if and only if it contains an extreme point. Hence, we can test for the emptiness of P_B instead of P. But P_B is contained in the ball E(0, n(nU)^{2n}I), whose volume is at most

  V = (2n(nU)^n)^n = (2n)^n(nU)^{n²}.

8.2 The bound v

We say a polyhedron P is full-dimensional if it has positive volume. For example, P = { (x_1, x_2) : x_1 + x_2 = 1, x_1, x_2 ≥ 0 } has Vol(P) = 0 and so is not full-dimensional.

Lemma 8.2 Let P = { x ∈ ℝ^n : Ax ≥ b } and assume that A and b have integer entries which are bounded in absolute value by U. Let

  ε = (1/(2(n+1))) ((n+1)U)^{−(n+1)}, P_ε = { x ∈ ℝ^n : Ax ≥ b − εe },

where e = (1, 1, ..., 1)^⊤. Then
(a) If P is empty, then P_ε is empty.
(b) If P is non-empty, then P_ε is full-dimensional.

Proof of (a). Suppose P is empty and consider the infeasible linear program minimize { 0^⊤x : Ax ≥ b } and its dual maximize { λ^⊤b : λ^⊤A = 0^⊤, λ ≥ 0 }. Since the primal is infeasible, the dual problem has value +∞. Therefore there exists a λ ≥ 0 with

  λ^⊤A = 0^⊤, λ^⊤b = 1.

So, using the previous lemma, we can find a bfs λ̂ to the constraints λ^⊤A = 0^⊤, λ^⊤b = 1, λ ≥ 0 such that

  λ̂_i ≤ ((n+1)U)^{n+1}, for all i.

Since λ̂ is a bfs, at most n + 1 of its components are non-zero, so that

  Σ_{i=1}^m λ̂_i ≤ (n+1)((n+1)U)^{n+1}.

Therefore,

  λ̂^⊤(b − εe) = 1 − ε Σ_{i=1}^m λ̂_i ≥ 1/2 > 0.

Hence, when we replace b by b − εe, the value of the dual remains +∞, the primal problem is again infeasible, and P_ε is also empty.
Proof of (b). Let x be an element of P, so that Ax ≥ b. Let y be any vector such that

  x_j − ε/(nU) ≤ y_j ≤ x_j + ε/(nU), for all j.

Then the ith row of Ay satisfies

  Σ_{j=1}^n a_ij y_j ≥ Σ_{j=1}^n a_ij x_j − nU · (ε/(nU)) ≥ b_i − ε.

Therefore, any such vector y belongs to P_ε, and the set of all such vectors y (a cube) has positive volume (of (2ε/nU)^n) and so is full-dimensional.

The following lemma can also be proved.

Lemma 8.3 Let P = { x ∈ ℝ^n : Ax ≥ b } be a full-dimensional bounded polyhedron, where the entries of A and b are integer and have absolute value bounded by U. Then,

  Vol(P) > n^{−n}(nU)^{−n²(n+1)}.

8.3 Running time of the ellipsoid method

We have the values

  V = (2n)^n(nU)^{n²} and v = n^{−n}(nU)^{−n²(n+1)},

and know that the ellipsoid method takes at most t* = ⌈2(n+1) log(V/v)⌉ steps. This gives t* = O(n⁴ log(nU)).

In practice, we apply the ellipsoid algorithm to P_ε. We know that P is nonempty if and only if P_ε is nonempty. Write P_ε = { x : (1/ε)Ax ≥ (1/ε)b − e }. Recall U is an integer, so 1/ε is also an integer, and so this writes the constraints of P_ε so that all coefficients are integers that are bounded in magnitude by U_ε = U/ε. We know that P_ε is contained in a ball of volume at most V_ε = (2n)^n(nU_ε)^{n²}. Also, when P is nonempty, P_ε has volume of at least v_ε = (2ε/nU)^n (by the proof in part (b) of Lemma 8.2). Thus the ellipsoid algorithm applied to P_ε will terminate in a number of steps at most ⌈2(n+1) log(V_ε/v_ε)⌉ = O(n⁴ log(nU)).

There are a few wrinkles to consider: on a computer we cannot actually calculate the square root √(A_i^⊤D_tA_i) that is needed to find x_{t+1} from x_t. There is also a worry that in multiplying numbers together we might be forced to use numbers as large as 2^U. However, it can be shown that if we carry out the algorithm to a certain accuracy it will still correctly decide whether P is nonempty. At each step the calculation of A_i^⊤D_tA_i requires O(n²) operations, which makes the running time of the algorithm O(n⁶ log(nU)).

8.4 Sliding objective ellipsoid method

Suppose we wish to solve the problem minimize { c^⊤x : Ax ≥ b, x ≥ 0 }. First, use the ellipsoid method to find a feasible solution x_0 ∈ P, where P = { x ∈ ℝ^n : Ax ≥ b }. Now we apply the ellipsoid method again (note the strict inequality) to the new polyhedron given by

  P ∩ { x ∈ ℝ^n : c^⊤x < c^⊤x_0 }.

If this is empty then x_0 is optimal. Otherwise, we have a new solution x_1 ∈ P, say, with strictly smaller objective function than c^⊤x_0. Now we reapply the ellipsoid method to the new polyhedron.

[Figure: successive ellipsoids E_t, E_{t+1} with centres x_t, x_{t+1}, and the parallel cuts c^⊤x < c^⊤x_t sliding in the direction −c.]

At each iteration we add a new constraint in the direction of the vector c. All the constraints c^⊤x < c^⊤x_t are parallel to one another. One can show that by this procedure we reach the optimum in polynomial running time.

9 The Network Simplex Algorithm

9.1 Graph terminology

The next four lectures are about network flow problems. They include transportation, assignment, maximum flow and shortest path problems.

A graph G = (N, A) consists of a set of nodes, N, and a set of arcs, A. In an undirected graph the arcs are unordered pairs of nodes {i, j} ∈ A, i, j ∈ N. In a directed graph (also called a network) the arcs are ordered pairs of nodes (i, j). A walk is an ordered list of nodes i_1, i_2, ..., i_t such that, in an undirected graph, {i_k, i_{k+1}} ∈ A, or, in a directed graph, either (i_k, i_{k+1}) ∈ A or (i_{k+1}, i_k) ∈ A, for k = 1, ..., t−1. A walk is a path if i_1, i_2, ..., i_k are distinct, and a cycle if i_1, i_2, ..., i_{k−1} are distinct and i_1 = i_k. A graph is connected if there is a path connecting every pair of nodes.

[Figure: a directed graph on six nodes, and a spanning tree within it (dotted).]

A network is acyclic if it contains no cycles. A network is a tree if it is connected and acyclic. A network (N′, A′) is a subnetwork of (N, A) if N′ ⊂ N and A′ ⊂ A. A subnetwork (N′, A′) is a spanning tree if it is a tree and N′ = N.

9.2 The minimum cost flow problem

Let f_ij denote the amount of flow of some material on arc (i, j) ∈ A. Let b_i, i ∈ N, denote the amount of flow that enters the network at node i ∈ N. If b_i > 0 we say the node is a source (supplying b_i units of flow). If b_i < 0 we say that the node is a sink (with a demand of |b_i| units of flow). Suppose there is a cost of c_ij per unit flow on arc (i, j) ∈ A. The minimum cost flow problem is

  minimize Σ_{(i,j)∈A} c_ij f_ij

subject to

  b_i + Σ_{j:(j,i)∈A} f_ji = Σ_{j:(i,j)∈A} f_ij, for all i ∈ N,
  m_ij ≤ f_ij ≤ M_ij, for all (i, j) ∈ A.

These say that flows must be feasible and conserve flow at each node. For feasible flows to exist we must also have Σ_{i∈N} b_i = 0. An important special case is that of uncapacitated flows, m_ij = 0 and M_ij = ∞.

Note that the minimum cost flow problem is a special form of linear program. Its simple structure allows for special algorithms. Constraints are of the form Ax = b, where

  A_ik = +1 if node i is the start of the kth arc; −1 if node i is the end of the kth arc; 0 otherwise.
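A minimal sketch of a minimum cost flow solve, assuming the networkx library is available (networkx stores −b_i as the node attribute 'demand', and takes m_ij = 0 throughout); the instance data here is made up for illustration:

```python
import networkx as nx

# A small illustrative instance: node "s" supplies 4 units, node "t" demands 4.
G = nx.DiGraph()
G.add_node("s", demand=-4)   # demand = -b_i, so a source has negative demand
G.add_node("a", demand=0)
G.add_node("t", demand=4)
G.add_edge("s", "a", weight=1, capacity=3)
G.add_edge("s", "t", weight=4, capacity=3)
G.add_edge("a", "t", weight=1, capacity=3)

flow = nx.min_cost_flow(G)
print(flow, nx.cost_of_flow(G, flow))   # 3 units via "a", 1 direct: cost 3*2 + 1*4 = 10
```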
9.3 Spanning tree solutions

Assume that the network is connected. A spanning tree solution, f_ij, is one that can be constructed as follows:

1. Pick a set T ⊂ A of n − 1 arcs forming a spanning tree and partition the remaining arcs A \ T into two sets, L and U.
2. Set f_ij = m_ij for each arc (i, j) ∈ L and f_ij = M_ij for each arc (i, j) ∈ U.
3. Use the flow conservation constraints to determine the flows f_ij for arcs (i, j) ∈ T. We begin by determining the flows on arcs incident to leaves of the tree T. Subsequently we determine the flows on other arcs of T.

A spanning tree solution with m_ij ≤ f_ij ≤ M_ij is a feasible spanning tree solution.

[Figure: a spanning tree T on seven nodes.]

Theorem 9.1 A flow vector is a spanning tree solution if and only if it is a basic solution.

9.4 Optimality conditions

Consider the Lagrangian of the minimum cost flow problem

  L(f; λ) = Σ_{(i,j)∈A} c_ij f_ij − Σ_{i∈N} λ_i ( Σ_{j:(i,j)∈A} f_ij − Σ_{j:(j,i)∈A} f_ji − b_i )
          = Σ_{(i,j)∈A} (c_ij − λ_i + λ_j) f_ij + Σ_{i∈N} λ_i b_i.

Minimizing L(f; λ) over m_ij ≤ f_ij ≤ M_ij gives dual feasibility and complementary slackness conditions:

  c̄_ij = c_ij − λ_i + λ_j > 0 ⟹ f_ij = m_ij
  c̄_ij = c_ij − λ_i + λ_j < 0 ⟹ f_ij = M_ij
  c̄_ij = c_ij − λ_i + λ_j = 0 ⟸ m_ij < f_ij < M_ij.

Observe that if T is a spanning tree then we can solve the following equations in a unique way, where n = |N|:

  λ_n = 0, λ_i − λ_j = c_ij, for all (i, j) ∈ T.

9.5 Pivoting to change the basis

We compute the reduced costs c̄_ij = c_ij − (λ_i − λ_j) for each arc (i, j) ∉ T. Recall c̄_ij = 0 for all arcs (i, j) ∈ T by construction. If c̄_ij ≥ 0 for all (i, j) ∈ L and c̄_ij ≤ 0 for all (i, j) ∈ U then the current basic feasible solution is optimal. Otherwise, choose an arc (i, j) where there is a violation. This arc together with the tree T forms a cycle. Add (or subtract) as much flow as possible around this cycle so as to increase (or decrease) f_ij. Note that Σ_cycle c̄_kℓ = Σ_cycle c_kℓ, where the sums are taken around the arcs of the cycle. Thus if c̄_ij is negative we can decrease the total cost by increasing the flow f_ij. Similarly, if c̄_ij is positive we can decrease cost by decreasing f_ij.

Example. Consider the minimum cost flow problem below. On each arc we give the values of (c_ij, m_ij, M_ij): arc (1,2) has (3, 3, 7), arc (1,3) has (1, 1, 3) and arc (2,3) has (2, 0, 3). There is b_1 = 6, b_2 = −4 and b_3 = −2. The spanning tree consists of the two arcs (1,2) and (2,3).

[Figure: the network before and after the pivot; the left diagram has flows f_12 = 5, f_13 = 1, f_23 = 1, the right diagram has flows f_12 = 4, f_13 = 2, f_23 = 0.]

In the left hand figure, we set λ_1 = 0 and find λ_2 = −3 (so c_12 = 3 = λ_1 − λ_2). Similarly, λ_3 = −5. On the arc (1,3) the value of c_13 − λ_1 + λ_3 = 1 − 0 + (−5) = −4. Since this is < 0 we can decrease cost by increasing f_13. Inserting the arc (1,3) into the tree produces the cycle (1, 3, 2, 1). We increase the flow f_13 as much as possible, shifting flow around this cycle (i.e., by 1). This produces the flows shown in the right diagram. The tree is now the arcs (1,3), (1,2). We recalculate: λ_1 = 0, λ_2 = −3 and λ_3 = −1. The value of c_23 − λ_2 + λ_3 = 2 − (−3) + (−1) = 4. Since this is > 0 we want the flow on (2,3) to be minimal, which it is. So we now have the optimal solution.
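A minimal sketch of the potential and reduced-cost computation for the example above:

```python
# Tree T = {(1,2), (2,3)} with costs c12 = 3, c23 = 2 and non-tree arc c13 = 1.
# Fix lambda_1 = 0 and solve lambda_i - lambda_j = c_ij along the tree arcs.
c = {(1, 2): 3, (2, 3): 2, (1, 3): 1}
lam = {1: 0}
lam[2] = lam[1] - c[(1, 2)]   # -3
lam[3] = lam[2] - c[(2, 3)]   # -5

cbar = {e: c[e] - (lam[e[0]] - lam[e[1]]) for e in c}
print(lam, cbar)   # cbar[(1,3)] = 1 - (0 - (-5)) = -4 < 0: push flow onto arc (1,3)
```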
9.6 Finding the initial feasible tree solution

1. Every network flow problem can be reduced to one with exactly one source node and one sink node (by adding in two nodes).
2. Every network flow problem can be reduced to one without sources or sinks (by connecting the above two nodes with an edge). The constraints are just Af = 0. Any f satisfying this is called a circulation, and such flow problems are called circulation problems.
3. In the case that m_ij = 0 for all i, j, the zero flow is a feasible tree solution. If m_ij ≠ 0 for some arc (i, j) we can replace the flows by f_ij − m_ij and adjust the supplies b_i accordingly.

9.7 Integrality of optimal solutions

Suppose the input data (m_ij, M_ij and b_i) are all integers. Then the above algorithm leads to optimal integer solutions: there are no multiplications or divisions.

Theorem 9.2 (Integrality theorem) For every network flow problem with integer data, every basic feasible solution and, in particular, every basic optimal solution assigns integer flow to every arc.

This theorem is important for the many practical problems in which an integer solution is required for a meaningful interpretation (for example, the assignment problems). Later, we investigate linear programming problems subject to the additional constraint that the solution be in integers. Such problems are usually much harder to solve than the problem without the integer constraint. However, for network flow problems we get integer solutions for free.

10 Transportation and Assignment Problems

10.1 Transportation problem

In the transportation problem there are m suppliers of a good and n customers. Suppose supplier i produces s_i units of the good, customer j demands d_j units of the good, and there is a balance between demand and supply so that

  Σ_{i=1}^m s_i = Σ_{j=1}^n d_j.

Suppose the cost of transporting a unit of good from supplier i to consumer j is c_ij. The problem is to match suppliers with consumers to minimize the total transportation cost. We can easily formulate the transportation problem as a minimum cost flow problem as follows:

  minimize Σ_{i=1}^m Σ_{j=1}^n c_ij f_ij

subject to

  Σ_{i=1}^m f_ij = d_j, j = 1, ..., n,
  Σ_{j=1}^n f_ij = s_i, i = 1, ..., m,
  f_ij ≥ 0, for all i, j.

This is a special case of the minimum cost flow problem with m_ij = 0, M_ij = ∞ and the graph structure of a bipartite graph. That is, the nodes divide into two disjoint sets S (suppliers) and C (customers), and A ⊂ S×C (the only arcs are those which connect suppliers to consumers). The case in which there is an arc from every supplier to every consumer is known as the Hitchcock transportation problem.
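The transportation LP is easy to assemble and solve directly. A minimal sketch assuming SciPy (the cost data is illustrative, not the notes' example):

```python
import numpy as np
from scipy.optimize import linprog

# A 2-supplier / 3-customer transportation problem with balanced totals.
s = np.array([5.0, 7.0])            # supplies, sum = 12
d = np.array([4.0, 3.0, 5.0])       # demands,  sum = 12
C = np.array([[5.0, 3.0, 4.0],
              [6.0, 2.0, 1.0]])     # unit costs c_ij (illustrative)

m, n = C.shape
A_eq, b_eq = [], []
for i in range(m):                  # supply constraints: sum_j f_ij = s_i
    row = np.zeros(m * n); row[i*n:(i+1)*n] = 1
    A_eq.append(row); b_eq.append(s[i])
for j in range(n):                  # demand constraints: sum_i f_ij = d_j
    row = np.zeros(m * n); row[j::n] = 1
    A_eq.append(row); b_eq.append(d[j])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq), method="highs")
print(res.x.reshape(m, n), res.fun)  # an integral optimal flow, as Theorem 9.2 promises
```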
Lemma 10.1 Every minimum cost flow problem is equivalent to a transportation problem.

Proof. Consider the minimum cost flow problem with m_ij = 0, M_ij < ∞, and input data G = (N, A), M_ij, c_ij and b_i. For every arc (i, j) ∈ A construct a source node with supply M_ij. For every node i ∈ N construct a sink node with demand Σ_{k:(i,k)∈A} M_ik − b_i. Now connect every source node (i, j) to each of the sink nodes i and j, with infinite upper bounds on capacities. Let c_{ij,i} = 0 and c_{ij,j} = c_ij.

[Figure: source node (i, j) with supply M_ij, joined at cost 0 to sink node i (demand Σ_k M_ik − b_i) and at cost c_ij to sink node j (demand Σ_k M_jk − b_j).]

There is a 1–1 correspondence between feasible flows in the two problems, and these flows have the same costs. To see this, put a flow of f_ij on the arc from (i, j) to j, and a flow of M_ij − f_ij on the arc from (i, j) to i. The total amount flowing into node i is then Σ_j (M_ij − f_ij) + Σ_j f_ji, which must equal Σ_k M_ik − b_i. Thus we have the flow conservation constraints of the minimum cost flow problem.

For this reason new algorithms are often first tested on transportation problems.

10.2 Tableau form

It is convenient to present the input data and spanning tree solutions (i.e., the bfs's) for the transportation problem in tableau form. (This is a different form of tableau to that of the simplex tableau.) We express the input data in a tableau whose cells hold the costs c_ij, with the supplies s_i written down the right-hand border and the demands d_j along the bottom. A spanning tree solution is recorded by writing the flows f_ij in the cells of the basic arcs, together with the dual variables λ_i and λ_j along the borders; a pivot is carried out by adding and subtracting an amount θ around a cycle of cells (marked +θ and −θ).

[Tableaux: a worked three-supplier, three-customer example showing the costs c_ij, the supplies s_i and demands d_j on the borders, the dual variables λ_i, λ_j, and a pivot in which θ is added and subtracted around a cycle of cells.]
