Lecture Notes of the Master Course: Discrete Optimization
Utrecht University, Academic Year 2011/2012
Course website: http://www.cwi.nl/~schaefer/courses/lnmb-do11

Prof. dr. Guido Schäfer
Center for Mathematics and Computer Science (CWI), Algorithms, Combinatorics and Optimization Group, Science Park 123, 1098 XG Amsterdam, The Netherlands
VU University Amsterdam, Department of Econometrics and Operations Research, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
Website: http://www.cwi.nl/~schaefer
Email: g.schaefer@cwi.nl

Document last modified: January 10, 2012

Disclaimer: These notes contain (most of) the material of the lectures of the course "Discrete Optimization", given at Utrecht University in the academic year 2011/2012. The course is organized by the Dutch Network on the Mathematics of Operations Research (LNMB) and is part of the Dutch Master's Degree Programme in Mathematics (Mastermath). Note that the lecture notes have undergone some rough proof-reading only. Please feel free to report any typos, mistakes, inconsistencies, etc. that you observe by sending me an email (g.schaefer@cwi.nl).

Note that these Lecture Notes have been last modified on January 10, 2012. I guess most of you will have printed the Lecture Notes of December 6, 2011 (see "document last modified" date on your printout). I therefore keep track of the changes that I made since then below. I also mark these changes as "major" or "minor", depending on their importance.

Changes made with respect to Lecture Notes of December 6, 2011:
• (major) Introducing the notion of a pseudoflow on page 42: corrected "it need to satisfy the flow balance constraints" to "it need not satisfy the flow balance constraints".

Changes made with respect to Lecture Notes of December 20, 2011 (thanks to Rutger Kerkkamp for pointing these out):
• (minor) Symbol for symmetric difference on page 4 is now the same as the one used in Chapter 7 ("△").
• (minor) Strictly speaking, we would have to add a constraint x_j ≤ 1 for every j ∈ {1,...,n} to the LP relaxation (2) of the integer linear program (1) on page 5. However, these constraints are often (but not always) redundant because of the minimization objective. Note that the discussion that follows refers to the LP relaxation as stated in (2) (see also the remark after the statement of LP (2)).
• (minor) Corrected "multiplies" to "multipliers" in the last paragraph on page 5.
• (minor) Lower part of Figure 2 on page 2: changed edge e from a solid to a dotted line.
• (minor) At various places in Chapter 3, Condition (M1) of the independent set system had not been mentioned explicitly (Example 3.1, Examples 3.3–3.5, Theorem 3.1). Mentioned now.
• (minor) Corrected "many application" to "many applications" in the first paragraph on page 27.
• (minor) Corrected "δ(v)" and "δ(u)" to "δ(s,v)" and "δ(s,u)" in the last paragraph of the proof of Lemma 5.5 on page 34 (3 occurrences).
• (minor) Algorithms 7 and 8 should mention that the capacity function is non-negative.
• (major) There was a mistake in the objective function of the dual linear program (6) on page 46: the sign in front of the second summation must be negative.
• (minor) Algorithms 9, 10 and 11 should mention that the capacity function is non-negative.
• (minor) Proof of Theorem 7.3 on page 56: corrected "O(mα(n,m))" to "O(mα(n, m/n))".
• (minor) Caption of the illustration on page 71 has been put on the same page.
• (minor) Typo in the second-to-last sentence of the proof of Theorem 6.7 on page 47: "c^π(u,v) < 0" corrected to "c^π(u,v) ≤ 0".
• (minor) Statement of Theorem 10.8 on page 90 has been corrected (FPTAS instead of 2-approximation).

Changes made with respect to Lecture Notes of January 5, 2012 (thanks to Tara van Zalen and Arjan Dijkstra for pointing these out):
• (minor) Example 3.2 on page 21 (uniform matroids): corrected "I = { I ⊆ S : |S| ≤ k }" to "I = { I ⊆ S : |I| ≤ k }".
• (minor) Statement of Lemma 5.4 on page 32: corrected "The flow of any flow" to "The flow value of any flow".
• (minor) There was a minus sign missing in equation (7) on page 43.
• (minor) Second paragraph of Section 8.3 on page 60: corrected "hyperplance" to "hyperplane".
• (minor) Removed the sentence "We state the following proposition without proof" before the statement of Lemma 8.4 on page 62.
• Corrected "Note hat b is integral" to "Note that b is integral" in the proof of Theorem 8.7 on page 63.

Please feel free to report any mistakes, inconsistencies, etc. that you encounter.

Guido

Contents

1 Preliminaries
  1.1 Optimization Problems
  1.2 Algorithms and Efficiency
  1.3 Growth of Functions
  1.4 Graphs
  1.5 Sets, etc.
  1.6 Basics of Linear Programming Theory
2 Minimum Spanning Trees
  2.1 Introduction
  2.2 Coloring Procedure
  2.3 Kruskal's Algorithm
  2.4 Prim's Algorithm
3 Matroids
  3.1 Introduction
  3.2 Matroids
  3.3 Greedy Algorithm for Matroids
4 Shortest Paths
  4.1 Introduction
  4.2 Single Source Shortest Path Problem
    4.2.1 Basic properties of shortest paths
    4.2.2 Arbitrary cost functions
    4.2.3 Nonnegative cost functions
  4.3 All-pairs Shortest-path Problem
5 Maximum Flows
  5.1 Introduction
  5.2 Residual Graph and Augmenting Paths
  5.3 Ford-Fulkerson Algorithm
  5.4 Max-Flow Min-Cut Theorem
  5.5 Edmonds-Karp Algorithm
6 Minimum Cost Flows
  6.1 Introduction
  6.2 Flow Decomposition and Residual Graph
  6.3 Cycle Canceling Algorithm
  6.4 Successive Shortest Path Algorithm
  6.5 Primal-Dual Algorithm
7 Matchings
  7.1 Introduction
  7.2 Augmenting Paths
  7.3 Bipartite Graphs
  7.4 General Graphs
8 Integrality of Polyhedra
  8.1 Introduction
  8.2 Convex Hulls
  8.3 Polytopes
  8.4 Integral Polytopes
  8.5 Example Applications
9 Complexity Theory
  9.1 Introduction
  9.2 Complexity Classes P and NP
  9.3 Polynomial-time Reductions and NP-completeness
  9.4 Examples of NP-completeness Proofs
  9.5 More on Complexity Theory
    9.5.1 NP-hard Problems
    9.5.2 Complexity Class co-NP
    9.5.3 Pseudo-polynomiality and Strong NP-completeness
    9.5.4 Complexity Class PSPACE
10 Approximation Algorithms
  10.1 Introduction
  10.2 Approximation Algorithm for Vertex Cover
  10.3 Approximation Algorithms for TSP
  10.4 Approximation Algorithm for Steiner Tree
  10.5 Approximation Scheme for Knapsack
    10.5.1 Dynamic Programming Approach
    10.5.2 Deriving a FPTAS for Knapsack

1. Preliminaries

1.1 Optimization Problems

We first formally define what we mean by an optimization problem. The definition below focusses on minimization problems. Note that it extends naturally to maximization problems.

Definition 1.1. A minimization problem Π is given by a set of instances I. Each instance I ∈ I specifies
• a set F of feasible solutions for I;
• a cost function c : F → R.
Given an instance I = (F, c) ∈ I, the goal is to find a feasible solution S ∈ F such that c(S) is minimum. We call such a solution an optimal solution of I.

In discrete (or combinatorial) optimization we concentrate on optimization problems Π where for every instance I = (F, c) the set F of feasible solutions is discrete, i.e., F is finite or countably infinite. We give some examples below.

Minimum Spanning Tree Problem (MST):
Given: An undirected graph G = (V, E) with edge costs c : E → R.
Goal: Find a spanning tree of G of minimum total cost.

We have

  F = { T ⊆ E : T is a spanning tree of G }   and   c(T) = ∑_{e∈T} c(e).

Traveling Salesman Problem (TSP):
Given: An undirected graph G = (V, E) with distances d : E → R.
Goal: Find a tour of G of minimum total length.

Here we have

  F = { T ⊆ E : T is a tour of G }   and   c(T) = ∑_{e∈T} d(e).

Linear Programming (LP):
Given: A set F of feasible solutions x = (x_1, ..., x_n) defined by m linear constraints

  F = { (x_1, ..., x_n) ∈ R^n_{≥0} : ∑_{i=1}^n a_{ij} x_i ≥ b_j  for all j = 1, ..., m }

and an objective function c(x) = ∑_{i=1}^n c_i x_i.
Goal: Find a feasible solution x ∈ F that minimizes c(x).
Note that in this example the number of feasible solutions in F is uncountable. So why does this problem qualify as a discrete optimization problem? The answer is that F defines a feasible set that corresponds to the convex hull of a finite number of vertices. It is not hard to see that if we optimize a linear function over a convex hull then there always exists an optimal solution that is a vertex. We can thus equivalently formulate the problem as finding a vertex x of the convex hull defined by F that minimizes c(x).

1.2 Algorithms and Efficiency

Intuitively, an algorithm for an optimization problem Π is a sequence of instructions specifying a computational procedure that solves every given instance I of Π. Formally, the computational model underlying all our considerations is that of a Turing machine (which we will not define formally here).

A main focus of this course is on efficient algorithms. Here, efficiency refers to the overall running time of the algorithm. We actually do not care about the actual running time (in terms of minutes, seconds, etc.), but rather about the number of basic operations. Certainly, there are different ways to represent the overall running time of an algorithm. The one that we will use here (and which is widely used in the algorithms community) is the so-called worst-case running time. Informally, the worst-case running time of an algorithm measures the running time of an algorithm on the worst possible input instance (of a given size).

There are at least two advantages in assessing an algorithm's performance by means of its worst-case running time. First, it is usually rather easy to estimate. Second, it provides a very strong performance guarantee: the algorithm is guaranteed to compute a solution to every instance (of a given size), using no more than the stated number of basic operations. On the downside, the worst-case running time of an algorithm might be an overly pessimistic estimate of its actual running time. In the latter case, assessing the performance of an algorithm by its average-case running time or its smoothed running time might be suitable alternatives.

Usually, the running time of an algorithm is expressed as a function of the size of the input instance I. Note that a priori it is not clear what is meant by the size of I, because there are different ways to represent (or encode) an instance.

Example 1.1. Many optimization problems have a graph as input. Suppose we are given an undirected graph G = (V, E) with n nodes and m edges. One way of representing G is by its n × n adjacency matrix A = (a_{ij}) with a_{ij} = 1 if (i, j) ∈ E and a_{ij} = 0 otherwise. The size needed to represent G by its adjacency matrix is thus n². Another way to represent G is by its adjacency lists: for every node i ∈ V, we maintain the set L_i ⊆ V of nodes that are adjacent to i in a list. Note that each edge occurs on two adjacency lists. The size needed to represent G by adjacency lists is n + 2m.

The above example illustrates that the size of an instance depends on the underlying data structure that is used to represent the instance. Depending on the kind of operations that an algorithm uses, one might be more efficient than the other. For example, checking whether a given edge (i, j) is part of G takes constant time if we use the adjacency matrix, while it takes time |L_i| (or |L_j|) if we use the adjacency lists. On the other hand, listing all edges incident to i takes time n if we use the adjacency matrix, while it takes time |L_i| if we use the adjacency lists.
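To make this trade-off concrete, here is a small illustrative sketch (not part of the original notes) of the two representations in Python; the helper names and the example graph are made up.

```python
# Illustrative sketch: adjacency matrix vs. adjacency lists for a simple
# undirected graph with nodes 0..n-1. Names and example data are made up.

def build_matrix(n, edges):
    # n x n adjacency matrix: O(n^2) space.
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def build_lists(n, edges):
    # adjacency lists: O(n + 2m) space, each edge stored twice.
    L = [[] for _ in range(n)]
    for u, v in edges:
        L[u].append(v)
        L[v].append(u)
    return L

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A, L = build_matrix(4, edges), build_lists(4, edges)

# Edge query: O(1) with the matrix, O(|L_i|) with the lists.
print(A[0][3] == 1, 3 in L[0])
# Listing all neighbours of node 2: O(n) with the matrix, O(|L_2|) with the lists.
print([j for j in range(4) if A[2][j]], L[2])
```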
Formally, we define the size of an instance I as the number of bits that are needed to store all data of I using an encoding L on a digital computer, and we use L(I) to refer to this number. Note that according to this definition we would also have to account for the number of bits needed to store the numbers associated with the instance (like nodes or edges). However, most computers nowadays treat all integers in their range, say from 0 to 2^31, the same and allocate a word to each such number. We therefore often take the freedom to rely on a more intuitive definition of size by counting the number of objects (like nodes or edges) of the instance rather than their total binary length.

Definition 1.2. Let Π be an optimization problem and let L be an encoding of the instances. Then algorithm ALG solves Π in (worst-case) running time f if ALG computes for every instance I of size n_I = L(I) an optimal solution S ∈ F using at most f(n_I) basic operations.

1.3 Growth of Functions

We are often interested in the asymptotic running time of the algorithm. The following definitions will be useful.

Definition 1.3. Let g : N → R⁺. We define

  O(g(n)) = { f : N → R⁺ | ∃ c > 0 and n₀ ∈ N such that f(n) ≤ c·g(n) for all n ≥ n₀ }
  Ω(g(n)) = { f : N → R⁺ | ∃ c > 0 and n₀ ∈ N such that f(n) ≥ c·g(n) for all n ≥ n₀ }
  Θ(g(n)) = { f : N → R⁺ | f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)) }

We will often write, e.g., f(n) = O(g(n)) instead of f(n) ∈ O(g(n)), even though this is notationally somewhat imprecise.

We consider a few examples: we have 10n² = O(n²), ½n² = Ω(n²), 10n log n = Ω(n), 10n log n = O(n²), 2^{n+1} = Θ(2^n) and O(log m) = O(log n) if m ≤ n^c for some constant c. (Recall that log(n^c) = c·log(n).)

1.4 Graphs

An undirected graph G consists of a finite set V(G) of nodes (or vertices) and a finite set E(G) of edges. For notational convenience, we will also write G = (V, E) to refer to a graph with node set V = V(G) and edge set E = E(G). Each edge e ∈ E is associated with an unordered pair (u, v) ∈ V × V; u and v are called the endpoints of e. If two edges have the same endpoints, then they are called parallel edges. An edge whose endpoints are the same is called a loop. A graph that has neither parallel edges nor loops is said to be simple. Note that in a simple graph every edge e = (u, v) ∈ E is uniquely identified by its endpoints u and v. Unless stated otherwise, we assume that undirected graphs are simple. We denote by n and m the number of nodes and edges of G, respectively. A complete graph is a graph that contains an edge for every (unordered) pair of nodes. That is, a complete graph has m = n(n − 1)/2 edges.

A subgraph H of G is a graph such that V(H) ⊆ V and E(H) ⊆ E and each e ∈ E(H) has the same endpoints in H as in G. Given a subset V′ ⊆ V of nodes and a subset E′ ⊆ E of edges of G, the subgraph H of G induced by V′ and E′ is defined as the (unique) subgraph H of G with V(H) = V′ and E(H) = E′. Given a subset E′ ⊆ E, G \ E′ refers to the subgraph H of G that we obtain if we delete all edges in E′ from G, i.e., V(H) = V and E(H) = E \ E′. Similarly, given a subset V′ ⊆ V, G \ V′ refers to the subgraph of G that we obtain if we delete all nodes in V′ and their incident edges from G, i.e., V(H) = V \ V′ and E(H) = E \ {(u, v) ∈ E | u ∈ V′}. A subgraph H of G is said to be spanning if it contains all nodes of G, i.e., V(H) = V.
A path P in an undirected graph G is a sequence P = ⟨v_1, ..., v_k⟩ of nodes such that e_i = (v_i, v_{i+1}) (1 ≤ i < k) is an edge of G. We say that P is a path from v_1 to v_k, or a v_1,v_k-path. P is simple if all v_i (1 ≤ i ≤ k) are distinct. Note that if there is a v_1,v_k-path in G, then there is a simple one. Unless stated otherwise, the length of P refers to the number of edges of P. A path C = ⟨v_1, ..., v_k = v_1⟩ that starts and ends in the same node is called a cycle. C is simple if all nodes v_1, ..., v_{k−1} are distinct. A graph is said to be acyclic if it does not contain a cycle.

A connected component C ⊆ V of an undirected graph G is a maximal subset of nodes such that for every two nodes u, v ∈ C there is a u,v-path in G. A graph G is said to be connected if for every two nodes u, v ∈ V there is a u,v-path in G. A connected subgraph T of G that does not contain a cycle is called a tree of G. A spanning tree T of G is a tree of G that contains all nodes of G. A subgraph F of G is a forest if it consists of a (disjoint) union of trees.

A directed graph G = (V, E) is defined analogously, with the only difference that edges are directed. That is, every edge e is associated with an ordered pair (u, v) ∈ V × V. Here u is called the source (or tail) of e and v is called the target (or head) of e. Note that, as opposed to the undirected case, edge (u, v) is different from edge (v, u) in the directed case. All concepts introduced above extend in the obvious way to directed graphs.

1.5 Sets, etc.

Let S be a set and e ∉ S. We will write S + e as a short for S ∪ {e}. Similarly, for e ∈ S we write S − e as a short for S \ {e}. The symmetric difference of two sets S and T is defined as S △ T = (S \ T) ∪ (T \ S).

We use N, Z, Q and R to refer to the set of natural, integer, rational and real numbers, respectively. We use Q⁺ and R⁺ to refer to the nonnegative rational and real numbers, respectively.

1.6 Basics of Linear Programming Theory

Many optimization problems can be formulated as an integer linear program (ILP). Let Π be a minimization problem. Then Π can often be formulated as follows:

  minimize    ∑_{j=1}^n c_j x_j
  subject to  ∑_{j=1}^n a_{ij} x_j ≥ b_i    for all i ∈ {1,...,m}        (1)
              x_j ∈ {0,1}                    for all j ∈ {1,...,n}

Here, x_j is a decision variable that is either set to 0 or 1. The above ILP is therefore also called a 0/1-ILP. The coefficients a_{ij}, b_i and c_j are given rational numbers.

If we relax the integrality constraint on x_j, we obtain the following LP-relaxation of the above ILP (1):

  minimize    ∑_{j=1}^n c_j x_j
  subject to  ∑_{j=1}^n a_{ij} x_j ≥ b_i    for all i ∈ {1,...,m}        (2)
              x_j ≥ 0                        for all j ∈ {1,...,n}

In general, we would additionally have to enforce that x_j ≤ 1 for every j ∈ {1,...,n}. However, these constraints are often redundant because of the minimization objective, and this is what we assume subsequently. Let OPT and OPT_LP refer to the objective function values of an optimal integer and fractional solution to the ILP (1) and the LP (2), respectively. Because every integer solution to (1) is also a feasible solution for (2), we have OPT_LP ≤ OPT. That is, the optimal fractional solution provides a lower bound on the optimal integer solution. Recall that establishing a lower bound on the optimal cost is often the key to deriving good approximation algorithms for the optimization problem. The techniques that we will discuss subsequently exploit this observation in various ways.
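As an aside (not from the original notes), the following Python sketch illustrates the relation between a small 0/1-ILP and its LP relaxation on a made-up covering instance; it assumes SciPy is available. Since scipy.optimize.linprog minimizes c^T x subject to A_ub x ≤ b_ub, the ≥-constraints of (2) are multiplied by −1.

```python
# Minimal sketch (assumes SciPy): LP relaxation (2) of a small covering ILP (1).
# Made-up instance: minimize x1 + x2 + x3
#   subject to x1 + x2 >= 1, x2 + x3 >= 1, x1 + x3 >= 1, x in {0,1}^3.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 1.0, 1.0])
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
b = np.array([1.0, 1.0, 1.0])

# linprog handles "<=" constraints, so multiply the ">=" rows by -1.
res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, 1)] * 3)
print(res.x, res.fun)  # fractional optimum x = (1/2, 1/2, 1/2), OPT_LP = 3/2

# Any integer solution must set at least two variables to 1, so OPT = 2.
# This matches the general relation OPT_LP <= OPT discussed above.
```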
Let (x_j) be an arbitrary feasible solution. Note that (x_j) has to satisfy each of the m constraints of (2). Suppose we multiply each constraint i ∈ {1,...,m} with a non-negative value y_i and add up all these constraints. Then

  ∑_{i=1}^m ( ∑_{j=1}^n a_{ij} x_j ) y_i ≥ ∑_{i=1}^m b_i y_i.

Suppose further that the multipliers y_i are chosen such that ∑_{i=1}^m a_{ij} y_i ≤ c_j. Then

  ∑_{j=1}^n c_j x_j ≥ ∑_{j=1}^n ( ∑_{i=1}^m a_{ij} y_i ) x_j = ∑_{i=1}^m ( ∑_{j=1}^n a_{ij} x_j ) y_i ≥ ∑_{i=1}^m b_i y_i.        (3)

That is, every such choice of multipliers establishes a lower bound on the objective function value of (x_j). Because this holds for an arbitrary feasible solution (x_j), it also holds for the optimal solution. The problem of finding the best such multipliers (providing the largest lower bound on OPT_LP) corresponds to the so-called dual program of (2):

  maximize    ∑_{i=1}^m b_i y_i
  subject to  ∑_{i=1}^m a_{ij} y_i ≤ c_j    for all j ∈ {1,...,n}        (4)
              y_i ≥ 0                        for all i ∈ {1,...,m}

We use OPT_DP to refer to the objective function value of an optimal solution to the dual linear program (4).

There is a strong relation between the primal LP (2) and its corresponding dual LP (4). Note that (3) shows that the objective function value of an arbitrary feasible dual solution (y_i) is less than or equal to the objective function value of an arbitrary feasible primal solution (x_j). In particular, this relation also holds for the optimal solutions and thus OPT_DP ≤ OPT_LP. This is sometimes called weak duality. From linear programming theory, we know that an even stronger relation holds:

Theorem 1.1 (strong duality). Let x = (x_j) and y = (y_i) be feasible solutions to the LPs (2) and (4), respectively. Then x and y are optimal solutions if and only if

  ∑_{j=1}^n c_j x_j = ∑_{i=1}^m b_i y_i.

An alternative characterization is given by the complementary slackness conditions:

Theorem 1.2. Let x = (x_j) and y = (y_i) be feasible solutions to the LPs (2) and (4), respectively. Then x and y are optimal solutions if and only if the following conditions hold:

1. Primal complementary slackness conditions: for every j ∈ {1,...,n}, either x_j = 0 or the corresponding dual constraint is tight, i.e.,

  ∀ j ∈ {1,...,n}:  x_j > 0  ⇒  ∑_{i=1}^m a_{ij} y_i = c_j.

2. Dual complementary slackness conditions: for every i ∈ {1,...,m}, either y_i = 0 or the corresponding primal constraint is tight, i.e.,

  ∀ i ∈ {1,...,m}:  y_i > 0  ⇒  ∑_{j=1}^n a_{ij} x_j = b_i.
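To see these notions on a concrete instance, here is a small worked example (not from the notes; the numbers are made up) of a primal-dual pair in the form of (2) and (4):

```latex
% Made-up example of a primal--dual pair in the form (2)/(4).
% Primal:  min 2x_1 + 3x_2  s.t.  x_1 + x_2 >= 3,  x_1 + 2x_2 >= 4,  x >= 0.
% Dual:    max 3y_1 + 4y_2  s.t.  y_1 + y_2 <= 2,  y_1 + 2y_2 <= 3,  y >= 0.
\begin{align*}
  x^* &= (2,1), & 2\cdot 2 + 3\cdot 1 &= 7,\\
  y^* &= (1,1), & 3\cdot 1 + 4\cdot 1 &= 7.
\end{align*}
% Strong duality: both objective values equal 7.
% Complementary slackness: x_1^*, x_2^* > 0 and both dual constraints are tight
% (1 + 1 = 2 and 1 + 2 = 3); y_1^*, y_2^* > 0 and both primal constraints are
% tight (2 + 1 = 3 and 2 + 2 = 4).
```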
2. Minimum Spanning Trees

2.1 Introduction

We consider the minimum spanning tree problem (MST), which is one of the simplest and most fundamental problems in network optimization:

Minimum Spanning Tree Problem (MST):
Given: An undirected graph G = (V, E) and edge costs c : E → R.
Goal: Find a spanning tree T of G of minimum total cost.

Recall that T is a spanning tree of G if T is a spanning subgraph of G that is a tree. The cost c(T) of a tree T is defined as c(T) = ∑_{e∈T} c(e). Note that we can assume without loss of generality that G is connected, because otherwise no spanning tree exists. If all edges have non-negative costs, then the MST problem is equivalent to the connected subgraph problem, which asks for the computation of a minimum cost subgraph H of G that connects all nodes of G.

2.2 Coloring Procedure

Most known algorithms for the MST problem belong to the class of greedy algorithms. From a high-level point of view, such algorithms iteratively extend a partial solution to the problem by always adding an element that causes the minimum cost increase in the objective function. While in general greedy choices may lead to suboptimal solutions, such choices lead to an optimal solution for the MST problem. We will get to know different greedy algorithms for the MST problem.

All these algorithms can be described by means of an edge-coloring process: initially, all edges are uncolored. In each step, we then choose an uncolored edge and color it either red (meaning that the edge is rejected) or blue (meaning that the edge is accepted). The process ends if there are no uncolored edges. Throughout the process, we make sure that we maintain the following color invariant:

Invariant 2.1 (Color invariant). There is a minimum spanning tree containing all the blue edges and none of the red edges.

The coloring process can be seen as maintaining a forest of blue trees. Initially, the forest consists of n isolated blue trees corresponding to the nodes in V. The edges are then iteratively colored red or blue. If an edge is colored blue, then the two blue trees containing the endpoints of this edge are combined into one new blue tree. If an edge is colored red, then this edge is excluded from the blue forest. The color invariant ensures that the forest of blue trees can always be extended to a minimum spanning tree (by using some of the uncolored edges and none of the red edges). Note that the color invariant ensures that the final set of blue edges constitutes a minimum spanning tree.

We next introduce two coloring rules on which our algorithms are based. We first need to introduce the notion of a cut. Let G = (V, E) be an undirected graph. A cut of G is a partition of the node set V into two sets: X and X̄ = V \ X. An edge e = (u, v) is said to cross a cut (X, X̄) if its endpoints lie in different parts of the cut, i.e., u ∈ X and v ∈ X̄. Let δ(X) refer to the set of all edges that cross (X, X̄), i.e., δ(X) = { (u, v) ∈ E : u ∈ X, v ∈ V \ X }. Note that δ(·) is symmetric, i.e., δ(X) = δ(X̄).

We can now formulate the two coloring rules:

Blue rule: Select a cut (X, X̄) that is not crossed by any blue edge. Among the uncolored edges in δ(X), choose one of minimum cost and color it blue.

Red rule: Select a simple cycle C that does not contain any red edge. Among the uncolored edges in C, choose one of maximum cost and color it red.

Our greedy algorithm is free to apply any of the two coloring rules in an arbitrary order until all edges are colored either red or blue. The next theorem proves correctness of the algorithm.

Theorem 2.1. The greedy algorithm maintains the color invariant in each step and eventually colors all edges.

Proof. We show by induction on the number t of steps that the algorithm maintains the color invariant. Initially, no edges are colored and thus the color invariant holds true for t = 0 (recall that we assume that G is connected and thus a minimum spanning tree exists). Suppose the color invariant holds true after t − 1 steps (t ≥ 1). Let T be a minimum spanning tree satisfying the color invariant (after step t − 1).

Assume that in step t we color an edge e using the blue rule. If e ∈ T, then T satisfies the color invariant after step t and we are done. Otherwise, e ∉ T. Consider the cut (X, X̄) to which the blue rule is applied to color e = (u, v) (see Figure 1). Because T is a spanning tree, there is a path P_uv in T that connects the endpoints u and v of e. At least one edge, say e′, of P_uv must cross (X, X̄). Note that e′ cannot be red because T satisfies the color invariant. Also, e′ cannot be blue because of the pre-conditions of applying the blue rule. Thus, e′ is uncolored and, by the choice of e, c(e) ≤ c(e′). By removing e′ from T and adding e, we obtain a new spanning tree T′ = (T − e′) + e of cost c(T′) = c(T) − c(e′) + c(e) ≤ c(T).
Thus, T′ is a minimum spanning tree that satisfies the color invariant after step t.

Assume that in step t we color an edge e using the red rule. If e ∉ T, then T satisfies the color invariant after step t and we are done. Otherwise, e ∈ T. Consider the cycle C to which the red rule is applied to color e = (u, v) (see Figure 2). By removing e from T, we obtain two trees whose node sets induce a cut (X, X̄). Note that e crosses (X, X̄). Because C is a cycle, there must exist at least one other edge, say e′, in C that crosses (X, X̄). Note that e′ cannot be blue because e′ ∉ T and the color invariant. Moreover, e′ cannot be red because of the pre-conditions of applying the red rule. Thus, e′ is uncolored and, by the choice of e, c(e) ≥ c(e′). By removing e from T and adding e′, we obtain a new spanning tree T′ = (T − e) + e′ of cost c(T′) = c(T) − c(e) + c(e′) ≤ c(T). Thus, T′ is a minimum spanning tree that satisfies the color invariant after step t.

Finally, we show that eventually all edges are colored. Suppose the algorithm stops because neither the blue rule nor the red rule applies but there is still some uncolored edge e = (u, v). By the color invariant, the blue edges constitute a forest of blue trees. If both endpoints u and v of e are part of the same blue tree T, then we can apply the red rule to the cycle induced by the unique path P_uv from u to v in T and e to color e red. If the endpoints u and v are contained in two different blue trees, say T_u and T_v, then the node set of one of these trees, say X = V(T_u), induces a cut (X, X̄) to which the blue rule can be applied to color an uncolored edge (which must exist because of the presence of e). Thus an uncolored edge guarantees that either the red rule or the blue rule can be applied.

Figure 1: Illustration of the exchange argument used in the proof of Theorem 2.1 (blue rule).
Figure 2: Illustration of the exchange argument used in the proof of Theorem 2.1 (red rule).

2.3 Kruskal's Algorithm

Kruskal's algorithm sorts the edges by non-decreasing cost and then considers the edges in this order. If the current edge e_i = (u, v) has both its endpoints in the same blue tree, it is colored red; otherwise, it is colored blue. The algorithm is summarized in Algorithm 1.

Input: undirected graph G = (V, E) with edge costs c : E → R
Output: minimum spanning tree T
1 Initialize: all edges are uncolored (Remark: we implicitly maintain a forest of blue trees below)
2 Let ⟨e_1, ..., e_m⟩ be the list of edges of G, sorted by non-decreasing cost
3 for i ← 1 to m do
4   if e_i has both endpoints in the same blue tree then color e_i red else color e_i blue
5 end
6 Output the resulting tree T of blue edges
Algorithm 1: Kruskal's MST algorithm.

It is easy to verify that in each case the pre-conditions of the respective rule are met: if the red rule applies, then the unique path P_uv in the blue tree containing both endpoints of e_i together with e_i forms a cycle C. The edges in C ∩ P_uv are blue and e_i is uncolored. We can thus apply the red rule to e_i. Otherwise, if the blue rule applies, then e_i connects two blue trees, say T_u and T_v, in the current blue forest. Consider the cut (X, X̄) induced by the node set of T_u, i.e., X = V(T_u). No blue edge crosses this cut. Moreover, e_i is an uncolored edge that crosses this cut. Also observe that every other uncolored edge e ∈ δ(X) has cost c(e) ≥ c(e_i) because we color the edges by non-decreasing cost. We can therefore apply the blue rule to e_i. An immediate consequence of Theorem 2.1 is that Kruskal's algorithm computes a minimum spanning tree.
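As an illustration (not part of the original notes), here is a minimal Python sketch of Kruskal's algorithm; it uses a simple union-find structure with path compression, a plainer variant of the data structure discussed in the running-time analysis below.

```python
# Minimal sketch of Kruskal's algorithm (illustrative; names are made up).
# Nodes are 0..n-1, edges are (cost, u, v) triples.

def kruskal(n, edges):
    parent = list(range(n))

    def find(x):                      # find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):  # consider edges by non-decreasing cost
        ru, rv = find(u), find(v)
        if ru != rv:                  # endpoints in different blue trees: color blue
            parent[ru] = rv           # union of the two blue trees
            tree.append((u, v, cost))
        # else: both endpoints in the same blue tree -> color red (skip)
    return tree

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3)]
print(kruskal(4, edges))  # e.g. [(1, 2, 1), (2, 3, 2), (0, 2, 3)]
```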
We next analyze the time complexity of the algorithm. The algorithm needs to sort the edges of G by non-decreasing cost. There are different algorithms to do this with different running times. The most efficient algorithms sort a list of k elements in O(k log k) time. There is also a lower bound showing that one cannot do better than that. That is, in our context we spend Θ(m log m) time to sort the edges by non-decreasing cost.

We also need to maintain a data structure in order to determine whether an edge e_i has both its endpoints in the same blue tree or not. A trivial implementation stores for each node a unique identifier of the tree it is contained in. Checking whether the endpoints u and v of edge e_i = (u, v) are part of the same blue tree can then be done in O(1) time. Merging two blue trees needs time O(n) in the worst case. Thus, the trivial implementation takes O(m + n²) time in total (excluding the time for sorting).

One can do much better by using a so-called union-find data structure. This data structure keeps track of the partition of the nodes into blue trees and allows only two types of operations: union and find. The find operation identifies the node set of the partition to which a given node belongs. It can be used to check whether the endpoints u and v of edge e_i = (u, v) belong to the same tree or not. The union operation unites two node sets of the current partition into one. This operation is needed to update the partition whenever we color e_i = (u, v) blue and have to join the respective blue trees T_u and T_v. Sophisticated union-find data structures support a series of n union and m find operations on a universe of n elements in time O(n + m·α(n, m/n)), where α(n, d) is the inverse Ackermann function (see [8], Chapter 2 and the references therein). α(n, d) is increasing in n but grows extremely slowly for every fixed d, e.g., α(2^65536, 0) = 4; for most practical situations, it can be regarded as a constant.

The overall time complexity of Kruskal's algorithm is thus O(m log m + n + m·α(n, m/n)) = O(m log m) = O(m log n) (think about it).

Corollary 2.1. Kruskal's algorithm solves the MST problem in time O(m log n).

2.4 Prim's Algorithm

Prim's algorithm grows a single blue tree, starting at an arbitrary node s ∈ V. In every step, it chooses among all edges that are incident to the current blue tree T containing s an uncolored edge e_i of minimum cost and colors it blue. The algorithm stops if T contains all nodes. We implicitly assume that all edges that are not part of the final tree are colored red in a post-processing step. The algorithm is summarized in Algorithm 2.

Input: undirected graph G = (V, E) with edge costs c : E → R
Output: minimum spanning tree T
1 Initialize: all edges are uncolored (Remark: we implicitly maintain a forest of blue trees below)
2 Choose an arbitrary node s
3 for i ← 1 to n − 1 do
4   Let T be the current blue tree containing s
5   Select a minimum cost edge e_i ∈ δ(V(T)) incident to T and color it blue
6 end
7 Implicitly: color all remaining edges red
8 Output the resulting tree T of blue edges
Algorithm 2: Prim's MST algorithm.
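A minimal Python sketch of this procedure (not from the notes) might look as follows; it uses a binary heap via heapq with lazy deletion instead of the decrease-priority scheme with Fibonacci heaps discussed below, which gives O(m log n) rather than O(m + n log n).

```python
# Minimal sketch of Prim's algorithm with a binary heap (illustrative).
# adj[u] is a list of (cost, v) pairs; nodes are 0..n-1; graph assumed connected.
import heapq

def prim(n, adj, s=0):
    in_tree = [False] * n
    heap = [(0, s, s)]                # (connection cost d(v), v, predecessor)
    tree, total = [], 0
    while heap:
        d, v, u = heapq.heappop(heap)
        if in_tree[v]:
            continue                  # stale entry: v was already added to the tree
        in_tree[v] = True
        total += d
        if v != u:
            tree.append((u, v, d))    # color the connecting edge pi(v) blue
        for cost, w in adj[v]:
            if not in_tree[w]:
                heapq.heappush(heap, (cost, w, v))  # possibly better connection for w
    return tree, total

adj = [[(4, 1), (3, 2)], [(4, 0), (1, 2)], [(3, 0), (1, 1), (2, 3)], [(2, 2)]]
print(prim(4, adj))  # e.g. ([(0, 2, 3), (2, 1, 1), (2, 3, 2)], 6)
```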
Note that the pre-conditions are met whenever the algorithm applies one of the two coloring rules: if the blue rule applies, then the node set V(T) of the current blue tree T containing s induces a cut (X, X̄) with X = V(T). No blue edge crosses (X, X̄) by construction. Moreover, e_i is, among all uncolored edges crossing the cut, one of minimum cost and can thus be colored blue. If the red rule applies to edge e = (u, v), both endpoints u and v are contained in the final tree T. The path P_uv in T together with e induces a cycle C. All edges in C ∩ P_uv are blue and we can thus color e red.

The time complexity of the algorithm depends on how efficiently we are able to identify a minimum cost edge e_i that is incident to T. To this aim, good implementations use a priority queue data structure. The idea is to keep track of the minimum cost connections between nodes that are outside of T and nodes in T. Suppose we maintain two data entries for every node v ∉ V(T): π(v) = (u, v) refers to the edge that minimizes c(u, v) among all u ∈ V(T), and d(v) = c(π(v)) refers to the cost of this edge; we define π(v) = nil and d(v) = ∞ if no such edge exists. Initially, we have for every node v ∈ V \ {s}:

  π(v) = (s, v) if (s, v) ∈ E and π(v) = nil otherwise;
  d(v) = c(s, v) if (s, v) ∈ E and d(v) = ∞ otherwise.

The algorithm now repeatedly chooses a node v ∉ V(T) with d(v) minimum, adds it to the tree and colors its connecting edge π(v) blue. Because v is part of the new tree, we need to update the above data. This can be accomplished by iterating over all edges (v, w) ∈ E incident to v and verifying for every adjacent node w with w ∉ V(T) whether the connection cost from w to T via edge (v, w) is less than the one stored in d(w) (via π(w)). If so, we update the respective data entries accordingly. Note that if the value of d(w) changes, then it can only decrease.

There are several priority queue data structures that support all operations needed above: insert, find-min, delete-min and decrease-priority. In particular, using Fibonacci heaps, m decrease-priority and n insert/find-min/delete-min operations can be performed in time O(m + n log n).

Corollary 2.2. Prim's algorithm solves the MST problem in time O(m + n log n).

References. The presentation of the material in this section is based on [8], Chapter 6.

3. Matroids

3.1 Introduction

In the previous section, we have seen that the greedy algorithm can be used to solve the MST problem. An immediate question that comes to one's mind is which other problems can be solved by such an algorithm. In this section, we will see that the greedy algorithm applies to a much broader class of optimization problems.

We first define the notion of an independent set system.

Definition 3.1. Let S be a finite set and let I be a collection of subsets of S. (S, I) is an independent set system if
(M1) ∅ ∈ I;
(M2) if I ∈ I and J ⊆ I, then J ∈ I.
Each set I ∈ I is called an independent set; every other subset I ⊆ S with I ∉ I is called a dependent set.

Further, suppose we are given a weight function w : S → R on the elements in S.

Maximum Weight Independent Set Problem (MWIS):
Given: An independent set system (S, I) and a weight function w : S → R.
Goal: Find an independent set I ∈ I of maximum weight w(I) = ∑_{x∈I} w(x).

If w(x) < 0 for some x ∈ S, then x will not be included in any optimum solution because I is closed under taking subsets. We can thus safely exclude such elements from the ground set S. Subsequently, we assume without loss of generality that all weights are nonnegative.

As an example, consider the following independent set system: suppose we are given an undirected graph G = (V, E) with weight function w : E → R⁺. Define S = E and I = { F ⊆ E : F induces a forest in G }. Note that ∅ ∈ I and I is closed under taking subsets because each subset J of a forest I ∈ I is a forest. Now, the problem of finding an independent set I ∈ I that maximizes w(I) is equivalent to finding a spanning tree of G of maximum weight. (Note that the latter can also be done by one of the MST algorithms that we have considered in the previous section.)
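As a small illustration of this particular independent set system (not from the notes), a test of whether an edge set F ⊆ E is independent, i.e., induces a forest, can be sketched with union-find:

```python
# Illustrative sketch: independence oracle for the forest system (S, I) with S = E.
# F is independent iff it contains no cycle; checked with a simple union-find.

def is_forest(n, F):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in F:
        ru, rv = find(u), find(v)
        if ru == rv:          # u and v already connected: adding (u, v) closes a cycle
            return False
        parent[ru] = rv
    return True

print(is_forest(4, [(0, 1), (1, 2)]))          # True: an independent set
print(is_forest(4, [(0, 1), (1, 2), (0, 2)]))  # False: contains a cycle
```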
The greedy algorithm given in Algorithm 3 is a natural generalization of Kruskal's algorithm to independent set systems. It starts with the empty set I = ∅ and then iteratively extends I by always adding an element x ∈ S \ I of maximum weight, ensuring that I + x remains an independent set. Unfortunately, the greedy algorithm does not work for general independent set systems, as the following example shows: