Question? Leave a message!




Bipartite matching: max flow formulation

Bipartite matching: max flow formulation
7. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ Lecture slides by Kevin Wayne baseball elimination ‣ Copyright © 2005 PearsonAddison Wesley Copyright © 2013 Kevin Wayne http://www.cs.princeton.edu/wayne/kleinbergtardos Last updated on Dec 23, 2013, 3:31 PMSoviet rail network (1950s) "Free world" goal. Cut supplies (if cold war turns into real war). Reference: On the history of the transportation and maximum flow problems. Alexander Schrijver in Math Programming, 91: 3, 2002. Figure 2 2 From Harris and Ross 1955: Schematic diagram of the railway network of the Western So vietUnionandEasternEuropeancountries, withamaximumflowofvalue163,000tonsfrom Russia to Eastern Europe, and a cut of capacity 163,000 tons indicated as ‘The bottleneck’.Maxflow and mincut applications Maxflow and mincut are widely applicable problemsolving model. Data mining. Openpit mining. Bipartite matching. Network reliability. Baseball elimination. Image segmentation. Network connectivity. liver and hepatic vascularization segmentation Distributed computing. Security of statistical data. Egalitarian stable matching. Network intrusion detection. Multicamera scene reconstruction. Sensor placement for homeland security. Many, many, more. 37. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Matching Def. Given an undirected graph G = (V, E) a subset of edges M ⊆ E is a matching if each node appears in at most one edge in M. Max matching. Given a graph, find a max cardinality matching. 5Bipartite matching Def. A graph G is bipartite if the nodes can be partitioned into two subsets L and R such that every edge connects a node in L to one in R. Bipartite matching. Given a bipartite graph G = (L ∪ R, E), find a max cardinality matching. 1' 1 2' 2 3' 3 4' 4 5' 5 L R matching: 12', 31', 45' 6Bipartite matching Def. A graph G is bipartite if the nodes can be partitioned into two subsets L and R such that every edge connects a node in L to one in R. Bipartite matching. Given a bipartite graph G = (L ∪ R, E), find a max cardinality matching. 1' 1 2' 2 3' 3 4' 4 5' 5 L R matching: 11', 22', 34', 45' 7Bipartite matching: max flow formulation Create digraph G' = (L ∪ R∪ s, t, E' ). Direct all edges from L to R, and assign infinite (or unit) capacity. Add source s, and unit capacity edges from s to each node in L. Add sink t, and unit capacity edges from each node in R to t. G' 1 1' ∞ 1 1 2 2' 3 3' s t 4 4' 5 5' L R 8Max flow formulation: proof of correctness Theorem. Max cardinality of a matching in G = value of max flow in G'. Pf. ≤ Given a max matching M of cardinality k. Consider flow f that sends 1 unit along each of k paths. f is a flow, and has value k. ▪ ∞ 1 1' 1 1' 1 1 2 2' 2 2' 3 3' s 3 3' t 4 4' 4 4' G 5 5' G' 5 5' 9Max flow formulation: proof of correctness Theorem. Max cardinality of a matching in G = value of max flow in G'. Pf. ≥ Let f be a max flow in G' of value k. Integrality theorem ⇒ k is integral and can assume f is 01. Consider M = set of edges from L to R with f (e) = 1. each node in L and R participates in at most one edge in M M = k: consider cut (L∪ s, R∪ t) ▪ ∞ 1 1' 1 1' 1 1 2 2' 2 2' 3 3' s 3 3' t 4 4' 4 4' 5 5' G' 5 5' G 10Perfect matching in a bipartite graph Def. Given a graph G = (V, E) a subset of edges M ⊆ E is a perfect matching if each node appears in exactly one edge in M. Q. When does a bipartite graph have a perfect matching Structure of bipartite graphs with perfect matchings. Clearly we must have L = R . What other conditions are necessary What conditions are sufficient 11Perfect matching in a bipartite graph Notation. Let S be a subset of nodes, and let N(S) be the set of nodes adjacent to nodes in S. Observation. If a bipartite graph G = (L ∪ R, E) has a perfect matching, then N(S) ≥ S for all subsets S ⊆ L. Pf. Each node in S has to be matched to a different node in N(S). ▪ 1' 1 2' 2' 2 2 S = 2, 4, 5 N(S) = 2', 5' 3' 3 4' 4 4 5' 5' 5 5 no perfect matching 12Hall's theorem Theorem. Let G = (L ∪ R, E) be a bipartite graph with L = R . G has a perfect matching iff N(S) ≥ S for all subsets S ⊆ L. Pf. ⇒ This was the previous observation. 1' 1 2' 2' 2 2 S = 2, 4, 5 N(S) = 2', 5' 3' 3 4' 4 4 5' 5' 5 5 no perfect matching 13Proof of Hall's theorem Pf. ⇐ Suppose G does not have a perfect matching. Formulate as a max flow problem and let (A, B) be min cut in G'. By maxflow mincut theorem, cap(A, B) L . Define L = L ∩ A, L = L ∩ B , R = R ∩ A. A B A cap(A, B) = L + R . B A Since min cut can't use ∞ edges: N(L ) ⊆ R . A A N(L ) ≤ R = cap(A, B) – L L – L = L . A A B B A Choose S = L . ▪ A 1 ∞ 1 ∞ G' 1' L = 2, 4, 5 A A L = 1, 3 B 2' ∞ 2 1 R = 2', 5' A 3' s N(L ) = 2', 5' A 1 4 3 4' 5 1 5' t 14Bipartite matching running time Theorem. The FordFulkerson algorithm solves the bipartite matching problem in O(m n) time. Theorem. HopcroftKarp 1973 The bipartite matching problem can be 1/2 solved in O(m n ) time. SIAM J. COMPUT. Vol. 2, No. 4, December 1973 s/2 ANn ALGORITHM FOR MAXIMUM MATCHINGS IN BIPARTITE GRAPHS JOHN E. RICHARD M. AND KARP HOPCROFT" in a Abstract. The present paper shows how to construct a maximum matching bipartite graph with n vertices and m in a number of to edges computation steps proportional (m + n)x/. words, Key algorithm, algorithmic analysis, bipartite graphs, computational complexity, graphs, matching 1. Introduction. Suppose we are a array in which each cell given rectangular or set of is if no is designated as "occupied" "unoccupied". A cells independent ofthe lie an two cells in thesamerow orcolumn. Our object is to construct indepen dent of cells maximum set occupied having cardinality. 15 the In one interpretation, rows of the array represent boys, and the columns and represent girls. Cell is occupied ifboy and j are compatible, we wish i,j girl to match a maximum number ofcompatible couples. the the and An alternate statement of problem is obtainedby representing rows the the vertices ofa The columns of array as bipartite graph. vertices corresponding is We to row and column j are joined by an edge ifand only if cell i,j occupied. a of then seek amaximum matching; i.e., maximumnumber edges,notwo ofwhich meet at a common vertex. This problem has a wide variety of applications These include (3, 4, 5). the determination of chain decompositions in partially ordered sets, of coset in of systems of distinct representatives, and of block representatives groups, also occurs as a of sparse matrices. The problem triangular decompositions and the subroutine in the solution of the Hitchcock transportation problem, in determination ofwhether one tree is to a subtree of another. given isomorphic In view of this variety of applications, the computational complexity of the problem of finding a maximum matching in a bipartite graph is of interest. The best methods seem to require steps, where m is previous (1, 3, 4, 5) O(mn) the number of and n the number of vertices. The present method requires edges, only O((m steps. + n)x/) extend With this in We hope to our results to the nonbipartite case (cf. 2). the mind, all the results in 2 are derived for general graphs. The specialization to bipartite case occurs in 3. 2. anti paths. Let G (V, be a finite undirected Matchings augmenting E) the vertex set V edges, or isolated having graph loops, multiple vertices) (without and w is written A set and the set E. An edge incident with vertices v edge v, w. more than one in M. E is a ifno vertex v e V is incident with edge M matching a ofmaximum cardinality is called maximum matching. A matching M. vertex v is Wemake the definitions relative to a matching A following free is incident with no in M. if it edge Received the editors November 1972, and in revised form 23, 1973. This research by 14, April was in Science supported part by the National Foundation under Grants NSF GJ96, GPo25081 and GJ474. Department ofComputer Science, Cornell Ithaca, New York, 14850. University, Computer Science Department, University of California, California 94720. " Berkeley, 225Nonbipartite matching Nonbipartite matching. Given an undirected graph (not necessarily bipartite), find a matching of maximum cardinality. Structure of nonbipartite graphs is more complicated. But wellunderstood. TutteBerge, EdmondsGalai 4 Blossom algorithm: O(n ). Edmonds 1965 1/2 Best known: O(m n ). MicaliVazirani 1980, Vazirani 1994 COMBINATORICA 14 (i) (1994) 71109 PATHS, TREES, AND FLOWERS COMBINATORICA Akad6miai Kiad6 SpringerVerlag JACK EDMONDS A THEORY OF ALTERNATING PATHS AND BLOSSOMS FOR 1. Introduction. A graph G for purposes here is a finite set of elements PROVING CORRECTNESS OF THE O(v/VE) GENERAL GRAPH called vertices and a finite set of elements called edges such that each edge MAXIMUM MATCHING ALGORITHM meets exactly two vertices, called the endpoints of the edge. An edge is said to join its endpoints. A matching in G is a subset of its edges such that no two meet the same VIJAY V. VAZIRANI 1 vertex. We describe an efficient algorithm for finding in a given graph a match Received December 30, 1989 ing of maximum cardinality. This problem was posed and partly solved by Revised June 15, 1993 C. Berge; see Sections 3.7 and 3.8. Maximum matching is an aspect of a topic, treated in books on graph theory, which has developed during the last 75 years through the work of about a dozen authors. In particular, W. T. Tutte (8) characterized graphs which do not contain a perfect matching, or 1factor as he calls it—that is a set of edges with exactly one member meeting each vertex. His theorem 1. Introduction prompted attempts at finding an efficient construction for perfect matchings. This and our two subsequent papers will be closely related to other work on Finding a maximum matching in a graph is a classical problem in the study of algorithms. In this paper we present new algorithmically relevant combinatorial the topic. Most of the known theorems follow nicely from our treatment, structure of matchings. This structure yields the first proof of correctness of the though for the most part they are not treated explicitly. Our treatment is general graph matching algorithm of Mieali and Vazirani 14; this is currently the independent and so no background reading is necessary. 16 most efficient known matching algorithm. Section 2 is a philosophical digression on the meaning of "efficient algorithm." Berge's theorem 2, which says that matching M in graph G is a maximum matching if and only if there are no augmenting paths w.r.t, it, gives an iterative Section 3 discusses ideas of Berge, Norman, and Rabin with a new proof of schema for finding a maximum matching in G, i.e. successively find augmenting Berge's theorem. Section 4 presents the bulk of the matching algorithm. paths. Finding augmenting paths is fairly easy in bipartite graphs; however, not Section 7 discusses some refinements of it. so in general graphs (see 13 for a detailed history of the problem). The first There is an extensive combinatoriallinear theory related on the one hand polynomial time algorithm (o(rvI4)) for general graph matching was given by to matchings in bipartite graphs and on the other hand to linear programming. Edmonds 4. In this paper, Edmonds introduced the notion of blossom (an odd It is surveyed, from different viewpoints, by Ford and Fulkerson in (5) and length alternating cycle), and showed that by "shrinking" blossoms, one can find augmenting path efficiently. In this seminal paper, Edmonds also introduced the by A. J. Hoffman in (6). They mention the problem of extending this relation notion of a polynomial time algorithm. ship to nonbipartite graphs. Section 5 does this, or at least begins to do it. Over the years, faster implementations of Edmonds' algorithm were given by There, the Kônig theorem is generalized to a matchingduality theorem for several authors, including Whitzgall and Zahn 16, Balinski 1, Gabow 6, Lawler arbitrary graphs. This theorem immediately suggests a polyhedron which in a 12, and Kameda and Munro 10. In 1972, Hopcroft and Karp 9 proposed finding subsequent paper (4) is shown to be the convex hull of the vectors associated augmenting paths in phases; in each phase a maximal set of disjoint minimum length augmenting paths is found. They showed that only O(v/) phases are needed, with the matchings in a graph. as opposed to O(IV) iterations in the previouslymentioned schema. They also Maximum matching in nonbipartite graphs is at present unusual among presented an O(IEI) implementation of a phase in bipartite graphs, thereby giving combinatorial extremum problems in that it is very tractable and yet not of an O(Iv/llEi) matching algorithm for such graphs, and left the open problem of the "unimodular" type described in (5 and 6). 1 Partially supported by an NSF PYI Grant with matching funds from AT; T Bell Labs at Cornell University Received November 22, 1963. Supported by the O.N.R. Logistics Project at Princeton AMS subject classification codes (1991): 05 C 70, 05 C 85 University and the A.R.O.D. Combinatorial Mathematics Project at N.B.S. 449 450 JACK EDMONDS Section 6 presents a certain invariance property of the dual to maximum matching. In paper (4), the algorithm is extended from maximizing the cardinality of a matching to maximizing for matchings the sum of weights attached to the edges. At another time, the algorithm will be extended from a capacity of one edge at each vertex to a capacity of d edges at vertex v . t t This paper is based on investigations begun with G. B. Dantzig while at the RAND Combinatorial Symposium during the summer of 1961. I am Historical significance (Jack Edmonds 1965) indebted to many people, at the Symposium and at the National Bureau of Standards, who have taken an interest in the matching problem. There has been much animated discussion on possible versions of an algorithm. 2. Digression. An explanation is due on the use of the words "efficient algorithm." First, what I present is a conceptual description of an algorithm and not a particular formalized algorithm or "code." For practical purposes computational details are vital. However, my purpose is only to show as attractively as I can that there is an efficient algorithm. According to the dictionary, "efficient" means "adequate in opera tion or performance." This is roughly the meaning I want—in the sense that it is conceivable for maximum matching to have no efficient algorithm. Perhaps a better word is "good." I am claiming, as a mathematical result, the existence of a good algorithm for finding a maximum cardinality matching in a graph. There is an obvious finite algorithm, but that algorithm increases in difficulty exponentially with the size of the graph. It is by no means obvious whether or not there exists an algorithm whose difficulty increases only algebraically with the size of the graph. The mathematical significance of this paper rests largely on the assumption that the two preceding sentences have mathematical meaning. I am not prepared to set up the machinery necessary to give them formal meaning, nor is the present context appropriate for doing this, but I should like to explain the idea a little further informally. It may be that since one is customarily 17 concerned with existence, convergence, finiteness, and so forth, one is not in clined to take seriously the question of the existence of a betterthanfinite algorithm. The relative cost, in time or whatever, of the various applications of a particular algorithm is a fairly clear notion, at least as a natural phenomenon. Presumably, the notion can be formalized. Here "algorithm" is used in the strict sense co mean the idealization of some physical machinery which gives a definite output, consisting of cost plus the desired result, for each member of a specified domain of inputs, the individual problems. The problemdomain of applicability for an algorithm often suggests for itself possible measures of size for the individual problems—for maximum matching, for example, the number of edges or the number of vertices in the kregular bipartite graphs Dancing problem. Exclusive Ivy league party attended by n men and n women. Each man knows exactly k women; each woman knows exactly k men. Acquaintances are mutual. Is it possible to arrange a dance so that each woman dances with a different man that she knows 2regular bipartite graph Mathematical reformulation. Does every kregular 1 1' bipartite graph have a perfect matching 2 2' Ex. Boolean hypercube. 3 3' 4 4' women men 18kregular bipartite graphs have perfect matchings Theorem. Every kregular bipartite graph G has a perfect matching. Pf. Size of max matching = value of max flow in G'. Consider flow ⎧ 1/k if (u, v)∈ E ⎪ f (u,v) = 1 if u = s or v =t ⎨ ⎪ 0 otherwise ⎩ f is a flow in G' and its value = n ⇒ perfect matching. ▪ € 1 1/k 1' 1 1 1/k s t G' 2 2' 3 3' 4 4' a feasible flow f of value n 197. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Edgedisjoint paths Def. Two paths are edgedisjoint if they have no edge in common. Disjoint path problem. Given a digraph G = (V, E) and two nodes s and t, find the max number of edgedisjoint s↝t paths. 2 5 s 3 6 t digraph G 4 7 21Edgedisjoint paths Def. Two paths are edgedisjoint if they have no edge in common. Disjoint path problem. Given a digraph G = (V, E) and two nodes s and t, find the max number of edgedisjoint s↝t paths. Ex. Communication networks. 2 5 s 3 6 t digraph G 4 7 2 edgedisjoint paths 22Edgedisjoint paths Max flow formulation. Assign unit capacity to every edge. Theorem. Max number edgedisjoint s↝t paths equals value of max flow. Pf. ≤ Suppose there are k edgedisjoint s↝t paths P , …, P . 1 k Set f (e) = 1 if e participates in some path P ; else set f (e) = 0. j Since paths are edgedisjoint, f is a flow of value k. ▪ 1 1 1 1 1 1 1 1 s t 1 1 1 1 1 1 23Edgedisjoint paths Max flow formulation. Assign unit capacity to every edge. Theorem. Max number edgedisjoint s↝t paths equals value of max flow. Pf. ≥ Suppose max flow value is k. Integrality theorem ⇒ there exists 01 flow f of value k. Consider edge (s, u) with f(s, u) = 1. by conservation, there exists an edge (u, v) with f(u, v) = 1 continue until reach t, always choosing a new edge Produces k (not necessarily simple) edgedisjoint paths. ▪ can eliminate cycles to get simple paths 1 in O(mn) time if desired 1 1 1 1 (flow decomposition) 1 1 1 s t 1 1 1 1 1 1 24Network connectivity Def. A set of edges F ⊆ E disconnects t from s if every s↝t path uses at least one edge in F. Network connectivity. Given a digraph G = (V, E) and two nodes s and t, find min number of edges whose removal disconnects t from s. 2 5 s 3 6 t 4 7 25Menger's theorem Theorem. Menger 1927 The max number of edgedisjoint s↝t paths is equal to the min number of edges whose removal disconnects t from s. Pf. ≤ Suppose the removal of F ⊆ E disconnects t from s, and F = k. Every s↝t path uses at least one edge in F. Hence, the number of edgedisjoint paths is ≤ k. ▪ 2 5 2 5 s 3 6 t s 3 6 t 4 7 4 7 26Menger's theorem Theorem. Menger 1927 The max number of edgedisjoint s↝t paths equals the min number of edges whose removal disconnects t from s. Pf. ≥ Suppose max number of edgedisjoint paths is k. Then value of max flow = k. Maxflow mincut theorem ⇒ there exists a cut (A, B) of capacity k. Let F be set of edges going from A to B. F = k and disconnects t from s. ▪ 2 5 2 5 A s 3 6 t s 3 6 t 4 7 4 7 27Edgedisjoint paths in undirected graphs Def. Two paths are edgedisjoint if they have no edge in common. Disjoint path problem in undirected graphs. Given a graph G = (V, E) and two nodes s and t, find the max number of edgedisjoint st paths. 2 5 s 3 6 t digraph G 4 7 28Edgedisjoint paths in undirected graphs Def. Two paths are edgedisjoint if they have no edge in common. Disjoint path problem in undirected graphs. Given a graph G = (V, E) and two nodes s and t, find the max number of edgedisjoint st paths. 2 5 s 3 6 t digraph G 4 7 (2 edgedisjoint paths) 29Edgedisjoint paths in undirected graphs Def. Two paths are edgedisjoint if they have no edge in common. Disjoint path problem in undirected graphs. Given a graph G = (V, E) and two nodes s and t, find the max number of edgedisjoint st paths. 2 5 s 3 6 t digraph G 4 7 (3 edgedisjoint paths) 30Edgedisjoint paths in undirected graphs Max flow formulation. Replace edge edge with two antiparallel edges and assign unit capacity to every edge. Observation. Two paths P and P may be edgedisjoint in the digraph but 1 2 not edgedisjoint in the undirected graph. if P uses edge (u, v) 1 and P uses its antiparallel edge (v, u) 2 1 1 1 1 1 1 1 1 s t 1 1 1 1 1 1 31Edgedisjoint paths in undirected graphs Max flow formulation. Replace edge edge with two antiparallel edges and assign unit capacity to every edge. Lemma. In any flow network, there exists a maximum flow f in which for each pair of antiparallel edges e and e', either f (e) = 0 or f (e') = 0 or both. Moreover, integrality theorem still holds. Pf. by induction on number of such pairs of antiparallel edges Suppose f (e) 0 and f (e') 0 for a pair of antiparallel edges e and e'. Set f (e) = f (e) – δ and f (e') = f (e') – δ, where δ = min f (e), f (e') . f is still a flow of the same value but has one fewer such pair. ▪ 1 1 1 1 1 1 1 1 s t 1 1 1 1 1 1 32Edgedisjoint paths in undirected graphs Max flow formulation. Replace edge edge with two antiparallel edges and assign unit capacity to every edge. Lemma. In any flow network, there exists a maximum flow f in which for each pair of antiparallel edges e and e', either f (e) = 0 or f (e') = 0 or both. Moreover, integrality theorem still holds. Theorem. Max number edgedisjoint s↝t paths equals value of max flow. Pf. Similar to proof in digraphs; use lemma. 1 1 1 1 1 1 1 1 s t 1 1 1 1 1 1 33Menger's theorems Theorem. Given an undirected graph with two nodes s and t, the max number of edgedisjoint st paths equals the min number of edges whose removal disconnects s and t. Theorem. Given a undirected graph with two nonadjacent nodes s and t, the max number of internally nodedisjoint st paths equals the min number of internal nodes whose removal disconnects s and t. Theorem. Given an directed graph with two nonadjacent nodes s and t, the max number of internally nodedisjoint s↝t paths equals the min number of internal nodes whose removal disconnects t from s. 347. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Circulation with demands Def. Given a digraph G = (V, E) with nonnegative edge capacities c(e) and node supply and demands d(v), a circulation is a function that satisfies: For each e ∈ E: 0 ≤ f (e) ≤ c(e) (capacity) f (e) − f (e) = d(v) ∑ ∑ For each v ∈ V: (conservation) e in to v e out of v € (supply node) 8 6 network G flow capacity 6 / 7 1 / 7 7 / 9 4 / 10 6 / 6 2 / 4 3 / 3 4 / 4 7 11 10 0 (demand node) (transshipment node) 36Circulation with demands: maxflow formulation Add new source s and sink t. For each v with d(v) 0, add edge (s, v) with capacity d(v). For each v with d(v) 0, add edge (v, t) with capacity d(v). d(v) = − d(v) =: D ∑ ∑ Claim. G has circulation iff G' has max flow of value D = v : d (v) 0 v : d (v) 0 saturates all edges s leaving s supply and entering t € 8 7 6 8 6 network G' 7 7 9 6 4 10 3 4 7 11 10 0 11 10 demand t 37Circulation with demands Integrality theorem. If all capacities and demands are integers, and there exists a circulation, then there exists one that is integervalued. Pf. Follows from maxflow formulation + integrality theorem for max flow. Theorem. Given (V, E, c, d), there does not exist a circulation iff there exists a node partition (A, B) such that Σ d(v) cap(A, B). v ∈ B demand by nodes in B exceeds Pf sketch. Look at min cut in G'. supply of nodes in B plus max capacity of edges going from A to B 38Circulation with demands and lower bounds Feasible circulation. Directed graph G = (V, E). Edge capacities c(e) and lower bounds ℓ(e) for each edge e ∈ E. Node supply and demands d(v) for each node v ∈ V. Def. A circulation is a function that satisfies: For each e ∈ E : ℓ(e) ≤ f (e) ≤ c(e) (capacity) For each v ∈ V : (conservation) f (e) − f (e) = d(v) ∑ ∑ e in to v e out of v Circulation problem with lower bounds. Given (V, E, ℓ, c, d), does there exist € a feasible circulation 39Circulation with demands and lower bounds Max flow formulation. Model lower bounds as circulation with demands. Send ℓ(e) units of flow along edge e. Update demands of both endpoints. capacity lower bound upper bound w w v v 2, 9 7 d(v) d(v) + 2 d(w) d(w) – 2 network G' network G Theorem. There exists a circulation in G iff there exists a circulation in G'. Moreover, if all demands, capacities, and lower bounds in G are integers, then there is a circulation in G that is integervalued. Pf sketch. f (e) is a circulation in G iff f '(e) = f (e) – ℓ(e) is a circulation in G'. 407. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Survey design one survey question Design survey asking n consumers about n products. 1 2 per product Can only survey consumer i about product j if they own it. Ask consumer i between c and c ' questions. i i Ask between p and p ' consumers about product j. j j Goal. Design a survey that meets these specs, if possible. Bipartite perfect matching. Special case when c = c ' = p = p ' = 1. j j i i 42Survey design Maxflow formulation. Model as circulation problem with lower bounds. Add edge (i, j) if consumer j owns product i. Add edge from s to consumer j . Add edge from product i to t. Add edge from t to s. Integer circulation ⇔ feasible survey design. 0, ∞ 0, 1 1 1' c , c ' p , p ' 1 1 1 1 2 2' 3 3' s t 4 4' consumers products 437. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Airline scheduling Airline scheduling. Complex computational problem faced by nation's airline carriers. Produces schedules that are efficient in terms of: equipment usage, crew allocation, customer satisfaction in presence of unpredictable issues like weather, breakdowns One of largest consumers of highpowered algorithmic techniques. "Toy problem." Manage flight crews by reusing them over multiple flights. Input: set of k flights for a given day. Flight i leaves origin o at time s and arrives at destination d destination i i i at time f . i Minimize number of flight crews. 45Airline scheduling Circulation formulation. to see if c crews suffice For each flight i, include two nodes u and v . i i Add source s with demand c, and edges (s, u ) with capacity 1. i Add sink t with demand c, and edges (v , t) with capacity 1. i For each i, add edge (u , v ) with lower bound and capacity 1. i i if flight j reachable from i, add edge (v , u ) with capacity 1. i j crew can end day with any flight crew can begin day u v with any flight 1 1 0, 1 0, 1 c u v c s t 3 3 use c crews 1, 1 u v 2 2 0, 1 u v 4 4 flight 2 is performed same crew can do flights 2 and 4 46Airline scheduling: running time 3 Theorem. The airline scheduling problem can be solved in O(k log k) time. Pf. k = number of flights. c = number of crews (unknown). 2 O(k) nodes, O(k ) edges. At most k crews needed. binary search for optimal value c ⇒ solve lg k circulation problems. Value of the flow is between 0 and k. ⇒ at most k augmentations per circulation problem. 3 Overall time = O(k log k). 3 Remark. Can solve in O(k ) time by formulating as minimum flow problem. 47Airline scheduling: postmortem Remark. We solved a toy problem. Realworld problem models countless other factors: Union regulations: e.g., flight crews can only fly certain number of hours in given interval. Need optimal schedule over planning horizon, not just one day. Deadheading has a cost. Flights don't always leave or arrive on schedule. Simultaneously optimize both flight schedule and fare structure. Message. Our solution is a generally useful technique for efficient reuse of limited resources but trivializes real airline scheduling problem. Flow techniques useful for solving airline scheduling problems (and are widely used in practice). Running an airline efficiently is a very difficult problem. 487. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Image segmentation Image segmentation. Central problem in image processing. Divide image into coherent regions. Ex. Three people standing in front of complex background scene. Identify each person as a coherent object. liver and hepatic vascularization segmentation 50Image segmentation Foreground / background segmentation. Label each pixel in picture as belonging to foreground or background. V = set of pixels, E = pairs of neighboring pixels. a ≥ 0 is likelihood pixel i in foreground. i b ≥ 0 is likelihood pixel i in background. i p ≥ 0 is separation penalty for labeling one of i ij and j as foreground, and the other as background. Goals. Accuracy: if a b in isolation, prefer to label i in foreground. i i Smoothness: if many neighbors of i are labeled foreground, we should be inclined to label i as foreground. a + b − p Find partition (A, B) that maximizes: ∑ ∑ ∑ i j ij i∈ A j∈ B (i, j)∈ E A∩i, j =1 foreground background 51 € Image segmentation Formulate as min cut problem. Maximization. No source or sink. Undirected graph. Turn into minimization problem. a + b − p ∑ ∑ ∑ i j ij Maximizing i∈ A j∈ B (i, j)∈ E A∩i, j =1 a + b − a − b + p ∑ ∑ ∑ ∑ ∑ is equivalent to minimizing ( ) i j i j ij i∈V j∈V " i∈A j∈B (i,j)∈ E € a constant A∩i,j = 1 a + b + p ∑ ∑ ∑ j i ij or alternatively j∈ B i∈ A (i, j)∈ E A∩i, j =1 € 52 € Image segmentation edge in G Formulate as min cut problem G' = (V', E'). Include node for each pixel. p ij Use two antiparallel edges instead of two antiparallel edges in G' undirected edge. p ij Add source s to correspond to foreground. p ij Add sink t to correspond to background. a j p i j ij s t b i G' 53Image segmentation Consider min cut (A, B) in G'. A = foreground. cap(A,B) = a + b + p ∑ ∑ ∑ j i ij if i and j on different sides, j∈B i∈A (i,j)∈ E i∈A, j∈B p counted exactly once ij Precisely the quantity we want to minimize. € a j p i j ij s t b A i G' 547. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Project selection (maximum weight closure problem) can be positive Projects with prerequisites. or negative Set of possible projects P : project v has associated revenue p . v Set of prerequisites E : if (v, w) ∈ E, cannot do project v unless also do project w. A subset of projects A ⊆ P is feasible if the prerequisite of every project in A also belongs to A. Project selection problem. Given a set of projects P and prerequisites E, choose a feasible subset of projects to maximize revenue. MANAGEMENT SCIENCE Vol. 22, No. 1 1, July, 1976 Printed in U.SA. MAXIMAL CLOSURE OF A GRAPH AND APPLICATIONS TO COMBINATORIAL PROBLEMSt JEANCLAUDE PICARD Ecole Polytechnique, Montreal This paper generalizes the selection problem discussed by J. M. Rhys 12, J. D. Murchland 9, M. L. Balinski 1 and P. Hansen 4. Given a directed graph G, a closure of G is defined a of that if a node to as subset nodes such belongs the closure all its successors also belong to the set. If a real number is associated to each node of G a maximal closure is defined as a closure of maximal value. 56 1. Introduction The selection problem discussed J. M. and M. L. Balinksi by Rhys 12 1 can be defined as follows: A finite set of points S ("stations") together with the "cost" 0 c, of choosing ("constructing") s of S is At the same a finite any point given. time, of from collection I of subsets a points S is specified together with the "profit" of pa choosing any one of the subsets a. Define a selection to be a collection of subsets E with all of which to this from together points S belong collection. Let the value of the selection be the sum of the profits of the subsets from E minus the sum of the costs of the of S in the selection. The is to find a selection of points problem maximum value. J. M. Rhys 12 and M. L. Balinski 1 have shown that this problem can be solved as a maximal flow problem in a bipartite graph. This selection problem can easily be generalized as follows: Given a directed graph G = where V is the set of nodes and A the set of a closure of G is defined (V, A) arcs, as a of Y such that if a vertex to Y all its subset vertices belongs then successors is belong also to Y. If to each vertex associated a real number, mi, then a maximal vi closure Y of G is defined as a closure of maximal value is (i.e. E maximal). Ev y.mi In this paper, it is shown that the problem of finding a maximal closure of a graph is equivalent to solving the maximal flow problem in a network formed by the graph on its a G with infinite capacities arcs, source linked to each node of positive value vi by an arc of capacity (+ mi) and a sink linked from each node of negative value by vi an arc of capacity ( mi). The selection problem is then the maximal closure problem in a bipartite graph G = with arcs A = a E s E and s E and where the value (E, S, A) (u, s) E, S, a), associated with each node a EE is and the value associated with each node s S is pa E Cs. Applications J. M. and J. D. Murchland several of the selection Rhys 12 9 give applications The maximal closure of a finds its main in problem. graph application mining engineering for determining optimum pit mine contours 58. Given an ore body into there is a net value of the block minus the decomposed blocks, (the profit and fixed associated with each block. operating, capital costs) Determining optimum Morton Processed by Professor Klein, Departmental Editor for Network Flows and Location Analysis and Associate Editor Michael received Held; June 11, 1975, revised August 28, 1975. This paper has been with the author 2 months for revision. t This research was a from supported by grant the Iron Ore Company of Canada. 1268 Copyright 1976, The Institute of Management Sciences ) This content downloaded from 140.180.240.185 on Mon, 1 Apr 2013 10:50:35 AM All use subject to JSTOR Terms and ConditionsProject selection: prerequisite graph Prerequisite graph. Add edge (v, w) if can't do v without also doing w. w w v x v x v, w, x is feasible v, x is infeasible 57Project selection: mincut formulation Mincut formulation. Assign capacity ∞ to all prerequisite edge. Add edge (s, v) with capacity p if p 0. v v Add edge (v, t) with capacity p if p 0. v v For notational convenience, define p = p = 0. s t u w ∞ ∞ p p ∞ w u p p y ∞ z y s t z ∞ p p v x ∞ v x ∞ 58Project selection: mincut formulation Claim. (A, B) is min cut iff A − s is optimal set of projects. Infinite capacity edges ensure A − s is feasible. Max revenue because: cap(A, B) = p + (−p ) ∑ ∑ v v cap(A, B) = p + (−p ) ∑ ∑ v v v∈B: p 0 v∈A: p 0 v v v∈B: p 0 v∈A: p 0 v v = p − p ∑ ∑ v v = p − p ∑ ∑ v v v: p 0 v∈A v " v: p 0 v∈A v " constant constant ∞ w u € € ∞ ∞ p u p w p p ∞ z y z s y t ∞ p A v p x ∞ v x ∞ 59Openpit mining Openpit mining. (studied since early 1960s) Blocks of earth are extracted from surface to retrieve ore. Each block v has net value p = value of ore – processing cost. v Can't remove block v before w or x. w x v 607. NETWORK FLOW II bipartite matching ‣ disjoint paths ‣ extensions to max flow ‣ survey design ‣ airline scheduling ‣ image segmentation ‣ project selection ‣ SECTION baseball elimination ‣Baseball elimination 62Baseball elimination problem Q. Which teams have a chance of finishing the season with the most wins i team wins losses to play ATL PHI NYM MON 0 Atlanta 83 71 8 – 1 6 1 1 Philly 80 79 3 1 – 0 2 2 New York 78 78 6 6 0 – 0 3 Montreal 77 82 3 1 2 0 – Montreal is mathematically eliminated. Montreal finishes with ≤ 80 wins. Atlanta already has 83 wins. Remark. This is the only reason sports writers appear to be aware of — conditions are sufficient but not necessary 63Baseball elimination problem Q. Which teams have a chance of finishing the season with the most wins i team wins losses to play ATL PHI NYM MON 0 Atlanta 83 71 8 – 1 6 1 1 Philly 80 79 3 1 – 0 2 2 New York 78 78 6 6 0 – 0 3 Montreal 77 82 3 1 2 0 – Philadelphia is mathematically eliminated. Philadelphia finishes with ≤ 83 wins. Either New York or Atlanta will finish with ≥ 84 wins. Observation. Answer depends not only on how many games already won and left to play, but on whom they're against. 64Baseball elimination problem Current standings. Set of teams S. Distinguished team z ∈ S. Team x has won w games already. x Teams x and y play each other r additional times. xy Baseball elimination problem. Given the current standings, is there any outcome of the remaining games in which team z finishes with the most (or tied for the most) wins SIAM REVIEW Vol. 8, No. 3, July, 1966 POSSIBLE WINNERS IN PARTIALLY COMPLETED TOURNAMENTS BENJAMIN L. SCHWARTZ 1. Introduction. In this paper, we shall investigate certain questions in tourna ment For we shall the scheduling. definiteness, use terminology of baseball. We shall be concerned with the categorization of teams into three classes during the closing days of the season. A team may be definitely eliminated from pen nant it be in or it have possibility; may contention, may clinched the champion ship. It will be our convention that a team that can possibly tie for the pennant is considered stillin contention. In this paper necessary and sufficient conditions are developed to classify any team into the properly appropriate category. 2. First example. We consider first an extremely simple example, but one 65 that will instructive. one the prove Suppose team, say New York Mets, finds itself at one point in the season 37 behind the games another, perhaps Dodgers, with 36 to play for each team. Then we can surely declare that the Mets are eliminated. At the risk of belaboring the we to examine in detail obvious, propose why this is so. A natural unit for accounting for team changes in position is half behind games (henceforth HGB). The Mets are trailing the Dodgers by HGB 37 games). What do 74 (= hope they have to overcome this deficit Each game they have still to play offers them the of one HGB since potentially prospect gain, they may wb it. There are 36 such games. each game the Dodgers still have to play the Likewise, gives Mets the of one prospect HGB gain, since the Dodgers may lose. There are also 36 of these. The total maximum gain is potential only 36 36 72 HGB, + not enough to overcome the existing debit balance. Hence the Mets are indeed eliminated the by Dodgers. The generalized rule that has been illustrated here can be formalized. easily Let team x beNHGB behind team y. Let the number of games still to play for and x y be P(x) and P(y), respectively. Then, if (1) N P(x) P(y) O, we can assert that x is eliminated by y. In this criterion merely gives a condition for the of fact, possible interchange x and y in the standings before season’s end. A team has the pennant clinched when it is the and no other team can it. leading league, interchange with 3. Example of general situation. Let us now look at a more interesting situa tion. Table 1 gives standings in a fictitious league, part way through the season. Received by the editors June 1, 1965, and in final revised form December 1, 1965. United States Navy Postgraduate Monterey, California. Now at Weapons School, Systems Evaluation Institute for Defense Virginia. Division, Analyses, Arlington, from This is adaped the actual standings in the National League in midSep table tember 1963. 302 Downloaded 04/01/13 to 140.180.240.185. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phpBaseball elimination problem: maxflow formulation Can team 4 finish with most wins Assume team 4 wins all remaining games ⇒ w + r wins. 4 4 Divvy remaining games so that all teams have ≤ w + r wins. 4 4 0–1 games left team 2 can still win between 1 and 2 this many more games 0–2 0 0–3 1 ∞ g 1–2 2 w + r – w t s 12 ∞ 4 4 2 1–3 3 team nodes (each team other than 4) 2–3 game nodes (each pair of teams other than 4) 66Baseball elimination problem: maxflow formulation Theorem. Team 4 not eliminated iff max flow saturates all edges leaving s. Pf. Integrality theorem ⇒ each remaining game between x and y added to number of wins for team x or team y. Capacity on (x, t) edges ensure no team wins too many games. ▪ 0–1 games left team 2 can still win between 1 and 2 this many more games 0–2 0 0–3 1 ∞ g 1–2 2 w + r – w t s 12 ∞ 4 4 2 1–3 3 team nodes (each team other than 4) 2–3 game nodes (each pair of teams other than 4) 67Baseball elimination: explanation for sports writers Q. Which teams have a chance of finishing the season with the most wins i team wins losses to play NYY BAL BOS TOR DET 0 New York 75 59 28 – 3 8 7 3 1 Baltimore 71 63 28 3 – 2 7 4 2 Boston 69 66 27 8 2 – 0 0 3 Toronto 63 72 27 7 7 0 – 0 4 Detroit 49 86 27 3 4 0 0 – AL East (August 30, 1996) Detroit is mathematically eliminated. Detroit finishes with ≤ 76 wins. Wins for R = NYY, BAL, BOS, TOR = 278. Remaining games among NYY, BAL, BOS, TOR = 3 + 8 + 7 + 2 + 7 = 27. Average team in R wins 305/4 = 76.25 games. 68Baseball elimination: explanation for sports writers Certificate of elimination. remaining games wins " " T⊆S, w(T ) := w , g(T ) := g , ∑ ∑ i xy i∈T x,y ⊆ T € Theorem. HoffmanRivlin 1967 Team z is eliminated iff there exists a subset T such that w(T)+g(T) w +g z z T Pf. ⇐ w(T)+g(T) w +g Suppose there exists T ⊆ S such that . z z T € Then, the teams in T win at least (w(T) + g(T)) / T games on average. This exceeds the maximum number that team z can win. ▪ € 69Baseball elimination: explanation for sports writers Pf. ⇒ Use maxflow formulation, and consider min cut (A, B). Let T = team nodes on source side A of min cut. Observe that game node xy ∈ A iff both x ∈ T and y ∈ T. infinite capacity edges ensure if xy ∈ A, then both x ∈ A and y ∈ A if x ∈ A and y ∈ A but xy ∉ A, then adding xy to A decreases the capacity of the cut by g xy y y ∞ g xy xy w + r – w ∞ s x z z x t x 70Baseball elimination: explanation for sports writers Pf. ⇒ Use maxflow formulation, and consider min cut (A, B). Let T = team nodes on source side A of min cut. Observe that game node xy ∈ A iff both x ∈ T and y ∈ T. Since team z is eliminated, by maxflow mincut theorem, g(S− z) cap(A, B) g(S− z) cap(A, B) g(S− z) cap(A, B) capacity of game edges leaving s capacity of team edges entering t capacity of game edges leaving s capacity of team edges leaving s capacity of game edges leaving s capacity of team edges leaving s " " capacity of game edges leaving s capacity of team edges leaving s " " " " = g(S− z)−g(T) + (w +g −w ) ∑ = g(S− z)−g(T) + (wz +gz−wx ) ∑ z z x = g(S− z)−g(T) + (w +g −w ) ∑ z z x x∈T x∈T x∈T = g(S− z)−g(T) − w(T) + T (w +g ) = g(S− z)−g(T) − w(T) + T (wz +gz) z z = g(S− z)−g(T) − w(T) + T (w +g ) z z w(T)+g(T) w +g Rearranging terms: ▪ z z € € T € € 71
Website URL
Comment