Question? Leave a message!


12. LOCAL SEARCH gradient descent ‣ Metropolis algorithm ‣ Hopfield neural networks ‣ maximum cut ‣ Nash equilibria ‣ Lecture slides by Kevin Wayne Copyright © 2005 PearsonAddison Wesley Copyright © 2013 Kevin Wayne Last updated on Sep 8, 2013 6:54 AMCoping With NPhardness Q. Suppose I need to solve an NPhard problem. What should I do A. Theory says you're unlikely to find polytime algorithm. Must sacrifice one of three desired features. Solve problem to optimality. Solve problem in polynomial time. Solve arbitrary instances of the problem. 212. LOCAL SEARCH gradient descent ‣ Metropolis algorithm ‣ Hopfield neural networks ‣ maximum cut ‣ Nash equilibria ‣ SECTION 12.1Gradient descent: vertex cover Vertex cover. Given a graph G = (V, E), find a subset of nodes S of minimal cardinality such that for each (u, v) ∈ E, either u or v (or both) are in S. Neighbor relation. S ∼ S ' if S ' can be obtained from S by adding or deleting a single node. Each vertex cover S has at most n neighbors. Gradient descent. Start with S = V. If there is a neighbor S ' that is a vertex cover and has lower cardinality, replace S with S '. Remark. Algorithm terminates after at most n steps since each update decreases the size of the cover by one. 4Gradient descent: vertex cover Local optimum. No neighbor is strictly better. optimum = all nodes on left side optimum = center node only local optimum = all nodes on right side local optimum = all other nodes optimum = even nodes local optimum = omit every third node 5Local search Local search. Algorithm that explores the space of possible solutions in sequential fashion, moving from a current solution to a "nearby" one. Neighbor relation. Let S ∼ S ' be a neighbor relation for the problem. Gradient descent. Let S denote current solution. If there is a neighbor S ' of S with strictly lower cost, replace S with the neighbor whose cost is as small as possible. Otherwise, terminate the algorithm. A funnel a jagged funnel 612. LOCAL SEARCH gradient descent ‣ Metropolis algorithm ‣ Hopfield neural networks ‣ maximum cut ‣ Nash equilibria ‣ SECTION 12.2Metropolis algorithm THE 0 R Y 0 F T RAe KEF FEe T SIN R A D I 0 L Y SIS 0 F W ATE R 1087 instead, only water molecules with different amounts of paths (a) and (b) can be designated H 0 and those 2 Metropolis algorithm. excitation energy. These may follow any of three paths: following path (c) can be designated H 0t. It seems 2 reasonable to assume for the purpose of these calcula (a) The excitation energy is lost without dissociation Simulate behavior of a physical system accor tions that the ionized ding to principles of H 0 molecules will become the 2 into radicals (by collision, or possibly radiation, as in H 0t molecules, but this is not likely to be a complete 2 aromatic hydrocarbons). correspondence. (b) The molecules dissociate, but the resulting radi statistical mechanics. In conclusion we would like to emphasize that the cals recombine without escaping from the liquid cage. qualitative result of this section is not critically de (c) The molecules dissociate and escape from the pendent on the exact values of the physical parameters Globally biased toward "downhill" steps, but occasionally cage. In this case we would not expect them to move used. However, this treatment is classical, and a correct more than a few molecular diameters through the dense treatment must be wave mechanical; therefore the makes "uphill" steps to break out of local minima. medium before being thermalized. result of this section cannot be taken as an a priori theoretical prediction. The success of the radical diffu In accordance with the notation introduced by sion model given above lends some plausibility to the Burton, Magee, and Samuel,22 the molecules following occurrence of electron capture as described by this crude calculation. Further work is clearly needed. 22 Burton, Magee, and Samuel, J. Chern. Phys. 20, 760 (1952). THE JOURNAL OF CHEMICAL PHYSICS VOLUME 21, NUMBER 6 JUNE, 1953 Equation of State Calculations by Fast Computing Machines NICHOLAS METROPOLIS, ARIANNA W. ROSENBLUTH, MARSHALL N. ROSENBLUTH, AND AUGUSTA H. TELLER, Los Alamos Scientific Laboratory, Los Alamos, New Mexico AND EDWARD TELLER, Department of Physics, University of Chicago, Chicago, Illinois (Received March 6, 1953) A general method, suitable for fast computing machines, for investigatiflg such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the twodimensional rigidsphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a fourterm virial coefficient expansion. I. INTRODUCTION II. THE GENERAL METHOD FOR AN ARBITRARY POTENTIAL BETWEEN THE PARTICLES HE purpose of this paper is to describe a general In order to reduce the problem to a feasible size for T method, suitable for fast electronic computing numerical work, we can, of course, consider only a finite machines, of calculating the properties of any substance number of particles. This number N may be as high as which may be considered as composed of interacting 8 several hundred. Our system consists of a squaret con individual molecules. Classical statistics is assumed, taining N particles. In order to minimize the surface only twobody forces are considered, and the potential effects we suppose the complete substance to be periodic, field of a molecule is assumed spherically symmetric. consisting of many such squares, each square contain These are the usual assumptions made in theories of ing N particles in the same configuration. Thus we liquids. Subject to the above assumptions, the method define dAB, the minimum distance between particles A is not restricted to any range of temperature or density. and B, as the shortest distance between A and any of This paper will also present results of a preliminary two the particles B, of which there is one in each of the dimensional calculation for the rigidsphere system. squares which comprise the complete substance. If we Work on the twodimensional case with a Lennard have a potential which falls off rapidly with distance, Jones potential is in progress and will be reported in a there will be at most one of the distances AB which later paper. Also, the problem in three dimensions is can make a substantial contribution; hence we need consider only the minimum distance dAB. being investigated. Now at the Radiation Laboratory of the University of Cali t We will use twodimensional nomenclature here since it fornia, Livermore, California. is easier to visualize. The extension to three dimensions is obvious. Downloaded 01 Jan 2013 to Redistribution subject to AIP license or copyright; see function GibbsBoltzmann function. The probability of finding a physical system in a E / (kT) state with energy E is proportional to e , where T 0 is temperature and k is a constant. For any temperature T 0, function is monotone decreasing function of energy E. System more likely to be in a lower energy state than higher one. T large: high and low energy states have roughly same probability T small: low energy states are much more probable 9Metropolis algorithm Metropolis algorithm. Given a fixed temperature T, maintain current state S. Randomly perturb current state S to new state S ' ∈ N(S). If E(S ') ≤ E(S), update current state to S '. ΔE / (kT) Otherwise, update current state to S ' with probability e , where ΔE = E(S ') – E(S) 0. Theorem. Let f (t) be fraction of first t steps in which simulation is in state S. S Then, assuming some technical conditions, with probability 1: 1 −E(S) /(kT ) lim f (t) = e , S Z t→∞ −E(S) /(kT ) where Z = e . ∑ S∈ N (S) Intuition. Simulation spends roughly the right amount of time in each state, € according to GibbsBoltzmann equation. 10Simulated annealing Simulated annealing. T large ⇒ probability of accepting an uphill move is large. T small ⇒ uphill moves are almost never accepted. Idea: turn knob to control T. Cooling schedule: T = T(i) at iteration i. Physical analog. Take solid and raise it to high temperature, we do not expect it to maintain a nice crystal structure. Take a molten solid and freeze it very abruptly, we do not expect to get a perfect crystal either. Annealing: cool material gradually from high temperature, allowing it to reach equilibrium at succession of intermediate lower temperatures. 1112. LOCAL SEARCH gradient descent ‣ Metropolis algorithm ‣ Hopfield neural networks ‣ maximum cut ‣ Nash equilibria ‣ SECTION 12.3Hopfield neural networks Hopfield networks. Simple model of an associative memory, in which a large collection of units are connected by an underlying network, and neighboring units try to correlate their states. Input: Graph G = (V, E) with integer (positive or negative) edge weights w. Configuration. Node assignment s = ± 1. u Intuition. If w 0, then u and v want to have the same state; uv if w 0 then u and v want different states. uv Note. In general, no configuration respects all constraints. 7 5 6 13Hopfield neural networks Def. With respect to a configuration S, edge e = (u, v) is good if w s s 0. That is, if w 0 then s = s ; if w 0, then s ≠ s . 𐄂 𐄂 e u v e u v e u v Def. With respect to a configuration S, a node u is satisfied if the weight of incident good edges ≥ weight of incident bad edges. w s s ≤ 0 ∑ e u v v: e=(u,v)∈ E Def. A configuration is stable if all nodes are satisfied. € 1 4 10 satisfied node: 5 – 4 – 1 – 1 0 5 1 bad edge Goal. Find a stable configuration, if such a configuration exists. 14Hopfield neural networks Goal. Find a stable configuration, if such a configuration exists. Stateflipping algorithm. Repeated flip state of an unsatisfied node. HOPFIELDFLIP (G, w) S ← arbitrary configuration. WHILE (current configuration is not stable) u ← unsatisfied node. s ← s . u u RETURN S. 15Stateflipping algorithm example unsatisfied node 10 – 8 0 unsatisfied node 8 – 4 – 1 – 1 0 stable 16Stateflipping algorithm: proof of correctness Theorem. The stateflipping algorithm terminates with a stable configuration after at most W = Σ w iterations. e e Pf attempt. Consider measure of progress Φ(S) = satisfied nodes. 17Stateflipping algorithm: proof of correctness Theorem. The stateflipping algorithm terminates with a stable configuration after at most W = Σ w iterations. e e Pf. Consider measure of progress Φ(S) = Σ w . e good e Clearly 0 ≤ Φ(S) ≤ W. We show Φ(S) increases by at least 1 after each flip. When u flips state: all good edges incident to u become bad all bad edges incident to u become good all other edges remain the same Φ(S') = Φ(S) − w + w ≥ Φ(S) + 1 ∑ ∑ e e e: e = (u,v)∈ E e: e = (u,v)∈ E e is bad e is good u is unsatisfied € 18Complexity of Hopfield neural network Hopfield network search problem. Given a weighted graph, find a stable configuration if one exists. Hopfield network decision problem. Given a weighted graph, does there exist a stable configuration Remark. The decision problem is trivially solvable (always yes), but there is no known polytime algorithm for the search problem. polynomial in n and log W 1912. LOCAL SEARCH gradient descent ‣ Metropolis algorithm ‣ Hopfield neural networks ‣ maximum cut ‣ Nash equilibria ‣ SECTION 12.4Maximum cut Maximum cut. Given an undirected graph G = (V, E) with positive integer edge weights w , find a cut (A, B) such that the total weight of edges e crossing the cut is maximized. w(A,B) := w ∑ uv u∈A, v∈B Toy application. n activities, m people. € Each person wants to participate in two of the activities. Schedule each activity in the morning or afternoon to maximize number of people that can enjoy both activities. Real applications. Circuit layout, statistical physics. 21Maximum cut Singleflip neighborhood. Given a cut (A, B), move one node from A to B, or one from B to A if it improves the solution. Greedy algorithm. MAXCUTLOCAL (G, w) (A, B) ← random cut. WHILE (there exists an improving node v) IF v ∉ A A ← A ∪ v . B ← B – v . ELSE v ∉ A B ← B ∪ v . A ← A – v . RETURN (A, B). 22Maximum cut: local search analysis Theorem. Let (A, B) be a locally optimal cut and let (A, B) be an optimal cut. Then w(A, B) ≥ ½ Σ w ≥ ½ w(A, B). e e weights are nonnegative Pf. w ≤ w Local optimality implies that for all u ∈ A : ∑ ∑ uv uv v∈A v∈B Adding up all these inequalities yields: 2 w ≤ w = w(A,B) ∑ ∑ uv uv u,v⊆A u∈ A, v∈B € 2 w ≤ w = w(A,B) ∑ ∑ Similarly uv uv u,v⊆B u∈ A, v∈B € Now, each edge counted once € w = w + w + w ≤ 2w(A,B) ∑ ∑ ∑ ∑ e uv uv uv e∈E u,v⊆A u∈ A, v∈B u,v⊆A             1 1 w(A, B) ≤ w(A, B) ≤ w(A, B) 2 2 23 € Maximum cut: big improvement flips Local search. Within a factor of 2 for MAXCUT, but not polytime Bigimprovementflip algorithm. Only choose a node which, when flipped, 2ε increases the cut value by at least w(A,B) n Claim. Upon termination, bigimprovementflip algorithm returns a cut (A, B) such that (2 + ε) w(A, B) ≥ w(A, B). € 2ε Pf idea. Add w(A,B) to each inequality in original proof. ▪ n 1 Claim. Bigimprovementflip algorithm terminates after O(ε n log W) flips, where W = Σ w . € e e Each flip improves cut value by at least a factor of (1 + ε / n). After n / ε iterations the cut value improves by a factor of 2. Cut value can be doubled at most log W times. ▪ 2 x if x ≥ 1, (1 + 1/x) ≥ 2 24Maximum cut: context Theorem. SahniGonzales 1976 There exists a ½approximation algorithm for MAXCUT. Theorem. There exists an 0.878approximation algorithm for MAXCUT. Theorem. Unless P = NP, no 0.942approximation algorithm for MAXCUT. Improved Approximation Algorithms for SomeOptimalInapproximabilityResults Maximum Cut and Satisfiability Problems Using Semidefinite Programming ˚ JOHANHASTAD Royal Institute of Technology, Stockholm, Sweden MIC13EL X. GOEMANS Massachusetts Institute of Technology, Cambridge, Massachusetts Abstract. We prove optimal, up to an arbitrary 0, inapproximability results for MaxEkSat for k≥ 3, maximizing the number of satisfied linear equations in an overdetermined system of linear AND equations modulo a prime p and Set Splitting. As a consequence of these results we get improved lowerboundsfortheefficientapproximabilityofmanyoptimizationproblemsstudiedpreviously.In DAVID P. WILLIAMSON particular,forMaxE2Sat,MaxCut,MaxdiCut,andVertexcover. IBM T. J. Watson Research Center, Yorktown Heights, New York CategoriesandSubjectDescriptors:F2.2AnalysisofAlgorithmsandProblemComplexity:Non numericalAlgorithmsandProblems GeneralTerms:Theory Abstract. We present randomized approximation algorithms for the maximum cut (MAX CUT) AdditionalKeyWordsandPhrases:Inapproximability,linearequations,maxsat,NPhardoptimiza and maximum 2satisfiability (MAX 2SAT) problems that always deliver solutions of expected tionproblems,probabilisticallycheckableproofs value at least .87856 times the optimal value. These algorithms use a simple and elegant technique that randomly rounds the solution to a nonlinear programming relaxation. This relaxation can be interpreted both as a semidefinite program and asan eigenvalue minimization problem. The best previously known approximation algorithms for these problems had perfcr mance guarantees of for MAX CUT and for MAX 2SAT. Slight extensions of our analysis 1. Introduction lead to a .79607approximation algorithm for the maximum directed cut problem (MAX DICUT) and a .758approximation algorithm for MAX SAT, where the best previously known approxima ManynaturaloptimizationproblemsareNPhard,whichimpliesthattheyareprob 25 tion algorithms had performance guarantees of and , respectively. Our algorithm givesthe first ably hard to solve exactly in the worst case. In practice, however, it is sufficient substantial progress in approximating MAX CUT in nearly twenty years, and represents the first togetreasonablygoodsolutionsforall(orevenmost)instances.Inthispaper,we use of :semidefinite programming in the design of approximation algorithms. study the existence of polynomial time approximation algorithms for some of the Categories and Subject Descriptors: F2.2 Analysis of Algorithms and Problem Complexity: Nonumerical Algorithms and Problems—computations on discrete structures; G2.2 Discrete Math basicNPcompleteproblems.Foramaximizationproblemwesaythatanalgorithm isaCapproximationalgorithmifit,foreachinstance,producesansolutionwhose objective value is at least OPT/C where OPT is the global optimum. A similar A preliminary version has appeared in Proceedings of the 26th AnnualACM Symposium on Theory definitionappliestominimizationproblems. of Computing (Montreal, Que., Canada). ACM, New York, 1994,pp. 422–431. Afundamentalquestionis,foragivenNPcompleteproblem,forwhatvalueof The research of M. X. Goemans was supported in part by National Science Foundation (NSF) C can we hope for a polynomial time Capproximation algorithm. Posed in this contract CCR 9302476 and DARPA contract NOO01492J1799. generality, this is a large research area with many positive and negative results. In The research of D. P. Williamson was supported by an NSF Postdoctoral Fellowship. This research was conducted while the author wasvisiting MIT. this paper, we concentrate on negative results, that is, results of the form that for Authors’ addresses:M. X. Goemans, Department of Mathematics, Room 2382, Massachusetts someC1acertainproblemcannotbeapproximatedwithinCinpolynomialtime. Institute of Technology, Cambridge, MA 02139,email: edu; D. P.Williamson, These results are invariably based on plausible complexity theoretic assumptions, IBM T, J. Watson Research Center, Room 33219, P.O. Box 218, Yorktown Heights, NY 1059I8, email: theweakestpossiblebeingNP"=PsinceifNP=P,allconsideredproblemscanbe Permission to make digital/hard copy of part or all of this work for personal or classroom use is solvedexactlyinpolynomialtime. grantedl without fee provided that copies are not made or distributed for profit or commercial ThemostbasicNPcompleteproblemissatisfiabilityofCNFformulasandprob advantage, the copyright notice, the title of the publication, and its date appear, and notice is given tlhat copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on ably the most used variant of this is 3SAT where each clause contains at most 3 servers,or to redistribute to lists, requires prior specific permission and/or a fee. 01995 ACM 00045411/95/11001115 03.50 An earlier version of this paper appeared in Proceedings of the 29th Annual ACM Symposium on JournaloftheAssociationforComputinsMachinery,Vol.42,No.6,November1995,pp.11151145. Theory of Computing(ElPaso,Tex.,May4–6).ACM,NewYork,1997,pp.1–10. Author’saddress:DepartmentofNumericalAnalysisandComputerScience,RoyalInstituteofTech nology,S10044Stockholm,Sweden, Permissiontomakedigital/hardcopyofpartorallofthisworkforpersonalorclassroomuseisgranted without fee provided that the copies are not made or distributed for profit or commercial advantage, thecopyrightnotice,thetitleofthepublication,anditsdateappear,andnoticeisgiventhatcopying is by permission of the Association for Computing Machinery (ACM), Inc. To copy otherwise, to republish,topostonservers,ortoredistributetolists,requirespriorspecificpermissionand/orafee. C 2001ACM00045411/01/070007985.00 JournaloftheACM,Vol.48,No.4,July2001,pp.798–859.Neighbor relations for max cut 1flip neighborhood. Cuts (A, B) and (A', B') differ in exactly one node. kflip neighborhood. Cuts (A, B) and (A', B') differ in at most k nodes. cut value of (A , B ) 1 1 KLneighborhood. KernighanLin 1970 may be worse than (A, B) To form neighborhood of (A, B): Iteration 1: flip node from (A, B) that results in best cut value (A , B ), 1 1 and mark that node. Iteration i: flip node from (A , B ) that results in best cut value (A , B ) i–1 i–1 i i among all nodes not yet marked. Neighborhood of (A, B) = (A , B ), …, (A , B ). n–1 n–1 1 1 Neighborhood includes some very long sequences of flips, but without the computational overhead of a kflip neighborhood. Practice: powerful and useful framework. Theory: explain and understand its success in practice. 2612. LOCAL SEARCH gradient descent ‣ Metropolis algorithm ‣ Hopfield neural networks ‣ maximum cut ‣ Nash equilibria ‣ SECTION 12.7Multicast routing Multicast routing. Given a directed graph G = (V, E) with edge costs c ≥ 0, a source node s, and k agents located at terminal nodes t , …, t . e 1 k Agent j must construct a path P from node s to its terminal t . j j Fair share. If x agents use edge e, they each pay c / x. e s 1 2 1 pays 2 pays 5 outer outer 4 8 8 4 outer middle 4 5 + 1 v middle outer 5 + 1 8 1 1 middle middle 5/2 + 1 5/2 + 1 t t 1 2 28Multicast routing Best response dynamics. Each agent is continually prepared to improve its solution in response to changes made by other agents. Nash equilibrium. Solution where no agent has an incentive to switch. Fundamental question. When do Nash equilibria exist s Ex: Two agents start with outer paths. 5 Agent 1 has no incentive to switch paths 8 4 (since 4 5 + 1), but agent 2 does (since 8 5 + 1). v Once this happens, agent 1 prefers middle 1 1 path (since 4 5/2 + 1). Both agents using middle path is a Nash t t 1 2 equilibrium. 29Nash equilibrium and local search Local search algorithm. Each agent is continually prepared to improve its solution in response to changes made by other agents. Analogies. Nash equilibrium : local search. Best response dynamics : local search algorithm. Unilateral move by single agent : local neighborhood. Contrast. Bestresponse dynamics need not terminate since no single objective function is being optimized. 30Socially optimum Social optimum. Minimizes total cost to all agent. Observation. In general, there can be many Nash equilibria. Even when its unique, it does not necessarily equal the social optimum. s s k agents 3 5 5 1 + ε k v 1 1 t t t 1 2 social optimum = 1 + ε social optimum = 7 Nash equilibrium A = 1 + ε unique Nash equilibrium = 8 Nash equilibrium B = k 31Price of stability Price of stability. Ratio of best Nash equilibrium to social optimum. Fundamental question. What is price of stability Ex: Price of stability = Θ(log k). Social optimum. Everyone takes bottom paths. Unique Nash equilibrium. Everyone takes top paths. Price of stability. H(k) / (1 + ε). s 1 + 1/2 + … + 1/k 1/3 1/k 1 1/2 . . . t t t t 1 + ε 1 2 3 k 0 0 0 0 s 32Finding a Nash equilibrium Theorem. The following algorithm terminates with a Nash equilibrium. BESTRESPONSEDYNAMICS (G, c, k) FOR j = 1 to k P ← any path for agent j. j WHILE (not a Nash equilibrium) j ← some agent who can improve by switching paths. P ← better path for agent i. j RETURN (P , P , …, P ). 1 2 k Pf. Consider a set of paths P , …, P . 1 k Let x denote the number of paths that use edge e. H(0) = 0 e k 1 H(k) = ∑ Let Φ(P , …, P ) = Σ c · H(x ) be a potential function, where 1 k e ∈ E e e i i=1 Since there are only finitely many sets of paths, it suffices to show that Φ strictly decreases in each step. € 33Finding a Nash equilibrium Pf. continued Consider agent j switching from path P to path P '. j j Agent j switches because c c f e ∑ ∑ x + 1 x f e f ∈ P '− P e∈ P − P ' j j j j           newly incurred cost cost saved c f Φ increases by c H(x +1) − H(x ) = ∑ ∑ f f f x + 1 f € f ∈ P '− P f ∈ P '− P j j j j c e c H(x ) − H(x − 1) = ∑ ∑ e e e x Φ decreases by e e∈ P − P ' e∈ P − P ' j j j j € Thus, net change in Φ is negative. ▪ € 34Bounding the price of stability Lemma. Let C(P , …, P ) denote the total cost of selecting paths P , …, P . 1 k 1 k For any set of paths P , …, P , we have 1 k C(P,…, P ) ≤ Φ(P,…, P ) ≤ H(k)⋅ C(P ,…, P ) 1 k 1 k 1 k Pf. Let x denote the number of paths containing edge e. e + Let E denote set of edges that belong to at least one of the paths. € Then, C(P,…, P ) = ∑ c ≤ ∑ c H(x ) ≤ ∑ c H(k) = H(k) C(P,…, P ) 1 k e e e e 1 k + + + e∈ E e∈ E e∈ E     Φ(P ,…, P ) 1 k € 35Bounding the price of stability Theorem. There is a Nash equilibrium for which the total cost to all agents exceeds that of the social optimum by at most a factor of H(k). Pf. Let (P , …, P ) denote a set of socially optimal paths. 1 k Run bestresponse dynamics algorithm starting from P . Since Φ is monotone decreasing Φ(P , …, P ) ≤ Φ(P , …, P ). 1 k 1 k C(P,…, P ) ≤ Φ(P,…, P ) ≤ Φ(P,…, P ) ≤ H(k)⋅ C(P ,…, P ) 1 k 1 k 1 k 1 k previous lemma previous lemma applied to P applied to P € 36Summary Existence. Nash equilibria always exist for kagent multicast routing with fair sharing. Price of stability. Best Nash equilibrium is never more than a factor of H(k) worse than the social optimum. Fundamental open problem. Find any Nash equilibria in polytime. 37
Document Information
User Name:
User Type:
Uploaded Date: