Question? Leave a message!




Disjoint sets data type: applications

Disjoint sets data type: applications 29
UNIONFIND union by size ‣ linkbyrank ‣ path compression ‣ linkbyrank with path compression ‣ context ‣ Lecture slides by Kevin Wayne Copyright © 2005 PearsonAddison Wesley Copyright © 2013 Kevin Wayne http://www.cs.princeton.edu/wayne/kleinbergtardos Last updated on Sep 8, 2013 7:01 AMDisjoint sets data type Goal. Support two operations on a set of elements: MAKESET(x). Create a new set containing only element x. FIND(x). Return a canonical element in the set containing x. UNION(x, y). Merge the sets containing x and y. Dynamic connectivity. Given an initial empty graph G on n nodes, support the following queries: 1 union operation ADDEDGE(u, v). Add an edge between nodes u and v. ISCONNECTED(u, v). Is there a path between u and v 2 find operations 2Disjoint sets data type: applications Original motivation. Compiling EQUIVALENCE, DIMENSION, and COMMON statements in Fortran. other EQUIVALENCE declaration). We must exhibit An Improved Equivalence an algorithm which will result in a storage assignment for each variable and array occurring in any EQUIVALENCE Algorithm statement. Of course, the groups may be highly linked, such as in the BERNARD A. GALLER AND MICHAEL J. FISHER following statement. University of Michigan, Ann Arbor, Michigan EQUIVALENCE (X, Y2 ), (Q, J, K), (Y3, Z1 ), (U, V), (Y, Q), (U3, M10, N) (2) An algorithm for assigning storage on the basis of EQUIV ALENCE, DIMENSION and COMMON declarations is pre We shall use this example to illustrate the algorithm sented. The algorithm is based on a tree structure, and has presented here. Assume that K has been assigned to loca reduced computation time by 40 percent over a previously tion 100 by some other declaration and that the dimensions published algorithm by identifying all equivalence classes of Y, Z, M and U are 10, 4, 12 and 5, respectively. (In with one scan of the EQUIVALENCE declarations. The method other words, since the zero subscript is allowed here, is applicable in any problem in which it is necessary to identify the highest subscripts occurring for Y, Z, M and U are equivalence classes, given the element pairs defining the 9, 3, 11 and 4, respectively.) There is no loss in generality equivalence relation. if we assume (and we do) that every other variable is also an array of dimension 1. An algorithm for the assignment of storage on the basis The algorithm has as input a collection of n groups of the EQUIVALENCE declaration found in such lan of subscripted array names. We shall call the groups guages as FORTRAN and MAD was presented in 1. The G1, "", Gn, and for the group Gi, we shall label the algorithm given here, which uses a tree structure, is a mi array names gil, gi2, • • • , gimi • Associated with each considerable improvement over the previous one, and array name glj will be its subscript s(gii) and its dimen the two algorithms furnish a clearcut example of the sion d(gii). It will be convenient to use five auxiliary ben:fits which can be realized through the use of such Note. This 1964 paper also introduced key data structure for problem. vectors, called the E, R, S, H, and H' vectors, respectively. metkods. (Comparison tests have shown that the new These vectors must be large enough to hold all distinct method reduces the execution time of the algorithm by as array names appearing in the EQUIVALENCE state 3 much as 40 percent.) The notation and statement of the ments. The number of entries in the Evector will be problem have been made as similar to that of 1 as possible to facilitate comparison, and is reviewed here for com • " = • pleteness. Figure 1 shows a general equivalence algorithm, suitable for identification of equivalence classes in any context. Figures 2 and 3 use this same algorithm for the address assignment problem considered in 1, retaining additional information (D, d, do, R, H and H') during the con F struction of the trees to facilitate the address assignment at the end. The problem may then be stated as follows: In some t = i T algebraic (or any other) languages, one may write EQUIV ALEIXCE declarations of the form: EQUIVALENCE (X, Y, Zi), (Z, Ws), (V, V) (1) where the entries consist of names of variables, subscripted array names or unsubscripted array names (which are assumed to represent the element of the array which has subscript zero). Some of the variables or arrays which occur here may have already been assigned to specific locations in storage; others have not yet been assigned. The entries are grouped by means of parentheses, the groups being separated by commas. For example, state ment (1) would assign X, Y and Z1 to the same location, then Z( Z0) and W5 to another location, and U and V to yet another location (unless either U or V is made ¢5 equivalent to one of the other variables or arrays by some Presented at the ACM National Conference, Denver, Colorado, 1963. Fm. 1 Volume 7 / Number 5 / May, 1964 Communications of the ACM 301 Disjoint sets data type: applications Applications. Percolation. Kruskal's algorithm. Connected components. Computing LCAs in trees. Computing dominators in digraphs. Equivalence of finite state automata. Checking flow graphs for reducibility. HoshenKopelman algorithm in physics. HinleyMilner polymorphic type inference. Morphological attribute openings and closings. Matlab's BWLABEL() function for image processing. Compiling EQUIVALENCE, DIMENSION and COMMON statements in Fortran. ... 4UNIONFIND linkbysize ‣ linkbyrank ‣ path compression ‣ linkbyrank with path compression ‣ context ‣ SECTION 4.6Disjointsets data structure Representation. Represent each set as a tree of elements. Each element has a parent pointer in the tree. The root serves as the canonical element. FIND(x). Find the root of the tree containing x. UNION(x, y). Make the root of one tree point to root of other tree. root 4 6 3 8 9 0 2 5 1 7 parent of 1 is 2 Note. For brevity, we suppress arrows and self loops in figures. 6Linkbysize Linkbysize. Maintain a subtree count for each node, initially 1. Link root of smaller tree to root of larger tree (breaking ties arbitrarily). union(7, 3) size = 4 size = 6 4 6 3 8 9 0 2 5 1 7Linkbysize Linkbysize. Maintain a subtree count for each node, initially 1. Link root of smaller tree to root of larger tree (breaking ties arbitrarily). union(7, 3) size = 10 6 4 0 2 5 3 8 9 1 7Linkbysize Linkbysize. Maintain a subtree count for each node, initially 1. Link root of smaller tree to root of larger tree (breaking ties arbitrarily). UNIONBYSIZE (x, y) MAKESET (x) r ← FIND (x). s ← FIND (y). parent(x) ← x. size(x) ← 1. IF (r = s) RETURN. ELSE IF (size(r) size(s)) parent(s) ← r. FIND (x) size(r) ← size(r) + size(s). ELSE WHILE (x ≠ parent(x)) x ← parent(x). parent(r) ← s. size(s) ← size(r) + size(s). RETURN x. 9Linkbysize: analysis height(r) Property. Using linkbysize, for every root node r, size(r) ≥ 2 . Pf. by induction on number of links Base case: singleton tree has size 1 and height 0. Inductive hypothesis: assume true after first i links. Tree rooted at r changes only when a smaller tree rooted at s is linked into r. size'(r) ≥ size(r) Case 1. height(r) height(s) height(r) inductive hypothesis ≥ 2 height'(r) 2 . = size = 8 (height = 2) r size = 3 (height = 1) s 10Linkbysize: analysis height(r) Property. Using linkbysize, for every root node r, size(r) ≥ 2 . Pf. by induction on number of links Base case: singleton tree has size 1 and height 0. Inductive hypothesis: assume true after first i links. Tree rooted at r changes only when a smaller tree rooted at s is linked into r. size'(r) size(r) + size(s) Case 2. height(r) ≤ height(s) = linkbysize ≥ 2 size(s) height(s) inductive hypothesis 2 ⋅ 2 ≥ size = 6 height(s) + 1 2 (height = 1) = r height'(r) size = 3 2 . ▪ = (height = 2) s 11Linkbysize: analysis Theorem. Using linkbysize, any UNION or FIND operations takes O(log n) time in the worst case, where n is the number of elements. Pf. The running time of each operation is bounded by the tree height. By the previous property, the height is ≤ ⎣lg n⎦. ▪ lg n = log n 2 12A matching lower bound Theorem. Using linkbysize, a tree with n nodes can have height = lg n. Pf. k Arrange 2 – 1 calls to UNION to form a binomial tree of order k. k An orderk binomial tree has 2 nodes and height k. ▪ B B B B B 0 1 2 3 4 13UNIONFIND linkbysize ‣ linkbyrank ‣ path compression ‣ linkbyrank with path compression ‣ context ‣ SECTION 5.1.4Linkbyrank Linkbyrank. Maintain an integer rank for each node, initially 0. Link root of smaller rank to root of larger rank; if tie, increase rank of new root by 1. union(7, 3) rank = 1 rank = 2 4 6 3 8 9 0 2 5 1 7 Note. For now, rank = height.Linkbyrank Linkbyrank. Maintain an integer rank for each node, initially 0. Link root of smaller rank to root of larger rank; if tie, increase rank of new root by 1. rank = 2 6 4 0 2 5 3 8 9 1 7 Note. For now, rank = height.Linkbyrank Linkbyrank. Maintain an integer rank for each node, initially 0. Link root of smaller rank to root of larger rank; if tie, increase rank of new root by 1. MAKESET (x) UNIONBYRANK (x, y) parent(x) ← x. r ← FIND (x). rank(x) ← 0. s ← FIND (y). IF (r = s) RETURN. ELSE IF rank(r) rank(s) parent(s) ← r. ELSE IF rank(r) rank(s) FIND (x) parent(r) ← s. WHILE x ≠ parent(x) ELSE x ← parent(x). parent(r) ← s. RETURN x. rank(s) ← rank(s) + 1. 17Linkbyrank: properties Property 1. If x is not a root node, then rank(x) rank(parent(x)). Pf. A node of rank k is created only by merging two roots of rank k – 1. ▪ Property 2. If x is not a root, then rank(x) will never change again. Pf. Rank changes only for roots; a nonroot never becomes a root. ▪ Property 3. If parent(x) changes, then rank(parent(x)) strictly increases. Pf. The parent can change only for a root, so before linking parent(x) = x ; After x is linkedbyrank to new root r we have rank(r) rank(x). ▪ rank = 3 rank = 2 rank = 1 rank = 0 18Linkbyrank: properties k Property 4. Any root node of rank k has ≥ 2 nodes in its tree. Pf. by induction on k Base case: true for k = 0. Inductive hypothesis: assume true for k – 1. A node of rank k is created only by merging two roots of rank k – 1. k – 1 By inductive hypothesis, each subtree has ≥ 2 nodes k ⇒ resulting tree has ≥ 2 nodes. ▪ Property 5. The highest rank of a node is ≤ ⎣lg n⎦. Pf. Immediate from PROPERTY 1 and PROPERTY 4. ▪ rank = 2 (8 nodes) rank = 2 (4 nodes) 19Linkbyrank: properties r Property 6. For any integer r ≥ 0, there are ≤ n / 2 nodes with rank r. Pf. k Any root node of rank k has ≥ 2 descendants. PROPERTY 4 k Any nonroot node of rank k has ≥ 2 descendants because: it had this property just before it became a nonroot PROPERTY 4 its rank doesn't change once it becomes a nonroot PROPERTY 2 its set of descendants doesn't change once it became a nonroot Different nodes of rank k can't have common descendants. PROPERTY 1 ▪ rank = 4 (1 node) rank = 3 (1 node) rank = 2 (2 nodes) rank = 1 (5 nodes) rank = 0 (11 nodes) 20Linkbyrank: analysis Theorem. Using linkbyrank, any UNION or FIND operations takes O(log n) time in the worst case, where n is the number of elements. Pf. The running time of each operation is bounded by the tree height. By the PROPERTY 5, the height is ≤ ⎣lg n⎦. ▪ 21UNIONFIND linkbysize ‣ linkbyrank ‣ path compression ‣ linkbyrank with path compression ‣ context ‣ SECTION 5.1.4Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. before path compression r x 4 x 3 T 4 after path compression x 2 T 3 r x 1 T 2 T 1 x x x 4 3 2 x 1 T T T T 4 3 2 1 23Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. root 0 1 2 3 4 5 6 7 x 8 9 10 11 12 24Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. root 0 9 1 2 3 4 5 11 12 x 6 7 8 10 25Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. root 0 6 9 1 2 x 8 3 4 5 11 12 7 10 26Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. root 0 x 6 9 3 1 2 8 7 4 5 11 12 10 27Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. x root 0 6 9 3 1 2 8 7 4 5 11 12 10 28Path compression Path compression. After finding the root r of the tree containing x, change the parent pointer of all nodes along the path to point directly to r. FIND (x) IF x ≠ parent(x) parent(x) ← FIND (parent(x)). RETURN parent(x). Note. Path compression does not change the rank of a node; so height(x) ≤ rank(x) but they are not necessarily equal. 29Path compression Fact. Path compression (with naive linking) can require Ω(n) time to perform a single UNION or FIND operation, where n is the number of elements. Pf. The height of the tree is n – 1 after the sequence of union operations: UNION(1, 2), UNION(2, 3), …, UNION(n – 1, n). ▪ Theorem. Tarjanvan Leeuwen 1984 Starting from an empty data structure, path compression (with naive linking) performs any intermixed sequence of m ≥ n find and n – 1 union operations in O(m log n) time. Pf. Nontrivial but omitted. 30UNIONFIND linkbysize ‣ linkbyrank ‣ path compression ‣ linkbyrank with path compression ‣ context ‣ SECTION 5.1.4Linkbyrank with path compression: properties Property. The tree roots, node ranks, and elements within a tree are the same with or without path compression. Pf. Path compression does not create new roots, change ranks, or move elements from one tree to another. ▪ before path after path compression compression r r x 3 x 2 x x x 3 2 1 T 3 x 1 T 2 T T T 3 2 1 T 1 32Linkbyrank with path compression: properties Property. The tree roots, node ranks, and elements within a tree are the same with or without path compression. Corollary. PROPERTY 2, 4–6 hold for linkbyrank with path compression. Property 1. If x is not a root node, then rank(x) rank(parent(x)). Property 2. If x is not a root, then rank(x) will never change again. Property 3. If parent(x) changes, then rank(parent(x)) strictly increases. k Property 4. Any root node of rank k has ≥ 2 nodes in its tree. Property 5. The highest rank of a node is ≤ ⎣lg n⎦. r Property 6. For any integer r ≥ 0, there are ≤ n / 2 nodes with rank r. Bottom line. PROPERTY 1–6 hold for linkbyrank with path compression. (but we need to recheck PROPERTY 1 and PROPERTY 3) 33Linkbyrank with path compression: properties Property 1. If x is not a root node, then rank(x) rank(parent(x)). Pf. Path compression only increases rank of parent. Property 3. If parent(x) changes, then rank(parent(x)) strictly increases. Pf. Path compression can only make x point to an ancestor of parent(x). before path after path compression compression r r z y x y z T 3 x T 2 T T T 3 2 1 T 1 34Iterated logarithm function Def. The iterated logarithm function is defined by: n lg n 1 0 1B7 n 1 2 1 lg n = 1+lg (lgn)Qi2`rBb2 (3, 4 2 5, 16 3 17, 65536 4 65536 65537, 2 5 iterated lg function Note. We have lg n ≤ 5 unless n exceeds the atoms in the universe. 35Analysis Divide nonzero ranks into the following groups: 1 2 3, 4 5, 6, …, 16 16 17, 18, …, 2 65536 65537, 65538, …, 2 ... Property 7. Every nonzero rank falls within one of the first lg n groups. Pf. The rank is between 0 and ⎣lg n⎦. PROPERTY 5 36Creative accounting Credits. A node receives credits as soon as it ceases to be a root. k k If its rank is in the interval k + 1, k + 2, …, 2 , we give it 2 credits. group k Proposition. Number of credits disbursed to all nodes is ≤ n lg n. Pf. By PROPERTY 6, the number of nodes with rank ≥ k + 1 is at most n n n + + ··· k+1 k+2 k 2 2 2 Thus, nodes in group k need at most n credits in total. There are ≤ lg n groups. PROPERTY 7 ▪ 37Running time of find Running time of find. Bounded by number of parent pointers followed. Recall: the rank strictly increases as you go up a tree. PROPERTY 1 Case 0: parent(x) is a root ⇒ only happens for one link per FIND. Case 1: rank(parent(x)) is in a higher group than rank(x). Case 2: rank(parent(x)) is in the same group as rank(x). Case 1. At most lg n nodes on path can be in a higher group. PROPERTY 7 Case 2. These nodes are charged 1 credit to follow parent pointer. Each time x pays 1 credit, rank(parent(x)) strictly increases. PROPERTY 1 k Therefore, if rank(x) is in the group k + 1, …, 2 , the rank of its parent k will be in a higher group before x pays 2 credits. Once rank(parent(x)) is in a higher group than rank(x), it remains so because: rank(x) does not change once it ceases to be a root. PROPERTY 2 rank(parent(x)) does not decrease. PROPERTY 3 thus, x has enough credits to pay until it becomes a Case 1 node. ▪ 38Linkbyrank with path compression Theorem. Starting from an empty data structure, linkbysize with path compression performs any intermixed sequence of m ≥ n FIND and n – 1 UNION operations in O(m logn) time. 39UNIONFIND linkbysize ‣ linkbyrank ‣ path compression ‣ linkbyrank with path compression ‣ context ‣Linkbysize with path compression Theorem. Fischer 1972 Linkbysize with path compression performs any intermixed sequence of m ≥ n FIND and n – 1 UNION operations in O(m log log n) time. 41Linkbysize with path compression Theorem. HopcroftUllman 1973 Linkbysize with path compression performs any intermixed sequence of m ≥ n FIND and n – 1 UNION operations in O(m logn) time. SIAM J. COMPUT. Vol. 2, No. 4, December 1973 SET MERGING ALGORITHMS J. E. AND J. D. ULLMAN HOPCROFT" Abstract. This paper considers the problem ofmerging sets formed from a total of n items in such a way that at any time, the name ofa set containing a given item can be ascertained. Two algorithms using different data structures are discussed. The execution times ofboth algorithms are bounded by a constant times nG(n), where is a function whose asymptotic growth rate is less than that of G(n) any finite number of logarithms of n. Key words, algorithm, algorithmic data structure, equivalence analysis, computational complexity, algorithm, merging, property grammar, set, spanning tree 1. the sets Introduction. Let us consider problem of efficiently merging while at thesame time according to an initially unknown sequence ofinstructions, able to determine the set a element This problem being containing given rapidly. as the essential part of several less abstract problems. For example, in 13 appears 42 of addresses an assembler was con the problem "equivalencing" symbolic by sidered. Initially, each name is in a set by itself, i.e., it is equivalent to no other name. An assembly statement that sets nameA to name language equivalent B by makes to and C were and B and D implication C equivalent D if A equivalent were likewise to make A and B equivalent, we must find the equivalent. Thus, sets (equivalence of which A and B are currently members and merge classes) them their union. these sets, i.e., replace by Another for this problem is the construction of spanning trees for an setting a undirected graph Initially, each vertex is in set (connected component) by 2. itself. We find some and determine the connected compo edges (n, m) by strategy nents n and m. If these we add to the tree being constructed containing differ, (n, m) which now are connected the and merge the components containing n and m, by tree formed. If and are in the same we throw being n m already component, away find a new edge. (n, m)and of and A third application 33 is the implementation property grammars I43, others themselves when it is realized that the task we discuss here many suggest can be done in less than log time. O(n n) us of the more obvious data struc By way of introduction, let consider some tures could be kept in these sets could be merged, and whereby objects disjoint sets, thename ofthe set containing a given object could be determined. One possibility is to each set a Each the an represent by tree. vertex of tree would correspond to in the set. Each would have a to the object object pointer vertex representing it, and each vertex would have a pointer to its father. Ifthe vertex is a root, however, zero indicate absence ofa father. The name of the set the pointer would be to the is attached to the root. Received the editors and in revised form 18, 1973. This research was by August 10, 1972, May the National Science Foundation under Grant GJ1052and the Office ofNaval Research supported by under Contract N0001467A00710021. of Sciences, Cornell Ithaca, New York 14850. 5 Department Computer University, Department of Electrical Engineering, Princeton University, Princeton, New Jersey 08540. :1: 294 Downloaded 02/20/13 to 128.112.139.195. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.phpLinkbysize with path compression Theorem. Tarjan 1975 Linkbysize with path compression performs any intermixed sequence of m ≥ n FIND and n – 1 UNION operations in O(m α(m, n)) time, where α(m, n) is a functional inverse of the Ackermann function. Efficiency of a Good But Not Linear Set Union Algorithm ROBERT ENDRE TAR JAN University of California, Berkeley, Califorma ABSTRACT. TWO types of instructmns for mampulating a family of disjoint sets which partitmn a umverse of n elements are considered FIND(x) computes the name of the (unique) set containing element x UNION(A, B, C) combines sets A and B into a new set named C. A known algorithm for implementing sequences of these mstructmns is examined It is shown that, if t(m, n) as the maximum time reqmred by a sequence of m n FINDs and n 1 intermixed UNIONs, then kima(m, n) t(m, n) k:ma(m, n) for some positive constants ki and k2, where a(m, n) is related to a functional inverse of Ackermann's functmn and as very slowgrowing. KEY WORDS AND PHRASES. algorithm, complexity, eqmvalence, partition, set umon, tree CR CATEGORIES: 4 12, 5.25, 5.32 Introduction Suppose we want to use two types of instructions for manipulating disjoint sets. FIND(x) computes the name of the unique set containing element x. UNION(A, B, C) combines 43 sets A and B into a new set named C. Initially we are given n elements, each in a single ton set. We then wish to carry out a sequence of rn n FINDs and n, 1 intermixed UNIONs. An algorithm for solving this problem is useful in many contexts, including handling EQUIVALENCE and COMMON statements in FORTRAN 3, 6, finding minimum span ning trees 9, computing dominators in directed graphs 14, checking flow graphs for reducibility 13, calculating depths in trees 2, computing least common ancestors in trees 2, and solving an effiine minimum problem 7. Several algorithms have been developed 3, 57, 10, 12, notably a very complicated one by Hopcroft and Ullman 7. It is an extension of an idea by Stearns and Rosenkrantz 12 and has an 0(m log n) worstcase running time, where ) times log n = mini log log ... log (n) 1. All other known algorithms are slower, except for the very simple one we analyze here, which has been previously considered in 5, 7, 11. Each set is represented as a tree Each vertex in the tree represents an element in the Copyright © 1975, Association for Computing Machinery, Inc General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privi leges were granted by permmsion of the Association for Computing Machinery Thin work was partially supported by the NSF, Contract No NSFGJ35604X, and by a Miller Re search Fellowship at the University of California, Berkeley, and by the Office of Naval Research, Contract NR 044402, Stanford University Author's address' Department of Electrical Engineering and Computer Sciences, Computer Science Division, Umversty of California, Berkeley, CA 94720 1 For the purposes of this paper, a tree T is a directed graph with a umque vertex s, the root of T, such that (i) no edge (s, v) exists in T, 0i) ff v s, there is a unique edge ( w) in T, and (id) there are no cycles in T If (v, w) is an edge of T (denoted by v w), w as called the father of v (denoted by w = f(v)) and v is called a son of w If there is a path from v to w an T (denoted by v w), then Journal of the Associatmn for Computing Machinery, Vol 22, No 2, Apml 1975, pp 215225 Ackermann function Ackermann function. A computable function that is not primitive recursive. n+1B7 m=0 A(m,n)= A(m 1,1)B7 m 0M/ n=0 A(m 1,A(m,n 1))B7 m 0M/ n 0 Note. There are many inequivalent definitions. 44Ackermann function Ackermann function. A computable function that is not primitive recursive. n+1B7 m=0 A(m,n)= A(m 1,1)B7 m 0M/ n=0 A(m 1,A(m,n 1))B7 m 0M/ n 0 Inverse Ackermann function. (m,n)=mini 1:A(i,m/n) log n 2 “ I am not smart enough to understand this easily. ” — Raymond Seidel 45Inverse Ackermann function Definition. 1B7 n=1 (n)= n/2B7 k=1 k 1+ ( (n))Qi2`rBb2 k k 1 Ex. α (n) = ⎡ n / 2⎤. 1 α (n) = ⎡ lg n⎤ = of times we divide n by two, until we reach 1. 2 α (n) = lg n = of times we apply the lg function to n, until we reach 1. 3 α (n) = of times we apply the iterated lg function to n, until we reach 1. 4 2 X X X 2 2 2 65536 = 2 65536iBK2b 16 65536 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 … 2 … 2 … 2 ↑ 65536 15 65535 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 … 2 … 2 … huge α (n) 1 1 1 2 2 3 3 3 3 4 4 4 4 4 4 4 4 … 16 … 65536 … α (n) 2 ↑ 65535 2 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 … 4 … 5 … 65536 α (n) 3 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 … 3 … 3 … 4 α (n) 4 46Inverse Ackermann function Definition. 1B7 n=1 (n)= n/2B7 k=1 k 1+ ( (n))Qi2`rBb2 k k 1 Property. For every n ≥ 5, the sequence α (n), α (n), α (n), … converges to 3. 1 2 3 35163 Ex. n = 9876 α (n) ≥ 10 , α (n) = 116812, α (n) = 6, α (n) = 4, α (n) = 3. 1 2 3 4 5 Oneparameter inverse Ackermann. α(n) = min k : α (n) ≤ 3 . k Ex. α(9876) = 5. Twoparameter inverse Ackermann. α(m, n) = min k : α (n) ≤ 3 + m / n . k 47A matching lower bound Theorem. FredmanSaks 1989 Any CPROBE(log n) algorithm for the disjoint set union problem requires Ω(m α(m, n)) time to perform an intermixed sequence of m ≥ n FIND and n – 1 UNION operations in the worst case. Cell probe model. Yao 1981 Count only number of words of memory accessed; all other operations are free. The Cell Probe Complexity of Dynamic Data Structures Michael L. Fredmarl ’ Michael E. Sak. 2 Bellcore and U.C. San Diego, U.C. San Diego Bellcore and Rutgers University register size from logn to polylog(n) only reduces the time 1. Summary of Results complexity by a constant factor. On the other hand, Dynamic data stNcture problems involve the representation of decreasing the register size from logn to 1 increases time data in memory in such a way as to permit certain types of complexity by a logn factor for one of the problems we modifications of the data (updates) and certain types of questions consider and only a loglogn factor for some other about the data (queries). This paradigm encompasses many problems. fimdamental problems in computer science. The first two specific data structure problems for which we The purpose of this paper is to prove new lower and upper obtain bounds are: bounds on the tie per operation to implement solutions to some List Representation. This problem concerns the represention of familiar dynamic data structure problems including list an ordered list of at most n (not necessarily distinct) elements representation, subset ranking, partial sums, and the set union from the universe U = (1, 2 ,..., n ). The operations to be problem . The main features of our lower bounds are: supported are report(k). which returns the k” element of the list, They hold in the cell probe model of computation (A. Yao insert(k, u) which inserts element u into the list between the (1) 18) in which the time complexity of a sequential elements in positions k 1 and k, delete(k), which deletes the k’” computation is defined to be the number of words of item. memory that are accessed. (The number of bits b in a 48 Subset Rank. This problem concerns the representation of a single word of memory is a parameter of the model). All subset S of CJ = 1, 2 ,..., n . The operations that must be other computations are free. This model is at least as supported are the updates “insert item j into the set” and powerful as a random access machine and allows for “delete item j from the set” and the queries rank(j), which unusual representation of data, indirect addressing etc. This returns the number of elements in S that are less than or equal contrasts with most previous lower bounds which are to j . proved in models (e.g., algebraic, comparison, pointer manipulation) which require restrictions on the way data is The natural word size for these problems is b = logn, which represented and manipulated. allows an item of Cl or an index into the list to be stored in one register. One simple solution to the list representation problem is The lower bound method presented here can be used to (2) to maintain a vector v, whose k’” entry contains the kih item of derive amortized complexities, worst cast per operation the list. The report operation can bc done in constant time, but complexities, and randomized complcxitics. the insert and delete operations may take time linear in the length The results occasionally provide (nearly tight) tradeoffs of the list. Alternatively, one could store the items of the list with (3) each element having a pointer to its predecessor and successor in between the number R of words of memory that are read per operation, the number W of memory words rewritten the list. This allows for constant time updates (given a pointer to per operation and the size b of each word. For the the appropriate location), but requires linear cost for queries. problems considered here thcrc is a parameter n that This problem can be solved much more efticiently by use of represents the size of the data set being manipulated and for balanced trees (such as AVL trees). When b = logn, the worst these problems b = logn is a natural register size to case cost per operation using AVL trees is O(logn). If instead consider. By letting b vary, our results illustrate the effect b = 1, so that each bit access costs 1, then the AVL tree solution of register size on time complexity. For instance, one requires 0 (log2n ) per operation. consequence of the resuhs is that for some of the problems considered here, increasing the It is not hard to find similar upper bounds for the subset rank problem (the algorithms for this problem are actually simpler than 1 Supported in parr by NSF ant DCR85042S AVL trees). I,.: .‘,i z.2 Force Office of 2 Sum,rIci ” CL 5.” 4 SF r;t I)‘.S57 The question is: are these upper bounds best possible Our Scientific Resech grylt AFOSRoZi t results show that the upper bounds for the case of logn bit registers are within a loglogn factor of optimal. On the other Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct hand, somewhat surprisingly, for the case of single bit registers commercial advantage, the ACM copyright notice and the title of there are implementations for both of these problems that run in the publication and its date appear, and notice is given that copying time significantly faster than 0 (log2n) per operation. is by permission of the Association for Computing Machinery. To Let CPROBE(b) denote the cell probe computational model copy otherwise, or to republish, requires a fee and/or specific with register size b . permission. 0 1989 ACM O897913078/89/0005/0345 1.50 345 Path compaction variants Path splitting. Make every node on path point to its grandparent. before path splitting r x 5 x 4 T 5 x 3 T 4 x 2 T 3 after path splitting x 1 T 2 r T 1 x x 5 4 x x 3 2 T T 4 5 x 1 T T 2 3 T 1 49Path compaction variants Path halving. Make every other node on path point to its grandparent. before path halving r x 5 x 4 T 5 after path x 3 halving T 4 x 2 r T 3 x 1 T x 2 5 T 1 T 5 x x 3 4 T T 4 3 x x 2 1 T T 1 2 50Linking variants Linkbysize. Number of nodes in tree. Linkbyrank. Rank of tree. Linkbyrandom. Label each element with a random real number between 0.0 and 1.0. Link root with smaller label into root with larger label. 51Disjoint set union algorithms Theorem. Tarjanvan Leeuwen 1984 Linkby size, rank combined with path compression, path splitting, path halving performs any intermixed sequence of m ≥ n find and n – 1 union operations in O(m α(m, n)) time. WorstCase Analysis of Set Union Algorithms ROBERT E. TAR JAN ATT Bell Laboratories, Murray Hdl, New Jersey AND JAN VAN LEEUWEN Universtty of Utrecht. Utrecht. The Netherlands Abstract. This paper analyzes the asymptotic worstcase running time of a number of variants of the wellknown method of path compression for maintaining a collection of disjoint sets under union. We show that two onepass methods proposed by van Leeuwen and van der Weide are asymptotically optimal, whereas several other methods, including one proposed by Rein and advocated by Dijkstra, are slower than the best methods. Categories and Subject Descriptors: E. 1 Data Structures: Trees; F2.2 Analysis of Algorithms and Problem Complexity: Nonnumerical Algorithms and Problemscomputations on discrete structures; G2.1 Discrete Mathematics: Combinatoriescombinatorial algorithms; (32.2 Disertqe Mathemat ics: Graph Theorygraph algortthms General Terms: Algorithms, Theory 52 Additional Key Words and Phrases: Equivalence algorithm, set union, inverse Aekermann's function 1. Introduction A wellknown problem in data structures is the set union problem, defined as follows: Carry out a sequence of intermixed operations of the following three kinds on labeled sets: make set(e, l): Create a new set with label l containing the single element e. This operation requires that e initially be in no set. find label(e): Return the label of the set containing element e. unite(e, f): Combine the sets containing elements e and finto a single set, whose label is the label of the old set containing element e. This operation requires that elements e andfinitially be in different sets. Because of the constraint on make set, the sets existing at any time are disjoint and define a partition of the dements into equivalence classes. For this reason the set union problem has been called the equivalence problem by some authors. A solution to the set union problem can be used in the compiling of FORTRAN Authors addre ,sses: R. E. Tarjan, ATT Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974; J. van Leeuwen, Department of Computer Science, University of Utrecht, Utrecht, The Netherlands. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1984 ACM 00045411/84/04000245 00.75 Journal of the Assoaatton for Computmg Machinery, Vot. 31, No. 2, April 1984, pp 245281.
Website URL
Comment