Question? Leave a message!




MINIMUM SPANNING TREES

MINIMUM SPANNING TREES
ROBERT SEDGEWICK KEVIN WAYNE Algorithms 4.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms FOUR TH EDITION Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣4.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣Minimum spanning tree Def. A spanning tree of G is a subgraph T that is: Connected. Acyclic. Includes all of the vertices. graph G 3Minimum spanning tree Def. A spanning tree of G is a subgraph T that is: Connected. Acyclic. Includes all of the vertices. not connected 4Minimum spanning tree Def. A spanning tree of G is a subgraph T that is: Connected. Acyclic. Includes all of the vertices. not acyclic 5Minimum spanning tree Def. A spanning tree of G is a subgraph T that is: Connected. Acyclic. Includes all of the vertices. not spanning 6Minimum spanning tree Given. Undirected graph G with positive edge weights (connected). Goal. Find a min weight spanning tree. 24 4 23 9 6 18 5 11 16 8 7 10 14 21 edgeweighted graph G 7Minimum spanning tree Given. Undirected graph G with positive edge weights (connected). Goal. Find a min weight spanning tree. 24 4 23 9 6 18 5 11 16 8 7 10 14 21 minimum spanning tree T (cost = 50 = 4 + 6 + 8 + 5 + 11 + 9 + 7) Brute force. Try all spanning trees 8Network design MST of bicycle routes in North Seattle http://www.flickr.com/photos/ewedistrict/21980840 9Models of nature MST of random graph http://algo.inria.fr/broutin/gallery.html 10Medical image processing MST describes arrangement of nuclei in the epithelium for cancer research http://www.bccrc.ca/ci/ta01archlevel.html 11Medical image processing MST dithering http://www.flickr.com/photos/quasimondo/2695389651 12Applications MST is fundamental problem with diverse applications. Dithering. Cluster analysis. Max bottleneck paths. Realtime face verification. LDPC codes for error correction. Image registration with Renyi entropy. Find road networks in satellite and aerial imagery. Reducing data storage in sequencing amino acids in a protein. Model locality of particle interactions in turbulent fluid flows. Autoconfig protocol for Ethernet bridging to avoid cycles in a network. Approximation algorithms for NPhard problems (e.g., TSP, Steiner tree). Network design (communication, electrical, hydraulic, computer, road). http://www.ics.uci.edu/eppstein/gina/mst.html 134.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣Simplifying assumptions Graph is connected. Edge weights are distinct. Consequence. MST exists and is unique. 5 7 2 10 no two edge 4 6 weights are equal 13 1 9 3 8 12 14 16 20 15Cut property Def. A cut in a graph is a partition of its vertices into two (nonempty) sets. Def. A crossing edge connects a vertex in one set with a vertex in the other. Cut property. Given any cut, the crossing edge of min weight is in the MST. crossing edge separating gray and white vertices minimumweight crossing edge must be in the MST 16Cut property: correctness proof Def. A cut in a graph is a partition of its vertices into two (nonempty) sets. Def. A crossing edge connects a vertex in one set with a vertex in the other. Cut property. Given any cut, the crossing edge of min weight is in the MST. Pf. Suppose minweight crossing edge e is not in the MST. Adding e to the MST creates a cycle. Some other edge f in cycle must be a crossing edge. Removing f and adding e is also a spanning tree. Since weight of e is less than the weight of f, f that spanning tree is lower weight. Contradiction. ▪ e the MST does not contain e adding e to MST creates a cycle 17Greedy MST algorithm demo Start with all edges colored gray. Find cut with no black crossing edges; color its minweight edge black. Repeat until V 1 edges are colored black. 07 0.16 23 0.17 1 17 0.19 3 02 0.26 5 57 0.28 7 13 0.29 2 15 0.32 27 0.34 0 45 0.35 12 0.36 6 4 47 0.37 04 0.38 62 0.40 an edgeweighted graph 36 0.52 60 0.58 64 0.93 18Greedy MST algorithm demo Start with all edges colored gray. Find cut with no black crossing edges; color its minweight edge black. Repeat until V 1 edges are colored black. 1 3 5 7 2 0 6 4 MST edges 02 57 62 07 23 17 45 19Greedy MST algorithm: correctness proof Proposition. The greedy algorithm computes the MST. Pf. Any edge colored black is in the MST (via cut property). Fewer than V 1 black edges ⇒ cut with no black crossing edges. (consider cut whose vertices are any one connected component) a cut with no black crossing edges fewer than V1 edges colored black fewer than V1 edges colored black fewer than V1 edges colored black a cut with no black crossing edges a cut with no black crossing edges 20Greedy MST algorithm: efficient implementations Proposition. The greedy algorithm computes the MST. Efficient implementations. Choose cut Find minweight edge Ex 1. Kruskal's algorithm. stay tuned Ex 2. Prim's algorithm. stay tuned Ex 3. Borüvka's algorithm. 21no MST if graph is not connected 4 5 0.61 4 6 0.62 5 6 0.88 1 5 0.11 2 3 0.35 no MST if graph is not connected 0 3 0.6 4 5 0.61 4 6 0.62 1 6 0.10 5 6 0.88 0 2 0.22 can independently compute 1 5 0.11 MSTs of components 2 3 0.35 0 3 0.6 1 6 0.10 weights need not be 0 2 0.22 proportional to distance can independently compute MSTs of components 4 6 0.62 5 6 0.88 1 5 0.02 weights need not be 0 4 0.64 proportional to distance 1 6 0.90 4 6 0.62 0 2 0.22 5 6 0.88 1 2 0.50 1 5 0.02 1 3 0.97 2 6 0.17 0 4 0.64 1 6 0.90 weights can be 0 or negative 0 2 0.22 4 6 0.62 1 2 0.50 5 6 0.88 1 3 0.97 1 5 0.02 2 6 0.17 0 4 0.99 weights can be 0 or negative 1 6 0 4 6 0.62 0 2 0.22 5 6 0.88 1 2 0.50 Removing two simplifying assumptions 1 5 0.02 1 3 0.97 0 4 0.99 2 6 0.17 1 6 0 MST may not be unique 0 2 0.22 Q. What if edge weights are not all distinct when weights have equal values 1 2 0.50 1 3 0.97 A. Greedy MST algorithm still correct if equal weights are present 1 2 1.00 2 6 0.17 1 3 0.50 (our correctness proof fails, but that can be fixed) 2 4 1.00 MST may not be unique 3 4 0.50 when weights have equal values 1 2 1.00 1 2 1.00 1 3 0.50 1 3 0.50 2 4 1.00 2 4 1.00 3 4 0.50 3 4 0.50 1 2 1.00 1 3 0.50 Various MST anomalies 2 4 1.00 3 4 0.50 Q. What if graph is not connected A. Compute minimum spanning forest = MST of each component. Various MST anomalies no MST if graph is not connected 4 5 0.61 4 6 0.62 5 6 0.88 1 5 0.11 2 3 0.35 0 3 0.6 1 6 0.10 0 2 0.22 can independently compute MSTs of components 22 weights need not be proportional to distance 4 6 0.62 5 6 0.88 1 5 0.02 0 4 0.64 1 6 0.90 0 2 0.22 1 2 0.50 1 3 0.97 2 6 0.17 weights can be 0 or negative 4 6 0.62 5 6 0.88 1 5 0.02 0 4 0.99 1 6 0 0 2 0.22 1 2 0.50 1 3 0.97 2 6 0.17 MST may not be unique when weights have equal values 1 2 1.00 1 3 0.50 2 4 1.00 3 4 0.50 1 2 1.00 1 3 0.50 2 4 1.00 3 4 0.50 Various MST anomaliesGreed is good Gordon Gecko (Michael Douglas) address to Teldar Paper Stockholders in Wall Street (1986) 234.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣Weighted edge API Edge abstraction needed for weighted edges. public class public class public class Edge Edge Edge implements ComparableEdge implements ComparableEdge implements ComparableEdge Edge(int v, int w, double weight) create a weighted edge vw int either() either endpoint int other(int v) the endpoint that's not v int compareTo(Edge that) compare this edge to that edge double weight() the weight String toString() string representation weight v w Idiom for processing an edge e: int v = e.either(), w = e.other(v); 25Weighted edge: Java implementation public class Edge implements ComparableEdge private final int v, w; private final double weight; public Edge(int v, int w, double weight) constructor this.v = v; this.w = w; this.weight = weight; public int either() either endpoint return v; public int other(int vertex) if (vertex == v) return w; other endpoint else return v; public int compareTo(Edge that) if (this.weight that.weight) return 1; compare edges by weight else if (this.weight that.weight) return +1; else return 0; 26Edgeweighted graph API public class public class EdgeWeightedGraph EdgeWeightedGraph EdgeWeightedGraph(int V) EdgeWeightedGraph(int V) create an empty graph with V vertices EdgeWeightedGraph(In in) EdgeWeightedGraph(In in) create a graph from input stream void addEdge(Edge e) addEdge(Edge e) add weighted edge e to this graph IterableEdge adj(int v) adj(int v) edges incident to v IterableEdge edges() edges() all edges in this graph int V() V() number of vertices int E() E() number of edges String toString() toString() string representation Conventions. Allow selfloops and parallel edges. 27Edgeweighted graph: adjacencylists representation Maintain vertexindexed array of Edge lists. 6 0 .58 0 2 .26 0 4 .38 0 7 .16 tinyEWG.txt Bag objects V 8 E 1 3 .29 1 2 .36 1 7 .19 1 5 .32 16 adj 4 5 0.35 0 4 7 0.37 6 2 .40 2 7 .34 1 2 .36 0 2 .26 2 3 .17 5 7 0.28 1 0 7 0.16 2 1 5 0.32 3 6 .52 1 3 .29 2 3 .17 3 0 4 0.38 4 2 3 0.17 6 4 .93 0 4 .38 4 7 .37 4 5 .35 1 7 0.19 5 0 2 0.26 references to the 6 1 2 0.36 1 5 .32 5 7 .28 4 5 .35 same Edge object 7 1 3 0.29 2 7 0.34 6 4 .93 6 0 .58 3 6 .52 6 2 .40 6 2 0.40 3 6 0.52 6 0 0.58 2 7 .34 1 7 .19 0 7 .16 5 7 .28 5 7 .28 6 4 0.93 Edgeweighted graph representation 28Edgeweighted graph: adjacencylists implementation public class EdgeWeightedGraph private final int V; same as Graph, but adjacency private final BagEdge adj; lists of Edges instead of integers public EdgeWeightedGraph(int V) constructor this.V = V; adj = (BagEdge) new BagV; for (int v = 0; v V; v++) adjv = new BagEdge(); public void addEdge(Edge e) int v = e.either(), w = e.other(v); add edge to both adjv.add(e); adjacency lists adjw.add(e); public IterableEdge adj(int v) return adjv; 29Minimum spanning tree API Q. How to represent the MST public class public class MST MST MST(EdgeWeightedGraph G) MST(EdgeWeightedGraph G) constructor IterableEdge edges() edges() edges in MST double weight() weight() weight of MST tinyEWG.txt V 8 E 16 java MST tinyEWG.txt 4 5 0.35 MST edge 07 0.16 4 7 0.37 (black) 5 7 0.28 17 0.19 0 7 0.16 02 0.26 1 5 0.32 23 0.17 0 4 0.38 2 3 0.17 57 0.28 1 7 0.19 45 0.35 0 2 0.26 1 2 0.36 62 0.40 1 3 0.29 1.81 2 7 0.34 nonMST edge 6 2 0.40 (gray) 3 6 0.52 6 0 0.58 6 4 0.93 30 An edgeweighted graph and its MSTMinimum spanning tree API Q. How to represent the MST public class public class MST MST MST(EdgeWeightedGraph G) MST(EdgeWeightedGraph G) constructor IterableEdge edges() edges() edges in MST double weight() weight() weight of MST public static void main(String args) java MST tinyEWG.txt 07 0.16 In in = new In(args0); 17 0.19 EdgeWeightedGraph G = new EdgeWeightedGraph(in); 02 0.26 MST mst = new MST(G); 23 0.17 for (Edge e : mst.edges()) 57 0.28 StdOut.println(e); 45 0.35 StdOut.printf(".2f\n", mst.weight()); 62 0.40 1.81 314.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣Kruskal's algorithm demo Consider edges in ascending order of weight. Add next edge to tree T unless doing so would create a cycle. graph edges sorted by weight 07 0.16 23 0.17 1 17 0.19 3 02 0.26 5 57 0.28 7 13 0.29 2 15 0.32 27 0.34 0 45 0.35 12 0.36 6 4 47 0.37 04 0.38 62 0.40 an edgeweighted graph 36 0.52 60 0.58 64 0.93 33Kruskal's algorithm demo Consider edges in ascending order of weight. Add next edge to tree T unless doing so would create a cycle. 07 0.16 23 0.17 1 17 0.19 3 02 0.26 5 57 0.28 7 13 0.29 2 15 0.32 27 0.34 0 45 0.35 12 0.36 6 4 47 0.37 04 0.38 62 0.40 a minimum spanning tree 36 0.52 60 0.58 64 0.93 34Kruskal's algorithm: visualization 35Kruskal's algorithm: correctness proof Proposition. Kruskal 1956 Kruskal's algorithm computes the MST. Pf. Kruskal's algorithm is a special case of the greedy MST algorithm. Suppose Kruskal's algorithm colors the edge e = v–w black. Cut = set of vertices connected to v in tree T. No crossing edge is black. No crossing edge has lower weight. Why adding edge to tree add edge to tree would create a cycle 36Kruskal's algorithm: implementation challenge Challenge. Would adding edge v–w to tree T create a cycle If not, add it. How difficult E + V run DFS from v, check if w is reachable V (T has at most V – 1 edges) log V use the unionfind data structure log V 1 adding edge to tree add edge to tree would create a cycle 37Kruskal's algorithm: implementation challenge Challenge. Would adding edge v–w to tree T create a cycle If not, add it. Efficient solution. Use the unionfind data structure. Maintain a set for each connected component in T. If v and w are in same set, then adding v–w would create a cycle. To add v–w to T, merge sets containing v and w. w v w v Case 1: adding v–w creates a cycle Case 2: add v–w to T and merge sets containing v and w 38Kruskal's algorithm: Java implementation public class KruskalMST private QueueEdge mst = new QueueEdge(); public KruskalMST(EdgeWeightedGraph G) build priority queue (or sort) MinPQEdge pq = new MinPQEdge(G.edges()); UF uf = new UF(G.V()); while (pq.isEmpty() mst.size() G.V()1) Edge e = pq.delMin(); greedily add edges to MST int v = e.either(), w = e.other(v); if (uf.connected(v, w)) edge v–w does not create cycle uf.union(v, w); merge sets mst.enqueue(e); add edge to MST public IterableEdge edges() return mst; 39Kruskal's algorithm: running time Proposition. Kruskal's algorithm computes MST in time proportional to E log E (in the worst case). operation frequency time per op Pf. build pq 1 E deletemin E log E † union V log V † connected E log V † amortized bound using weighted quick union with path compression recall: log V ≤ 5 in this universe Remark. If edges are already sorted, order of growth is E log V. 404.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣Prim's algorithm demo Start with vertex 0 and greedily grow tree T. Add to T the min weight edge with exactly one endpoint in T. Repeat until V 1 edges. 07 0.16 23 0.17 1 17 0.19 3 02 0.26 5 57 0.28 13 0.29 7 2 15 0.32 27 0.34 0 45 0.35 12 0.36 47 0.37 6 4 04 0.38 62 0.40 36 0.52 an edgeweighted graph 60 0.58 64 0.93 42Prim's algorithm demo Start with vertex 0 and greedily grow tree T. Add to T the min weight edge with exactly one endpoint in T. Repeat until V 1 edges. 1 3 5 7 2 0 6 4 MST edges 07 17 02 23 57 45 62 43Prim’s algorithm: visualization 44Prim's algorithm: proof of correctness Proposition. Jarník 1930, Dijkstra 1957, Prim 1959 Prim's algorithm computes the MST. Pf. Prim's algorithm is a special case of the greedy MST algorithm. Suppose edge e = min weight edge connecting a vertex on the tree to a vertex not on the tree. Cut = set of vertices connected on tree. No crossing edge is black. No crossing edge has lower weight. edge e = 75 added to tree 45Prim's algorithm: implementation challenge Challenge. Find the min weight edge with exactly one endpoint in T. How difficult try all edges E V use a priority queue log E log E l 17 is min weight edge with exactly one endpoint in T priority queue of crossing edges 17 0.19 02 0.26 57 0.28 27 0.34 47 0.37 04 0.38 60 0.58 46Prim's algorithm: lazy implementation Challenge. Find the min weight edge with exactly one endpoint in T. Lazy solution. Maintain a PQ of edges with (at least) one endpoint in T. Key = edge; priority = weight of edge. Deletemin to determine next edge e = v–w to add to T. Disregard if both endpoints v and w are marked (both in T). Otherwise, let w be the unmarked vertex (not in T ): – add to PQ any edge incident to w (assuming other endpoint not in T) – add e to T and mark w 17 is min weight edge with exactly one endpoint in T priority queue of crossing edges 17 0.19 02 0.26 57 0.28 27 0.34 47 0.37 04 0.38 60 0.58 47Prim's algorithm (lazy) demo Start with vertex 0 and greedily grow tree T. Add to T the min weight edge with exactly one endpoint in T. Repeat until V 1 edges. 07 0.16 23 0.17 1 17 0.19 3 02 0.26 5 57 0.28 7 13 0.29 2 15 0.32 27 0.34 0 45 0.35 12 0.36 6 4 47 0.37 04 0.38 62 0.40 an edgeweighted graph 36 0.52 60 0.58 64 0.93 48Prim's algorithm (lazy) demo Start with vertex 0 and greedily grow tree T. Add to T the min weight edge with exactly one endpoint in T. Repeat until V 1 edges. 1 3 5 7 2 0 6 4 MST edges 07 17 02 23 57 45 62 49Prim's algorithm: lazy implementation public class LazyPrimMST private boolean marked; // MST vertices private QueueEdge mst; // MST edges private MinPQEdge pq; // PQ of edges public LazyPrimMST(WeightedGraph G) pq = new MinPQEdge(); mst = new QueueEdge(); marked = new booleanG.V(); assume G is connected visit(G, 0); while (pq.isEmpty() mst.size() G.V() 1) repeatedly delete the min weight edge e = v–w from PQ Edge e = pq.delMin(); int v = e.either(), w = e.other(v); ignore if both endpoints in T if (markedv markedw) continue; add edge e to tree mst.enqueue(e); add v or w to tree if (markedv) visit(G, v); if (markedw) visit(G, w); 50Prim's algorithm: lazy implementation private void visit(WeightedGraph G, int v) add v to T markedv = true; for (Edge e : G.adj(v)) if (markede.other(v)) for each edge e = v–w, add to pq.insert(e); PQ if w not already in T public IterableEdge mst() return mst; 51Lazy Prim's algorithm: running time Proposition. Lazy Prim's algorithm computes the MST in time proportional to E log E and extra space proportional to E (in the worst case). Pf. operation frequency binary heap delete min E log E insert E log E 52Prim's algorithm: eager implementation Challenge. Find min weight edge with exactly one endpoint in T. pq has at most one entry per vertex Eager solution. Maintain a PQ of vertices connected by an edge to T, where priority of vertex v = weight of shortest edge connecting v to T. Delete min vertex v and add its associated edge e = v–w to T. Update PQ by considering all edges e = v–x incident to v – ignore if x is already in T – add x to PQ if not already on it – decrease priority of x if v–x becomes shortest edge connecting x to T 0 1 17 0.19 2 02 0.26 red: on PQ 3 13 0.29 4 04 0.38 5 57 0.28 6 60 0.58 7 07 0.16 black: on MST 53Prim's algorithm (eager) demo Start with vertex 0 and greedily grow tree T. Add to T the min weight edge with exactly one endpoint in T. Repeat until V 1 edges. 07 0.16 23 0.17 1 17 0.19 3 02 0.26 5 57 0.28 7 13 0.29 2 15 0.32 27 0.34 0 45 0.35 12 0.36 6 4 47 0.37 04 0.38 62 0.40 an edgeweighted graph 36 0.52 60 0.58 64 0.93 54Prim's algorithm (eager) demo Start with vertex 0 and greedily grow tree T. Add to T the min weight edge with exactly one endpoint in T. Repeat until V 1 edges. v edgeTo distTo 0 1 7 0–7 0.16 3 1 1–7 0.19 5 2 0–2 0.26 7 3 2–3 0.17 2 5 5–7 0.28 0 4 4–5 0.35 6 6–2 0.40 6 4 MST edges 07 17 02 23 57 45 62 55Indexed priority queue Associate an index between 0 and N 1 with each key in a priority queue. Supports insert and deletetheminimum. Supports decreasekey given the index of the key. public class public class public class IndexMinPQ IndexMinPQ IndexMinPQKey extends ComparableKey Key extends ComparableKey Key extends ComparableKey create indexed priority queue IndexMinPQ(int N) with indices 0, 1, …, N – 1 void insert(int i, Key key) associate key with index i void decreaseKey(int i, Key key) decrease the key associated with index i boolean contains(int i) is i an index on the priority queue remove a minimal key and return its int delMin() associated index boolean isEmpty() is the priority queue empty int size() number of keys in the priority queue 56Indexed priority queue implementation Binary heap implementation. see Section 2.4 of textbook Start with same code as MinPQ. Maintain parallel arrays keys, pq, and qp so that: – keysi is the priority of i – pqi is the index of the key in heap position i – qpi is the heap position of the key with index i Use swim(qpi) to implement decreaseKey(i, key). i 0 1 2 3 4 5 6 7 8 keysi A S O R T I N G pqi 0 6 7 2 1 5 4 3 qpi 1 5 4 8 7 6 2 3 1 A 2 3 N G 6 7 5 4 O S I T 8 R 57Prim's algorithm: which priority queue Depends on PQ implementation: V insert, V deletemin, E decreasekey. PQ implementation insert deletemin decreasekey total 2 unordered array 1 V 1 V binary heap log V log V log V E log V dway heap log V d log V log V E log V d d d E/V † † † Fibonacci heap 1 log V 1 E + V log V † amortized Bottom line. Array implementation optimal for dense graphs. Binary heap much faster for sparse graphs. 4way heap worth the trouble in performancecritical situations. Fibonacci heap best in theory, but not worth implementing. 584.3 MINIMUM SPANNING TREES introduction ‣ greedy algorithm ‣ edgeweighted graph API ‣ Algorithms Kruskal's algorithm ‣ Prim's algorithm ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu context ‣Does a lineartime MST algorithm exist deterministic comparebased MST algorithms year worst case discovered by 1975 Yao E log log V 1976 CheritonTarjan E log log V 1984 FredmanTarjan E log V, E + V log V 1986 GabowGalilSpencerTarjan E log (log V) 1997 Chazelle E α(V) log α(V) 2000 Chazelle E α(V) 2002 PettieRamachandran optimal 20xx E Remark. Lineartime randomized MST algorithm (KargerKleinTarjan 1995). 60Euclidean MST Given N points in the plane, find MST connecting them, where the distances between point pairs are their Euclidean distances. 2 Brute force. Compute N / 2 distances and run Prim's algorithm. Ingenuity. Exploit geometry and do it in c N log N. 61Scientific application: clustering kclustering. Divide a set of objects classify into k coherent groups. Distance function. Numeric value specifying "closeness" of two objects. Goal. Divide into clusters so that objects in different clusters are far apart. outbreak of cholera deaths in London in 1850s (Nina Mishra) Applications. Routing in mobile ad hoc networks. Document categorization for web search. Similarity searching in medical image databases. 9 Skycat: cluster 10 sky objects into stars, quasars, galaxies. 62Singlelink clustering kclustering. Divide a set of objects classify into k coherent groups. Distance function. Numeric value specifying "closeness" of two objects. Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster). Singlelink clustering. Given an integer k, find a kclustering that maximizes the distance between two closest clusters. distance between two clusters distance between two closest clusters 4clustering 63Singlelink clustering algorithm “Wellknown” algorithm in science literature for singlelink clustering: Form V clusters of one object each. Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters. Repeat until there are exactly k clusters. Observation. This is Kruskal's algorithm. (stopping when k connected components) Alternate solution. Run Prim; then delete k – 1 max weight edges. 64Dendrogram of cancers in human Tumors in similar tissues cluster together. gene 1 gene n gene expressed Reference: Botstein Brown group gene not expressed 65
Website URL
Comment
sharer
Presentations
Free
Document Information
Category:
Presentations
User Name:
Dr.BenjaminClark
User Type:
Teacher
Country:
United States
Uploaded Date:
21-07-2017