Question? Leave a message!




Sorting and Searching

Sorting and Searching
4.2 Sorting and Searching Introduction to Programming in Java: An Interdisciplinary Approach · Robert Sedgewick and Kevin Wayne · Copyright © 2002–2010 · 6/26/10 8:09 AM Sequential Search Sequential search. Scan through array, looking for key.   Search hit: return array index.   Search miss: return 1. public static int search(String key, String a) int N = a.length; for (int i = 0; i a.length; i++) if (ai.compareTo(key) == 0) return i; return 1; 2 Search Client: Exception Filter Exception filter. Read a sorted list of strings from a whitelist file, then print out all strings from standard input not in the whitelist. public static void main(String args) In in = new In(args0); String s = in.readAll(); String words = s.split("\\s+"); while (StdIn.isEmpty()) String key = StdIn.readString(); if (search(key, words) == 1) StdOut.println(key); 3 Searching Challenge 1 Q. A credit card company needs to whitelist 10 million customer account numbers, processing 1,000 transactions per second. Using sequential search, what kind of computer is needed A.  Toaster B.  Cellphone C.  Your laptop D.  Supercomputer E.  Google server farm 4 Binary Search 5 Twenty Questions Intuition. Find a hidden integer. 6 Binary Search Main idea.   Sort the array (stay tuned).   Play "20 questions" to determine index with a given key. Ex. Dictionary, phone book, book index, credit card numbers, … Binary search.   Examine the middle key.   If it matches, return its index.   Otherwise, search either the left or right half. 7 Binary Search: Java Implementation Invariant. Algorithm maintains alo ≤ key ≤ ahi1. public static int search(String key, String a) return search(key, a, 0, a.length); public static int search(String key, String a, int lo, int hi) if (hi = lo) return 1; int mid = lo + (hi lo) / 2; int cmp = amid.compareTo(key); if (cmp 0) return search(key, a, lo, mid); else if (cmp 0) return search(key, a, mid+1, hi); else return mid; Java library implementation: Arrays.binarySearch() 8 Binary Search: Mathematical Analysis Analysis. To binary search in an array of size N: do one compare, then binary search in an array of size N / 2. N → N / 2 → N / 4 → N / 8 → … → 1 Q. How many times can you divide a number by 2 until you reach 1 A. log N. 2 1  2 → 1 4 → 2 → 1 8 → 4 → 2 → 1 16 → 8 → 4 → 2 → 1 32 → 16 → 8 → 4 → 2 → 1 64 → 32 → 16 → 8 → 4 → 2 → 1 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1 512 → 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1 1024 → 512 → 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1 9 Searching Challenge 2 Q. A credit card company needs to whitelist 10 million customer account numbers, processing 1,000 transactions per second. Using binary search, what kind of computer is needed A.  Toaster B.  Cell phone C.  Your laptop D.  Supercomputer E.  Google server farm 10 Sorting Sorting Sorting problem. Rearrange N items in ascending order. Applications. Statistics, databases, data compression, bioinformatics, computer graphics, scientific computing, (too numerous to list), ... Hauser Hanley Hong Haskell Hsu Hauser Hayes Hayes Haskell Hong Hanley Hornet Hornet Hsu 12 Insertion Sort 13 Insertion Sort Insertion sort.   Bruteforce sorting solution.   Move lefttoright through array.   Exchange next element with larger elements to its left, onebyone. 14 Insertion Sort Insertion sort.   Bruteforce sorting solution.   Move lefttoright through array.   Exchange next element with larger elements to its left, onebyone. 15 Insertion Sort: Java Implementation public class Insertion public static void sort(String a) int N = a.length; for (int i = 1; i N; i++) for (int j = i; j 0; j) if (aj1.compareTo(aj) 0) exch(a, j1, j); else break; private static void exch(String a, int i, int j) String swap = ai; ai = aj; aj = swap; 16 Insertion Sort: Empirical Analysis Observation. Number of compares depends on input family. 2   Descending: N / 2. 2   Random: N / 4.   Ascending: N. 10000000 Descendng Random 100000 Ascending 1000 10 0.1 0.001 1000 10000 100000 1000000 Input Size 17 Comparsions (millions)Insertion Sort: Mathematical Analysis Worst case. descending   Iteration i requires i comparisons. 2   Total = (0 + 1 + 2 + ... + N1) N / 2 compares. E F G H I J D C B A i Average case. random   Iteration i requires i / 2 comparisons on average. 2   Total = (0 + 1 + 2 + ... + N1) / 2 N / 4 compares A C D F H J E B I G i 18 Sorting Challenge 1 Q. A credit card company sorts 10 million customer account numbers, for use with binary search. Using insertion sort, what kind of computer is needed A.  Toaster B.  Cell phone C.  Your laptop D.  Supercomputer E.  Google server farm 19 Insertion Sort: Lesson Lesson. Supercomputer can't rescue a bad algorithm. Comparisons Computer Thousand Million Billion Per Second 7 laptop 10 instant 1 day 3 centuries 12 super 10 instant 1 second 2 weeks 20 Moore's Law Moore's law. Transistor density on a chip doubles every 2 years. Variants. Memory, disk space, bandwidth, computing power per . http://en.wikipedia.org/wiki/Moore'slaw 21 Moore's Law and Algorithms Quadratic algorithms do not scale with technology.   New computer may be 10x as fast.   But, has 10x as much memory so problem may be 10x bigger.   With quadratic algorithm, takes 10x as long “Software inefficiency can always outpace Moore's Law. Moore's Law isn't a match for our bad coding.” – Jaron Lanier Lesson. Need linear (or linearithmic) algorithm to keep pace with Moore's law. 22 Mergesort 23 Mergesort Mergesort algorithm.   Divide array into two halves.   Recursively sort each half.   Merge two halves to make sorted whole. 24 Mergesort: Example 25 Merging Merging. Combine two presorted lists into a sorted whole. How to merge efficiently Use an auxiliary array. 26 Merging Merging. Combine two presorted lists into a sorted whole. How to merge efficiently Use an auxiliary array. String aux = new StringN; // merge into auxiliary array int i = lo, j = mid; for (int k = 0; k N; k++) if (i == mid) auxk = aj++; else if (j == hi) auxk = ai++; else if (aj.compareTo(ai) 0) auxk = aj++; else auxk = ai++; // copy back for (int k = 0; k N; k++) alo + k = auxk; 27 Mergesort: Java Implementation public class Merge public static void sort(String a) sort(a, 0, a.length); // Sort alo, hi). public static void sort(String a, int lo, int hi) int N = hi lo; if (N = 1) return; // recursively sort left and right halves int mid = lo + N/2; sort(a, lo, mid); sort(a, mid, hi); // merge sorted halves (see previous slide) lo mid hi 10 11 12 13 14 15 16 17 18 19 28 Mergesort: Mathematical Analysis Analysis. To mergesort array of size N, mergesort two subarrays of size N / 2, and merge them together using ≤ N compares. we assume N is a power of 2 T(N) N 2 (N / 2) T(N / 2) T(N / 2) 4 (N / 4) T(N / 4) T(N / 4) T(N / 4) T(N / 4) log N 2 . k T(N / 2 ) . . N / 2 (2) T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2) N log N 2 29 Mergesort: Mathematical Analysis Mathematical analysis. analysis comparisons worst N log N 2 average N log N 2 best 1/2 N log N 2 Validation. Theory agrees with observations. N actua l predicted 10,000 120 thousand 133 thousand 20 million 460 million 485 million 50 million 1,216 million 1,279 million 30 Sorting Challenge 2 Q. A credit card company sorts 10 million customer account numbers, for use with binary search. Using mergesort, what kind of computer is needed A.  Toaster B.  Cell phone C.  Your laptop D.  Supercomputer E.  Google server farm 31 Sorting Challenge 3 Q. What's the fastest way to sort 1 million 32bit integers 32 Mergesort: Lesson Lesson. Great algorithms can be more powerful than supercomputers. Compares Computer Insertion Mergesort Per Second 7 laptop 10 3 centuries 3 hours 12 super 10 2 weeks instant N = 1 billion 33 Longest Repeated Substring 34 Redundancy Detector Longest repeated substring. Given a string, find the longest substring that appears at least twice. a a c a a g t t t a c a a g c Brute force.   Try all indices i and j for start of possible match.   Compute longest common prefix for each pair (quadratic+). a a c a a g t t t a c a a g c j i Applications. Bioinformatics, data compression, … 35 LRS Application: The Shape of a Song Music is characterized by its repetitive structure. Mary Had a Little Lamb Like a Prayer http://www.bewitched.com 36 Longest Repeated Substring: BruteForce Solution Longest repeated substring. Given a string, find the longest substring that appears at least twice. a a c a a g t t t a c a a g c Brute force.   Try all indices i and j for start of possible match.   Compute longest common prefix (LCP) for each pair. a a c a a g t t t a c a a g c j i Mathematical analysis. 2   All pairs: 0 + 1 + 2 + … + N1 N /2 calls on LCP.   Way too slow for long strings. 37 Longest Repeated Substring: A Sorting Solution form suffixes sort suffixes to bring repeated substrings together compute longest prefix between adjacent suffixes 38 Longest Repeated Substring: Java Implementation Suffix sorting implementation. int N = s.length(); String suffixes = new StringN; for (int i = 0; i N; i++) suffixesi = s.substring(i, N); Arrays.sort(suffixes); Longest common prefix. lcp(s, t)   Longest string that is a prefix of both s and t.   Ex: lcp("acaagtttac", "acaagc") = "acaag". Easy to implement (you could write this one). Longest repeated substring. Search only adjacent suffixes. String lrs = ""; for (int i = 0; i N1; i++) String x = lcp(suffixesi, suffixesi+1); if (x.length() lrs.length()) lrs = x; 39 OOP Context for Strings String representation.   A String is an address and a length. does not copy chars   Characters can be shared among strings.   substring() computes address and length. A0 A1 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE s D0 15 a a c a a g t t t a c a a g c address length s = "aacaagtttacaagc"; t = s.substring(5, 15); B0 B1 t D5 10 Consequences.   substring() is constanttime operation (instead of linear).   Creating suffixes takes linear space (instead of quadratic).   Running time of LRS is dominated by the string sort. 40 Sorting Challenge 4 Q. Four researchers A, B, C, and D are looking for long repeated sequences in a genome with over 1 billion characters. Which one is more likely to find a cure for cancer A.  has a grad student to do it. B.  uses brute force (check all pairs) solution. C.  uses sorting solution with insertion sort. D.  uses sorting solution with mergesort. 41 Longest Repeated Substring: Empirical Analysis Input File Characters Brute Suffix Sort Length LRS.java 2,162 0.6 sec 0.14 sec 73 Amendments 18,369 37 sec 0.25 sec 216 Aesop's Fables 191,945 3958 sec 1.0 sec 58 † Moby Dick 1.2 million 43 hours 7.6 sec 79 † Bible 4.0 million 20 days 34 sec 11 † Chromosome 11 7.1 million 2 months 61 sec 12,567 † Pi 10 million 4 months 84 sec 14 † estimated Lesson. Sorting to the rescue; enables new research. 42 Summary Binary search. Efficient algorithm to search a sorted array. Mergesort. Efficient algorithm to sort an array. Applications. Many many applications are enabled by fast sorting and searching. 43
Website URL
Comment
sharer
Presentations
Free
Document Information
Category:
Presentations
User Name:
Dr.BenjaminClark
User Type:
Teacher
Country:
United States
Uploaded Date:
21-07-2017