String sort algorithm

datatable sort string to int and string sort c++ alphabetical
Dr.BenjaminClark Profile Pic
Dr.BenjaminClark,United States,Teacher
Published Date:21-07-2017
Your Website URL(Optional)
Comment
ROBERT SEDGEWICK KEVIN WAYNE Algorithms 5.1 STRING SORTS strings in Java ‣ key-indexed counting ‣ LSD radix sort ‣ Algorithms FOUR TH EDITION MSD radix sort ‣ 3-way radix quicksort ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu suffix arrays ‣5.1 STRING SORTS strings in Java ‣ key-indexed counting ‣ LSD radix sort ‣ Algorithms MSD radix sort ‣ 3-way radix quicksort ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu suffix arrays ‣String processing String. Sequence of characters. Important fundamental abstraction. Genomic sequences. Information processing. Communication systems (e.g., email). Programming systems (e.g., Java programs). … “ The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of G's, A's, T's and C's. This string is the root data 0 structure of an organism's biology. ” — M. V. Olson 3The char data type C char data type. Typically an 8-bit integer. Supports 7-bit ASCII. 6.5 Data Compression 667 Can represent at most 256 characters. ASCII encoding. When you HexDump a bit- 0 1 2 3 4 5 6 7 8 9 A B C D E F stream that contains ASCII-encoded charac- NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI 0 ters, the table at right is useful for reference. DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US 1 Given a 2-digit hex number, use the first hex SP 2 “ % & ‘ ( ) + , - . / digit as a row index and the second hex digit 3 0 1 2 3 4 5 6 7 8 9 : ; = ? as a column reference to find the character 4 A B C D E F G H I J K L M N O that it encodes. For example, 31 encodes the 5 P Q R S T U V W X Y Z \ _ U+0041 U+00E1 U+2202 U+1D50A digit 1, 4A encodes the letter J, and so forth. 6 ` a b c d e f g h i j k l m n o This table is for 7-bit ASCII, so the first hex DEL 7 p q r s t u v w x y z some Unicode characters digit must be 7 or less. Hex numbers starting Hexadecimal to ASCII conversion table with 0 and 1 (and the numbers 20 and 7F) correspond to non-printing control charac- ters. Many of the control characters are left over from the days when physical devices like typewriters were controlled by ASCII input; the table highlights a few that you Java char data type. A 16-bit unsigned integer. might see in dumps. For example SP is the space character, NUL is the null character, LF is line-feed, and CR is carriage-return. Supports original 16-bit Unicode. Supports 21-bit Unicode 3.0 (awkwardly). In summary, working with data compr ession requires us to reorient our thinking about standard input and standard output to include binary encoding of data. BinaryStdIn 4 and BinaryStdOut provide the methods that we need. They provide a way for you to make a clear distinction in your client programs between writing out information in- tended for file storage and data transmission (that will be read by programs) and print- ing information (that is likely to be read by humans). I ♥︎ Unicode U+0041 5The String data type String data type in Java. Immutable sequence of characters. Length. Number of characters. th Indexing. Get the i character. Concatenation. Concatenate one string to the end of another. s.length() 0 1 2 3 4 5 6 7 8 9 10 11 12 s A T T A C K A T D A W N s.charAt(3) s.substring(7, 11) Fundamental constant-time String operations 6The String data type: immutability Q. Why immutable? A. All the usual reasons. Can use as keys in symbol table. Don't need to defensively copy. Ensures consistent state. Supports concurrency. public class FileInputStream Improves security. private String filename; public FileInputStream(String filename) if (allowedToReadFile(filename)) throw new SecurityException(); this.filename = filename; ... attacker could bypass security if string type were mutable 7The String data type: representation Representation (Java 7). Immutable char array + cache of hash. operation Java running time s.length() length 1 s.charAt(i) indexing 1 s + t concatenation M + N ⋮ ⋮ 8String performance trap Q. How to build a long string, one character at a time? public static String reverse(String s) String rev = ""; for (int i = s.length() - 1; i = 0; i) rev += s.charAt(i); quadratic time return rev; A. Use StringBuilder data type (mutable char array). public static String reverse(String s) StringBuilder rev = new StringBuilder(); for (int i = s.length() - 1; i = 0; i) rev.append(s.charAt(i)); linear time return rev.toString(); 9Comparing two strings Q. How many character compares to compare two strings of length W ? p r e f e t c h 0 1 2 3 4 5 6 7 p r e f i x e s Running time. Proportional to length of longest common prefix. Proportional to W in the worst case. But, often sublinear in W. 10Alphabets Digital key. Sequence of digits over fixed alphabet. 604 CHAPTER 6 Strings Radix. Number of digits R in alphabet. name R() lgR() characters BINARY 2 1 01 OCTAL 8 3 01234567 DECIMAL 10 4 0123456789 HEXADECIMAL 16 4 0123456789ABCDEF DNA 4 2 ACTG LOWERCASE 26 5 abcdefghijklmnopqrstuvwxyz UPPERCASE 26 5 ABCDEFGHIJKLMNOPQRSTUVWXYZ PROTEIN 20 5 ACDEFGHIKLMNPQRSTVWY ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef BASE64 64 6 ghijklmnopqrstuvwxyz0123456789+/ ASCII 128 7 ASCII characters EXTENDED_ASCII 256 8 extended ASCII characters UNICODE16 65536 16 Unicode characters Standard alphabets 11 holds the frequencies in Count is an example of a character-indexed array. With a Java String, we have to use an array of size 256; with Alphabet, we just need an array with one entry for each alphabet character. This savings might seem modest, but, as you will see, our algorithms can produce huge numbers of such arrays, and the space for arrays of size 256 can be prohibitive. Numbers. As you can see from our several of the standard Alphabet examples, we of- ten represent numbers as strings. The methods toIndices() coverts any String over a given Alphabet into a base-R number represented as an int array with all values between 0 and R1. In some situations, doing this conversion at the start leads to com- pact code, because any digit can be used as an index in a character-indexed array. For example, if we know that the input consists only of characters from the alphabet, we could replace the inner loop in Count with the more compact code int a = alpha.toIndices(s); for (int i = 0; i N; i++) countai++;5.1 STRING SORTS strings in Java ‣ key-indexed counting ‣ LSD radix sort ‣ Algorithms MSD radix sort ‣ 3-way radix quicksort ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu suffix arrays ‣Review: summary of the performance of sorting algorithms Frequency of operations. algorithm guarantee random extra space stable? operations on keys 2 2 compareTo() ✔ insertion sort ½ N ¼ N 1 compareTo() ✔ mergesort N lg N N lg N N compareTo() quicksort 1.39 N lg N 1.39 N lg N c lg N compareTo() heapsort 2 N lg N 2 N lg N 1 probabilistic Lower bound. N lg N compares required by any compare-based algorithm. Q. Can we do better (despite the lower bound)? use array accesses to make R-way decisions A. Yes, if we don't depend on key compares. (instead of binary decisions) 13Key-indexed counting: assumptions about keys Assumption. Keys are integers between 0 and R - 1. Implication. Can use key as an array index. input sorted result name section (by section) Anderson 2 Harris 1 Applications. Brown 3 Martin 1 Davis 3 Moore 1 Sort string by first letter. Garcia 4 Anderson 2 Harris 1 Martinez 2 Sort class roster by section. Jackson 3 Miller 2 Johnson 4 Robinson 2 Sort phone numbers by area code. Jones 3 White 2 Martin 1 Brown 3 Subroutine in a sorting algorithm. stay tuned Martinez 2 Davis 3 Miller 2 Jackson 3 Moore 1 Jones 3 Remark. Keys may have associated data ⇒ Robinson 2 Taylor 3 Smith 4 Williams 3 can't just count up number of keys of each value. Taylor 3 Garcia 4 Thomas 4 Johnson 4 Thompson 4 Smith 4 White 2 Thomas 4 Williams 3 Thompson 4 Wilson 4 Wilson 4 keys are small integers Typical candidate for key-indexed counting 14Key-indexed counting demo Goal. Sort an array a of N integers between 0 and R - 1. R = 6 Count frequencies of each letter using key as index. Compute frequency cumulates which specify destinations. Access cumulates using key as index to move items. Copy back into original array. i ai 0 d int N = a.length; 1 a int count = new intR+1; use for a 0 2 c b for 1 3 f for (int i = 0; i N; i++) c for 2 4 f countai+1++; d for 3 e for 4 5 b f for 5 for (int r = 0; r R; r++) 6 d countr+1 += countr; 7 b 8 f for (int i = 0; i N; i++) 9 b auxcountai++ = ai; 10 e 11 a for (int i = 0; i N; i++) ai = auxi; 15Key-indexed counting demo Goal. Sort an array a of N integers between 0 and R - 1. Count frequencies of each letter using key as index. Compute frequency cumulates which specify destinations. Access cumulates using key as index to move items. Copy back into original array. offset by 1 i ai stay tuned 0 d int N = a.length; 1 a int count = new intR+1; r countr 2 c 3 f a 0 for (int i = 0; i N; i++) count frequencies 4 f countai+1++; b 2 5 b c 3 for (int r = 0; r R; r++) 6 d d 1 countr+1 += countr; 7 b e 2 8 f f 1 for (int i = 0; i N; i++) 9 b - 3 auxcountai++ = ai; 10 e 11 a for (int i = 0; i N; i++) ai = auxi; 16Key-indexed counting demo Goal. Sort an array a of N integers between 0 and R - 1. Count frequencies of each letter using key as index. Compute frequency cumulates which specify destinations. Access cumulates using key as index to move items. Copy back into original array. i ai 0 d int N = a.length; 1 a int count = new intR+1; r countr 2 c 3 f a 0 for (int i = 0; i N; i++) 4 f countai+1++; b 2 5 b c 5 for (int r = 0; r R; r++) 6 d d 6 compute cumulates countr+1 += countr; 7 b e 8 8 f f 9 for (int i = 0; i N; i++) 9 b - 12 auxcountai++ = ai; 10 e 11 a for (int i = 0; i N; i++) 6 keys d, 8 keys e ai = auxi; so d’s go in a6 and a7 17Key-indexed counting demo Goal. Sort an array a of N integers between 0 and R - 1. Count frequencies of each letter using key as index. Compute frequency cumulates which specify destinations. Access cumulates using key as index to move items. Copy back into original array. i ai i auxi 0 a 0 d int N = a.length; 1 a 1 a int count = new intR+1; r countr 2 b 2 c 3 b 3 f a 2 for (int i = 0; i N; i++) 4 b 4 f countai+1++; b 5 5 c 5 b c 6 for (int r = 0; r R; r++) 6 6 d d d 8 countr+1 += countr; 7 d 7 b e 9 8 e 8 f f 12 for (int i = 0; i N; i++) move 9 f 9 b - 12 items auxcountai++ = ai; 10 f 10 e 11 f 11 a for (int i = 0; i N; i++) ai = auxi; 18Key-indexed counting demo Goal. Sort an array a of N integers between 0 and R - 1. Count frequencies of each letter using key as index. Compute frequency cumulates which specify destinations. Access cumulates using key as index to move items. Copy back into original array. i ai i auxi 0 a 0 a int N = a.length; 1 a 1 a int count = new intR+1; r countr 2 b 2 b 3 b 3 b a 2 for (int i = 0; i N; i++) 4 b 4 b countai+1++; b 5 5 c 5 c c 6 for (int r = 0; r R; r++) 6 6 d d d 8 countr+1 += countr; 7 d 7 d e 9 8 e 8 e f 12 for (int i = 0; i N; i++) 9 f 9 f - 12 auxcountai++ = ai; 10 f 10 f 11 f 11 f for (int i = 0; i N; i++) copy back ai = auxi; 19Key-indexed counting: analysis Proposition. Key-indexed takes time proportional to N + R. for (int i = 0; i N; i++) auxcountai.key(d)++ = ai; Proposition. Key-indexed counting uses extra space proportional to N + R. count 1 2 3 4 0 3 8 14 Stable? ✔ a0 aux0 Anderson 2 Harris 1 0 4 8 14 a1 aux1 Brown 3 Martin 1 0 4 9 14 a2 aux2 Davis 3 Moore 1 0 4 10 14 a3 aux3 Garcia 4 Anderson 2 0 4 10 15 aux4 a4 Harris 1 Martinez 2 1 4 10 15 a5 aux5 Jackson 3 Miller 2 1 4 11 15 aux6 a6 Johnson 4 Robinson 2 1 4 11 16 a7 aux7 Jones 3 White 2 1 4 12 16 a8 aux8 Martin 1 Brown 3 2 4 12 16 a9 aux9 Martinez 2 Davis 3 2 5 12 16 a10 aux10 Miller 2 Jackson 3 2 6 12 16 a11 aux11 Moore 1 Jones 3 3 6 12 16 a12 aux12 Robinson 2 Taylor 3 3 7 12 16 a13 aux13 Smith 4 Williams 3 3 7 12 17 a14 aux14 Taylor 3 Garcia 4 3 7 13 17 a15 aux15 Thomas 4 Johnson 4 3 7 13 18 a16 aux16 Thompson 4 Smith 4 3 7 13 19 a17 aux17 White 2 Thomas 4 3 8 13 19 a18 aux18 Williams 3 Thompson 4 3 8 14 19 a19 aux19 Wilson 4 Wilson 4 3 8 14 20 Distributing the data (records with key 3 highlighted) 20