Anagram substring Search Java

javascript substring search case insensitive and Elasticsearch query substring
Dr.BenjaminClark Profile Pic
Dr.BenjaminClark,United States,Teacher
Published Date:21-07-2017
Your Website URL(Optional)
Comment
ROBERT SEDGEWICK KEVIN WAYNE Algorithms 5.3 SUBSTRING SEARCH introduction ‣ brute force ‣ Knuth-Morris-Pratt ‣ Algorithms FOUR TH EDITION Boyer-Moore ‣ Rabin-Karp ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu5.3 SUBSTRING SEARCH introduction ‣ brute force ‣ Knuth-Morris-Pratt ‣ Algorithms Boyer-Moore ‣ Rabin-Karp ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduSubstring search Goal. Find pattern of length M in a text of length N. typically N M pattern N E E D L E text I N A H A Y S T A C K N E E D L E I N A match Substring search 3Substring search applications Goal. Find pattern of length M in a text of length N. typically N M pattern N E E D L E text I N A H A Y S T A C K N E E D L E I N A match Substring search 4Substring search applications Goal. Find pattern of length M in a text of length N. typically N M pattern N E E D L E text I N A H A Y S T A C K N E E D L E I N A match Substring search Computer forensics.  Search memory or disk for signatures, e.g., all URLs or RSA keys that the user has entered. http://citp.princeton.edu/memory 5Substring search applications Goal. Find pattern of length M in a text of length N. typically N M pattern N E E D L E text I N A H A Y S T A C K N E E D L E I N A match Substring search Identify patterns indicative of spam. PROFITS L0SE WE1GHT herbal Viagra There is no catch. This is a one-time mailing. This message is sent in compliance with spam regulations. 6Substring search applications Electronic surveillance. Need to monitor all internet traffic. (security) No way (privacy) Well, we’re mainly interested in “ATTACK AT DAWN” OK. Build a machine that just looks for that. “ATTACK AT DAWN” substring search machine found 7Substring search applications Screen scraping. Extract relevant data from web page. Ex. Find string delimited by b and /b after first occurrence of pattern Last Trade:. ... tr td class= "yfnc_tablehead1" width= "48%" Last Trade: /td td class= "yfnc_tabledata1" bigb452.92/b/big /td/tr td class= "yfnc_tablehead1" http://finance.yahoo.com/q?s=goog width= "48%" Trade Time: /td td class= "yfnc_tabledata1" ... 8Screen scraping: Java implementation Java library. The indexOf() method in Java's string library returns the index of the first occurrence of a given string, starting at a given offset. public class StockQuote public static void main(String args) String name = "http://finance.yahoo.com/q?s="; In in = new In(name + args0); String text = in.readAll(); int start = text.indexOf("Last Trade:", 0); int from = text.indexOf("b", start); int to = text.indexOf("/b", from); String price = text.substring(from + 3, to); StdOut.println(price); % java StockQuote goog 582.93 % java StockQuote msft 24.84 95.3 SUBSTRING SEARCH introduction ‣ brute force ‣ Knuth-Morris-Pratt ‣ Algorithms Boyer-Moore ‣ Rabin-Karp ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduBrute-force substring search Check for pattern starting at each text position. i j i+j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C txt pat 0 2 2 A B R A entries in red are 1 0 1 A B R A mismatches 2 1 3 A B R A entries in gray are 3 0 3 A B R A for reference only 4 1 5 A B R A entries in black 5 0 5 A B R A match the text 6 4 10 A B R A return i when j is M match Brute-force substring search 11i j i+j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C 4 3 7 A D A C R 5 0 5 A D A C R Brute-force substring search: Java implementation Check for pattern starting at each text position. public static int search(String pat, String txt) int M = pat.length(); int N = txt.length(); for (int i = 0; i = N - M; i++) int j; for (j = 0; j M; j++) if (txt.charAt(i+j) = pat.charAt(j)) break; index in text where if (j == M) return i; pattern starts return N; not found 12Brute-force substring search: worst case Brute-force algorithm can be slow if text and pattern are repetitive. i j i+j 0 1 2 3 4 5 6 7 8 9 10 i j i+j 0 1 2 3 4 5 6 7 8 9 txt A B A C A D A B R A C txt A A A A A A A A A B 0 2 2 A B R A pat pat 0 4 4 A A A A B entries in red are 1 0 1 A B R A 1 4 5 A A A A B mismatches 2 1 3 A B R A 2 4 6 A A A A B entries in gray are 3 0 3 A B R A for reference only 3 4 7 A A A A B 4 1 5 A B R A entries in black 4 4 8 A A A A B 5 0 5 A B R A match the text 5 5 10 A A A A B 6 4 10 A B R A return i when j is M Brute-force substring search (worst case) match Brute-force substring search Worst case. M N char compares. 13A A A A A A A A A A A A A A A A A A A A A B A A A A A B A A A A A A A A A A A A A A A A A A A A A B A A A A A B shift pattern right one position Backup In many applications, we want to avoid backup in text stream. Treat input as stream of data. “ATTACK AT DAWN” substring search Abstract model: standard input. machine found Brute-force algorithm needs backup for every mismatch. matched chars mismatch backup Approach 1. Maintain buffer of last M characters. Approach 2. Stay tuned. 14i j 0 1 2 3 4 5 6 7 8 9 10 A B A C A D A B R A C 7 3 A D A C R 5 0 A D A C R Brute-force substring search: alternate implementation Same sequence of char compares as previous implementation. i points to end of sequence of already-matched chars in text. j stores of already-matched chars (end of sequence in pattern). public static int search(String pat, String txt) int i, N = txt.length(); int j, M = pat.length(); for (i = 0, j = 0; i N && j M; i++) if (txt.charAt(i) == pat.charAt(j)) j++; explicit backup else i -= j; j = 0; if (j == M) return i - M; else return N; 15Algorithmic challenges in substring search Brute-force is not always good enough. fundamental algorithmic problem Theoretical challenge. Linear-time guarantee. often no room or time to save text Practical challenge. Avoid backup in text stream. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for each good person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. Now is the time for all people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for a lot of good people to come to the aid of their party. Now is the time for all of the good people to come to the aid of their party. Now is the time for all good people to come to the aid of their attack at dawn party. Now is the time for each person to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Republicans to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for many or all good people to come to the aid of their party. Now is the time for all good people to come to the aid of their party. Now is the time for all good Democrats to come to the aid of their party. 165.3 SUBSTRING SEARCH introduction ‣ brute force ‣ Knuth-Morris-Pratt ‣ Algorithms Boyer-Moore ‣ Rabin-Karp ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduKnuth-Morris-Pratt substring search Intuition. Suppose we are searching in text for pattern BAAAAAAAAA. th Suppose we match 5 chars in pattern, with mismatch on 6 char. We know previous 6 chars in text are BAAAAB. Don't need to back up text pointer assuming A, B alphabet ii te tex xt t A B A A A A B A A A A A A A A A A B A A A A B A A A A A A A A A aft afte er mismat r mismatc ch h p patt atte er rn n B A A A A B A A A A AA A A A A A A A A on six on sixth c th char har B B A A A A A A A A A A A A A A A A A A br brut ute-f e-for orc ce b e bac acks ks up t up to t o tr ry this y this B B A A A A A A A A A A A A A A A A A A and this and this B B A A A A A A A A A A A A A A A A A A and this and this B B A A A A A A A A A A A A A A A A A A and this and this B A A B A A A A A A A A A A A A A A A A and this and this B B A A A A A A A A A A A A A A A A A A b but no b ut no bac ackup kup is nee is neede ded d T Te ex xt poin t point ter backup in substring sear er backup in substring searching ching Knuth-Morris-Pratt algorithm. Clever method to always avoid backup. () 18Deterministic finite state automaton (DFA) DFA is abstract string-searching machine. Finite number of states (including start and halt). Exactly one transition for each char in alphabet. Accept if sequence of transitions leads to halt state. X internal representation X j 0 1 2 3 4 5 j 0 1 2 3 4 5 pat.charAt(j) A B A B A C If in state j reading char c: pat.charAt(j) A B A B A C A 1 1 3 1 5 1 if j is 6 halt and accept A 1 1 3 1 5 1 dfaj 0 2 0 4 0 4 B dfaj 0 2 0 4 0 4 B else move to state dfacj 0 0 0 0 0 6 C 0 0 0 0 0 6 C graphical representation A A A A B,C B,C A A B B 0 A B A B A C 1 5 6 2 3 4 A B A A C 0 B 1 5 6 2 3 4 C B,C C C B,C B,C C B,C Constructing the DFA for KMP substring search for A B A B A C 19 Constructing the DFA for KMP substring search for A B A B A CKnuth-Morris-Pratt demo: DFA simulation A A B A C A A B A B A C A A 0 1 2 3 4 5 pat.charAt(j) A B A B A C A 1 1 3 1 5 1 dfaj B 0 2 0 4 0 4 C 0 0 0 0 0 6 A A B, C A B 0 A 1 B 2 A 3 B 4 A 5 C 6 C B, C C B, C 20