Question? Leave a message!




SYMBOL TABLE APPLICATIONS

SYMBOL TABLE APPLICATIONS
ROBERT SEDGEWICK KEVIN WAYNE Algorithms 3.5 SYMBOL TABLE APPLICATIONS sets ‣ dictionary clients ‣ indexing clients ‣ Algorithms FOUR TH EDITION sparse vectors ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu3.5 SYMBOL TABLE APPLICATIONS sets ‣ dictionary clients ‣ indexing clients ‣ Algorithms sparse vectors ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduSet API Mathematical set. A collection of distinct keys. public class SETKey extends ComparableKey public class SETKey extends ComparableKey public class SETKey extends ComparableKey SET() create an empty set void add(Key key) add the key to the set boolean contains(Key key) is the key in the set void remove(Key key) remove the key from the set int size() return the number of keys in the set IteratorKey iterator() iterator through keys in the set Q. How to implement 3Exception filter Read in a list of words from one file. Print out all words from standard input that are in, not in the list. more list.txt list of exceptional words was it the of java WhiteList list.txt tinyTale.txt it was the of it was the of it was the of it was the of it was the of it was the of it was the of it was the of it was the of it was the of java BlackList list.txt tinyTale.txt best times worst times age wisdom age foolishness epoch belief epoch incredulity season light season darkness spring hope winter despair 4Exception filter applications Read in a list of words from one file. Print out all words from standard input that are in, not in the list. application purpose key in list spell checker identify misspelled words word dictionary words browser mark visited pages URL visited pages parental controls block sites URL bad sites chess detect draw board positions spam filter eliminate spam IP address spam addresses credit cards check for stolen cards number stolen cards 5Exception filter: Java implementation Read in a list of words from one file. Print out all words from standard input that are in the list. public class WhiteList public static void main(String args) create empty set of strings SETString set = new SETString(); In in = new In(args0); while (in.isEmpty()) read in whitelist set.add(in.readString()); while (StdIn.isEmpty()) String word = StdIn.readString(); print words in list if (set.contains(word)) StdOut.println(word); 6Exception filter: Java implementation Read in a list of words from one file. Print out all words from standard input that are not in the list. public class BlackList public static void main(String args) create empty set of strings SETString set = new SETString(); In in = new In(args0); while (in.isEmpty()) read in whitelist set.add(in.readString()); while (StdIn.isEmpty()) String word = StdIn.readString(); print words not in list if (set.contains(word)) StdOut.println(word); 73.5 SYMBOL TABLE APPLICATIONS sets ‣ dictionary clients ‣ indexing clients ‣ Algorithms sparse vectors ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduDictionary lookup Commandline arguments. more ip.csv A commaseparated value (CSV) file. www.princeton.edu,128.112.128.15 www.cs.princeton.edu,128.112.136.35 Key field. www.math.princeton.edu,128.112.18.11 www.cs.harvard.edu,140.247.50.127 Value field. www.harvard.edu,128.103.60.24 www.yale.edu,130.132.51.8 www.econ.yale.edu,128.36.236.74 www.cs.yale.edu,128.36.229.30 Ex 1. DNS lookup. espn.com,199.181.135.201 domain name is key IP is value yahoo.com,66.94.234.13 msn.com,207.68.172.246 google.com,64.233.167.99 baidu.com,202.108.22.33 java LookupCSV ip.csv 0 1 yahoo.co.jp,202.93.91.141 adobe.com sina.com.cn,202.108.33.32 192.150.18.60 ebay.com,66.135.192.87 www.princeton.edu adobe.com,192.150.18.60 128.112.128.15 163.com,220.181.29.154 passport.net,65.54.179.226 ebay.edu domain name is key URL is value tom.com,61.135.158.237 Not found nate.com,203.226.253.11 cnn.com,64.236.16.20 java LookupCSV ip.csv 1 0 daum.net,211.115.77.211 128.112.128.15 blogger.com,66.102.15.100 fastclick.com,205.180.86.4 www.princeton.edu wikipedia.org,66.230.200.100 999.999.999.99 rakuten.co.jp,202.72.51.22 Not found ... 9Dictionary lookup more amino.csv Commandline arguments. TTT,Phe,F,Phenylalanine TTC,Phe,F,Phenylalanine A commaseparated value (CSV) file. TTA,Leu,L,Leucine TTG,Leu,L,Leucine Key field. TCT,Ser,S,Serine TCC,Ser,S,Serine Value field. TCA,Ser,S,Serine TCG,Ser,S,Serine TAT,Tyr,Y,Tyrosine TAC,Tyr,Y,Tyrosine Ex 2. Amino acids. TAA,Stop,Stop,Stop TAG,Stop,Stop,Stop TGT,Cys,C,Cysteine TGC,Cys,C,Cysteine codon is key name is value TGA,Stop,Stop,Stop TGG,Trp,W,Tryptophan CTT,Leu,L,Leucine java LookupCSV amino.csv 0 3 CTC,Leu,L,Leucine ACT CTA,Leu,L,Leucine CTG,Leu,L,Leucine Threonine CCT,Pro,P,Proline TAG CCC,Pro,P,Proline Stop CCA,Pro,P,Proline CAT CCG,Pro,P,Proline CAT,His,H,Histidine Histidine CAC,His,H,Histidine CAA,Gln,Q,Glutamine CAG,Gln,Q,Glutamine CGT,Arg,R,Arginine CGC,Arg,R,Arginine ... 10Dictionary lookup Commandline arguments. more classlist.csv 13,Berl,Ethan Michael,P01,eberl A commaseparated value (CSV) file. 12,Cao,Phillips Minghua,P01,pcao Key field. 11,Chehoud,Christel,P01,cchehoud 10,Douglas,Malia Morioka,P01,malia Value field. 12,Haddock,Sara Lynn,P01,shaddock 12,Hantman,Nicole Samantha,P01,nhantman 11,Hesterberg,Adam Classen,P01,ahesterb Ex 3. Class list. 13,Hwang,Roland Lee,P01,rhwang 13,Hyde,Gregory Thomas,P01,ghyde first name 13,Kim,Hyunmoon,P01,hktwo is value login is key 12,Korac,Damjan,P01,dkorac 11,MacDonald,Graham David,P01,gmacdona 10,Michal,Brian Thomas,P01,bmichal java LookupCSV classlist.csv 4 1 12,Nam,Seung Hyeon,P01,seungnam eberl 11,Nastasescu,Maria Monica,P01,mnastase Ethan section 11,Pan,Di,P01,dpan nwebb is value login is key 12,Partridge,Brenton Alan,P01,bpartrid Natalie 13,Rilee,Alexander,P01,arilee 13,Roopakalu,Ajay,P01,aroopaka java LookupCSV classlist.csv 4 3 11,Sheng,Ben C,P01,bsheng 12,Webb,Natalie Sue,P01,nwebb dpan P01 ⋮ 11Dictionary lookup: Java implementation public class LookupCSV public static void main(String args) In in = new In(args0); process input file int keyField = Integer.parseInt(args1); int valField = Integer.parseInt(args2); STString, String st = new STString, String(); while (in.isEmpty()) String line = in.readLine(); String tokens = line.split(","); build symbol table String key = tokenskeyField; String val = tokensvalField; st.put(key, val); while (StdIn.isEmpty()) process lookups String s = StdIn.readString(); with standard I/O if (st.contains(s)) StdOut.println("Not found"); else StdOut.println(st.get(s)); 123.5 SYMBOL TABLE APPLICATIONS sets ‣ dictionary clients ‣ indexing clients ‣ Algorithms sparse vectors ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduFile indexing Goal. Index a PC (or the web). 14File indexing Goal. Given a list of files, create an index so that you can efficiently find all files containing a given query string. ls .txt ls .java aesop.txt magna.txt moby.txt BlackList.java Concordance.java sawyer.txt tale.txt DeDup.java FileIndex.java ST.java SET.java WhiteList.java java FileIndex .txt java FileIndex .java freedom magna.txt moby.txt tale.txt import FileIndex.java SET.java ST.java whale moby.txt Comparator null lamb sawyer.txt aesop.txt Solution. Key = query string; value = set of files containing that string. 15File indexing import java.io.File; public class FileIndex public static void main(String args) symbol table STString, SETFile st = new STString, SETFile(); for (String filename : args) list of file names File file = new File(filename); from command line In in = new In(file); while (in.isEmpty()) for each word in file, String key = in.readString(); add file to if (st.contains(key)) corresponding set st.put(word, new SETFile()); SETFile set = st.get(key); set.add(file); while (StdIn.isEmpty()) process queries String query = StdIn.readString(); StdOut.println(st.get(query)); 16Book index Goal. Index for an ebook. 17Concordance Goal. Preprocess a text corpus to support concordance queries: given a word, find all occurrences with their immediate contexts. java Concordance tale.txt cities tongues of the two cities that were blended in majesty their turnkeys and the majesty of the law fired me treason against the majesty of the people in of his most gracious majesty king george the third princeton no matches Solution. Key = query string; value = set of indices containing that string. 18Concordance public class Concordance public static void main(String args) In in = new In(args0); String words = in.readAllStrings(); STString, SETInteger st = new STString, SETInteger(); for (int i = 0; i words.length; i++) read text and build index String s = wordsi; if (st.contains(s)) st.put(s, new SETInteger()); SETInteger set = st.get(s); set.add(i); while (StdIn.isEmpty()) process queries String query = StdIn.readString(); and print SETInteger set = st.get(query); concordances for (int k : set) // print wordsk4 to wordsk+4 193.5 SYMBOL TABLE APPLICATIONS sets ‣ dictionary clients ‣ indexing clients ‣ Algorithms sparse vectors ‣ ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.eduMatrixvector multiplication (standard implementation) a x b 0 .90 0 0 0 .05 .036 0 0 .36 .36 .18 .04 .297 = 0 0 0 .90 0 .36 .333 .90 0 0 0 0 .37 .045 .47 0 .47 0 0 .19 .1927 Matrixvector multiplication ... double a = new doubleNN; double x = new doubleN; double b = new doubleN; ... // initialize a and x ... nested loops 2 for (int i = 0; i N; i++) (N running time) sum = 0.0; for (int j = 0; j N; j++) sum += aijxj; bi = sum; 21Sparse matrixvector multiplication Problem. Sparse matrixvector multiplication. Assumptions. Matrix dimension is 10,000; average nonzeros per row 10. A x = b 22Vector representations 1d array (standard) representation. Constant time access to elements. Space proportional to N. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 0 .36 0 0 0 .36 0 0 0 0 0 0 0 0 .18 0 0 0 0 0 Symbol table representation. Key = index, value = entry. Efficient iterator. Space proportional to number of nonzeros. key value st 1 .36 5 .36 14 .18 23Sparse vector data type public class SparseVector HashST because order not important private HashSTInteger, Double v; public SparseVector() empty ST represents all 0s vector v = new HashSTInteger, Double(); public void put(int i, double x) ai = value v.put(i, x); public double get(int i) if (v.contains(i)) return 0.0; return ai else return v.get(i); public IterableInteger indices() iterate through indices of return v.keys(); nonzero entries public double dot(double that) dot product is constant double sum = 0.0; time for sparse vectors for (int i : indices()) sum += thatithis.get(i); return sum; 24Matrix representations 2D array (standard) matrix representation: Each row of matrix is an array. Constant time access to elements. 2 Space proportional to N . Sparse matrix representation: Each row of matrix is a sparse vector. Efficient access to elements. Space proportional to number of nonzeros (plus N). array of doubleobjects array of SparseVector objects array of doubleobjects array of SparseVector objects st st 0 1 2 3 4 0 1 2 3 4 1 .90 1 .90 0.0 .90 0.0 0.0 0.0 0.0 .90 0.0 0.0 0.0 key value key value 0 1 2 3 4 0 1 2 3 4 st st a a a a 2 .36 3 .36 4 .18 2 .36 3 .36 4 .18 0.0 0.0 .36 .36 .18 0.0 0.0 .36 .36 .18 0 0 0 0 1 0 1 2 3 4 1 1 0 1 2 3 4 1 st st independent independent 2 0.0 0.0 0.0 .90 0.0 2 3 .90 2 0.0 0.0 0.0 .90 0.0 2 3 .90 symboltable symboltable 3 3 objects 3 3 objects 0 1 2 3 4 0 1 2 3 4 4 4 4 4 st st .90 0.0 0.0 0.0 0.0 .90 0.0 0.0 0.0 0.0 0 .90 0 .90 0 1 2 3 4 0 1 2 3 4 st st .45 0.0 .45 0.0 0.0 .45 0.0 .45 0.0 0.0 0 2 .45 .45 0 2 .45 .45 a42 a42 25 Sparse matrix representations Sparse matrix representationsSparse matrixvector multiplication a x b 0 .90 0 0 0 .05 .036 0 0 .36 .36 .18 .04 .297 = 0 0 0 .90 0 .36 .333 .90 0 0 0 0 .37 .045 .47 0 .47 0 0 .19 .1927 Matrixvector multiplication .. SparseVector a = new SparseVectorN; double x = new doubleN; double b = new doubleN; ... // Initialize a and x ... linear running time for (int i = 0; i N; i++) for sparse matrix bi = ai.dot(x); 26
Website URL
Comment