Question? Leave a message!




Hashing and Hash Methods

Hashing and Hash Methods
Hashing and Hash MethodsThe Basic Problem  We have lots of data to store.  We desire efficient – O( 1 ) – performance for insertion, deletion and searching.  Too much (wasted) memory is required if we use an array indexed by the data‟s key.  The solution is a “hash table”. www.ThesisScientist.comHash Table 0 1 2 m1  Basic Idea  The hash table is an array of size „m‟  The storage index for an item determined by a hash function h(k): U  0, 1, …, m1  Desired Properties of h(k)  easy to compute  uniform distribution of keys over 0, 1, …, m1  when h(k ) = h(k ) for k , k U , we have a collision 1 2 1 2 www.ThesisScientist.comDivision Method  The hash function: h( k ) = k mod m where m is the table size.  m must be chosen to spread keys evenly.  Poor choice: m = a power of 10 b  Poor choice: m = 2 , b 1  A good choice of m is a prime number.  Table should be no more than 80 full.  Choose m as smallest prime number greater than m , where m = (expected number of min min entries)/0.8 www.ThesisScientist.comMultiplication Method  The hash function: h( k ) =  m( kA  kA  )  where A is some real positive constant.  A very good choice of A is the inverse of the “golden ratio.”  Given two positive numbers x and y, the ratio x/y is the “golden ratio” if = x/y = (x+y)/x  The golden ratio: 2 2 2 x xy y = 0   1 = 0  = (1 + sqrt(5))/2 = 1.618033989… = Fib /Fib i i1 www.ThesisScientist.comMultiplication Method (cont.)  Because of the relationship of the golden ratio to Fibonacci numbers, this particular value of A in the multiplication method is called “Fibonacci hashing.”  Some values of 1 1 h( k ) = m(k  k ) = 0 for k = 0 1 = 0.618m for k = 1 ( = 1/ 1.618… = 0.618…) = 0.236m for k = 2 = 0.854m for k = 3 = 0.472m for k = 4 = 0.090m for k = 5 = 0.708m for k = 6 = 0.326m for k = 7 = … = 0.777m for k = 32 www.ThesisScientist.comwww.ThesisScientist.comNoninteger Keys  In order to have a noninteger key, must first convert to a positive integer: h( k ) = g( f( k ) ) with f: U  integer g: I  0 .. m1  Suppose the keys are strings.  How can we convert a string (or characters) into an integer value www.ThesisScientist.comHorner’s Rule static int hash(String key, int tableSize) int hashVal = 0; for (int i = 0; i key.length(); i++) hashVal = 37 hashVal + key.charAt(i); hashVal = tableSize; if(hashVal 0) hashVal += tableSize; return hashVal; www.ThesisScientist.comHashTable Class public class SeparateChainingHashTableAnyType public SeparateChainingHashTable( )/ Later / public SeparateChainingHashTable(int size)/Later/ public void insert( AnyType x ) /Later/ public void remove( AnyType x ) /Later/ public boolean contains( AnyType x )/Later / public void makeEmpty( ) / Later / private static final int DEFAULTTABLESIZE = 101; private ListAnyType theLists; private int currentSize; private void rehash( ) / Later / private int myhash( AnyType x ) / Later / private static int nextPrime( int n ) / Later / private static boolean isPrime( int n ) / Later / www.ThesisScientist.comHashTable Ops  boolean contains( AnyType x )  Returns true if x is present in the table.  void insert (AnyType x)  If x already in table, do nothing.  Otherwise, insert it, using the appropriate hash function.  void remove (AnyType x)  Remove the instance of x, if x is present.  Ptherwise, does nothing  void makeEmpty() www.ThesisScientist.comHash Methods private int myhash( AnyType x ) int hashVal = x.hashCode( ); hashVal = theLists.length; if( hashVal 0 ) hashVal += theLists.length; return hashVal; www.ThesisScientist.comHandling Collisions  Collisions are inevitable. How to handle them  Separate chaining hash tables  Store colliding items in a list.  If m is large enough, list lengths are small.  Insertion of key k  hash( k ) to find the proper list.  If k is in that list, do nothing, else insert k on that list.  Asymptotic performance  If always inserted at head of list, and no duplicates, insert = O(1): best, worst, average www.ThesisScientist.comHash Class for Separate Chaining  To implement separate chaining, the private data of the hash table is a vector (array) of Lists. The hash functions are written using List functions private ListAnyType theLists; www.ThesisScientist.comPerformance of contains( )  contains  Hash k to find the proper list.  Call contains( ) on that list which returns a boolean.  Performance  best:  worst:  average www.ThesisScientist.comPerformance of remove( )  Remove k from table  Hash k to find proper list.  Remove k from list.  Performance  best  worst  average www.ThesisScientist.comHandling Collisions Revisited  Probing hash tables  All elements stored in the table itself (so table should be large. Rule of thumb: m = 2N)  Upon collision, item is hashed to a new (open) slot.  Hash function h: U x 0,1,2,….  0,1,…,m1 h( k, i ) = ( h‟( k ) + f( i ) ) mod m for some h‟: U  0, 1,…, m1 and some f( i ) such that f(0) = 0  Each attempt to find an open slot (i.e. calculating h( k, i )) is called a probe www.ThesisScientist.comHashEntry Class for Probing Hash Tables  In this case, the hash table is just an array private static class HashEntryAnyType public AnyType element; // the element public boolean isActive; // false if deleted public HashEntry( AnyType e ) this( e, true ); public HashEntry( AnyType e, boolean active ) element = e; isActive = active; // The array of elements private HashEntryAnyType array; // The number of occupied cells private int currentSize; www.ThesisScientist.comLinear Probing  Use a linear function for f( i ) f( i ) = c i  Example: h‟( k ) = k mod 10 in a table of size 10 , f( i ) = i So that h( k, i ) = (k mod 10 + i ) mod 10 Insert the values U=89,18,49,58,69 into the hash table www.ThesisScientist.comLinear Probing (cont.)  Problem: Clustering  When the table starts to fill up, performance  O(N)  Asymptotic Performance  Insertion and unsuccessful find, average  is the “load factor” – what fraction of the table is used 2  Number of probes  ( ½ ) ( 1+1/( 1 ) )  if  1, the denominator goes to zero and the number of probes goes to infinity www.ThesisScientist.comLinear Probing (cont.)  Remove  Can‟t just use the hash function(s) to find the object and remove it, because objects that were inserted after X were hashed based on X‟s presence.  Can just mark the cell as deleted so it won‟t be found anymore.  Other elements still in right cells  Table can fill with lots of deleted junk www.ThesisScientist.comQuadratic Probing  Use a quadratic function for f( i ) 2 f( i ) = c i + c i + c 2 1 0 2 The simplest quadratic function is f( i ) = i  Example: 2 Let f( i ) = i and m = 10 Let h‟( k ) = k mod 10 So that 2 h( k, i ) = (k mod 10 + i ) mod 10 Insert the value U=89, 18, 49, 58, 69 into an initially empty hash table www.ThesisScientist.comQuadratic Probing (cont.)  Advantage:  Reduced clustering problem  Disadvantages:  Reduced number of sequences  No guarantee that empty slot will be found if λ ≥ 0.5, even if m is prime  If m is not prime, may not find an empty slot even if λ 0.5 www.ThesisScientist.comDouble Hashing  Let f( i ) use another hash function f( i ) = i h ( k ) 2 Then h( k, I ) = ( h‟( k ) + h ( k ) ) mod m 2 And probes are performed at distances of h ( k ), 2 h ( k ), 3 h ( k ), 4 h ( k ), etc 2 2 2 2  Choosing h ( k ) 2  Don‟t allow h ( k ) = 0 for any k. 2  A good choice: h ( k ) = R ( k mod R ) with R a prime smaller than m 2  Characteristics  No clustering problem  Requires a second hash function www.ThesisScientist.comRehashing  If the table gets too full, the running time of the basic operations starts to degrade.  For hash tables with separate chaining, “too full” means more than one element per list (on average)  For probing hash tables, “too full” is determined as an arbitrary value of the load factor.  To rehash, make a copy of the hash table, double the table size, and insert all elements (from the copy) of the old table into the new table  Rehashing is expensive, but occurs very infrequently. www.ThesisScientist.com
sharer
Presentations
Free
Document Information
Category:
Presentations
User Name:
WilliamsMcmahon
User Type:
Professional
Country:
United States
Uploaded Date:
20-07-2017