Double hashing example

double hashing c++ example and double hashing in data structure with example
Dr.SethPatton Profile Pic
Dr.SethPatton,United Kingdom,Teacher
Published Date:22-07-2017
Your Website URL(Optional)
Comment
ECE 250 Algorithms and Data Structures Double hashing Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering University of Waterloo Waterloo, Ontario, Canada ece.uwaterloo.ca dwharderalumni.uwaterloo.ca © 2006-2013 by Douglas Wilhelm Harder. Some rights reserved.Double hashing 2 Outline This topic covers double hashing – More complex than linear or quadratic probing – Uses two hash functions • The first gives the bin • The second gives the jump size – Primary clustering no longer occurs – More efficient than linear or quadratic probingDouble hashing 3 Background Linear probing: – Look at bins k, k + 1, k + 2, k + 3, k + 4, … – Primary clustering Quadratic probing: – Look at bins k, k + 1, k + 4 , k + 9, k + 16, … – Secondary clustering (dangerous for poor hash functions) – Expensive: • Prime-sized arrays • Euclidean algorithm for calculating remaindersDouble hashing 4 Background Linear probing causes primary clustering – All entries follow the same search pattern for bins: int initial = hash_M( x.hash(), M ); for ( int k = 0; k M; ++k ) bin = (initial + k) % M; // ... Double hashing 5 Background This can be partially solved with quadratic probing – The step size grows in quadratic increments: 0, 1, 4, 9, 16, ... int bin = hash_M( x.hash(), M ); for ( int k = 0; k M; ++k ) bin = (bin + k) % M; // ... – Problems: what happens if multiple things are placed in the same bin? • The same steps are followed for each: secondary clustering • This indicates a potentially poor hash function • This sometimes cannot be avoidedDouble hashing 6 Background An alternate solution? – Give each object (with high probability) a different jump sizeDouble hashing 7 Description Associate with each object an initial bin and a jump size For example, int initial = hash_M( x.hash(), M ); int jump = hash2_M( x.hash(), M ); for ( int k = 0; k M; ++k ) bin = (initial + kjump) % M; Double hashing 8 Description Problem: – Will initial + kjump step through all of the bins? – Here, the jump size is 7: M = 16; initial = 5 jump = 7; for ( int k = 0; k = M; ++k ) std::cout (initial + kjump) % M ' '; – The output is 5 12 3 10 1 8 15 6 13 4 11 2 9 0 7 14 5Double hashing 9 Description Problem: – Will initial + kjump step through all of the bins? – Now, the jump size is 12: M = 16; initial = 5 jump = 12; for ( int k = 0; k = M; ++k ) std::cout (initial + kjump) % M ' '; – The output is now 5 1 13 9 5 1 13 9 5 1 13 9 5 1 13 9 5Double hashing 10 Description 1 The jump size and the number of bins M must be coprime – They must share no common factors There are two ways of ensuring this: – Make M a prime number – Restrict the prime factors 1 also known as relatively primeDouble hashing 11 Making M Prime If we make the table size M = p a prime number then all values between 1, ..., p – 1 are relatively prime – Example: 13 does not share any common factors with 1, 2, 3, ..., 11, 12 Problems: – All operations must be done using % • Cannot use &, , or • The modulus operator % is relatively slow – Doubling the number of bins is difficult: • What is the next prime after 2 × 263?Double hashing 12 m Using M = 2 We can restrict the number of prime factors m If we ensure M = 2 then: – The only prime factor of M is 2 – All odd numbers are relatively prime to M Benefits: – Doubling the size of the array is easy – We can use bitwise operationsDouble hashing 13 m Using M = 2 Given a number, how do we ensure the jump size is odd? Make the least-significant bit a 1: unsigned int make_odd( unsigned int n ) return n 1; For example: 0010101101100100111100010101010? 00000000000000000000000000000001 = 00101011011001001111000101010101Double hashing 14 m Using M = 2 This also works for signed integers and 2’s complement: – The least significant bit of a negative odd number is 1 For example –1 11111111111111111111111111111111 –2 11111111111111111111111111111110 –3 11111111111111111111111111111101 –4 11111111111111111111111111111100 –5 11111111111111111111111111111011 –6 11111111111111111111111111111010 –7 11111111111111111111111111111001 –8 11111111111111111111111111111000 –9 11111111111111111111111111110111Double hashing 15 m Using M = 2 Devising a second hash function is necessary One solution: define two functions hash_M1 and hash_M2 which map onto 0, …, M – 1 – Use a different set of m bits, e.g., if m = 10 use 00101001011011011111000100010101 . 1 initial jump where initial == 365 and jump == 965 Useful only if m ≤ 16Double hashing 16 m Using M = 2 Another solution: int initial = hash_M( x.hash(), M ); int jump = hash_M( x.hash() 532934959, M ) 1; for ( int k = 0; k M; ++k ) bin = (initial + kjump) & (M - 1); // ... 532934959 is primeDouble hashing 17 Example Consider a hash table with M = 16 bins Given a 3-digit hexadecimal number: – The least-significant digit is the primary hash function (bin) – The next digit is the secondary hash function (jump size) – Example: for 6B72A , the initial bin is A and the jump size is 3 16 Double hashing 18 Example Insert these numbers into this initially empty hash table 19A, 207, 3AD, 488, 5BA, 680, 74C, 826, 946, ACD, B32, C8B, DBE, E9C 0 1 2 3 4 5 6 7 8 9 A B C D E FDouble hashing 19 Example Start with the first four values: 19A, 207, 3AD, 488 0 1 2 3 4 5 6 7 8 9 A B C D E FDouble hashing 20 Example Start with the first four values: 19A, 207, 3AD, 488 0 1 2 3 4 5 6 7 8 9 A B C D E F 207 488 19A 3AD