Question? Leave a message!




Hash functions

Hash functions
ECE 250 Algorithms and Data Structures Hash functions Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering University of Waterloo Waterloo, Ontario, Canada ece.uwaterloo.ca dwharderalumni.uwaterloo.ca © 20062013 by Douglas Wilhelm Harder. Some rights reserved.Hash functions 2 Outline In this talk, we will discuss – Finding 32bit hash values using: • Predetermined hash values – Autoincremented hash values – Addressbased hash values • Arithmetic hash values – Example: stringsHash functions 3 9.2 Definitions What is a hash of an object From MerriamWebster: a restatement of something that is already known The ultimate goal is to map onto an integer range 0, 1, 2, ..., M – 1Hash functions 4 9.2.1 The hash process Object We will look at the first problem – Hashing an object into a 32bit integer 32bit integer – Subsequent topics will examine the next Modulus, midsquare, steps multiplicative, Fibonacci Map to an index 0, ..., M – 1 Deal with collisions Chained hash tables Open addressing Linear probing Quadratic probing Double hashingHash functions 5 9.2.2 Properties Necessary properties of such a hash function h are: 1a. Should be fast: ideally Q(1) 1b. The hash value must be deterministic • It must always return the same 32bit integer each time 1c. Equal objects hash to equal values • x = y ⇒ h(x) = h(y) 1d. If two objects are randomly chosen, there should be only a onein 32 2 chance that they have the same hash valueHash functions 6 9.2.3 Types of hash functions We will look at two classes of hash functions – Predetermined hash functions (explicit) – Arithmetic hash functions (implicit)Hash functions 7 9.2.4 Predetermined hash functions The easiest solution is to give each object a unique number class Classname private: 31 31 unsigned int hashvalue; // int: –2 , ..., 2– 1 32 // unsigned int: 0, ..., 2– 1 public: Classname(); unsigned int hash() const; Classname::Classname() ; hashvalue = ; unsigned int Classname::hash() const return hashvalue; Hash functions 8 9.2.4 Predetermined hash functions For example, an autoincremented static member variable class Classname private: unsigned int hashvalue; static unsigned int hashcount; public: Classname(); unsigned int hash() const; Classname::Classname() ; hashvalue = hashcount; ++hashcount; unsigned int Classname::hashcount = 0; unsigned int Classname::hash() const return hashvalue; Hash functions 9 9.2.4 Predetermined hash functions Examples: All UW coop student have two hash values: – UW Student ID Number – Social Insurance Number Any 9digitdecimal integer yields a 32bit integer 9 lg( 10 ) = 29.897Hash functions 10 9.2.4 Predetermined hash functions If we only need the hash value while the object exists in memory, use the address: unsigned int Classname::hash() const return reinterpretcastunsigned int( this ); This fails if an object may be stored in secondary memory – It will have a different address the next time it is loadedHash functions 11 9.2.4.1 Predetermined hash functions Predetermined hash values give each object a unique hash value This is not always appropriate: – Objects which are conceptually equal: Rational x(1, 2); Rational y(3, 6); – Strings with the same characters: string str1 = "Hello world"; string str2 = "Hello world"; These hash values must depend on the member variables – Usually this uses arithmetic functionsHash functions 12 9.2.5 Arithmetic Hash Values An arithmetic hash value is a deterministic function that is calculated from the relevant member variables of an object We will look at arithmetic hash functions for: – Rational numbers, and – StringsHash functions 13 9.2.5.1 Rational number class What if we just add the numerator and denominator class Rational private: int numer, denom; public: Rational( int, int ); ; unsigned int Rational::hash() const return staticcastunsigned int( numer ) + staticcastunsigned int( denom ); Hash functions 14 9.2.5.1 Rational number class We could improve on this: multiply the denominator by a large prime: class Rational private: int numer, denom; public: Rational( int, int ); ; unsigned int Rational::hash() const return staticcastunsigned int( numer ) + 429496751staticcastunsigned int( denom ); Hash functions 15 9.2.5.1 Rational number class For example, the output of int main() cout Rational( 0, 1 ).hash() endl; cout Rational( 1, 2 ).hash() endl; cout Rational( 2, 3 ).hash() endl; cout Rational( 99, 100 ).hash() endl; return 0; is 429496751 858993503 1288490255 2239 http://xkcd.com/571/ Recall that arithmetic operations wrap on overflowHash functions 16 9.2.5.1 Rational number class This hash function does not generate unique values – The following pairs have the same hash values: 0/1 1327433019/800977868 1/2 534326814/1480277007 2/3 820039962/1486995867 – Finding rational numbers with matching hash values is very difficult: – Finding these required the generation of 1 500 000 000 random rational numbers – It is fast: Q(1) – It does produce an even distributionHash functions 17 9.2.5.1 Rational number class Problem: – The rational numbers 1/2 and 2/4 have different values – The output of int main() cout Rational( 1, 2 ).hash(); cout Rational( 2, 4 ).hash(); return 0; is 858993503 1717987006Hash functions 18 9.2.5.1 Rational number class Solution: divide through by the greatest common divisor Rational::Rational( int a, int b ):numer(a), denom(b) int divisor = gcd( numer, denom ); numer /= divisor; denom /= divisor; int gcd( int a, int b) while( true ) if ( a == 0 ) return (b = 0) b : b; b = a; if ( b == 0 ) return (a = 0) a : a; a = b; Hash functions 19 9.2.5.1 Rational number class Problem: 11 – The rational numbers and have different values 22 – The output of int main() cout Rational( 1, 2 ).hash(); cout Rational( 1, 2 ).hash(); return 0; is 858993503 3435973793Hash functions 20 9.2.5.1 Rational number class Solution: define a normal form – Require that the denominator is positive Rational::Rational( int a, int b ):numer(a), denom(b) int divisor = gcd( numer, denom ); divisor = (denom = 0) divisor : divisor; numer /= divisor; denom /= divisor; Hash functions 21 9.2.5.3 String class Two strings are equal if all the characters are equal and in the identical order A string is simply an array of bytes: – Each byte stores a value from 0 to 255 Any hash function must be a function of these bytesHash functions 22 9.2.5.3.1 String class We could, for example, just add the characters: unsigned int hash( const string str ) unsigned int hashvaalue = 0; for ( int k = 0; k str.length(); ++k ) hashvalue += strk; return hashvalue; Hash functions 23 9.2.5.3.1 String class Not very good: – Slow run time: Q(n) – Words with the same characters hash to the same code: • "form" and "from" TM – A poor distribution, e.g., all words in Moby Words II by Grady Ward (single.txt) Project Gutenberg):Hash functions 24 9.2.5.3.2 String class Let the individual characters represent the coefficients of a polynomial in x: n – 1 n – 2 2 p(x) = c x + c x + ··· + c x + c x + c 0 1 n – 3 n – 2 n – 1 Use Horner’s rule to evaluate this polynomial at a prime number, e.g., x = 12347: unsigned int hash( string const str ) unsigned int hashvalue = 0; for ( int k = 0; k str.length(); ++k ) hashvalue = 12347hashvalue + strk; return hashvalue; Hash functions 25 9.2.5.3.2 String class Is this hash function actually better Suppose I pick n random integers from 1 to L – One would expect each integer to appear l = n/L times – Some, however, will appear more often, others less often To test whether or not the integers are random, we will as: “How many (what proportion of) integers were chosen k times”Hash functions 26 9.2.5.3.2 String class Consider the hash of each of the 354985 strings in single.txt to 32 be a random value in 0, 1, 2, 3, …, 2 – 1 – Subdivide the integers into groups of approximately 12099 – We expect one hash value per interval – Count the number of these subintervals which contain 0, 1, 2, ... of these hash values – Plotting these proportions and 1/en, we see they’re very similar Proportion of intervals with n hash values Poisson distribution with l = 1Hash functions 27 9.2.5.3.3 String class Problem, Horner’s rule runs in Q(n) "A Elbereth Gilthoniel,\n Silivren penna miriel\n O menal aglar elenath\n Nachaered palandiriel\n O galadhremmin ennorath,\n Fanuilos, le linnathon\n nef aear, si nef aearon" Suggestions J.R.R. TolkienHash functions 28 9.2.5.3.3 String class k Use characters in locations 2 – 1 for k = 0, 1, 2, ...: "AElbereth Gilthoniel,\n Silivrenpenna miriel\n O menal aglar elenath\n Nachaered palandiriel\n O galadhremmin ennorath,\n Fanuilos, le linnathon\n nef aear, si nef aearon" J.R.R. TolkienHash functions 29 9.2.5.3.3 String class The run time is now Q(ln(n)) : unsigned int hash( const string str ) unsigned int hashvalue = 0; for ( int k = 1; k = str.length(); k = 2 ) hashvalue = 12347hashvalue + strk – 1; return hashvalue; Note: this cannot be used if you require a cryptographic hash function or message digestHash functions 30 9.2.5.3.3 Arithmetic hash functions In general, any member variables that are used to uniquely define an object may be used as coefficients in such a polynomial – The salary hopefully changes over time… class Person string surname; string givennames; unsigned char numgivennames; unsigned short birthyear; unsigned char birthmonth; unsigned char birthday; unsigned int salary; // ... ;Hash functions 31 Summary We have seen how a number of objects can be mapped onto a 32 bit integer We considered – Predetermined hash functions • Autoincremented variables • Addresses – Hash functions calculated using arithmetic Next: map a 32bit integer onto a smaller range 0, 1, ..., M – 1Hash functions 32 References Wikipedia, http://en.wikipedia.org/wiki/Hashfunction 1 Cormen, Leiserson, and Rivest, Introduction to Algorithms, McGraw Hill, 1990. rd 2 Weiss, Data Structures and Algorithm Analysis in C++, 3 Ed., Addison Wesley. These slides are provided for the ECE 250 Algorithms and Data Structures course. The material in it reflects Douglas W. Harder’s best judgment in light of the information available to him at the time of preparation. Any reliance on these course slides by any party for any other purpose are the responsibility of such parties. Douglas W. Harder accepts no responsibility for damages, if any, suffered by any party as a result of decisions made or actions based on these course slides for any other purpose than that for which it was intended.
sharer
Presentations
Free
Document Information
Category:
Presentations
User Name:
Dr.SethPatton
User Type:
Teacher
Country:
United Kingdom
Uploaded Date:
22-07-2017