Question? Leave a message!

An introduction to hash tables

An introduction to hash tables
Dr.SethPatton Profile Pic
Dr.SethPatton,United Kingdom,Teacher
Published Date:22-07-2017
Website URL
ECE 250 Algorithms and Data Structures An introduction to hash tables Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering University of Waterloo Waterloo, Ontario, Canada © 2006-2013 by Douglas Wilhelm Harder. Some rights reserved.Introduction to hash tables 2 9.1 Outline Discuss storing unrelated/unordered data – IP addresses and domain names Consider conversions between these two forms Introduce the idea of hashing: – Reducing O(ln(n)) operations to O(1) Consider some of the weaknesses Introduction to hash tables 3 9.1.1 Supporting Example Suppose we have a system which is associated with approximately 150 error conditions where – Each of which is identified by an 8-bit number from 0 to 255, and – When an identifier is received, a corresponding error-handling function must be called We could create an array of 150 function pointers and to then call the appropriate function….Introduction to hash tables 4 Supporting Example include iostream int main() void (function_array150)(); unsigned char error_id150; void a() std::cout function_array0 = a; "Calling 'void a()'" error_id0 = 3; std::endl; function_array1 = b; error_id1 = 8; function_array0(); void b() function_array1(); std::cout "Calling 'void b()'" return 0; Output: std::endl; % ./a.out Calling 'void a()' Calling 'void b()'Introduction to hash tables 5 Supporting Example Unfortunately, this is slow—we would have to do some form of binary search in order to determine which of the 150 slots corresponds to, for example, error-condition identifier id = 198 This would require approximately 6 comparisons per error condition If there was a possibility of dynamically adding new error conditions or removing defunct conditions, this would substantially increase the effort required…Introduction to hash tables 6 Supporting Example A better solution: – Create an array of size 256 – Assign those entries corresponding to valid error conditions int main() void (function_array256)(); for ( int i = 0; i 256; ++i ) function_arrayi = nullptr; function_array3 = a; function_array8 = b; function_array3(); function_array8(); return 0; Question: – Is the increased speed worth the allocation of additional memory?Introduction to hash tables 7 9.1.3 Keys Our goal: Store data so that all operations are Q(1) time Requirement: The memory requirement should be Q(n) In our supporting example, the corresponding function can be called in Q(1) time and the array is less than twice the optimal sizeIntroduction to hash tables 8 9.1.3 Keys In our example, we: – Created an array of size 256 – Store each of 150 objects in one of the 256 entries – The error code indicated which bin the corresponding function pointer was stored In general, we would like to: – Create an array of size M – Store each of n objects in one of the M bins – Have some means of determining the bin in which an object is storedIntroduction to hash tables 9 9.1.3 IP Addresses Examples: Suppose we want to associate IP addresses and any corresponding domain names Recall that a 32-bit IP address are often written as four byte values from 0 to 255 – Consider 10000001 01100001 00001010 10110011 2 – This can be written as – We use domain names because IP addresses are not human readable Introduction to hash tables 10 9.1.3 IP Addresses 16 Similarly, the University of Waterloo has 2 IP addrescontrol of names within its domain – Any IP address starting with 129.97 belongs to UW 2 – This gives UW 256 = 65535 IP addresses The University of Waterloo currently uses 60 % of the IP addressesIntroduction to hash tables 11 9.1.3 IP Addresses Given an IP address, if we wanted to quickly find any associated domain name, we could create an array of size 65536 of strings: int const MAX_IP_ADDRESSES = 65536 string domain_nameMAX_IP_ADDRESSES; For example, my computer is and its IP address is – The prefix 129.97 is common to all uWaterloo IP addresses – As 179 + 256 × 10 = 2739, it follows that domain_name2739 = "churchill";Introduction to hash tables 12 9.1.3 IP Addresses Under IPv6, IP addresses are 128 bits – It combines what is now implemented as subnets as well as allowing for many more IP addresses 64 Suppose uWaterloo is allocated 2 IP addresses under this scheme 64 – We cannot allocate an array of size 2Introduction to hash tables 13 9.1.3 IP Addresses What if we want to associate domain names with IP addresses? – Which entry in the array should be associated with the domain name ? Consider core routers on the Internet: – They must associate each IP address with the next router that address should be passed toIntroduction to hash tables 14 9.1.3 Simpler problem Let’s try a simpler problem – How do I store your examination grades so that I can access your grades in Q(1) time? Recall that each student is issued an 8-digit number – How do I store your examination grades so that I can access your grades in Q(1) time? – Suppose Jane Doe has the number 20123456 8 26 – I can’t create an array of size 10 ≈ 1.5 × 2Introduction to hash tables 15 9.1.3 Simpler problem I could create an array of size 1000 – How could you convert an 8-digit number into a 3-digit number? – First three digits might cause a problem: almost all students start with 201, 202, 203, 204, or 205 – The last three digits, however, are essentially random Therefore, I could store Jane’s examination grade grade456 = 86;Introduction to hash tables 16 Simpler problem Question: – What is the likelihood that in a class of size 100 that no two students will have the last three digits? – Not very high: 999 998 997 901 1 0.005959 1000 1000 1000 1000Introduction to hash tables 17 Simpler problem Consequently, I have a function that maps a student onto a 3-digit number – I can store something in that location 454 – Storing it, accessing it, and erasing it is Q(1) 455 – Problem: two or more students may map 456 86 to the same number: 457 – Wěi Wang has ID 20173456 and scored 85 458 – Alma Ahmed has ID 2024456 and scored 87 459 460 461 462 463 79 464 465 ... ... ... ...Introduction to hash tables 18 The hashing problem The process of mapping an object or a number onto an integer in a given range is called hashing Problem: multiple objects may hash to the same value – Such an event is termed a collision Hash tables use a hash function together with a mechanism for dealing with collisionsIntroduction to hash tables 19 IP Addresses Going back to the problem with IP addresses – We need a hash function to map an IP address to a smaller range – We need a hash function to map a domain name to a smaller range Mapping onto a smaller range may seem easier, but a mechanism for mapping onto a small range of integers may be more interestingIntroduction to hash tables 20 9.1.4 The hash process Object We will break the process into Techniques vary... three independent steps: – We will try to get each of 32-bit integer these down to Q(1) Modulus, mid-square, multiplicative, Fibonacci Map to an index 0, ..., M– 1 Deal with collisions Chained hash tables Open addressing Linear probing Quadratic probing Double hashing