Question? Leave a message!




An introduction to hash tables

An introduction to hash tables
ECE 250 Algorithms and Data Structures An introduction to hash tables Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering University of Waterloo Waterloo, Ontario, Canada ece.uwaterloo.ca dwharderalumni.uwaterloo.ca © 20062013 by Douglas Wilhelm Harder. Some rights reserved.Introduction to hash tables 2 9.1 Outline Discuss storing unrelated/unordered data – IP addresses and domain names Consider conversions between these two forms Introduce the idea of hashing: – Reducing O(ln(n)) operations to O(1) Consider some of the weaknesses Introduction to hash tables 3 9.1.1 Supporting Example Suppose we have a system which is associated with approximately 150 error conditions where – Each of which is identified by an 8bit number from 0 to 255, and – When an identifier is received, a corresponding errorhandling function must be called We could create an array of 150 function pointers and to then call the appropriate function….Introduction to hash tables 4 9.1.1.1 Supporting Example include iostream int main() void (functionarray150)(); unsigned char errorid150; void a() std::cout functionarray0 = a; "Calling 'void a()'" errorid0 = 3; std::endl; functionarray1 = b; errorid1 = 8; functionarray0(); void b() functionarray1(); std::cout "Calling 'void b()'" return 0; Output: std::endl; ./a.out Calling 'void a()' Calling 'void b()'Introduction to hash tables 5 9.1.1.1 Supporting Example Unfortunately, this is slow—we would have to do some form of binary search in order to determine which of the 150 slots corresponds to, for example, errorcondition identifier id = 198 This would require approximately 6 comparisons per error condition If there was a possibility of dynamically adding new error conditions or removing defunct conditions, this would substantially increase the effort required…Introduction to hash tables 6 9.1.1.2 Supporting Example A better solution: – Create an array of size 256 – Assign those entries corresponding to valid error conditions int main() void (functionarray256)(); for ( int i = 0; i 256; ++i ) functionarrayi = nullptr; functionarray3 = a; functionarray8 = b; functionarray3(); functionarray8(); return 0; Question: – Is the increased speed worth the allocation of additional memoryIntroduction to hash tables 7 9.1.3 Keys Our goal: Store data so that all operations are Q(1) time Requirement: The memory requirement should be Q(n) In our supporting example, the corresponding function can be called in Q(1) time and the array is less than twice the optimal sizeIntroduction to hash tables 8 9.1.3 Keys In our example, we: – Created an array of size 256 – Store each of 150 objects in one of the 256 entries – The error code indicated which bin the corresponding function pointer was stored In general, we would like to: – Create an array of size M – Store each of n objects in one of the M bins – Have some means of determining the bin in which an object is storedIntroduction to hash tables 9 9.1.3 IP Addresses Examples: Suppose we want to associate IP addresses and any corresponding domain names Recall that a 32bit IP address are often written as four byte values from 0 to 255 – Consider 10000001 01100001 00001010 10110011 2 – This can be written as http://129.97.10.179/ – We use domain names because IP addresses are not human readable Introduction to hash tables 10 9.1.3 IP Addresses 16 Similarly, the University of Waterloo has 2 IP addrescontrol of names within its domain – Any IP address starting with 129.97 belongs to UW 2 – This gives UW 256 = 65535 IP addresses The University of Waterloo currently uses 60 of the IP addressesIntroduction to hash tables 11 9.1.3 IP Addresses Given an IP address, if we wanted to quickly find any associated domain name, we could create an array of size 65536 of strings: int const MAXIPADDRESSES = 65536 string domainnameMAXIPADDRESSES; For example, my computer is churchill.uwaterloo.ca and its IP address is http://129.97.10.179/ – The prefix 129.97 is common to all uWaterloo IP addresses – As 179 + 256 × 10 = 2739, it follows that domainname2739 = "churchill";Introduction to hash tables 12 9.1.3 IP Addresses Under IPv6, IP addresses are 128 bits – It combines what is now implemented as subnets as well as allowing for many more IP addresses 64 Suppose uWaterloo is allocated 2 IP addresses under this scheme 64 – We cannot allocate an array of size 2Introduction to hash tables 13 9.1.3 IP Addresses What if we want to associate domain names with IP addresses – Which entry in the array should be associated with the domain name churchill.uwaterloo.ca Consider core routers on the Internet: – They must associate each IP address with the next router that address should be passed toIntroduction to hash tables 14 9.1.3 Simpler problem Let’s try a simpler problem – How do I store your examination grades so that I can access your grades in Q(1) time Recall that each student is issued an 8digit number – How do I store your examination grades so that I can access your grades in Q(1) time – Suppose Jane Doe has the number 20123456 8 26 – I can’t create an array of size 10 ≈ 1.5 × 2Introduction to hash tables 15 9.1.3 Simpler problem I could create an array of size 1000 – How could you convert an 8digit number into a 3digit number – First three digits might cause a problem: almost all students start with 201, 202, 203, 204, or 205 – The last three digits, however, are essentially random Therefore, I could store Jane’s examination grade grade456 = 86;Introduction to hash tables 16 9.1.3.1 Simpler problem Question: – What is the likelihood that in a class of size 100 that no two students will have the last three digits – Not very high: 999 998 997 901 1 0.005959 1000 1000 1000 1000Introduction to hash tables 17 9.1.3.1 Simpler problem Consequently, I have a function that maps a student onto a 3digit number – I can store something in that location 454 – Storing it, accessing it, and erasing it is Q(1) 455 – Problem: two or more students may map 456 86 to the same number: 457 – Wěi Wang has ID 20173456 and scored 85 458 – Alma Ahmed has ID 2024456 and scored 87 459 460 461 462 463 79 464 465 ... ... ... ...Introduction to hash tables 18 9.1.3.1 The hashing problem The process of mapping an object or a number onto an integer in a given range is called hashing Problem: multiple objects may hash to the same value – Such an event is termed a collision Hash tables use a hash function together with a mechanism for dealing with collisionsIntroduction to hash tables 19 9.1.3.1 IP Addresses Going back to the problem with IP addresses – We need a hash function to map an IP address to a smaller range – We need a hash function to map a domain name to a smaller range Mapping 129.97.10.179 onto a smaller range may seem easier, but a mechanism for mapping churchill.uwaterloo.ca onto a small range of integers may be more interestingIntroduction to hash tables 20 9.1.4 The hash process Object We will break the process into Techniques vary... three independent steps: – We will try to get each of 32bit integer these down to Q(1) Modulus, midsquare, multiplicative, Fibonacci Map to an index 0, ..., M– 1 Deal with collisions Chained hash tables Open addressing Linear probing Quadratic probing Double hashingIntroduction to hash tables 21 Summary Discuss storing unordered data Discuss IP addresses and domain names Consider conversions between these two forms Introduce the idea of using a smaller array – Converted ―large‖ numbers into valid array indices – Reduces O(ln(n)) in arrays and AVL trees to to O(1) Discussed the issues with collisionsIntroduction to hash tables 22 References Wikipedia, http://en.wikipedia.org/wiki/Hashtable 1 Cormen, Leiserson, and Rivest, Introduction to Algorithms, McGraw Hill, 1990. rd 2 Weiss, Data Structures and Algorithm Analysis in C++, 3 Ed., Addison Wesley. These slides are provided for the ECE 250 Algorithms and Data Structures course. The material in it reflects Douglas W. Harder’s best judgment in light of the information available to him at the time of preparation. Any reliance on these course slides by any party for any other purpose are the responsibility of such parties. Douglas W. Harder accepts no responsibility for damages, if any, suffered by any party as a result of decisions made or actions based on these course slides for any other purpose than that for which it was intended.
Website URL
Comment