Hash Coding

Definition

Hash coding is used to convert a large and possibly variable-sized amount of data into a small datum known as a hash address. Hash addresses are usually a few integers that may serve as an index into an array. To convert a large amount of data, such as telephone numbers, into a small amount of data, you can use a hash function such as (phone number mod 100). In this case, telephone numbers are the key field. If you want to insert and search for phone numbers of patients in a hospital, you store the phone numbers into an array in which the last two digits of the phone number serve as positions in an array.
Hash coding is used to find items in a database, to detect duplicated or similar records in a large file, or to find similar stretches in DNA sequences. Good hash codes distribute data uniformly throughout the hash table and take O(1) for both insertion and searching.

Collisions

If two phone numbers end in the same two digits, there will be a collision. In order to remedy collisions, you can do Hash and Search, Rehashing, or Chaining.

Hash and Search

If you want to insert a phone number whose last two digits are 05 but the index 05 already has a number stored in it, you can store the number in the next available slot. If 06 is already full, you can't store the number in it, so you move on to 07. If 07 is empty, you can store the number in it. If all the indexes after 05 are full, you can go to the beginning of the array and check if there are any empty slots.

Hash Address Phone Number
[00] 382-4500
[01] 698-7701
[02] Empty
[03] 584-2303
[04] 751-4604
[05] 998-2705
[06] 623-8906
[07] Empty
[08] 994-1108
[09] Empty
[10] 743-1610
[11] Empty

Rehashing

If you want to insert a phone number whose last two digits are 05 but the index 05 already has a number stored in it, you can add a constant that is relatively prime, such as 3, to 5, and mod 8 by 100 to get 8. If the slot 08 is already taken, you can add the same constant, 3, to 8, and mod 11 by 100 to get 11. If the slot 11 is empty, you can store the number in slot 11. If most slots in a table are already full, this can take O(n).

Hash Address Phone Number
[00] 382-4500
[01] 698-7701
[02] Empty
[03] 584-2303
[04] 751-4604
[05] 998-2705
[06] 623-8906
[07] Empty
[08] 994-1108
[09] Empty
[10] 743-1610
[11] Empty

Chaining

In chaining, the hash address is the index of an array that contains linked lists called buckets, all of which share the same hash address. This means that if a number ends in 05, it is stored in the index 05, even if 05 isn't empty. If you want to find a particular number that ends in 05, you do a linear search of the bucket.

Hash Address Phone Number
[00] 382-4500
[01] 698-7701 -> 836-2501
[02] Empty
[03] 584-2303
[04] 751-4604 -> 310-9904
[05] 998-2705 -> 212-5305 -> 201-4105
[06] 623-8906
[07] Empty
[08] 994-1108
[09] Empty
[10] 743-1610
[11] Empty
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License