Hashing: Difference between revisions
|  Created page with "Hashing is a tool to compactly store sparse data (i.e. maps and sets). On average, hashing generally achieves constant-time lookup but can degrade to linear time depending on the data nad the algorithm details.  There are two common problems with hashing * Collisions * Cache coherency  ==Collision resolution== There are several methods to resolve collisions when implementing hashmaps. Below are some common solutions.  Note that in scenarios where performance is of upmost..." | |||
| Line 23: | Line 23: | ||
| ===Perfect Hashing=== | ===Perfect Hashing=== | ||
| {{main | Wikipedia: Perfect hash function}} | |||
| ==Locality-sensitive Hashing== | ==Locality-sensitive Hashing== | ||
Latest revision as of 19:10, 8 June 2023
Hashing is a tool to compactly store sparse data (i.e. maps and sets). On average, hashing generally achieves constant-time lookup but can degrade to linear time depending on the data nad the algorithm details.
There are two common problems with hashing
- Collisions
- Cache coherency
Collision resolution
There are several methods to resolve collisions when implementing hashmaps. Below are some common solutions.
Note that in scenarios where performance is of upmost priotity and an occational incorrect solution is acceptable, it may be possible to ignore hash collisions or increase the hashmap size.
Linear probing
In linear probing, you continue by adding one to the hash value (i.e. iterating down the backing array) until you find the desired element or reach an empty entry. This requires no additional data structures beyond the backing array and iterating down the array allows using the memory page, likely stored in CPU cache.
Double hashing
In this scenario, you rehash the index. This does not require a new data structure but may not be cache coherent.
Chaining
With chaining, each hash is a bucket to a linked-list. This ensures you do not run out of space in the array but requires an extra data structure which needs to be managed and is not cache coherent.
