Hashing

From David's Wiki
\( \newcommand{\P}[]{\unicode{xB6}} \newcommand{\AA}[]{\unicode{x212B}} \newcommand{\empty}[]{\emptyset} \newcommand{\O}[]{\emptyset} \newcommand{\Alpha}[]{Α} \newcommand{\Beta}[]{Β} \newcommand{\Epsilon}[]{Ε} \newcommand{\Iota}[]{Ι} \newcommand{\Kappa}[]{Κ} \newcommand{\Rho}[]{Ρ} \newcommand{\Tau}[]{Τ} \newcommand{\Zeta}[]{Ζ} \newcommand{\Mu}[]{\unicode{x039C}} \newcommand{\Chi}[]{Χ} \newcommand{\Eta}[]{\unicode{x0397}} \newcommand{\Nu}[]{\unicode{x039D}} \newcommand{\Omicron}[]{\unicode{x039F}} \DeclareMathOperator{\sgn}{sgn} \def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits} \def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits} \)

Hashing is a tool to compactly store sparse data (i.e. maps and sets). On average, hashing generally achieves constant-time lookup but can degrade to linear time depending on the data nad the algorithm details.

There are two common problems with hashing

  • Collisions
  • Cache coherency

Collision resolution

There are several methods to resolve collisions when implementing hashmaps. Below are some common solutions.

Note that in scenarios where performance is of upmost priotity and an occational incorrect solution is acceptable, it may be possible to ignore hash collisions or increase the hashmap size.

Linear probing

In linear probing, you continue by adding one to the hash value (i.e. iterating down the backing array) until you find the desired element or reach an empty entry. This requires no additional data structures beyond the backing array and iterating down the array allows using the memory page, likely stored in CPU cache.

Double hashing

In this scenario, you rehash the index. This does not require a new data structure but may not be cache coherent.

Chaining

With chaining, each hash is a bucket to a linked-list. This ensures you do not run out of space in the array but requires an extra data structure which needs to be managed and is not cache coherent.

Perfect Hashing

Locality-sensitive Hashing