Branchless Programming

Revision as of 16:26, 13 August 2020 by David (talk | contribs) (Created page with "The idea of branchless programming is to avoid <code>if</code> and <code>for</code> statements. Typically, these statements require the CPU to perform branch prediction. If...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
\( \newcommand{\P}[]{\unicode{xB6}} \newcommand{\AA}[]{\unicode{x212B}} \newcommand{\empty}[]{\emptyset} \newcommand{\O}[]{\emptyset} \newcommand{\Alpha}[]{Α} \newcommand{\Beta}[]{Β} \newcommand{\Epsilon}[]{Ε} \newcommand{\Iota}[]{Ι} \newcommand{\Kappa}[]{Κ} \newcommand{\Rho}[]{Ρ} \newcommand{\Tau}[]{Τ} \newcommand{\Zeta}[]{Ζ} \newcommand{\Mu}[]{\unicode{x039C}} \newcommand{\Chi}[]{Χ} \newcommand{\Eta}[]{\unicode{x0397}} \newcommand{\Nu}[]{\unicode{x039D}} \newcommand{\Omicron}[]{\unicode{x039F}} \DeclareMathOperator{\sgn}{sgn} \def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits} \def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits} \)

The idea of branchless programming is to avoid if and for statements.
Typically, these statements require the CPU to perform branch prediction. If the CPU predicts incorrectly, it will have to discard all pipelined results and go down the other route. Furthermore warps on GPUs must all go through both branches if there are threads which go down each branch. This can significantly slow things down.

In general, I consider avoiding if statements to be a micro-optimization. You should just use if statements wherever the optimized alternative is less readable. Typically, this is fine for CPUs which are good at branch prediction.

Below are a few tricks to avoiding branches and if statements. In general, I would not refractor to improve performance through these micro-optimizations, only to improve readability.

Switch Statements

Technically, switch statements are still branches. However, they can lead to a negligible performance gains if there are lots of cases. Only convert if to switch if switch statements are significantly more readable in your scenario.

Polymorphism

Rather than do

def speak():
  if self.type == "bird":
    print("Chirp")
  elif self.type == "cat":
    print("Meow")

create separate classes for bird and cat each overriding the speak method.

Hash Maps

This is one way to simulate switch statements. Technically these do not have branching but will have a memory read which could be just as slow as a single branch.

# Cache the following at the start of your program or in the constructor
# Assume we have some functions batch_norm(), group_norm()
my_cache = {
  "batch_norm": batch_norm,
  "group_norm": group_norm,
  "identity": identity
}
x = my_cache[operation](x)

Multiplications

Rather than doing

if age > 10:
  x = x + 10
else:
  x = x - 10

you can do

y = 2 * (age > 10) - 1
x = x + 10 * y
// or
y = age > 10
x = x + 10 * y - 10 * (1-y)

These are micro optimizations but they are sometimes useful for speeding up GPU shaders.

Resources