Branchless Programming: Difference between revisions
Created page with "The idea of branchless programming is to avoid <code>if</code> and <code>for</code> statements. Typically, these statements require the CPU to perform branch prediction. If..." |
|||
| (17 intermediate revisions by the same user not shown) | |||
| Line 3: | Line 3: | ||
If the CPU predicts incorrectly, it will have to discard all pipelined results and go down the other route. | If the CPU predicts incorrectly, it will have to discard all pipelined results and go down the other route. | ||
Furthermore warps on GPUs must all go through both branches if there are threads which go down each branch. | Furthermore warps on GPUs must all go through both branches if there are threads which go down each branch. | ||
In general, I consider avoiding if statements to be a micro-optimization. | In general, I consider avoiding if statements to be a micro-optimization. | ||
| Line 14: | Line 13: | ||
==Switch Statements== | ==Switch Statements== | ||
Technically, switch statements are still branches. | Technically, switch statements are still branches. | ||
However, they can lead to a negligible performance gains if there are lots of cases. | However, they can lead to a negligible performance gains if there are lots of cases. | ||
Only convert if to | Only convert if statements to switch statements when they are significantly more readable in your scenario. | ||
==Polymorphism== | ==Polymorphism== | ||
| Line 31: | Line 30: | ||
This is one way to simulate switch statements. | This is one way to simulate switch statements. | ||
Technically these do not have branching but will have a memory read which could be just as slow as a single branch. | Technically these do not have branching but will have a memory read which could be just as slow as a single branch. | ||
< | <syntaxhighlight lang="python"> | ||
# Cache the following at the start of your program or in the constructor | # Cache the following at the start of your program or in the constructor | ||
# Assume we have some functions batch_norm(), group_norm() | # Assume we have some functions batch_norm(), group_norm() | ||
| Line 39: | Line 38: | ||
"identity": identity | "identity": identity | ||
} | } | ||
# At runtime, you can do: | |||
x = my_cache[operation](x) | x = my_cache[operation](x) | ||
</ | </syntaxhighlight> | ||
==Booleans== | |||
Multiplications by booleans can be used instead of switch statements. | |||
Rather than doing | Rather than doing | ||
< | <syntaxhighlight lang="cpp"> | ||
if age > 10 | if (age > 10) { | ||
x = | x += 10; | ||
else | } else { | ||
x = | x -= 10; | ||
</ | } | ||
</syntaxhighlight> | |||
you can do | you can do | ||
< | <syntaxhighlight lang="cpp"> | ||
x = x + (20 * (age > 10) - 10); | |||
// or | // or | ||
y = age > 10 | y = age > 10; | ||
x = x + 10 * y - 10 * ( | x = x + 10 * y - 10 * !y; | ||
</ | </syntaxhighlight> | ||
Similarly, instead of doing | |||
<syntaxhighlight lang="cpp"> | |||
if (age > 10) { | |||
x++; | |||
} | |||
</syntaxhighlight> | |||
you can just write <code>x += age>10;</code>. | |||
These are micro optimizations but they | These are micro optimizations but they could be useful for speeding up GPU shaders in some cases. | ||
In high-level programming languages such as Python, these tricks can actually slow down your code<ref name="branchless-slower-python">https://stackoverflow.com/questions/72118249/why-are-the-branchless-and-built-in-functions-slower-in-python</ref>. | |||
==Resources== | ==Resources== | ||
* [https://francescocirillo.com/pages/anti-if-campaign Anti If Campaign] | * [https://francescocirillo.com/pages/anti-if-campaign Anti If Campaign] | ||
* [https://code.joejag.com/2016/anti-if-the-missing-patterns.html Anti-If: The missing patterns] | |||
==References== | |||
<references /> | |||