Program optimization

The most important aspect of optimizing your program is to continuosly perform benchmarks. Without any metrics, you risk complicating your code without any actual performance improvements.

Memory Access

See What Every Programmer Should Know About Memory and Memory Bandwidth Napkin Math.

In general, try to access memory sequentially to maximize the use of cache. Accessing memory can be 10x-100x slower than accessing cache. This means paying attention to ordering (i.e. row-major vs column-major) when iterating over arrays.

For very small amounts of data, it may be better to directly use an array instead of a fancy data structure which requires jumping around in memory.
For variables used together, it may make sense to put them in a struct so they are close together in memory.
Variables independently accessed by multiple threads can but put into thread-local storage (TLS).

Branchless Programming

Branchless programming is typically a micro-optimization since small branches are optimized by the compiler. However, if this fails to occur, you can consider manually removing branches in your code.