* [https://www.youtube.com/watch?v=iDulhoQ2pro Youtube Video by Yannic Kilcher]
* [https://arxiv.org/abs/2207.09238 Formal Algorithms for Transformers (Arxiv 2022)]
==Follow-up work==
* [https://arxiv.org/abs/2112.05682 Memory-efficient attention] reduces the memory overhead of an attention layer to a constant amount (specifically, a scalar and a vector the size of one output feature).
** It processes queries sequentially, which makes it well suited to weaker GPUs, where memory is limited and computation is less parallel due to fewer cores; a minimal sketch of the idea is given below.
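The following is a minimal sketch of the sequential-query idea, assuming NumPy; the function name and shapes are illustrative and not the API of the paper's implementation, and the usual 1/sqrt(d) scaling of the logits is omitted for brevity.

<syntaxhighlight lang="python">
import numpy as np

def attention_single_query(q, keys, values):
    """Memory-efficient attention for a single query vector.

    Keys and values are visited one at a time, keeping only a running
    softmax normaliser (a scalar) and a running weighted sum of values
    (a vector the size of one output feature), rather than the full
    row of attention weights.
    """
    running_max = -np.inf                    # running maximum of the logits
    weight_sum = 0.0                         # running softmax normaliser
    value_acc = np.zeros(values.shape[-1])   # running weighted sum of values

    for k, v in zip(keys, values):
        score = q @ k
        new_max = max(running_max, score)
        # Rescale the accumulators whenever the running maximum changes,
        # so the exponentials stay numerically stable.
        correction = np.exp(running_max - new_max) if weight_sum > 0 else 0.0
        weight = np.exp(score - new_max)
        weight_sum = weight_sum * correction + weight
        value_acc = value_acc * correction + weight * v
        running_max = new_max

    return value_acc / weight_sum

# Agrees with ordinary softmax attention computed all at once:
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
w = np.exp(K @ q - np.max(K @ q))
w /= w.sum()
assert np.allclose(attention_single_query(q, K, V), w @ V)
</syntaxhighlight>

Because only the running scalar and the output-sized accumulator are kept, the memory cost per query is constant in the sequence length, at the price of a sequential loop over the keys.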
==References==