* [https://www.youtube.com/watch?v=iDulhoQ2pro Youtube Video by Yannic Kilcher]
* [https://arxiv.org/abs/2207.09238 Formal Algorithms for Transformers (Arxiv 2022)]
==Follow-up work==
* [https://arxiv.org/abs/2112.05682 Memory-efficient attention] reduces the memory overhead of an attention layer to a constant amount (specifically, a scalar and a vector the size of one output feature).
** It processes queries sequentially, which makes it well suited to weaker GPUs, where memory is limited and computation is less parallel due to fewer cores; a minimal sketch of the idea is given below.
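The following is a minimal sketch of the sequential-query idea, assuming NumPy; the function name and shapes are illustrative and not the API of the paper's implementation, and the usual 1/sqrt(d) scaling of the logits is omitted for brevity.

<syntaxhighlight lang="python">
import numpy as np

def attention_single_query(q, keys, values):
    """Memory-efficient attention for a single query vector.

    Keys and values are visited one at a time, keeping only a running
    softmax normaliser (a scalar) and a running weighted sum of values
    (a vector the size of one output feature), rather than the full
    row of attention weights.
    """
    running_max = -np.inf                    # running maximum of the logits
    weight_sum = 0.0                         # running softmax normaliser
    value_acc = np.zeros(values.shape[-1])   # running weighted sum of values

    for k, v in zip(keys, values):
        score = q @ k
        new_max = max(running_max, score)
        # Rescale the accumulators whenever the running maximum changes,
        # so the exponentials stay numerically stable.
        correction = np.exp(running_max - new_max) if weight_sum > 0 else 0.0
        weight = np.exp(score - new_max)
        weight_sum = weight_sum * correction + weight
        value_acc = value_acc * correction + weight * v
        running_max = new_max

    return value_acc / weight_sum

# Agrees with ordinary softmax attention computed all at once:
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
w = np.exp(K @ q - np.max(K @ q))
w /= w.sum()
assert np.allclose(attention_single_query(q, K, V), w @ V)
</syntaxhighlight>

Because only the running scalar and the output-sized accumulator are kept, the memory cost per query is constant in the sequence length, at the price of a sequential loop over the keys.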
==References==