==A==
* Attention - A component of [[Transformer_(machine_learning_model)|transformers]] that computes the product of query and key embeddings to measure the interaction between elements.
* Adam optimizer - A popular gradient descent optimizer which includes momentum and per-parameter learning rates.
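The two entries above can be illustrated with a minimal pure-Python sketch. The dimensions, random data, and hyperparameter values are hypothetical, and the Adam step omits the usual bias correction for brevity:

```python
import math
import random

random.seed(0)
d, n = 4, 3  # toy embedding dimension and sequence length (illustrative only)

def rand_vec():
    return [random.gauss(0, 1) for _ in range(d)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

Q = [rand_vec() for _ in range(n)]  # query embeddings
K = [rand_vec() for _ in range(n)]  # key embeddings

# Attention: pairwise query-key dot products form an n x n matrix of
# interaction scores, converted to weights by a row-wise softmax.
scores = [[dot(q, k) / math.sqrt(d) for k in K] for q in Q]
weights = []
for row in scores:
    exps = [math.exp(s) for s in row]
    total = sum(exps)
    weights.append([e / total for e in exps])

# Adam: a running mean of gradients (momentum, m) and of squared
# gradients (v) gives each parameter its own effective step size.
# Bias correction is omitted here for brevity.
theta = [0.0] * d
m = [0.0] * d
v = [0.0] * d
beta1, beta2, lr, eps = 0.9, 0.999, 1e-3, 1e-8
grad = rand_vec()
for i in range(d):
    m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
    v[i] = beta2 * v[i] + (1 - beta2) * grad[i] ** 2
    theta[i] -= lr * m[i] / (math.sqrt(v[i]) + eps)
```

Each row of `weights` sums to 1, so element i's output is a convex combination of the values, weighted by how strongly its query interacts with each key; the Adam update moves every parameter against its gradient at a rate scaled by that parameter's own gradient history.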
==B==