
Machine Learning Glossary



==A==
* Activation function - A nonlinear function applied after each linear layer in a neural network. ReLU is the most common choice, though tanh and sine are also used.
* Adam optimizer - A popular gradient-descent optimizer that combines momentum with per-parameter adaptive learning rates.
* Attention - A component of [[Transformer_(machine_learning_model)|transformers]] that computes the interaction between elements as the product of query and key embeddings (see the sketch after this list).
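The query-key interaction described under ''Attention'' can be illustrated with a minimal sketch of single-head scaled dot-product attention. The NumPy code below is illustrative only; the function and variable names are chosen for this example and do not refer to any particular library API.

<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: query-key products give pairwise
    interaction scores, which weight the value vectors after a softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_queries, n_keys) interaction matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted sum of value vectors

# Toy usage: 3 tokens with 4-dimensional query/key/value embeddings
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)           # shape (3, 4)
</syntaxhighlight>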


==B==