Machine Learning Glossary

Machine Learning, Computer Vision, and Computer Graphics Glossary

A

  • Attention - A mechanism used in transformers which computes dot products between query embeddings and key embeddings to weight the interactions between sequence elements (see the sketch below).
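
A minimal sketch of the scaled dot-product attention described above, assuming NumPy; the function name, array shapes, and variable names are illustrative, not from a specific library.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (sequence_length, d) arrays of query, key, and value embeddings.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)         # pairwise query-key interactions
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                    # weighted sum of value embeddings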

B

  • Backwards propagation - Also known as backprop or backpropagation. Application of the chain rule to a neural network to compute the gradient of the loss with respect to each parameter. Known as backpropagation because gradients at the following layers are needed to compute each layer's gradient (see the sketch below).
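
A minimal sketch of the chain rule applied to a one-hidden-layer network with a squared-error loss, assuming NumPy; the sizes are arbitrary, and each layer's gradient reuses the gradient already computed at the layer above.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=3), 1.0                  # single input and target
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

    # Forward pass
    h = np.tanh(W1 @ x)                             # hidden layer output
    y_hat = W2 @ h                                  # prediction
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass: chain rule, starting from the output layer
    d_yhat = y_hat - y                              # dL/dy_hat
    dW2 = np.outer(d_yhat, h)                       # dL/dW2
    d_h = W2.T @ d_yhat                             # gradient flowing back into h
    dW1 = np.outer(d_h * (1 - h ** 2), x)           # dL/dW1, using tanh' = 1 - tanh^2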

C

D

  • Dilation - The spacing between kernel elements when a CNN kernel is applied to its input (illustrated in the convolution sketch after this list). See Convolutional neural network.
  • Domain Adaptation - An area of research focused on making neural networks work with alternate domains, or sources of data.
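
A minimal sketch of how dilation (and stride, defined under S) appear as convolution parameters, assuming PyTorch; the channel counts and image size are arbitrary.

    import torch
    import torch.nn as nn

    # A 3x3 kernel with dilation=2 covers a 5x5 input region (one-pixel gaps
    # between kernel elements); stride=2 moves the kernel two input pixels per
    # output pixel, halving the spatial resolution.
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3,
                     stride=2, dilation=2, padding=2)
    x = torch.randn(1, 3, 32, 32)
    print(conv(x).shape)   # torch.Size([1, 8, 16, 16])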

E

  • Early stopping - A technique where you stop training once the validation loss begins increasing (see the sketch below). This is less commonly used these days with large models.
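
A minimal sketch of early stopping with a patience counter; max_epochs, train_one_epoch, validation_loss, and save_checkpoint are hypothetical names standing in for your own training loop.

    best_loss, patience, bad_epochs = float("inf"), 5, 0

    for epoch in range(max_epochs):
        train_one_epoch(model)                # hypothetical: one pass over training data
        val_loss = validation_loss(model)     # hypothetical: evaluate on validation data
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            save_checkpoint(model)            # hypothetical: keep the best model so far
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                         # validation loss stopped improving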

F

  • Forward propagation - Inference through a neural network by computing each layer's outputs.
  • Fully connected network - The standard neural network model in which every node in a layer is connected to every node in the previous layer; a forward pass through a small example is sketched below.
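
A minimal sketch of forward propagation through a two-layer fully connected network, assuming NumPy; the layer sizes and random weights are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # layer 1: 8 inputs -> 16 nodes
    W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)    # layer 2: 16 nodes -> 4 outputs

    def forward(x):
        h = np.maximum(0, W1 @ x + b1)   # ReLU activation of the hidden layer
        return W2 @ h + b2               # each node uses every node in the layer below

    y = forward(rng.normal(size=8))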

G

  • Generalization - How well a model works on data it has not been trained on.
  • Generative adversarial network (GAN) - A neural network setup for generating examples from a training distribution: a generator learns to produce examples while a discriminator learns to distinguish them from real data.
  • Gradient Descent - The procedure used to update parameters when optimizing a neural network: each parameter is moved in the direction of steepest descent, i.e. the negative gradient of the loss (see the sketch after this list).
  • Graph neural network (GNN) - A type of neural network which operates on graph inputs.
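
A minimal sketch of gradient descent on a simple quadratic, assuming NumPy; the function, learning rate, and iteration count are illustrative.

    import numpy as np

    def grad(w):
        # Gradient of f(w) = ||w - 3||^2, which is minimized at w = 3.
        return 2.0 * (w - 3.0)

    w, lr = np.zeros(2), 0.1
    for _ in range(100):
        w -= lr * grad(w)   # step in the direction of steepest descent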

H

  • Hinge Loss - A loss used for training classifiers which returns 0 for confidently correct classifications and a linearly growing penalty for incorrect or low-margin classifications: \(l=\max(0, 1-y\hat{y})\), where the label \(y \in \{-1, +1\}\) (see the sketch after this list).
  • Hidden Layer - Intermediate layers in a neural network whose outputs are passed to other parts of the neural network.
  • Hyperparameter - Parameters of a model which are typically hand chosen and not directly optimized during training.
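
A minimal sketch of the hinge loss formula given above, assuming NumPy and labels in \(\{-1, +1\}\).

    import numpy as np

    def hinge_loss(y, y_hat):
        # y: true labels in {-1, +1}; y_hat: raw classifier scores.
        return np.mean(np.maximum(0.0, 1.0 - y * y_hat))

    print(hinge_loss(np.array([1, -1]), np.array([2.0, 0.5])))   # 0.75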

I

  • Intersection over Union (IoU) - A metric for evaluating bounding box predictions: the area of the intersection of the predicted and ground-truth boxes divided by the area of their union (see the sketch below).
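
A minimal sketch of IoU for axis-aligned boxes given as (x1, y1, x2, y2) tuples; the box format is an assumption, since conventions vary.

    def iou(a, b):
        # a, b: boxes as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
        inter = ix * iy
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7 ≈ 0.143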

L

  • L1 or L2 norm - Two common norms used in metrics and losses: the L1 norm is the sum of absolute values and the L2 norm is the square root of the sum of squares (see the sketch after this list).
  • Loss function - The objective function which you are attempting to minimize.
  • Long short-term memory (LSTM) - A recurrent neural network architecture which maintains two hidden states, one for long-term and one for short-term memory.
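
A minimal sketch of the L1 and L2 norms of a vector, assuming NumPy.

    import numpy as np

    x = np.array([3.0, -4.0])
    l1 = np.sum(np.abs(x))          # |3| + |-4| = 7
    l2 = np.sqrt(np.sum(x ** 2))    # sqrt(9 + 16) = 5
    # Equivalently: np.linalg.norm(x, ord=1) and np.linalg.norm(x, ord=2)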

M

  • MSE - Mean squared error. The mean of the squared differences between predictions and targets, i.e. the squared L2 distance divided by the number of elements (see the sketch after this list).
  • Multilayer perceptron (MLP) - See Fully connected network.
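
A minimal sketch of mean squared error, assuming NumPy.

    import numpy as np

    def mse(y, y_hat):
        return np.mean((y - y_hat) ** 2)   # average of the squared errors

    print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))   # (0.25 + 1.0) / 2 = 0.625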

N

  • Normalized Device Coordinates - A coordinate convention in which image pixels are mapped to \(\displaystyle [-1, 1]\times[-1, 1] \) (see the sketch below).
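
A minimal sketch of mapping integer pixel coordinates to normalized device coordinates under the convention above; the half-pixel offset for pixel centers is an assumption, since conventions differ.

    def pixel_to_ndc(px, py, width, height):
        # Map pixel centers to [-1, 1] x [-1, 1].
        x = 2.0 * (px + 0.5) / width - 1.0
        y = 2.0 * (py + 0.5) / height - 1.0
        return x, y

    print(pixel_to_ndc(0, 0, 4, 4))   # (-0.75, -0.75)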

O

  • Overfitting - When a model begins to learn attributes specific to your training data, thereby worsening performance on non-training data.

R

  • Recurrent neural network (RNN) - A type of neural network which operates sequentially on sequence data, carrying a hidden state from one step to the next (see the sketch after this list).
  • Reinforcement Learning - An area of machine learning focused on learning to perform actions that maximize a reward, e.g. playing a game.
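
A minimal sketch of a vanilla RNN step that carries a hidden state across a sequence, assuming NumPy; the input and hidden sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    Wx, Wh, b = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)

    def rnn(sequence):
        h = np.zeros(8)                         # hidden state carried between steps
        for x in sequence:                      # x: input vector of size 4
            h = np.tanh(Wx @ x + Wh @ h + b)    # same weights reused at every step
        return h

    h_final = rnn(rng.normal(size=(10, 4)))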

S

  • Stride - How far the CNN kernel moves, in terms of input pixels, between consecutive output pixels (illustrated in the convolution sketch under Dilation).

T

U

  • Underfitting - When a model performs poorly on both training and validation data, usually due to inadequate model complexity or training duration.

Other Resources