Machine Learning Glossary

Latest revision as of 15:09, 12 July 2023

Machine Learning, Computer Vision, and Computer Graphics Glossary

A

Activation function - A nonlinear function applied after the linear layers of a neural network (typically after every hidden layer). Commonly ReLU, but tanh or sine are also used.
Adam optimizer - A popular gradient descent optimizer which includes momentum and per-parameter learning rates.
Attention - A component of transformers which computes the dot product of query and key embeddings to weight the interactions between elements.
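
A minimal pure-Python sketch of attention. It assumes the common scaled dot-product form, in which query-key dot products are divided by the square root of the key dimension, passed through a softmax, and used to average the value vectors. The names `attention`, `softmax`, and the toy vectors are illustrative only.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * val[j] for w, val in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
result = attention(q, k, v)
print(result)  # the query matches the first key best, so the first value dominates
```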

B

Backwards propagation - Also known as backprop or backpropagation. Application of the chain rule on neural networks to compute the gradient for each parameter. It is called backpropagation because the gradients at later layers are needed to compute each layer's gradient, so the computation proceeds backwards from the output.
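
A small sketch of backpropagation, written by hand for a hypothetical two-parameter network y = w2 * tanh(w1 * x) with a squared-error loss. The gradient for `w1` needs `dloss_dh`, which comes from the later layer. A central finite difference is included only to check the analytic gradients. All names and values are illustrative.

```python
import math

def forward_backward(w1, w2, x, t):
    # Forward pass: keep the intermediate values needed by the backward pass.
    z = w1 * x          # first linear layer
    h = math.tanh(z)    # activation
    y = w2 * h          # second linear layer
    loss = (y - t) ** 2
    # Backward pass: apply the chain rule from the loss back to each parameter.
    dloss_dy = 2 * (y - t)
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2            # gradient flowing into the earlier layer
    dloss_dz = dloss_dh * (1 - h ** 2)  # derivative of tanh
    dloss_dw1 = dloss_dz * x
    return loss, dloss_dw1, dloss_dw2

def numerical_grad(f, w, eps=1e-6):
    # Central finite difference, used only to verify the analytic gradient.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w1, w2, x, t = 0.5, -1.5, 2.0, 1.0
loss, g1, g2 = forward_backward(w1, w2, x, t)
n1 = numerical_grad(lambda w: forward_backward(w, w2, x, t)[0], w1)
n2 = numerical_grad(lambda w: forward_backward(w1, w, x, t)[0], w2)
print(g1, n1)
print(g2, n2)
```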

C

Capsule neural network - a niche type of neural network whose neurons output vectors instead of scalars.
Convolutional neural network or CNN - A neural network architecture for image data, or other data on a regular grid.
Cross-entropy - A loss function used for classification with categorical labels.
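
A sketch of cross-entropy for a single example. It assumes the model outputs raw logits and the label is a class index, and computes the negative log of the softmax probability of the true class. The log-sum-exp trick keeps the computation numerically stable.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log softmax(logits)[target]."""
    # log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob_target = logits[target] - log_sum_exp
    return -log_prob_target

print(cross_entropy([2.0, 1.0, 0.1], target=0))
print(cross_entropy([0.0, 0.0], target=1))  # uniform over 2 classes -> log(2)
```

A confident correct prediction gives a loss near 0, while a uniform prediction over k classes gives log(k).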

D

Decision Tree - A simple classifier consisting of a tree of binary splits, each chosen to reduce the entropy (impurity) of the subsets it produces.
Deep Learning - The use of neural networks (i.e. >= 2 layers) in machine learning tasks.
Dilation - Spacing between elements in a CNN kernel when applied. See Convolutional neural network.
Diffusion Models - A method which iteratively applies a neural network to sample from a target distribution.
Domain Adaptation - An area of research focused on making neural networks trained on one domain, or source of data, work on a different domain.
Dropout - A technique where the outputs of a random fraction of neurons are zeroed out in each training iteration, effectively training an ensemble of subnetworks.
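
A sketch of dropout on a list of features. It assumes the "inverted dropout" variant, which scales surviving features by 1/(1 - p) during training so that nothing needs to change at inference time. That scaling convention and the `training` flag are assumptions, not part of the definition above.

```python
import random

def dropout(features, p, training=True, rng=random):
    """Inverted dropout: zero each feature with probability p during training."""
    if not training or p == 0.0:
        # At inference time dropout is disabled and features pass through unchanged.
        return list(features)
    keep = 1.0 - p
    # Surviving features are scaled by 1/keep so the expected value is unchanged.
    return [f / keep if rng.random() < keep else 0.0 for f in features]

rng = random.Random(0)
print(dropout([1.0] * 10, p=0.5, rng=rng))          # a different random mask each call
print(dropout([1.0] * 10, p=0.5, training=False))   # unchanged at inference
```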

E

Early stopping - a technique where you stop training once the validation loss begins increasing. This is not as popular these days with large models.
Early exiting - a technique to speed up neural network inference by letting some inputs exit through an intermediate output head instead of passing through the entire model.
Embedding - see latent code. This is a learned feature used to represent something.
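
A sketch of early stopping that replays a list of recorded validation losses, one per epoch. The `patience` parameter is an assumed generalization. With `patience=1`, training stops at the first epoch where the validation loss fails to improve. The loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return (stop_epoch, best_epoch).

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.7, 0.72, 0.69, 0.75, 0.8]  # hypothetical validation losses
print(early_stopping_epoch(losses, patience=1))  # (3, 2)
print(early_stopping_epoch(losses, patience=2))  # (6, 4)
```

A patience larger than 1 tolerates noisy validation curves. The weights saved at `best_epoch` are usually the ones kept.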

F

Forward propagation - Inference through a neural network by computing each layer's outputs.
Fréchet Inception Distance (FID) - a reference-based GAN evaluation metric which passes real and generated images through a pretrained network (typically Inception) and compares the statistics (mean and covariance) of the resulting intermediate features.
Fully connected network - The standard neural network model where each layer is a set of nodes, each connected to every node in the previous layer.
Features - a generic term indicating the latent inputs or intermediate outputs of a neural network (2D = feature map, 3D = feature grid).
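
A sketch of forward propagation through a small fully connected network. Each layer computes weighted sums of all of its inputs, and a ReLU is applied after every layer except the last. The 2 -> 3 -> 1 shape and the hand-picked weights are hypothetical.

```python
def linear(x, weights, bias):
    # Fully connected layer: every output is a weighted sum of every input.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def mlp_forward(x, layers):
    """Apply each (weights, bias) layer in order, with ReLU after all but the last."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Hypothetical 2 -> 3 -> 1 network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 1.0]], [0.5]),
]
print(mlp_forward([2.0, 1.0], layers))  # hidden = relu([1.0, 1.5, -1.0]) = [1.0, 1.5, 0.0] -> [3.0]
```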

G

Generalization - How well a model works on data it has not been trained on.
Generative adversarial network (GAN) - A neural network setup for generating examples from a training distribution.
Generative Pretrained Transformer (GPT) - A large decoder-only transformer trained on next-word prediction. GPT-2 and GPT-3 refer to specific models developed by OpenAI.
Gradient Descent - An optimization method which updates parameters, such as the weights of a neural network, by repeatedly stepping in the direction of the negative gradient of the loss, i.e. the direction of steepest descent.
Graph neural network (GNN) - A type of neural network which operates on graph inputs.
Gumbel-Softmax - a method to differentiably sample from a categorical distribution.
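
A sketch of Gumbel-Softmax sampling. Gumbel noise is added to the logits, and the result goes through a temperature-scaled softmax to produce a "soft" one-hot sample. The argmax of each sample follows the original categorical distribution. The differentiability only pays off inside a framework with automatic differentiation, where gradients flow through the softmax. This pure-Python version shows just the sampling, and its names are illustrative.

```python
import math
import random

def sample_gumbel(rng):
    # Gumbel(0, 1) noise via -log(-log(U)), with U clamped away from 0 and 1.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Relaxed one-hot sample from the categorical distribution softmax(logits)."""
    scores = [(z + sample_gumbel(rng)) / temperature for z in logits]
    # Softmax over the perturbed scores; lower temperature gives samples closer to one-hot.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The argmax of each soft sample follows the original categorical probabilities.
rng = random.Random(0)
probs = [0.7, 0.2, 0.1]
logits = [math.log(p) for p in probs]
counts = [0] * len(probs)
n = 20000
for _ in range(n):
    sample = gumbel_softmax(logits, temperature=0.5, rng=rng)
    counts[sample.index(max(sample))] += 1
freqs = [c / n for c in counts]
print(freqs)
```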

H

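Hinge Loss - A loss used for training classifiers (e.g. SVMs) which returns 0 for correct classifications with a margin of at least 1, and a linearly increasing penalty for incorrect or low-margin ones. \(l=\max(0, 1-y\hat{y})\), where \(y \in \{-1, 1\}\) is the label and \(\hat{y}\) is the classifier's score.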
Hidden Layer - Intermediate layers in a neural network whose outputs are passed to other parts of the neural network.
Hyperparameter - Parameters of a model which are typically hand chosen and not directly optimized during training.
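
The hinge loss \(l=\max(0, 1-y\hat{y})\) translates directly into code. This sketch assumes labels in {-1, +1} and a raw (unthresholded) classifier score.

```python
def hinge_loss(y, y_hat):
    """Hinge loss for a label y in {-1, +1} and a raw classifier score y_hat."""
    return max(0.0, 1.0 - y * y_hat)

print(hinge_loss(+1, 2.5))   # correct with margin >= 1: 0.0
print(hinge_loss(+1, 0.5))   # correct but inside the margin: 0.5
print(hinge_loss(-1, 0.5))   # wrong side of the boundary: 1.5
```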

I

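Inception score (IS) - a GAN evaluation metric which uses Inception v3 to classify generated images. The score is high when each image is predicted with high confidence and the predicted labels vary across the set.
Intersection over Union (IoU) - The area of overlap between a predicted and a ground-truth region divided by the area of their union. Commonly used to measure the accuracy of bounding box (and segmentation) predictions.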
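
A sketch of IoU for two axis-aligned bounding boxes. It assumes the boxes are given as (x_min, y_min, x_max, y_max). Other box formats, such as center plus size, would need converting first.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> about 0.143
```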

L

L1, L2, Lp norm - L1 and L2 are two common norms used for computing errors or losses. Lp is the general form, \(\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}\), and L1 and L2 are the cases p = 1 and p = 2.
Large Language Model (LLM) - A neural network, typically a transformer (see GPT), which is trained to understand language.
Latent code - an input to a neural network which gets optimized during training to represent a particular state or value
LLaMa - a set of language models published in 2023 by Meta. They are trained on more tokens than earlier models of comparable size, and some of them can run on a single GPU or CPU.
Long short-term memory (LSTM) - An RNN architecture which maintains two states, a cell state for long-term memory and a hidden state for short-term memory, with gates controlling what is stored and forgotten.
Loss function - Target function which you are attempting to minimize.
Low-rank Adaptation (LoRA) - A fine-tuning technique which freezes the original weights and learns low-rank matrices whose product is added to them as a weight update.
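
A sketch of the Lp norm \(\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}\) for p >= 1. The L1 and L2 norms are the special cases p = 1 and p = 2.

```python
def lp_norm(x, p):
    """Lp norm: (sum_i |x_i|^p)^(1/p), for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
print(lp_norm(x, 1))  # L1: 7.0
print(lp_norm(x, 2))  # L2: 5.0
```

As p grows, the norm approaches the largest absolute entry (the L-infinity norm).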

M

MSE - Mean squared error. The mean of the squared differences between predictions and targets, i.e. the squared L2 distance divided by the number of elements.
Multilayer perceptron (MLP) - See Fully connected network.
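
A sketch of mean squared error as the average of squared differences between predictions and targets. It equals the squared L2 distance divided by the number of elements.

```python
def mse(predictions, targets):
    """Mean squared error: the average of squared differences."""
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3
```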

N

Neurons - Individual elements in an MLP layer (perceptron + activation), loosely inspired by biological neurons.
Neural Fields - A subfield of computer vision and graphics which uses neural networks to represent 2D/3D scenes and perform tasks such as 3D reconstruction, scene generation, and image compression.
Normalized Device Coordinates (NDC) - A coordinate system in which image (screen) positions span \([-1, 1]\times[-1, 1]\), independent of the image resolution.
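
A sketch of converting between pixel indices and normalized device coordinates. It assumes pixel centers sit at (px + 0.5, py + 0.5) and that the y axis is not flipped. Graphics APIs and renderers differ on both conventions, so check the one you target.

```python
def pixel_to_ndc(px, py, width, height):
    """Map the center of pixel (px, py) to NDC in [-1, 1] x [-1, 1].

    Assumes pixel centers at (px + 0.5, py + 0.5) and no y-axis flip.
    """
    x = 2.0 * (px + 0.5) / width - 1.0
    y = 2.0 * (py + 0.5) / height - 1.0
    return x, y

def ndc_to_pixel(x, y, width, height):
    # Inverse mapping back to (fractional) pixel indices.
    return (x + 1.0) * width / 2.0 - 0.5, (y + 1.0) * height / 2.0 - 0.5

print(pixel_to_ndc(0, 0, 4, 4))  # (-0.75, -0.75)
print(pixel_to_ndc(3, 3, 4, 4))  # (0.75, 0.75)
```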

O

Overfitting - when a model begins to learn attributes specific to your training data, thereby worsening performance on non-training data.

P

Peak signal-to-noise ratio (PSNR) - a pixel-wise metric used to evaluate the quality of an image against a reference.
Perceptron - a linear classifier.
Perceptual loss - a loss function which passes images through a pretrained network (e.g. VGG) and compares intermediate features instead of raw pixels.
Positional encoding - Applying sin/cos at various frequencies (i.e. a Fourier basis) so the network can distinguish input values at different scales. Used in neural fields as well as NLP models to encode the relative positions of inputs.
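
A sketch of a sin/cos positional encoding for a scalar input. It assumes NeRF-style frequencies 2^k * pi for k = 0..L-1. Transformer positional encodings use a different frequency schedule. Nearby inputs look almost identical at low frequencies but differ clearly at high ones, which is what lets the network resolve fine detail.

```python
import math

def positional_encoding(x, num_frequencies):
    """Encode scalar x as [sin(2^k pi x), cos(2^k pi x)] for k = 0..L-1."""
    features = []
    for k in range(num_frequencies):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * x))
        features.append(math.cos(freq * x))
    return features

# Two nearby inputs differ little at low frequencies but a lot at high ones.
a = positional_encoding(0.50, 6)
b = positional_encoding(0.51, 6)
print([round(abs(u - v), 3) for u, v in zip(a, b)])
```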

R

Random Forest - an ensemble learning method which builds several decision trees, each trained on a random sample of the data and a random subset of features, and combines their predictions.
Recurrent neural network (RNN) - A type of neural network which operates sequentially on sequence data.
Reinforcement Learning - an area of machine learning focused on learning to take actions in an environment to maximize a reward, e.g. playing a game.
Receptive Field - in a CNN, the region of input pixels which affects a given value in the output feature map.
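
A sketch of the receptive field of a stack of convolution layers along one axis. Each layer is described by an assumed (kernel_size, stride, dilation) tuple. Dilation widens each kernel to dilation * (kernel_size - 1) + 1 input positions, and each stride multiplies the spacing ("jump") between neighboring features seen by later layers. Padding does not change the receptive field size, so it is ignored here.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of one output value
    after a stack of conv layers given as (kernel_size, stride, dilation)."""
    rf = 1      # a single input pixel sees itself
    jump = 1    # input-pixel distance between adjacent features at the current layer
    for kernel_size, stride, dilation in layers:
        effective_kernel = dilation * (kernel_size - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1, 1), (3, 1, 1)]))  # two 3x3 convs -> 5
print(receptive_field([(3, 2, 1), (3, 1, 1)]))  # stride 2 in the first layer -> 7
print(receptive_field([(3, 1, 2)]))             # dilation 2 -> 5
```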

S

Stride - how many input pixels the CNN kernel moves between adjacent output pixels.
Structural similarity index measure (SSIM) - a reference-based image metric which compares local image patch statistics (means, variances, and covariance).
Softmax - a function used to convert a set of logits into a probability distribution.
Support Vector Machine (SVM) - A linear classifier which maximizes the margin/distance to the nearest examples.
Stochastic Gradient Descent (SGD) - A variation on gradient descent where we only compute the gradient of the parameters against a small batch of data instead of the entire dataset.
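
A sketch of minibatch SGD fitting a line y ≈ w * x + b under a mean squared error loss. Each update uses the gradient computed on a small shuffled batch rather than the whole dataset. The learning rate, batch size, epoch count, and noise-free synthetic data are all illustrative assumptions.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ≈ w * x + b with minibatch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the MSE computed on this minibatch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free synthetic data
w, b = sgd_linear_regression(xs, ys)
print(w, b)  # close to 2.0 and 1.0
```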

T

Transfer Learning - Techniques which reuse a neural network trained on one task or dataset as the starting point for a different task.
Transformer (machine learning model) - A neural network architecture which uses attention between elements. Originally designed for NLP, but now used for many other areas.

U

Underfitting - when a model performs poorly on both training and validation data, usually due to inadequate model complexity or training duration.

Other Resources

Google Machine Learning Glossary
