PyTorch is a popular machine learning library developed by Facebook.
==Installation== | |||
See [https://pytorch.org/get-started/locally/ PyTorch Getting Started] and [https://pytorch.org/get-started/previous-versions/ PyTorch Previous Versions] | |||
I recommend using the conda installation method since it installs a matching version of the CUDA toolkit.
==Getting Started== | |||
* [https://pytorch.org/tutorials/ PyTorch Tutorials] | |||
{{hidden | Example | | |||
<syntaxhighlight lang="python"> | |||
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(5, 5), nn.ReLU(), nn.Linear(5, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Dummy data: 100 examples with 5 features and a scalar target.
dataset = TensorDataset(torch.randn(100, 5), torch.randn(100, 1))
trainloader = DataLoader(dataset, batch_size=10, shuffle=True)
epochs = 10

# Training
for epoch in range(epochs):
    for inputs, labels in trainloader:
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        # backward pass and parameter update
        loss.backward()
        optimizer.step()
</syntaxhighlight> | |||
}} | |||
==Importing Data== | |||
See [https://pytorch.org/tutorials/beginner/data_loading_tutorial.html Data Loading Tutorial] | |||
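A minimal custom <code>Dataset</code> sketch (the class name and sizes here are illustrative, not from the tutorial):
<syntaxhighlight lang="python">
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    """A toy dataset of random features and targets."""
    def __init__(self, n=100):
        self.x = torch.randn(n, 5)
        self.y = torch.randn(n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(MyDataset(), batch_size=10, shuffle=True)
for inputs, labels in loader:
    pass  # training step goes here
</syntaxhighlight>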
==Usage== | ==Usage== | ||
Note that there are several useful functions under <code>torch.nn.functional</code>, which is typically imported as <code>F</code>.
Most neural network layers are actually implemented as functions in <code>torch.nn.functional</code>; the module classes are thin wrappers around them.
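For example, a minimal sketch calling the functional API directly:
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

x = torch.randn(2, 5)
# Functional equivalents of nn.ReLU and nn.Linear.
h = F.relu(x)
y = F.linear(h, weight=torch.randn(1, 5), bias=torch.zeros(1))
</syntaxhighlight>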
===torch.meshgrid=== | |||
Note that by default this uses matrix (<code>ij</code>) indexing, whereas <code>np.meshgrid</code> defaults to Cartesian (<code>xy</code>) indexing, so the outputs are transposed relative to each other. Recent versions of PyTorch accept an <code>indexing</code> argument to control this.
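A minimal illustration (the <code>indexing</code> argument requires a recent PyTorch version):
<syntaxhighlight lang="python">
import torch

a, b = torch.arange(3), torch.arange(2)
# ij (PyTorch default): output shapes are (3, 2)
gi, gj = torch.meshgrid(a, b, indexing="ij")
# xy (NumPy default): output shapes are (2, 3)
gx, gy = torch.meshgrid(a, b, indexing="xy")
</syntaxhighlight>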
===torch.multinomial=== | |||
[https://pytorch.org/docs/stable/generated/torch.multinomial.html torch.multinomial]<br> | |||
If you need to draw many samples with replacement from a distribution with many categories, it may be faster to build a CDF with <code>torch.cumsum</code> and sample with <code>torch.searchsorted</code>.
{{hidden | torch.searchsorted example | | |||
<syntaxhighlight lang="python"> | |||
# weights is a 1-D tensor of non-negative (unnormalized) class weights.
weights_cdf = torch.cumsum(weights, dim=0)
# The last entry of the cumulative sum is the total weight.
weights_cdf_max = weights_cdf[-1]
sample = torch.searchsorted(weights_cdf,
                            weights_cdf_max * torch.rand(num_samples))
</syntaxhighlight> | |||
}} | |||
===F.grid_sample=== | |||
[https://pytorch.org/docs/stable/nn.functional.html#grid-sample Doc]<br> | |||
This function samples the input tensor at a grid of coordinates, with interpolation.<br>
It is very useful for resizing or warping images.
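A minimal sketch of an identity warp (the shapes here are illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 8, 8)  # (N, C, H, W)
# Build a sampling grid in normalized [-1, 1] coordinates, (x, y) order.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 8),
                        torch.linspace(-1, 1, 8), indexing="ij")
grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (N, H, W, 2)
out = F.grid_sample(img, grid, mode="bilinear", align_corners=True)
# out matches img up to interpolation error; warp the grid to warp the image.
</syntaxhighlight>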
==Building a Model== | |||
To build a model, do the following (a sketch follows the list):
* Create a class extending <code>nn.Module</code>.
* In <code>__init__</code>, create all the submodules your model needs.
** If you have a list of modules, wrap them in <code>nn.ModuleList</code> or <code>nn.Sequential</code> so their parameters are properly registered.
* Wrap any raw tensors that should be trained in <code>nn.Parameter(weight, requires_grad=True)</code>.
* Write a <code>forward</code> method for your model.
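A minimal sketch of these steps (the class name and sizes are illustrative):
{{hidden | Example model |
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, dim=5, depth=3):
        super().__init__()
        # Submodules in an nn.ModuleList so their parameters are registered.
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        # A raw tensor wrapped in nn.Parameter so it is trained as well.
        self.scale = nn.Parameter(torch.ones(dim), requires_grad=True)

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x * self.scale

model = MyModel()
out = model(torch.randn(2, 5))
</syntaxhighlight>
}}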
==Multi-GPU Training== | |||
See [https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html Multi-GPU Examples]. | |||
===nn.DataParallel=== | |||
The basic idea is to wrap blocks in [https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel <code>nn.DataParallel</code>]. | |||
This will automatically duplicate the module across multiple GPUs and split the batch across GPUs during training. | |||
However, doing so causes you to lose access to custom methods and attributes. | |||
To save and load the model, use <code>model.module.state_dict()</code> and <code>model.module.load_state_dict()</code> so that checkpoint keys are not prefixed with <code>module.</code>.
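A minimal sketch (assumes at least one CUDA device; the filename is illustrative):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(5, 1)).cuda()
outputs = model(torch.randn(8, 5).cuda())  # batch is split across GPUs

# Save the wrapped module so checkpoint keys are not prefixed with "module."
torch.save(model.module.state_dict(), "checkpoint.pth")
model.module.load_state_dict(torch.load("checkpoint.pth"))
</syntaxhighlight>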
===nn.parallel.DistributedDataParallel=== | |||
[https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel nn.parallel.DistributedDataParallel]<br>
[https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead DistributedDataParallel vs DataParallel]<br>
[https://pytorch.org/tutorials/intermediate/ddp_tutorial.html ddp tutorial]

The PyTorch documentation recommends using this instead of <code>nn.DataParallel</code>.
The main difference is that it uses multiple processes rather than multiple threads, which works around the Python global interpreter lock (GIL).
It also supports training on GPUs across multiple ''nodes'', i.e. separate computers.
Using this is quite a bit more work than <code>nn.DataParallel</code>; you may want to consider PyTorch Lightning, which abstracts it away.
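A minimal single-node sketch, assuming CUDA devices and launching with <code>torchrun --nproc_per_node=NUM_GPUS script.py</code>:
{{hidden | DDP example |
<syntaxhighlight lang="python">
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(5, 1).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # Each process would normally load its own shard via DistributedSampler.
    inputs = torch.randn(10, 5).cuda()
    labels = torch.randn(10, 1).cuda()

    loss = nn.functional.mse_loss(model(inputs), labels)
    loss.backward()  # gradients are averaged across all processes
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
</syntaxhighlight>
}}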
==Optimizations== | |||
===Reducing GPU memory usage=== | |||
* Save loss values using [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.item <code>.item()</code>], which returns a standard Python number.
* For non-scalar items, use <code>my_var.detach().cpu().numpy()</code>.
** [https://pytorch.org/docs/stable/autograd.html#torch.Tensor.detach <code>detach()</code>] removes the tensor from the autograd graph.
** [https://pytorch.org/docs/stable/tensors.html?highlight=cpu#torch.Tensor.cpu <code>cpu()</code>] moves the tensor to the CPU.
** [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.numpy <code>numpy()</code>] returns a NumPy view of the tensor.
When possible, use functions which return new views of existing tensors rather than making duplicates of tensors: | |||
* [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.permute <code>permute</code>] | |||
* [https://pytorch.org/docs/stable/generated/torch.Tensor.expand.html#torch.Tensor.expand <code>expand</code>] instead of [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.repeat <code>repeat</code>] | |||
* [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view <code>view</code>] | |||
Note that <code>permute</code> returns a view and does not change the underlying memory layout.
This can result in a minor performance hit, and PyTorch will warn you if you repeatedly mix a contiguous tensor with a channels-last tensor.
To address this, call [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.contiguous <code>contiguous</code>] on the tensor with the desired memory format.
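A minimal illustration of views versus copies:
<syntaxhighlight lang="python">
import torch

x = torch.randn(2, 3, 4)
y = x.permute(0, 2, 1)   # a view; no data is copied
y.is_contiguous()        # False
z = y.contiguous()       # copies the data into the new layout
w = z.view(2, -1)        # view requires a contiguous tensor
</syntaxhighlight>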
;During inference
* Use <code>model.eval()</code>
* Use <code>with torch.no_grad():</code>
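For example:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

model = nn.Linear(5, 1)
model.eval()  # switch layers such as dropout and batch norm to inference mode
with torch.no_grad():  # do not build the autograd graph
    outputs = model(torch.randn(2, 5))
</syntaxhighlight>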
===Float16=== | |||
Float16 uses half the memory of float32. | |||
New Nvidia GPUs also have dedicated hardware instructions called tensor cores to speed up float16 matrix multiplication. | |||
Typically it's best to train using float32 for numerical stability.
You can then cast trained models to float16 and run inference at the lower precision.
Note that [https://en.wikipedia.org/wiki/Bfloat16_floating-point_format <code>bfloat16</code>] is different from IEEE float16. bfloat16 has fewer mantissa bits (8 exp, 7 mantissa) and is used by Google's TPUs. In contrast, float16 has 5 exp and 10 mantissa bits. | |||
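A minimal sketch of float16 inference (assumes a CUDA device, since many float16 ops are not implemented on CPU):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

model = nn.Linear(5, 1).half().cuda()  # cast the trained weights to float16
with torch.no_grad():
    x = torch.randn(2, 5, device="cuda", dtype=torch.float16)
    out = model(x)
</syntaxhighlight>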
==Classification== | |||
In classification, your model outputs a vector of ''logits''. | |||
These are relative scores for each potential output class. | |||
To compute the loss, pass the logits into a [https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html cross-entropy loss]. | |||
To compute the accuracy, you can use [https://pytorch.org/docs/stable/generated/torch.argmax.html <code>torch.argmax</code>] to get the top prediction or [https://pytorch.org/docs/stable/generated/torch.topk.html <code>torch.topk</code>] to get the top-k predictions.
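For example:
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

logits = torch.randn(8, 10)              # batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))

# CrossEntropyLoss applies log-softmax internally, so pass raw logits.
loss = nn.CrossEntropyLoss()(logits, labels)

preds = torch.argmax(logits, dim=1)      # top-1 predictions
accuracy = (preds == labels).float().mean()
top5_scores, top5_preds = torch.topk(logits, k=5, dim=1)
</syntaxhighlight>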
==Debugging== | |||
{{see also | Debugging ML Models}} | |||
If you get a CUDA kernel error, you can rerun with the environment variable <code>CUDA_LAUNCH_BLOCKING=1</code> to get the correct line in the stack trace.
<pre> | |||
CUDA_LAUNCH_BLOCKING=1 python app.py | |||
</pre> | |||
For the following error: | |||
<pre> | |||
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx(...)` | |||
</pre> | |||
First check all your tensor types and shapes.<br>
If you've checked all your tensor shapes and types and the error persists, you can try running with the environment variable:
<pre> | |||
CUBLAS_WORKSPACE_CONFIG=:0:0 | |||
</pre> | |||
References: | |||
* [https://github.com/pytorch/pytorch/issues/54975 https://github.com/pytorch/pytorch/issues/54975] | |||
==TensorBoard== | |||
{{main | TensorBoard}} | |||
See [https://pytorch.org/docs/stable/tensorboard.html PyTorch Docs: Tensorboard] | |||
<syntaxhighlight lang="python"> | |||
from torch.utils.tensorboard import SummaryWriter | |||
writer = SummaryWriter(log_dir="./runs") | |||
# Calculate loss. Increment the step. | |||
writer.add_scalar("train_loss", loss.item(), step) | |||
# Optionally flush e.g. at checkpoints | |||
writer.flush() | |||
# Close the writer (will flush) | |||
writer.close() | |||
</syntaxhighlight> | |||
==Libraries== | |||
A list of useful libraries:
===torchvision=== | |||
https://pytorch.org/vision/stable/index.html | |||
Official library with tools for image manipulation, such as blurring and drawing bounding boxes.
===torchmetrics=== | |||
https://torchmetrics.readthedocs.io/en/stable/ | |||
Various metrics such as PSNR, SSIM, and LPIPS.
===PyTorch3D=== | |||
{{main | PyTorch3D}} | |||
[https://github.com/facebookresearch/pytorch3d PyTorch3D] | |||
Facebook library with differentiable renderers for meshes and point clouds. |