PyTorch
Revision as of 20:32, 27 January 2021
PyTorch is a popular machine learning library developed by Facebook.
Installation
# If using conda, python 3.5+, and CUDA 10.0 (+ compatible cudnn)
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Getting Started
import torch
import torch.nn as nn
# Training (assumes net, criterion, optimizer, trainloader, and epochs are already defined)
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # accumulate the loss as a plain Python float
        running_loss += loss.item()
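The loop above leaves the model, loss, optimizer, and data loader undefined. A minimal self-contained sketch is given here, assuming a toy linear classifier on random data; the dataset, layer sizes, learning rate, and the epoch_losses list are illustrative choices, not part of the original page.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 64 samples, 4 features, 2 classes
inputs = torch.randn(64, 4)
labels = (inputs.sum(dim=1) > 0).long()
trainloader = DataLoader(TensorDataset(inputs, labels), batch_size=16, shuffle=True)

net = nn.Linear(4, 2)              # stand-in for a real model
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
epochs = 5

epoch_losses = []
for epoch in range(epochs):
    running_loss = 0.0
    for batch_inputs, batch_labels in trainloader:
        optimizer.zero_grad()                    # zero the parameter gradients
        outputs = net(batch_inputs)              # forward
        loss = criterion(outputs, batch_labels)
        loss.backward()                          # backward
        optimizer.step()                         # optimize
        running_loss += loss.item()              # store a float, not a tensor
    epoch_losses.append(running_loss / len(trainloader))
```

Averaging running_loss per epoch gives one number per epoch that can be printed or logged.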
Importing Data
Usage
torch.meshgrid
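Note that this is transposed compared to np.meshgrid: torch.meshgrid uses matrix ("ij") indexing by default, while np.meshgrid defaults to Cartesian ("xy") indexing.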
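The difference can be checked with a small sketch. It assumes a PyTorch version recent enough to accept the indexing keyword (1.10 or later); the input sizes are arbitrary.

```python
import numpy as np
import torch

x = torch.arange(3)
y = torch.arange(2)

# PyTorch: "ij" indexing, so the first output dimension follows x
gx, gy = torch.meshgrid(x, y, indexing="ij")

# NumPy: default "xy" indexing, so the first output dimension follows y
nx, ny = np.meshgrid(x.numpy(), y.numpy())

torch_shape = tuple(gx.shape)  # (3, 2)
numpy_shape = nx.shape         # (2, 3)

# The two results are transposes of each other
is_transposed = np.array_equal(gx.numpy(), nx.T) and np.array_equal(gy.numpy(), ny.T)
```

Passing indexing="ij" to np.meshgrid is another way to make the two libraries agree.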
torch.nn.functional
F.grid_sample
Doc
This function allows you to perform interpolation on your input tensor.
It is very useful for resizing images or warping images.
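As a rough sketch of resizing with F.grid_sample, the example below builds a sampling grid with F.affine_grid from an identity transform. The 4x4 input, the 8x8 output size, and the choice of align_corners=False are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

# Input image batch in (N, C, H, W) layout
img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Identity affine transform, shape (N, 2, 3)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# Sampling grid in normalized [-1, 1] coordinates, shape (N, H_out, W_out, 2)
grid = F.affine_grid(theta, size=(1, 1, 8, 8), align_corners=False)

# Bilinear interpolation of img at the grid locations: a 4x4 -> 8x8 resize
resized = F.grid_sample(img, grid, mode="bilinear",
                        padding_mode="border", align_corners=False)

# With a grid of the input's own size, the identity transform reproduces the input
same_grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
same = F.grid_sample(img, same_grid, mode="bilinear",
                     padding_mode="border", align_corners=False)
```

Replacing theta with a rotation or translation matrix, or building the grid by hand, turns the same call into an image warp. The align_corners setting should match between affine_grid and grid_sample.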
Building a Model
To build a model, do the following:
- Create a class extending nn.Module.
- In your class, include all other modules you need during init.
  - If you have a list of modules, make sure to wrap them in nn.ModuleList or nn.Sequential so they are properly recognized.
- Write a forward pass for your model.
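The steps above can be sketched as follows. SmallMLP and its layer sizes are hypothetical names and values chosen for illustration.

```python
import torch
import torch.nn as nn

class SmallMLP(nn.Module):  # hypothetical example model
    def __init__(self, in_dim=8, hidden=16, out_dim=3, num_hidden=2):
        super().__init__()
        self.input = nn.Linear(in_dim, hidden)
        # A plain Python list would hide these layers from .parameters();
        # nn.ModuleList registers each one as a submodule.
        self.hidden = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(num_hidden)]
        )
        # nn.Sequential registers its children and chains them in order
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):
        x = self.input(x)
        for layer in self.hidden:
            x = layer(torch.relu(x))
        return self.head(x)

model = SmallMLP()
out = model(torch.randn(4, 8))  # batch of 4 samples
num_params = sum(p.numel() for p in model.parameters())
```

Counting parameters is a quick way to confirm that every layer was registered.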
Multi-GPU Training
See Multi-GPU Examples.
The basic idea is to wrap blocks in nn.DataParallel.
This will automatically duplicate the module across multiple GPUs and split the batch across GPUs during training.
However, doing so causes you to lose direct access to custom methods and attributes; they remain reachable on the wrapped module through the .module attribute.
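A minimal sketch of the wrapping step is shown below. Net and its describe method are hypothetical; on a machine without GPUs, nn.DataParallel simply forwards to the wrapped module, so the snippet should still run.

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical model with one custom method
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def describe(self):
        return "Net with one linear layer"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Net().to(device)

# Replicates the module on each visible GPU and splits the batch along dim 0
parallel_net = nn.DataParallel(net)
out = parallel_net(torch.randn(8, 4, device=device))

# Custom methods are not exposed on the wrapper itself ...
has_direct_access = hasattr(parallel_net, "describe")
# ... but the original module is available as .module
description = parallel_net.module.describe()
```

The same .module attribute is what to use when saving a state_dict that should load into an unwrapped model.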
nn.parallel.DistributedDataParallel
DistributedDataParallel vs DataParallel
The PyTorch documentation suggests using this instead of nn.DataParallel.
The main difference is that it uses multiple processes instead of multithreading, which works around the Python interpreter's global interpreter lock (GIL).
Memory
Reducing memory usage
- Save loss using .item(), which returns a standard Python number.
- For non-scalar items, use my_var.detach().cpu().numpy():
  - detach() removes the item from the autograd graph.
  - cpu() moves the tensor to the CPU.
  - numpy() returns a numpy view of the tensor.
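A short sketch of both patterns, using a made-up tensor w that requires gradients; the variable names are illustrative only.

```python
import numpy as np
import torch

w = torch.randn(3, requires_grad=True)
pred = w * 2.0     # non-scalar result that is part of the autograd graph
loss = pred.sum()  # scalar loss

# Scalar: .item() gives a plain Python float with no graph attached
loss_value = loss.item()

# Non-scalar: detach from autograd, move to CPU, then view as a NumPy array
pred_np = pred.detach().cpu().numpy()

# Storing the tensor itself would keep the whole graph alive
still_tracked = loss.requires_grad
```

Appending loss_value rather than loss to a history list lets each iteration's graph be freed.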
When possible, use functions which return new views of existing tensors rather than making duplicates of tensors:
- permute
- expand instead of repeat
- view
Note that permute does not change the underlying data; it only changes how the data is indexed.
This can result in a minor performance hit if you repeatedly use a contiguous tensor with a channels last tensor.
To address this, call contiguous on the tensor with the new memory format.
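The view behavior can be checked by comparing data pointers and strides. This is a small sketch with arbitrary tensor shapes.

```python
import torch

base = torch.arange(6.0).reshape(2, 3)

# permute returns a view: same storage, different strides
permuted = base.permute(1, 0)
shares_storage = permuted.data_ptr() == base.data_ptr()
permuted_is_contiguous = permuted.is_contiguous()

# contiguous() copies the data into the new layout
compact = permuted.contiguous()
compact_is_copy = compact.data_ptr() != base.data_ptr()

row = torch.tensor([[1.0, 2.0, 3.0]])
# expand broadcasts without copying: the new dimension has stride 0
expanded = row.expand(4, 3)
expanded_stride = expanded.stride()
# repeat allocates a new tensor holding four copies of the row
repeated = row.repeat(4, 1)
same_values = torch.equal(expanded, repeated)
repeat_is_copy = repeated.data_ptr() != row.data_ptr()
```

Because an expanded tensor aliases the same memory along the stride-0 dimension, writing into it in place is generally unsafe; clone it first if it needs to be modified.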
Classification
In classification, your model outputs a vector of logits.
These are relative scores for each potential output class.
To compute the loss, pass the logits into a cross-entropy loss.
To compute the accuracy, you can use torch.argmax to get the top prediction or torch.topk to get the top-k predictions.
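A minimal sketch of loss and accuracy on a hand-written batch of logits follows; the values and the choice of k=2 are made up for illustration.

```python
import torch
import torch.nn.functional as F

# 4 samples, 3 classes; each row holds unnormalized class scores
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.0, 2.0, 0.0],
                       [0.1, 0.2, 0.3]])
labels = torch.tensor([0, 2, 0, 1])

# cross_entropy applies log-softmax internally, so it takes raw logits
loss = F.cross_entropy(logits, labels)

# Top-1 accuracy: the highest-scoring class must equal the label
top1 = torch.argmax(logits, dim=1)
top1_acc = (top1 == labels).float().mean().item()

# Top-k accuracy: the label must appear among the k highest-scoring classes
top2 = torch.topk(logits, k=2, dim=1).indices
top2_acc = (top2 == labels.unsqueeze(1)).any(dim=1).float().mean().item()
```

Here the first two samples are correct at top-1 and the last two are only recovered at top-2, so top1_acc is 0.5 and top2_acc is 1.0.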
TensorBoard
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter(log_dir="./runs")
# Calculate loss. Increment the step.
writer.add_scalar("train_loss", loss.item(), step)
# Optionally flush e.g. at checkpoints
writer.flush()
# Close the writer (will flush)
writer.close()
PyTorch3D
PyTorch3D is a library by Facebook AI Research which contains differentiable renderers for meshes and point clouds.
It is built using custom CUDA kernels and only runs on Linux.