PyTorch

PyTorch is a popular machine learning library developed by Facebook.

Installation

See PyTorch Getting Started

# If using conda, python 3.5+, and CUDA 10.0 (+ compatible cudnn)
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Getting Started

import torch
import torch.nn as nn

# Training (assumes net, criterion, optimizer, trainloader, and epochs are defined)
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # accumulate the loss as a Python float (keeps no autograd graph)
        running_loss += loss.item()

Importing Data

See Data Loading Tutorial
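
A rough sketch of the usual pattern (the dataset contents, sizes, and the name trainloader are placeholders, not from the tutorial): subclass Dataset and wrap it in a DataLoader for batching and shuffling.

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    # Hypothetical dataset holding pre-made tensors; replace with your own data.
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.labels[idx]

dataset = MyDataset(torch.rand(100, 32), torch.randint(0, 10, (100,)))
trainloader = DataLoader(dataset, batch_size=16, shuffle=True)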

Usage

torch.nn.functional

PyTorch Documentation

F.grid_sample

Doc
This function samples your input tensor at arbitrary grid locations using interpolation.
It is very useful for resizing or warping images.
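
A small sketch using a random image tensor: F.affine_grid builds a normalized sampling grid (here the identity transform at a larger output size) and F.grid_sample bilinearly interpolates the input at those grid locations.

import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 64, 64)  # (N, C, H, W) input batch

# Identity affine transform; affine_grid produces normalized (x, y) sample
# coordinates in [-1, 1] with shape (N, H_out, W_out, 2).
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, size=(1, 3, 128, 128), align_corners=False)

# Bilinearly sample the input at the grid locations (here: upsampling to 128x128).
out = F.grid_sample(img, grid, mode="bilinear", align_corners=False)
print(out.shape)  # torch.Size([1, 3, 128, 128])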

Building a Model

To build a model, do the following:

  • Create a class extending nn.Module.
  • In your class include all other modules you need during init.
    • If you have a list of modules, make sure to wrap them in nn.ModuleList or nn.Sequential so they are properly recognized.
  • Write a forward pass for your model.
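
A minimal sketch of these steps (layer sizes and names are arbitrary): submodules assigned in __init__ are registered automatically, the list of repeated blocks is wrapped in nn.ModuleList, and forward() defines the forward pass.

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, in_dim=32, hidden_dim=64, out_dim=10):
        super().__init__()
        # Submodules assigned as attributes here are registered automatically.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
        )
        # A plain Python list would not be registered; nn.ModuleList is.
        self.blocks = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(3)]
        )
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = self.encoder(x)
        for block in self.blocks:
            x = torch.relu(block(x))
        return self.head(x)

model = MyModel()
print(model(torch.rand(8, 32)).shape)  # torch.Size([8, 10])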

Multi-GPU Training

See Multi-GPU Examples.

The basic idea is to wrap blocks in nn.DataParallel.
This will automatically duplicate the module across multiple GPUs and split the batch across GPUs during training.

However, doing so causes you to lose access to custom methods and attributes.
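
A minimal sketch (the placeholder model and sizes are assumptions): wrapping the model in nn.DataParallel splits each batch along dimension 0 across the visible GPUs, and custom attributes of the original module are then reached through .module.

import torch
import torch.nn as nn

model = nn.Linear(32, 10)  # placeholder model
if torch.cuda.device_count() > 1:
    # Replicates the module on each GPU and splits the batch (dim 0) across them.
    model = nn.DataParallel(model)
model = model.cuda()

outputs = model(torch.rand(64, 32).cuda())

# Custom methods/attributes of a wrapped module are reached via .module, e.g.:
# model.module.my_custom_method()  # hypothetical method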

nn.parallel.DistributedDataParallel

nn.parallel.DistributedDataParallel
DistributedDataParallel vs DataParallel

The PyTorch documentation suggests using this instead of nn.DataParallel. The main difference is that it uses multiple processes instead of multiple threads, which works around contention on the Python Global Interpreter Lock (GIL).
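
A minimal single-node sketch (the master address/port, placeholder model, and NCCL backend are assumptions): one worker process is spawned per GPU, each process joins the process group, and the model is wrapped in DistributedDataParallel.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU; each process joins the same process group.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(32, 10).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])

    # ... usual training loop; gradients are averaged across processes ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)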

Memory Usage

Reducing memory usage

  • Save loss values using .item(), which returns a standard Python number and keeps no reference to the autograd graph.
  • For non-scalar tensors, use my_var.detach().cpu().numpy():
    • detach() returns a tensor that is detached from the autograd graph.
    • cpu() copies the tensor to CPU memory.
    • numpy() returns a NumPy view of the CPU tensor.
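
A short sketch of both patterns using a throwaway tensor:

import torch

x = torch.rand(16, 8, requires_grad=True)
loss = (x ** 2).mean()

losses = []
# .item() converts the 0-dim tensor to a Python float, so the autograd graph
# it came from can be freed.
losses.append(loss.item())

# For non-scalar tensors: detach from the graph, copy to CPU, view as NumPy.
activations = x.detach().cpu().numpy()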

TensorBoard

See PyTorch Docs: Tensorboard

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter(log_dir="./runs")

# Calculate loss. Increment the step.

writer.add_scalar("train_loss", loss.item(), step)

# Optionally flush e.g. at checkpoints
writer.flush()

# Close the writer (will flush)
writer.close()

PyTorch3D

PyTorch3D is a library by Facebook AI Research which contains differentiable renderers for meshes and point clouds.
It is built using custom CUDA kernels.