** If you have a list of modules, wrap them in <code>nn.ModuleList</code> or <code>nn.Sequential</code> so their parameters are properly registered with the parent module.
* Write a <code>forward</code> method for your model; see the sketch below.
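A minimal sketch of such a module; the <code>MyModel</code> name, the layer sizes, and the number of blocks are made up for illustration:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, num_blocks=3, hidden=64):
        super().__init__()
        # Wrapping the list in nn.ModuleList registers every block's
        # parameters with the parent module (a plain Python list would not).
        self.blocks = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(num_blocks)]
        )
        self.activation = nn.ReLU()

    def forward(self, x):
        # The forward pass defines how data flows through the registered modules.
        for block in self.blocks:
            x = self.activation(block(x))
        return x

model = MyModel()
out = model(torch.randn(8, 64))
</syntaxhighlight>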
==Multi-GPU Training==
See [https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html Multi-GPU Examples].

The basic idea is to wrap blocks in [https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel <code>nn.DataParallel</code>].
This automatically replicates the module on each available GPU and splits each input batch across them during the forward pass.
However, the wrapper hides the underlying module, so custom methods and attributes are only reachable through <code>model.module</code>.
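A minimal sketch of the wrapping step, using a throwaway <code>nn.Sequential</code> model with arbitrary sizes for illustration:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Throwaway model; the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each input
    # batch along dim 0 during the forward pass.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

x = torch.randn(32, 64)
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)

# After wrapping, the original module is reachable as model.module,
# which is where any custom methods and attributes now live.
</syntaxhighlight>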
===nn.parallel.DistributedDataParallel===
[https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel nn.parallel.DistributedDataParallel]
[https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead DistributedDataParallel vs DataParallel]

The PyTorch documentation recommends using this instead of <code>nn.DataParallel</code>, even for single-machine training.
The main difference is that it uses one process per GPU instead of multithreading, which sidesteps contention on the Python Global Interpreter Lock (GIL).
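A minimal single-machine sketch following the pattern in the linked documentation; it assumes one process per GPU launched with <code>torchrun --nproc_per_node=NUM_GPUS script.py</code> (which sets <code>RANK</code>, <code>LOCAL_RANK</code>, and <code>WORLD_SIZE</code>) and the NCCL backend, with a placeholder model:

<syntaxhighlight lang="python">
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; torchrun provides the rendezvous environment
    # variables that init_process_group reads by default.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; replace with your own module.
    model = nn.Linear(64, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Training loop goes here; gradients are all-reduced across
    # processes automatically during backward().

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
</syntaxhighlight>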
==Memory Usage==