** If you have a list of modules, make sure to wrap them in <code>nn.ModuleList</code> or <code>nn.Sequential</code> so their parameters are properly registered with the parent module.
* Write a forward pass for your model (see the sketch after this list).
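A minimal sketch of both points, assuming a small fully-connected model (the class name, layer sizes, and input shapes are illustrative, not from the original text):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, sizes=(784, 256, 10)):
        super().__init__()
        # nn.ModuleList registers each layer's parameters with the parent
        # module; a plain Python list would not.
        self.layers = nn.ModuleList(
            [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
        )

    def forward(self, x):
        # The forward pass defines how the input flows through the layers.
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:
                x = torch.relu(x)
        return x

model = MLP()
out = model(torch.randn(32, 784))  # batch of 32 flattened inputs
</syntaxhighlight>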
==Multi-GPU Training==
See [https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html Multi-GPU Examples].
The basic idea is to wrap your model (or individual sub-modules) in [https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel <code>nn.DataParallel</code>].
This automatically replicates the module across multiple GPUs and splits each batch across them during training.
However, the wrapper hides the inner module's custom methods and attributes; they remain reachable through its <code>module</code> attribute.
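A minimal sketch of the wrapping step (the module and tensor shapes are placeholders; any <code>nn.Module</code> works):
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # placeholder; any nn.Module works here
if torch.cuda.device_count() > 1:
    # Replicates the module on each visible GPU and splits the input
    # batch along dimension 0 during forward().
    model = nn.DataParallel(model)
model = model.cuda()

outputs = model(torch.randn(64, 784).cuda())

# The original module (with its custom methods and attributes) is still
# reachable through the .module attribute of the wrapper:
inner = model.module if isinstance(model, nn.DataParallel) else model
</syntaxhighlight>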
===nn.parallel.DistributedDataParallel===
* [https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel nn.parallel.DistributedDataParallel]
* [https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead DistributedDataParallel vs DataParallel]
The PyTorch documentation recommends using this instead of <code>nn.DataParallel</code>.
The main difference is that it uses multiple processes instead of multiple threads, which works around the Python Global Interpreter Lock (GIL).
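A minimal single-node sketch, assuming one process per GPU; the module, tensor shapes, and master address/port values are placeholders, not part of the original text:
<syntaxhighlight lang="python">
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # Each process drives one GPU and joins the same process group.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder address
    os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(784, 10).cuda(rank)  # placeholder; any nn.Module works
    ddp_model = DDP(model, device_ids=[rank])

    out = ddp_model(torch.randn(32, 784).cuda(rank))
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
</syntaxhighlight>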


==Memory Usage==