PyTorch
The main difference is that this uses multiple processes instead of multithreading to work around Python's Global Interpreter Lock (GIL).
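The snippet below is an illustrative sketch of the multi-process approach using <code>torch.multiprocessing</code>; the <code>worker</code> function and the process count are placeholders, not part of the original text.
<syntaxhighlight lang="python">
import torch
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Each worker is a separate Python process with its own interpreter and GIL,
    # so CPU-bound work like this matmul runs in parallel across workers.
    x = torch.randn(500, 500)
    print(f"rank {rank}/{world_size}: {x.mm(x).sum().item():.2f}")

if __name__ == "__main__":
    # Spawn 4 worker processes; each is called as worker(rank, *args).
    mp.spawn(worker, args=(4,), nprocs=4)
</syntaxhighlight>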
==Optimizations==
===Reducing memory usage===
* Save the loss using [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.item <code>.item()</code>], which returns a standard Python number instead of a tensor that keeps the computation graph (and its memory) alive.
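A minimal sketch of this pattern in a training loop; <code>model</code>, <code>loader</code>, <code>optimizer</code>, and <code>criterion</code> are placeholder names, not from this article.
<syntaxhighlight lang="python">
def train_one_epoch(model, loader, optimizer, criterion):
    running_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # loss.item() is a plain Python float; accumulating the tensor itself
        # would keep every iteration's computation graph alive in memory.
        running_loss += loss.item()
    return running_loss / len(loader)
</syntaxhighlight>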
This can result in a minor performance hit, which PyTorch will warn you about if you repeatedly mix a contiguous tensor with a channels-last tensor.
To address this, call [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.contiguous <code>contiguous</code>] on the tensor with the new memory format.
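A short sketch of keeping memory formats consistent; the tensor shape and the <code>Conv2d</code> layer here are arbitrary examples, not taken from this article.
<syntaxhighlight lang="python">
import torch

x = torch.randn(8, 3, 224, 224)                       # NCHW, contiguous by default
x = x.contiguous(memory_format=torch.channels_last)   # same shape, NHWC layout in memory

conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv = conv.to(memory_format=torch.channels_last)     # convert the weights as well

y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
</syntaxhighlight>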
===Float16===
Float16 uses half the memory of float32.
Newer Nvidia GPUs also have dedicated hardware, called tensor cores, to speed up float16 matrix multiplication.
Typically, though, it's best to train using float32 for numerical stability.
You can then truncate trained models to float16 and run inference in float16.
Note that [https://en.wikipedia.org/wiki/Bfloat16_floating-point_format <code>bfloat16</code>] is different from IEEE float16. bfloat16 has more exponent bits and fewer mantissa bits (8 exponent, 7 mantissa) and is used by Google's TPUs. In contrast, float16 has 5 exponent and 10 mantissa bits.
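A minimal sketch of float16 inference, assuming a CUDA GPU (float16 kernels are primarily a GPU feature); the small <code>Sequential</code> model is a placeholder for a real trained network.
<syntaxhighlight lang="python">
import torch

# A stand-in model; in practice you would load your trained float32 network.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).cuda()

model = model.half().eval()                    # truncate float32 weights to float16
x = torch.randn(1, 128, device="cuda").half()  # inputs must match the parameter dtype

with torch.no_grad():
    logits = model(x)
print(logits.dtype)  # torch.float16
</syntaxhighlight>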
==Classification==