PyTorch
The main difference is that this uses multiple processes instead of multithreading to work around Python's Global Interpreter Lock (GIL).
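The snippet below is an illustrative sketch of the multi-process approach using <code>torch.multiprocessing</code>; the <code>worker</code> function and the process count are placeholders, not part of the original text.
<syntaxhighlight lang="python">
import torch
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Each worker is a separate Python process with its own interpreter and GIL,
    # so CPU-bound work like this matmul runs in parallel across workers.
    x = torch.randn(500, 500)
    print(f"rank {rank}/{world_size}: {x.mm(x).sum().item():.2f}")

if __name__ == "__main__":
    # Spawn 4 worker processes; each is called as worker(rank, *args).
    mp.spawn(worker, args=(4,), nprocs=4)
</syntaxhighlight>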
==Optimizations==
===Reducing memory usage===
* Save the loss using [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.item <code>.item()</code>], which returns a standard Python number instead of a tensor that keeps the computation graph (and its memory) alive.
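A minimal sketch of this pattern in a training loop; <code>model</code>, <code>loader</code>, <code>optimizer</code>, and <code>criterion</code> are placeholder names, not from this article.
<syntaxhighlight lang="python">
def train_one_epoch(model, loader, optimizer, criterion):
    running_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # loss.item() is a plain Python float; accumulating the tensor itself
        # would keep every iteration's computation graph alive in memory.
        running_loss += loss.item()
    return running_loss / len(loader)
</syntaxhighlight>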
This can result in a minor performance hit, which PyTorch will warn you about if you repeatedly mix a contiguous tensor with a channels-last tensor.
To address this, call [https://pytorch.org/docs/stable/tensors.html#torch.Tensor.contiguous <code>contiguous</code>] on the tensor with the new memory format.
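A short sketch of keeping memory formats consistent; the tensor shape and the <code>Conv2d</code> layer here are arbitrary examples, not taken from this article.
<syntaxhighlight lang="python">
import torch

x = torch.randn(8, 3, 224, 224)                       # NCHW, contiguous by default
x = x.contiguous(memory_format=torch.channels_last)   # same shape, NHWC layout in memory

conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv = conv.to(memory_format=torch.channels_last)     # convert the weights as well

y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
</syntaxhighlight>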
===Float16===
Float16 uses half the memory of float32.
Newer Nvidia GPUs also have dedicated hardware, called tensor cores, to speed up float16 matrix multiplication.
Typically, though, it's best to train using float32 for numerical stability.
You can then truncate trained models to float16 and run inference in float16.
Note that [https://en.wikipedia.org/wiki/Bfloat16_floating-point_format <code>bfloat16</code>] is different from IEEE float16. bfloat16 has more exponent bits and fewer mantissa bits (8 exponent, 7 mantissa) and is used by Google's TPUs. In contrast, float16 has 5 exponent and 10 mantissa bits.
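A minimal sketch of float16 inference, assuming a CUDA GPU (float16 kernels are primarily a GPU feature); the small <code>Sequential</code> model is a placeholder for a real trained network.
<syntaxhighlight lang="python">
import torch

# A stand-in model; in practice you would load your trained float32 network.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).cuda()

model = model.half().eval()                    # truncate float32 weights to float16
x = torch.randn(1, 128, device="cuda").half()  # inputs must match the parameter dtype

with torch.no_grad():
    logits = model(x)
print(logits.dtype)  # torch.float16
</syntaxhighlight>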
==Classification==