Debugging ML Models

** Check that your loss is implemented correctly and taken against the correct ground truth image.
* Dump all inputs and outputs into [[TensorBoard]]. You may have an unexpected input or output somewhere.
* Make sure there is no activation on the final layer.
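The dumping step above can be sketched as follows. This is a minimal example, not your actual training loop: the convolution stands in for your model, and the random tensors stand in for a real batch and its ground truth.

```python
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in for your model
inputs = torch.rand(4, 3, 32, 32)                  # stand-in for a real input batch
targets = torch.rand(4, 3, 32, 32)                 # stand-in for the ground truth images

outputs = model(inputs)
# Verify the loss is computed against the CORRECT ground truth tensor.
loss = nn.functional.mse_loss(outputs, targets)

writer = SummaryWriter(log_dir="runs/debug")
step = 0
writer.add_images("debug/inputs", inputs, step)    # eyeball what the network actually sees
writer.add_images("debug/outputs", outputs.clamp(0, 1), step)
writer.add_images("debug/targets", targets, step)
writer.add_scalar("debug/loss", loss.item(), step)
writer.close()
```

Run <code>tensorboard --logdir runs/debug</code> and inspect the images; a wrong crop, a channel swap, or a mismatched target is usually obvious at a glance.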

* Increase or decrease the learning rate by one order of magnitude.
* Make sure the batch size is a power of 2. Try increasing it for more stable gradient updates or decreasing it for faster iterations.
* Try disabling any tricks you have like dropout.
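The knobs above can be sketched in a small PyTorch loop. The tiny model and random data here are placeholders; the point is the learning-rate sweep, the power-of-two batch size, and zeroing out dropout while you debug.

```python
import torch
from torch import nn

base_lr = 1e-3
batch_size = 64  # power of 2; raise for stabler gradients, lower for faster iterations

for lr in (base_lr * 10, base_lr, base_lr / 10):  # one order of magnitude up and down
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                          nn.Dropout(p=0.5), nn.Linear(32, 1))
    # Disable dropout (and similar tricks) while debugging convergence:
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = 0.0

    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x, y = torch.randn(batch_size, 16), torch.randn(batch_size, 1)

    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Run a few steps at each learning rate and compare the loss curves; diverging at the high rate and crawling at the low rate brackets a usable value.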
==Overfitting==
Overfitting occurs when your training loss falls well below your validation loss, i.e. the model memorizes the training examples rather than generalizing.
Historically this was a big concern for ML models, and people relied heavily on regularization to address it.
Recently, though, overfitting has become less of a concern with larger ML models.
* Increase the capacity and depth of the model.
* Add more training data if you can.
* Depending on your task, you can also try data augmentation (e.g. random crops, random rotations, random hue).
** PyTorch has many augmentations in [https://pytorch.org/docs/stable/torchvision/transforms.html torchvision.transforms].
** TF/Keras has augmentations as well in tf.image. See [https://www.tensorflow.org/tutorials/images/data_augmentation Tutorials: Data Augmentation].