Debugging ML Models
Notes on debugging ML models, primarily CNNs.
Debugging
- Train on a single example and see if it overfits.
- If it doesn't overfit, there may be an issue with your code.
- You can try increasing the capacity (e.g., the number of filters or the number of nodes in the fully connected layers) by 2-4x.
- E.g. if the input is 3 channels, the first conv layer should have more than 3 channels.
- Check that your loss is implemented correctly and computed against the correct ground-truth image.
- Dump all inputs and outputs into TensorBoard. You may have an unexpected input or output somewhere.
- If it looks like it is underfitting (e.g. if the training output and validation output are both blurry):
- Train longer (e.g., 4x as long), until the training loss and validation loss both flatten.
- Increase or decrease the learning rate by an order of magnitude.
- Make sure the batch size is a multiple of 2. Try increasing it to get more stable gradient updates or decreasing it to get faster iterations.
- Try disabling any regularization tricks you use, such as dropout.
- Make sure there is no activation (e.g., ReLU) on the final layer, since it can restrict the range of values the model can output.
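
The single-example overfit check and the learning-rate sweep can be sketched in a framework-agnostic way. This is only a minimal illustration, not a definitive implementation: `overfits_single_example`, `make_toy_step`, and `sweep_learning_rates` are hypothetical names, and the toy step fits a single weight as a stand-in for a real CNN training step. In practice, `train_step` would run one optimizer step on one fixed example and return the loss.

```python
import math
from typing import Callable, List, Tuple


def overfits_single_example(
    train_step: Callable[[], float],
    max_steps: int = 500,
    tol: float = 1e-4,
) -> bool:
    """Return True if repeated steps on one example push the loss below tol."""
    for _ in range(max_steps):
        loss = train_step()
        if not math.isfinite(loss):
            return False  # loss diverged (inf/NaN): learning rate is likely too high
        if loss < tol:
            return True
    return False


def make_toy_step(lr: float, x: float = 2.0, y: float = 6.0) -> Callable[[], float]:
    """Hypothetical stand-in for a real training step: fits y = w * x with squared error."""
    w = [0.0]  # single weight, boxed in a list so the closure can update it

    def step() -> float:
        diff = w[0] * x - y
        loss = diff * diff
        grad = 2.0 * diff * x  # d(loss)/dw
        w[0] -= lr * grad
        return loss

    return step


def sweep_learning_rates(base_lr: float) -> List[Tuple[float, bool]]:
    """Run the overfit check at base_lr and one order of magnitude either side."""
    results = []
    for lr in (base_lr / 10, base_lr, base_lr * 10):
        results.append((lr, overfits_single_example(make_toy_step(lr))))
    return results


if __name__ == "__main__":
    for lr, ok in sweep_learning_rates(0.1):
        print(f"lr={lr:g}: {'overfits' if ok else 'fails to overfit'}")
```

If the check fails at every learning rate in the sweep, that points toward a bug in the model, loss, or data pipeline rather than a tuning problem.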