
Notes on debugging ML models, primarily CNNs.

Debugging

  • Train on a single example and see if it overfits.
    • If it doesn't overfit, there may be an issue with your code.
    • You can try increasing the capacity (e.g. the number of filters or the number of nodes in the FC layers) by 2-4x.
      If the input has 3 channels, the first conv layer should have more than 3 output channels.
    • Check that your loss is implemented correctly and computed against the correct ground-truth image.
  • Dump all inputs and outputs into TensorBoard. You may have an unexpected input or output somewhere.
  • Try disabling any tricks you are using, such as dropout.
  • Make sure there is no activation on the final layer.
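
The single-example check from the first bullet can be sketched without any framework. This is a minimal illustration, not a CNN: it assumes a tiny linear model and a squared-error loss as stand-ins, and the function name `overfit_single_example`, the example values, and the `1e-6` threshold are all hypothetical. The point is only the shape of the test: train on one example and confirm the loss goes to (nearly) zero.

```python
import random


def overfit_single_example(x, y, lr=0.05, steps=500, seed=0):
    """Fit y_hat = w.x + b to ONE (x, y) pair with plain gradient descent.

    Returns the squared-error loss recorded at every step.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in x]  # small random init
    b = 0.0
    losses = []
    for _ in range(steps):
        y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = y_hat - y
        losses.append(err * err)
        # d(err^2)/dw_i = 2 * err * x_i and d(err^2)/db = 2 * err
        w = [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]
        b -= lr * 2.0 * err
    return losses


# Hypothetical single training example.
losses = overfit_single_example(x=[0.5, -1.0, 2.0], y=3.0)
overfits = losses[-1] < 1e-6  # assumed threshold for "memorised the example"
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.3e}, overfits: {overfits}")
```

With a real model the same idea applies: feed the same example every step, and if the loss does not collapse, suspect the data pipeline, the loss, or the model code before tuning anything else.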

Underfitting

If it looks like it is underfitting (e.g. if the training output and validation output are both blurry), then you can try the following.

  • Train for up to 4x as long, until the training loss and validation loss both flatten.
  • Increase or decrease the learning rate by one order of magnitude.
  • Make sure the batch size is a multiple of 2. Try increasing it to get more stable gradient updates or decreasing it to get faster iterations.
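
The learning-rate suggestion above can be illustrated with a toy sweep. This is a sketch under stated assumptions: the objective `(w - 3)^2`, the helper name `final_loss`, the base rate of `0.1`, and the step count are all made up for illustration and stand in for a real training run. It trains the same problem at one order of magnitude below and above the base rate and compares the final loss.

```python
def final_loss(lr, steps=50):
    """Minimise the toy objective f(w) = (w - 3)^2 with gradient descent."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
    return (w - 3.0) ** 2


base_lr = 0.1  # hypothetical starting point
candidates = (base_lr / 10, base_lr, base_lr * 10)
results = {lr: final_loss(lr) for lr in candidates}

for lr, loss in results.items():
    print(f"lr={lr:g}  final loss={loss:.3e}")

# Pick the rate with the lowest final loss under the same step budget.
best_lr = min(results, key=results.get)
```

On this toy problem the smallest rate is still far from the minimum after the fixed budget, and the largest one oscillates without making progress, which mirrors the two failure modes the sweep is meant to expose.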