Debugging ML Models

From David's Wiki
Revision as of 17:57, 3 August 2020 by David (talk | contribs)

Notes on debugging ML models, primarily CNNs.

Debugging

  • Train on a single example and see if it overfits.
    • If it doesn't overfit, there may be an issue with your code.
    • You can try increasing the capacity (e.g., the number of filters or the number of units in the fully connected layers) by 2-4x.
      If the input has 3 channels, then the first conv layer should have more than 3 output channels.
    • Check that your loss is implemented correctly and computed against the correct ground-truth image.
  • Dump all inputs and outputs into TensorBoard. You may have an unexpected input or output somewhere.
  • If it looks like it is underfitting (e.g. if the training output and validation output are both blurry):
    • Train longer (e.g. 4x as long), until the training loss and validation loss both flatten.
    • Increase or decrease the learning rate by one order of magnitude.
    • Make sure the batch size is a multiple of 2. Try increasing it to get more stable gradient updates or decreasing it to get faster iterations.
  • Try disabling any extra tricks you are using, such as dropout.
  • Make sure there is no activation on the final layer.
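
The first check in the list (train on a single example and see if it overfits) can be sketched in a few lines. This is a minimal illustration rather than a recipe for a real CNN: it assumes a tiny linear model trained with plain gradient descent in pure Python. The names train_on_single_example, overfits and flip_gradient are hypothetical and exist only for this example. flip_gradient deliberately simulates a bug in the update step, to show what a failing check looks like.

```python
def train_on_single_example(x, y, lr=0.05, steps=200, flip_gradient=False):
    """Fit y ~ w.x + b to ONE example with plain gradient descent.

    Returns the squared-error loss recorded at every step.
    flip_gradient (hypothetical) simulates a bug: stepping up the
    gradient instead of down, so the model can never fit the example.
    """
    w = [0.0] * len(x)
    b = 0.0
    sign = 1.0 if flip_gradient else -1.0
    losses = []
    for _ in range(steps):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        losses.append(err * err)
        # d(err^2)/dw_i = 2 * err * x_i and d(err^2)/db = 2 * err
        w = [wi + sign * lr * 2.0 * err * xi for wi, xi in zip(w, x)]
        b = b + sign * lr * 2.0 * err
    return losses


def overfits(losses, tol=1e-6):
    """The sanity check passes if the final loss is essentially zero."""
    return losses[-1] < tol


x, y = [0.5, -1.0, 2.0], 3.0  # a single made-up training example
good = train_on_single_example(x, y)
bad = train_on_single_example(x, y, flip_gradient=True)
print("correct update overfits:", overfits(good))
print("flipped gradient overfits:", overfits(bad))
```

With a real network the idea is the same: feed one fixed example (or one small batch) repeatedly and confirm the training loss goes to roughly zero. If it does not, suspect the code (loss, targets, update step) before suspecting the data or the architecture.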