Debugging ML Models

* Make sure there is no activation on the final layer.
* If the loss is unstable or increasing, drop the learning rate to <code>O(1e-3)</code> or <code>O(1e-4)</code> (see the first sketch after this list).
* Try computing the loss closer to the output of the network (see the second sketch after this list).
** If you apply some transformations \(f\) after the output, compute \(\text{loss} = \text{loss\_fn}(f^{-1}(\text{gt}), \text{output})\) instead of \(\text{loss} = \text{loss\_fn}(\text{gt}, f(\text{output}))\).
** This shortens the paths the gradients need to flow through.
** Note that this may change the per-pixel weights of the loss function.
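
A minimal PyTorch sketch of the first two points, assuming a regression setup (the layer sizes and optimizer choice are illustrative, not prescriptive):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# The head is a bare nn.Linear with no final activation, so the loss sees
# unbounded raw values. (For classification, nn.CrossEntropyLoss applies
# log-softmax internally, so the head should likewise stay activation-free.)
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 1),  # no Sigmoid/ReLU/Softmax here
)

# If the loss oscillates or diverges, fall back to a conservative learning
# rate on the order of 1e-3 or 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
</syntaxhighlight>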
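
To illustrate the inverse-transform trick, a sketch assuming the post-output transformation is \(f = \exp\) (so \(f^{-1} = \log\)); the tensor shapes and the choice of MSE are hypothetical:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

output = torch.randn(4, 1, 32, 32, requires_grad=True)  # raw network output
gt = torch.rand(4, 1, 32, 32) + 0.1                     # strictly positive ground truth

# Variant A: transform the output, then compare. The gradient has to flow
# back through exp() before it reaches the network.
loss_a = F.mse_loss(torch.exp(output), gt)

# Variant B: invert the transformation on the ground truth instead, so the
# loss attaches directly to the raw output.
loss_b = F.mse_loss(output, torch.log(gt))
</syntaxhighlight>

The two variants are not numerically equivalent: MSE in log space down-weights errors on large values relative to MSE in linear space, which is exactly the per-pixel reweighting noted above.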