* Make sure there is no activation on the final layer.
* If the loss is unstable or increasing, drop the learning rate to <code>O(1e-3)</code> or <code>O(1e-4)</code>. (Both points are shown in the first sketch after this list.)
* Try taking the loss closer to the output of the network (see the second sketch after this list).
** If you apply some transformation \(f\) after the output, compute \(loss = loss\_fn(f^{-1}(gt), output)\) instead of \(loss = loss\_fn(gt, f(output))\).
** This shortens the paths the gradients need to flow through.
** Note that this may change the per-pixel weights of the loss function.
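A minimal sketch of the first two points, assuming PyTorch (the page does not name a framework); the model, layer sizes, optimizer, and data below are placeholders:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Regression head: the last layer is a bare Linear, with no activation after it.
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 1),  # final layer: no sigmoid/tanh/ReLU here
)

# If the loss is unstable or increasing, drop the learning rate to 1e-3 or 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loss_fn = nn.MSELoss()
x, gt = torch.randn(32, 64), torch.randn(32, 1)  # dummy batch

output = model(x)
loss = loss_fn(output, gt)
optimizer.zero_grad()
loss.backward()
optimizer.step()
</syntaxhighlight>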
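And a sketch of the inverse-transform trick, again assuming PyTorch; here \(f = \exp\) is only an illustrative choice (e.g. a network that predicts in log space), so substitute your own transform and its inverse:

<syntaxhighlight lang="python">
import torch

def f(x):
    return torch.exp(x)  # transform applied after the network output (illustrative)

def f_inv(y):
    return torch.log(y)  # its inverse, applied to the ground truth instead

loss_fn = torch.nn.MSELoss()
output = torch.randn(4, 1, 8, 8, requires_grad=True)  # stands in for the raw network output
gt = torch.rand(4, 1, 8, 8) + 0.1                     # positive ground truth, so log() is safe

# Long gradient path: gradients must flow back through f.
loss_long = loss_fn(f(output), gt)

# Short gradient path: f is moved onto the ground truth, off the computation
# graph, so gradients flow straight from the loss into the output.
loss_short = loss_fn(output, f_inv(gt))
loss_short.backward()
</syntaxhighlight>

With \(f = \exp\) this also illustrates the last bullet: since \(\log(a) - \log(b) = \log(a/b)\), MSE in log space penalizes relative rather than absolute errors, i.e. the effective per-pixel weighting changes.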