Debugging ML Models

One example of this is softmax, which lets you apply gradients using a one-hot encoding.
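As a sketch of the softmax/one-hot interaction: when softmax is combined with cross-entropy loss, the gradient with respect to the logits reduces to the softmax probabilities minus the one-hot target. The logits and target class below are made-up values for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for a 4-class problem; class 2 is the target.
logits = np.array([1.0, 2.0, 0.5, -1.0])
one_hot = np.array([0.0, 0.0, 1.0, 0.0])

probs = softmax(logits)
# Cross-entropy loss against the one-hot target.
loss = -np.sum(one_hot * np.log(probs))
# The gradient w.r.t. the logits simplifies to probs - one_hot.
grad = probs - one_hot
```

Note that the gradient components sum to zero: mass pushed toward the target class is pulled from the others.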


* Rather than regressing a real value <math>x</math> directly, output a probability distribution.
** Output scores for <math>P(x=j)</math> for some fixed set of <math>j</math>, do softmax, and take the expected value.
** Or output <math>\mu, \sigma</math> and normalize the loss based on <math>\sigma</math>.
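Both variants above can be sketched in a few lines. The bin range, scores, and target value here are hypothetical placeholders standing in for real network outputs; the <math>\mu, \sigma</math> variant is written as the standard Gaussian negative log-likelihood, which is one common way to "normalize the loss based on <math>\sigma</math>".

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Variant 1: discretize x into fixed bins j, predict scores for P(x = j),
# softmax them, and recover a real-valued prediction as the expectation.
bins = np.linspace(0.0, 10.0, 21)       # hypothetical support for x
scores = -0.5 * (bins - 4.0) ** 2       # placeholder logits peaked near x = 4
probs = softmax(scores)
x_hat = np.sum(probs * bins)            # E[x] under the predicted distribution

# Variant 2: predict mu and sigma, and use a sigma-normalized loss,
# i.e. the Gaussian negative log-likelihood.
def gaussian_nll(x, mu, sigma):
    return 0.5 * np.log(2 * np.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

loss = gaussian_nll(x=3.0, mu=2.5, sigma=1.2)
```

In variant 2, the squared error is divided by <math>2\sigma^2</math>, so the model can down-weight examples it is uncertain about, while the <math>\log \sigma</math> term stops it from inflating <math>\sigma</math> everywhere.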