One example of this is softmax, which allows you to apply gradients using a one-hot encoding.
* Rather than regressing a real value <math>x</math> directly, output a probability distribution (both options are sketched after this list).
** Output a score for <math>P(x=j)</math> over some fixed set of values <math>j</math>, apply softmax, and take the expected value <math>\sum_j j \, P(x=j)</math>.
** Or output <math>\mu</math> and <math>\sigma</math> and normalize the loss by <math>\sigma</math>, as in a Gaussian negative log-likelihood.
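A minimal PyTorch sketch of the first option, discretizing the target into bins, softmaxing over per-bin scores, and taking the expectation. The target range <math>[0, 10]</math>, the 51-bin setup, and all names here are illustrative assumptions, not from the original:

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

# Assumed setup: targets in [0, 10], discretized into 51 bins.
# bin_centers is the fixed set of values j.
bin_centers = torch.linspace(0.0, 10.0, steps=51)

def expected_value_head(logits):
    """Turn per-bin scores for P(x = j) into a real-valued prediction.

    logits: (batch, n_bins) raw scores from the network.
    Returns: (batch,) expected value E[x] = sum_j j * P(x = j).
    """
    probs = F.softmax(logits, dim=-1)          # gradients flow through softmax
    return (probs * bin_centers).sum(dim=-1)   # expectation over bin centers

# Toy example standing in for network outputs:
logits = torch.randn(4, 51, requires_grad=True)
target = torch.tensor([1.5, 3.0, 7.2, 9.9])

# Either regress the expectation directly...
mse = F.mse_loss(expected_value_head(logits), target)

# ...or train with cross-entropy against the one-hot encoding of the
# target's bin (the softmax/one-hot pairing mentioned above).
target_bins = torch.bucketize(target, bin_centers)
ce = F.cross_entropy(logits, target_bins)

(mse + ce).backward()
</syntaxhighlight>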
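And a sketch of the second option: dividing the squared error by <math>\sigma^2</math> is the "normalize the loss based on <math>\sigma</math>" step, and the <math>\log \sigma</math> term stops the network from inflating <math>\sigma</math> to zero out the loss. Predicting <math>\log \sigma</math> rather than <math>\sigma</math> is an assumption for numerical stability:

<syntaxhighlight lang="python">
import torch

def gaussian_nll(mu, log_sigma, target):
    """Negative log-likelihood of target under N(mu, sigma^2),
    up to a constant: (target - mu)^2 / (2 sigma^2) + log sigma."""
    sigma = log_sigma.exp()  # exponentiate so sigma is always positive
    return (((target - mu) ** 2) / (2 * sigma ** 2) + log_sigma).mean()

# Toy example standing in for the network's two output heads:
mu = torch.randn(4, requires_grad=True)
log_sigma = torch.zeros(4, requires_grad=True)
target = torch.tensor([1.5, 3.0, 7.2, 9.9])

loss = gaussian_nll(mu, log_sigma, target)
loss.backward()
</syntaxhighlight>

PyTorch also ships <code>torch.nn.GaussianNLLLoss</code>, which computes essentially the same loss from a predicted variance.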