Debugging ML Models

Notes on debugging ML models, primarily CNNs.
Most of this is advice I've found online or gotten through mentors or experience.
==Debugging==
==Overfitting==
Overfitting occurs when your model begins learning attributes specific to your training data, causing your validation loss to increase.
Historically this was a big concern for ML models, and people relied heavily on regularization to address overfitting.
Recently though, overfitting has become less of a concern with larger ML models.
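In practice you can catch this by monitoring validation loss during training. A minimal Keras sketch, assuming a compiled <code>model</code> and <code>train_ds</code>/<code>val_ds</code> datasets already exist:

<syntaxhighlight lang="python">
import tensorflow as tf

# Stop training once val_loss stops improving, keeping the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(train_ds, validation_data=val_ds, epochs=100,
          callbacks=[early_stop])
</syntaxhighlight>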
==NaNs and Infs==
You can assert that tensors are finite to catch NaNs and Infs early:
<syntaxhighlight lang="python">
import tensorflow as tf

def all_finite(t):  # True if t contains no NaNs or Infs
    return bool(tf.reduce_all(tf.math.is_finite(t)))

assert all_finite(my_tensor), "my_tensor has NaNs or Infs"
# Or the built-in op, which raises InvalidArgumentError instead:
tf.debugging.assert_all_finite(my_tensor, "my_tensor has NaNs or Infs")
</syntaxhighlight>
You can track down the source of NaNs or Infs by:
* Checking that the training data has no NaNs or Infs.
* Checking that there are no divides anywhere in the code, or that all divides are safe.
** See [https://www.tensorflow.org/api_docs/python/tf/math/divide_no_nan <code>tf.math.divide_no_nan</code>] and the sketch after this list.
* Checking the gradients of trig functions in the code.
** E.g. the gradient of <math>\arcsin(x)</math> is <math>1/\sqrt{1-x^2}</math>, which becomes Inf at <math>x = \pm 1</math>.
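A minimal sketch of the first two checks (<code>features</code>, <code>numerator</code>, and <code>denominator</code> are hypothetical placeholders):

<syntaxhighlight lang="python">
import numpy as np
import tensorflow as tf

# Fail fast if the raw training data already contains NaNs or Infs.
assert np.isfinite(features).all(), "training data has NaNs or Infs"

# divide_no_nan returns 0 wherever the denominator is 0, instead of NaN/Inf.
ratio = tf.math.divide_no_nan(numerator, denominator)
</syntaxhighlight>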
Common ways to prevent NaNs include:
* Clipping values or gradients.
** For TensorFlow see [https://www.tensorflow.org/api_docs/python/tf/clip_by_norm tf.clip_by_norm] and [https://www.tensorflow.org/api_docs/python/tf/clip_by_value tf.clip_by_value].
* Using a safe divide which forces the denominator to have absolute value greater than some small EPS, as sketched below.
** Note that this can cut off gradients.
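A minimal sketch of such a safe divide (<code>EPS</code> is an illustrative tolerance, and the sign-preserving clamp is just one reasonable choice):

<syntaxhighlight lang="python">
import tensorflow as tf

EPS = 1e-8  # illustrative tolerance

def safe_divide(num, den, eps=EPS):
    # Clamp |den| to at least eps, preserving its sign (den == 0 maps to +eps).
    # Note: tf.maximum blocks gradients to den wherever |den| < eps.
    sign = tf.where(den < 0.0, -tf.ones_like(den), tf.ones_like(den))
    return num / (sign * tf.maximum(tf.abs(den), eps))
</syntaxhighlight>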
==Soft Operations==
Soft operations are differentiable stand-ins for hard, non-differentiable operations, which lets gradients flow through them during training.
One example of this is softmax, which serves as a differentiable approximation of a hard one-hot (argmax) selection.
* Rather than regressing a real value <math>x</math> directly, output a probability distribution.
** Output scores for <math>P(x=j)</math> for some fixed set of <math>j</math>, do softmax, and take the expected value, as sketched below.
** Or output <math>\mu, \sigma</math> and normalize the loss based on <math>\sigma</math>, e.g. the Gaussian negative log-likelihood <math>\frac{(x-\mu)^2}{2\sigma^2} + \log \sigma</math>.
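A minimal sketch of the expected-value approach (the bin range and count are illustrative assumptions):

<syntaxhighlight lang="python">
import tensorflow as tf

# Fixed set of candidate values j (illustrative: 101 bins over [0, 10]).
bins = tf.linspace(0.0, 10.0, 101)

def soft_regression(logits):
    # logits: [batch, 101] unnormalized scores for P(x = j).
    probs = tf.nn.softmax(logits, axis=-1)       # differentiable selection
    return tf.reduce_sum(probs * bins, axis=-1)  # expected value of x
</syntaxhighlight>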