Debugging ML Models

Notes on debugging ML models, primarily CNNs.
Most of this is advice I've found online or gotten through mentors or experience.


==Debugging==


==Overfitting==
Overfitting occurs when your model begins learning attributes specific to your training data, causing your validation loss to increase while your training loss keeps falling.
Historically this was a big concern for ML models and people relied heavily on regularization to address it.
Recently though, overfitting has become less of a concern with larger ML models.
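An easy way to catch this is to watch the validation loss and stop training once it stops improving. A minimal Keras sketch, assuming a compiled <code>model</code> and <code>train_ds</code>/<code>val_ds</code> datasets (those names are mine, not from these notes):

<syntaxhighlight lang="python">
import tensorflow as tf

# Stop once val_loss hasn't improved for 5 epochs and roll back to the
# best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
</syntaxhighlight>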

<syntaxhighlight lang="python">
assert all_finite(my_tensor), "my_tensor has NaNs or Infs"
# Or
tf.debugging.assert_all_finite(my_tensor, "my_tensor has NaNs or Infs")
</syntaxhighlight>
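If you'd rather not scatter asserts by hand, TensorFlow can inject these checks automatically. A minimal sketch; this slows training noticeably, so only enable it while hunting the bug:

<syntaxhighlight lang="python">
import tensorflow as tf

# Adds a numerics check after ops, so the first NaN/Inf raises an error
# identifying the op that produced it.
tf.debugging.enable_check_numerics()
</syntaxhighlight>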


* Checking that the training data has no NaNs or Infs.
* Checking that there are no divides anywhere in the code or that all divides are safe.
** See [https://www.tensorflow.org/api_docs/python/tf/math/divide_no_nan <code>tf.math.divide_no_nan</code>].
* Checking the gradients of trig functions in the code.


** For Tensorflow see [https://www.tensorflow.org/api_docs/python/tf/clip_by_norm tf.clip_by_norm] and [https://www.tensorflow.org/api_docs/python/tf/clip_by_value tf.clip_by_value].
* Using a safe divide which forces the denominator to have values with abs > EPS, as sketched below.
** Note that this can cut off gradients.
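A minimal sketch of such a safe divide in TensorFlow; the helper name and the value of <code>EPS</code> are mine, not a library API:

<syntaxhighlight lang="python">
import tensorflow as tf

EPS = 1e-6  # illustrative tolerance; tune for your use case

def safe_divide(num, denom, eps=EPS):
    """Divide, forcing the denominator to satisfy abs(denom) >= eps."""
    # tf.sign(0.) == 0., so compute the sign with tf.where instead to
    # keep exact zeros from yielding a zero denominator.
    sign = tf.where(denom >= 0.0, tf.ones_like(denom), -tf.ones_like(denom))
    safe_denom = tf.where(tf.abs(denom) < eps, sign * eps, denom)
    # Where the clamp kicks in, the gradient w.r.t. denom is zero --
    # the gradient cutoff noted above.
    return num / safe_denom
</syntaxhighlight>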


==Soft Operations==
One example of this is softmax, which acts as a differentiable stand-in for a hard one-hot (argmax) selection, so gradients can still flow through it.


* Rather than regressing a real value <math>x</math> directly, output a probability distribution (see the sketch after this list).
** Output scores for <math>P(x=j)</math> for some fixed set of <math>j</math>, do softmax, and take the expected value.
** Or output <math>\mu, \sigma</math> and normalize the loss based on <math>\sigma</math>.
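A minimal sketch of the softmax-plus-expected-value trick, assuming 11 fixed bins over [0, 1]; the bin count, range, and names like <code>my_model</code> and <code>x_true</code> are illustrative:

<syntaxhighlight lang="python">
import tensorflow as tf

bins = tf.linspace(0.0, 1.0, 11)        # the fixed set of values j

logits = my_model(inputs)               # shape [batch, 11], raw scores for P(x=j)
probs = tf.nn.softmax(logits, axis=-1)  # soft one-hot over the bins
x_hat = tf.reduce_sum(probs * bins, axis=-1)  # expected value, differentiable

loss = tf.reduce_mean(tf.square(x_hat - x_true))
</syntaxhighlight>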