TensorFlow

 


==Install==
* Install CUDA and CuDNN
* Create a conda environment with python 3.5+
** <code>conda create -n my_env python=3.8</code>
* Install with pip


===Install TF2===
See https://www.tensorflow.org/install/pip<br>
Install tensorflow and [https://www.tensorflow.org/addons/overview tensorflow-addons]:
<pre>
# Install cuda and cudnn if necessary
conda install cudatoolkit=11.0.221

pip install tensorflow tensorflow-addons
</pre>


* Run <code>conda search cudatoolkit</code> to see other versions of cuda available
;Notes
* On Windows, there is no cudnn available for cuda 11 in conda's repos. You will need to install this manually by downloading [https://developer.nvidia.com/cuDNN cudnn] and copying the binaries to the environment's <code>Library/bin/</code> directory.
* Note that [https://anaconda.org/anaconda/tensorflow anaconda/tensorflow] does not always have the latest version.
* If you prefer, you can install only cuda and cudnn from conda:
** See [https://www.tensorflow.org/install/source#linux https://www.tensorflow.org/install/source#linux] for a list of compatible Cuda and Cudnn versions.
** Run <code>conda search cudatoolkit</code> to see which versions of cuda are available.
** Download [https://developer.nvidia.com/cuDNN cudnn] and copy the binaries to the environment's <code>Library/bin/</code> directory.
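
After installing, it can help to confirm that TensorFlow actually sees the GPU. The following is a minimal sketch, assuming TensorFlow 2.1 or newer is installed in the active environment; an empty list usually means the cuda or cudnn libraries were not found and TensorFlow will fall back to the CPU.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means it will run on CPU only.
gpus = tf.config.list_physical_devices("GPU")
print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", gpus)
```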


===Install TF1===
Note: You will only need TF1 if working with a TF1 repo.<br>
The last official version of TensorFlow v1 is 1.15. This version does not work on RTX 3000+ (Ampere) GPUs: your code will run but output bad results.<br>
If you need TensorFlow v1, see [https://github.com/NVIDIA/tensorflow nvidia-tensorflow].

If migrating your old code, you can install TF2 and use:
* <code>import tensorflow.compat.v1 as tf</code>
* See [https://www.tensorflow.org/guide/migrate TF Guide Migrate]

See [https://www.tensorflow.org/install/source#linux https://www.tensorflow.org/install/source#linux] for a list of compatible Cuda and Cudnn versions.

<pre>
# Official TensorFlow 1.15: install compatible cuda and cudnn versions, then tensorflow
conda install cudatoolkit=10.0.130 cudnn=7.6.5
pip install tensorflow-gpu==1.15

# Or install nvidia-tensorflow
pip install nvidia-pyindex
pip install nvidia-tensorflow

# Test GPU support
python -c "import tensorflow as tf;print(tf.test.is_gpu_available())"
</pre>
;Notes
* Sometimes, I get <code>CUDNN_STATUS_INTERNAL_ERROR</code>. This is fixed by setting the environment variable <code>TF_FORCE_GPU_ALLOW_GROWTH=true</code> in my conda env. See [https://stackoverflow.com/questions/46826497/conda-set-ld-library-path-for-env-only Add env variables to conda env]
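
If you would rather not modify the conda environment, the same variable can be set from Python. This is only a sketch of one alternative: the variable has to be set before TensorFlow initializes the GPU, so the assignment goes before the TensorFlow import.

```python
import os

# Must be set before TensorFlow initializes the GPU, so do it before the import.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```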


==Usage (TF2)==
This section covers usage of TensorFlow 2, which uses eager execution by default.<br>
The examples use the Keras API in <code>tensorflow.keras</code>.
===Keras Pipeline===
[https://www.tensorflow.org/api_docs/python/tf/keras/Model tf.keras.Model]
 
The general pipeline using Keras is:
* Define a model, typically using [https://www.tensorflow.org/api_docs/python/tf/keras/Sequential tf.keras.Sequential]
* Call [https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile <code>model.compile</code>]
** Here you pass in your optimizer, loss function, and metrics.
* Train your model by calling [https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit <code>model.fit</code>]
** Here you pass in your training data, batch size, number of epochs, and training callbacks.
** For more information about callbacks, see [https://www.tensorflow.org/guide/keras/custom_callback Keras custom callbacks].


After training, you can evaluate your model on held-out data by calling [https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate <code>model.evaluate</code>].
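
The pipeline above can be sketched end to end as follows. This is a minimal illustration and not a recommended architecture: the random toy data, layer sizes, learning rate, and the <code>EarlyStopping</code> callback are all assumptions chosen only to keep the example small.

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data: 256 samples with 20 features and 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 20)).astype("float32")
y = rng.integers(0, 3, size=(256,))

# 1. Define a model.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(3),  # outputs logits
])

# 2. Compile with an optimizer, a loss function, and metrics.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# 3. Train with data, batch size, number of epochs, and callbacks.
history = model.fit(
    x, y, batch_size=32, epochs=2, verbose=0,
    callbacks=[keras.callbacks.EarlyStopping(monitor="loss", patience=1)],
)

# 4. Evaluate (here on the training data, only for brevity).
loss, accuracy = model.evaluate(x, y, verbose=0)
print("loss:", loss, "accuracy:", accuracy)
```

In practice you would pass a separate test set to <code>model.evaluate</code>.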


===Custom Models===
You can write your own training loop by doing the following:
<syntaxhighlight lang="python">
import tensorflow as tf
from tensorflow import keras

my_model = keras.Sequential([
    keras.Input(shape=(400,)),
    keras.layers.Dense(400, activation='relu'),
    keras.layers.Dense(400, activation='relu'),
    keras.layers.Dense(400, activation='relu'),
    # (the remaining layers and the training loop are not shown in this excerpt)
])
</syntaxhighlight>
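
For reference, a custom training loop in TF2 is commonly written with <code>tf.GradientTape</code>. The sketch below is an assumption about what such a loop can look like, not the loop from this page: the toy regression data, the small model, the loss, and the optimizer settings are all hypothetical.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy regression data.
x = tf.random.normal((128, 400))
y = tf.random.normal((128, 1))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

model = keras.Sequential([
    keras.Input(shape=(400,)),
    keras.layers.Dense(400, activation="relu"),
    keras.layers.Dense(1),
])
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.MeanSquaredError()

for epoch in range(2):
    for x_batch, y_batch in dataset:
        # Record the forward pass so gradients can be computed afterwards.
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)
            loss = loss_fn(y_batch, predictions)
        # Backpropagate and apply one optimizer step.
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    print("epoch", epoch, "last batch loss", float(loss))
```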


==Usage (TF1)==
In TF1, you first build a computational graph by chaining operations on placeholders, constants, and variables.<br>
Then, you execute the graph in a <code>tf.Session()</code>.
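
The two phases can be seen in a very small sketch. It assumes you are running through the <code>tf.compat.v1</code> API (which also works under TF2 once eager execution is disabled); the names <code>x_in</code>, <code>scale</code>, and <code>y_out</code> are only illustrative.

```python
import numpy as np
import tensorflow as tf

# Only needed when running under TF2, where eager execution is on by default.
tf.compat.v1.disable_eager_execution()

# Phase 1: build the graph. Nothing is computed yet.
x_in = tf.compat.v1.placeholder(dtype=tf.float32, shape=(None, 3))
scale = tf.constant(2.0)
y_out = tf.reduce_sum(x_in * scale, axis=1)

# Phase 2: execute the graph, feeding a value for the placeholder.
with tf.compat.v1.Session() as sess:
    result = sess.run(y_out, {x_in: np.array([[1.0, 2.0, 3.0]], dtype=np.float32)})
print(result)  # (1 + 2 + 3) * 2 = 12
```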
{{hidden | TF1 MNIST Example |
<syntaxhighlight lang="python">
import numpy as np
import tensorflow as tf
from tensorflow import keras

NUM_EPOCHS = 10
BATCH_SIZE = 64

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
rng = np.random.default_rng()

classification_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(16, 3, padding="SAME"),
    keras.layers.ReLU(),
    keras.layers.Conv2D(16, 3, padding="SAME"),
    keras.layers.ReLU(),
    keras.layers.Flatten(),
    keras.layers.Dense(10),  # no activation: these are the logits fed to the softmax loss
])

# Build the graph: placeholders for the inputs, then the loss and the training op.
x_in = tf.compat.v1.placeholder(dtype=tf.float32, shape=(None, 28, 28, 1))
logits = classification_model(x_in)
gt_classes = tf.compat.v1.placeholder(dtype=tf.int32, shape=(None,))
loss = tf.compat.v1.losses.softmax_cross_entropy(tf.one_hot(gt_classes, 10), logits)
train_op = tf.compat.v1.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)

# Execute the graph in a session.
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    global_step = 0
    for epoch in range(NUM_EPOCHS):
        x_count = x_train.shape[0]
        # Visit the training images in a new random order each epoch.
        image_ordering = rng.permutation(x_count)
        current_idx = 0
        while current_idx < x_count:
            my_indices = image_ordering[current_idx:current_idx + BATCH_SIZE]
            # Add a channel axis and scale pixel values to [0, 1].
            x = x_train[my_indices][:, :, :, None] / 255
            logits_val, loss_val, _ = sess.run((logits, loss, train_op), {
                x_in: x,
                gt_classes: y_train[my_indices]
            })
            if global_step % 100 == 0:
                print("Loss", loss_val)
            current_idx += BATCH_SIZE
            global_step += 1
</syntaxhighlight>
}}


===Batch Normalization===
See [https://www.tensorflow.org/api_docs/python/tf/compat/v1/layers/batch_normalization <code>tf.compat.v1.layers.batch_normalization</code>].

When training with batchnorm, you need to run the ops in <code>tf.GraphKeys.UPDATE_OPS</code> in your session; otherwise the batchnorm moving mean and variance will not be updated.
These variables do not contribute to the loss when <code>training</code> is true, so they will not be updated by the optimizer.
<syntaxhighlight lang="python">
# optimizer is an optimizer instance, e.g. tf.compat.v1.train.AdamOptimizer
update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
train_op = optimizer.minimize(loss)
# Group the moving-average updates with the training step so one sess.run(train_op) runs both.
train_op = tf.group(train_op, *update_ops)
</syntaxhighlight>
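
To see why the optimizer never touches these variables, it helps to look at what an update op does. The following is a plain NumPy sketch of the moving-mean update, not TensorFlow's implementation: it assumes a momentum of 0.99 (the default for <code>tf.compat.v1.layers.batch_normalization</code>) and uses made-up batch statistics. The update is a direct assignment with no gradient involved, which is why it has to be run explicitly.

```python
import numpy as np

momentum = 0.99  # assumed; matches the layer's default momentum
moving_mean = np.zeros(2)
rng = np.random.default_rng(0)

for _ in range(1000):
    # Hypothetical batch of 64 samples whose true feature means are 1 and -2.
    batch = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(64, 2))
    batch_mean = batch.mean(axis=0)
    # This is the kind of assignment an update op performs; no gradient is involved.
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean

print(moving_mean)  # close to the true means after enough steps
```

If the update is skipped, <code>moving_mean</code> stays at its initial value, and inference (which uses the moving statistics) gives poor results even though the training loss looks fine.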