While using an HPC cluster to train a deep learning text classification model, I needed to set up an environment with tensorflow-gpu, tensorflow-text, and tensorflow_hub. It didn't go well: none of the online resources I found produced a working setup. At the time of writing, running conda install tensorflow-gpu installed tensorflow-gpu==2.4.0, which was incompatible with the tensorflow-text version, which was 2.10.0. I then tried installing a tensorflow-text version that matched tensorflow-gpu, but the oldest version available on conda was 2.5.0, which was still incompatible. Long story short, after hours of struggling with the issue, here is the solution that worked for me for setting up an HPC cluster to train deep learning models with TensorFlow on GPU resources.
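The mismatch above comes down to tensorflow-text releases being tied to the TensorFlow release with the same major.minor number. As a rough sketch of that rule, here is a small bash helper that compares two version strings. The name same_minor is hypothetical, and comparing only the first two version components is my simplification, not an official compatibility check.

```shell
# Hypothetical helper: succeeds when two versions share the same major.minor prefix.
same_minor() {
  local a b
  a=$(printf '%s' "$1" | cut -d. -f1,2)   # e.g. 2.4.0  -> 2.4
  b=$(printf '%s' "$2" | cut -d. -f1,2)   # e.g. 2.10.0 -> 2.10
  [ "$a" = "$b" ]
}

# The two pairs of versions from the story above.
same_minor 2.4.0 2.10.0 && echo "compatible" || echo "mismatch"   # prints "mismatch"
same_minor 2.4.0 2.5.0 && echo "compatible" || echo "mismatch"    # prints "mismatch"
```

Both combinations I tried fail this check, which matches what I saw in practice.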

1- Install Miniconda on the machine and initialize it.
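For reference, step 1 might look like the sketch below. It assumes a Linux x86_64 machine, a bash login shell, and an install location of $HOME/miniconda3; none of these come from my original setup notes, so adjust them for your cluster.

```shell
# Download the latest Miniconda installer for Linux x86_64 (assumed platform).
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# -b runs the installer without prompts, -p sets the install prefix (assumed location).
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"

# Add conda's shell hook to ~/.bashrc, then reload it so "conda activate" works.
"$HOME/miniconda3/bin/conda" init bash
source ~/.bashrc
```

Installing under your home directory avoids needing root access, which is usually not available on a shared cluster.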

2- Run the following commands to set up the environment for tensorflow-gpu:

Of course, you can replace the TensorFlow version pinned in these commands with any version that you want.
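The commands for step 2 could look like the following sketch. It is based on the TensorFlow pip install guide listed in the references rather than on an exact copy of what I ran, so treat the specifics as assumptions: Python 3.9, TensorFlow 2.10.0, and the CUDA 11.2 / cuDNN 8.1.0 pair that the guide lists for that release. The environment name tf_gpu is the one used in step 3.

```shell
# Assumed version; change it to the TensorFlow release you need.
TF_VERSION=2.10.0

# Create and activate the environment (the name tf_gpu matches step 3).
conda create -y -n tf_gpu python=3.9
conda activate tf_gpu

# CUDA toolkit and cuDNN versions that pair with TensorFlow 2.10 (assumed pairing).
conda install -y -c conda-forge cudatoolkit=11.2 cudnn=8.1.0

# Let TensorFlow find the CUDA libraries installed inside the environment.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/"

# Install from pip so that tensorflow and tensorflow-text share the same version.
pip install --upgrade pip
pip install "tensorflow==${TF_VERSION}" "tensorflow-text==${TF_VERSION}" tensorflow-hub
```

Pinning tensorflow and tensorflow-text to the same version is what avoids the mismatch I hit with conda. If you change TF_VERSION, check which CUDA and cuDNN versions that release expects, and note that tensorflow-text may not publish every patch release, so you may need a different patch number within the same major.minor series. The export line only lasts for the current shell session, so it has to be repeated (or added to your job script) each time.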

3- To check that tensorflow-gpu has been set up properly to use the GPU resources, run the following commands:

conda activate tf_gpu

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
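Since the original problem was tensorflow-text refusing to work with the installed TensorFlow, it can also be worth confirming that all three libraries import together. This is a minimal sketch that assumes the packages were installed into the tf_gpu environment as above.

```shell
conda activate tf_gpu

# Importing tensorflow_text fails loudly if it does not match the installed TensorFlow.
python -c "import tensorflow as tf, tensorflow_text, tensorflow_hub; print(tf.__version__)"
```

If the GPU check prints an empty list, TensorFlow does not see a GPU. On many clusters the login node has no GPU at all, so it is worth repeating the check from inside a GPU job before concluding that the installation is broken.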

Happy coding!

References

1- https://www.tensorflow.org/install/pip