While training a deep learning text classification model on an HPC cluster, I needed to set up an environment with tensorflow-gpu, tensorflow-text, and tensorflow_hub. It didn't go well: none of the online resources I found produced a working setup. At the time of writing, conda install tensorflow-gpu installed tensorflow-gpu==2.4.0, which was incompatible with the current tensorflow-text release, 2.10.0. I then tried installing the tensorflow-text version matching tensorflow-gpu, but the oldest version available on conda was 2.5.0, which was still incompatible. Long story short, after hours of struggling with the issue, here is the solution that worked for me for setting up an HPC cluster to train deep learning models with TensorFlow on GPU resources.
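Because tensorflow-text ships compiled ops built against a specific TensorFlow release, the two packages need to agree on their major.minor version (for example, tensorflow 2.10.x with tensorflow-text 2.10.x). Below is a minimal sketch for checking this in an environment before training. The function names same_minor and installed_version are hypothetical helpers, and the sketch assumes pip is available in the active environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two version strings share major.minor,
# e.g. 2.10.1 and 2.10.0 match, while 2.4.0 and 2.10.0 do not.
# Comparing as strings avoids treating "2.1" and "2.10" as equal numbers.
same_minor() {
  local a b
  a=$(printf '%s\n' "$1" | cut -d. -f1,2)
  b=$(printf '%s\n' "$2" | cut -d. -f1,2)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Hypothetical helper: print the installed version of a pip package (empty if missing)
installed_version() {
  python -m pip show "$1" 2>/dev/null | sed -n 's/^Version: //p'
}

# Usage (inside the activated environment):
#   same_minor "$(installed_version tensorflow)" "$(installed_version tensorflow-text)" \
#     && echo "compatible" || echo "version mismatch"
```

With the versions from the story, same_minor 2.4.0 2.10.0 fails, which is exactly the mismatch that broke the conda-only install.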
1- Install Miniconda on the machine and initialize it.
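For step 1, a minimal sketch of an unattended Miniconda install on a Linux x86_64 node might look like this. The download URL is Anaconda's "latest" Miniconda installer; the function name install_miniconda and the default prefix $HOME/miniconda3 are assumptions. On clusters with small home-directory quotas, you may prefer to pass a prefix on a project or scratch file system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unattended Miniconda install plus shell initialization.
# Assumes Linux x86_64 and wget; pass another prefix as $1 if $HOME is small.
install_miniconda() {
  local prefix="${1:-$HOME/miniconda3}"
  local installer="Miniconda3-latest-Linux-x86_64.sh"
  local tmp="${TMPDIR:-/tmp}/${installer}"

  wget "https://repo.anaconda.com/miniconda/${installer}" -O "$tmp" || return 1
  # -b: batch mode (no prompts, accepts the license); -p: installation prefix
  bash "$tmp" -b -p "$prefix" || return 1
  rm -f "$tmp"
  # Add conda's shell hook to ~/.bashrc so `conda activate` works in new shells
  "$prefix/bin/conda" init bash
}
```

After running install_miniconda (or install_miniconda /path/to/prefix), open a new shell or run source ~/.bashrc so the conda command is available.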
2- Run the following commands to set up the environment for tensorflow-gpu:
Of course, you can replace the version in line 5 with any TensorFlow version you need, as long as tensorflow-text is installed at the matching version.
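A sketch of such a setup, following the conda-based GPU recipe from the official TensorFlow pip installation guide as it stood around TensorFlow 2.10 and extended with tensorflow-text and tensorflow-hub, could look like the function below. It is not the author's script: the function name setup_tf_gpu_env, Python 3.9, and the CUDA 11.2 / cuDNN 8.1 pins (TensorFlow's tested configuration for 2.10) are assumptions to adjust for your TensorFlow version. Since TensorFlow 2.1, the regular tensorflow pip package includes GPU support, so the sketch installs tensorflow rather than a separate tensorflow-gpu package.

```shell
#!/usr/bin/env bash
# Hypothetical helper modeled on the official TensorFlow pip guide (circa TF 2.10);
# not the original author's commands. Source this file so the activated env persists.
setup_tf_gpu_env() {
  local tf_minor="${1:-2.10}"   # major.minor shared by tensorflow and tensorflow-text

  eval "$(conda shell.bash hook)"   # makes `conda activate` usable inside scripts
  conda create --name tf_gpu python=3.9 -y || return 1
  conda activate tf_gpu || return 1

  # CUDA/cuDNN pins assumed for TF 2.10; check TensorFlow's tested build configurations for other versions
  conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0 -y || return 1

  # Point the dynamic linker at the env's CUDA libraries now and on every future activation
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CONDA_PREFIX/lib/"
  mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"
  echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' \
    > "$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh"

  # The ".*" wildcard keeps tensorflow and tensorflow-text on the same minor version
  python -m pip install "tensorflow==${tf_minor}.*" "tensorflow-text==${tf_minor}.*" tensorflow-hub
}
```

Usage: save it as setup_tf_gpu.sh, then run source setup_tf_gpu.sh && setup_tf_gpu_env 2.10. Wrapping the steps in a function that you source, rather than a standalone script, means tf_gpu stays activated in your current shell when the function returns.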
3- To verify that tensorflow-gpu has been set up properly and can use the GPU resources, run the following commands. If the setup works, the printed list contains at least one GPU device:
conda activate tf_gpu
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
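A slightly more thorough check, sketched below, also prints the installed package versions and runs a small matrix multiplication on the first GPU, which can surface CUDA library problems that listing devices alone may not. The function name check_tf_gpu is hypothetical. On many HPC clusters the login nodes have no GPUs, so run it inside a GPU job (for example, an interactive session from your scheduler); an empty list from list_physical_devices means TensorFlow sees no GPU on that node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report versions and GPU visibility for the active conda env
check_tf_gpu() {
  python - <<'EOF'
import importlib.metadata as md

import tensorflow as tf

for pkg in ("tensorflow", "tensorflow-text", "tensorflow-hub"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")

print("Built with CUDA:", tf.test.is_built_with_cuda())
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

if gpus:
    # A small matmul on the first GPU exercises the CUDA math libraries
    with tf.device("/GPU:0"):
        x = tf.random.uniform((1024, 1024))
        print("matmul checksum:", float(tf.reduce_sum(tf.matmul(x, x))))
EOF
}
```

Usage: conda activate tf_gpu && check_tf_gpu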
Happy coding!