I’m a computer engineering student. In my lab we have a server with a dual Xeon and dual NVIDIA GPU configuration.
We need to set it up as a remote server where people with an account can access a Jupyter notebook with one GPU assigned to it. Every time a user tries to run a job (a deep learning algorithm), a Docker container running a job-scheduling process should manage the execution: permit or deny it at that time, or defer it to run overnight.
So we think the right configuration is this:
CentOS 7 with the NVIDIA driver and cuDNN; after this we updated the Python interpreter to 3.6.8, the minimum version compatible with TensorFlow and Keras.
The second step was installing Docker, followed by nvidia-docker.
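To confirm the base stack is in place before moving on to Docker, a quick sanity check can be run on the server. This is only a sketch; the cuDNN header path below is an assumption (adjust it to wherever cuDNN was unpacked):

```shell
# Sketch of a post-install sanity check for the driver/cuDNN/Python stack.
# The cuDNN header location is an assumption -- adjust to your install.
verify_stack() {
    nvidia-smi --list-gpus                                 # driver should list both GPUs
    grep -m1 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h   # cuDNN major version
    python3 --version                                      # should print Python 3.6.8
}
# On the server, run: verify_stack
```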
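For reference, the CentOS 7 install steps for nvidia-docker2 can be sketched as below. The repo URL and package name follow NVIDIA's published instructions for yum-based distributions; everything is wrapped in a function so you can review it before running anything:

```shell
# Sketch of the nvidia-docker2 install on CentOS 7, following NVIDIA's
# documented repo setup. Wrapped in a function so nothing runs by accident.
install_nvidia_docker() {
    distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")   # e.g. centos7
    curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo" \
        | sudo tee /etc/yum.repos.d/nvidia-docker.repo
    sudo yum install -y nvidia-docker2
    sudo systemctl restart docker    # pick up the nvidia runtime
}
# On the server, run: install_nvidia_docker
```

With Docker 19.03 or newer, the `--gpus` flag is the supported way to attach GPUs to a container, which matches the "GPU option" that already works for us.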
Next we created a container in which we installed Python 3.6.8, TensorFlow, and Keras, but every time we stop this container we have to reinstall all the packages. Running with nvidia-docker gives us some problems when we use the option, but if we use the GPU option when we start the container it works well.
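The packages disappear because they are installed inside a running container rather than baked into an image. Writing a Dockerfile and building an image fixes this: every container started from the image already has the packages. A minimal sketch, assuming the official TensorFlow GPU base image (the tag below is an assumption; pick the one matching your CUDA version):

```shell
# Write a Dockerfile so TensorFlow/Keras/Jupyter are baked into an image
# instead of installed by hand in a running container. The base-image tag
# is an assumption -- choose the GPU tag that matches your CUDA version.
cat > Dockerfile <<'EOF'
FROM tensorflow/tensorflow:1.13.1-gpu-py3
RUN pip install --no-cache-dir keras jupyter
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
EOF
# Build once; then start as many containers as you like from the image:
#   docker build -t lab/tf-keras .
#   docker run --gpus 1 -p 8888:8888 -v /data:/data lab/tf-keras
```

The `-v /data:/data` mount (a hypothetical path) is how the container and the host share the same HDD location for uploaded files.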
So my question is: how can we configure the Docker environment correctly to get this infrastructure:
user’s own PC - connects to -> net-docker on the server - permits access to -> Jupyter notebook - which is connected to -> TensorFlow/Keras Docker container - which is managed by -> Docker container with job-scheduling software/script?
Let me explain in more detail:
- own PC is the user’s personal PC, from which he/she connects to the server through a browser with his/her credentials
- net-docker is a Docker container that manages the server’s inbound/outbound network connections and redirects them to the Jupyter notebook
- the Jupyter notebook container keeps Jupyter always running and has access to the server’s HDD, so users can upload the files their algorithms need
- this Jupyter notebook is connected to a TensorFlow/Keras Docker container that accesses the same HDD location and has TensorFlow and Keras configured to execute the deep learning algorithms
- finally, a Docker container with a job-scheduling script or software manages the execution queue of the deep learning algorithms on the GPUs; in particular, each algorithm must use exactly one GPU!
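The "one job per GPU" constraint from the list above can be sketched with lock directories (`mkdir` is atomic, so it doubles as a mutex). A real scheduler such as Slurm handles this properly; this only illustrates the queueing idea, and all paths are assumptions:

```shell
# Minimal sketch of one-job-per-GPU allocation using mkdir-based locks.
# In production, use one fixed LOCK_DIR shared by every scheduler process.
LOCK_DIR=$(mktemp -d)

acquire_gpu() {          # prints the index of a free GPU, or fails
    for gpu in 0 1; do
        if mkdir "$LOCK_DIR/gpu$gpu" 2>/dev/null; then
            echo "$gpu"
            return 0
        fi
    done
    return 1             # both GPUs busy: the caller should queue the job
}

release_gpu() {          # usage: release_gpu <index>
    rmdir "$LOCK_DIR/gpu$1"
}

# A scheduler loop would call acquire_gpu before launching
#   docker run --gpus "device=$gpu" ...
# and, on failure, re-queue the job or defer it to a night-time window.
```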
I know this question is very long, but I think all the information I have given will help me and my colleague configure this environment correctly.
If more information is needed, do not hesitate to ask.
Thank you for your help.