Hi everyone,
I am new to Docker and would like to know how to manage GPUs in a Docker swarm. I have three Linux servers, each equipped with two NVIDIA GPUs (GPU 0 and GPU 1).
What I have done so far:
1. I created a swarm consisting of one manager and two workers;
2. I followed the instructions in the post "Instructions for Docker swarm with GPUs" (GitHub) and exposed the GPU resources on the worker nodes;
3. I can now create Docker services that use GPUs across the swarm.
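For context, step 2 boils down to advertising each GPU as a generic resource in /etc/docker/daemon.json on every worker. The sketch below only illustrates the shape of that file and is not my exact configuration: the UUIDs are placeholders, the resource name `gpu` and the runtime path are assumptions, and `build_daemon_config` is a hypothetical helper.

```python
import json


def build_daemon_config(gpu_uuids, resource_name="gpu"):
    """Build a daemon.json-style dict that advertises one generic resource per GPU.

    Hypothetical helper for illustration; resource_name and the runtime path
    below are assumptions, not values confirmed for my machines.
    """
    return {
        "runtimes": {
            "nvidia": {
                # Assumed install location of the NVIDIA container runtime.
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": [],
            }
        },
        "default-runtime": "nvidia",
        # One "name=UUID" entry per GPU, so the swarm scheduler can count them.
        "node-generic-resources": [f"{resource_name}={uuid}" for uuid in gpu_uuids],
    }


# Placeholder UUIDs; the real ones would come from `nvidia-smi -L` on each node.
example_uuids = ["GPU-aaaaaaaa", "GPU-bbbbbbbb"]
config = build_daemon_config(example_uuids)
print(json.dumps(config, indent=2))
```

With a file like this in place, a service asks for a GPU with `docker service create --generic-resource "gpu=1" ...`, which is how I start my services in step 3.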
Problems I am facing:
1. Only GPU 0 on each node seems to be used for running the services, likely because each instance of my service container uses a single GPU. GPU 1 therefore sits idle.
2. The Docker containers have access to all the GPUs, even if I expose only one.
I would like to know how to set up a swarm with one GPU per worker so that I can make the best use of my GPUs. Any comments and suggestions would be appreciated.
Thank you!
Shijie