Manage GPUs in a docker swarm

Hi everyone,

I am new to Docker and curious about how to manage GPUs in a Docker swarm. I have three Linux servers, and each machine is equipped with two NVIDIA GPUs (GPU 0 and GPU 1).

What I have done so far:
1. I created a swarm consisting of one manager and two workers;
2. I followed the instructions in this post Instructions for Docker swarm with GPUs · GitHub and exposed the GPU resources on the worker nodes;
3. Now I can create Docker services that use the GPUs distributed across this swarm (a rough sketch of the setup is below).
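
In case it helps, here is roughly what that setup looks like on each worker and on the manager. Treat it as a sketch: the GPU UUIDs, the service name, and the image name are placeholders, not values copied from my machines.

```bash
# On each worker: advertise the GPUs as generic resources in /etc/docker/daemon.json.
# The UUIDs below are placeholders; list the real ones with `nvidia-smi -L`.
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-aaaaaaaa",
    "NVIDIA-GPU=GPU-bbbbbbbb"
  ]
}
EOF

# Uncomment the swarm-resource line in /etc/nvidia-container-runtime/config.toml:
#   swarm-resource = "DOCKER_RESOURCE_GPU"

# Restart Docker so the node advertises its GPUs to the swarm.
sudo systemctl restart docker

# On the manager: each replica reserves one generic GPU resource.
docker service create \
  --name gpu-service \
  --replicas 6 \
  --generic-resource "NVIDIA-GPU=1" \
  my-gpu-image:latest
```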

Problems I am facing:
1. It seems that only GPU 0 of each node is used to run the services, probably because each of my service containers uses a single GPU, so GPU 1 sits idle and is wasted.
2. Each container actually has access to all of the GPUs, even though I expose only one GPU to it (a quick check illustrating this is below).
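
For reference, a check along these lines shows the mismatch (it assumes nvidia-smi is available inside the image and reuses the placeholder service name from the sketch above):

```bash
# On a worker, find one of the service's task containers.
docker ps --filter "name=gpu-service"

# The reservation that swarm passed down to the container (a single GPU)...
docker exec <container-id> env | grep DOCKER_RESOURCE

# ...versus the devices the container can actually see (every GPU on the node).
docker exec <container-id> nvidia-smi -L
```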

I wonder how to set up the swarm so that each container gets its own single GPU and I can make the best use of all the GPUs. Any comments and suggestions would be appreciated.

Thank you!
Shijie

Hello Shijie,

I am having the exact same issue. Each container in the swarm is told to use a specific GPU, yet all of the GPUs are exposed to it at the same time, which results in every container trying to use only the first GPU instead of the GPU ID it was assigned.

Were you able to find a solution to this issue?

Well, after many days I have finally found the solution:

1. Use the complete GPU UUIDs (as listed by nvidia-smi -L) when advertising the GPUs as node generic resources, not the shortened form.
2. In /etc/nvidia-container-runtime/config.toml, change “DOCKER_RESOURCE_GPU” to “DOCKER_RESOURCE_NVIDIA-GPU” so that the swarm-resource name matches the “NVIDIA-GPU” resource you advertise.

After making those changes, the Docker swarm is able to select the correct GPU on machines with multiple GPUs.
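
To make the two changes concrete, this is roughly what it boils down to on each node; the UUIDs below are placeholders for the full values reported by nvidia-smi -L:

```bash
# 1. Get the full UUID of every GPU on the node.
nvidia-smi -L

# 2. In /etc/docker/daemon.json, advertise the GPUs with their complete UUIDs:
#      "node-generic-resources": [
#        "NVIDIA-GPU=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
#        "NVIDIA-GPU=GPU-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
#      ]

# 3. In /etc/nvidia-container-runtime/config.toml, make the key match the
#    resource name used above:
#      swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU"

# 4. Restart Docker on the node so both changes take effect.
sudo systemctl restart docker
```

As far as I understand it, swarm injects the reserved UUID into the container through an environment variable named after the resource (DOCKER_RESOURCE_NVIDIA-GPU here), and the nvidia-container-runtime then exposes only the matching GPU once the swarm-resource key points at that variable.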
