
Swarm - attach to network and run with gpu

Hi all. I want to start a service in swarm with a GPU resource (using the nvidia runtime) and a custom overlay network. When I start the service like this

docker service create --with-registry-auth --generic-resource "gpu=1" --name=test --constraint=node.id==50pbc33tbompfiiu1n61khyc5 --network=myinternal busybox:latest sh -c "while true; do echo Hello; sleep 2; done"

I get the error "node is missing network attachments, ip addresses may be exhausted", followed by "assigned node no longer meets constraints":

ID                          NAME                IMAGE                                                                                    NODE                DESIRED STATE       CURRENT STATE             ERROR                                                                  PORTS
yonzgcjx8793nxf2jbuvpdukq    \_ test.1     busybox:latest@sha256:d366a4665ab44f0648d7a00ae3fae139d55e32f9712c67accd604bb55df9d05a   node-4             Shutdown            Rejected 19 seconds ago   "assigned node no longer meets constraints"
3a3wrspme0m5ureu69dd9wpju    \_ test.1     busybox:latest@sha256:d366a4665ab44f0648d7a00ae3fae139d55e32f9712c67accd604bb55df9d05a   node-4             Shutdown            Rejected 19 seconds ago   "node is missing network attachments, ip addresses may be exhausted"

The service starts fine if I remove either --network or --generic-resource. The overlay network myinternal is empty (there are no other services or containers in it), and I can't understand how it could be exhausted. Network inspect:

docker network inspect e0fs28o8t7pq
[
    {
        "Name": "myinternal",
        "Id": "e0fs28o8t7pqgc5p2jusa662g",
        "Created": "2020-10-08T06:46:38.851827933Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.9.8.1/16",
                    "Gateway": "10.9.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": null
    }
]
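
If it helps, a manager node can also list the swarm-scoped allocations on the network (I believe the --verbose flag shows which services and tasks hold addresses), which should rule out real exhaustion:

docker network inspect --verbose myinternal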

Where is my mistake?

Start by installing the appropriate NVidia drivers. Then continue to install NVidia Docker.

Verify with docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi.

Configuring Docker to work with your GPU(s)
The first step is to identify the GPU(s) available on your system. Docker will expose these as ‘resources’ to the swarm. This allows other nodes to place services (swarm-managed container deployments) on your machine.

These steps are currently for NVidia GPUs.

Docker identifies your GPU by its Universally Unique IDentifier (UUID). Find the GPU UUID for the GPU(s) in your machine.

nvidia-smi -a
A typical UUID looks like GPU-45cbf7b3-f919-7228-7a26-b06628ebefa1. Now, only take the first two dash-separated parts, e.g.: GPU-45cbf7b3.
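
If you only want the UUIDs and not the full report, nvidia-smi's query interface should give them directly (you still need to trim the result to the first two dash-separated parts yourself):

nvidia-smi --query-gpu=uuid --format=csv,noheader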

Open up the Docker engine configuration file, typically at /etc/docker/daemon.json.

Add the GPU ID to node-generic-resources. Make sure the nvidia runtime is present and set default-runtime to it. Keep any other configuration options that are already there. Take care with the JSON syntax, which is not forgiving of single quotes and trailing commas.

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "node-generic-resources": [
        "gpu=GPU-45cbf7b3"
    ]
}
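
Before restarting, it is worth checking that the file still parses as valid JSON; any validator works, for example:

python3 -m json.tool /etc/docker/daemon.json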
Now, make sure to enable GPU resource advertising by adding or uncommenting the following line in /etc/nvidia-container-runtime/config.toml:

swarm-resource = "DOCKER_RESOURCE_GPU"
Restart the service.

sudo systemctl restart docker.service
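
Once the daemon is back up, the node should advertise the GPU as a generic resource. From a manager you should be able to confirm it with something along the lines of:

docker node inspect <node-name> --format '{{json .Description.Resources.GenericResources}}'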

Thanks for your answer. But I already have the nvidia runtime set up, as I wrote in my first message. I can start nvidia-specific containers like nvidia/cuda:10.0-base and everything works as intended. Problems only appear when I start a service with a GPU and an overlay network at the same time. This way it works:

docker service create --with-registry-auth --name=test --constraint=node.id==50pbc33tbompfiiu1n61khyc5 --network=myinternal busybox:latest sh -c "while true; do echo Hello; sleep 2; done"

and this way it works

docker service create --with-registry-auth --generic-resource "gpu=1" --name=test --constraint=node.id==50pbc33tbompfiiu1n61khyc5 busybox:latest sh -c "while true; do echo Hello; sleep 2; done"

but not this

docker service create --with-registry-auth --generic-resource "gpu=1" --name=test --constraint=node.id==50pbc33tbompfiiu1n61khyc5 --network=myinternal busybox:latest sh -c "while true; do echo Hello; sleep 2; done"

(note the --network and --generic-resource parameters).

You can safely ignore lewish95’s responses. It’s a bot! You can find all of them by googling them yourself. The last one is taken from https://gist.github.com/tomlankhorst/33da3c4b9edbde5c83fc1244f010815c.

The mere existence of this bot is proof that the whole Docker forum is completely unmoderated ^^

@muxlevator, did you manage to solve this? I’m hitting this issue as well.
However, I hit this error whenever I use generic resources; it does not depend on whether the service has a network attached or not.

I tried a lot of things but can’t figure out why this is happening.
I haven’t taken the time to look at the code yet.

I tried to set this up with NVidia drivers 450 and 455, on Ubuntu 18.04 and 20.04, without success. I also tried using only node-generic-resources in my daemon.json (without installing nvidia-docker).
I feel like it is a regression, since I used to be able to do this on another machine. Or maybe I missed some config somewhere…

Anyway, if you have any findings feel free to share; I’ll post here if I find something. Cheers.

PS: one thing I didn’t mention is that I use GPU passthrough before using it with Docker.

@nokidev, I resolved my issue by removing --generic-resource. This way you get GPU support (nvidia-runtime magic?) AND the network. I think poor swarm support from NVIDIA is to blame. You can check GPU availability from Docker by running nvidia-smi in a nvidia/cuda:10.0-base container.
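
For example, something like this (just my earlier command with the cuda image swapped in; adjust the node ID and network to yours):

docker service create --with-registry-auth --name=test --constraint=node.id==50pbc33tbompfiiu1n61khyc5 --network=myinternal nvidia/cuda:10.0-base sh -c "nvidia-smi; while true; do sleep 2; done"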

And I don’t know a thing about gpu passthrough.

Thanks @muxlevator,
I gave the nvidia-runtime black magic a shot and it worked.
I’m pretty sad because generic-resources used to work.

I also had a few issues with passthrough, but was able to solve them and now everything works normally inside Docker.

Thanks again,
Cheers