Swarm - attach to a network and run with a GPU

Hi all. I want to start a service in swarm with a GPU resource (using the nvidia runtime) and a custom overlay network. When I start the service like this:

docker service create --with-registry-auth --generic-resource "gpu=1" --name=test --constraint=node.id==50pbc33tbompfiiu1n61khyc5 --network=myinternal busybox:latest sh -c "while true; do echo Hello; sleep 2; done"

I get the error "node is missing network attachments, ip addresses may be exhausted", followed by "assigned node no longer meets constraints":

ID                          NAME                IMAGE                                                                                    NODE                DESIRED STATE       CURRENT STATE             ERROR                                                                  PORTS
yonzgcjx8793nxf2jbuvpdukq    \_ test.1     busybox:latest@sha256:d366a4665ab44f0648d7a00ae3fae139d55e32f9712c67accd604bb55df9d05a   node-4             Shutdown            Rejected 19 seconds ago   "assigned node no longer meets constraints"
3a3wrspme0m5ureu69dd9wpju    \_ test.1     busybox:latest@sha256:d366a4665ab44f0648d7a00ae3fae139d55e32f9712c67accd604bb55df9d05a   node-4             Shutdown            Rejected 19 seconds ago   "node is missing network attachments, ip addresses may be exhausted"

The service starts fine if I remove either --network or --generic-resource. The overlay network myinternal is empty (there are no other services or containers on it), so I can’t understand how its IP addresses could be exhausted. Network inspect:

docker network inspect e0fs28o8t7pq
[
    {
        "Name": "myinternal",
        "Id": "e0fs28o8t7pqgc5p2jusa662g",
        "Created": "2020-10-08T06:46:38.851827933Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.9.8.1/16",
                    "Gateway": "10.9.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": null
    }
]

Where is my mistake?
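To narrow down which side is failing, it can help to check whether the target node advertises a `gpu` generic resource at all and to read the untruncated task errors. A minimal sketch, assuming the `test` service and the node ID from the command above; the `check_gpu_node` function name and the overridable `DOCKER` variable are hypothetical, added only so the commands can be dry-run:

```shell
#!/bin/sh
# Hypothetical helper: DOCKER defaults to the real CLI, but can be set to
# "echo" to print the commands instead of running them.
check_gpu_node() {
  node_id="$1"
  service="$2"
  # Generic resources the node advertises; it should list "gpu" if
  # --generic-resource "gpu=1" is meant to be satisfiable on this node.
  ${DOCKER:-docker} node inspect \
    --format '{{json .Description.Resources.GenericResources}}' "$node_id"
  # Full, untruncated error message for every task of the service.
  ${DOCKER:-docker} service ps --no-trunc "$service"
}
```

Usage: `check_gpu_node 50pbc33tbompfiiu1n61khyc5 test`. An empty or `null` result from the first command means the node is not advertising any generic resources.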

@muxlevator, did you manage to solve this? I’m hitting this issue as well.
However, in my case the error appears whenever I use generic resources, regardless of whether the service has a network attached.

I tried a lot of things but can’t figure out why this is happening.
I haven’t taken the time to look at the code yet.

I tried to set this up with NVIDIA drivers 450 and 455, on Ubuntu 18.04 and 20.04, without success. I also tried using only node-generic-resources in my daemon.json (without installing nvidia-docker).
It feels like a regression, since I used to be able to do this on another machine. Or maybe I missed a config setting somewhere…
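For reference, a GPU is usually advertised as a swarm generic resource through `node-generic-resources` in `/etc/docker/daemon.json`, and the resource name has to match the one requested with `--generic-resource` (here `gpu`). A minimal sketch under these assumptions: `GPU-REPLACE-WITH-UUID` is a placeholder for the real GPU UUID (as printed by `nvidia-smi -L`), `/usr/bin/nvidia-container-runtime` is where the nvidia runtime lives on a typical nvidia-docker2 install, and the file is written to `./daemon.json` by default so it can be reviewed before being copied into place:

```shell
#!/bin/sh
# Sketch: write a candidate Docker daemon config for review.
# DAEMON_JSON defaults to ./daemon.json; after checking it, copy it to
# /etc/docker/daemon.json and restart dockerd yourself.
DAEMON_JSON="${DAEMON_JSON:-./daemon.json}"
# Placeholder value: replace with the UUID shown by `nvidia-smi -L`.
GPU_UUID="${GPU_UUID:-GPU-REPLACE-WITH-UUID}"

# "default-runtime" is set to nvidia because swarm services have no
# per-service runtime option. The resource name ("gpu") must match the
# one requested with --generic-resource "gpu=1".
cat > "$DAEMON_JSON" <<EOF
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": ["gpu=${GPU_UUID}"]
}
EOF
echo "wrote $DAEMON_JSON"
```

With several GPUs, add one `gpu=<uuid>` entry per GPU. For the nvidia runtime to hand the assigned GPU to the container, setups like this usually also uncomment `swarm-resource = "DOCKER_RESOURCE_GPU"` in `/etc/nvidia-container-runtime/config.toml`. This only shows how the resource is typically advertised; it does not by itself explain the network-attachment error.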

Anyway, if you find anything, feel free to share; I’ll post here if I find something. Cheers.

PS: One thing I didn’t mention is that I use GPU passthrough, and Docker uses the GPU on top of that.

@nokidev, I resolved my issue by removing --generic-resource. This way you get GPU support (nvidia-runtime magic?) AND the network. I think poor swarm support from NVIDIA is to blame. You can check GPU availability from Docker by running nvidia-smi in an nvidia/cuda:10.0-base container.
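The check and the workaround can be sketched as follows, assuming the node’s `default-runtime` is `nvidia` in daemon.json (swarm services have no per-service runtime flag). The `nvidia/cuda` images already set `NVIDIA_VISIBLE_DEVICES`; it is passed explicitly here because it is the variable the nvidia runtime reads to decide which GPUs to expose, which is most of the "magic". The function names and the overridable `DOCKER` variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; set DOCKER=echo to print the commands instead of running them.

# One-off check that Docker can see the GPU.
# --runtime=nvidia is only needed if nvidia is not already the default runtime.
check_gpu() {
  ${DOCKER:-docker} run --rm --runtime=nvidia nvidia/cuda:10.0-base nvidia-smi
}

# Workaround from this thread: no --generic-resource, so the overlay network
# attaches normally; the nvidia runtime (as the default runtime) exposes the
# GPUs listed in NVIDIA_VISIBLE_DEVICES.
create_gpu_service() {
  ${DOCKER:-docker} service create --with-registry-auth \
    --name=test \
    --constraint=node.id==50pbc33tbompfiiu1n61khyc5 \
    --network=myinternal \
    --env NVIDIA_VISIBLE_DEVICES=all \
    nvidia/cuda:10.0-base \
    sh -c "while true; do nvidia-smi -L; sleep 2; done"
}
```

The trade-off is that swarm no longer accounts for GPUs: every task placed on the node sees the GPUs in `NVIDIA_VISIBLE_DEVICES`, so placement has to be controlled with constraints instead.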

And I don’t know anything about GPU passthrough.

Thanks @muxlevator,
I gave the nvidia-runtime black magic a shot and it worked.
I’m a bit sad, though, because generic resources used to work.

I also had a few issues with passthrough, but I was able to solve them and now everything works normally inside Docker.

Thanks again,
Cheers

Has anyone gotten this to work? Setting gpu in node-generic-resources gives me this networking error. I’m using an overlay network with Docker Swarm. I’d like to use the GPU as an allocated resource together with some kind of network.