Running multi-GPU workload (vLLM) with Docker Swarm - IPC limitation?

Hi everyone,

I’m trying to run vLLM (an LLM inference engine) on Docker Swarm
with tensor parallelism across multiple GPUs.

The issue is that vLLM requires --ipc=host for NCCL communication
between GPUs, but Docker Swarm doesn’t support this option
(see moby/moby#25303).
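For context, outside of Swarm the usual way to run vLLM with tensor parallelism is a plain `docker run` with `--ipc=host` (or a large `--shm-size`), roughly like this (the model name and parallelism degree are placeholders, not from my setup):

```shell
# Standalone docker run (works fine, but not expressible as a Swarm service)
docker run --gpus all \
  --ipc=host \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model <your-model> \
  --tensor-parallel-size 2
```

It's exactly the `--ipc=host` / shared-memory part of this that has no equivalent in `docker service create` or a stack file.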

My question:

  1. Is there a known workaround for IPC communication in Swarm?
  2. Would mounting a tmpfs at /dev/shm with a large size work instead?
  3. Has anyone successfully run multi-GPU workloads on Swarm?

What I’ve found so far:

  • --ipc=host is not supported in Swarm
  • Possible workaround: tmpfs mount, but unconfirmed
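If the tmpfs route is viable, I imagine the stack file would look something like the sketch below. This is an untested assumption on my part: the image, size, and command are placeholders, GPU scheduling in Swarm (generic resources) is omitted, and I haven't confirmed that a large /dev/shm alone satisfies NCCL without `--ipc=host`:

```yaml
version: "3.8"
services:
  vllm:
    image: vllm/vllm-openai:latest   # assumed image
    command: ["--model", "<your-model>", "--tensor-parallel-size", "2"]
    volumes:
      # Long-syntax tmpfs mount; size is in bytes (here ~10 GiB)
      - type: tmpfs
        target: /dev/shm
        tmpfs:
          size: 10737418240
```

If anyone has confirmed whether NCCL works with only an enlarged /dev/shm (no host IPC namespace), that would answer question 2 above.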

Any guidance would be appreciated. If there’s no solution, I’m
happy to document this limitation for others.

You can find a list of supported/unsupported features for swarm services in this epic:

Even though the epic was created 9 years ago, it is still open and updated.
