Running multi-GPU workload (vLLM) with Docker Swarm - IPC limitation?

Hi everyone,

I’m trying to run vLLM (an LLM inference engine) on Docker Swarm
with tensor parallelism across multiple GPUs.

The issue is that vLLM requires --ipc=host for NCCL communication
between GPUs, but Docker Swarm doesn’t support this option
(see moby/moby#25303).
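For context, outside of Swarm the usual way to run vLLM with tensor parallelism is a plain `docker run` with `--ipc=host` (or a large `--shm-size`), roughly like this (the model name and parallelism degree are placeholders, not from my setup):

```shell
# Standalone docker run (works fine, but not expressible as a Swarm service)
docker run --gpus all \
  --ipc=host \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model <your-model> \
  --tensor-parallel-size 2
```

It's exactly the `--ipc=host` / shared-memory part of this that has no equivalent in `docker service create` or a stack file.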

My question:

  1. Is there a known workaround for IPC communication in Swarm?
  2. Would mounting a tmpfs at /dev/shm with a large size work instead?
  3. Has anyone successfully run multi-GPU workloads on Swarm?

What I’ve found so far:

  • --ipc=host is not supported in Swarm
  • Possible workaround: tmpfs mount, but unconfirmed
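If the tmpfs route is viable, I imagine the stack file would look something like the sketch below. This is an untested assumption on my part: the image, size, and command are placeholders, GPU scheduling in Swarm (generic resources) is omitted, and I haven't confirmed that a large /dev/shm alone satisfies NCCL without `--ipc=host`:

```yaml
version: "3.8"
services:
  vllm:
    image: vllm/vllm-openai:latest   # assumed image
    command: ["--model", "<your-model>", "--tensor-parallel-size", "2"]
    volumes:
      # Long-syntax tmpfs mount; size is in bytes (here ~10 GiB)
      - type: tmpfs
        target: /dev/shm
        tmpfs:
          size: 10737418240
```

If anyone has confirmed whether NCCL works with only an enlarged /dev/shm (no host IPC namespace), that would answer question 2 above.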

Any guidance would be appreciated. If there’s no solution, I’m
happy to document this limitation for others.

You can find a list of supported/unsupported features for swarm services in this epic:

Even though the epic was created 9 years ago, it is still open and updated.
