Docker swarm routing mesh published ports not available on restarted nodes or locally

Issue Type: Published ports are not accessible on nodes that do not have containers running on them.

OS Version/Build: Windows Server 2022 (21H2 Build 20348.2966)

App Version:

PS C:\Windows\system32> docker version
Client:
Version: 27.4.1
API version: 1.47
Go version: go1.22.10
Git commit: b9d17ea
Built: Tue Dec 17 15:47:22 2024
OS/Arch: windows/amd64
Context: default

Server: Docker Engine - Community
Engine:
Version: 27.4.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.10
Git commit: c710b88
Built: Tue Dec 17 15:45:57 2024
OS/Arch: windows/amd64
Experimental: false

Steps / discussion:

I have a small two node swarm (1 manager, 1 worker) just to test things out.

Host IPs for this example are:

Manager: 10.254.10.2
Worker: 10.254.10.3

I have deployed one service onto the swarm as follows:

docker service create --replicas 4 --name some-service --publish published=61000,target=80 localregistry/container-image:1.0.0

I can see that the service has two containers on each of the nodes.

I can only get a response when I connect to the published port from the opposite node, e.g.:

10.254.10.2 → 10.254.10.3:61000 works
10.254.10.3 → 10.254.10.2:61000 works

When I try to connect to the published port from the same node, I get timeouts:

10.254.10.3 → 10.254.10.3:61000 Timeout
10.254.10.2 → 10.254.10.2:61000 Timeout

This isn’t too much of a train smash at this point - I can live with that although it seems a bit weird. I guess it could be due to the way the routing mesh works on Windows? It would be good if I could understand why this doesn’t work though.

Interestingly, I also can’t see port 61000 being used when I do a netstat -a command on either of the nodes.

I am simulating a node failure by rebooting the worker node. When I do this, I see that docker swarm spins up two new containers on the manager node.

The worker node comes back online and is listed as being Ready and Active. Obviously, none of the service containers are running on it at this point (all four containers are still running on the manager). This is expected as docker swarm will not automatically rebalance.

What is a little weird to me is that the worker node does not accept connections on the service published port anymore. I would expect it to accept connections on 61000 and send traffic to wherever the containers are running.

In this state the following holds true:

10.254.10.2 → 10.254.10.3:61000 Timeout
10.254.10.3 → 10.254.10.2:61000 works

Once I force a rebalance by doing “docker service update --force some-service”, the worker node starts accepting traffic on the published port again:

10.254.10.2 → 10.254.10.3:61000 works
10.254.10.3 → 10.254.10.2:61000 works
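
The connection checks above can be repeated with a small script run on each node in turn. This is only a sketch: the `probe` and `report` helpers are hypothetical names, the 3 second timeout is an arbitrary choice, and the IPs and port are the ones from this post. It only tests whether a TCP connection can be opened, not whether the service answers correctly.

```python
import socket

# Node IPs and published port as given in this post.
NODES = {"manager": "10.254.10.2", "worker": "10.254.10.3"}
PUBLISHED_PORT = 61000


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try to open a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "works"
    except socket.timeout:
        # No answer within the timeout, matching the "Timeout" rows above.
        return "Timeout"
    except OSError as exc:
        # Any other failure, e.g. connection refused or no route to host.
        return f"error: {exc}"


def report(nodes: dict, port: int, timeout: float = 3.0) -> list:
    """Probe every node and return one result line per target."""
    return [
        f"{name} ({ip}):{port} {probe(ip, port, timeout)}"
        for name, ip in nodes.items()
    ]
```

Calling `print("\n".join(report(NODES, PUBLISHED_PORT)))` on the manager and then on the worker gives the full matrix for the current state of the swarm, which makes it easier to compare before and after a reboot or a forced update.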

I’m new to swarm and I guess I am still lacking a good understanding of how the overlay networking works and how the routing mesh works.

My questions are:

  1. Is it expected to not be able to connect to the published service ports from the local node?
  2. Why would I not be seeing the published ports in a listening state when I do a netstat?
  3. Is it expected that a node will only handle connections if it is currently running containers for a service?

cheers!

Your output indicates that you use Docker on Windows. Docker on Windows has different constraints than Docker on Linux, as the latter relies on the Linux kernel, Linux libraries and standard Linux tools. I have no idea how they implemented it in Windows…

While you will find tons of information about the ingress routing mesh for docker-ce on Linux, I doubt you will find any information in Docker’s docs for Windows, as Docker itself does not release Docker-CE for Windows.

The Moby project creates and releases the Docker version for Windows. I can only assume that Microsoft might have docs for this. If not, you can always start a discussion in the Moby GitHub project https://github.com/moby/moby, or raise an issue with the content of your original post.

I can answer the questions for Linux:

  1. No, published ingress ports will be bound on every node. If instead the published port is a host port, it would only be bound on nodes where a task is running.
  2. You would see the bound port in netstat.
  3. Only if the published port is a host port.

I think I found the Microsoft docs: https://learn.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/swarm-mode#limitations

  • Routing mesh for Windows docker hosts is not supported on Windows Server 2016, but only from Windows Server 2019 onwards. Users seeking an alternative load balancing strategy today can setup an external load balancer (e.g. NGINX) and use Swarm’s publish-port mode to expose container host ports over which to load balance. More detail on this below.

Since it says the routing mesh is supported from Windows Server 2019 onwards, it looks like a bug that it doesn’t behave on Windows Server 2022 the way it does on Linux.

Update: it’s a bug, see https://github.com/moby/moby/issues/42812. The issue was created September 2021 and is still open…

Thank you - it is helpful to know that it works correctly on Linux. Appreciate the responses.