Issue Type: Published ports are not accessible on nodes that do not have containers running on them.
OS Version / Build: Windows Server 2022 (21H2, Build 20348.2966)
App Version:
PS C:\Windows\system32> docker version
Client:
 Version:           27.4.1
 API version:       1.47
 Go version:        go1.22.10
 Git commit:        b9d17ea
 Built:             Tue Dec 17 15:47:22 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.4.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.10
  Git commit:       c710b88
  Built:            Tue Dec 17 15:45:57 2024
  OS/Arch:          windows/amd64
  Experimental:     false
Steps / discussion:
I have a small two-node swarm (1 manager, 1 worker), just to test things out.
Host IPs for this example are:
Manager: 10.254.10.2
Worker: 10.254.10.3
I have deployed one service onto the swarm as follows:
docker service create --replicas 4 --name some-service --publish published=61000,target=80 localregistry/container-image:1.0.0
I can see that the service has two containers on each of the nodes.
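For reference, I'm checking task placement from the manager with the standard task listing (using the service name from the example above):

docker service ps some-service

which lists each task and the node it is scheduled on.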
I can only get a response when I connect to the published port from the opposite node, e.g.:
10.254.10.2 → 10.254.10.3:61000 works
10.254.10.3 → 10.254.10.2:61000 works
When I try to connect to the published port on the same node, I get timeouts:
10.254.10.3 → 10.254.10.3:61000 Timeout
10.254.10.2 → 10.254.10.2:61000 Timeout
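The connection tests above are just plain TCP checks. From PowerShell I'm using something along the lines of:

Test-NetConnection -ComputerName 10.254.10.3 -Port 61000

and looking at whether TcpTestSucceeded comes back True or the attempt times out.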
This isn't too much of a train smash at this point; I can live with it, although it seems a bit weird. I guess it could be due to the way the routing mesh works on Windows? It would be good to understand why this doesn't work, though.
Interestingly, I also can't see port 61000 in a listening state when I run netstat -a on either of the nodes.
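To be specific, I'm filtering the netstat output with something like:

netstat -an | findstr 61000

on both nodes, and nothing comes back for that port.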
I am simulating a node failure by rebooting the worker node. When I do this, I see that docker swarm spins up two new containers on the manager node.
The worker node comes back online and is listed as being Ready and Active. Obviously, none of the service containers are running on it at this point (all four containers are still running on the manager). This is expected as docker swarm will not automatically rebalance.
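At this point I'm checking the state from the manager with the usual commands:

docker node ls
docker service ps some-service

docker node ls shows the worker as Ready and Active, and docker service ps shows all four tasks running on the manager node.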
What is a little weird to me is that the worker node does not accept connections on the service published port anymore. I would expect it to accept connections on 61000 and send traffic to wherever the containers are running.
In this state the following holds true:
10.254.10.2 → 10.254.10.3:61000 Timeout
10.254.10.3 → 10.254.10.2:61000 works
Once I force a rebalance by doing “docker service update --force some-service”, the worker node starts accepting traffic on the published port again:
10.254.10.2 → 10.254.10.3:61000 works
10.254.10.3 → 10.254.10.2:61000 works
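My assumption is that the forced update helps because it recreates the tasks and the scheduler then spreads them across both nodes again. Running

docker service ps some-service

after the update should confirm where the tasks actually ended up.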
I'm new to swarm and I guess I'm still lacking a good understanding of how the overlay networking and the routing mesh work.
My questions are:
- Is it expected that you can't connect to a published service port from the node itself?
- Why don't the published ports show up in a listening state when I run netstat?
- Is it expected that a node will only handle connections on a published port if it is currently running containers for that service?
cheers!