Swarm nodes timeout communicating

jamesstidard · September 5, 2022, 3:24pm

Hi,

I’m cross-posting this question from Stack Overflow to increase the likelihood of getting a solution. Hopefully that’s not against the ToS of the forum. The Stack Overflow question can be found here. If anyone stumbles on this with the same problem in the future, the answer might be over there.

I have a couple of machines running Ubuntu Server 22.04.1 LTS and Docker version 20.10.17.

I’ve set up a swarm containing both machines. These machines have ports tcp/2377, udp/4789, udp/7946, and tcp/7946 open. I’ve done no firewall configuration to achieve this, as Ubuntu Server ships with its firewall service disabled. I’ve tested this with the commands nc -zv HOST PORT and nc -zvu HOST PORT for tcp and udp respectively. All return success apart from the tcp/2377 query from the manager node to the worker node. Presumably this is fine, as this port seems to be the manager-specific port.
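
The port checks described above could be scripted roughly as follows. This is a minimal sketch, not a definitive test: swarm_port_checks is a hypothetical helper name, 192.0.2.10 is a placeholder address, and the -w 3 timeout is an added assumption. Note that nc -zvu can report success for a UDP port even when packets are filtered, since UDP has no handshake to confirm delivery.

```shell
#!/bin/sh
# Hypothetical helper: print one nc command per swarm port for a given host.
swarm_port_checks() {
    host="$1"
    # TCP: 2377 (cluster management, manager nodes only), 7946 (node discovery)
    for port in 2377 7946; do
        echo "nc -zv -w 3 $host $port"
    done
    # UDP: 7946 (node discovery), 4789 (overlay network traffic)
    for port in 7946 4789; do
        echo "nc -zvu -w 3 $host $port"
    done
}

# Usage (placeholder address): print the commands, or pipe them to sh to run them.
#   swarm_port_checks 192.0.2.10
#   swarm_port_checks 192.0.2.10 | sh
```

Running the printed commands from each node towards the other covers both directions, which matters because a filter may only drop traffic one way.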

If I run a couple of services in a stack on the same node, the services can communicate without issue. However, when the services are split across nodes, they are no longer able to connect to each other.

They are able to ping each other from within each container using the name of the other service.

However, they are not able to reach a web server, for example, running in a container on the other machine using curl service_name.

I’ve tried to google this problem and tried turning off packet checksums by running sudo ethtool -K docker_gwbridge tx off; sudo ethtool -K docker0 tx off on both machines, and then restarting the machines after, with no success.

I’m looking for any other causes of this problem, or pointers on how I may have misused the commands above. I’ve run a swarm across these nodes before using Ubuntu Desktop without this issue; it has only come up since switching to Ubuntu Server.

Thanks.

P.S. Happy to provide any additional info that’s relevant.

lalaviswanath670 · December 10, 2025, 12:43pm

@jamesstidard I am getting exactly the same issue. Were you able to resolve it? Thank you

meyay · December 10, 2025, 1:21pm

If the overlay traffic is not working, these are the usual suspects:

- The firewall needs the following ports to be open on all nodes:
  - Port 2377 TCP for communication with and between manager nodes
  - Port 7946 TCP/UDP for overlay network node discovery
  - Port 4789 UDP (configurable) for overlay network traffic
- The MTU size is not identical on all nodes:
  - ip addr show scope global | grep mtu
- The nodes don’t share a low-latency network connection
- Nodes are running in VMs on VMware vSphere with NSX:
  - Outgoing traffic to port 4789 UDP is silently dropped as it conflicts with VMware NSX’s communication port for VXLAN
  - Re-create the swarm with a different data-port:
    - docker swarm init --data-path-port=7789
- Problems with checksum offloading:
  - Disable checksum offloading for the network interface (eth0 is a placeholder):
    - ethtool -K eth0 tx-checksum-ip-generic off
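
To make the MTU comparison from the list above easier to read across nodes, the grep could be swapped for a small filter that prints only the interface name and its MTU. This is a sketch under assumptions: list_mtus is a hypothetical helper name, and it relies on the usual iproute2 header line layout ("2: eth0: <...> mtu 1500 ...").

```shell
#!/bin/sh
# Hypothetical helper: read `ip addr show` output on stdin and print
# "interface mtu" pairs, one per line.
list_mtus() {
    awk '/ mtu / {
        name = $2
        sub(/:$/, "", name)            # strip the trailing colon from "eth0:"
        for (i = 1; i < NF; i++)
            if ($i == "mtu") print name, $(i + 1)
    }'
}

# Usage: run on every node and compare the results.
#   ip addr show scope global | list_mtus
```

If the values differ between nodes, that points at the MTU item in the list rather than at the firewall.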

bluepuma77 · December 10, 2025, 1:30pm

To add to @meyay, also make sure the MTU is set to the correct size in case of using a VLAN/vSwitch/VPN.

Test if a ping with payload of 2000 bytes is successful between nodes.
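
As a rough sketch of that test: NODE is a placeholder for the other node's address, and icmp_payload_for_mtu is a hypothetical helper. The second, stricter variant is an addition that assumes Linux iputils ping and IPv4; it forbids fragmentation, so it fails outright if the path MTU is smaller than expected.

```shell
#!/bin/sh
# Largest ICMP payload that fits in a single IPv4 packet for a given MTU:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# The test suggested above: a 2000-byte payload between nodes
# (this gets fragmented on a link with an MTU of 1500).
#   ping -c 3 -s 2000 NODE
#
# Stricter variant: set the don't-fragment flag and send the largest
# payload that should fit an assumed MTU of 1500.
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1500)" NODE
```

If the large ping fails while a default-sized ping succeeds, the MTU on some hop is the likely culprit.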

lalaviswanath670 · December 10, 2025, 6:24pm

Thanks @meyay and @bluepuma77. It was due to a UDP port not being open.
