Swarm nodes timeout communicating

Hi,

I’m cross-posting this question from Stack Overflow, just to increase the likelihood of getting a solution. Hopefully that’s not against the ToS of the forum. The Stack Overflow question can be found here, if anyone stumbles on this having the same problem in the future, the answer might be over there.

I have a couple of machines running Ubuntu Server 22.04.1 LTS and Docker version 20.10.17.

I’ve set up a swarm containing both the machines. These machines have ports tcp/2377, udp/4789, udp/7946, and tcp/7946 open. I’ve done no firewall configuration to do this a Ubuntu Server ships with its firewall service disabled. I’ve tested this with with these commands nc -zv HOST PORT and nc -zvu HOST PORT for tcp and udp respectively. All return success, apart from the tcp/2377 query from the manager node to the worker node, presumably this is fine as this port seems to be the manager specific port.

If I run a couple services in a stack on the same node, the services can communicate without issue. However, when the services are split across nodes, they are no longer able to connect with each other.

They are able to ping each other from within each container using the name of the other service.

However, they are not able to curl service_name any running web server, for example, running on the containers on separate machines.

I’ve tried to google this problem and tried turning off packet checksums by running sudo ethtool -K docker_gwbridge tx off; sudo ethtool -K docker0 tx off on both machines, and then restarting the machines after, with no success.

I’m looking for any other causes of this problem or maybe how I’ve misused commands above. I’ve ran a swarm across these nodes before using Ubuntu Desktop without this issue, and has come up switching to ubuntu server.

Thanks.

P.S. Happy to provide any additional info that’s relevant.