We had some issues with an existing 3-manager swarm where the managers all became disconnected and the swarm complained about not having enough managers to form a quorum.
We have since removed 2 of the managers from the swarm and forced a swarm initialisation on the first manager.
Now we’re in a situation where we can’t add more managers to the swarm.
When we add a second manager, it appears in docker node ls as active, but its status is empty. It says neither reachable nor unreachable.
When we run docker node inspect against the second host, we get a heartbeat failure on the primary manager.
We had previously been making DNS changes, but we believe these have all been switched back, and DNS is resolving for all the manager hostnames. They’re all on the same network and there are no firewalls denying traffic between managers.
I cannot understand why the managers are failing to communicate as we see nothing of use in the docker logs (or syslog) even with debug logging enabled on the first manager.
I believe the nodes communicate on UDP port 7946. We did have an issue where the managers weren’t listening on this port, but that now seems to have been resolved.
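For anyone wanting to rule out basic connectivity, here is a rough sketch of the kind of check that could be run from the second manager against the first. It assumes the ports listed in the Docker swarm documentation (TCP 2377 for cluster management, and 7946 over both TCP and UDP for node-to-node communication). The hostname manager1.example.com and the function names are placeholders, not anything from our setup. It only tests TCP, since a plain connect cannot confirm that a UDP port is reachable.

```python
import socket

# Swarm ports that use TCP, per the Docker swarm documentation.
# 7946 is also used over UDP, which this script does not test.
SWARM_TCP_PORTS = {
    2377: "cluster management (manager-to-manager)",
    7946: "node-to-node communication (TCP side only)",
}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_manager(host: str) -> dict[int, bool]:
    """Probe each swarm TCP port on the given host."""
    return {port: tcp_port_open(host, port) for port in SWARM_TCP_PORTS}


if __name__ == "__main__":
    manager = "manager1.example.com"  # placeholder hostname, replace with a real manager
    for port, is_open in check_manager(manager).items():
        state = "open" if is_open else "closed or unreachable"
        print(f"{manager}:{port} ({SWARM_TCP_PORTS[port]}): {state}")
```

A closed result on 2377 between managers would point at the network or the listener rather than at DNS. An open result only shows that something accepts connections on that port, not that the swarm itself is healthy.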
Anyone have any suggestions on where to start looking at this one?