We are running Docker 1.12.3 in swarm mode on AWS. We have 3 manager nodes, all drained, and 2 worker nodes running 6 services in global mode (so every service runs on both workers). We also have an ELB in front of our services. For fault tolerance, I thought it made sense to list all 3 manager nodes in the ELB target groups and rely on the managers to forward traffic to the worker nodes. The worker nodes are not listed in the target groups at all. This mostly works, but we get intermittent 502 (Bad Gateway) errors in this configuration, and I don’t see any pattern to them. I can’t tell whether one manager is somehow bad; the nodes all report as healthy. As a test, I removed all but one manager from the ELB target groups, and now we don’t get the random 502 errors at all.
Does that make any sense? Is there any way to debug what is going on with the managers to see whether one of them is somehow “bad”? Doesn’t it make sense to list all managers in the ELB so that if one manager goes down, the ELB stops sending it traffic and the swarm stays reachable?
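One check I could run to narrow this down is to bypass the ELB and request a published service port on each manager directly, many times, then compare the results per manager. Below is a rough Python sketch of that idea, not a definitive diagnostic. The function names `fetch_status` and `tally` are my own, and the hostname and port in the usage comment are placeholders, not our real values. Since an ELB 502 generally means the load balancer did not get a valid response from the target, a misbehaving manager would likely show up here as a connection error or timeout rather than as a 502.

```python
import http.client
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or an error label if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still a real response.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
        # Refused connections, resets, timeouts, malformed responses.
        return "error: " + type(err).__name__


def tally(urls, attempts=200, fetch=fetch_status):
    """Request each URL `attempts` times and count the outcomes per URL."""
    return {url: Counter(fetch(url) for _ in range(attempts)) for url in urls}


# Example usage (hostname and port are placeholders):
# for url, counts in tally(["http://manager-1.example.internal:8080/"]).items():
#     print(url, dict(counts))
```

If one manager shows noticeably more errors than the others, that would point at that node. If all three look the same when hit directly, the problem is more likely in how the ELB talks to them.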