Docker swarm - high availability / fault tolerance / failover / load balancing

I have a few Docker nodes running in swarm mode. I’d like to achieve the following (if possible):

  • high availability
  • fault tolerance
  • fail-over
  • load balancing

Let’s say one of the swarm nodes goes down for whatever reason. I would like to continue serving my users with my applications…

I know at least some of this is available through services, but I’m still a bit unclear on IP traffic. If a host is down, how does IP traffic get routed to another host, and so on…
Let’s say I have a DNS A record pointing to IP address 11.22.33.44. If that same IP address is assigned to one of the Docker node hosts and that host goes down, my assumption is that my application will be unavailable at that point… Is there a solution for that scenario?

Please advise.

The IP address goes to a load balancer node.

The load balancer talks (or not) to the application service nodes (which are typically identical copies, or have identical copies for path-based routing from the balancer). When a node is offline (planned, bug, …), the load balancer knows and doesn’t attempt to send traffic there until it has rejoined the network.
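To make the idea concrete, here is a minimal sketch of a balancer that rotates over its backends and skips any node marked offline. It is only an illustration, not how nginx, HAProxy, or swarm's routing mesh is implemented; the class name, method names, and node names are all made up for the example, and a real balancer would call the mark_down/mark_up hooks from its own health checks.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips nodes marked as down (hypothetical names)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)          # fixed rotation order
        self.healthy = set(self.nodes)    # nodes currently eligible for traffic
        self._next = 0                    # position in the rotation

    def mark_down(self, node):
        # Called when a health check fails or a node is taken out on purpose.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Called once the node has rejoined and passes health checks again.
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        # Try each node at most once per call, skipping unhealthy ones.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["node1", "node2", "node3"])
    lb.mark_down("node2")
    print([lb.pick() for _ in range(4)])
```

While node2 is marked down, traffic only alternates between node1 and node3; after mark_up("node2") it goes back into the rotation.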

Can you describe your view of failover in this environment?

@sdetweil,

“Load balancer node”: do you mean having yet another Docker node that runs a load balancer container such as nginx, HAProxy, etc.? Is there a way to escape the “single point of failure” here? I mean having the IP address as a service that can easily (automatically) migrate to another host if the primary host is down… I guess that’s my definition of failover…

Yes, another node. No magic here.

They (proxies) have multi-instance models as well.
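As a rough sketch of what a multi-instance proxy setup does about the single point of failure: several proxy instances share one public IP, and whichever live instance has the highest priority holds it. VRRP-based tools such as keepalived work along these lines, but the code below is only a simulation of the decision, not a real implementation; the function name, the tuple layout, and the proxy names lb1/lb2 are assumptions for the example.

```python
def elect_vip_holder(proxies):
    """Return the name of the proxy that should hold the shared IP.

    proxies: list of (name, priority, alive) tuples (hypothetical layout).
    The highest-priority proxy that is alive wins; None if all are down.
    """
    alive = [(priority, name) for name, priority, is_alive in proxies if is_alive]
    if not alive:
        return None
    # max() compares priority first; equal priorities fall back to the name.
    return max(alive)[1]


if __name__ == "__main__":
    proxies = [("lb1", 200, True), ("lb2", 100, True)]
    print(elect_vip_holder(proxies))   # primary holds the IP

    proxies = [("lb1", 200, False), ("lb2", 100, True)]
    print(elect_vip_holder(proxies))   # backup takes over when primary is down
```

The DNS A record keeps pointing at the same address the whole time; only the host answering for that address changes.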

Here is a good description of the design and process. It uses EC2, but substitute Docker instances.