Docker failover clustering


First of all, I’m pretty new to Docker. I tried to look for information on this forum, but found very different answers and none of them worked for me :confused:

I would like to create a Docker cluster for running services in a failover option.

My current setup is the following:

DOCKER-BKP       Ready       Active       Reachable
DOCKER-MANAGER   Ready       Drain        Leader
DOCKER-PROJ_A    Ready       Active
DOCKER-PROJ_B    Ready       Active

My task is to run the service SRVC-PROJ_A preferably on DOCKER-PROJ_A, and if that node fails (the server turns off, or is not reachable) the Swarm should start a replica on DOCKER-BKP.

The task is the same with SRVC-PROJ_B: if DOCKER-PROJ_B fails, a replica should start on DOCKER-BKP.

No PROJ_A service should ever run on node PROJ_B, and no PROJ_B service on node PROJ_A.

If a preferred node becomes active again, no automatic fail-back should happen. We want to move the service back to the desired node manually, because no service may run in two instances at once; the running one has to be stopped first.

I found a thread where somebody said I can set this up with node labels and the --placement-pref flag.
The first flag should target the desired node, and the second one is where the service may spread.

These are my labels:


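For reference, labels like the ones used in the commands below can be added with `docker node update` (node and label names are the ones from this setup; the values are just a sketch):

```shell
# Attach project labels to each node so services can be constrained/preferred
docker node update --label-add PROJ_A=TRUE --label-add PROJ_A-MASTER DOCKER-PROJ_A
docker node update --label-add PROJ_B=TRUE --label-add PROJ_B-MASTER DOCKER-PROJ_B
# The backup node carries both project labels, so either service may fail over to it
docker node update --label-add PROJ_A=TRUE --label-add PROJ_B=TRUE DOCKER-BKP
```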
And my commands:

> docker service create \
>   --replicas 1 \
>   --replicas-max-per-node 1 \
>   --placement-pref "spread=node.labels.PROJ_A-MASTER" \
>   --placement-pref "spread=node.labels.PROJ_A" \
>   --update-order stop-first \
>   --rollback-order stop-first \
>   --network host \
>   -p 8080:80 \
>   --name SRVC-PROJ_A \
>   nginx:latest
> docker service create \
>   --replicas 1 \
>   --replicas-max-per-node 1 \
>   --placement-pref "spread=node.labels.PROJ_B-MASTER" \
>   --placement-pref "spread=node.labels.PROJ_B" \
>   --update-order stop-first \
>   --rollback-order stop-first \
>   --network host \
>   -p 8081:80 \
>   --name SRVC-PROJ_B \
>   nginx:latest

Unfortunately it always starts the services on different nodes and doesn’t seem to consider the labels at all.
Sometimes it starts SRVC-PROJ_A on DOCKER-PROJ_B, for example, even though that node doesn’t even have the label.

I also read that Docker schedules services in “emptiest node” mode, so the node with the fewest running tasks is where the next service starts, but that’s not true either: last time it started both services on DOCKER-PROJ_A, even though both DOCKER-BKP and DOCKER-PROJ_B were empty and active.

Is there a way I can make this work, or is Docker not capable of handling my use case?

Thx, for your help!

For HA, you need 3 Swarm managers. They can run workloads like regular workers.

When using --replicas 1, Swarm will ensure that one instance is running on some node in the cluster. Use constraints to limit which nodes are eligible.

This works for stateless “apps”, but note that volumes are not replicated across nodes. So when the instance is migrated to a new node, existing files will usually not be there.

Databases usually have their own replication, so you would setup a database cluster for HA, not just migrate a single instance.

One Docker service can call another using the service name, for which Docker provides DNS. If multiple instances of a service are running, Docker will forward round-robin.
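As a sketch (the service and network names here are made up, not from this thread), service-to-service DNS looks like this:

```shell
# Two services on a shared overlay network; "api" resolves via Swarm's built-in DNS
docker network create --driver overlay appnet
docker service create --name api --network appnet --replicas 2 nginx:latest
docker service create --name client --network appnet alpine:latest sleep 86400
# From inside a client task, requests to http://api are balanced across the replicas:
#   wget -qO- http://api
```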

Have you tried to combine placement constraints and placement preference?

This should limit the tasks being scheduled to nodes with the label PROJ_A, but prefer the node with the label PROJ_A-MASTER:

docker service create \
  --replicas 1 \
  --replicas-max-per-node 1 \
  --placement-pref "spread=node.labels.PROJ_A-MASTER" \
  --constraint node.labels.PROJ_A \
  --update-order stop-first \
  --rollback-order stop-first \
  --network host \
  -p 8080:80 \
  --name SRVC-PROJ_A \
  nginx:latest

Note: I am not sure if --constraint node.labels.PROJ_A works, or requires a value to compare against, e.g. --constraint node.labels.PROJ_A==true

Observation: I am not sure how much sense it makes to use --network host and -p 8080:80 at the same time. You can either use the host network, or use an overlay network and publish ports.

Running 2 manager nodes is worse than running one: if either manager becomes unhealthy, the cluster loses quorum and becomes headless. If you want to be able to tolerate one unhealthy manager, you require at least 3 manager nodes. It is not recommended to use an even number of manager nodes.
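The fault tolerance follows from the Raft quorum: with N managers, floor((N-1)/2) failures can be survived. Promoting existing workers is one way to get to 3 managers (node names taken from this thread; a sketch, not a recommendation for which nodes to pick):

```shell
# Managers: 1 tolerates 0 failures; 3 tolerates 1; 5 tolerates 2 (even counts add nothing)
docker node promote DOCKER-BKP DOCKER-PROJ_B
docker node ls   # MANAGER STATUS column shows Leader / Reachable for managers
```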

Thanks guys, you are awesome!

It works like a charm now. :slight_smile:

I already added two more managers just in case. (Some of them will be turned off at the end of the shift, so if I need at least 3 running constantly, then I need at least 5 managers anyway.)

The following commands make it work:

> docker service create \
>   --replicas 1 \
>   --replicas-max-per-node 1 \
>   --placement-pref "spread=node.labels.PROJ_A-MASTER" \
>   --constraint node.labels.PROJ_A==TRUE \
>   --update-order stop-first \
>   --rollback-order stop-first \
>   --network host \
>   -p 8080:80 \
>   --name SRVC-PROJ_A \
>    nginx:latest
> docker service create \
>   --replicas 1 \
>   --replicas-max-per-node 1 \
>   --placement-pref "spread=node.labels.PROJ_B-MASTER" \
>   --constraint node.labels.PROJ_B==TRUE \
>   --update-order stop-first \
>   --rollback-order stop-first \
>   --network host \
>   -p 8081:80 \
>   --name SRVC-PROJ_B \
>    nginx:latest

I have also edited the node labels to these:


(Masters without any value)
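To double-check the edited labels, they can be inspected per node, e.g.:

```shell
# Print the label map of each node (node names from this thread)
docker node inspect --format '{{ .Spec.Labels }}' DOCKER-PROJ_A
docker node inspect --format '{{ .Spec.Labels }}' DOCKER-BKP
```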

Thanks again for your help!


You should still fix this.

Either there is no network namespace isolation, and the container port directly binds the port on the host interfaces, or there is an overlay network with namespace isolation, and published ports are forwarded from the host port to the container port.

I am surprised that it doesn’t cause an error message, which it should in my opinion.
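As a sketch of the second option, this keeps the published port and drops --network host; without an explicit network, the service attaches to the ingress routing mesh, so port 8080 answers on every node regardless of where the task runs:

```shell
# Publish via the routing mesh instead of host networking
docker service create \
  --replicas 1 \
  --replicas-max-per-node 1 \
  --placement-pref "spread=node.labels.PROJ_A-MASTER" \
  --constraint node.labels.PROJ_A==TRUE \
  --update-order stop-first \
  --rollback-order stop-first \
  --publish 8080:80 \
  --name SRVC-PROJ_A \
  nginx:latest
```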