When testing the new version of docker swarm, available as a built-in feature of docker 1.12, I noticed that it only allowed one swarm manager at a time. If that manager node went down, I would have to make every node leave the current swarm and rejoin a brand new manager in order to recover from the failure.
The older version of docker swarm supported multiple managers for failover and I was wondering if there is a way to do that with this new version? Otherwise this represents a single point of failure!
It seems there is a command, docker node promote, that allows the manual promotion of nodes to swarm managers. Promoting a node changes its MANAGER STATUS to Reachable and allows me to run docker service commands from the promoted node. However, if the Leader leaves the swarm or the machine running the Leader loses its connection to the other nodes, then running swarm commands from the promoted node produces the following error:
username@hostname:~$ docker service ls
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
I followed the steps in the Docker docs. With one leader and the rest as worker nodes, everything works fine.
However, the instant we try to join a node with the ‘--manager’ option and accept it on the manager node, the swarm stops working properly. I tried ‘docker swarm update’ and ‘docker swarm leave’ on both the manager node and the worker nodes. The whole swarm just stays stuck in a dangling state.
vagrant@smgr:~$ docker node ls
ID                           NAME    MEMBERSHIP  STATUS   AVAILABILITY  MANAGER STATUS
3yqk7wiz2w7o2q9m27dp76cl4 *  smgr    Accepted    Ready    Active        Leader
89bgf7seeacmd7ijt3ev3uhk8    snode2  Accepted    Ready    Active
by4zyturqgn4uayg78ybt3tps    snode1  Accepted    Ready    Active
vagrant@smgr:~$ docker swarm update
Swarm updated.
vagrant@smgr:~$ docker node ls
ID                           NAME    MEMBERSHIP  STATUS   AVAILABILITY  MANAGER STATUS
3yqk7wiz2w7o2q9m27dp76cl4 *  smgr    Accepted    Ready    Active        Leader
89bgf7seeacmd7ijt3ev3uhk8    snode2  Accepted    Down     Active
by4zyturqgn4uayg78ybt3tps    snode1  Accepted    Down     Active
vagrant@smgr:~$ docker node accept 0dq57qb235prslhqy6z6bk1l6
Node 0dq57qb235prslhqy6z6bk1l6 accepted in the swarm.
vagrant@smgr:~$ docker swarm update
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
vagrant@smgr:~$ docker node ls
ID                           NAME    MEMBERSHIP  STATUS   AVAILABILITY  MANAGER STATUS
0dq57qb235prslhqy6z6bk1l6            Accepted    Unknown  Active        Reachable
3yqk7wiz2w7o2q9m27dp76cl4 *  smgr    Accepted    Ready    Active        Leader
89bgf7seeacmd7ijt3ev3uhk8    snode2  Accepted    Down     Active
by4zyturqgn4uayg78ybt3tps    snode1  Accepted    Down     Active
vagrant@smgr:~$ docker swarm update
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
Promoting the node did not help either: the command executes fine, but the node's manager status becomes Unreachable.
vagrant@smgr:~$ docker node promote snode1
Node snode1 promoted to a manager in the swarm.
vagrant@smgr:~$ docker node ls
ID                           NAME    MEMBERSHIP  STATUS   AVAILABILITY  MANAGER STATUS
2djvv949g8fcbypx5w26271xf    snode1  Accepted    Ready    Active        Unreachable
e5hc9jq52e42it54o74jc9v6d    snode2  Accepted    Ready    Active
f2ppi8i1ddyhe7mod85dx2yvg *  smgr    Accepted    Ready    Active        Leader
vagrant@smgr:~$ docker swarm update
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
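A quick way to see how many managers the swarm currently considers healthy is to count the MANAGER STATUS values in the docker node ls output. The following is only a rough sketch: count_managers is a hypothetical helper of my own (not a Docker command), and it assumes the status words Leader, Reachable and Unreachable appear as separate fields exactly as in the listings above.

```shell
#!/bin/sh
# count_managers: hypothetical helper, not part of the docker CLI.
# Reads `docker node ls` output on stdin and counts manager rows by status.
# Assumption: the words Leader / Reachable / Unreachable appear as
# whitespace-separated fields, as in the output shown above.
count_managers() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "Leader" || $i == "Reachable") { up++ }
                else if ($i == "Unreachable") { down++ }
            }
        }
        END { printf "reachable=%d unreachable=%d\n", up, down }
    '
}

# Usage on a live manager (assumes the docker CLI is installed):
#   docker node ls | count_managers
```

Fed the last docker node ls listing above, this prints reachable=1 unreachable=1, i.e. only one of the two managers is actually reachable.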
Update: I have found a solution. Apparently, in a swarm with multiple managers, a majority of the managers (more than half of n, i.e. (n+1)/2 rounded up) must be present and active for the swarm to function. So if there are 2 managers and one goes down, the swarm goes down. However, if there are 3 managers and one goes down, the other two keep working. The initial manager is created with docker swarm init.
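On the majority rule itself, here is a minimal shell sketch of the arithmetic. The function names quorum and tolerated_failures are my own (hypothetical helpers, not Docker commands), and I am assuming the usual "more than half" rule, i.e. n/2 rounded down, plus 1:

```shell
#!/bin/sh
# Hypothetical helpers (not Docker commands): plain integer arithmetic
# for the "majority of managers" rule.

# quorum N: smallest number of managers that is more than half of N.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: managers that can be lost while a majority remains.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 2 managers this gives a quorum of 2 and 0 tolerated failures, which matches what I saw: losing either manager stalls the swarm. With 3 managers the quorum is still 2, so one manager can go down.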
You can also use the “docker node” command to promote/demote other managers in your swarm:
~$ docker node --help

Usage:  docker node COMMAND

Manage Docker Swarm nodes

Options:
      --help   Print usage

Commands:
  demote      Demote one or more nodes from manager in the swarm
  inspect     Display detailed information on one or more nodes
  ls          List nodes in the swarm
  promote     Promote one or more nodes to manager in the swarm
  rm          Remove one or more nodes from the swarm
  ps          List tasks running on a node
  update      Update a node
To add a manager to this swarm, run the following command:
docker swarm join --token SWMTKN-1-3ql91v9oxhgwh8mneqd4zclm954mmze2e7cmasomw3gjw9ajfr-dlpgrjaaeqmjf8xt4eyffphad 192.168.99.100:2377
Run the above command on any node that you want to make a manager:
docker@master2:~$ docker swarm join --token SWMTKN-1-3ql91v9oxhgwh8mneqd4zclm954mmze2e7cmasomw3gjw9ajfr-dlpgrjaaeqmjf8xt4eyffphad 192.168.99.100:2377
This node joined a swarm as a manager.