I set up a working swarm made up of 3 nodes, with 1 service and 5 replicas. The nodes run on 3 different virtual machines, each with a static IP, and everything works fine in the swarm. One node is the manager and the other two are workers: worker1 and worker2. The problem appears when I directly "kill" a worker, e.g. when I stop the Docker daemon on worker1 or turn off the virtual machine that worker1 runs on. After that, I turn the virtual machine or the Docker daemon back on. With the command:
# docker node ls
I see all 3 nodes active and ready. As expected, the containers that were running on worker1 have been moved to the manager and worker2. From this point on, no new containers are created on worker1 (e.g. when scaling from 5 to 10 replicas with # docker service scale id=10) unless I remove worker1 and join it to the swarm again. It seems worker1 can't reconnect to the swarm after it fails.
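To narrow this down, it may help to check from the manager how the swarm sees worker1 and how many tasks are actually placed on it. The following is only a minimal sketch: the helper names `node_state` and `tasks_on_node` are made up for illustration, and `worker1` is the node name from the setup above.

```shell
# Hypothetical helpers, meant to be run on the manager node.

# Print the status and availability the manager reports for a node,
# e.g. "ready active" for a healthy, schedulable worker.
node_state() {
    docker node inspect "$1" --format '{{ .Status.State }} {{ .Spec.Availability }}'
}

# Count the tasks whose desired state is "running" on a node.
tasks_on_node() {
    docker node ps "$1" --filter desired-state=running --format '{{ .Name }}' \
        | wc -l | tr -d ' '
}

# Example usage (assumes the node is named worker1):
#   node_state worker1
#   tasks_on_node worker1
```

As far as I know, swarm does not move existing tasks back to a node that has rejoined, and # docker service update --force id is the documented way to make it redistribute the tasks of a service. Whether that also explains why newly scaled replicas avoid worker1 is not certain.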
What is the best thing to do when containers have been paused because of some error during something like:
# docker service update id --mount-add .........
I often get this kind of message.
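For reference, the state of a paused update can be read from the service itself. This is a rough sketch, not a definitive procedure: `update_state` and `is_update_paused` are hypothetical helper names, `id` stands for the service ID used above, and the `UpdateStatus` fields are only present once an update has been attempted.

```shell
# Hypothetical helpers, meant to be run on the manager node.

# Print the update state and message of a service, e.g. "paused: <reason>".
# May print an error instead if the service has never been updated.
update_state() {
    docker service inspect "$1" --format '{{ .UpdateStatus.State }}: {{ .UpdateStatus.Message }}'
}

# Succeed (exit status 0) only when the last update is reported as paused.
is_update_paused() {
    state=$(docker service inspect "$1" --format '{{ .UpdateStatus.State }}')
    [ "$state" = "paused" ]
}

# Example usage (assumes the service is called id):
#   update_state id
#   is_update_paused id && echo "update is paused"
```

According to the Docker documentation, a paused update can be restarted by running # docker service update id again, or reverted with # docker service update --rollback id on versions that support it. Which of the two is appropriate depends on the error shown in the message.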