What happens when a swarm worker node with running stacks times out of the swarm and rejoins?

Hello,

We have an issue where dockerd on a worker node redeployed the stacks running on that node. From the logs, it looks like the worker node was having trouble communicating with the swarm just before the stacks were redeployed. What happens when a swarm worker node with running stacks times out of the swarm and then rejoins? Would this cause the stacks to be redeployed?

Docker version 18.09.2, build 6247962
Ubuntu 18.04.1 LTS

Thanks,

Erik

In general, if a node goes offline, the orchestration layer will redeploy any tasks that were running there to other suitable workers. Those new tasks on the new nodes will stay put until they exit or are evicted. The presence of a new node will not cause an overall redeployment of happily running tasks from other nodes onto the “new” node.

All this assumes there’s sufficient capacity in the cluster to survive the loss of a worker node, and it’s possible to run the displaced services/tasks on some other node(s). I mention this because some folks deploy under-sized clusters which can’t really handle the loss of a worker, or they heavily constrain specific services to run on only a single node or a very small set of nodes. In cases like that, the displaced tasks might not be able to run at all when the node stops/errors/is rebooted/is unavailable.
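
To illustrate the kind of constraint I mean, here is a minimal sketch of a stack file that pins a service to a single node. The service name `web`, the image, and the hostname `worker-1` are made up for the example and are not taken from your setup; the `deploy.placement.constraints` key itself is part of the version 3 Compose file format.

```yaml
version: "3.7"
services:
  web:                                  # hypothetical service name
    image: nginx:alpine                 # placeholder image
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == worker-1   # hypothetical hostname; only this node qualifies
```

With a constraint like this, no other node is eligible, so if `worker-1` becomes unavailable the scheduler has nowhere else to place the task and it stays pending until a matching node is available again.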

Hmm. What happens when a stack is constrained to a specific node, the node becomes unavailable, and then becomes available again? Does the swarm detect that the stack is already running on the node when it returns and leave it alone, or does the stack get removed and created fresh on the node?