How to make nodes retry joining the swarm on timeout?

nvivo · January 23, 2018, 12:21am

I’ve had this problem a few times already. I start a few hundred VMs on AWS, each with a cloud-init script that runs a command to join the swarm: “docker swarm join --token …”

Sometimes the swarm manager is unresponsive or unreachable, and some nodes fail to join because of a timeout. They never appear to try again, and I’m left with a few dozen instances that didn’t join.

What is the correct way to deal with this? I have tried waiting, but the nodes don’t seem to retry joining. Rebooting the instances that didn’t join causes them to forget about the swarm.

Do I need to script a loop that retries until the node joins, or is there a built-in way to make nodes retry on their own, at least every couple of minutes?
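
For reference, the "script a loop" option could look roughly like the sketch below. It is only an illustration, not a built-in Docker feature: the `retry` function name, the attempt count, the delay, and the `SWARM_TOKEN` and `MANAGER_ADDR` variables are all placeholders made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the
# attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while true; do
        # Run the command exactly as given; stop on the first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Possible use in a cloud-init script (token and address are placeholders):
# retry 30 120 docker swarm join --token "$SWARM_TOKEN" "$MANAGER_ADDR"
```

With the commented-out line enabled, this would try to join up to 30 times, two minutes apart. One caveat I have not verified: if a timed-out join leaves the node in a half-joined state, a later `docker swarm join` may be rejected, so the loop might also need to check the node state or leave the swarm before trying again.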
