Question reboot Docker Swarm

Hello everyone, I’m new to the forum and I’m not an expert in Docker or Docker Swarm, and I have the following question:
I currently have a Docker Swarm environment with 3 VMs: one acting as Manager and the other 2 as Workers. It is running version 20.10.17.

I need to perform maintenance tasks on the machines, so I have to restart all 3 nodes. How do I do this correctly to avoid breaking anything?

If you could explain it to me in an easy-to-understand way, I would appreciate it.

Thank you very much for your responses!

For HA and fail-over reasons, you should use 3 managers. If one fails, another can take over the role. You can still run regular workloads on them, no need for dedicated worker-only nodes.

When one node fails, Docker Swarm will usually re-schedule the tasks/containers to running nodes.

Hi, so should I create 1 more VM and make it a Manager of the Swarm environment? This way, would I be able to restart VMs without any issues? I assume the recommended number is 3 Managers, but would 2 suffice for me?

No, 3 is not the recommended number, it is the minimum for HA.

https://docs.docker.com/engine/swarm/admin_guide/#add-manager-nodes-for-fault-tolerance

Check the table there: it shows that with 2 managers you have no fault tolerance. You need an odd number of managers. This is the algorithm that is responsible for it:
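The numbers in that table come down to majority arithmetic: the managers need more than half of their number reachable to keep managing the swarm. As a rough sketch (the function names `quorum`, `fault_tolerance` and `print_table` are made up for this example, they are not Docker commands), the figures can be reproduced like this:

```shell
#!/bin/sh
# Majority (quorum) needed among N managers: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Managers that can be lost while a majority still remains: floor((N-1)/2).
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print one line per manager count from 1 to 7.
print_table() {
  for n in 1 2 3 4 5 6 7; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
  done
}
```

Calling `print_table` shows that 2 managers tolerate no more failures than 1, and 4 tolerate no more than 3, which is why an odd number is advised.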

Of course!

Understood, thank you very much.

Currently, I have 1 Traefik instance on the Manager. If I simply add 2 more nodes, for a total of 3 Managers, will the leader run my Traefik container? We deployed it with docker-compose, and it has the following lines:

deploy:
  placement:
    constraints: [node.role==manager]

It probably depends on how you deployed it, but I use Kubernetes, not Swarm, so you will need to wait for @bluepuma77's answer.

Side note:


Please, format your posts according to the following guide: How to format your forum posts
In short: please use the </> button to share code, terminal output, error messages or anything else that can contain special characters which would be interpreted by the Markdown filter. Use the preview feature to make sure your text is formatted as you expect, and check your post after you have sent it so you can still fix it.

Example code block:

```
services:
  service1:
    image: image1
```


You can promote your regular worker nodes to managers to have HA for the management layer. Then you could upgrade one node after the other.
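A minimal sketch of how that could look on the command line, assuming the two workers are called `worker1` and `worker2` (hypothetical names, use whatever `docker node ls` shows for you). The `run` helper and the `DRY_RUN` variable are only part of this example so the commands can be printed first instead of executed:

```shell
#!/bin/sh
# With DRY_RUN=1 the command is only printed, otherwise it is executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Promote one or more existing workers to managers (run on a current manager).
promote_workers() {
  run docker node promote "$@"
}

# Move tasks off a node before it is rebooted.
drain_node() {
  run docker node update --availability drain "$1"
}

# Allow the node to receive tasks again once it is back.
activate_node() {
  run docker node update --availability active "$1"
}
```

For example, `DRY_RUN=1 promote_workers worker1 worker2` prints the promote command without running it. After promoting, one possible order per node is: `drain_node`, reboot the VM, wait until `docker node ls` lists it as Ready again, then `activate_node`, and only then move on to the next node so that a majority of managers stays up the whole time.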

But it really depends on your setup. With the Traefik constraint on manager, you currently only run a single instance, so your domains in DNS will probably point to that single node. So when you run an update, you will have a short service interruption, if you take no extra measures.

We run an external load balancer in front of 3 Traefik servers, that way we can update one at a time.

Good morning, after adding 2 more manager nodes, I was able to restart and everything is perfect!
Thank you very much for your help.