Will adding a Config to a Docker Swarm Service always restart the container(s)? (Traefik / distributed LetsEncrypt)

I am still trying to figure out a way to use Docker Swarm with multiple Traefik instances and still be able to generate LetsEncrypt certificates.

I have been experimenting with a single certbot instance, but I am not happy with the distribution of the certs to the Traefik instances.

My latest idea was to just create a new config/secret with the cert and add it to the service for Traefik to pick it up dynamically. During development I found that the container is restarted when adding a new config.

Will a container always be restarted when adding a config to its service? Is there a way around the restart?

I still use Traefik 1.7.x for Swarm, as it is the last version that stored the LE certificates in Consul and made them available on all nodes.

A service will not pick up changes for an existing secret name or config name, unless the service is removed and re-created.

That’s why people usually append a suffix like .v1, .v2, …, .vn to the secret name and update the service definition to reference the new secret name instead. This allows updating the service (which of course updates the service definition, which in turn updates the service tasks, which in turn replace the containers).
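A minimal sketch of this rotation, assuming a hypothetical service named `traefik` that mounts a cert as a config called `site_cert.v1` (all names and paths are placeholders):

```shell
# Create a new, versioned config from the freshly issued certificate
docker config create site_cert.v2 ./fullchain.pem

# Point the service at the new config; this rolls the tasks,
# i.e. the containers are replaced
docker service update \
  --config-rm site_cert.v1 \
  --config-add source=site_cert.v2,target=/certs/fullchain.pem \
  traefik

# Clean up the old config once no task references it anymore
docker config rm site_cert.v1
```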

I am not aware of any way to prevent the task instances from being replaced after modifying the configuration.
You could store your certificates on a remote NFS share and use it as a volume for both certbot and Traefik. Certbot generates the cert and stores it in the volume, and Traefik can access the updated certificate. Though, I am not sure whether Traefik will pick up on the changed certificate. It depends on whether it is read just once when Traefik starts, refreshed periodically, or read on each access… I have no idea how it’s implemented. You’d need to test it.
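A sketch of what the shared-volume approach could look like, assuming a hypothetical NFS server at `10.0.0.10` exporting `/export/certs` (address, paths, and image tags are placeholders):

```shell
# Attach the same NFS-backed named volume to the Traefik service…
docker service create --name traefik \
  --mount 'type=volume,source=certs,target=/certs,volume-driver=local,volume-opt=type=nfs,volume-opt=o=addr=10.0.0.10,volume-opt=device=:/export/certs' \
  traefik:v2.11

# …and to the certbot service, so renewed certs land where Traefik reads them
docker service create --name certbot \
  --mount 'type=volume,source=certs,target=/etc/letsencrypt,volume-driver=local,volume-opt=type=nfs,volume-opt=o=addr=10.0.0.10,volume-opt=device=:/export/certs' \
  certbot/certbot renew
```

Because both services declare the volume with identical options, each node that runs a task mounts the same NFS export.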

Thanks, I have experience with Traefik (I have been fighting with it for a year), so that will work.

I just thought I could leave out the shared-folder part, as that needs to be made highly available, too.

The Traefik 1.7.34 image now has critical findings. So 1.7.x is not an option anymore.

Indeed, the shared remote folder would need to be HA as well.

Kind of off-topic, but somehow on-topic:
In the past I would have used portworx/px-dev for this, which uses local node storage to create a storage cluster with all the bells and whistles (including replication and NFS exposure for ReadWriteMany volumes); it could be used as a swarm-scoped volume plugin.

You could try to research whether a comparable solution still exists. Just make sure the plugin itself does not run in Docker, but runs on containerd, runc, or on the host itself instead. If the plugin runs in Docker, there will be a terrible race condition between “Docker misses the plugin required for its volumes” and “the plugin container cannot be started, because Docker doesn’t start due to the missing plugin”. Last time I used StorageOS with Swarm, I pretty much had this problem… it was horrible.

I am afraid I have no good (or even other) solution for the situation.

You could try opening an issue on GitHub, but I am not sure it will be resolved. It seems Traefik Labs let go of at least two long-time maintainers last month, so I don’t know their current priorities.

Docker Swarm with Traefik and distributed Lets Encrypt

I built a proof-of-concept that generates LetsEncrypt certs with certbot behind a Traefik v2 cluster and delivers the certs via the http provider; you can find it here.

I built another proof-of-concept that simply spins up Syncthing in a Docker Swarm cluster to distribute the cert files to all nodes; you can find it here.

I don’t know how stable those two pieces are. I am currently working on a simpler certbot and syncthing solution. I will report back here in the forum when it works. But I am also open to other ideas.

I hope you find a satisfying solution.

At this point I would probably have switched to Kubernetes, as it actually allows updating secrets. Though, pods still need a restart to use the updated values of the secret.
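A sketch of such an in-place secret update on Kubernetes, assuming a hypothetical TLS secret `site-cert` consumed by a `traefik` Deployment (both names are placeholders):

```shell
# Re-generate the secret manifest from the renewed cert and apply it
# over the existing secret of the same name
kubectl create secret tls site-cert \
  --cert=fullchain.pem --key=privkey.pem \
  --dry-run=client -o yaml | kubectl apply -f -

# Pods still need a restart to reliably pick up the new value,
# e.g. via a rollout of the consuming Deployment
kubectl rollout restart deployment/traefik
```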

It doesn’t hurt to know Kubernetes, as it allows fine-grained control over deployments.

K3s is as simple to install as Docker Swarm and even comes with Traefik included. RKE2 should be just as easy to install, but with more advanced security.

Multiple times I have tried to get started with k8s, but I always gave up, as there is so much to configure. And every time I ask people with experience, they say I should have 2 FTE to run the cluster and that day-2 operations are not easy. And experienced consultants tell me they need more than 2 days to set it up. It seems that’s just not my game for a tiny company with 10 servers. It may be okay for big companies with many servers and deep pockets.

I hear you :slight_smile:

The flexibility indeed has its price. And you need to take into account that you need at least three people who are able to maintain the environment, to spread the workload and cover absences.

Though, performing the kinds of deployments that can be done with Swarm is only slightly more complicated on Kubernetes. Migrating the compose files to k8s manifests might look challenging in the beginning. It’s all the bells and whistles people add to their clusters, like a monitoring stack, secret exporters, config reloaders, that bring the real complexity, which most people don’t even consider for their Swarm clusters.

Just found that CIFS mounts support cache=loose, which should enable the file cache and allow cert reads even if the network or share is offline. (Doc)
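For illustration, a mount with that option could look like this (server, share, mount point, and credentials are placeholders):

```shell
# cache=loose relaxes CIFS cache coherency, so locally cached files
# can still be read when the share becomes temporarily unreachable
sudo mount -t cifs //fileserver/certs /mnt/certs \
  -o cache=loose,username=svc_certs,vers=3.0
```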

I will adapt my certbot setup and test it.