Afraid for the Future of Swarm

The Switch to Swarm

I recently switched from Rancher to Docker Swarm for container orchestration. My partner and I loved Rancher and used its built-in Cattle orchestration engine, but with the upcoming Rancher 2.0 they were only going to support Kubernetes orchestration. We tried Kubernetes, but we simply don’t like it. It is overcomplicated, clunky, and resource intensive compared to Cattle’s simple and elegant solutions. When we compared Cattle to Docker Swarm, we liked Swarm even more than Cattle. They were pretty much the same in most respects, but Swarm is built into Docker and doesn’t require running a dedicated server. We have already set up a Swarm cluster and it is working very nicely for us. We even created a Docker plugin for Swarm that provides distributed storage using LizardFS.

Concerning Issues

After getting into Swarm, I found a GitHub issue that concerned me quite a bit: Swarm does not let you run privileged services. This is a pretty major limitation, especially since the docker run command already supports it and many applications need system privileges. It was almost a show-stopper for a Docker plugin I made; as a workaround, I had to create a Swarm service that launches a standalone Docker container. On top of that, the issue was created two years ago and has seen no progress so far.
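
A minimal sketch of that workaround, with placeholder names (privileged-launcher, my/privileged-app) since the actual plugin is described above but not shown: the Swarm service itself only acts as a launcher, binding the host’s Docker socket so it can use plain docker run, which does support --privileged.

```shell
# Hypothetical sketch of the workaround: a global Swarm service that
# binds the host's Docker socket and uses plain `docker run` (which,
# unlike Swarm services, supports --privileged) to start the real
# workload on every node.
docker service create \
  --name privileged-launcher \
  --mode global \
  --restart-condition any \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  docker:stable \
  docker run --rm --privileged --name privileged-app my/privileged-app:latest
```

The obvious downside is that the inner container is invisible to Swarm: it is not health-checked, rescheduled, or cleaned up by the orchestrator, which is exactly why this remains a workaround rather than a fix.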

It’s not the only issue like it, either. There is another one where the docker stack deploy command won’t source a .env file the way Docker Compose does. It is a valid concern for Docker users, but that issue has sat without progress for a year and a half, and there are several more like it.

In all of these issues you see Docker users trying to hack their way around a problem that the Docker team doesn’t seem interested in fixing. Sometimes the response is that it is not a “bug”, but plenty of people have valid use cases they cannot facilitate because of these limitations. The issues were raised by users who actually need the feature; the Docker team just doesn’t seem to agree that they do.

To be completely frank, these issues are starting to scare me away from Swarm and, to a lesser degree, Docker itself. They suggest that the Docker team thinks it knows how the tool should be used better than the people who actually use it. I understand the Docker team’s perspective sometimes, but I think they are really pushing users away in these instances.

djbingham:
I feel like this statement reveals a fundamental disconnect between the way Docker’s developers and users view these products.

killcity:
Gave up on this and switched to Kubernetes.

My fear is that I will eventually hit an issue of my own that blocks me from using Swarm, and that I will get similar resistance from the Docker team. The privileged service issue was almost a show-stopper for one of my solutions (and I still think it is a shame that it has not been fixed yet). I don’t want to commit to Swarm if a limitation will force me to switch to something else later; that could be a long and expensive process if I start a major project with it.

What is the Future of Swarm?

My partner and I still love the way Swarm is set up. The way it handles networking, high availability, and service discovery could hardly be more ideal, in our opinion. It is the attitude of the Docker developers that is concerning. I would like some answers from the Docker team on a few questions:

  • Does Docker (the company) want to keep Swarm going?
  • Is there a roadmap or plan for continuing development on Swarm?
  • Will Swarm eventually be abandoned in favor of Kubernetes?

My partner and I don’t want to switch to Kubernetes. Swarm is so much more elegant while sacrificing hardly any features. Swarm deserves more care than it gets.

If the Docker team could give me any insight into what the future of Swarm looks like, it would be appreciated. I want to use it, but at this point I am very concerned about where it is headed.

10 Likes

What would be a good use case for privileged mode? The only thing I saw was that you cannot run Elasticsearch inside a container. For something like that, though, I would question the choice of using a container at all for backend services such as Elasticsearch, Redis, MySQL, MongoDB, or even Docker Registry.

Mind you, I do run Redis, MySQL, and Sonatype Nexus (as a Docker registry) on GlusterFS (using my own volume plugin) on my setup, because I can. But I would never recommend doing so in any production system, especially in AWS.

For one, if you use a cloud service for Elasticsearch, MySQL, the Docker image registry, and even file storage, the provider manages availability and security for you, so you can focus on the part that is really important: your custom application.

Putting everything in a container is an interesting mental exercise (I try to do it), but it yields no pragmatic value if you then need to manage every little thing yourself. The exception I make to this rule is development environments: I want to be able to replicate production quickly, so there I do run MySQL in a container.

Mind you, there is one thing K8S does that I haven’t figured out how to do in Docker Swarm: obtaining the source IP for a replicated service. That would let me do GeoIP-related analytics. In practice, though, I just use Google Analytics for that, since in the end it’s the users I am interested in. As for DDoS attempts from an IP range, I leave those to the cloud provider, because if one does happen there isn’t much I can do to alleviate it.

However, your fear is not unfounded; there isn’t much news about Swarm, except that I know they’re working on service-based plugins (which will fix the stupidity of having to run docker plugin install on every node).

@trajano My approach to applications is actually to containerize everything, even in production. So far it hasn’t failed me; that doesn’t mean it won’t one day, but putting everything in containers lets you completely automate your application deployments. For example, my LizardFS Docker plugin can deploy a fully functional LizardFS cluster on Swarm with a single command.

Even if that weren’t the case, people are still having problems without privileges for webservers and load balancers. The thing is that people do find use cases for these things. I don’t think it makes sense for them to stay unimplemented just because not everybody needs them (especially when the features already exist for normal Docker containers). Even if I didn’t need it right now, it is still a concern that I might need it later and end up stuck because it isn’t there.

Just looking at your plugin, it seems you still need an existing LizardFS cluster, but you can set that up on the swarm (which is something I was planning to do on my end too: https://github.com/trajano/docker-volume-plugins/issues/4).

For the privileges issue, the only thing I can think of is that they have an experimental API, https://docs.docker.com/engine/api/v1.37/#operation/ServiceCreate (if you open up Task Template, there is a Plugin Spec), which will deploy the plugin as a service so you don’t need your workaround. But that is not exposed to the CLI or to stack deploy as of yet. I am waiting on that myself.
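
As a rough sketch of what that might look like against the engine socket (field names follow the v1.37 ServiceCreate reference; the exact payload shape and the plugin name are assumptions on my part):

```shell
# Hedged sketch: create a swarm service whose runtime is "plugin", so
# the engine would install and run the plugin on each node the task
# lands on. "example/lizardfs-volume-plugin" is a placeholder name.
curl --unix-socket /var/run/docker.sock \
  -H "Content-Type: application/json" \
  -X POST http://localhost/v1.37/services/create \
  -d '{
        "Name": "lizardfs-plugin",
        "TaskTemplate": {
          "Runtime": "plugin",
          "PluginSpec": {
            "Name": "lizardfs-volume-plugin",
            "Remote": "example/lizardfs-volume-plugin:latest"
          }
        },
        "Mode": {"Global": {}}
      }'
```

Since plugins run with whatever privileges they declare, this path would sidestep the privileged-service limitation for plugin-shaped workloads.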

Once the API is finally exposed in an easy-to-use fashion, we could create plugins that act as services with privileges and have them working in Swarm. When that will be I cannot say, but it is in the experimental API already.

Like I said, the only issue I have with containerizing everything is that you need to manage everything yourself. Maybe you’re a small business with a small set of developers, but in a large organization with many people of varying skill levels and tenure, you may encounter issues like Equifax’s, where they just happened not to upgrade Struts to a newer version and had a data breach.

That’s nice to know. Thanks for the info. At least it looks like it might be in progress, then. :slightly_smiling_face:

Yes, there may be some challenges. There are a lot of things to take into account, but my partner and I are still going to work on automating workflows and designing solutions until we figure it out. :wink:

In the end I’m just hoping that Docker is going to keep Swarm going, and not just keep it alive, but also help it to be great. It’s the best orchestrator I’ve found so far. :crossed_fingers:

1 Like

If you deal with a separate operations team (which tends to be the case in large organizations), they’ll thank you for not overly complicating their lives.

We also face the privilege problem with Docker Swarm, and we used the same workaround as kapono did: start a service that in turn starts a container with docker run.

My use case is, for example, using ipmitool to manage on-premise servers, deploying a Quobyte storage cluster, or mounting volumes with native clients inside a container. These are all essential for us, and we want them together with Swarm to get HA capability for these services.

So I can really agree with kapono that I am a bit afraid of the way Docker replies to this kind of problem.

Another problem is that in Swarm you have no way to assign static IPs to services running on a macvlan network interface, which is again essential for us, since the overhead of overlay networks is not acceptable.
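
For what it’s worth, a swarm-scoped macvlan network itself can be created (per Docker’s macvlan documentation) with a per-node --config-only network plus a --config-from swarm network; it is the per-task static IP assignment that Swarm still doesn’t offer. Subnet and interface values below are placeholders:

```shell
# On each node: a config-only network describing that node's local
# subnet and parent interface (placeholder values).
docker network create --config-only \
  --subnet 192.168.1.0/24 \
  --gateway 192.168.1.1 \
  -o parent=eth0 \
  mv-config

# On a manager: the swarm-scoped macvlan network built from the
# per-node configs. Services can attach to it, but there is no way
# to pin a fixed IP per task, which is the limitation in question.
docker network create -d macvlan \
  --scope swarm \
  --config-from mv-config \
  swarm-macvlan
```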

And we really need containers, since we use CoreOS on our systems.

But all this movement, with CoreOS becoming Fedora CoreOS/Red Hat CoreOS, Docker integrating Kubernetes, and so on, is really scary. It is not acceptable for us to swap out a major component of our stack every year.

So it would be really good to know more about the future of Swarm, especially since Kubernetes is, for us as for kapono, far too big and too complex.

1 Like

I sympathize with your situation. However, this could have been avoided if people didn’t treat container technology as equivalent to a full virtual machine.

For me, I try to limit any production usage of Docker clusters (Swarm/K8S) to what Java EE application servers like WebSphere used to do: host the application. I also look at Docker containers as ephemeral, simply an isolated area in which to run my application service component without the full cost of a VM.

Perhaps you may want to rethink your architecture and your business scope. Is your primary business to build an application that people use so you can make money, or not? Is infrastructure purity (i.e. containerizing everything) significantly more important than delivering a product?

FYI, I am not a Docker developer, but Abraham Maslow’s 1966 remark, “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail,” seems to apply to a lot of the issues I see with container-based architecture. Just remember there were many technologies before containers, and they are still valid.

Our product IS the infrastructure. And we require the ability to assign static macvlan IPs, as well as privileges for at least one container, since that container requires HA.

So if we followed your advice, we could just stop our project, since we already have the alternative with Pacemaker, DRBD, etc.

So the handful of missing Docker Swarm service capabilities more or less breaks our whole concept, which provides a real benefit for us as developers as well as for our admins. So far we have found a workaround for everything, but it took a lot of effort and is sometimes really ugly…

PS: I forgot to mention that a full VM is not acceptable for us, both because of the performance loss and because of the cost of something like VMware HA.

1 Like

Then it really does sound like Docker Swarm shouldn’t be used in your case. What you need is a whole lot of flexibility, and Docker Swarm really ain’t it unless you are willing to do a lot of extra programming (which I did for my own stuff, namely the Docker volume plugins for CIFS, GlusterFS, and NFS).

Now, if you are willing to, you may want to forgo trying to run things in service containers and build your own managed plugins instead.

Since you’re doing something with the network, the Docker network driver plugins page on Docker Docs may be worth a look, along with https://github.com/docker/libnetwork/blob/master/docs/remote.md.

You can probably integrate that with Pacemaker, DRBD, etc. I can’t really tell, as I don’t know those specific products.

For developers, though, I find Vagrant does what I need for the most part to set up development and test VMs. I also use the simplistic “shell” provisioner rather than something elaborate, as it gives me better control.

1 Like

Well I, for instance, had the worst experience ever with Swarm. The issues I found:

  • When our servers ran low on memory, the Linux OOM killer would kill the Swarm orchestrator. This meant that:
    • docker service create appeared to work fine, but no new containers were actually created
    • The other nodes in the swarm didn’t notice that the problematic node was offline, so they kept sending commands to it
    • Since the containers were not being created, Swarm auto-moved them to another node, causing a memory overflow there and breaking the whole cluster
  • Sometimes, when a node was unreachable, Swarm didn’t mark it as such, so new commands were sent to the problematic node all the time…
  • Swarm’s internal monitoring simply didn’t work: when a node was down, the other managers still saw it as up.
  • Most of the problems we had with a Swarm node required a complete reboot. But Swarm doesn’t re-balance services when a node rejoins, so we had to re-balance everything by hand after the reboot finished. And remember my first point: the Linux OOM killer would kill Swarm in a way that other nodes didn’t identify as down, so a reboot would probably take down other nodes too…
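
The manual re-balancing step can at least be scripted: docker service update --force reschedules a service’s tasks, which spreads them back across the available nodes after a reboot. A minimal sketch (run against whatever services your cluster defines):

```shell
# After the rebooted node is back, force-reschedule every service so
# its tasks get spread across the available nodes again.
for svc in $(docker service ls -q); do
  docker service update --force --detach=false "$svc"
done
```

This is still a manual trigger rather than true automatic re-balancing, which is the complaint above.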

Most of these problems disappeared when we migrated to Kubernetes. Anyway, I don’t know how Swarm is nowadays, but I probably still wouldn’t use it in production.

Actually, the Linux OOM killer will kill any random process; you just happened to have it hit the orchestrator. I’ve had similar OOM-killer issues with K8S, so it’s not immune.

I still had to reboot and rebuild K8S clusters to recover everything as well, when the provisioned amount of disk or memory ran out and etcd was unable to recover.

Mind you, in Swarm you need to be explicit about your resource limits to make sure you don’t hit OOM issues (same with K8S), but they are not set by default and you don’t get a warning either.
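
For example, the limits can be set directly on docker service create (values here are illustrative); with a memory limit in place, a runaway task is killed by the engine instead of starving the node, which makes the OOM killer far less likely to hit the orchestrator:

```shell
# Illustrative per-service resource limits: cap memory and CPU so a
# runaway container cannot exhaust the node. The service/image names
# are placeholders.
docker service create \
  --name web \
  --replicas 3 \
  --limit-memory 256m \
  --reserve-memory 128m \
  --limit-cpu 0.5 \
  nginx:alpine
```

The same limits can be expressed in a stack file under deploy.resources if you use docker stack deploy.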

2 Likes

Okay, but the problem was that after it killed the orchestrator, the node was still marked as “up and running”. Swarm would still send commands and containers to that node, which silently failed; the Swarm router still redirected requests to that node, which would not respond; and since the orchestrator was down, it simply didn’t run health checks on the containers there, so everything was marked as “up and working” and the cluster never knew there was a problem with a service…

It was, indeed, the worst possible outcome…

I use Portainer with my Raspberry Pi cluster; it works like a charm, it’s light, and it allows me all sorts of access and extensions.

docker pull portainer/portainer

docker service create \
  --name xxx \
  --detach=false \
  --replicas=1 \
  --constraint 'node.role == manager' \
  --mount type=bind,src=//var/run/docker.sock,dst=/var/run/docker.sock \
  --mount type=bind,src=/path/to/VOL/data,dst=/data \
  --mount type=bind,src=/path/to/Certs,dst=/certs \
  portainer/portainer:latest \
  -H unix://var/run/docker.sock

(or -H tcp://host.domain.com --tlsverify … …etc etc to connect over TCP with TLS)

There’s a link to the deployment on Read the Docs, and Portainer can be found here on Docker Hub.
Try it out; Swarm is here to stay ; )

I use Portainer for my Swarm too. I really like it. Swarm is really cool, I just hope that the Docker devs realize it. :slight_smile:

1 Like

Given how little interaction there has been on these threads from anyone at Docker, I think we have our answer.

2 Likes

So far, Swarm is strong and not going anywhere.

From a great teacher (the Docker Mastery and Docker Swarm video courses on Udemy)

1 Like

@kapono I was afraid about the future of swarm mode too, so I decided to start fixing those long-standing issues, especially the ones that would be useful for our use cases.

There is some roadmap discussion at https://github.com/docker/swarmkit/issues/2665, and I have shared my TODO list there too.

IMO, the biggest issue with swarmkit is that there are not enough contributors (only 102 currently, while moby/docker engine has 1,814 and Kubernetes 2,135), so instead of complaining on forums I would like to see people help swarmkit evolve.

Those who are ready to do so can reach me on the Docker community Slack, and I will do my best to help you get your pull request approved.

@olljanat I’m very glad to see you working on this! I’ve actually checked out your work on Swarm privileges recently and was glad that somebody was finally doing it. I’ve needed that for a while.

My partner and I have since continued with our Swarm usage; it is still doing its job, and we still haven’t found a case where we needed Kubernetes over it.

Just know that the purpose of this topic wasn’t to complain; it was to find out what Docker’s intentions were with Swarm. I definitely agree that Swarm could use more contributors, and I’m the kind of person who would love to jump in and work on different open-source projects, but there is a lot of technology supporting my business and I can’t just work on anything I want to.

I definitely appreciate you offering to help other contributors as well. That could make a real difference in the Swarm enthusiast community.