High Availability in Docker Swarm

As you have mentioned, all official and “professional” HA systems need at least 3 nodes.

But there are HA cluster environments on Oracle Sun Solaris, Veritas, Linux Pacemaker, etc., and these are also “professional”.

They can run HA in a 2-node setup, and I have personally worked with such setups.

But Docker is new to me, so asking the same question twice doesn’t make it senseless.

Or maybe it does for you, since you might not have come across this on other platforms. :wink:

You are right, I might just be thinking about the usual scaling Internet technologies.

Now you are starting to make sense :+1: :blush:

That might be true, but Kubernetes and Swarm do not require three nodes just because they could not do any better. Three nodes are necessary for a certain level of stability. The so-called “split-brain scenario” mentioned by @meyay can most likely happen with the cluster solutions you mentioned too, but they can have other ways to make the system more stable. For example, in some cases you don’t have 3 manager nodes (let’s call them that for now), but you have a third place where the cluster state is stored, independent of the two managers. When the network between the managers breaks but both can still access the state on that external, shared storage, the storage can act as a third manager from the point of view of the actual managers.

In other cases, some manual intervention may be required.
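To illustrate the "third place" idea, here is a toy sketch in Python. It is not how any of the products mentioned actually implements it; the `Witness` class and the node names are made up for the example, and real systems add lease expiry, fencing and so on. It only shows why a shared third component lets exactly one of two partitioned nodes stay active.

```python
class Witness:
    """Hypothetical shared tie-breaker, e.g. a lock on common storage."""

    def __init__(self):
        self.holder = None  # name of the node that currently holds the lock

    def try_acquire(self, node: str) -> bool:
        # The lock is granted to the first node that asks and refused to
        # everyone else until the holder releases it.
        if self.holder is None or self.holder == node:
            self.holder = node
            return True
        return False

    def release(self, node: str) -> None:
        if self.holder == node:
            self.holder = None


# The link between node-a and node-b is broken, but both still reach the witness.
witness = Witness()
active = [node for node in ("node-a", "node-b") if witness.try_acquire(node)]
print(active)  # only one node may keep running the services
```

Without the witness, both nodes would only know that the other one is unreachable, and both could decide to take over.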

Either way, we all said the same thing: Kubernetes and Swarm work this way, and you need 3 nodes. As far as I know, in Kubernetes you could have an external etcd cluster (on three nodes) and two manager nodes, but that would mean 5 nodes in total, unless of course you already have an existing etcd cluster. K3s uses Kine to translate etcd requests into MySQL or PostgreSQL queries, so if you already have such a database running in HA, you would not need to add more nodes, although I’m pretty sure it would be a little slower than talking to etcd directly, and you would still need to solve highly available MySQL or Postgres. Postgres can also run in HA with two nodes, but as far as I know it requires a replication manager, which is recommended (although not required) to run with 3 replicas. Whatever you do, at some point in the infrastructure you will need three of something, or you accept that it will not be as stable as many systems or customers require.

So in the end, it is up to you how stable a system you build, but in Kubernetes and Swarm you will need 3 manager nodes.

If you can virtualize, you can run two virtual machines on each of your three physical machines and configure 3+3 managers on those for the dev and the prod clusters. Allow scheduling containers on the manager nodes so you do not need additional worker nodes. This could be acceptable depending on your exact requirements, but normally I would only run manager services on manager nodes and have dedicated worker nodes, since manager nodes have to be protected from intensive CPU, memory and network usage to remain stable.
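A quick way to sanity-check such a layout is to see whether each cluster keeps a majority of its managers when one physical machine dies. This is only a sketch: the host names and the `placement` layout are assumptions for the example, and it only counts managers; it says nothing about CPU or memory pressure.

```python
# Hypothetical layout: one dev manager VM and one prod manager VM per physical host.
placement = {
    "host1": {"dev": 1, "prod": 1},
    "host2": {"dev": 1, "prod": 1},
    "host3": {"dev": 1, "prod": 1},
}


def keeps_quorum(placement: dict, cluster: str, failed_host: str) -> bool:
    """True if `cluster` still has a majority of managers after `failed_host` dies."""
    total = sum(vms.get(cluster, 0) for vms in placement.values())
    alive = sum(
        vms.get(cluster, 0)
        for host, vms in placement.items()
        if host != failed_host
    )
    return alive >= total // 2 + 1


for cluster in ("dev", "prod"):
    for host in placement:
        print(cluster, "without", host, "->", keeps_quorum(placement, cluster, host))
```

If two managers of the same cluster end up on the same physical machine, losing that machine takes the majority with it, so the VMs of one cluster should be spread over all three hosts.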

If you know how other systems can work perfectly with only two nodes, please share so we can learn, but so far I haven’t seen any actual highly available system that can work with 2 nodes without a third component and still avoid split-brain scenarios.

That was the point: if you don’t have an odd number of nodes, you risk a split brain. If you have only two nodes, you just double the chance of failure, because if either of the two nodes fails, the other can’t work either.
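The arithmetic behind this is simple majority voting, which is what Raft-based systems like Swarm managers and etcd rely on. A small Python sketch (the function names are mine, not from any Docker tooling):

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a strict majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can fail while a majority is still reachable."""
    return managers - quorum(managers)


print("managers  quorum  tolerated failures")
for n in range(1, 8):
    print(f"{n:>8}  {quorum(n):>6}  {tolerated_failures(n):>18}")
```

With 2 managers the quorum is 2, so losing either one stops the cluster from making decisions, which is no better than 1 manager. Going from 3 to 4 managers does not help either: both tolerate only one failure.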


I am using on-prem Ubuntu servers. I have created a Docker Swarm network, and multiple APIs will be running on my worker nodes, so I would like to know what I can do to get high availability here and how I can make full use of a MySQL database. There is an external load balancer which accepts traffic and then sends it to my Docker Swarm. Beyond that I haven’t built any structure yet, but I want my services to be highly available, and I don’t want any disruption when my primary MySQL database gets jammed by excess requests. How should I handle a project like this, for example a trading application that needs zero downtime and high availability? The main code is .NET, if that helps.

If you have an external load balancer, just run multiple replicas of a reverse proxy like Traefik on at least two nodes.

Make sure to have at least 3 Swarm managers for HA, which can also run workloads.

Run your SQL database in a cluster, like MariaDB Galera.
