I'm learning Docker Swarm mode and I managed to create a Swarm locally with a web application and a PostgreSQL database. I can scale them and I see Swarm creating replicas.
I think I understand how Docker Swarm can load balance regular web servers, but how does it deal out of the box with database containers?
Outside of the Swarm context, databases usually have their own ways to deal with replication, in the form of plugins or extended products like MySQL Cluster. Other databases, like Cassandra, have replication built directly into the product. In a Swarm context, do we still need to rely on those database plugins and features?
What is the expected pattern to handle data consistency between replicas of a database container?
I know it’s a very open-ended question, but Docker’s documentation is very open-ended too and I can’t seem to find anything specific to this.
In my limited experience, Swarm scaling is currently more applicable to stateless applications. As soon as you add state, the picture changes: you generally have to keep the data in a volume (at least in the case of Postgres), so you need to constrain the service to a host. See the --constraint option in Swarm. Postgres replication is not related to Swarm at all. You would need a slave that is constrained to another (different) node, and then set up normal PG replication. I'm not familiar with databases like Cassandra that have built-in replication, but the setup is probably easier. You will most likely still need constraints, though, because it generally doesn't make sense to run multiple instances of a database on the same machine the way the web server examples for Swarm do.
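To make the constraint idea concrete, here is a minimal sketch that only builds (and prints) the `docker service create` commands, one service per database instance, each pinned to a different node. The service names, the node hostnames `node-1` and `node-2`, the host directory `/srv/pgdata`, and the `postgres:16` image tag are assumptions for illustration. A real setup would also need credentials and the Postgres replication configuration, which Swarm does not provide.

```python
import shlex

# Data directory used by the official postgres image (assumed image: postgres:16).
PG_DATA_DIR = "/var/lib/postgresql/data"


def pinned_postgres_service(name, node_hostname, host_dir, image="postgres:16"):
    """Build (not run) a `docker service create` command pinned to one node."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", "1",  # one task per service; Swarm does not replicate the data
        "--constraint", f"node.hostname=={node_hostname}",  # keep the task next to its data
        "--mount", f"type=bind,source={host_dir},target={PG_DATA_DIR}",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical layout: primary on node-1, slave on node-2.
    for name, node in [("pg-primary", "node-1"), ("pg-slave", "node-2")]:
        print(shlex.join(pinned_postgres_service(name, node, "/srv/pgdata")))
```

Each service stays at a single replica on purpose: the second database instance is a separate service on a separate node, and keeping the two in sync is left to normal PG replication.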
As collinp said, Docker Swarm currently scales well for stateless applications. For database replication, you have to rely on each database's own replication mechanism. Volume-level or file-system-level replication can protect the data of a single-instance database, but it is not aware of database replication or clustering.
Databases such as PostgreSQL require additional work. There are a few options:
Use the host's local directory. You will need to create one service for every replica and use a constraint to schedule the container on one specific host. You will also need a custom PostgreSQL Docker image to set up PostgreSQL replication among the replicas. However, when a node goes down, the PostgreSQL replica on it goes down too, and you will have to bring up another replica yourself. See crunchydata's example.
Use a volume plugin, such as Flocker or REX-Ray. You will still need to create one service for every replica and bind one volume to each service. You need to create all services in the same overlay network and configure the PostgreSQL replicas to talk to each other via DNS name (the Docker service name of the replica). You will still need to set up PostgreSQL replication among the replicas.
Use FireCamp. FireCamp is an open source project that simplifies the setup and management of stateful services, including PostgreSQL, on Docker Swarm. It does all the manual work for you, so you can set up a PostgreSQL cluster with one command.
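As a rough illustration of the volume plugin option, the sketch below builds (but does not run) the commands for one overlay network and one service per replica, each with its own volume, and shows how a replica could address the primary by its service name. The network name `pgnet`, the service names, the `replicator` user, the `postgres:16` image tag, and the volume driver name `rexray` are all assumptions; use whatever name your volume plugin actually registers. The PostgreSQL replication setup itself is not covered here.

```python
import shlex

NETWORK = "pgnet"         # hypothetical overlay network shared by all replicas
VOLUME_DRIVER = "rexray"  # placeholder: the name your volume plugin registers
PG_DATA_DIR = "/var/lib/postgresql/data"  # data dir of the assumed postgres:16 image


def network_cmd():
    """Command that creates the overlay network the services attach to."""
    return ["docker", "network", "create", "--driver", "overlay", NETWORK]


def replica_service_cmd(name, image="postgres:16"):
    """One service per replica, bound to its own plugin-backed volume."""
    mount = (
        f"type=volume,source={name}-data,target={PG_DATA_DIR},"
        f"volume-driver={VOLUME_DRIVER}"
    )
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", "1",
        "--network", NETWORK,
        "--mount", mount,
        image,
    ]


def primary_conninfo(primary_service, user="replicator"):
    """libpq-style connection string that uses the service name as the host."""
    return f"host={primary_service} port=5432 user={user}"


if __name__ == "__main__":
    print(shlex.join(network_cmd()))
    for name in ("pg-primary", "pg-replica-1"):
        print(shlex.join(replica_service_cmd(name)))
    print(primary_conninfo("pg-primary"))
```

Because every service sits on the same overlay network, the service name resolves through Swarm's built-in DNS, so the replicas do not need to know which node the primary is running on.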