I want to deploy a Cassandra and HDFS cluster in Swarm mode. As far as I know, I can’t just use a local Docker volume to persist data: if the container is killed and restarted on another host, it will lose access to the old data.
After some research, I found that one possible approach is to use Flocker to migrate the data volume to another host. However, I am not sure whether Flocker is ready for production. Will it degrade the performance of Cassandra and HDFS?
If I use Flocker with EBS, does the data migration still work if the EC2 instance dies or is terminated?
My solution for persistent data that is local to the server is to not run a multi-node swarm. Rather, each node is a single-node swarm, and services/containers are deployed from a docker-compose v3 file using the new docker deploy feature (I’m working on the transition to Docker 1.13 now; the current deployment is done with systemd services).
Persistent database data is replicated using MySQL Group Replication (5.7.17) across 3 single-node swarms to provide high availability in case one of the servers crashes or reboots unexpectedly.
I deploy an HAProxy service on each node for HTTPS (in front of the web servers listening on the public network) and for TCP (in front of the MySQL servers listening on a private network). These proxies handle traffic between the internet and my servers and between the single-node swarms.
So, basically, the only swarm mode features that I really use are services and, soon, the new secrets capability. The servers themselves are managed as separate nodes, with network connectivity between nodes provided by the host network interfaces (public and private physical networks).
I also have Dovecot mail persistent data set up similarly to the MySQL persistent data, except that it uses a Dovecot IMAP proxy and Postfix SMTP on groups of 3 servers (with the persistent data replicated by Dovecot between the 3 servers, same as MySQL). I am also looking to upgrade the MySQL proxies to use ProxySQL instead of HAProxy for high availability of the databases.
@ktwalrus For what it’s worth you don’t need to run each node as a separate Swarm master to stick containers to a particular host. It’d probably be easier to use constraints (https://docs.docker.com/engine/reference/commandline/service_create/#/specify-service-constraints---constraint) to “stick” them to a particular host (either using pre-determined labels or node ID / hostname directly). If the node failed you would not be able to reschedule them, but you have that problem already anyway.
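As a rough illustration of the constraint approach, here is a minimal Python sketch that only builds the `docker service create` command line for a service pinned to one node by hostname. The service name `mysql-1`, the image tag, and the hostname `db-host-1` are hypothetical placeholders, not taken from the setup described above; `--constraint` with `node.hostname==...` is the documented constraint syntax.

```python
import shlex


def pinned_service_cmd(name, image, hostname):
    """Build a `docker service create` command pinned to one node by hostname."""
    return [
        "docker", "service", "create",
        "--name", name,
        # Swarm only schedules the task on the node whose hostname matches.
        "--constraint", f"node.hostname=={hostname}",
        image,
    ]


# Hypothetical names, for illustration only.
cmd = pinned_service_cmd("mysql-1", "mysql:5.7.17", "db-host-1")
print(shlex.join(cmd))
```

The same helper would work with a node label instead of a hostname by swapping the constraint expression for something like `node.labels.<key>==<value>`. The resulting list can be passed to `subprocess.run` on a manager node.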
Using a constraint works, but the container is then stuck to one node, and you will have to handle node failure yourself.
A volume plugin, such as Flocker or REX-Ray, would help. You will need to create one service for every Cassandra member. For example, if you want a 3-node Cassandra cluster, you need to create 3 swarm services with 3 volumes, such as cas-0, cas-1, cas-2. All 3 swarm services need to connect to the same overlay network, such as cas-network. Then you need to configure Cassandra correctly.
When the EC2 instance dies, the container will move to another node, and the volume plugin will move the volume along with it. The swarm overlay network moves the service DNS name, such as cas-0, with the container.
If you want the Cassandra nodes to run in different availability zones, you will still need to use a constraint to tell swarm to schedule each service in one availability zone only.
The application has to attach to the same overlay network to access Cassandra, or a proxy needs to be set up.
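The per-member layout described above could be sketched roughly as follows. This Python snippet only generates the Docker CLI commands; it does not run them. The names cas-0, cas-1, cas-2 and cas-network come from the post. Everything else is an assumption for illustration: the volume driver name `rexray`, the image tag `cassandra:3.11`, the zone names, and a node label called `zone`, which you would have to set yourself with `docker node update --label-add zone=<az> <node>`. Cassandra's own configuration (seeds and so on) is not covered here.

```python
import shlex

NETWORK = "cas-network"          # overlay network name used in the post
VOLUME_DRIVER = "rexray"         # assumption: name the volume plugin is registered under
IMAGE = "cassandra:3.11"         # assumption: any Cassandra image would do
DATA_DIR = "/var/lib/cassandra"  # data directory of the official Cassandra image


def network_cmd(network=NETWORK):
    """Command that creates the overlay network shared by all members."""
    return ["docker", "network", "create", "--driver", "overlay", network]


def member_cmd(index, zone=None):
    """Command for one Cassandra member: one service, one named volume."""
    name = f"cas-{index}"
    mount = (f"type=volume,source={name},target={DATA_DIR},"
             f"volume-driver={VOLUME_DRIVER}")
    cmd = ["docker", "service", "create",
           "--name", name,
           "--replicas", "1",
           "--network", NETWORK,
           "--mount", mount]
    if zone is not None:
        # Hypothetical node label `zone`; keeps this member in a single AZ.
        cmd += ["--constraint", f"node.labels.zone=={zone}"]
    cmd.append(IMAGE)
    return cmd


# Hypothetical availability zones, one per member.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
commands = [network_cmd()] + [member_cmd(i, z) for i, z in enumerate(zones)]
for c in commands:
    print(shlex.join(c))
```

Because each member is its own service with its own volume name, swarm can reschedule cas-1 independently of the others, and the volume plugin is expected to reattach the cas-1 volume wherever the task lands.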
Another option is to use FireCamp. FireCamp has a built-in volume plugin, and it does more than a simple volume plugin: it automates the Cassandra configuration for you. Every Cassandra member gets a unique DNS name, such as cas-0.clustername-firecamp.com. An application in the same AWS VPC can access Cassandra via this DNS name.
When the EC2 instance dies, the container will move to another node along with its volume, and the DNS name will be updated automatically to point to the new node.