Docker Community Forums

Persistent data across swarm

Hi,

Sharing named volumes across hosts seems like a necessity for any production use of Docker Swarm Mode on a real-world application (where data storage is required). Without this, if one node goes down I might lose access to all of my data, which rather defeats one of the main purposes of horizontal scaling - to remove/reduce single points of failure. And yet this isn’t supported out-of-the-box and all the plugins I can find for this purpose are archived on GitHub or haven’t been touched in years, which makes me hesitate to use any of them as I don’t want to risk future security issues due to unsupported plugins.

For example, the plugins I’ve found the most references to in articles and tutorials are Flocker and GlusterFS. However, Flocker seems to have lost all support since ClusterHQ went out of business, and the GlusterFS repository linked from the docs is marked as archived.

The fact that people do use Docker swarms in production tells me I might be missing something. I, therefore, have a few questions:

  1. Are these volume plugins safe to use despite the lack of updates? (I would imagine not but if I’m right then I’m surprised they’re still referenced in the official Docker documentation)

  2. Does anyone know of a reliable, currently supported plugin for sharing named volumes across swarm nodes that doesn’t depend on external services such as NFS mounts?

  3. What other solutions do people use for data storage in production? Are you simply running databases and file stores completely outside of your container-based infrastructure?

David,
This is a well-posed question and I am wondering if you ever came to any conclusions?
Thank You,
Larry

You have to choose between:
– Remote file share (CIFS/NFS)
– Storage cluster (Ceph/GlusterFS)
– Container native storage (Portworx, StorageOS)
– Provider-specific storage (AWS/Azure/Storage Vendor)

Remote file shares are easy to use and work with the local driver. Everything else needs an installed volume plugin, plus either the installation of kernel drivers or the setup of the storage cluster itself. If the bandwidth, latency and IOPS of the NFS/CIFS share are sufficient, there is nothing wrong with NFS or CIFS (though NFSv4 is recommended over CIFS).
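To illustrate the local-driver approach, here is a minimal sketch of a stack file that mounts an NFSv4 export as a named volume. The server address `10.0.0.10`, the export path `/exports/app_data`, the `nginx:alpine` image and the names `web` and `app_data` are placeholders I made up for the example, so adjust them to your setup. The mount options are passed straight to the kernel NFS client, and the right set depends on your server.

```yaml
# Hypothetical stack file: every value below is a placeholder, not a recommendation.
services:
  web:
    image: nginx:alpine
    volumes:
      # Mount the named volume into the container.
      - app_data:/usr/share/nginx/html
    deploy:
      replicas: 2

volumes:
  app_data:
    # The built-in local driver can mount NFS; no volume plugin is needed.
    driver: local
    driver_opts:
      type: nfs
      # Assumed NFS server address and mount options.
      o: "addr=10.0.0.10,rw,nfsvers=4"
      # Assumed export path on the NFS server (note the leading colon).
      device: ":/exports/app_data"
```

Each node that runs a task creates its own local volume definition from these options, and all of them point at the same export, so a task rescheduled to another node sees the same data. As far as I know, the options are stored when the volume is first created on a node, so changing them later means removing the old volume on that node first.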

You might want to take a look at Ceph and use the Rex-Ray volume plugin. Portworx looks promising: it consists of kernel drivers and a command-line tool, and it takes care of clustering across the local storage of the nodes, so you get local block-device speeds and still have data replication. If a container dies and respawns on a different node, the replica on that node becomes the master and syncs its changes to the other replicas.

I personally use StorageOS for my development environment, even though it is not supported with Docker Swarm. It was a headache to set up, and it is stable most of the time. I use it with a free developer license, which allows creating a storage cluster with a total of 500 GB of storage.

In production environments, databases are typically operated outside Docker. In test environments, it is not uncommon to run the database in its own Docker stack.