Syncing files between multiple hosts: NFS, GlusterFS or something else?

Before Docker we normally had an NFS server on a separate host (or hosts) and mounted it on the nginx and app hosts, so that nginx instances could serve the static files created by the web app, and app worker instances could process user uploads or download data files.

Now that we are migrating to Docker, we would like to avoid installing an NFS server/client on the host machines (i.e. outside containers), but we still need a way to sync/mount static files and user uploads between the nginx and app servers.

For example, assume we have 2 nginx hosts (each running an nginx container), 3 web app hosts, and 2 app worker hosts. We need a way to share the web application's static files (synced on each deploy) and user uploads between the nginx hosts and the workers. How would you do this in 2017?

Do you run NFS on the host machines, with container volumes mapped to the mounted folders? Has anyone tried GlusterFS? Are there other options that don't involve third-party storage (e.g. S3)?

I’m thinking of running Ceph in Docker and using its S3-compatible Object Storage API to store my static files. The API is HTTP-based, so it is very easy to use. I then plan to deploy nginx with a large local proxy cache backed by the Ceph object store.
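
If it helps, here is a minimal sketch of what talking to Ceph's S3-compatible gateway (radosgw) looks like from Python with boto3. The endpoint URL, credentials, and bucket name are placeholders for whatever your own deployment uses:

```python
import boto3

# Minimal sketch: talk to Ceph's S3-compatible gateway (radosgw) via boto3.
# The endpoint, credentials, and bucket below are hypothetical -- substitute
# the values from your own Ceph deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.internal:7480",  # hypothetical radosgw endpoint
    aws_access_key_id="CEPH_ACCESS_KEY",
    aws_secret_access_key="CEPH_SECRET_KEY",
)

# Upload a static asset on deploy...
s3.upload_file("dist/css/app.css", "static-assets", "css/app.css")

# ...and fetch it back, e.g. from a worker or a cache-warming script.
obj = s3.get_object(Bucket="static-assets", Key="css/app.css")
print(obj["Body"].read()[:100])
```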

The other alternative I have considered is mounting a shared filesystem implemented with GlusterFS. GlusterFS is a bit easier to set up and manage than Ceph, and a shared filesystem makes storing the static files a little easier: they can be accessed like any other local files on the server (while still providing high availability if you use the GlusterFS client to access the filesystem), as the sketch below shows.
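
To illustrate the "just local files" point: once a GlusterFS volume is mounted, application code needs no special client library at all. A rough sketch, where the /mnt/gluster mount point is a made-up example:

```python
from pathlib import Path

# Uploads are plain file I/O against the mounted GlusterFS volume; every
# host that mounts the same volume sees the same files.
UPLOADS = Path("/mnt/gluster/uploads")  # hypothetical mount point

def save_upload(filename: str, data: bytes) -> Path:
    """Write a user upload to the shared volume."""
    dest = UPLOADS / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest  # nginx on another host can serve this path immediately

save_upload("avatars/user42.png", b"\x89PNG...")  # placeholder payload
```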

But, in my case, I want to be able to eventually scale to a petabyte or more of files. Cloud storage is very cheap ($0.01/GB at my cloud provider) and uses the same Object Storage API that Ceph does, so I can expand to use Ceph as a cache for serving static files that are permanently stored in cloud storage. I would then have two levels of caching (nginx and Ceph), with cloud storage containing all the static files. If Ceph goes down, cloud storage would still be available to directly handle cache misses in the nginx proxy cache. If both Ceph and cloud storage go down, nginx could still serve many requests from its proxy cache, so perhaps only a few users would be affected by the outage.
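
The read-through behaviour between the two storage tiers would look roughly like this sketch (again with placeholder endpoints, keys, and bucket names); nginx's proxy cache sits in front of all of it:

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical clients for the two tiers: Ceph as the cache, cloud storage
# as the permanent copy. All endpoints and credentials are placeholders.
ceph = boto3.client("s3", endpoint_url="http://ceph-rgw.internal:7480",
                    aws_access_key_id="CEPH_KEY", aws_secret_access_key="CEPH_SECRET")
cloud = boto3.client("s3", endpoint_url="https://storage.example-cloud.com",
                     aws_access_key_id="CLOUD_KEY", aws_secret_access_key="CLOUD_SECRET")

BUCKET = "static-assets"

def fetch(key: str) -> bytes:
    try:
        # Tier 1 (behind nginx's proxy cache): the Ceph object store.
        return ceph.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    except ClientError as err:
        if err.response["Error"]["Code"] not in ("NoSuchKey", "404"):
            raise
        # Tier 2: cloud storage holds the permanent copy.
        data = cloud.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        # Warm the Ceph cache so the next request is served locally.
        ceph.put_object(Bucket=BUCKET, Key=key, Body=data)
        return data
```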

What’s wrong with NFS? It is high-performance, stable, and easy to both use and manage. That is what I use in production (NFS mounted from a NetApp filer).

In my lab, I currently use Gluster, and it has been good with one notable exception: a MongoDB container with its storage on Gluster was a disaster. Mongo would not die, but my apps were constantly losing their connection to it, and Mongo would not tell me why.

I haven’t tried other DBs yet, but my gut feeling is that the likes of Postgres and MariaDB would not fare much better.

Having said all that, I am of the opinion that one should not really put DBs in Docker anyway. Treat them as services to be consumed by the apps in your containers.

TL;DR: NFS is a perfectly good solution; Gluster is good if you are sensible, but tread carefully (test, test, test).

The Netshare and NetApp Docker volume plugins both support NFS, though you would still need to run the NFS server itself on a separate host machine.
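
For what it's worth, Docker's built-in local volume driver can also mount NFS exports directly, which gets you a similar effect to those plugins. A rough sketch using the Python Docker SDK, where the server address and export path are placeholders for your environment:

```python
import docker

client = docker.from_env()

# Create a named volume backed by an NFS export via the built-in "local"
# driver. The NFS server address and export path are hypothetical.
volume = client.volumes.create(
    name="static-files",
    driver="local",
    driver_opts={
        "type": "nfs",
        "o": "addr=10.0.0.10,rw",      # hypothetical NFS server
        "device": ":/exports/static",  # hypothetical export path
    },
)

# Mount the NFS-backed volume read-only into an nginx container.
client.containers.run(
    "nginx:stable",
    detach=True,
    volumes={"static-files": {"bind": "/usr/share/nginx/html", "mode": "ro"}},
)
```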

GlusterFS and Ceph are popular clustered filesystems, and both are widely used. http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853 is a good article comparing the two. You could test both to see which performs better for your application.