
Changing from NFS volume to local bind (GlusterFS) has no effect

I have a problem with services that used an NFS server as the backing store for a volume. The compose file had a section like this:

  xwiki-data:
    driver: local
    driver_opts:
      type: "nfs4"
      o: addr=123.45.67.89,nolock,soft,rw
      device: ":/raid/nfs/swarm/ecap-wiki/xwiki"

but I have issues with the NFS server and wanted to switch to a local bind mount (GlusterFS, which is mirrored to all nodes). So I changed the volume section to:

  xwiki-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: "/data/glusterfs/swarm/ecap-wiki/xwiki"

copied over all the data with rsync -avz ... and then ran

docker stack rm SERVICENAME
docker deploy -c config.yml SERVICENAME

The service is now constantly crashing due to a “connection refused”, because it is trying to access the old NFS share. I don’t understand what’s going on. I also deleted the Docker volume and redeployed a few times but the error always says that it’s still accessing the NFS:

# docker service ps --no-trunc ecap-wiki_web
ID                          NAME                  IMAGE                                                                                            NODE                DESIRED STATE       CURRENT STATE                     ERROR                                                                                                                                                                                   PORTS
yz7s0dr6qtyzk312o7wm9ehft   ecap-wiki_web.1       xwiki:lts-mysql-tomcat@sha256:91b56f7635dc9031cfc777f8bbccc955b3c38b575cc1c30e27b19c802441ffe7   pi1080              Running             Starting less than a second ago
6xowo1k9meyflfilp6bym7ft4    \_ ecap-wiki_web.1   xwiki:lts-mysql-tomcat@sha256:91b56f7635dc9031cfc777f8bbccc955b3c38b575cc1c30e27b19c802441ffe7   pi1085              Shutdown            Rejected 10 seconds ago           "failed to mount local volume: mount :/raid/nfs/swarm/ecap-wiki/xwiki:/var/lib/docker/volumes/ecap-wiki_xwiki-data/_data, data: addr=123.45.67.89,nolock,soft: connection refused"
vbh3hs3uf02py1syvnf8joj2h    \_ ecap-wiki_web.1   xwiki:lts-mysql-tomcat@sha256:91b56f7635dc9031cfc777f8bbccc955b3c38b575cc1c30e27b19c802441ffe7   pi1085              Shutdown            Rejected 15 seconds ago           "failed to mount local volume: mount :/raid/nfs/swarm/ecap-wiki/xwiki:/var/lib/docker/volumes/ecap-wiki_xwiki-data/_data, data: addr=123.45.67.89,nolock,soft: connection refused"
tpjdx584q63i7qeoj40d8m2au    \_ ecap-wiki_web.1   xwiki:lts-mysql-tomcat@sha256:91b56f7635dc9031cfc777f8bbccc955b3c38b575cc1c30e27b19c802441ffe7   pi1085              Shutdown            Rejected 20 seconds ago           "failed to mount local volume: mount :/raid/nfs/swarm/ecap-wiki/xwiki:/var/lib/docker/volumes/ecap-wiki_xwiki-data/_data, data: addr=123.45.67.89,nolock,soft: connection refused"
p9iy3zr13hiut0ub2mnginl0t    \_ ecap-wiki_web.1   xwiki:lts-mysql-tomcat@sha256:91b56f7635dc9031cfc777f8bbccc955b3c38b575cc1c30e27b19c802441ffe7   pi1085              Shutdown            Rejected 25 seconds ago           "failed to mount local volume: mount :/raid/nfs/swarm/ecap-wiki/xwiki:/var/lib/docker/volumes/ecap-wiki_xwiki-data/_data, data: addr=123.45.67.89,nolock,soft: connection refused"

Any ideas where this configuration might be cached, or what's going on here? I have used this docker stack rm ... and docker deploy ... combination many times in the past when changing the service configuration and it always worked. I thought that docker stack rm would remove everything…

EDIT: By the way, the only workaround so far is to choose a different name when deploying the stack, but of course that is not a solution.

It seems that the volume configuration is stored somewhere hidden inside the Docker black hole and cannot be overwritten by the stack YAML file.

The volume configuration is immutable, so changes to it won't be reflected in an existing volume.

Remove the stack, delete the volume on each node (as the local driver is not swarm-scoped), and let docker stack deploy create it again with the new configuration.
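
For reference, a minimal sketch of that procedure using the names from this thread (stack ecap-wiki, volume xwiki-data, compose file config.yml); adjust to your setup:

docker stack rm ecap-wiki                      # on a manager node; wait until all tasks are gone

# on EVERY node that ever ran a task of the service,
# because the local driver stores its options per node:
docker volume rm ecap-wiki_xwiki-data

docker stack deploy -c config.yml ecap-wiki    # on a manager node, with the updated volume section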

This does not work. I did exactly that, as written above: removed the stack, verified that all services are gone, deleted all volumes on each node manually, and when I redeploy the new configuration (with only the volume part changed) it still tries to mount the old one.

The only way around it is to rename the stack, but that cannot be the solution.

I can reproduce the issue very simply.

Something really is off. A volume is merely a configuration handle. If it's deleted, a new one with a different configuration can be created with the same name.

Though, like I wrote: the volume needs to be removed manually on each node separately (docker volume rm ${stackname}_xwiki-data).
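
To illustrate the "configuration handle" point, a rough sketch with a throwaway volume name (test-vol is purely a placeholder): the options are recorded when the volume is created, and once it is removed the same name can be reused with completely different options.

# create a local volume with the NFS options, then inspect it
docker volume create --driver local \
  --opt type=nfs4 --opt o=addr=123.45.67.89,nolock,soft,rw \
  --opt device=:/raid/nfs/swarm/ecap-wiki/xwiki test-vol
docker volume inspect test-vol        # shows the NFS options

docker volume rm test-vol             # drops the handle; the remote data is untouched

# recreate the same name as a bind mount and inspect again
docker volume create --driver local \
  --opt type=none --opt o=bind \
  --opt device=/data/glusterfs/swarm/ecap-wiki/xwiki test-vol
docker volume inspect test-vol        # now shows the bind options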

Yes, this is what I expected too, but the old configuration is sitting around somewhere. :confused: I will provide an MWE and open an issue.

This might be a long shot, but recently another member of this forum experienced the unexplainable as well: it was caused by a parallel installation of Docker's docker-ce package and the snap docker package.

Btw, the metadata of the volume is located in /var/lib/docker/volumes/ecap-wiki_xwiki-data/opts.json on each node where a container using it was running.
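
A quick way to check for leftovers on each node might look like this (assuming the default data root /var/lib/docker and root access):

docker volume inspect ecap-wiki_xwiki-data --format '{{json .Options}}'   # what the engine thinks the options are
cat /var/lib/docker/volumes/ecap-wiki_xwiki-data/opts.json                # on-disk metadata of the local driver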

In my case it’s a regular Debian bare-metal installation from the official Docker apt repository.

I first thought it might be related to the NFS server that died, causing some weird state where the configuration could not be removed. But then it happened again with a GlusterFS volume, which I used for a couple of hours as a temporary solution. Now I have a new NFS server set up and needed to rename the stack for a second time. The old configurations are apparently all still there if I choose the old names, even though the volumes have all been deleted.

This means that if I deploy the stack with the newest configuration, I get an NFS error when I pick the original stack name, a GlusterFS error when I choose the second name, and the third name works; all three deploys use the very same configuration file.

Only the latest volumes, which are bound to the new NFS server, show up in docker volume ls.

Something is way off! Docker shouldn't retain any configuration of a removed volume. Of course, removing a volume only removes the handle and leaves the data untouched on the remote share or in the bind source folder.

There is clearly something wrong. When we were still using Swarm, I was fed up with volume changes requiring the volumes to be deleted.

It sounds like you switched to something else? K8s?

Three or four years ago our company's customers seem to have silently agreed that their container strategies are solely based on Kubernetes. Lately I have been busy crafting cloud-based solutions and IaC/build, delivery, and deployment chains.

If your storage demands do not require much capacity, I can highly recommend Portworx with Swarm. The developer edition (px-developer) is free and, if I remember correctly, supports 1 TB of storage. It basically turns each node of your cluster into a storage node and handles replicas under the hood for you. The Docker plugin is solid and the installation is a matter of minutes. It might resemble GlusterFS, but instead of being a storage cluster, Portworx is a cluster of storages :slight_smile: The fun part is that it allows different storage types (hd/ssd/nvme) and different profiles (general, db, and others I don't remember), and creates one-to-one block devices for volumes or one-to-many NFS-backed block devices for volumes.

Ah ok, I see. Well, we are currently moving from an old setup with OpenVZ/Xen to a more modern approach. I decided to go with Docker Swarm; there will be roughly 12 nodes in the first configuration (early next year) and two big storage servers, around 100 TB each. I planned to run GlusterFS across the 12 nodes, since each of them will have at least 2 to 6 drives (RAID1 and RAID6), but everything is still in the planning phase.

I am more concerned about the orchestration. I considered going with Kubernetes but it's much more work. Docker Swarm is quite lightweight and I can easily teach others to help me maintain it. So far, however, I still have trouble in many areas, especially with volume configuration and with the firewall. The latter is quite annoying since Debian Buster switched to nftables and I am reluctant to switch to legacy iptables mode for a system that will be built in 2022. Docker still requires iptables and it's really messy to get both to work together in a transparent way.

Well, enough said, that’s how it is :laughing:

Indeed, the learning curves are very different. I always compare Swarm to learning how to drive a car, while Kubernetes is more like learning how to fly different types of planes at once. Plain Docker would be somewhere around learning how to ride a bike ^^

I see, Portworx is really not suited for your scenario.

Kubernetes is flexible and allows low-level stuff that Swarm doesn't: mTLS without modifying your application? No problem, put a sidecar container into the pod. A privileged container for some phony initialization task, while running the main application with fewer privileges? Run one or more initContainers before the main container and its sidecars.

Before committing yourself to Swarm, I highly recommend checking this GitHub epic; it lists the features that Swarm services have, which are a subset of what plain Docker containers are able to do.

Swarm really is easy to use and works marvelously for traditional information systems. But as soon as you need kernel parameter tweaking (as is the case for Elasticsearch), you end up doing things on the host shell again.

Yes, many thanks for the feedback! I know most of the benefits of Kubernetes; it's just that our use case is in academia and I don't know how long I will stay there. I could commit a significant amount of my time to a full-fledged K8s setup, but as I said, if I leave, we need to find another PhD physicist to maintain this stuff (for free, as usual :sweat_smile:)

Anyway, it's not too late to reconsider. Currently I am running two Swarm nodes and four services, but I need to migrate around 20 others from the older system, which I have maintained over the past decade.

I’ll check out the link, thanks! I have not used k8s in real production yet.