NFS volume: same service but inconsistent storage between replicas on different nodes

Hi everybody,
Need some help with NFS storage.
I am experiencing strange behavior with NFS storage in swarm services: data is not consistent between replicas running on different nodes.
As a control test, I first deployed the NFS volume with a simple container (part 1 below).
I am using Docker's native NFS support (the local volume driver with NFS options).

1- Persistent NFS volume with containers

Created volume with docker NFS driver
[ajn@manager1 ~]$ docker volume create --driver local --opt type=nfs --opt o=addr=192.168.0.146,rw --opt device=:/var/nfs/general native-nfs
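(Just to confirm the NFS options were recorded by the driver, the volume can be inspected on the node where it was created; the Options field should list the type, addr and device passed above. This is only a verification step, not part of the original test.)

[ajn@manager1 ~]$ docker volume inspect native-nfs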

Run a container that mounts the NFS volume at /data:
[ajn@manager1 ~]$ docker run --rm -it -v native-nfs:/data alpine

/ # ls /data
fromserver
/ #

It works fine: the file “fromserver” was created on the NFS server and it is visible from container1 (OK).
Let’s create a file “fromcontainer1”:

/ # touch /data/fromcontainer1
/ # exit

Now let’s start a second container and check the data written by container1:
[ajn@manager1 ~]$ docker run --rm -it -v native-nfs:/data alpine

/ # ls /data
fromcontainer1 fromserver

Data created from the first container is visible from the second container (OK)


2- Persistent NFS volume with swarm service

Now, let’s test NFS storage with services.
We keep the same volume created in part1:

[ajn@manager1 ~]$ docker volume ls

DRIVER VOLUME NAME
local native-nfs
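(One thing not checked here: whether the same named volume also exists on the worker nodes, and with which options. Running an inspect on each worker, for example as below on worker1, should show whether the volume there is really NFS-backed or just a plain local volume.)

[root@worker1 ~]# docker volume inspect native-nfs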

Launched the service
[ajn@manager1 ~]$ docker service create --name serviceweb1 -p 80:80 --mount source=native-nfs,target=/data --detach=false --replicas 3 nginx

x84mzibymv9ju4nx4eicq83hh
overall progress: 3 out of 3 tasks
1/3: running
2/3: running
3/3: running
verify: Service converged

[ajn@manager1 ~]$ docker service ps serviceweb1

ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jjnlx1mceoem serviceweb1.1 nginx:latest worker3.ajnouri.com Running Running less than a second ago
lguwb24jrnuw serviceweb1.2 nginx:latest manager1.ajnouri.com Running Running 40 seconds ago
ku833ahrgw9l serviceweb1.3 nginx:latest worker1.ajnouri.com Running Running 39 seconds ago

Let’s access the “serviceweb1.2” container from the manager1 node:
[ajn@manager1 ~]$ docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b89267982ae2 nginx:latest "nginx -g 'daemon of…" 2 minutes ago Up 2 minutes 80/tcp serviceweb1.2.lguwb24jrnuw3h6ahtkw7wta9

Let’s check the persistent storage from serviceweb1.2:

[ajn@manager1 ~]$ docker exec b89267982ae2 ls /data

fromcontainer1
fromcontainer2
fromserver

So far so good, the replica on the manager connects to NFS storage (OK)

Now, let’s create data on the persistent storage from serviceweb1.2:
[ajn@manager1 ~]$ docker exec b89267982ae2 touch /data/serviceweb1.2

Now let’s connect to another replica on another node “worker1”

[root@worker1 ~]# docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a7542ddb01c3 nginx:latest "nginx -g 'daemon of…" 8 minutes ago Up 8 minutes 80/tcp serviceweb1.3.ku833ahrgw9lt1ssvy3ufdlmo

[root@worker1 ~]# docker exec a7542ddb01c3 ls /data

serviceweb1.3

!?!? Strange, because “serviceweb1.3” is a file I had created within a replica of a prior service.

From the third replica on node “worker3”
[root@worker3 ~]$ docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6bd96c2ddcca nginx:latest "nginx -g 'daemon of…" 12 minutes ago Up 12 minutes 80/tcp serviceweb1.1.jjnlx1mceoempu2ayapq6bhpl

[ajn@worker3 ~]$ sudo docker exec 6bd96c2ddcca ls /data

service-ctn1-worker1
service-ctn2-worker1

These are files created by replicas of a prior service that has since been removed. So this replica did not connect to the same storage as the replica on “manager1” (NOK)
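(A way to verify what this replica is actually mounting would be to look at the container's mount entry; the command below uses standard docker inspect templating on the worker3 container from above.)

[ajn@worker3 ~]$ sudo docker inspect -f '{{ json .Mounts }}' 6bd96c2ddcca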

Let’s create a file from here:
[ajn@worker3 ~]$ sudo docker exec 6bd96c2ddcca touch /data/serviceweb1.1
[ajn@worker3 ~]$ sudo docker exec 6bd96c2ddcca ls /data

service-ctn1-worker1
service-ctn2-worker1
serviceweb1.1

Now, back on the “manager1” node, let’s inspect the volume from the replica there:

[ajn@manager1 ~]$ docker exec b89267982ae2 ls /data

fromcontainer1
fromcontainer2
fromserver
serviceweb1.2

Only the container replica on “manager1” seems to connect successfully to the NFS share.
How come the replica on the manager correctly connects to the NFS volume, while the replicas on the other nodes each connect to a different volume with different data?

You may have a chicken-and-egg problem with mounts: if Docker starts before the NFS share is mounted, you may get issues like this.

This is mitigated by using Docker managed volume plugins, which are guaranteed to run BEFORE the containers start on the node. I created an NFS volume plugin which encapsulates the NFS client in its own container (managed plugins are containers as well):

https://hub.docker.com/r/trajano/nfs-volume-plugin/
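Roughly, on each node it would be used like this (this is only a sketch; see the README at the link above for the exact option names, and substitute your own server address and export path):

docker plugin install trajano/nfs-volume-plugin --grant-all-permissions
docker volume create -d trajano/nfs-volume-plugin --opt device=192.168.0.146:/var/nfs/general nfs-volume
docker run --rm -it -v nfs-volume:/data alpine ls /data

Since the plugin has to be installed on every node, each replica ends up mounting the same NFS-backed volume definition.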