Hi everybody,
Need some help with NFS storage.
I am seeing strange behavior with NFS storage in swarm services: data is not consistent between replicas running on different nodes.
As a control test, the first part below exercises the NFS volume with plain containers.
I am using the native local volume driver with NFS options (no third-party plugin).
1- Persistent NFS volume with containers
Create the volume with Docker's local driver and NFS options:
[ajn@manager1 ~]$ docker volume create --driver local --opt type=nfs --opt o=addr=192.168.0.146,rw --opt device=:/var/nfs/general native-nfs
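As a sanity check, the recorded options can be read back with docker volume inspect; the Options map should show exactly the type, o and device values passed above:
[ajn@manager1 ~]$ docker volume inspect --format '{{ .Options }}' native-nfs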
Run a container that mounts the NFS volume at /data:
[ajn@manager1 ~]$ docker run --rm -it -v native-nfs:/data alpine
/ # ls /data
fromserver
/ #
It works: the file “fromserver”, created beforehand on the NFS server, is visible from the container (OK).
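For completeness, that is the same file seeded on the server side (the server hostname below is just illustrative; the export path is the one from the volume definition):
[ajn@nfsserver ~]$ ls /var/nfs/general
fromserver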
Let’s create a file “fromcontainer1”:
/ # touch /data/fromcontainer1
/ # exit
Now let’s start a second container and check the data written by the first:
[ajn@manager1 ~]$ docker run --rm -it -v native-nfs:/data alpine
/ # ls /data
fromcontainer1 fromserver
Data created from the first container is visible from the second container (OK).
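One more control before moving to swarm: from inside a container, /proc/mounts should list /data with filesystem type nfs; a silent fallback to a local directory would show the host filesystem type instead:
[ajn@manager1 ~]$ docker run --rm -v native-nfs:/data alpine grep /data /proc/mounts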
2- Persistent NFS volume with swarm service
Now, let’s test NFS storage with services.
We keep the same volume created in part 1:
[ajn@manager1 ~]$ docker volume ls
DRIVER VOLUME NAME
local native-nfs
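Note that docker volume ls is node-local: it only lists volumes known to the node where it runs. Running the same listing on worker1 and worker3 would reveal whether those nodes already carry an older volume under the same name:
[root@worker1 ~]# docker volume ls
[ajn@worker3 ~]$ sudo docker volume ls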
Launch the service:
[ajn@manager1 ~]$ docker service create --name serviceweb1 -p 80:80 --mount source=native-nfs,target=/data --detach=false --replicas 3 nginx
x84mzibymv9ju4nx4eicq83hh
overall progress: 3 out of 3 tasks
1/3: running
2/3: running
3/3: running
verify: Service converged
[ajn@manager1 ~]$ docker service ps serviceweb1
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jjnlx1mceoem serviceweb1.1 nginx:latest worker3.ajnouri.com Running Running less than a second ago
lguwb24jrnuw serviceweb1.2 nginx:latest manager1.ajnouri.com Running Running 40 seconds ago
ku833ahrgw9l serviceweb1.3 nginx:latest worker1.ajnouri.com Running Running 39 seconds ago
Let’s access the “serviceweb1.2” container from the manager1 node:
[ajn@manager1 ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b89267982ae2 nginx:latest "nginx -g 'daemon of…" 2 minutes ago Up 2 minutes 80/tcp serviceweb1.2.lguwb24jrnuw3h6ahtkw7wta9
Let’s check the persistent storage from serviceweb1.2:
[ajn@manager1 ~]$ docker exec b89267982ae2 ls /data
fromcontainer1
fromcontainer2
fromserver
So far so good: the replica on the manager connects to the NFS storage (OK).
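A stronger check than listing files is the replica’s own mount table; the /data entry should carry the nfs filesystem type and point at the server:
[ajn@manager1 ~]$ docker exec b89267982ae2 grep /data /proc/mounts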
Now, let’s create data on the persistent storage from serviceweb1.2:
[ajn@manager1 ~]$ docker exec b89267982ae2 touch /data/serviceweb1.2
Now let’s connect to another replica, on node “worker1”:
[root@worker1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a7542ddb01c3 nginx:latest "nginx -g 'daemon of…" 8 minutes ago Up 8 minutes 80/tcp serviceweb1.3.ku833ahrgw9lt1ssvy3ufdlmo
[root@worker1 ~]# docker exec a7542ddb01c3 ls /data
serviceweb1.3
!?!?! Strange! The file just written from serviceweb1.2 is missing, and “serviceweb1.3” is a file I had created from a replica of a prior service.
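One explanation would be that worker1 has its own local volume named native-nfs, left over from earlier tests (Docker also auto-creates a plain local volume when a service mount references a name unknown to a node). An empty Options map on worker1 would confirm that:
[root@worker1 ~]# docker volume inspect --format '{{ .Options }}' native-nfs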
From the third replica, on node “worker3”:
[root@worker3 ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6bd96c2ddcca nginx:latest "nginx -g 'daemon of…" 12 minutes ago Up 12 minutes 80/tcp serviceweb1.1.jjnlx1mceoempu2ayapq6bhpl
[ajn@worker3 ~]$ sudo docker exec 6bd96c2ddcca ls /data
service-ctn1-worker1
service-ctn2-worker1
These are files created from replicas of a prior service that has since been removed. So this replica is not connected to the same storage as the one on “manager1” (NOK).
Let’s create a file from here:
[ajn@worker3 ~]$ sudo docker exec 6bd96c2ddcca touch /data/serviceweb1.1
[ajn@worker3 ~]$ sudo docker exec 6bd96c2ddcca ls /data
service-ctn1-worker1
service-ctn2-worker1
serviceweb1.1
Now, back on the “manager1” node, let’s look at the volume from the replica there:
[ajn@manager1 ~]$ docker exec b89267982ae2 ls /data
fromcontainer1
fromcontainer2
fromserver
serviceweb1.2
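So the file created on worker3 never shows up on manager1 either: each node appears to be using its own, unrelated volume that merely shares the name. My suspicion is that docker volume create only registers the NFS definition on the node where it runs. If that is right, embedding the NFS options directly in the service mount, so that every node creates the volume identically, should fix it. This is what I plan to try next, after removing the stale native-nfs volume from each node (a sketch only, same addresses and paths as above):
[ajn@manager1 ~]$ docker service create --name serviceweb2 --replicas 3 \
    --mount 'type=volume,source=native-nfs,target=/data,volume-driver=local,volume-opt=type=nfs,"volume-opt=o=addr=192.168.0.146,rw",volume-opt=device=:/var/nfs/general' \
    nginx
Has anyone run into this, or can confirm that per-node volume creation is what is happening here?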