When a persistent volume is mounted by a service on a Swarm worker node, and we modify the Swarm service to mount a different persistent volume, this can cause high I/O wait on the first Swarm worker when we remove the first persistent volume.
Environment
We use Trident + ONTAP Select (iSCSI) to provide persistent volumes for our services on Docker Swarm clusters.
The Trident Docker plugin should clear the multipath links to an unused persistent volume before deleting the volume on the ONTAP backend.
It is not clear to me whether Docker Swarm must call the Trident plugin on each Swarm worker to do this, or whether Docker Swarm only has to call the Trident plugin on the Swarm manager, and the Trident plugin on that manager then has to call the Trident plugins on every Swarm worker node.
- Trident version: 20.10.16
- Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
- Container runtime: Docker version 25.0.3, build 4debf41
- Docker Swarm mode
- OS: Rocky Linux release 9.3 (Blue Onyx)
- NetApp backend types: NetApp Release 9.8P6
To Reproduce
Steps to reproduce the behavior:
start.sh creates a Docker Swarm service with a persistent volume:
Volumes
export SERVICE_TEST_VOLUME=TestVolume1
export SERVICE_TEST_VOLUME_SIZE='1gb'
vol1=$(docker volume inspect $SERVICE_TEST_VOLUME 2>/dev/null | wc -c)
if [ "$vol1" -gt 3 ]
then
echo "$SERVICE_TEST_VOLUME exists"
else
echo "Creating volume $SERVICE_TEST_VOLUME"
docker volume create --driver=netapp --name=$SERVICE_TEST_VOLUME -o size=$SERVICE_TEST_VOLUME_SIZE -o fileSystemType=ext4 -o spaceReserve=volume
docker run --rm -v $SERVICE_TEST_VOLUME:/data busybox rmdir /data/lost+found
fi
docker stack deploy -c docker-compose.yml --resolve-image=always --prune --with-registry-auth SERVICE_TEST
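As an aside, the volume-exists test above can also be written against the exit code of `docker volume inspect` instead of the length of its output; a minimal sketch (the `DOCKER_BIN` indirection and `volume_exists` helper are ours, not part of start.sh — the indirection only exists so the logic can be exercised with a stub binary):

```shell
# Sketch: existence check via the exit code of `docker volume inspect`.
# DOCKER_BIN is a hypothetical override so the logic can be tested
# without a running Docker daemon.
DOCKER_BIN="${DOCKER_BIN:-docker}"
SERVICE_TEST_VOLUME="${SERVICE_TEST_VOLUME:-TestVolume1}"

volume_exists() {
    # `docker volume inspect` exits non-zero when the volume is absent.
    "$DOCKER_BIN" volume inspect "$1" >/dev/null 2>&1
}

if volume_exists "$SERVICE_TEST_VOLUME"; then
    echo "$SERVICE_TEST_VOLUME exists"
else
    echo "Creating volume $SERVICE_TEST_VOLUME"
fi
```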
We deploy this service on our Swarm cluster. The Swarm manager starts the service on worker node A.
[root@nodeA:]# mount |grep testv
/dev/mapper/3600a098056303030313f526b682f4279 on /local/docker-data/plugins/b8fe688a4fd41d4af97f5de3ce33dee1f7f862d89ba982eec79bf5c785b93c9c/propagated-mount/netappdvp_testvolume type ext4 (rw,relatime,stripe=16)
/dev/mapper/3600a098056303030313f526b682f4279 on /local/docker-data/plugins/b8fe688a4fd41d4af97f5de3ce33dee1f7f862d89ba982eec79bf5c785b93c9c/propagated-mount/netappdvp_testvolume type ext4 (rw,relatime,stripe=16)
[root@nodeA:]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:227 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 3:0:0:227 sdb 8:16 active ready running
Then we modify the volume name to TestVolume2 and redeploy the service
export SERVICE_TEST_VOLUME=TestVolume2
The service is stopped on node A.
NetApp Trident creates a new volume, TestVolume2.
The service is started on another Swarm worker node: node B.
On node A we can no longer see TestVolume1 with “mount |grep TestVolume1”,
but there is still some multipath information left on node A:
[root@nodeA:]# mount |grep testv
[root@nodeA:]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:227 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 3:0:0:227 sdb 8:16 active ready running
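For triage, the stale map on node A can be spotted by cross-checking `multipath -ll` against `mount`. A minimal sketch (the `list_unmounted_maps` helper is ours, not part of Trident), assuming WWID-named `/dev/mapper` devices and reading both outputs from capture files so it can be tested offline:

```shell
# Sketch: list multipath maps that are not backing any current mount.
# $1 = file containing `multipath -ll` output
# $2 = file containing `mount` output
list_unmounted_maps() {
    multipath_out="$1"
    mount_out="$2"
    # Map WWIDs are the first field of the topology header lines,
    # e.g. "3600a0980... dm-8 NETAPP,LUN C-Mode".
    awk '/dm-[0-9]+/ { print $1 }' "$multipath_out" | while read -r wwid; do
        grep -q "/dev/mapper/$wwid " "$mount_out" || echo "$wwid"
    done
}
```

On node A after the service moved to node B, this prints `3600a098056303030313f526b682f4279`, the map Trident left behind.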
Then, on one of the Swarm managers, we run “docker volume rm TestVolume1”. On node A the paths turn faulty and I/O wait rises:
[root@nodeA:~]# multipath -ll
3600a098056303030313f526b682f4279 dm-8 NETAPP,LUN C-Mode
size=954M features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 4:0:0:227 sdc 8:32 **failed faulty running**
`-+- policy='service-time 0' prio=0 status=enabled
  `- 3:0:0:227 sdb 8:16 failed faulty running
[root@nodeA:~]# top
top - 18:28:57 up 1 day, 2:02, 2 users, load average: 0.80, 0.30, 0.10
Tasks: 310 total, 1 running, 309 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.3 sy, 0.0 ni, 82.9 id, 16.6 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7656.0 total, 5421.9 free, 1101.3 used, 1402.1 buff/cache
MiB Swap: 6144.0 total, 6144.0 free, 0.0 used. 6554.7 avail Mem
To clear the high I/O wait we have to use the dmsetup command:
[root@nodeA:]# dmsetup -f remove 3600a098056303030313f526b682f4279
[root@nodeA:]# multipath -ll
[root@nodeA:~]# top
top - 18:29:50 up 1 day, 2:03, 2 users, load average: 0.97, 0.43, 0.16
Tasks: 306 total, 1 running, 305 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 1.9 sy, 0.0 ni, 97.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7656.0 total, 5454.4 free, 1070.0 used, 1400.7 buff/cache
MiB Swap: 6144.0 total, 6144.0 free, 0.0 used. 6586.0 avail Mem
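A gentler workaround than forcing the map away with dmsetup may be `multipath -f <wwid>` (which refuses to flush a map that is still open) followed by deleting the orphaned SCSI path devices via sysfs. Since those commands need a live multipath setup, the sketch below only prints the sequence (dry run); `cleanup_cmds` is our own helper name, and the WWID and sdb/sdc device names come from the output above:

```shell
# Sketch (dry run): print the cleanup commands for a stale multipath map
# so the sequence can be reviewed before running it on a worker node.
# $1 = map WWID, remaining args = orphaned SCSI path devices.
cleanup_cmds() {
    wwid="$1"; shift
    # Flush the unused map; `multipath -f` fails if the map is still open.
    echo "multipath -f $wwid"
    # Then drop the now-orphaned SCSI path devices.
    for dev in "$@"; do
        echo "echo 1 > /sys/block/$dev/device/delete"
    done
}

cleanup_cmds 3600a098056303030313f526b682f4279 sdb sdc
```

Of course this is only a manual workaround; the expectation remains that the Trident plugin performs this cleanup itself before the backend volume is deleted.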