Docker Host with High load on NFS failover

Dear community,

I’m running a 9-node Swarm cluster on Debian 12 hosts.

Some of the services I deployed use an NFS share for persistence.

I mount the NFS share directly through Docker, in the stack file, to avoid any mismatch between OS and container state.

So in my stack file I have volume definitions like this:


volumes:
  grafana-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=nfs.domain.com,rw,nolock,soft,vers=4.0
      device: ":/nfs/docker/grafana-dev" 
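
To see which options the kernel actually applied for a volume like this, the mount table on a worker can be inspected. The following is only a sketch: `nfs_mounts` is a hypothetical helper name, not a Docker command, and it assumes the volume is currently mounted (i.e. a container using it is running on that node).

```shell
# Hypothetical helper: print the mount point and options of every
# NFS mount listed in a mounts file (defaults to /proc/mounts).
# Fields in /proc/mounts: device, mount point, fstype, options, dump, pass.
nfs_mounts() {
    awk '$3 ~ /^nfs4?$/ { print $2, $4 }' "${1:-/proc/mounts}"
}
```

Running `nfs_mounts` on a node prints one line per NFS mount, so the effective `soft`, `vers`, `timeo` and `retrans` values can be compared with what the stack file requests.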

This worked great until we ran a failover test on the network equipment between the Docker hosts and the NFS server. The failover test broke the containers that use NFS, and they did not recover until we rebooted all the failing nodes.

The symptoms were a high load average (around 300) and high latency. Looking at /var/log/messages, I could see:

Aug 18 21:51:11 docker-worker01 kernel: [422211.885112] nfs: server  not responding, timed out
Aug 18 21:51:11 docker-worker01 kernel: [422212.461092] nfs: server  not responding, timed out
Aug 18 21:51:11 docker-worker01 kernel: [422212.521107] nfs: server  not responding, timed out

Has anyone encountered this kind of issue?

Thank you for your help.

What’s the failover? Different server? Different IP?