Containers with nfs volumes won't start after reboot

Hi all, I’m struggling with startup issues after a reboot of the host for containers that have an NFS volume.

Environment

  • Ubuntu box (22.04.2) that I’m using as a Docker host

  • Docker CLI (was 24.0.1, but I just updated to 24.0.6 and got the same results)

  • Synology NAS w/ several NFS shares set up

I have 16 containers configured (all set to “Restart Always”) and everything works perfectly until the host computer is rebooted. 6 of the containers have an NFS volume specified and all 6 of them fail to start. If I hop into Portainer after a reboot, they are all in an “Exited” state. If I simply select them and hit “Start”, all six start up just fine and do everything I expect them to do.

I’ve spent hours scouring the internet, and things that apparently work for others have so far been unsuccessful for me.

After I manually start all of the containers and have them working, if I run systemctl --all list-units | grep .mount I see the mounted NFS drives. None of these mounts are present in this list after a reboot until I manually start the containers.
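
For reference, this is the exact check I’m running (the unit names are the ones systemd generates from the volume mount paths under /var/lib/docker/volumes):

# only shows the NFS volume mounts after the containers have been started manually;
# the units are named like var-lib-docker-volumes-nas_entertainment-_data.mount
systemctl --all list-units | grep .mount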

As per one suggestion I found while researching, I tried testing it out with one volume by modifying docker.service and adding that volume’s .mount unit to the “After” and “Requires” lines of the [Unit] section, like this:

/lib/systemd/system/docker.service:
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service time-set.target var-lib-docker-volumes-nas_entertainment-_data.mount
Wants=network-online.target containerd.service
Requires=docker.socket var-lib-docker-volumes-nas_entertainment-_data.mount

With that change in place, the container using that volume still does not start after a reboot. Listing all of the .mount units now does show a reference to it (which is more than I was seeing after a reboot previously), but it is in a not-found / inactive / dead state:

● var-lib-docker-volumes-nas_entertainment-_data.mount                                                                    not-found inactive dead      var-lib-docker-volumes-nas_entertainment-_data.mount


As per another suggestion I came across, I tried creating a drop-in file with pretty much the same “After” and “Requires” settings as I had added to the docker.service file. As best I could tell, Docker wasn’t picking up the file at all (I didn’t see the “not-found / inactive / dead” mount like I did above), so I apparently did something wrong there, but it seems like I would have gotten the same results.
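
In case it helps, this is roughly what I was aiming for with the drop-in (the directory and file name are just what I understand the convention to be):

# /etc/systemd/system/docker.service.d/override.conf
[Unit]
After=var-lib-docker-volumes-nas_entertainment-_data.mount
Requires=var-lib-docker-volumes-nas_entertainment-_data.mount

followed by a sudo systemctl daemon-reload so systemd re-reads the unit, which may be the step I missed.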

Below is an example of how I created all of the volumes and the containers that use them. Again, this combo works perfectly after I simply push the Start button on the “Exited” container in Portainer. All of them continue working just fine until the point that the box is rebooted.

docker volume create --driver local \
	--opt type=nfs \
	--opt o=addr=192.168.26.5,rw \
	--opt device=:/volume1/Entertainment \
	nas_entertainment


docker run -d \
 --name jellyfin \
 --hostname jellyfin \
 --net=vlan32 \
 --ip=192.168.32.11 \
 --restart=always \
 -v jellyfin_config:/config \
 -v jellyfin_cache:/cache \
 -v nas_entertainment:/media \
 jellyfin/jellyfin

This is driving me bonkers!!! I’m not incredibly Linux savvy (or Docker savvy either for that matter… I’ve been using it for a couple of years, but definitely a hobbyist skillset), so it’s entirely possible that I’m overlooking something obvious. Hopeful that’s the case and it jumps out at one of you Docker ninjas.

Any ideas greatly appreciated!

I use plenty of nfsv4 shares from my NAS without any issues. I didn’t touch the systemd unit.

I declare mine in docker compose like this:

volumes:
  myservice-config:
    driver_opts:
      type: nfs
      o: addr=192.168.x.y,nfsvers=4
      device: :/volume1/docker/volumes/myservice/

Basically the difference is that I use nfsvers=4 to make sure nfsv4 is used, and I don’t specify rw, as it’s the default setting anyway.
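
With the cli, the same declaration should look roughly like this (untested; volume name taken from the example above):

docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.x.y,nfsvers=4 \
  --opt device=:/volume1/docker/volumes/myservice/ \
  myservice-config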

All containers start without any issue if the host is restarted. Of course the containers freak out when the NAS is rebooted - I usually remove all containers when I plan to reboot the NAS and redeploy them after the NAS is available again.

Thank you meyay! That is both encouraging and frustrating!!! I failed to mention above, but the nfsvers=4 option was another thing I had tried. I had not tried without the rw flag. I did both of those together just now and didn’t see any difference. The Synology is for sure set up to allow up to NFSv4.

I can tell from the container logs that all of them are continually retrying after bootup. There is just something fundamentally different taking place when I click the “Start” button in Portainer. Once that sunk in with me yesterday, I started thinking there must be some sort of permission issue.

With that in mind, I tried several things last night like enabling the option on the NAS share to “map all users to Admin” and ensuring that Admin was activated again on the NAS and had RW access to the share in question. That didn’t work, so I tried adding a -u 1038 option to the container config to make (at least in my head) the container use UID 1038, which is another admin-level account I have set up on the NAS (also has RW access to the share). No change. Tried forcing in an environment option of -e CONTAINER_OWNER=my_user_name to associate the container with the same NAS user. No change.
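
For completeness, the variation I tried looked roughly like this (same run command as before, just with the two extra options; neither made any difference):

docker run -d \
 --name jellyfin \
 --hostname jellyfin \
 --net=vlan32 \
 --ip=192.168.32.11 \
 --restart=always \
 -u 1038 \
 -e CONTAINER_OWNER=my_user_name \
 -v jellyfin_config:/config \
 -v jellyfin_cache:/cache \
 -v nas_entertainment:/media \
 jellyfin/jellyfin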

I’m about to hop on a plane for a work trip. I may just try a bare-metal fresh start of Ubuntu and Docker on another box and see how far I get. I’ve been wanting to switch to Compose, so I might take advantage and implement that as well.
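
If I do go the Compose route, this is roughly what I’m picturing for the jellyfin piece, untested and pieced together from the run/volume commands above (I’m assuming the existing vlan32 macvlan network can simply be referenced as external, and I’ve used the nfsvers=4 option discussed earlier):

services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    hostname: jellyfin
    restart: always
    networks:
      vlan32:
        ipv4_address: 192.168.32.11
    volumes:
      - jellyfin_config:/config
      - jellyfin_cache:/cache
      - nas_entertainment:/media

networks:
  vlan32:
    external: true

volumes:
  jellyfin_config:
  jellyfin_cache:
  nas_entertainment:
    driver_opts:
      type: nfs
      o: addr=192.168.26.5,nfsvers=4
      device: :/volume1/Entertainment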

Anyway, thank you for the feedback and the reassurance that this should work (I was having my doubts at this point, thinking I had some fundamental misconception!)

My NAS is a Synology as well.

This is the shared folder configuration I use on all my nfsv4 shares:
Permissions: my user → read/write, guest no access
Advanced Permissions: everything deactivated
NFS Permissions:

  • Hostname or IP: 192.168.x.0/24 (your Docker host must be in that range)
  • Privilege: Read/Write
  • Squash: No mapping
  • Security: sys
  • Allow users to access mounted subfolders is checked

Note: volume configurations are immutable. While the cli doesn’t allow updating a volume’s configuration, a volume definition in a compose file can be changed, but the changes will not be propagated to the volume configuration unless the volume is removed. Once removed, docker compose will be able to create the volume again with the new config. For a volume backed by nfs, the volume configuration is merely the connection details for the remote share, so there is no risk of losing data.
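
With the cli that means dropping and re-creating the volume, something like this (using the volume from your first post; any container that still references the volume has to be removed first, otherwise docker will refuse to delete it):

docker volume rm nas_entertainment
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.26.5,nfsvers=4 \
  --opt device=:/volume1/Entertainment \
  nas_entertainment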

Sorry for the slow reply… the work trip turned into more work and I haven’t had any me-time to explore this further yet. I really appreciate you sharing the Synology settings. Those are in line with what I have set up too, unfortunately.

Good tip on the immutable volume config too. I was not aware of that or taking it into consideration when I was adjusting settings on the NAS to try and solve this. I just tried dropping and rebuilding one of the volumes, but the containers still didn’t start after a reboot.

On the hostname/IP, I use IPs for the shares. 4 of my 6 failing containers are set up on their own VLANs (each of which is also set up in Docker as a macvlan and works fine). The other 2 are running on the host IP. I appear to have both the Docker host IP and the respective container IP permissioned in the NFS shares on the NAS, except for the 4th one, which (interestingly) only has the Docker host IP.

Not sure when it will be, but I do have another box here that I can start fresh on with the sole intention of getting one @$%^@ container to start up on its own.

Again, I really appreciate the validation that it at least SHOULD work and I haven’t been trying to do something impossible!!

Only the Docker host IP (or its subnet) is relevant; the container IPs are irrelevant, as the Docker engine mounts the nfs share into /var/lib/docker/volumes/{volume_name}/_data and binds this mount point into the container.
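
If you want to verify that on the host, something like this should show it (volume name taken from the earlier example):

# the share only shows up here once a container using the volume has started,
# which matches what systemctl showed earlier in the thread
mount | grep nas_entertainment
ls /var/lib/docker/volumes/nas_entertainment/_data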


A year later, did you solve this?

I have a volume that mounts an NFS share from my NAS. It works fine, but when the system reboots, the services that try to use the volume get an error and exit. If I immediately log in to the computer and start the container (no other changes), it works fine.

I have tried both the IP address and the FQDN to specify the NAS server; neither works during the boot process.

Share your settings, share the error.

Since this was a year-old thread and I felt I should create a new thread for my problem, I created a different post in this forum where I shared more detail.