Issue with service using NFS volume not starting during boot

kscrib · September 13, 2024, 5:16pm

OS: Ubuntu 24.04
Docker 27.2.1

I am using docker compose.

My configuration works fine until the system reboots. Following a reboot, I have to issue:

$ docker start

When I do that, everything works fine, but I have to issue it manually.

$ docker inspect NAS_PlexData
[
    {
        "CreatedAt": "2024-09-13T10:53:21-05:00",
        "Driver": "local",
        "Labels": {
            "com.docker.compose.project": "dockerconfig",
            "com.docker.compose.version": "2.29.2",
            "com.docker.compose.volume": "NAS_PlexData"
        },
        "Mountpoint": "/var/lib/docker/volumes/NAS_PlexData/_data",
        "Name": "NAS_PlexData",
        "Options": {
            "device": ":/PlexData",
            "o": "addr=192.168.x.y,nfsvers=4.1,rw",
            "type": "nfs"
        },
        "Scope": "local"
    }
]

The service immediately after a reboot:

$ docker inspect sonarr
[
    {
        "Id": "c797d0e1a9fbf766513dd9bd3a166786c7bf5d81bcf2e1fbc56eea28f0c25a34",
        "Created": "2024-09-13T15:53:21.4682294Z",
        "Path": "/init",
        "Args": [],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 128,
            "Error": "error while mounting volume '/var/lib/docker/volumes/NAS_PlexData/_data': failed to mount local volume: mount :/PlexData:/var/lib/docker/volumes/NAS_PlexData/_data, data: addr=192.168.2.15,nfsvers=4.1: no route to host",
            "StartedAt": "2024-09-13T16:43:23.345249681Z",
            "FinishedAt": "2024-09-13T16:46:52.586093821Z"
        },

It appears to me that this is a timing issue during the boot process. So I tried changing the boot sequence by editing /lib/systemd/system/docker.service to extend the After= clause:

After=network-online.target docker.socket firewalld.service containerd.service time-set.target multi-user.target
#After=network-online.target docker.socket firewalld.service containerd.service time-set.target nfs-client.target systemd-resolved.service NetworkManager-wait-online.service nss-lookup.target

I listed the current one and a commented-out prior attempt. Neither was successful. I tried adding multi-user.target because it has the latest start time of any target on my system. Since my docker containers only need to be available after the system boots (nothing else uses them), I felt being the last service to start would be OK.
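
Editing the packaged unit under /lib/systemd/system is usually avoided, because a package update can overwrite it. A minimal sketch of the same ordering experiment as a systemd drop-in follows. The function name write_docker_dropin and the file name 10-wait-for-network.conf are made up for illustration, and the chosen targets are only an example. It changes startup ordering only, which by itself did not resolve the problem described here.

```shell
#!/bin/sh
# Sketch: write an ordering drop-in for docker.service instead of editing the
# packaged unit file. The function name and the .conf file name are
# illustrative choices, not anything Docker or systemd prescribes.
write_docker_dropin() {
    dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # After= only orders startup; Wants= pulls the target into the boot transaction.
    cat > "$dropin_dir/10-wait-for-network.conf" <<'EOF'
[Unit]
After=network-online.target nfs-client.target
Wants=network-online.target
EOF
}

# Usage (as root), then reload systemd so the drop-in is picked up:
#   write_docker_dropin
#   systemctl daemon-reload
#   systemctl cat docker.service   # prints the unit together with its drop-ins
```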

I have tried using both the FQDN and the IP address; neither made a difference.

$ cat compose.yml
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    container_name: sonarr
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=US/Central
    volumes:
      - /mnt/MediaRaid/DockerConfig/config_sonarr:/config
      - NAS_PlexData:/NAS_MediaContent
    ports:
       - 8989:8989
    restart: unless-stopped
#    restart: on-failure:3

  radarr:
    image: lscr.io/linuxserver/radarr:latest
    container_name: radarr
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=US/Central
    volumes:
      - /mnt/MediaRaid/DockerConfig/config_radarr:/config
      - NAS_PlexData:/NAS_MediaContent
    ports:
      - 7878:7878
#    restart: on-failure:3
    restart: unless-stopped

volumes:
  NAS_PlexData:
    driver: local
    name: NAS_PlexData
    driver_opts:
      type: nfs
      o: addr=192.168.x.y,nfsvers=4.1,rw
#      o: addr=NAS.mylocal,nfsvers=4.1,rw
      device: :/PlexData

kscrib · September 13, 2024, 10:36pm

Update in case anyone else sees this post and has a similar problem:

Adding all sorts of waits (After= or Requires=) in the file /etc/systemd/system/docker.service did not have any impact on the problem.
Creating a dummy service and having the services that access the locally defined NFS volume wait on that dummy service using depends_on: did not work either. The containers accessing the NFS share exited before they tried to start because they could not access the volumes.
I eventually removed the volumes defined in docker compose that pointed to the NFS share and used the /mnt/XXXX mount from Ubuntu instead. This worked well, but removed the ability to see the NFS path in the compose.yml file or with the docker inspect command. Neither of those was a showstopper for me. I have spent days on this, and decided this was the easiest approach.

meyay · September 14, 2024, 9:24am

kscrib:

            "Error": "error while mounting volume '/var/lib/docker/volumes/NAS_PlexData/_data': failed to mount local volume: mount :/PlexData:/var/lib/docker/volumes/NAS_PlexData/_data, data: addr=192.168.2.15,nfsvers=4.1: no route to host",

There must be a reason for the no route to host error. Something seems to be wrong in your network configuration or your volume definition.

Just to be sure: you deleted the volume manually every time you made changes underneath the top-level volumes section in your compose file, right? The configuration of a volume is immutable, so no change in your compose file will be reflected in the volume configuration until the volume is deleted and compose can re-create it.
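
The delete-and-recreate cycle described above can be sketched as a small shell function. It assumes it is run from the directory that contains the compose file, and note that docker compose down stops and removes every service in the project. The function name recreate_volume and the DOCKER override variable are illustrative and not part of Docker.

```shell
#!/bin/sh
# Sketch: recreate a Compose-managed named volume after changing its driver_opts.
# DOCKER can be overridden (for example DOCKER=echo) to preview the commands
# without running them.
DOCKER="${DOCKER:-docker}"

recreate_volume() {
    volume="$1"
    # Containers that reference the volume must be removed before it can be deleted.
    $DOCKER compose down || return 1
    # Remove the existing volume so its stored options are discarded.
    $DOCKER volume rm "$volume" || return 1
    # Compose re-creates the volume from the current compose file.
    $DOCKER compose up -d
}

# Usage:
#   recreate_volume NAS_PlexData
#   docker volume inspect NAS_PlexData   # check that "Options" shows the new values
```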

kscrib · September 15, 2024, 3:43pm

Yes, I deleted the volume whenever I made a change to the volume portion of the compose file. The volume would work fine, just not during the boot process. It would work right after the boot was finished.

I added a depends_on: clause for a different service. That gave me a couple of minutes after boot to quickly log on and inspect the container. The scenario that I could repeat at will was having all three services up and running normally using the NFS mount created in the docker compose file. I could stop and start those services at will with no issues.

So with all three services up and running I would reboot the PC. Immediately following that reboot, I could quickly log in to the PC and “inspect” a service that would access the NFS share. Although the service had not tried to start (it was still waiting for the service in the depends_on clause to become healthy), the container would already have an error about being unable to access the NFS share. The error was not consistent: sometimes it would be about no route, sometimes about the network being unstable. If I used the FQDN, the error was frequently about being unable to perform the DNS lookup for that FQDN.

It appears to me that the network inside of docker is still going through its startup process when the container is built. I don’t know enough about how docker starts its internal network service, but I am guessing that the containers are built and the NFS mount is tested prior to the docker network fully starting.

Post reboot, I can immediately issue “docker start xxxx” and the service starts fine, accesses the NFS share and works properly with no other changes.
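
Since a manual docker start succeeds once boot has finished, one possible stopgap is to automate that retry. The sketch below is an assumption and was not tested in this thread: retry is a hypothetical helper, and the attempt count, delay, and container names are placeholders.

```shell
#!/bin/sh
# Sketch: run a command up to a fixed number of times, pausing between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        n=$((n + 1))
        # Sleep only if another attempt will follow.
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example: after boot, keep trying for roughly five minutes to start each
# container that was left in the "exited" state by the failed NFS mount.
#   for c in sonarr radarr; do retry 30 10 docker start "$c"; done
```

How this gets triggered after boot (for example from a oneshot unit or an @reboot cron entry) is left open here.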

To make sure the issue was not a timing issue with the PC network not starting, I added After=multi-user.target to /lib/systemd/system/docker.service to delay when docker started. That did not make a difference. I picked multi-user.target because it shows the latest start time of any target on the PC. I confirmed that docker.service did start after multi-user.target, so I know that setting was working. But it did not fix the issue.

As I type (and continue to edit) this reply, I started thinking about docker.socket. I am not sure what that OS service does, but I have not tried modifying its start time in the boot process. Would that have an impact?

My PC network is an LACP bonded pair (2 NICs bonded as one using LACP). That has worked fine for quite some time. I am just adding that as it may be a less commonly used feature on home PCs.

I am NEW to docker and only have about a year of Ubuntu home use knowledge. But, I have a pretty strong understanding of network concepts from other platforms in the business environment.

meyay · September 15, 2024, 5:43pm

The Docker service should start after network-online.target and docker.socket are started. At least this is how it’s configured by default on systems that use systemd. The docker.sock is a Unix domain socket bound to the Docker daemon API endpoint. It is used by client applications (like the docker CLI) to control the Docker engine.

I understand your NFS server is located on a different host, and should be reachable and serve the nfs remote shares as long as the network connection is available. Though, when I read about depends_on, I get the feeling your nfs server is running as a container as well - please say that it’s not what you meant!

A docker volume backed by a remote share is mounted when the first container that uses it starts, and it is unmounted when the last container using it is stopped. This is done on the host, not inside the container.
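
Because the mount happens on the host, one way to separate Docker from the host network is to attempt a comparable NFS mount by hand right after boot. The helper below only prints such a command. The name nfs_mount_cmd, the scratch mount point /mnt/nfs-test, and the example address are hypothetical, and a mount(8) invocation is only roughly comparable to the mount call made by Docker's local volume driver.

```shell
#!/bin/sh
# Sketch: print a manual NFS mount command built from the same pieces that the
# volume definition uses (server address, export path, mount options).
# Usage: nfs_mount_cmd <server> <export-path> <mount-point> <options>
nfs_mount_cmd() {
    server="$1"; export_path="$2"; target="$3"; opts="$4"
    printf 'mount -t nfs -o %s %s:%s %s\n' "$opts" "$server" "$export_path" "$target"
}

# Example (192.0.2.10 is a documentation address, replace it with the NAS):
#   nfs_mount_cmd 192.0.2.10 /PlexData /mnt/nfs-test nfsvers=4.1,rw
# Run the printed command as root immediately after a reboot. If it also fails
# with "no route to host", the problem lies outside Docker.
```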

A container does not boot, a container is merely a process in an isolated environment where everything is already in place when the process is started in the container.

kscrib · September 15, 2024, 9:46pm

I am using the standard Ubuntu NFS client (not a container) connected to an NFS server on a NAS that is on the same subnet as the PC.

I am away from the PC right now. I will update with config information later.

kscrib · September 17, 2024, 4:14am

Ubuntu 24.04

$ docker --version
Docker version 27.2.1, build 9e34c9b
$ docker compose version
Docker Compose version v2.29.2

I am using what I believe to be a typical 802.3ad LACP network connection. The PC only has two NICs.

$ cat /etc/netplan/01-netcfg.yaml

network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp4s0:
      link-local: [ipv4]
      dhcp4: no
      dhcp6: no
    enp5s0:
      link-local: [ipv4]
      dhcp4: no
      dhcp6: no
  bonds:
    bond0:
      addresses: [192.168.x.y/24]
      interfaces: [enp4s0,enp5s0]
      link-local: [ipv4]
      routes:
        - to: default
          via: 192.168.x.1
          metric: 100
      nameservers:
        addresses: [192.168.1.1]
        search: [mylocal]
      parameters:
        mode: 802.3ad
        transmit-hash-policy: layer3+4
        lacp-rate: fast
        mii-monitor-interval: 100

This is how I mount the NAS share at the O/S level. When I access the share via this mount, docker accesses the share normally, even on reboot.
$ cat /etc/fstab
…
NAS.mylocal:/PlexData /mnt/NAS-Plex nfs defaults,suid,bg 0 0
…
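
With this approach the container gets a plain bind mount of /mnt/NAS-Plex, so it can be useful to confirm that the NFS share is really mounted there before the containers start, since the bg option lets boot continue while the mount is retried in the background. A minimal sketch, assuming Linux's /proc/mounts: is_mounted is a hypothetical helper, and its second argument exists only so that it can be pointed at a test file.

```shell
#!/bin/sh
# Sketch: succeed only if the given path appears as a mount point.
# Usage: is_mounted <path> [mounts-file]
is_mounted() {
    # Columns in /proc/mounts: device, mount point, fstype, options, dump, pass.
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example guard before starting the containers:
#   if is_mounted /mnt/NAS-Plex; then
#       docker compose up -d
#   else
#       echo "/mnt/NAS-Plex is not mounted yet" >&2
#   fi
```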

$ cat docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service time-set.target nfs-client.target

$ cat compose.yml
networks:
  StarrNetwork:
    name: StarrNetwork

volumes:
  NAS_PlexData:
    driver: local
    name: NAS_PlexData
    driver_opts:
      type: nfs
#      o: addr=192.168.x.y,nfsvers=4.1,rw
      o: addr=KC-NAS.mylocal,nfsvers=4.1,rw
      device: :/PlexData

services:
  watchtower:
    image: containrrr/watchtower:latest
    container_name: watchtower
    environment:
      - TZ=US/Central
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_INCLUDE_STOPPED=true
      - WATCHTOWER_REVIVE_STOPPED=false
      - WATCHTOWER_SCHEDULE=0 0 2 * * *
    command:
      - radarr
      - sonarr
      - watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    networks:
      - StarrNetwork

  sonarr:
#  TV Series app
    image: lscr.io/linuxserver/sonarr:latest
    container_name: sonarr
    depends_on:
      watchtower:
        condition: service_healthy
        restart: true
        required: true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=US/Central
    volumes:
      - /mnt/MediaRaid/DockerConfig/config_sonarr:/config
      - /mnt/NAS-Plex:/NAS_MediaContent
#      - NAS_PlexData:/NAS_MediaContent
    ports:
       - 8989:8989
    restart: unless-stopped
#    restart: on-failure:10
    networks:
      - StarrNetwork

  radarr:
    image: lscr.io/linuxserver/radarr:latest
    container_name: radarr
    depends_on:
      watchtower:
        condition: service_healthy
        restart: true
        required: true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=US/Central
    volumes:
      - /mnt/MediaRaid/DockerConfig/config_radarr:/config
#     - /mnt/NAS-Plex:/NAS_MediaContent
      - NAS_PlexData:/NAS_MediaContent
    ports:
      - 7878:7878
    restart: on-failure:10
#    restart: unless-stopped
    networks:
      - StarrNetwork

As an experiment, I set radarr to use the NFS volume defined in docker compose and sonarr to use the same NFS share, but mounted at the OS level. (I noticed after the fact that using the same mount name inside of docker caused a different issue. I could have resolved that by using different mount names, but I did not want to have to rebuild the app database.)

Following a reboot sonarr started fine, but radarr did not start.

$ docker inspect radarr
[
    {
        "Id": "4c645c18daf6a69758f42632563bc4cb64ee578f81728d25d83146c0a873cfb4",
        "Created": "2024-09-17T03:54:57.260129557Z",
        "Path": "/init",
        "Args": [],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 128,
            "Error": "error while mounting volume '/var/lib/docker/volumes/NAS_PlexData/_data': failed to mount local volume: mount :/PlexData:/var/lib/docker/volumes/NAS_PlexData/_data, data: addr=192.168.x.y,nfsvers=4.1: no route to host",
            "StartedAt": "2024-09-17T03:55:28.394142834Z",
            "FinishedAt": "2024-09-17T03:56:34.012273432Z"
...
            "Mounts": [
                {
                    "Type": "volume",
                    "Source": "NAS_PlexData",
                    "Target": "/NAS_MediaContent",
                    "VolumeOptions": {}
                }
...
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/mnt/MediaRaid/DockerConfig/config_radarr",
                "Destination": "/config",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "NAS_PlexData",
                "Source": "/var/lib/docker/volumes/NAS_PlexData/_data",
                "Destination": "/NAS_MediaContent",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }

The docker volume:

$ docker inspect NAS_PlexData
[
    {
        "CreatedAt": "2024-09-16T22:54:57-05:00",
        "Driver": "local",
        "Labels": {
            "com.docker.compose.project": "dockerconfig",
            "com.docker.compose.version": "2.29.2",
            "com.docker.compose.volume": "KC-NAS_PlexData"
        },
        "Mountpoint": "/var/lib/docker/volumes/NAS_PlexData/_data",
        "Name": "NAS_PlexData",
        "Options": {
            "device": ":/PlexData",
            "o": "addr=192.168.x.y,nfsvers=4.1,rw",
            "type": "nfs"
        },
        "Scope": "local"
    }
]

So on reboot this time the error was no route to host, but that error varies slightly from time-to-time.

The next three commands, issued right after reboot, show that radarr is not running but sonarr is. The only command that changed anything was docker start radarr, and the service remains running after that.

$ docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED          STATUS                   PORTS                                       NAMES
ce11499acc66   lscr.io/linuxserver/sonarr:latest   "/init"                  10 minutes ago   Up 7 minutes             0.0.0.0:8989->8989/tcp, :::8989->8989/tcp   sonarr
e14ff0847ff0   containrrr/watchtower:latest        "/watchtower radarr …"   3 days ago       Up 7 minutes (healthy)   8080/tcp                                    watchtower

$ docker start radarr
radarr

$ docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED          STATUS                   PORTS                                       NAMES
4c645c18daf6   lscr.io/linuxserver/radarr:latest   "/init"                  10 minutes ago   Up 3 seconds             0.0.0.0:7878->7878/tcp, :::7878->7878/tcp   radarr
ce11499acc66   lscr.io/linuxserver/sonarr:latest   "/init"                  10 minutes ago   Up 8 minutes             0.0.0.0:8989->8989/tcp, :::8989->8989/tcp   sonarr
e14ff0847ff0   containrrr/watchtower:latest        "/watchtower radarr …"   3 days ago       Up 8 minutes (healthy)   8080/tcp                                    watchtower

gmcouto · November 25, 2024, 11:55pm

I have the same exact issue.

I’m using TrueNAS ElectricEel-24.10.0

I have some services that I migrated from a jail to native docker installation on the OS, and this is the issue I’m facing.

I can start my containers manually and it all works OK. But after a reboot, all my containers that use an NFS mount point give me “no route to host”.

I can start them manually, but that is a pain, as this is supposed to be a server that comes back up automatically after a reboot.

I did try adding sleep 300 to the initialization of the service, but that has not fixed it.
