Mounted volume / wrong df value reported

Hello,

I am currently fighting a weird issue that I can't seem to resolve: I am trying to bind-mount / into a container for monitoring, but even though I am using the same version of Docker on both servers, one reports the filesystem correctly and the other one does not.

docker-compose.yml (Exactly the same on both servers)

version: '3.4'

networks:
  monitor-net:
    driver: bridge

services:

  nodeexporter-text:
    image: prom/node-exporter:v1.1.2
    #image: quay.io/prometheus/node-exporter:latest
    container_name: nodeexporter-test
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.rootfs=/rootfs'
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      #- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
      #- "--collector.filesystem.ignored-mount-points"
      #- "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    pid: host
    restart: unless-stopped
    expose:
      - 9100
    networks:
      - monitor-net
    labels:
      org.label-schema.group: "monitoring"
    healthcheck:
      test: ["CMD", "wget", "--tries=1", "--spider", "http://localhost:9100/metrics"]
      interval: 10s
      timeout: 5s

…started exactly the same way on both servers:

$ sudo docker-compose up -d
Starting nodeexporter-test ... done

Server “Jump”: /rootfs not seen correctly

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/md3        938G   31G  860G   4% /

$ sudo docker exec -it nodeexporter-test sh -c 'df -h /rootfs'
Filesystem                Size      Used Available Use% Mounted on
/dev/loop1               55.5M     55.5M         0 100% /rootfs
$ sudo docker --version
Docker version 20.10.7, build f0df350

$ sudo docker-compose --version
docker-compose version 1.29.2, build 5becea4c

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic

Server “API”: /rootfs seen correctly

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        848G  496G  309G  62% /

$ sudo docker exec -it nodeexporter-test sh -c 'df -h /rootfs'
Filesystem                Size      Used Available Use% Mounted on
/dev/md2                847.1G    495.0G    308.9G  62% /rootfs
$ sudo docker --version
Docker version 20.10.7, build f0df350

$ sudo docker-compose --version
docker-compose version 1.29.0, build 07737305

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic

Any suggestion would be greatly appreciated :)

Thanks in advance for your help

PS: I wasn’t sure where to post this on the forum; feel free to point me to a more appropriate section if needed!


I guess not, but just in case:

While the Docker versions are the same, the Docker Compose versions are slightly different. Could this somehow affect how the volumes are created? Maybe docker volume inspect can help?
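
(Since these are bind mounts rather than named volumes, inspecting the container's mount entries directly might be more telling; something along these lines, using the container name from the compose file above:

sudo docker inspect --format '{{json .Mounts}}' nodeexporter-test

should print each mount with its source, destination and mode.)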

Hello, I tried on another server (the exact same docker-compose.yml as copied in the first message), with exactly the same version of docker-compose, and as you can see, this time df -h again works properly.

Dev server (OK: partition size is reported correctly)

co@dev-api:/tmp/mvc$ sudo docker exec -it nodeexporter-test sh -c 'df -h /rootfs'
Filesystem                Size      Used Available Use% Mounted on
/dev/md2                847.1G    692.3G    111.7G  86% /rootfs

co@dev-api:/tmp/mvc$ sudo docker --version
Docker version 20.10.7, build f0df350

corentin@dev-api:/tmp/mvc$ sudo docker-compose --version
docker-compose version 1.29.2, build 5becea4c

co@dev-api:/tmp/mvc$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic

Monitoring server (KO: partition size not reported correctly)

co@jump:/tmp/mvc$ sudo docker-compose up -d
Starting nodeexporter-test ... done

co@jump:/tmp/mvc$ sudo docker exec -it nodeexporter-test sh -c 'df -h /rootfs'
Filesystem                Size      Used Available Use% Mounted on
/dev/loop1               55.5M     55.5M         0 100% /rootfs

co@jump:/tmp/mvc$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/md3        938G   34G  856G   4% /

co@jump:/tmp/mvc$ sudo docker --version
Docker version 20.10.7, build f0df350

co@jump:/tmp/mvc$ sudo docker-compose --version
docker-compose version 1.29.2, build 5becea4c

co@jump:/tmp/mvc$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic

I don’t understand where that 55.5M size for the docker volume comes from…
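
(If it helps, one way to check what /dev/loop1 actually maps to, run on the host, would be something like:

sudo losetup /dev/loop1

or sudo losetup -a to list every loop device with the file backing it, which might hint at where that 55.5M filesystem comes from.)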

I guess you noticed the difference in the docker exec output: the okay mounts show /dev/md2 as the filesystem, versus /dev/loop1 for the troublesome one. I don’t know what that means, but it may help your search?

Also, did docker volume inspect (docs) reveal anything? And do the troublesome mounts show the files you would expect?
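
(For example, a quick content check could be something like the following, assuming ls is available in the image:

sudo docker exec -it nodeexporter-test sh -c 'ls /rootfs'

which should list the host's root directories such as etc, home and var if the bind mount works as expected.)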

That’s a very interesting point I hadn’t noticed.
I spent a few hours trying to figure out why / was seen as a loop device, but unfortunately I couldn’t work it out :(

But does it show the expected contents? Is it only that df reports the wrong value, or is it also not listing the expected files and directories?

Can you share the output of this command from each node:

docker info --format '{{.Driver}}{{json .DriverStatus}}'

Hello @meyay and thanks for your suggestion.

Unfortunately, I can’t see any obvious difference in the DriverStatus output from your command:

Working servers:

co@api:/var/log/nginx$ sudo docker info --format '{{.Driver}}{{json .DriverStatus}}'
overlay2[["Backing Filesystem","extfs"],["Supports d_type","true"],["Native Overlay Diff","true"],["userxattr","false"]]
co@qa-api:~$ sudo docker info --format '{{.Driver}}{{json .DriverStatus}}'
overlay2[["Backing Filesystem","extfs"],["Supports d_type","true"],["Native Overlay Diff","true"]]
co@dev-api:/opt/umami-backend/amino$ sudo docker info --format '{{.Driver}}{{json .DriverStatus}}'
overlay2[["Backing Filesystem","extfs"],["Supports d_type","true"],["Native Overlay Diff","true"],["userxattr","false"]]

Failing server:

co@jump:~/umami-backend/dockprom$ sudo docker info --format '{{.Driver}}{{json .DriverStatus}}'
overlay2[["Backing Filesystem","extfs"],["Supports d_type","true"],["Native Overlay Diff","true"]]

Hmm, I was hoping to find an inconsistent storage driver configuration amongst the nodes. But apparently all your systems use the overlay2 driver backed by ext4. I don’t think that userxattr is responsible for the behavior.

On second thought, the storage driver idea was misleading, as the problem is with the container target folder of a bind-mount “volume”. The storage driver is not involved for such a folder.

It really is puzzling what makes the bind-mount target folder inside the container show /dev/loop1 as its filesystem.

Maybe a full comparison of the output of docker info reveals meaningful hints, but at this point I doubt that it will. The issue makes no sense :)

Just for the sake of checking, did you try to mount --bind the root folder into another folder on the host itself?

mkdir /rootfs 
mount --bind / /rootfs
df -h /rootfs
umount /rootfs

Hello @meyay, this is the result it provided:

co@jump:~/umami-backend/dockprom$ sudo mkdir /rootfs
co@jump:~/umami-backend/dockprom$ sudo mount --bind / /rootfs
co@jump:~/umami-backend/dockprom$ sudo df -g /rootfs
df: invalid option -- 'g'
Try 'df --help' for more information.
co@jump:~/umami-backend/dockprom$ sudo df -h /rootfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/md3        938G   38G  852G   5% /rootfs
co@jump:~/umami-backend/dockprom$ umount /rootfs
umount: /rootfs: umount failed: Operation not permitted.
co@jump:~/umami-backend/dockprom$ sudo umount /rootfs

And the docker info output is the following:

Failing server:

@jump:~/umami-backend/dockprom$ sudo docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 11
  Running: 11
  Paused: 0
  Stopped: 0
 Images: 10
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version:
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-151-generic
 Operating System: Ubuntu Core 18
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 62.69GiB
 Name: jump..com
 ID: PT4I:LUGC:INOC:CGZQ:GFFD:NTEO:TE6Q:BDIB:HGII:5E5Y:6BKZ:S3L7
 Docker Root Dir: /var/snap/docker/common/var-lib-docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

OK server:

co@dev-api:/opt/umami-backend/amino$ sudo docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 29
  Running: 18
  Paused: 0
  Stopped: 11
 Images: 20
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc version: v1.0.0-0-g84113ee
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-151-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 62.77GiB
 Name: dev-api..com
 ID: GIKX:6VCM:X4Y3:DIWR:6Q3N:4GV5:VDEZ:I6E7:7RXU:RPPY:YNRC:HXLQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

The only difference I see is the Docker server version (while the client is at the same, up-to-date version on both), but I haven’t found out how to upgrade it on the jump (i.e. failing) server.

The result of the mount --bind check allows us to rule out a general problem with bind mounts on the host.

Though what really catches the eye is that the Docker instance on the broken node is installed as a snap package. The snap package is a redistribution that may or may not be modified, and might be the reason why things behave differently.

I doubt that the different Docker version itself, or the OS being Ubuntu Core instead of the regular server edition, is responsible for the situation. The snap package, though, might actually be the cause.
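
(A quick way to confirm how Docker was installed on each node might be something like:

snap list | grep -i docker
command -v docker

On a snap-based install the binary usually resolves to a path under /snap/bin, and docker info shows a Docker Root Dir under /var/snap/docker/…, as in the output above.)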


Thanks a lot @meyay for your suggestion regarding the installation mode.

I therefore proceeded to install Docker directly from the apt packages with the following commands, and as you can see at the end of this post, it now works correctly.

I don’t know what is actually different between those two install modes, but this solved my issue for now.

sudo apt-get update
sudo apt-get upgrade
sudo apt install docker.io containerd
sudo systemctl start docker
sudo systemctl enable docker
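
(Before that, the snap-based Docker was presumably removed first, e.g. with something like sudo snap remove docker, so the two daemons do not conflict; the exact steps may vary.)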

so that now I have the following:

$ sudo docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 10
  Running: 9
  Paused: 0
  Stopped: 1
 Images: 9
 Server Version: 20.10.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-151-generic
 Operating System: Ubuntu 18.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 62.69GiB
 Name: jump.my-backend-server.com
 ID: SZGV:VFSE:27QN:PJ3C:ZNFW:GXZU:VMSH:GQGP:V3SO:EINS:N72T:LDBX
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

and it now works correctly when I check the df result :)

 sudo docker exec -it nodeexporter sh -c 'df -h /rootfs / /host /host/sys /host/proc'
Filesystem                Size      Used Available Use% Mounted on
/dev/md3                937.3G     39.1G    850.6G   4% /rootfs
overlay                 937.3G     39.1G    850.6G   4% /
overlay                 937.3G     39.1G    850.6G   4% /
sysfs                        0         0         0   0% /rootfs/sys
proc                         0         0         0   0% /proc

Nice. Still curious: did the troublesome server show the expected contents for that mount? So: was it only that df reported the wrong value, or was it also not listing the expected files and directories?

Hello @avbentem,
I’m not sure where I should report the mount issue to the snap package maintainer(s), but indeed / wasn’t mounted properly when I ran the test, before reinstalling Docker from another source.