Problems using overlay2 on docker on a VPS

Hi

I’m trying to run Docker on a KVM VPS and I’m having some problems with the overlay2 storage driver. I’m using Gentoo Linux.

The setup on the VPS is the same as on my testing machine at home, where everything works: kernel options, daemon.json, etc. are configured the same. There are some differences of course: my home system has more free disk space than the VPS, but I assume that 32 GB should be enough. The VPS has only 2 GB of RAM; could that show up as an overlay2 problem?

I have checked the Gentoo documentation about Docker and QEMU a couple of times and I believe I’m not missing anything; this is not the first time that I have used either technology, but it is the first time that I have used both together (VPS + Docker).

I have checked documentation in the docker website, posts on stackoverflow and a couple of other places but none seem to address the problem I’m having except for recommending using overlay2 and pygrub if the system is running on Xen, which is not the case as far as I can tell.

This is the error reported by docker
cat /var/log/docker-err.log

time="2023-09-04T01:24:14.959884415Z" level=info msg="Starting up"
time="2023-09-04T01:24:14.991187440Z" level=info msg="[graphdriver] trying configured driver: overlay2"
time="2023-09-04T01:24:14.992365941Z" level=error msg="failed to mount overlay: no such device" storage-driver=overlay2
failed to start daemon: error initializing graphdriver: driver not supported: overlay2

This is the status when I check the running services
rc-status

Runlevel: default
 net.enp0s3                                                                              [  started  ]
 netmount                                                                                [  started  ]
 metalog                                                                                 [  started  ]
 sshd                                                                                    [  started  ]
 chronyd                                                                                 [  started  ]
 cronie                                                                                  [  started  ]
 docker                                                                                  [  crashed  ]
 local                                                                                   [  started  ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
 containerd                                                                              [  started  ]

docker -v

Docker version 24.0.5, build ced0996600

uname -a
Linux patito 6.1.31-gentoo #3 SMP PREEMPT_DYNAMIC Sun Jun 25 09:04:44 GMT 2023 x86_64 QEMU Virtual CPU version 2.5+ AuthenticAMD GNU/Linux

kernel
cat /usr/src/linux/.config | grep -i overlay

CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y
CONFIG_OVERLAY_FS=y
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

docker - daemon.json
/etc/docker/daemon.json

{
  "storage-driver": "overlay2"
}

docker - /etc/conf.d/docker
cat /etc/conf.d/docker

# /etc/conf.d/docker: config file for /etc/init.d/docker

# where the docker daemon output gets piped
# this contains both stdout and stderr. If  you need to separate them,
# see the settings below
DOCKER_LOGFILE="/var/log/docker.log"

# where the docker daemon stdout gets piped
# if this is not set, DOCKER_LOGFILE is used
#DOCKER_OUTFILE="/var/log/docker-out.log"

# where the docker daemon stderr gets piped
# if this is not set, DOCKER_LOGFILE is used
DOCKER_ERRFILE="/var/log/docker-err.log"

# where docker's pid get stored
#DOCKER_PIDFILE="/run/docker.pid"

# Settings for process limits (ulimit)
#DOCKER_ULIMIT="-c unlimited -n 1048576 -u unlimited"

# seconds to wait for sending SIGTERM and SIGKILL signals when stopping docker
#DOCKER_RETRY="TERM/60/KILL/10"

# where the docker daemon itself is run from
#DOCKERD_BINARY="/usr/bin/dockerd"

# any other random options you want to pass to docker
# DOCKER_OPTS="--log-level info --selinux-enabled --data-root /mnt/docker"
DOCKER_OPTS="--log-level info --data-root /mnt/docker"

If this post is badly tagged, please let me know and I’ll try to fix it.

Any help will be very appreciated.

If any more information is required to solve the problem, please let me know and I’ll reply as soon as possible.

thanks


Moderator note: The question was also asked on the Gentoo forum: Gentoo Forums :: View topic - Problems using overlay2 on docker on a VPS

Please share the output of docker info

Hi


Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info

It seems the CLI tool can’t talk to the daemon, so you have a bigger problem than just overlay2.

Totally agree with @bluepuma77.

Is it safe to assume it is because of the misbehaving storage driver?

I wanted to see the output of docker info to see the backing filesystem.
As an alternative the output of this command will do:

 mount | grep $(findmnt -n -o SOURCE --target /var/lib/docker)

Note: not every storage driver supports every backing filesystem:

I guess the Gentoo documentation mentions device-mapper and btrfs instead of overlay for a reason (although it mentions overlay2 later), but you probably know much more about Gentoo than I do, and you wrote that you have another machine with the same parameters, so it should work here too. I am still sharing the documentation link in case someone wants to check it:

https://wiki.gentoo.org/wiki/Docker#Storage_driver

Is the disk mounted properly at /mnt/docker?

I think @meyay missed this setting so please use his command with your data root.

Isn’t it just the source? What about the following commands?

grep overlay /proc/modules
grep overlay /proc/filesystems

Running

mount | grep $(findmnt -n -o SOURCE --target /var/lib/docker)

I get
/dev/vda3 on / type ext4 (rw,noatime)

But, since I have docker in a different location, I also did

mount | grep $(findmnt -n -o SOURCE --target /mnt/docker)

and got
/dev/vda3 on / type ext4 (rw,noatime)

The VM only has one partition for the whole system except /boot which is a small separate partition.

/mnt is only a folder on the same file system.

On my local machine /mnt is a different disk, but I also use ext4 on all of them. On these two systems I only use ext4.

About

grep overlay /proc/modules

Doesn’t print anything since I don’t have overlay as a module, it is part of the kernel. Same as my local system.

Although, there is a difference with

grep overlay /proc/filesystems

On the VM I get nothing, but on my local system I get the following, even when the docker service is stopped:
nodev overlay
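
The difference above is the important clue: mount reports "no such device" when the running kernel has no filesystem type of that name registered. A minimal sketch of the same check as a reusable function follows; the function name kernel_has_overlay is made up for illustration, and it assumes the usual /proc/filesystems layout where the filesystem name is the last field on each line. Note that if overlay is built as a module, it may only appear here once the module has been loaded.

```shell
# Return 0 if the given filesystems list (default: /proc/filesystems)
# contains an entry named exactly "overlay".
# kernel_has_overlay is a hypothetical helper name, not a Docker tool.
kernel_has_overlay() {
    fs_list="${1:-/proc/filesystems}"
    # The filesystem name is the last field, e.g. "nodev overlay".
    awk '$NF == "overlay" { found = 1 } END { exit !found }' "$fs_list"
}

if kernel_has_overlay; then
    echo "overlay is registered with the running kernel"
else
    echo "overlay is NOT registered with the running kernel"
fi
```

If this reports that overlay is not registered while the kernel config says CONFIG_OVERLAY_FS=y, the config being inspected probably does not belong to the kernel that is actually running.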

About the
cat /usr/src/linux/.config | grep -i overlay

I posted it to show the kernel compilation options, since on Gentoo it is quite common to compile your own kernel and I have done so on both systems.
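
One caveat worth keeping in mind: /usr/src/linux/.config describes the kernel that would be built from that source tree, which is not necessarily the kernel that is running. The sketch below reads one option from any config file so the source tree can be compared with the config installed for the running release. The helper name kconfig_state is hypothetical, and the /boot/config-$(uname -r) path is an assumption based on the file that check-config.sh reports reading later in this thread; it may be missing or stale on other setups.

```shell
# Print "y", "m", or "n" for a kernel option in a given config file.
# kconfig_state is a hypothetical helper name used for illustration.
kconfig_state() {
    option="$1"
    config_file="$2"
    # Match only the exact option, e.g. CONFIG_OVERLAY_FS=y, and not
    # longer names such as CONFIG_OVERLAY_FS_INDEX.
    value=$(sed -n "s/^${option}=\([ym]\)\$/\1/p" "$config_file" | head -n 1)
    echo "${value:-n}"
}

# Compare the source tree config with the config of the running release.
running_cfg="/boot/config-$(uname -r)"
for cfg in /usr/src/linux/.config "$running_cfg"; do
    if [ -r "$cfg" ]; then
        echo "$cfg: CONFIG_OVERLAY_FS=$(kconfig_state CONFIG_OVERLAY_FS "$cfg")"
    fi
done
```

If the two files disagree, or the second one does not exist for the release printed by uname -r, that is a hint to look at which kernel is really being booted.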

I’ll read the linked message. For now I’m going to try to find out why the registered filesystems differ.

Thanks for all the time and help, let’s see if we can get to the bottom of this.

I just wanted to make sure you use a backing filesystem supported by overlay2.

I don’t use docker on systems that are not officially supported: https://docs.docker.com/engine/install/#server

Have you tried asking in a Gentoo forum? You are more likely to find Gentoo users who use Docker in a Gentoo forum than Docker users in the Docker forum who run Docker on Gentoo.

Hi

I did ask.

No answers so far.

I asked here hoping for a deeper understanding of Docker itself, which may get me closer to a solution: for instance, a kernel requirement that I may be missing, or somebody who knows that it can’t be done because of … reasons, or … something.

Something that may be important is confirmation that without overlay the daemon won’t work (if I tell docker to use overlay instead of btrfs)

On the Gentoo side I’m hoping for an insight about the system and overlayfs: maybe that it doesn’t work with .img QEMU images, or that I need some setting that I haven’t used …

So my hopes are a bit different on each site

I’ll keep trying.

For now I have gotten a deeper understanding of overlayfs

Thanks again for the time and help dedicated

I understand, though you are dealing with a problem that no one who runs docker-ce on a supported system has. This reduces the chances that someone else has actually had the same problem and found a solution for it.

Are you familiar with the script from the moby repository that checks if the host kernel provides all required and optional modules?

$ curl -L https://github.com/moby/moby/raw/master/contrib/check-config.sh | bash

Yes, I know it. I thought that I posted the results but obviously forgot to do it. I’ll update the original post to add those results.

I can’t edit that one, so I’ll post the results here

info: reading kernel config from /boot/config-6.1.31-gentoo ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: missing
    (cgroup swap accounting is currently enabled)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled
- CONFIG_SECURITY_SELINUX: enabled
- CONFIG_SECURITY_APPARMOR: enabled
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_NETFILTER_XT_MATCH_BPF: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: enabled
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Thanks for the reminder

Looks good to me.

Though, this one looks confusing. Something is not adding up here. If the filesystem where /mnt/docker is located indeed is ext4, why would docker try to use a storage driver that is not supported by the backing filesystem?

It doesn’t make sense and could very well be a bug. Though, this bug could be inherited from moby (the Docker upstream project, which is highly likely the upstream project for the Gentoo version as well) or from the project where the Gentoo Docker version is maintained.

I hope you will eventually find a solution!

I found the problem and solved it. It is quite bizarre and unexpected, and the solution led me to a new mystery, but at least the problem has been solved.

tl;dr.
The wrong kernel was being booted, and it apparently didn’t have overlay fs included.

Full answer

After posting a reply to aviro, I realized that my kernel version, 6.1.46, was not the same as in the original message, 6.1.31. That could well be because I was copying most of the information from the original site where I posted the question, the Docker community forums, and I have done a lot of tests and attempts to fix the problem since then. But it got me thinking about which kernel version was actually loaded.

I checked that /usr/src/linux was pointing to the right kernel, 6.1.46, and it was. I recompiled with overlay fs as a module in order to post the related information on the original post, ran make && make modules_install && make install followed by grub-mkconfig -o /boot/grub/grub.cfg, and rebooted.

After rebooting, I ran uname -a again and the kernel was 6.1.31. Why?
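
The comparison made by hand here can be sketched as a small check. This is only an illustration: the function names are made up, and it assumes the source directory is named linux-<release> and that the built kernel reports exactly that release, which does not hold if a custom local version string is configured.

```shell
# Strip the leading "linux-" from a source directory name such as
# "linux-6.1.46-gentoo" to get a string comparable to `uname -r`.
# Both function names are hypothetical helpers for illustration.
release_from_src_dir() {
    basename "$1" | sed 's/^linux-//'
}

# Compare a running release with the release implied by a source dir.
check_running_kernel() {
    running="$1"
    src_dir="$2"
    expected=$(release_from_src_dir "$src_dir")
    if [ "$running" = "$expected" ]; then
        echo "OK: running $running"
    else
        echo "MISMATCH: running $running, source tree is $expected"
        return 1
    fi
}

# Example with the versions from this thread:
check_running_kernel "6.1.31-gentoo" "/usr/src/linux-6.1.46-gentoo" || true
# prints: MISMATCH: running 6.1.31-gentoo, source tree is 6.1.46-gentoo
```

On a live system the two arguments would come from uname -r and from readlink -f /usr/src/linux.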

Maybe I had run grub-mkconfig with the wrong path, so I did it again, carefully, and rebooted again. Same thing, still 6.1.31.

After that reboot I checked /boot/grub/grub.cfg and the right option, 6.1.46 was there as the first option to boot the system. So just in case there was a problem with some random line not being updated, I did a rm /boot/*6.1.31* and ran grub-mkconfig -o /boot/grub/grub.cfg, checked with grep and there was no trace of 6.1.31, only 6.1.46.

Rebooted again and there it was, 6.1.31 again. But everything looked fine, it should have booted 6.1.46.

Then I rebooted and controlled the boot process manually, editing the version numbers by hand to load 6.1.46, but the boot failed: it couldn’t find the kernel.

That was key.

I changed back to 6.1.31, let it boot and checked my mount points (/boot is a different partition).

And there it was, or should I say, it wasn’t: /boot was not being mounted. At some point in the life of this system, I made a mistake in /etc/fstab and left /boot with noauto, so it was not being mounted.
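
A quick way to spot this kind of fstab entry is sketched below. The helper name fstab_has_noauto is hypothetical, and the sketch assumes the standard fstab layout, with the mount point in the second field and the comma-separated options in the fourth.

```shell
# Return 0 if the fstab entry for the given mount point lists "noauto"
# among its mount options (4th field).
# fstab_has_noauto is a hypothetical helper name.
fstab_has_noauto() {
    mount_point="$1"
    fstab="${2:-/etc/fstab}"
    awk -v mp="$mount_point" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") found = 1
        }
        END { exit !found }
    ' "$fstab"
}

if fstab_has_noauto /boot; then
    echo "/boot is marked noauto and will not be mounted at boot"
fi
```

Some setups keep /boot as noauto on purpose; in that case it has to be mounted by hand before installing a kernel or regenerating grub.cfg.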

I changed that, removed the noauto for /boot in /etc/fstab, rebooted the system and voilà! The system was running 6.1.46 and overlayfs was working.

Although, and this is the new part that I don’t get: even when /boot was a folder and not a partition, the 6.1.46 kernel was there and /boot/grub/grub.cfg had the correct configuration, so why was it not being read and loaded? On the GRUB menu presented at boot there was only one option, 6.1.31, but I couldn’t find it on the disk, neither with locate nor with find. So where is it? How was it being loaded? The /boot/grub/grub.cfg read from the disk with cat had 6.1.46.
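
A plausible explanation for this remaining puzzle, offered as an assumption rather than something confirmed in the thread: GRUB loads grub.cfg and the kernel images from the separate boot partition, while cat, find, and locate were looking at the /boot directory on the root filesystem, because the partition was not mounted there. The old 6.1.31 files would then exist only on the unmounted partition. The sketch below checks whether a path is currently a mount point before anything is written to it; the helper name is_mounted is made up, and it assumes a /proc/mounts style table with the mount point in the second field.

```shell
# Return 0 if the given path appears as a mount point in a
# /proc/mounts style table (second field of each line).
# is_mounted is a hypothetical helper name.
is_mounted() {
    target="$1"
    mounts="${2:-/proc/mounts}"
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts"
}

if is_mounted /boot; then
    echo "/boot is a mounted filesystem"
else
    echo "/boot is not mounted; files written there land on the root filesystem"
fi
```

Running a check like this before make install and grub-mkconfig would show whether the new kernel and grub.cfg are going to the partition the bootloader actually reads.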

Well, after this long story, I have solved the problem and got a new enigma, which for now is going to remain unsolved because I have used too much time already.

Thanks for the help. Without trying to answer that last comment from aviro, I wouldn’t have stumbled onto the solution.

Now it only remains to mark this thread as solved.