Docker Daemon stops from time to time

Good Morning everyone,
first of all I want to apologise for dumping such a pile of log files on you, but I’m totally new to Docker and don’t know where to start.

My current setup is Proxmox VE with my DNS server running via Docker Compose inside an unprivileged LXC container. It has now happened several times that, in the middle of the night, the Docker daemon simply stops running. So I gathered all the related journalctl entries and attached them:

Now I’m sitting here trying to figure out what is going on. A simple systemctl start docker brings the engine back up, but it can’t be the long-term solution that I notice in the morning that my DNS server is down and have to SSH into the machine.
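As a stop-gap while debugging, systemd can restart the daemon automatically whenever it dies. A minimal sketch, assuming the standard docker.service unit from the official packages (create the drop-in with systemctl edit docker, then run systemctl daemon-reload):

```ini
# /etc/systemd/system/docker.service.d/override.conf
# Restart dockerd on any exit, not just the failure conditions
# the stock unit already handles.
[Service]
Restart=always
RestartSec=5
```

This only masks the symptom, of course; the journal should still reveal why dockerd exits in the first place.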

Can anybody give me some insight into how to debug this further?

Cheers,
Gamie

I hope someone who uses LXC containers will come around and respond to your question.

In the meantime, please share details about the LXC image you use as the base for your container, and provide the output of docker version. Did you install Docker from the official Docker repositories, from the OS distro repos, or even from snap?

Heyho,

I’m using Debian 11 standard as LXC Image :slight_smile:

Here is the output of docker version:

dockeruser@dns:~$ docker version
Client: Docker Engine - Community
 Version:           20.10.22
 API version:       1.41
 Go version:        go1.18.9
 Git commit:        3a2c30b
 Built:             Thu Dec 15 22:28:22 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.22
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.9
  Git commit:       42c8b31
  Built:            Thu Dec 15 22:26:14 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.15
  GitCommit:        5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I installed from official Docker repos:

dockeruser@dns:~$ cat /etc/apt/sources.list.d/docker.list 
deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian   bullseye stable

The OS and Docker version look good to me.

I took a look at the pastebin logs, and I am curious which storage driver your Docker engine uses, as neither overlay, overlay2, nor fuse-overlayfs seems to work. Can you share the output of docker info as well?

I tried Docker on PVE/LXC with an Ubuntu LXC image a long time ago, but never really used it. From what I remember, I had problems running Docker in an unprivileged LXC container without the “nesting” feature enabled. Though, it has been so long that I don’t remember the actual problems I had without nesting.

Sure thing, here is the output:

dockeruser@dns:~$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  compose: Docker Compose (Docker Inc., v2.14.1)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.22
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.53-1-pve
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 768MiB
 Name: dns
 ID: BJ6I:DKL3:IM6J:IKIO:6VPM:4DNM:YL2W:BH53:BLT4:EBML:YZG3:QR4B
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

And I got nesting activated on that container :slight_smile:

Observations & thoughts:

  • 768MiB RAM: might be a little low for the OS, Docker, and the containers
  • VFS storage driver: not recommended for production environments (and wastes a lot of filesystem storage)

I have PVE 7.3 running, and its kernel ships with overlay2 support (check with lsmod | grep overlay). I started my docker-test container (created with either PVE 7.1 or 7.2), and it turns out it uses overlay2 as storage driver out of the box as well.

Hey, thanks for the input.

About the RAM: I monitor it all the time and it didn’t run out yet. :slight_smile:

But the storage driver … this is why I asked for help here. I had never heard of storage drivers :sweat_smile: Is it possible to change it?

There must be a reason vfs is used…

Usually the reason is that the kernel lacks required features, or the Docker data root is not on a suitable backing filesystem:
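Two quick checks can narrow this down; a small sketch, assuming the default data root /var/lib/docker:

```shell
# Does the running kernel expose overlayfs at all?
grep overlay /proc/filesystems || echo "no overlay support in this kernel"

# Which filesystem backs Docker's data root? overlay2 wants ext4/xfs;
# on ZFS-backed storage the daemon falls back to vfs.
df -T /var/lib/docker 2>/dev/null || df -T /var/lib
```

If the second command reports zfs, that explains the vfs fallback.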

If the storage pool for your LXC containers uses ZFS, then according to “Google Consulting” the vfs storage driver is used as a fallback; people don’t seem to be able to use the zfs storage driver (I didn’t take a closer look).

In this situation people seem to use fuse-overlayfs (update: ignore all posts except the ones from c-goes; his 2nd post should actually be everything necessary).
I also found this post in a GitHub discussion that describes an alternative to fuse-overlayfs.

Best I can do is provide assistance in the detective work, as I don’t use docker in lxc containers.
It is up to you to follow these leads!

Update: I added a comment to the fuse-overlayfs link AND forgot to mention something important: if the storage driver is changed, existing images and containers will disappear, as they are not migrated to the new storage driver. Thus, images need to be re-pulled and containers re-created.
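If re-pulling is slow, or some images are only built locally, docker save / docker load can carry them across the switch. A sketch (the image name is just an example, not from this thread):

```shell
# Before changing the storage driver: export images to a tarball
docker save -o images-backup.tar pihole/pihole:latest

# After the driver change and daemon restart: import them again
docker load -i images-backup.tar
```

Named volumes still need their own backup; save/load only covers images.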

Holy crap … after reading all of this I’m speechless. There is a lot of work ahead of me :saluting_face:

Thank you very much for all of the hints. I thought I was done with my setup, but as I see, I have to redo it all again :face_with_peeking_eye: And yes, my two FS backends are ZFS…

But recreating the containers won’t be too hard, everything is created with docker-compose :slightly_smiling_face:

Indeed, it is going to be some tinkering.

I just want to make sure you know the impacts of changing the storage driver, so that it doesn’t feel like things disappear for no reason.

Please keep us posted regarding the path you choose and how your experience is, so others with the same problem will have a chance to see how you applied the solution.

Thank you very much! I’ll update you as soon as I have switched everything over.


Hello again! Back faster than I expected!

Would you look at my docker info:

dockeruser@stephan-portainer:~$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  compose: Docker Compose (Docker Inc., v2.14.1)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 20.10.22
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9ba4b250366a5ddde94bb7c9d1def331423aa323
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.74-1-pve
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1GiB
 Name: stephan-portainer
 ID: JMHQ:ILR6:J4PN:XE7S:X4EL:DLXR:2E2C:HETI:TLMC:O4TJ:ZHQ7:N2D6
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

And what can I say, it was very easy! As mentioned, Debian 11 is my LXC template:

  1. Ran docker compose down for every one of my containers
  2. Shut down the container via the Proxmox GUI
  3. Enabled FUSE in the container features, so my unprivileged container now has both nesting and fuse enabled
  4. Installed the fuse-overlayfs package inside the machine and rebooted again
  5. The new storage driver was active. The last step was to delete everything under /var/lib/docker/vfs and /var/lib/docker/image/vfs
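For anyone who prefers the CLI over the GUI, steps 2–4 above roughly correspond to the following (the container ID 101 is a placeholder; pct is the Proxmox container tool):

```shell
# On the Proxmox host: stop the container, enable nesting + fuse
pct stop 101
pct set 101 --features nesting=1,fuse=1
pct start 101

# Inside the container: install fuse-overlayfs, then restart
apt-get update && apt-get install -y fuse-overlayfs
reboot
```

After the reboot, docker info should report fuse-overlayfs as the storage driver.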

One thing I have to mention: my Docker Compose files use static bind mounts for the external data, which means:

    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./portainer-data:/data

so Docker does not manage the volumes, which is why I didn’t lose my data. If anybody uses Docker’s volume management, please back up your files before running docker compose down! AND IMPORTANT: see what happened to me in #13.

Well, now I have to repeat these steps for my 20 LXC containers :slight_smile: But huge thanks to you @meyay for the detective work! Without it I wouldn’t have been able to make that switch. Maybe now I no longer waste a lot of space and have a more stable system :slight_smile:

I would like to give @meyay the solution checkmark, but since I wrote the steps down in this post, I’ll mark this post as the solution for other people searching for it.


Hey everyone,

I have an IMPORTANT piece of info for everyone! One of my LXC containers had, for whatever reason, overlay2 enabled as storage driver with ZFS as the backing FS. How did I notice? Because it crashed my complete Proxmox node!

It crashes when you make a backup. As soon as the backup starts, the behaviour I asked about here occurs: [SOLVED] - Backup crashes WebUI / Promox Envoirement | Proxmox Support Forum. Digging further, I then noticed the other storage driver.

My fix was to force fuse-overlayfs in /etc/docker/daemon.json.
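For reference, a minimal /etc/docker/daemon.json that pins the storage driver looks like this (restart the daemon afterwards; images and containers created under the old driver will no longer be visible, as noted earlier in the thread):

```json
{
  "storage-driver": "fuse-overlayfs"
}
```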

Now the backups are running fine again :grinning: