ZFS pool suddenly unmounted by Docker (maybe)?

We are utilizing ZFS on Linux and we create the pool and filesystems as follows:

zpool create -o ashift=12 -m /dbs dbs /dev/nvme0n1

zfs create -o recordsize=4k -o atime=off -o mountpoint=/dbs/data -o redundant_metadata=most -o compression=lz4 -o logbias=throughput -o primarycache=metadata -o xattr=sa dbs/data

zfs create -o recordsize=128k -o atime=off -o mountpoint=/dbs/logs -o redundant_metadata=most -o compression=off -o logbias=latency -o xattr=sa dbs/logs
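
A quick way to confirm both datasets are mounted is to list the standard mounted property:

zfs list -o name,mountpoint,mounted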

We then start a Docker container to access the above data and logs. After a few minutes, one or more of the above filesystems are automatically unmounted.

If we run zfs mount -a, we get the filesystems back, but by that time the processes accessing them have already died.

Some posts indicate that we have to add a dependency so that ZFS is mounted before Docker starts. So this is what we did:

We ran sudo systemctl edit docker.service and added the following lines:

   [Unit]
   After=zfs-mount.service
   Requires=zfs-mount.service
   Wants=zfs-mount.service
   BindsTo=zfs-mount.service
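
One way to confirm systemd actually picked up the drop-in is to query the merged unit properties:

   systemctl show docker.service -p After -p Requires -p BindsTo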

But it did not make any difference. It seems that something in the Linux system is causing the unmount.

Can anyone please help with this?

Kernel version: 5.10.186-179.751
OpenZFS version: 2.1.12
Docker version: 20.10.23 (build 7155243)

Is there some way for us to check whether the problem is caused by Docker?

We are not utilizing ZFS storage in Docker. We are simply providing bind mounts to Docker, and it so happens that the bind mounts are backed by ZFS filesystems. Docker only sees these as volumes and is otherwise not instructed about ZFS in any way.
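
For illustration, the containers are started along these lines (the image name and container-side paths here are placeholders, not our real setup):

docker run -d --name db -v /dbs/data:/var/lib/db -v /dbs/logs:/var/log/db some-db-image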

You could check the daemon logs. If Docker is involved, it would very likely leave traces in its logs.

Furthermore, you can check the logs in /var/log/, which I assume you already did.

Nothing of much interest found in /var/log/*.
Tried dmesg.
Tried journalctl.
docker events --since '10m' simply hangs.
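
As a side note, docker events streams continuously by design, so the hang is expected; bounding the window makes it print past events and exit:

docker events --since 10m --until 1s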

There is nothing that suggests ZFS should be unmounted. I am wondering if there is some Linux tracing that can be enabled to determine why ZFS got unmounted.
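
For example, unmount syscalls can be recorded system-wide; a minimal sketch, assuming an x86_64 machine with auditd available:

sudo auditctl -a always,exit -F arch=b64 -S umount2 -k zfs-umount
# reproduce the problem, then:
sudo ausearch -k zfs-umount

Or live with bpftrace, which prints the process issuing each unmount:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_umount { printf("%s (pid %d) unmounting %s\n", comm, pid, str(args->name)); }'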

I couldn’t easily follow your question in your first post, so I edited it to add code blocks. Please format your posts as described in the following guide: How to format your forum posts

What happens if you don’t use Docker to mount the folders? Will they be unmounted in that case after a while?

Sorry to bring up an old thread. I think I have the same issue.

OS: NixOS 23.11
Kernel: Linux 6.6.23
ZFS: 2.2.3
Docker: 24.0.5

I have a zpool with a few filesystems, all mounted under /mnt/data, as follows. I use raid/docker as the Docker data folder.

raid                            /mnt/data
raid/db                         /mnt/data/db
raid/conf                       /mnt/data/conf
raid/docker                     /mnt/data/docker
raid/home                       /mnt/data/home
raid/backup                     /mnt/data/backup
raid/upload                     /mnt/data/upload
raid/downloads                  /mnt/data/downloads

Docker config:

{
  "data-root": "/mnt/data/docker",
  "group": "docker",
  "hosts": [
    "tcp://127.0.0.1:2375",
    "fd://"
  ],
  "live-restore": true,
  "log-driver": "journald",
  "storage-driver": "zfs"
}

My problem is that every time dockerd starts, it unmounts all ZFS filesystems “after” its data folder. By “after”, I mean in the order of the above list, so they are “home”, “backup”, “upload”, “downloads”.

And by “dockerd starts”, I mean exactly at that moment. The mounts were fine (and stayed there) as long as I did not start Docker. And I checked via mount: they were there right before the command starting Docker, and gone right after.

If I mount the filesystem, for example “home”, outside /mnt/data, it is not unmounted.
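
For example, with a plain property change (the target path is arbitrary):

sudo zfs set mountpoint=/srv/home raid/home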

I cannot say for certain this is a Docker issue; since it starts via systemd, it is possible that systemd is doing it. But I have not found any other service that would do this.

Also, I did not have this issue until I rebuilt my system last December.

I think your issue is different. The original post was about bind mounting from a ZFS filesystem, but you store the Docker data root on ZFS, which means ZFS is the backing filesystem of Docker.

See the supported backing filesystems here:

https://docs.docker.com/storage/storagedriver/select-storage-driver/#supported-backing-filesystems

It shows that the zfs storage driver is supported only on a ZFS backing filesystem. You also shared that you use the zfs storage driver, which is okay, and you couldn’t really change it anyway, since the documentation shows that only fuse-overlayfs and vfs are supported on any backing filesystem. Then you go to the documentation of the ZFS storage driver:

Note: There is also a FUSE implementation of ZFS on the Linux platform. This is not recommended. The native ZFS driver (ZoL) is more tested, more performant, and is more widely used. The remainder of this document refers to the native ZoL port.

To be honest I’m not sure if it means what I think it means, but fuse-overlayfs and vfs are not recommended if you have any other solutions anyway.

The documentation also states

The ZFS on Linux (ZoL) port is healthy and maturing. However, at this point in time it is not recommended to use the zfs Docker storage driver for production use unless you have substantial experience with ZFS on Linux.

Then scroll down to Prerequisites

  • ZFS requires one or more dedicated block devices, preferably solid-state drives (SSDs).

It doesn’t say (as the LXD documentation does) that you can’t use the pool for other purposes, but what you experience indicates you can’t.

Assuming this is a Docker issue, given that it worked before and there is no documentation about it requiring a whole zpool, this should be considered a recently introduced bug.

Just to fill in the information: Docker creates a lot of legacy-mountpoint filesystems in its data folder, and those work fine all the time.

The documentation could be incomplete too. Just because something works, it doesn’t mean it is supported, and a new version could break it. For example, Docker Desktop was never supported on Windows Server, but some people managed to install it and it later stopped working. I might be wrong, of course, but when you use the zfs storage driver, image layers are created as ZFS datasets, so Docker needs to manage the pool.

Even though I didn’t find a specific statement in the docs that “Docker needs a dedicated ZFS pool”, there are multiple clues indicating it does.

Another quote from the ZFS driver configuration guide:

Create a new zpool on your dedicated block device or devices, and mount it into /var/lib/docker/. Be sure you have specified the correct devices, because this is a destructive operation. This example adds two devices to the pool.

$ sudo zpool create -f zpool-docker -m /var/lib/docker /dev/xvdf /dev/xvdg

You can read the rest of the guide to learn how the ZFS driver works. Since Docker doesn’t recommend it for production, I can easily imagine that it worked until now and then changed. If you suspect it is a bug, you can search for similar issues on GitHub, or open a new issue if you can’t find an existing related one.

Thanks. I am trying to find some solid evidence. Right now I am getting conflicting information: Docker did it, systemd did it, or, my biggest concern, the systemd service configuration for Docker did it.

Still digging.

If it helps, you can check the source code:

This is for Docker CE v26.0.0, so if you have a different version, change it in the URL.

There are some unmounts in the code, but as far as I can tell after such a quick check, they shouldn’t unmount datasets that are not Docker-related.

You can also try enabling debug mode in the Docker daemon (the --debug flag, or “debug”: true in the JSON).
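
For example, adding the key to the daemon.json posted above and restarting dockerd:

{
  "debug": true
}

With the journald log driver, the verbose output then shows up in journalctl -u docker.service -f.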

For now, I think it is an issue with multiple factors.

  1. I tried Arch Linux in VirtualBox (virtual disks for ZFS), Docker 26/24 + systemd 255. No issue.

  2. In my NixOS, Docker 24 + systemd 254/255: ZFS got unmounted when Docker started.

  3. In my NixOS, Docker 25/26. No issue.

  4. I tried to mess things up. Docker 24 told me it could not find zfs in PATH and failed to start the “zfs” storage driver, and yet the filesystems were still unmounted.

  5. With Docker 24, I stopped the service and mounted all ZFS filesystems. After a night of sleep, all ZFS filesystems were unexpectedly unmounted. Compare that to running Docker, which unmounted only some of the ZFS filesystems.

    This is the most confusing part. Unless systemd runs dockerd regularly for some kind of information even though I stopped the service, I cannot see why this happened.

  6. If I uninstall Docker, no ZFS filesystems are unmounted.

Thank you for the update about your tests.

Have you checked the system logs? You could use journald to look for log entries that can explain why it happens. To find out when it happened, you could try running zpool history.
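
For example, assuming the pool name raid from above:

journalctl --since "1 hour ago" | grep -iE 'unmount|zfs'
sudo zpool history -il raid | tail -n 50

The -i flag includes internally logged ZFS events, not just user-initiated commands.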

I really doubt that systemd would run the Docker daemon regularly, unless there is a systemd service configured to run a Docker container or to communicate with the Docker socket while the docker.socket systemd unit is not stopped. I assume you stopped it, as when you stop docker.service there is a warning saying that docker.socket can still activate docker.service. If that happens, you would see Docker running in the morning and not being stopped anymore.
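
To rule out socket activation entirely, both units have to be stopped together:

sudo systemctl stop docker.service docker.socket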

Since nothing is unmounted unless Docker is installed, maybe the issue has something to do with the state of the ZFS pool even when the Docker daemon is not running, and when you uninstall Docker, that state is reset. Next time you could keep Docker installed but rename the dockerd binary, so you can be sure that nothing can start it regularly.
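
Something along these lines (the path depends on the distribution; on NixOS the binary lives in the read-only store, so masking the units is the closer equivalent):

sudo mv /usr/bin/dockerd /usr/bin/dockerd.disabled
# or, on NixOS:
sudo systemctl mask docker.service docker.socket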

You can also check the cronjobs on the system to see if anything that could unmount pools appears after Docker is installed. Maybe there is a bug somewhere, and either ZFS or something that comes with Docker can’t handle Docker’s datasets and other datasets together.

Another idea: even though you want to use a single ZFS pool, if you have more disks, you could create another pool just for Docker temporarily. If you don’t have other disks, you can create virtual disks and create a ZFS pool on those (see the sketch below). Then, if no ZFS filesystems are unmounted, Docker indeed needs its own pool for whatever reason. You can also test the case where you have two ZFS pools and you use Docker’s pool to mount some datasets on the host, as you do now. If only the pool that Docker uses is affected, that is another reason to believe Docker needs its own ZFS pool. If both pools are affected, we can rule this case out.
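
A file-backed vdev is enough for such a test; a sketch, with size, path, and pool name arbitrary:

truncate -s 10G /var/tmp/docker-zpool.img
sudo zpool create dockerpool /var/tmp/docker-zpool.img
# then point data-root in daemon.json at /dockerpool and restart dockerd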

By the way, it seems I haven’t asked yet: did you install Docker from the official repository (download.docker.com)?

Have you checked the system logs? You could use journald to look for log entries that can explain why it happens. To find out when it happened, you could try running zpool history.

Yes, I took a glance; nothing caught my eye.

If that happens, you would see Docker running in the morning and not being stopped anymore.

I was thinking the same. But I saw that Docker was still stopped the next morning. And if Docker had been started by something other than me, I think it would behave the same as when I start it: unmounting some of the filesystems, not all of them.

The Arch Linux test seems to indicate this is an issue with the dockerd I used. And yes, in NixOS it was built from the moby/moby GitHub repository instead of from released binaries, and patches are applied. But for now, I have not seen anything in the patches directly related to ZFS.

To make things worse, I cannot reproduce this in VirtualBox, even with NixOS.

The last thing I checked was the properties of the pool and filesystems, given that the issue does not occur in my testing VM: no differences.

I think there could be some system configuration that, combined with Docker 24, triggers this issue. But for now, I have no idea.

Thankfully, the issue was gone after switching to Docker 25.
